Set up prediction scenarios

Usage

setup_scenarios(
  myPheno,
  scenario,
  envs.train = NULL,
  envs.pred = NULL,
  ignore.genos = NULL,
  traits = NULL,
  genos = NULL,
  prop.CVS1 = 0.8
)

Arguments

myPheno: data frame containing at least the following columns: gid (genotype identifier), env, value (trait phenotype)
scenario: one of knEnv, knLoc.knYr, knLoc.nYr, nLoc.knYr, nLoc.nYr. knEnv: known environment, where the training set includes a sample of observations from the testing environment as well as all other environments; knLoc.knYr: known location, known year; knLoc.nYr: known location, unknown year; nLoc.knYr: unknown location, known year; nLoc.nYr: unknown location, unknown year.
envs.train: (optional) environments to which the training set is restricted; if not provided, all environments are considered
envs.pred: (optional) environments to which the testing set is restricted; if not provided, all environments are considered
ignore.genos: (optional) character vector of genotypes to exclude from testing set, while including them in training set, typically check genotypes
traits: (optional) character vector for trait column(s)
genos: (optional) character vector of restricted genotype list, typically genotyped lines.
prop.CVS1: numeric, proportion of genotypes in testing set for scenario knEnv, default is 0.8.
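
The scenario names can be read as statements about which environments may enter the training set for a given testing environment. The base R sketch below illustrates one such reading; it is not the package's implementation. The helpers split_env() and training_envs() are hypothetical, and the "<location>_<year>" naming of environments is an assumption made only for this example. The last lines sketch how a proportion such as prop.CVS1 could be used to sample testing genotypes under knEnv.

```r
# Hypothetical helpers, NOT the package's implementation.
# Assumption: environments are named "<location>_<year>".
split_env <- function(env) {
  parts <- strsplit(env, "_", fixed = TRUE)
  data.frame(
    env  = env,
    loc  = vapply(parts, `[`, character(1), 1),
    year = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Training environments allowed for one testing environment and scenario.
training_envs <- function(envs, test_env, scenario) {
  info <- split_env(envs)
  test <- info[info$env == test_env, ]
  same_loc <- info$loc == test$loc
  same_yr  <- info$year == test$year
  keep <- switch(
    scenario,
    knLoc.knYr = info$env != test_env,  # only the testing environment is left out
    knLoc.nYr  = !same_yr,              # the testing year is never observed
    nLoc.knYr  = !same_loc,             # the testing location is never observed
    nLoc.nYr   = !same_loc & !same_yr,  # neither location nor year is observed
    stop("unknown scenario: ", scenario)
  )
  info$env[keep]
}

envs <- c("A_2020", "A_2021", "B_2020", "B_2021", "C_2021")
training_envs(envs, "A_2020", "knLoc.knYr")
training_envs(envs, "A_2020", "knLoc.nYr")
training_envs(envs, "A_2020", "nLoc.knYr")
training_envs(envs, "A_2020", "nLoc.nYr")

# knEnv sketch: sample a proportion of the genotypes of the testing
# environment for testing; prop plays the role of prop.CVS1 here.
set.seed(1)
genos_in_env <- paste0("G", 1:10)
prop <- 0.8
test_genos  <- sample(genos_in_env, round(prop * length(genos_in_env)))
train_genos <- setdiff(genos_in_env, test_genos)
```

Under this reading, the training set shrinks as the scenario becomes harder: nLoc.nYr removes every environment that shares either the location or the year of the testing environment.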

Value

List with one element per testing (environment) set. Each element is itself a list containing the training environment(s), the testing environment, the genotypes in the training and testing sets, and the phenotypic data in the training and testing sets.
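
A minimal sketch of a call on toy data may help to explore the returned list. The data below are simulated, the "<location>_<year>" environment names are an assumption, and the call is guarded so that the code also runs when the package providing setup_scenarios() is not loaded. The names of the list elements are not documented here, so str() is used to inspect them.

```r
# Toy phenotype table with the required columns gid, env and value.
# Assumption: environments are named "<location>_<year>".
set.seed(42)
myPheno <- expand.grid(
  gid = paste0("G", 1:6),
  env = c("A_2020", "A_2021", "B_2020", "B_2021"),
  stringsAsFactors = FALSE
)
myPheno$value <- rnorm(nrow(myPheno))

# Guarded call: skipped when setup_scenarios() is not available.
if (exists("setup_scenarios")) {
  res <- setup_scenarios(myPheno, scenario = "nLoc.nYr")
  length(res)              # one element per testing set
  str(res, max.level = 2)  # element names depend on the package
}
```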

Author

Charlotte Brault