Format phenotypic data from GrainGenes (Excel tables)
format_phenot.Rd
Load Excel files containing phenotypic data from GrainGenes, covering multiple locations and years. Combine them into one data frame and separate the genotype information from the phenotypic data.
Arguments
- p2d
path to directory where tables are saved
- years
numeric vector of years to look for
- locs
character vector of tab names to look for (including Entry), or of location names that identify the trial. If several names correspond to one trial, list each version as an element of the vector and use the final trial name as that element's name.
- traits
character vector of trait names to look for. If several names correspond to one trait, list each version as an element of the vector and use the sought trait name as that element's name. For example: traits=c("VSK","Heading","FDK"); names(traits)=c("VSK","HD","VSK")
- cols2rem
character vector of column names to remove, to avoid bad matching
- distMatchTrait
numeric value, maximum distance allowed for string matching. Default is 8. A larger distance produces more matches but is more prone to incorrect ones.
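The interaction between a named traits vector and distMatchTrait can be illustrated with a minimal sketch. It assumes an edit-distance measure such as base R's utils::adist; the matching method actually used by this function is not stated here, and the headers vector and the matched data frame are hypothetical, made up for illustration only.

```r
# Illustration only: NOT the function's actual matching code.
# Values are spellings expected in the files; names are the sought trait names.
traits <- c("VSK", "Heading", "FDK")
names(traits) <- c("VSK", "HD", "VSK")

# Hypothetical column headers as they might appear in one sheet.
headers <- c("Entry", "Heading date", "FDK (%)", "Yield")

# Generalized Levenshtein distance between each spelling and each header.
d <- utils::adist(traits, headers, ignore.case = TRUE)
dimnames(d) <- list(traits, headers)

distMatchTrait <- 8  # default threshold documented above

# For each spelling, keep the closest header and check it against the threshold.
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(traits), best)]
matched <- data.frame(
  spelling = unname(traits),
  sought   = names(traits),
  header   = headers[best],
  distance = best_dist,
  accepted = best_dist <= distMatchTrait
)
print(matched)
```

Under this assumed measure, a short spelling such as VSK is never more than 5 edits away from a 5-letter header like Entry, so a threshold of 8 would accept that unrelated match. This is the kind of error that a smaller distMatchTrait or an entry in cols2rem is meant to prevent.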
Value
list of 4 components:
var.match.info: data frame of variable matching
sheet.match.info: data frame of sheet matching (finding the relevant tabs)
phenot: data frame of combined phenotypic data for all years and locations
entry.info: data frame of combined genotype information