Format and curate vcf file
format_curate_vcf.Rd
Usage
format_curate_vcf(
vcf.p2f = NULL,
matrix.gt = NULL,
mrk.info = NULL,
corresp.geno.name = NULL,
p2f.export.vcf = NULL,
IDnum = FALSE,
remove.chrUkn = TRUE,
check.mrk.dups = TRUE,
remove.nonPolyMrk = TRUE,
tresh.heterozygous = NULL,
check.geno.dups = TRUE,
thresh.NA.ind = 0.5,
thresh.NA.mrk = 0.5,
format.names = FALSE,
concordance.function = FALSE,
corresp.geno.list = NULL,
imputation = NULL,
p2f.beagle = NULL,
thresh.MAF = NULL,
verbose = 1
)
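A call might look like the minimal sketch below. The file path, the genotype names, and the threshold value for thresh.MAF are hypothetical, chosen only for illustration. The call is guarded so the snippet still runs when the function or the file is not available, and the structure of the returned object is not assumed.

```r
# Hypothetical input path; replace with a real VCF file.
vcf_path <- "path/to/genotypes.vcf"

# Correspondence table: the first column is the name to be matched,
# the second column is the updated name (names here are made up).
corresp <- data.frame(
  old_name = c("geno_A_rep1", "geno_B_old"),
  new_name = c("geno_A", "geno_B"),
  stringsAsFactors = FALSE
)

# Only run the call when the function and the file are both available.
if (exists("format_curate_vcf") && file.exists(vcf_path)) {
  curated <- format_curate_vcf(
    vcf.p2f = vcf_path,
    corresp.geno.name = corresp,
    thresh.NA.ind = 0.5,   # documented default
    thresh.NA.mrk = 0.5,   # documented default
    thresh.MAF = 0.05,     # illustrative value, not a documented default
    verbose = 1
  )
}
```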
Arguments
- vcf.p2f
path to the vcf file
- matrix.gt
matrix of genotypes, must contain CHROM and POS columns, default is NULL
- mrk.info
data frame with marker information pulled from the vcf, with columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, default is NULL. Needed if vcf.p2f is not provided and p2f.export.vcf is not NULL.
- corresp.geno.name
table of correspondence between genotype names, with the first column being the name to be matched and the second one being the updated name.
- p2f.export.vcf
path to which the curated VCF file is exported, default is NULL (no export); useful for subsequent Beagle imputation.
- IDnum
logical, remove the prefix idxx. from the genotype name, default is FALSE
- remove.chrUkn
logical, remove markers located on an unknown chromosome, default is TRUE
- check.mrk.dups
logical, check for duplicated markers, default is TRUE
- remove.nonPolyMrk
logical, remove non-segregating markers, default is TRUE
- tresh.heterozygous
numeric, threshold of the maximum proportion of heterozygous genotypes, default is NULL
- check.geno.dups
logical, check for duplicated genotypes, default is TRUE
- thresh.NA.ind
numeric, threshold of the maximum proportion of missing values for individuals, default is 0.5
- thresh.NA.mrk
numeric, threshold of the maximum proportion of missing values for markers, default is 0.5
- format.names
logical, use a custom function to transform genotype names, default is FALSE
- concordance.function
logical, use a custom function to match genotype names, default is FALSE
- corresp.geno.list
named list of correspondence between genotype names, with the correct spelling as the name and the possible synonyms as a character vector, default is NULL.
- imputation
character, method used to impute missing values (kNNI or Beagle), default is NULL (no imputation)
- p2f.beagle
path to export file for Beagle imputation, default is NULL
- thresh.MAF
numeric, threshold of minor allele frequency, default is NULL
- verbose
numeric, level of verbosity, default is 1
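To make the filtering thresholds above more concrete, the following base R sketch computes the quantities they refer to on a toy matrix. It is an illustration only, not the function's implementation: the 0/1/2 allele-count coding, the markers-in-rows layout, the inclusive comparisons, and the 0.5 heterozygosity cutoff are all assumptions.

```r
# Toy genotype matrix: 4 markers (rows) x 5 individuals (columns),
# coded as alternate-allele counts (0, 1, 2) with NA for missing calls.
# This coding is an assumption made for the illustration.
gt <- matrix(
  c(0, 0, 2, 2, NA,
    1, 1, 1, 0, 2,
    0, 0, 0, 0, 0,
    NA, NA, NA, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("mrk", 1:4), paste0("ind", 1:5))
)

# Proportion of missing values per marker and per individual
# (the quantities that thresh.NA.mrk and thresh.NA.ind bound).
na_mrk <- rowMeans(is.na(gt))
na_ind <- colMeans(is.na(gt))

# Proportion of heterozygous calls per marker, among non-missing calls.
het_mrk <- rowMeans(gt == 1, na.rm = TRUE)

# Minor allele frequency per marker; a value of 0 means the marker
# does not segregate in this sample.
alt_freq <- rowMeans(gt, na.rm = TRUE) / 2
maf <- pmin(alt_freq, 1 - alt_freq)

# Markers passing the illustrative filters: at most 50% missing,
# polymorphic, and at most 50% heterozygous calls.
keep <- na_mrk <= 0.5 & maf > 0 & het_mrk <= 0.5
```

In this toy example only mrk1 is kept: mrk2 has too many heterozygous calls, mrk3 is non-segregating, and mrk4 has too many missing values.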