
Format and curate a vcf file

Usage

format_curate_vcf(
  vcf.p2f = NULL,
  matrix.gt = NULL,
  mrk.info = NULL,
  corresp.geno.name = NULL,
  p2f.export.vcf = NULL,
  IDnum = FALSE,
  remove.chrUkn = TRUE,
  check.mrk.dups = TRUE,
  remove.nonPolyMrk = TRUE,
  tresh.heterozygous = NULL,
  check.geno.dups = TRUE,
  thresh.NA.ind = 0.5,
  thresh.NA.mrk = 0.5,
  format.names = FALSE,
  concordance.function = FALSE,
  corresp.geno.list = NULL,
  imputation = NULL,
  p2f.beagle = NULL,
  thresh.MAF = NULL,
  verbose = 1
)

Arguments

vcf.p2f

path to the vcf file

matrix.gt

matrix of genotypes, must contain CHROM and POS columns, default is NULL

mrk.info

data frame with marker information pulled from the vcf, with columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, default is NULL. Required when vcf.p2f is not provided and p2f.export.vcf is not NULL.

corresp.geno.name

table of correspondence between genotype names, with the first column being the name to be matched and the second one being the updated name.
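As an illustration of the expected layout, here is a minimal sketch of such a table and of one way it could be applied to a vector of names. The genotype names and the column names `old.name` and `new.name` are hypothetical, and the lookup with `match()` is an assumption about how the renaming might work, not the function's actual code.

```r
# Hypothetical correspondence table:
# column 1 = name to be matched, column 2 = updated name
corresp.geno.name <- data.frame(
  old.name = c("syrah_rep1", "grenache_2019"),
  new.name = c("Syrah", "Grenache"),
  stringsAsFactors = FALSE
)

# Hypothetical genotype names as they might appear in a vcf header
geno.names <- c("grenache_2019", "syrah_rep1", "merlot_x")

# Look up each name in the first column;
# names without a match are left unchanged
idx <- match(geno.names, corresp.geno.name[[1]])
updated <- ifelse(is.na(idx), geno.names, corresp.geno.name[[2]][idx])
updated
```

In this sketch, names absent from the first column are kept as they are; whether format_curate_vcf keeps or drops unmatched genotypes is not stated here.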

p2f.export.vcf

path to export the curated vcf file, default is NULL (no export), useful for further Beagle imputation.

IDnum

logical, remove the prefix idxx. from the genotype name, default is FALSE

remove.chrUkn

logical, remove markers located on an unknown chromosome, default is TRUE

check.mrk.dups

logical, check for duplicated markers, default is TRUE

remove.nonPolyMrk

logical, remove non-segregating markers, default is TRUE

tresh.heterozygous

numeric, threshold of the maximum proportion of heterozygous genotypes, default is NULL
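A minimal sketch of a heterozygosity filter on a 0/1/2 matrix, assuming that the proportion is computed per marker, that 1 codes a heterozygous call, and that markers above the threshold are dropped. These three points are assumptions made for illustration; the toy data and the threshold value are hypothetical.

```r
# Toy 0/1/2 matrix: 4 genotypes (rows) x 3 markers (columns)
geno <- matrix(
  c(1, 1, 0,
    1, 0, 2,
    1, 2, 2,
    0, 0, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), c("m1", "m2", "m3"))
)

tresh.heterozygous <- 0.5  # hypothetical threshold value

# Proportion of heterozygous calls (coded 1) per marker, ignoring missing values
het.prop <- colMeans(geno == 1, na.rm = TRUE)

# Keep markers whose heterozygous proportion does not exceed the threshold
kept <- geno[, het.prop <= tresh.heterozygous, drop = FALSE]
colnames(kept)
```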

check.geno.dups

logical, check for duplicated genotypes, default is TRUE

thresh.NA.ind

numeric, threshold of the maximum proportion of missing values for individuals, default is 0.5

thresh.NA.mrk

numeric, threshold of the maximum proportion of missing values for markers, default is 0.5
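The two missing-data thresholds can be pictured with the sketch below, which works on a toy matrix with genotypes in rows and markers in columns. It assumes that rows and columns whose missing proportion exceeds the threshold are removed, and that both proportions are computed on the unfiltered matrix; the order and the exact comparison used by the function may differ.

```r
# Toy 0/1/2 matrix: 3 genotypes (rows) x 4 markers (columns), NA = missing call
geno <- matrix(
  c(0,  NA, 2,  1,
    NA, NA, NA, 1,
    2,  NA, 0,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("m1", "m2", "m3", "m4"))
)

thresh.NA.ind <- 0.5
thresh.NA.mrk <- 0.5

# Proportion of missing values per individual (rows) and per marker (columns)
na.ind <- rowMeans(is.na(geno))
na.mrk <- colMeans(is.na(geno))

# Keep individuals and markers at or below their respective thresholds
filtered <- geno[na.ind <= thresh.NA.ind, na.mrk <= thresh.NA.mrk, drop = FALSE]
dim(filtered)
```

Here individual g2 (75% missing) and marker m2 (100% missing) are dropped.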

format.names

logical, use a custom function to transform genotype names, default is FALSE

concordance.function

logical, use a custom function to match genotype names, default is FALSE

corresp.geno.list

named list of correspondence between genotype names, with the correct spelling as name and the possible synonyms as a character vector, default is NULL.
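A minimal sketch of such a list, with hypothetical variety names, and of one possible way to resolve synonyms to their correct spelling. The flattening into a lookup vector is an assumption for illustration and is not taken from the function's source.

```r
# Hypothetical list: element name = correct spelling, values = possible synonyms
corresp.geno.list <- list(
  Syrah    = c("Shiraz", "syrah"),
  Grenache = c("Garnacha", "Grenache noir")
)

# Flatten into a named vector: synonym -> correct spelling
lookup <- setNames(
  rep(names(corresp.geno.list), lengths(corresp.geno.list)),
  unlist(corresp.geno.list, use.names = FALSE)
)

# Hypothetical genotype names to resolve
geno.names <- c("Shiraz", "Garnacha", "Merlot")

# Replace known synonyms, leave other names unchanged
resolved <- unname(
  ifelse(geno.names %in% names(lookup), lookup[geno.names], geno.names)
)
resolved
```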

imputation

character, method used to impute missing values, either kNNI or Beagle, default is NULL (no imputation)

p2f.beagle

path to export file for Beagle imputation, default is NULL

thresh.MAF

numeric, threshold of minor allele frequency, default is NULL
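For reference, the minor allele frequency of a biallelic marker coded 0/1/2 can be derived from the mean allele dosage. The sketch below assumes that coding and assumes that markers with a frequency below the threshold are removed; the toy data and the threshold value are hypothetical.

```r
# Toy 0/1/2 matrix: 4 genotypes (rows) x 3 markers (columns)
geno <- matrix(
  c(0, 0, 2,
    0, 1, 2,
    0, 1, 0,
    1, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), c("m1", "m2", "m3"))
)

thresh.MAF <- 0.2  # hypothetical threshold value

# Frequency of the counted allele: mean dosage divided by 2 (diploid)
alt.freq <- colMeans(geno, na.rm = TRUE) / 2

# The minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(alt.freq, 1 - alt.freq)

# Keep markers with a MAF at or above the threshold
kept <- geno[, maf >= thresh.MAF, drop = FALSE]
colnames(kept)
```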

verbose

numeric, level of verbosity, default is 1

Value

data frame with 0/1/2 values from the curated vcf file, with genotypes in rows and markers in columns.
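To make the 0/1/2 coding concrete, the sketch below converts vcf-style GT strings into allele dosages and transposes the result so that genotypes are in rows. It assumes biallelic markers and that the values count copies of the alternate allele; the helper `gt_to_dosage` and the toy data are hypothetical and are not part of the package.

```r
# Toy GT matrix as stored in a vcf: markers in rows, genotypes in columns
gt <- matrix(
  c("0/0", "0/1", "1/1",
    "0|1", "./.", "0/0"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("mrk1", "mrk2"), c("genoA", "genoB", "genoC"))
)

# Hypothetical helper: sum the allele indices of each call, NA if any is missing
gt_to_dosage <- function(x) {
  alleles <- strsplit(as.vector(x), "[/|]")
  vapply(alleles, function(a) {
    if (any(a == ".")) NA_real_ else sum(as.numeric(a))
  }, numeric(1))
}

dosage <- matrix(gt_to_dosage(gt), nrow = nrow(gt), dimnames = dimnames(gt))

# Transpose: genotypes in rows, markers in columns
geno <- as.data.frame(t(dosage))
geno
```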

Author

Charlotte Brault