ATAV (Association Tests for Annotated Variants)

Whole genome/exome association analysis toolset for annotated variants in next-generation sequencing studies

  1. Introduction
  2. Definitions
  3. Availability
  4. Analysis on Single Variants
  5. Analysis on Group of Variants
  6. Population Stratification
  7. Trios Analysis
  8. Linkage Analysis
  9. Citations

ATAV Input Parameters of Fisher's Exact Tests


  1. --memory {50000}: assign 50000 MB (about 50 GB) of memory to ATAV. The memory required depends on your dataset: a whole-genome dataset needs more than 50 GB, and a whole-exome dataset needs about 30 GB.

  2. --project {$PROJECT.gsap}: specify an SVA project gsap file, with the gsap.sva and gsap.sva.data files in the same folder; $PROJECT.gsap is the SVA project filename.

  3. --fisher: a Fisher's exact test with allelic, dominant, recessive, and genotypic models will be performed.

  4. --fisher-allelic [optional]: a Fisher's exact test with allelic model only.

  5. --fisher-dom [optional]: a Fisher's exact test with dominant model only.

  6. --fisher-rec [optional]: a Fisher's exact test with recessive model only.

  7. --fisher-gen [optional]: a Fisher's exact test with genotypic model only.

  8. --min-variant-present {1} [optional]: consider variants only if observed n or more times in either heterozygotes or homozygotes; the default value is 1 (all variants are considered).

  9. --min-coverage {3} [optional]: specify a minimum coverage (read depth); the default value is 3.

  10. --out {foldername & fileroot} [optional]: specify the output folder name and the output root filename; the default value is the project name. A combined result is saved to the output path. ATAV also creates one sub-folder for each genetic model of the Fisher's test (allelic/dominant/recessive/genotypic). Each sub-folder contains a full-list result for its genetic model, covering all variants in all functional categories of the variant type (SNV/INDEL). A further sub-folder named "functions" stores the results associated with the different functional categories.

  11. --exclude-male-het [optional]: when "--exclude-male-het" is specified, variants on sex chromosomes that have one or more male(s) with heterozygous mutations will be excluded. By default, these variants are included but the questionable males are set to missing.

  12. --ctrlMAF {0.05} [optional]: specify a maximum variant allele frequency in controls; the default value is 0.05. For example, if a user specifies "--ctrlMAF 0.05", ATAV loads only variants whose frequency in controls is either <= 0.05 or >= 0.95. This option is used to load rare variants and to calculate the significance threshold for rare variants.

  13. --snvFunctionList {STOP_GAINED,STOP_LOST,ESSENTIAL_SPLICE_SITE,NON_SYNONYMOUS_CODING} [optional]: specify the SNV functional list, using commas (,) to separate entries (NOTE: do not add a blank after a comma); the default value is STOP_GAINED,STOP_LOST,ESSENTIAL_SPLICE_SITE,NON_SYNONYMOUS_CODING. The available SNV functional categories are: STOP_GAINED, STOP_LOST, FRAMESHIFT_CODING, NON_SYNONYMOUS_CODING, ESSENTIAL_SPLICE_SITE, SPLICE_SITE, REGULATION_REGION, INTRONIC_EXON_BOUNDARY, 5PRIME_UTR, 3PRIME_UTR, EXONIC_NON_CODING_RNA, UPSTREAM, DOWNSTREAM, INTRONIC, SYNONYMOUS_CODING, INTERGENIC, REFERENCE, CANNOT_ANNOTATE.

  14. --indelFunctionList {CODING_DISRUPTED_FRAMESHIFT,CODING_DISRUPTED_OTHER} [optional]: specify the indel functional list, using commas (,) to separate entries (NOTE: do not add a blank after a comma); the default value is CODING_DISRUPTED_FRAMESHIFT,CODING_DISRUPTED_OTHER. The available indel functional categories are: CODING_DISRUPTED_FRAMESHIFT, CODING_DISRUPTED_OTHER, TRANSCRIPT_INCLUDED, 5PRIME_UTR, 3PRIME_UTR, INTRONIC_EXON_BOUNDARY, UPSTREAM, DOWNSTREAM, INTRONIC, INTERGENIC, CANNOT_ANNOTATE, SPLICE_SITE.

  15. --region [optional]: if a region is specified, e.g. "--region chr22:17257787-19792353", the combined results of the Fisher's exact test are output only for the specified region.

  16. --without-non-carrier [optional]: add the flag "--without-non-carrier" to the other parameters to run the analysis on data sets without non-carrier information.

  17. --exclude-tolerant [optional]: use the flag "--exclude-tolerant" to exclude NON_SYNONYMOUS_CODING variants that are predicted to be "tolerant".

  18. --threshold-sort [optional]: specify a p-value threshold for sorting in single variant analysis; the default value is 0.0001, meaning that variants with p-values less than 0.0001 are written to a separate file sorted by p-value.

  19. --ctrl-maf-recessive [optional]: maximum minor allele frequency in controls for the recessive model.

  20. --ctrl-mhgf-recessive [optional]: maximum minor homozygous genotype frequency in controls for the recessive model. For example, with ctrl-maf-recessive=0.15 and ctrl-mhgf-recessive=0.05, a variant is removed if either its ctrlVAF exceeds ctrlMAF (0.15) or its ctrlVHGF exceeds ctrlMHGF (0.05).
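
The two control-frequency rules stated above (for --ctrlMAF and for the recessive-model thresholds) can be sketched in Python as follows. This only illustrates the rules as worded in the parameter descriptions; it is not ATAV's implementation, and the function and argument names are hypothetical.

```python
def passes_ctrl_maf(ctrl_allele_freq, ctrl_maf=0.05):
    """Return True if a variant would be loaded under --ctrlMAF.

    Rule as described in the text: keep a variant when its allele
    frequency in controls is <= ctrl_maf or >= 1 - ctrl_maf.
    (Hypothetical helper, not part of ATAV.)
    """
    return ctrl_allele_freq <= ctrl_maf or ctrl_allele_freq >= 1.0 - ctrl_maf


def removed_by_recessive_filter(ctrl_vaf, ctrl_vhgf, ctrl_maf_rec, ctrl_mhgf_rec):
    """Return True if a variant would be removed for the recessive model.

    Rule as described in the text: remove a variant when its control
    variant allele frequency exceeds --ctrl-maf-recessive OR its control
    variant homozygous genotype frequency exceeds --ctrl-mhgf-recessive.
    (Hypothetical helper, not part of ATAV.)
    """
    return ctrl_vaf > ctrl_maf_rec or ctrl_vhgf > ctrl_mhgf_rec


if __name__ == "__main__":
    # --ctrlMAF 0.05: rare (0.01) and near-fixed (0.98) variants are kept.
    for freq in (0.01, 0.30, 0.98):
        print(freq, passes_ctrl_maf(freq, ctrl_maf=0.05))

    # ctrl-maf-recessive=0.15, ctrl-mhgf-recessive=0.05:
    # the allele frequency passes, but the homozygous frequency does not.
    print(removed_by_recessive_filter(0.10, 0.06, 0.15, 0.05))
```

Note that the recessive rule is an "or": exceeding either threshold is enough to remove the variant.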
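
To make the genetic models selected by --fisher-allelic, --fisher-dom, and --fisher-rec more concrete, here is a minimal Python sketch of how case/control genotype counts are commonly collapsed into 2x2 tables under each model, followed by a plain two-sided Fisher's exact test. It uses the textbook definitions of these models; how ATAV itself codes genotypes and computes p-values is not stated here, so treat the details as assumptions. The genotypic model is left out because it needs a 2x3 table. All names are hypothetical.

```python
from math import comb


def model_tables(case, ctrl):
    """Build one 2x2 table per model from (hom_ref, het, hom_var) counts.

    Rows are cases and controls. Textbook definitions, assumed here:
      allelic   : variant alleles vs. reference alleles
      dominant  : carriers (het + hom_var) vs. non-carriers
      recessive : hom_var vs. everyone else
    """
    (cr, ch, cv), (tr, th, tv) = case, ctrl
    return {
        "allelic": [[2 * cv + ch, 2 * cr + ch], [2 * tv + th, 2 * tr + th]],
        "dominant": [[ch + cv, cr], [th + tv, tr]],
        "recessive": [[cv, cr + ch], [tv, tr + th]],
    }


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


if __name__ == "__main__":
    # Made-up genotype counts: (hom_ref, het, hom_var)
    cases, controls = (6, 3, 1), (9, 1, 0)
    for model, table in model_tables(cases, controls).items():
        print(model, table, fisher_exact_2x2(table))
```

The genotype counts in the usage example are invented purely for illustration.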