ATAV (Association Tests for Annotated Variants)

Whole genome/exome association analysis toolset for annotated variants in next-generation sequencing studies

  1. Introduction
  2. Definitions
  3. Availability
  4. Analysis on Single Variants
  5. Analysis on Group of Variants
  6. Population Stratification
  7. Trios Analysis
  8. Linkage Analysis
  9. Citations

ATAV Input Parameters for Logistic/Linear Regression


  • --memory {50000}: assign 50000 MB (about 50 GB) of memory to run ATAV. The memory required depends on your dataset: a whole-genome dataset needs more than 50 GB, and a whole-exome dataset needs about 30 GB.

  • --project {$PROJECT.gsap}: specify an SVA project .gsap file, with the gsap.sva and gsap.sva.data files in the same folder; $PROJECT.gsap is the SVA file name.

  • --single-variant: run single-variant tests that incorporate covariates. Note: if no covariates are specified, ATAV runs a Fisher's exact test instead.

  • --covar {$COV_FILE} [optional]: specify a covariate file. $COV_FILE can be either a regular covariate file in flat text format (space-delimited, tab-delimited, or a mix of spaces and tabs) or, if no other covariates are included, a .evec file. When a .evec file is specified, users may set the number of eigen axes to include with "--ncov $N", where the default $N is 3. Note: (1) in a covariate file, the first column must contain subject IDs and the remaining columns the covariates; (2) if the covariate file has a header line, prefix it with a number sign ("#").

  • --nov {3} [optional]: number of eigenvectors/covariates from a covariate file to be included in multivariate regression.

  • --linear [optional]: use linear regression for continuous traits; by default, ATAV uses logistic regression (for dichotomous traits).

  • --min-variant-present {1} [optional]: consider a variant only if it is observed at least n times (n being the value given) in heterozygotes or homozygotes; the default value is 1 (all variants are considered).

  • --min-coverage {3} [optional]: specify a minimum coverage (read depth); the default value is 3.

  • --out {foldername & fileroot} [optional]: specify the output folder name and root file name; the default value is the project name. A combined result is saved in the output path. ATAV also creates one sub-folder per genetic model of the Fisher's exact test (allelic, dominant, recessive, genotypic). Each model sub-folder contains a full-list result covering all variants of the variant type (SNV/INDEL/SV) in all functional categories, and a nested sub-folder named "functions" holds the results for each functional category.

  • --exclude-male-het [optional]: when specified, variants on sex chromosomes for which one or more males are heterozygous are excluded. By default, these variants are kept, but the genotypes of the heterozygous males are set to missing.

  • --ctrlMAF {0.05} [optional]: specify a maximum minor allele frequency (MAF) in controls; the default value is 0.05. For example, with "--ctrlMAF 0.05", ATAV loads only variants whose control allele frequency is either <= 0.05 or >= 0.95. This option is used to load rare variants and to calculate the significance threshold for rare variants.

  • --snvFunctionList {STOP_GAINED,STOP_LOST,ESSENTIAL_SPLICE_SITE,NON_SYNONYMOUS_CODING} [optional]: specify the SNV functional categories to include, separated by commas (NOTE: do not add a space after a comma); the default value is STOP_GAINED,STOP_LOST,ESSENTIAL_SPLICE_SITE,NON_SYNONYMOUS_CODING. The available SNV functional categories are: STOP_GAINED, STOP_LOST, FRAMESHIFT_CODING, NON_SYNONYMOUS_CODING, ESSENTIAL_SPLICE_SITE, SPLICE_SITE, REGULATION_REGION, INTRONIC_EXON_BOUNDARY, 5PRIME_UTR, 3PRIME_UTR, EXONIC_NON_CODING_RNA, UPSTREAM, DOWNSTREAM, INTRONIC, SYNONYMOUS_CODING, INTERGENIC, REFERENCE, CANNOT_ANNOTATE.

  • --indelFunctionList {CODING_DISRUPTED_FRAMESHIFT,CODING_DISRUPTED_OTHER} [optional]: specify the indel functional categories to include, separated by commas (NOTE: do not add a space after a comma); the default value is CODING_DISRUPTED_FRAMESHIFT,CODING_DISRUPTED_OTHER. The available indel functional categories are: CODING_DISRUPTED_FRAMESHIFT, CODING_DISRUPTED_OTHER, TRANSCRIPT_INCLUDED, 5PRIME_UTR, 3PRIME_UTR, INTRONIC_EXON_BOUNDARY, UPSTREAM, DOWNSTREAM, INTRONIC, INTERGENIC, CANNOT_ANNOTATE, SPLICE_SITE.

  • --region [optional]: restrict output to a genomic region, e.g. "--region chr22:17257787-19792353"; when a region is specified, the combined Fisher's exact test results include only variants in that region.

  • --without-non-carrier [optional]: add this flag, together with the other parameters, to analyze data sets that lack non-carrier information.

  • --exclude-tolerant [optional]: exclude NON_SYNONYMOUS_CODING variants that are predicted to be "tolerant".

  • --threshold-sort [optional]: specify a p-value threshold for sorting in single-variant analysis; the default value is 0.0001. Variants with p-values below this threshold are written to a separate file, sorted by p-value.

  • --ctrl-maf-recessive [optional]: maximum minor allele frequency in controls for the recessive model.

  • --ctrl-mhgf-recessive [optional]: maximum minor homozygous genotype frequency in controls for the recessive model. For example, with ctrl-maf-recessive=0.15 and ctrl-mhgf-recessive=0.05, a variant is removed if either its control variant allele frequency exceeds 0.15 (ctrlVAF > ctrlMAF) or its control homozygous genotype frequency exceeds 0.05 (ctrlVHGF > ctrlMHGF).
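
When --single-variant runs without covariates, each genetic model listed under --out (allelic, dominant, recessive) reduces the case/control genotype counts to a 2x2 table that is then tested with Fisher's exact test. The Python sketch below illustrates that idea only; it is not ATAV's code. It assumes genotype counts are given as (hom-ref, het, hom-alt) with the alternate allele as the variant allele, uses the common two-sided definition (sum over all tables no more probable than the observed one), and its function names are hypothetical. The genotypic model (a 2x3 table) is omitted.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def model_tables(case, ctrl):
    """Map each model to ((case_a, case_b), (ctrl_a, ctrl_b)).

    case and ctrl are (hom_ref, het, hom_alt) genotype counts.
    """
    collapse = {
        "allelic": lambda r, h, a: (h + 2 * a, 2 * r + h),  # alt vs ref alleles
        "dominant": lambda r, h, a: (h + a, r),             # carriers vs non-carriers
        "recessive": lambda r, h, a: (a, r + h),            # hom-alt vs everyone else
    }
    return {model: (f(*case), f(*ctrl)) for model, f in collapse.items()}


def single_variant_pvalues(case, ctrl):
    """Fisher's exact p-value for each genetic model."""
    return {model: fisher_exact_two_sided(*case_row, *ctrl_row)
            for model, (case_row, ctrl_row) in model_tables(case, ctrl).items()}


if __name__ == "__main__":
    print(single_variant_pvalues(case=(10, 5, 1), ctrl=(20, 3, 0)))
```

Collapsing the genotypes per model keeps the test itself identical across models; only the 2x2 table changes.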
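
The covariate-file layout described under --covar (subject ID first, covariates after, any mix of spaces and tabs, optional "#" header) can be read with a few lines of Python. This is a sketch, not ATAV's parser: n_cov is a hypothetical parameter standing in for the number of covariates chosen with --ncov/--nov, and every retained column is assumed to be numeric.

```python
from typing import Dict, Iterable, List, Optional


def read_covariates(lines: Iterable[str], n_cov: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse covariate-file lines into {subject_id: [covariate values]}."""
    covariates: Dict[str, List[float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#'-prefixed header
        fields = line.split()  # no-argument split() handles spaces, tabs, or both
        subject = fields[0]
        # Keep only the first n_cov covariate columns when n_cov is given.
        raw = fields[1:] if n_cov is None else fields[1:1 + n_cov]
        covariates[subject] = [float(v) for v in raw]
    return covariates
```

For example, read_covariates(open("covariates.txt"), n_cov=3) would keep the first three covariates per subject; the file name here is only illustrative.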
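
One reading of --min-variant-present is that a variant is kept when at least n samples carry it, whether heterozygous or homozygous. The sketch below assumes that interpretation and a hypothetical genotype coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, None = missing); ATAV's own counting may differ.

```python
from typing import Optional, Sequence


def count_carriers(genotypes: Sequence[Optional[int]]) -> int:
    """Number of samples that are heterozygous (1) or homozygous alternate (2)."""
    return sum(1 for g in genotypes if g in (1, 2))


def passes_min_variant_present(genotypes: Sequence[Optional[int]], n: int = 1) -> bool:
    """Keep the variant only if it is observed in at least n carriers."""
    return count_carriers(genotypes) >= n
```

Missing calls never count as carriers under this coding.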
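
The two behaviours of --exclude-male-het can be sketched as follows for a single variant already known to lie on a sex chromosome. The genotype coding and the function name are hypothetical, and the sketch does not treat pseudoautosomal regions specially.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing


def handle_male_hets(genotypes: Sequence[Genotype], is_male: Sequence[bool],
                     exclude_male_het: bool = False) -> Optional[List[Genotype]]:
    """Apply the --exclude-male-het behaviour to one sex-chromosome variant.

    Returns None when the variant is excluded, otherwise the genotype list
    with any heterozygous male calls set to missing.
    """
    male_het = [male and g == 1 for g, male in zip(genotypes, is_male)]
    if not any(male_het):
        return list(genotypes)
    if exclude_male_het:
        return None  # flag given: drop the whole variant
    # Default: keep the variant, set the questionable male calls to missing.
    return [None if bad else g for g, bad in zip(genotypes, male_het)]
```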
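
The --ctrlMAF rule is symmetric: a variant passes if its control allele frequency is at most the threshold or at least one minus the threshold, so a variant whose minor allele is the reference allele is also treated as rare. The sketch below assumes autosomal, diploid genotypes coded as alternate-allele counts (0, 1, 2, or None for missing); the function names are hypothetical.

```python
from typing import Optional, Sequence


def allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Variant allele frequency among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no calls: frequency undefined
    return sum(called) / (2 * len(called))


def passes_ctrl_maf(ctrl_af: float, ctrl_maf: float = 0.05) -> bool:
    """Keep if the control frequency is <= ctrl_maf or >= 1 - ctrl_maf."""
    return ctrl_af <= ctrl_maf or ctrl_af >= 1 - ctrl_maf
```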
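
A --region value such as "chr22:17257787-19792353" can be parsed and used as a position filter as sketched below. The accepted syntax is inferred from that single example, and treating both ends as inclusive is an assumption; the names are hypothetical.

```python
import re
from typing import Tuple

Region = Tuple[str, int, int]

# Assumed syntax, inferred from the documented example: chrN:start-end.
_REGION = re.compile(r"^(chr\w+):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"region start exceeds end: {text!r}")
    return chrom, start, end


def in_region(chrom: str, pos: int, region: Region) -> bool:
    """True if chrom:pos falls inside the region (both ends inclusive)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end
```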
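
The --threshold-sort behaviour (keep variants with p-values below the threshold and order them by p-value) is sketched below. The tab-separated layout and the column names "variant" and "p_value" are hypothetical; ATAV's actual output file format is not described here.

```python
import csv
import sys
from typing import Iterable, List, TextIO, Tuple

Result = Tuple[str, float]  # (variant ID, p-value)


def significant_sorted(results: Iterable[Result], threshold: float = 1e-4) -> List[Result]:
    """Results with p < threshold, ordered from smallest to largest p-value."""
    return sorted((r for r in results if r[1] < threshold), key=lambda r: r[1])


def write_sorted(results: Iterable[Result], out: TextIO, threshold: float = 1e-4) -> None:
    """Write the sorted significant results as a tab-separated table."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["variant", "p_value"])  # hypothetical column names
    for variant, p in significant_sorted(results, threshold):
        writer.writerow([variant, p])


if __name__ == "__main__":
    write_sorted([("v1", 0.2), ("v2", 3e-6), ("v3", 5e-5)], sys.stdout)
```

A variant whose p-value equals the threshold is not kept, matching "less than" in the option description.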
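
The recessive-model filter combines two control thresholds: a variant is removed if its control variant allele frequency exceeds --ctrl-maf-recessive or its control homozygous-variant genotype frequency exceeds --ctrl-mhgf-recessive. The sketch assumes diploid genotypes coded 0/1/2 (None = missing); the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def control_frequencies(genotypes: Sequence[Optional[int]]) -> Optional[Tuple[float, float]]:
    """(variant allele frequency, homozygous-variant genotype frequency) among called controls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no called controls: frequencies undefined
    vaf = sum(called) / (2 * len(called))
    vhgf = sum(1 for g in called if g == 2) / len(called)
    return vaf, vhgf


def keep_for_recessive(ctrl_vaf: float, ctrl_vhgf: float,
                       ctrl_maf_recessive: float, ctrl_mhgf_recessive: float) -> bool:
    """Remove the variant if either control frequency exceeds its threshold."""
    return not (ctrl_vaf > ctrl_maf_recessive or ctrl_vhgf > ctrl_mhgf_recessive)
```

With the thresholds from the example above (0.15 and 0.05), 20 controls containing one heterozygote and one homozygote give frequencies 0.075 and 0.05, so the variant is kept; a homozygous frequency of 0.06 would remove it.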