## 1.  Introduction

This is an implementation of the fixed threshold collapsing methods for both disease and quantitative traits. A collapsing method for rare variants treats a genetic region as the unit of testing: based on the observed genotypes, it assigns a numeric coding {$X$} to the region, {$$X = I_{(0,N]}\left(\sum_{i=1}^{N} X_i\right)$$} where {$X_i$} indicates whether a mutation is observed at the {$i$}-th of the {$N$} variant sites in the region. That is, the region is coded as {$1$} if at least one mutation is observed, and {$0$} otherwise. This coding scheme has been used in Li and Leal 2008 [1] and Bhatia et al 2010 [2].
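
The coding can be illustrated with a minimal Python sketch. The function name `collapse` and the toy genotype matrix are hypothetical and are not part of the program; genotypes are assumed to be given as counts of the alternative allele per site.

```python
from typing import List, Sequence


def collapse(genotypes: Sequence[int]) -> int:
    """Return 1 if any site in the region carries a mutation, 0 otherwise."""
    return int(sum(genotypes) > 0)


# Rows are samples, columns are rare variant sites in one region (toy data).
region: List[List[int]] = [
    [0, 0, 0],  # no mutation at any site
    [0, 1, 0],  # one heterozygous site
    [1, 0, 2],  # mutations at two sites
]

coded = [collapse(row) for row in region]
print(coded)
```

Each sample contributes a single binary value for the region, regardless of how many sites carry a mutation.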

An advantage of collapsing methods over aggregation methods is their robustness to LD among multiple rare variants in the region under investigation, which could otherwise inflate the type I error. However, under the assumption of additive genetic effects, collapsing methods may be less powerful than aggregation methods.

Our program implements the collapsing coding in a logistic regression framework for the analysis of disease traits (case-control data) as the CollapseBt method, and in a linear regression framework for the analysis of quantitative traits as the CollapseQt method. The {$p$} value for the collapsing method is based on the asymptotic normal distribution of the Wald statistic in generalized linear models. Phenotype covariates can be included in the collapsing tests, so that the significance of the genetic component is evaluated after adjusting for them.
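
As a rough illustration of the Wald test, the sketch below covers only the special case without covariates, where both regressions on a single binary predictor have closed-form solutions. It is not the program's code: the function names are hypothetical, the one-sided test is assumed to be upper-tailed, and a model with covariates would require an iterative fit instead.

```python
import math
from typing import Sequence, Tuple


def normal_p(z: float, tails: int = 2) -> float:
    """p-value of a Wald z statistic under the standard normal distribution."""
    if tails == 2:
        return math.erfc(abs(z) / math.sqrt(2))
    # One-sided: upper tail (the direction is an assumption of this sketch).
    return 0.5 * math.erfc(z / math.sqrt(2))


def wald_binary_trait(coded: Sequence[int], status: Sequence[int]) -> Tuple[float, float]:
    """Logistic regression of case status on one binary predictor.

    Without covariates the slope estimate is the log odds ratio of the
    2x2 table; all four cells must be non-empty. Returns (beta, z).
    """
    a = sum(1 for x, y in zip(coded, status) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(coded, status) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(coded, status) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(coded, status) if x == 0 and y == 0)
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta, beta / se


def wald_quantitative_trait(coded: Sequence[int], trait: Sequence[float]) -> Tuple[float, float]:
    """Linear regression of a trait on one binary predictor. Returns (beta, z)."""
    y1 = [y for x, y in zip(coded, trait) if x == 1]
    y0 = [y for x, y in zip(coded, trait) if x == 0]
    m1, m0 = sum(y1) / len(y1), sum(y0) / len(y0)
    beta = m1 - m0  # slope equals the difference in group means
    rss = sum((y - m1) ** 2 for y in y1) + sum((y - m0) ** 2 for y in y0)
    s2 = rss / (len(trait) - 2)  # residual variance
    se = math.sqrt(s2 * (1 / len(y1) + 1 / len(y0)))
    return beta, beta / se


# Toy case-control data: 30 of 100 cases and 15 of 100 controls are carriers.
coded_bt = [1] * 30 + [0] * 70 + [1] * 15 + [0] * 85
status = [1] * 100 + [0] * 100
beta_bt, z_bt = wald_binary_trait(coded_bt, status)
print(beta_bt, z_bt, normal_p(z_bt, tails=2))

# Toy quantitative trait data.
coded_qt = [0, 0, 0, 1, 1, 1]
trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
beta_qt, z_qt = wald_quantitative_trait(coded_qt, trait)
print(beta_qt, z_qt, normal_p(z_qt, tails=2))
```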

### 1.1  Adjust for missing genotypes

If genotypes are not missing at random in the sample (e.g., the proportion of missing genotypes differs between cases and controls), the type I error can be inflated. When the proportion of missing data is small, this issue can be alleviated using the method proposed by Auer et al 2013 [3], which is implemented as the option --NA_adjust.
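
One simple reading of replacing missing genotypes with a score based on sample allele frequencies is mean imputation per site, pooling all samples. The sketch below shows that idea only; it is an assumption and may differ from what --NA_adjust actually computes, including how the imputed scores enter the collapsed coding. The function name `impute_site_mean` is hypothetical.

```python
from typing import List, Optional


def impute_site_mean(column: List[Optional[int]]) -> List[float]:
    """Replace missing genotypes (None) at one site with the mean observed allele count."""
    observed = [g for g in column if g is not None]
    # Mean is computed over all non-missing samples, cases and controls pooled.
    mean = sum(observed) / len(observed) if observed else 0.0
    return [float(g) if g is not None else mean for g in column]


# Toy genotypes at a single site across eight samples; None marks a missing call.
site = [0, 1, None, 0, None, 1, 0, 0]
filled = impute_site_mean(site)
print(filled)
```

Because the imputed score does not depend on case or control status, it carries no information about the phenotype, which is the intuition behind using it to reduce bias from differential missingness.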

## 2.  Details

### 2.1  Command interface

    % vtools show test CollapseBt
    Name:          CollapseBt
    Description:   Collapsing method for disease traits, Li & Leal 2008
    usage: vtools associate --method CollapseBt [-h] [--name NAME]
                                                [--mafupper MAFUPPER]
                                                [--alternative TAILED]

    Fixed threshold collapsing method for disease traits (Li & Leal 2008). p-value
    is based on the significance level of the regression coefficient for
    genotypes. If --group_by option is specified, variants within a group will be
    collapsed into a single binary coding using an indicator function (coding will
    be "1" if ANY locus in the group has the alternative allele, "0" otherwise)

    optional arguments:
      -h, --help            show this help message and exit
      --name NAME           Name of the test that will be appended to names of
                            output fields, usually used to differentiate output of
                            different tests, or the same test with different
                            parameters.
      --mafupper MAFUPPER   Minor allele frequency upper limit. All variants
                            having sample MAF<=m1 will be included in analysis.
                            Default set to 0.01
      --alternative TAILED  Alternative hypothesis is one-sided ("1") or two-sided
                            ("2"). Default set to 1
      --NA_adjust           This option, if evoked, will replace missing genotype
                            values with a score relative to sample allele
                            frequencies. The association test will be adjusted to
                            incorporate the information. This is an effective
                            approach to control for type I error due to
                            differential degrees of missing genotypes among
                            samples.
                            Mode of inheritance. Will code genotypes as 0/1/2/NA
                            for additive mode, 0/1/NA for dominant or recessive

    % vtools show test CollapseQt
    Name:          CollapseQt
    Description:   Collapsing method for quantitative traits, Li & Leal 2008
    usage: vtools associate --method CollapseQt [-h] [--name NAME]
                                                [--mafupper MAFUPPER]
                                                [--alternative TAILED]

    Fixed threshold collapsing method for quantitative traits (Li & Leal 2008).
    p-value is based on the significance level of the regression coefficient for
    genotypes. If --group_by option is specified, variants within a group will be
    collapsed into a single binary coding using an indicator function (coding will
    be "1" if ANY locus in the group has the alternative allele, "0" otherwise)

    optional arguments:
      -h, --help            show this help message and exit
      --name NAME           Name of the test that will be appended to names of
                            output fields, usually used to differentiate output of
                            different tests, or the same test with different
                            parameters.
      --mafupper MAFUPPER   Minor allele frequency upper limit. All variants
                            having sample MAF<=m1 will be included in analysis.
                            Default set to 0.01
      --alternative TAILED  Alternative hypothesis is one-sided ("1") or two-sided
                            ("2"). Default set to 1
      --NA_adjust           This option, if evoked, will replace missing genotype
                            values with a score relative to sample allele
                            frequencies. The association test will be adjusted to
                            incorporate the information. This is an effective
                            approach to control for type I error due to
                            differential degrees of missing genotypes among
                            samples.
                            Mode of inheritance. Will code genotypes as 0/1/2/NA
                            for additive mode, 0/1/NA for dominant or recessive
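
The effect of the --mafupper threshold can be sketched in Python as follows. This is an illustration under assumptions, not the program's code: the helper names are hypothetical, genotypes are taken to be alternative allele counts with `None` for missing calls, and the sample MAF is computed from non-missing genotypes only.

```python
from typing import Dict, List, Optional


def sample_maf(column: List[Optional[int]]) -> float:
    """Minor allele frequency at one site, computed from non-missing genotypes."""
    observed = [g for g in column if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))  # two alleles per sample
    return min(freq, 1 - freq)


def select_rare(sites: Dict[str, List[Optional[int]]], mafupper: float = 0.01) -> List[str]:
    """Names of the sites whose sample MAF is at most mafupper."""
    return [name for name, column in sites.items() if sample_maf(column) <= mafupper]


# Toy data: three sites genotyped in 100 samples.
sites = {
    "v1": [1] + [0] * 99,        # MAF = 1/200
    "v2": [1] * 10 + [0] * 90,   # MAF = 10/200, too common
    "v3": [1, None] + [0] * 98,  # MAF = 1/198, one missing call
}
print(select_rare(sites, mafupper=0.01))
```

Only the sites that pass this filter would be collapsed into the binary coding for the group.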