1. Introduction
2. Details
   2.1 Command interface
   2.2 Application

1.  Introduction

The method proposed by Madsen and Browning (2009) [1] introduced the idea of assigning "weights" to rare variants within a genetic region before they are collapsed, so that variants with higher weights contribute more to the collapsed variant score. In the Madsen & Browning paper the weights are defined through {$w_i = \sqrt{n_iq_i(1-q_i)}$}, where {$n_i$} is the number of samples genotyped at variant {$i$} and {$q_i$} is its minor allele frequency. Each variant's contribution is scaled by {$1/w_i$}, on the assumption that the rarer a variant is, the larger its effect on the phenotype. The {$q_i$} in the original paper is estimated from the observed control sample, which may inflate the type I error [2]. The WSSRankTest method implements the WSS statistic with the same definition of {$q_i$}, but the Mann-Whitney U test (definition and C++ implementation for this program) can rely on a full permutation procedure (option -p) rather than the normal approximation, so that this bias is accounted for.
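The weighting and rank statistic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the C++ code used by WSSRankTest: the function names are hypothetical, genotypes are assumed to be complete minor allele counts (0/1/2), and the smoothed control frequency {$q_i = (m_i^U + 1)/(2n_i^U + 2)$} follows the Madsen & Browning paper rather than anything confirmed about this program.

```python
import math
import random


def wss_weights(genotypes, is_control):
    """Return w_i = sqrt(n * q_i * (1 - q_i)) for each variant.

    genotypes: one list of minor allele counts (0/1/2) per sample.
    Assumes no missing calls, so n is simply the number of samples.
    """
    n = len(genotypes)
    controls = [g for g, c in zip(genotypes, is_control) if c]
    n_u = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_u = sum(g[i] for g in controls)   # minor alleles observed in controls
        q = (m_u + 1) / (2 * n_u + 2)       # smoothed frequency, never 0 or 1
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights


def wss_scores(genotypes, weights):
    """Per-sample score: allele counts summed after scaling by 1 / w_i."""
    return [sum(a / w for a, w in zip(g, weights)) for g in genotypes]


def rank_sum(scores, is_case):
    """Sum of the ranks of the cases, using midranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1     # positions are 0-based, ranks 1-based
        for t in range(start, end + 1):
            ranks[order[t]] = midrank
        start = end + 1
    return sum(r for r, c in zip(ranks, is_case) if c)


def permutation_pvalue(genotypes, is_case, n_perm=1000, seed=0):
    """One-sided permutation p-value; weights are recomputed for every shuffle."""
    def statistic(labels):
        w = wss_weights(genotypes, [not c for c in labels])
        return rank_sum(wss_scores(genotypes, w), labels)

    observed = statistic(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistic(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Recomputing the weights inside each permutation is the point of the sketch: because {$q_i$} depends on which samples are labelled as controls, the null distribution has to be built with the same control-based weighting as the observed statistic.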

As with the Variable Thresholds strategy, the idea of weighting can be applied to many other rare variant methods. The WeightedBurdenBt and WeightedBurdenQt methods implement the Madsen & Browning weighting based on controls (or samples with low quantitative phenotypic values) or on the entire population, and test for association with case-control and quantitative traits, with or without phenotype covariates.
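To make the choice of reference group concrete, the following Python sketch computes a weighted burden score for a quantitative trait. It is a sketch under stated assumptions, not the WeightedBurdenQt implementation: the function names are hypothetical, "low phenotypic values" is assumed here to mean below the sample mean, and a plain least-squares slope stands in for whatever regression model the program actually fits.

```python
import math


def reference_mask(phenotype, mode="low"):
    """Pick the samples used to estimate allele frequencies for the weights.

    mode="low": samples below the mean phenotype (an assumed cutoff).
    mode="all": the entire sample.
    """
    if mode == "all":
        return [True] * len(phenotype)
    mean = sum(phenotype) / len(phenotype)
    return [y < mean for y in phenotype]


def weighted_burden(genotypes, reference):
    """Sum allele counts per sample after scaling each variant by 1 / w_i."""
    n = len(genotypes)
    ref = [g for g, r in zip(genotypes, reference) if r]
    scores = [0.0] * n
    for i in range(len(genotypes[0])):
        q = (sum(g[i] for g in ref) + 1) / (2 * len(ref) + 2)  # smoothed frequency
        w = math.sqrt(n * q * (1 - q))
        for j, g in enumerate(genotypes):
            scores[j] += g[i] / w
    return scores


def slope(x, y):
    """Least-squares slope of y on x (no covariates)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx
```

Switching `mode` from `"low"` to `"all"` changes only which samples feed the frequency estimate; the burden score and the downstream test are otherwise identical.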

2.  Details

2.1  Command interface

vtools show test WSSRankTest

Name:          WSSRankTest
Description:   Weighted sum method using rank test statistic, Madsen & Browning 2009
usage: vtools associate --method WSSRankTest [-h] [--name NAME] [-q1 MAFUPPER]
                                             [-q2 MAFLOWER]
                                             [--alternative TAILED] [-p N]

Weighted sum method using rank test statistic, Madsen & Browning 2009. p-value
is based on the significance level of the Wilcoxon rank-sum test. Two methods
are available for evaluating p-value: a semi-asymptotic p-value based on
normal distribution, or permutation based p-value. Variants will be weighted
by 1/sqrt(nP*(1-P)) and the weighted codings will be summed up for rank test.
Two-sided test is available for the asymptotic version, which will calculate
two p-values based on weights from controls and cases respectively, and use
the smaller of them with multiple testing adjustment. For two-sided
permutation based p-value please refer to "vtools show test WeightedBurdenBt"

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           Name of the test that will be appended to names of
                        output fields, usually used to differentiate output of
                        different tests, or the same test with different
                        parameters.
  -q1 MAFUPPER, --mafupper MAFUPPER
                        Minor allele frequency upper limit. All variants
                        having sample MAF<=m1 will be included in analysis.
                        Default set to 0.01
  -q2 MAFLOWER, --maflower MAFLOWER
                        Minor allele frequency lower limit. All variants
                        having sample MAF>m2 will be included in analysis.
                        Default set to 0.0
  --alternative TAILED  Alternative hypothesis is one-sided ("1") or two-sided
                        ("2"). Note that two-sided test is only available for
                        asymptotic version of the test. Default set to 1
  -p N, --permutations N
                        Number of permutations. Set it to zero to use the
                        asymptotic version. Default is zero

                        confidence interval for binomial distribution. The
                        program will compute a p-value every 1000 permutations
                        and compare the lower bound of the 95 percent CI of
                        p-value against "C", and quit permutations with the
                        p-value if it is larger than "C". It is recommended to
                        specify a "C" that is slightly larger than the
                        significance level for the study. To disable the
                        adaptive procedure, set C=1. Default is C=0.1

                        Mode of inheritance. Will code genotypes as 0/1/2/NA
                        for additive mode, 0/1/NA for dominant or recessive
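The help text above describes an adaptive stopping rule: the p-value is evaluated every 1000 permutations, and permutation stops early when the lower bound of its 95 percent confidence interval exceeds the cutoff "C". The Python sketch below illustrates that idea. It is not the program's implementation: the function and parameter names are hypothetical, and a normal-approximation (Wald) interval is assumed because the help text does not say which binomial interval is used.

```python
import math
import random


def adaptive_pvalue(observed, draw_null, max_perm=5000, cutoff=0.1, check_every=1000):
    """Permutation p-value with early stopping for clearly non-significant tests.

    observed:  the observed test statistic.
    draw_null: zero-argument callable returning one permuted statistic.
    Returns (p_value, number_of_permutations_used).
    """
    hits = 0
    for done in range(1, max_perm + 1):
        if draw_null() >= observed:
            hits += 1
        if done % check_every == 0 and done < max_perm:
            p = (hits + 1) / (done + 1)
            # Assumed interval: Wald 95 percent CI for a binomial proportion.
            half_width = 1.96 * math.sqrt(p * (1 - p) / done)
            if p - half_width > cutoff:
                return p, done              # already well above the cutoff
    return (hits + 1) / (max_perm + 1), max_perm
```

With `cutoff=1` the early exit can never trigger, which mirrors the help text's note that C=1 disables the adaptive procedure. A rule of this kind would also be consistent with the example output for this test, where groups with large p-values report 1000 permutations although 5000 were requested.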

2.2  Application

Example using snapshot vt_ExomeAssociation

% vtools associate rare status -m "WSSRankTest --name wss -p 5000" --group_by name2 --to_db wss -j8 > wss.txt
INFO: 3180 samples are found
INFO: 2632 groups are found
INFO: Starting 8 processes to load genotypes
Loading genotypes: 100% [=========================================] 3,180 33.7/s in 00:01:34
Testing for association: 100% [================================================] 2,632/591 10.7/s in 00:04:06
INFO: Association tests on 2632 groups have completed. 591 failed.
INFO: Using annotation DB wss in project test.
INFO: Annotation database used to record results of association tests. Created on Wed, 30 Jan 2013 16:18:43

% vtools show fields | grep wss
wss.name2                  name2
wss.sample_size_wss        sample size
wss.num_variants_wss       number of variants in each group (adjusted for specified MAF
wss.total_mac_wss          total minor allele counts in a group (adjusted for MOI)
wss.statistic_wss          test statistic.
wss.pvalue_wss             p-value
wss.std_error_wss          Empirical estimate of the standard deviation of statistic
wss.num_permutations_wss   number of permutations at which p-value is evaluated

% head wss.txt
name2    sample_size_wss  num_variants_wss  total_mac_wss  statistic_wss  pvalue_wss  std_error_wss  num_permutations_wss
AADACL4  3180             5                 138            34206          0.911089    11215.6        1000
ABCD3    3180             3                 42             12967          0.63037     6602.73        1000
ABCG5    3180             6                 87             37794          0.248751    8912.03        1000
AAMP     3180             3                 35             16160          0.290709    5777.64        1000
ABCB10   3180             6                 122            56091          0.145854    10409.2        1000
ABHD1    3180             5                 29             9825           0.605395    5363.56        1000
ABCB6    3180             7                 151            49949          0.608392    11831.6        1000
ABL2     3180             4                 41             16097          0.438561    6499.52        1000
ACADM    3180             4                 103            19070          0.967033    9782.51        1000
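A results file such as wss.txt can be post-processed with a short script, for example to list the groups with the smallest p-values. The Python sketch below assumes the file is whitespace-delimited with the header shown by `head wss.txt`; the helper names are hypothetical, and the embedded sample simply reuses a few rows from the output above so that the block runs on its own.

```python
import io


def read_results(handle):
    """Read a whitespace-delimited results table into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) == len(header):      # skip blank or malformed lines
            rows.append(dict(zip(header, fields)))
    return rows


def smallest_pvalues(rows, k=3, column="pvalue_wss"):
    """Return the k (p-value, group name) pairs with the smallest p-values."""
    usable = []
    for row in rows:
        try:
            p = float(row[column])
        except ValueError:
            continue                        # non-numeric entry
        if p == p:                          # drop NaN, which never equals itself
            usable.append((p, row["name2"]))
    return sorted(usable)[:k]


# A few rows copied from the example output, standing in for open("wss.txt").
SAMPLE = """\
name2 sample_size_wss num_variants_wss total_mac_wss statistic_wss pvalue_wss std_error_wss num_permutations_wss
AADACL4 3180 5 138 34206 0.911089 11215.6 1000
ABCD3 3180 3 42 12967 0.63037 6602.73 1000
ABCG5 3180 6 87 37794 0.248751 8912.03 1000
AAMP 3180 3 35 16160 0.290709 5777.64 1000
ABCB10 3180 6 122 56091 0.145854 10409.2 1000
"""

rows = read_results(io.StringIO(SAMPLE))
for p, name in smallest_pvalues(rows, k=2):
    print(name, p)
```

For a real run, replace `io.StringIO(SAMPLE)` with a handle opened on wss.txt. How failed groups appear in that file is not shown in the example, so the sketch simply skips rows it cannot parse.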

[1] Bo Eskerod Madsen and Sharon R. Browning (2009) A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic. PLoS Genetics doi:10.1371/journal.pgen.1000384. http://dx.plos.org/10.1371/journal.pgen.1000384
[2] Mathieu Lemire (2011) Defining rare variants by their frequencies in controls may increase type I error. Nature Genetics doi:10.1038/ng.818. http://www.nature.com/doifinder/10.1038/ng.818