**On this page...** (hide)

- 1. Introduction
- 2. Details
- 2.1 Command interface
- 2.2 Application

## 1. Introduction

This is implementation for the KBAC statistic in Liu and Leal 2010^{1}. It carries out case-control association testing for rare variants for whole exome association studies. Briefly, consider a gene of length n which harbors m rare variants. Genotype on the m variant sites & the disease status (case/control) are known for each individual. The program takes as input the m-site genotype and disease status (case/control) data files, and computes a p-value indicating the significance of association. Permutation has to be used to obtain valid p-values.

An R package is also available for use with standalone text dataset.

Note a couple of differences between this implementation and the original version:

- The original paper provides 3 kernel options: hypergeometric, binomial and Gaussian kernels. The hypergeometric kernel generally performs best and is implemented. Other kernels are not implemented.
- The
`--alternative 2`

option implements the spirit of the RBT test^{2}by performing two KBAC tests under both*protective*and*deleterious*assumptions and use the larger of the two statistics thus calculated as the final KBAC statistic.

## 2. Details

### 2.1 Command interface

vtools show test KBAC

Name: KBAC Description: Kernel Based Adaptive Clustering method, Liu & Leal 2010 usage: vtools associate --method KBAC [-h] [--name NAME] [-q1 MAFUPPER] [-q2 MAFLOWER] [--alternative TAILED] [-p N] [--adaptive C] [--moi {additive,dominant,recessive}] Kernel Based Adaptive Clustering method, Liu & Leal 2010. Genotype pattern frequencies, weighted by a hypergeometric density kernel function, is compared for differences between cases and controls. p-value is calculated using permutation for consistent estimate with different sample sizes (the approximation method of the original publication is not implemented). Two- sided KBAC test is implemented by calculating a second statistic with case/ctrl label swapped, and the larger of the two statistic is used as two- sided test statistic optional arguments: -h, --help show this help message and exit --name NAME Name of the test that will be appended to names of output fields, usually used to differentiate output of different tests, or the same test with different parameters. -q1 MAFUPPER, --mafupper MAFUPPER Minor allele frequency upper limit. All variants having sample MAF<=m1 will be included in analysis. Default set to 0.01 -q2 MAFLOWER, --maflower MAFLOWER Minor allele frequency lower limit. All variants having sample MAF>m2 will be included in analysis. Default set to 0.0 --alternative TAILED Alternative hypothesis is one-sided ("1") or two-sided ("2"). Default set to 1 -p N, --permutations N Number of permutations --adaptive C Adaptive permutation using Edwin Wilson 95 percent confidence interval for binomial distribution. The program will compute a p-value every 1000 permutations and compare the lower bound of the 95 percent CI of p-value against "C", and quit permutations with the p-value if it is larger than "C". It is recommended to specify a "C" that is slightly larger than the significance level for the study. To disable the adaptive procedure, set C=1. Default is C=0.1 --moi {additive,dominant,recessive} Mode of inheritance. Will code genotypes as 0/1/2/NA for additive mode, 0/1/NA for dominant or recessive model. Default set to additive

### 2.2 Application

Example using **snapshot** `vt_ExomeAssociation`

▸

vtools associate rare status -m "KBAC --name kbac -p 5000" --group_by refGene.name2 --to_db\ kbac -j8 > kbac.txt INFO: 3180 samples are found INFO: 2632 groups are found INFO: Starting 8 processes to load genotypes Loading genotypes: 100% [=====================] 3,180 34.4/s in 00:01:32 Testing for association: 100% [=====================] 2,632/591 18.9/s in 00:02:19 INFO: Association tests on 2632 groups have completed. 591 failed. INFO: Using annotation DB kbac in project test. INFO: Annotation database used to record results of association tests. Created on Wed, 30 Jan 2013 05:26:43 vtools show fields | grep kbac kbac.refGene_name2 refGene_name2 kbac.sample_size_kbac sample size kbac.num_variants_kbac number of variants in each group (adjusted for specified MAF kbac.total_mac_kbac total minor allele counts in a group (adjusted for MOI) kbac.statistic_kbac test statistic. kbac.pvalue_kbac p-value kbac.std_error_kbac Empirical estimate of the standard deviation of statistic kbac.num_permutations_kbac number of permutations at which p-value is evaluated head kbac.txt refGene_name2 sample_size_kbac num_variants_kbac total_mac_kbac statistic_kbac pvalue_kbac std_error_kbac num_permutations_kbac ABCG5 3180 6 87 0.00610092 0.353646 0.00629806 1000 ABCB6 3180 7 151 0.00375831 0.633367 0.00807416 1000 ABCB10 3180 6 122 0.0157014 0.0973805 0.00733189 5000 ABCG8 3180 12 152 -0.00160383 0.876124 0.00861691 1000 ABCA4 3180 43 492 0.0293608 0.387612 0.0142427 1000 ABHD1 3180 5 29 -0.000709548 0.732268 0.00400521 1000 ABCA12 3180 28 312 0.015846 0.509491 0.011858 1000 ABL2 3180 4 41 0.000628395 0.553447 0.00456862 1000 ACADL 3180 5 65 0.00239811 0.501499 0.00545028 1000 |

^{1} Dajiang J. Liu and Suzanne M. Leal (2010) **A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions**. *PLoS Genetics* doi:`10.1371/journal.pgen.1001156`

. http://dx.plos.org/10.1371/journal.pgen.1001156 ⇑

^{2} Iuliana Ionita-Laza, Joseph D. Buxbaum, Nan M. Laird and Christoph Lange (2011) **A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease**. *PLoS Genetics* doi:`10.1371/journal.pgen.1001289`

. http://dx.plos.org/10.1371/journal.pgen.1001289 ⇑