How variants are imported and stored in variant tools

variant tools imports different types of variants as follows:

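| Type | Reference | Alternative | Imported Variant(s) | Note |
|---|---|---|---|---|
| SNV | A | G | A,G | |
| | TC | TG | C,G | pos + 1 |
| Deletion | TC | T | C,- | pos + 1 |
| | TCG | TG | C,- | pos + 1 |
| | TCGC | TC | GC,- | pos + 2, * |
| | TC | - or . | TC,- | Not VCF compatible |
| Insertion | TCG | TCAG | -,A | pos + 2 |
| | TC | TCA | -,A | pos + 2 |
| | - or . | A | -,A | Not VCF compatible |
| MNP | AA | ATAAC | A,TAAC | pos + 1 |
| | TACT | TCTA | ACT,CTA | pos + 1 |
| Mixed | A | C,G | A,C A,G | Two single nucleotide variants |
| | TC | TCGG,T | -,GG C,- | A deletion and an insertion |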

Note that

  1. - and . are treated as missing alleles and can be used to import indels.
  2. When the reference and alternative alleles share common leading alleles, the variant position is adjusted. For example, 10, ACG, A will be imported as variant CG,- at position 11. Common ending alleles are also removed. Common leading alleles are removed greedily, before any ending alleles, to avoid ambiguity. For example, the deletion TCGC->TC (case * in the table) can be interpreted as a deletion of GC at pos + 2 or as a deletion of CG at pos + 1; variant tools uses the first interpretation.
  3. When there are multiple alternative alleles, they are treated as multiple variants. If a sample carrying two different alternative alleles is imported, its genotype for this variant will be of type other, in contrast to homozygote (two identical alternative alleles) and heterozygote (one reference and one alternative allele).
  4. Although indels can be imported, annotation databases for these kinds of variants are essentially nonexistent at this time. Using the command vtools import --format ANNOVAR to import annotation for ANNOVAR might be a good choice.
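
The trimming rules in notes 1 to 3 can be illustrated with a short Python sketch. This is not variant tools' actual implementation; the function names `normalize` and `import_variants` are hypothetical and only reproduce the behavior described above: missing alleles written as - or ., greedy removal of common leading alleles with a position shift, removal of common ending alleles, and one imported variant per alternative allele.

```python
def normalize(pos, ref, alt):
    """Trim shared alleles from one (ref, alt) pair; return (pos, ref, alt).

    Hypothetical helper, written only to mirror the rules in the notes.
    """
    # '-' and '.' both denote a missing allele.
    ref = "" if ref in ("-", ".") else ref
    alt = "" if alt in ("-", ".") else alt
    # Greedily strip common leading alleles; each one shifts the position.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    # Then strip common ending alleles; the position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # An allele that was trimmed away completely is reported as '-'.
    return pos, ref or "-", alt or "-"


def import_variants(pos, ref, alts):
    """Treat each comma-separated alternative allele as its own variant."""
    return [normalize(pos, ref, alt) for alt in alts.split(",")]


print(normalize(10, "ACG", "A"))             # (11, 'CG', '-')
print(normalize(10, "TCGC", "TC"))           # (12, 'GC', '-'), case * in the table
print(import_variants(10, "TC", "TCGG,T"))   # [(12, '-', 'GG'), (11, 'C', '-')]
```

Because leading alleles are stripped before ending alleles, TCGC->TC resolves to a deletion of GC at pos + 2 rather than CG at pos + 1, which matches the interpretation stated in note 2.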