The default version of our dbSNP annotation currently refers to dbSNP138 (using hg19 coordinates), as shown below. Users can also retrieve older versions of dbSNP: dbSNP129, dbSNP130, dbSNP131, dbSNP132 and dbSNP135. Versions 129 and 130 use hg18 as the reference genome; versions 131, 132, 135 and later use hg19. An archived version can be used in a variant tools project by referring to its specific name, for example dbSNP-hg18_129.
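The version-to-name rule described above can be written down as a small helper. This is only a sketch: the function `dbsnp_annotation_name` is hypothetical and not part of variant tools, and the `dbSNP-<build>_<version>` pattern is an assumption inferred from the names that appear on this page (dbSNP-hg18_129, dbSNP-hg19_135).

```python
# Hypothetical helper (not part of variant tools): build the name of an
# archived dbSNP annotation database from its version number.
# Build assignment follows the text above: 129 and 130 use hg18,
# 131 and later use hg19.
HG18_VERSIONS = {129, 130}
KNOWN_VERSIONS = {129, 130, 131, 132, 135, 138}


def dbsnp_annotation_name(version: int) -> str:
    """Return a name such as 'dbSNP-hg18_129' for a listed dbSNP version."""
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"dbSNP version {version} is not listed on this page")
    build = "hg18" if version in HG18_VERSIONS else "hg19"
    # Assumed naming pattern: dbSNP-<build>_<version>
    return f"dbSNP-{build}_{version}"


if __name__ == "__main__":
    for v in sorted(KNOWN_VERSIONS):
        print(v, dbsnp_annotation_name(v))
```

The returned string is what would be given to a variant tools command in place of the plain name dbSNP, as in the `vtools show annotation dbSNP-hg19_135 -v2` call further down this page.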
- dbSNP 138 has many more flags and fields than previous versions. It also does not contain all variants that are defined in dbSNP 135 and earlier.
- A dbSNP entry might match multiple variants. For example,
% vtools show annotation dbSNP -v2
Annotation database dbSNP (version hg19_138)
Description:            dbSNP version 138, created using vcf file
                        downloaded from NCBI
Database type:          variant
Reference genome hg19:  chr, pos, ref, alt
  chr
  pos
  name                  DB SNP ID (rsname)
  ref                   Reference allele (as on the + strand)
  alt                   Alternative allele (as on the + strand)
  FILTER                Inconsistent Genotype Submission For At Least One
                        Sample
  RS                    dbSNP ID (i.e. rs number)
  RSPOS                 Chr position reported in dbSNP
  RV                    RS orientation is reversed
  VP                    Variation Property. Documentation is at
                        ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
  GENEINFO              Pairs each of gene symbol:gene id. The gene symbol
                        and id are delimited by a colon (:) and each pair
                        is delimited by a vertical bar (|)
  dbSNPBuildID          First dbSNP Build for RS
  SAO                   Variant Allele Origin: 0 - unspecified, 1 -
                        Germline, 2 - Somatic, 3 - Both
  SSR                   Variant Suspect Reason Codes (may be more than one
                        value added together) 0 - unspecified, 1 -
                        Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16
                        - 1kg_failed, 1024 - other
  WGT                   Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2,
                        3 - weight 3 or more
  VC                    Variation Class
  PM_flag               Variant is Precious(Clinical,Pubmed Cited)
  TPA_flag              Provisional Third Party Annotation(TPA) (currently
                        rs from PHARMGKB who will give phenotype data)
  PMC_flag              Links exist to PubMed Central article
  S3D_flag              Has 3D structure - SNP3D table
  SLO_flag              Has SubmitterLinkOut - From
                        SNP->SubSNP->Batch.link_out
  NSF_flag              Has non-synonymous frameshift A coding region
                        variation where one allele in the set changes all
                        downstream amino acids. FxnClass = 44
  NSM_flag              Has non-synonymous missense A coding region
                        variation where one allele in the set changes
                        protein peptide. FxnClass = 42
  NSN_flag              Has non-synonymous nonsense A coding region
                        variation where one allele in the set changes to
                        STOP codon (TER). FxnClass = 41
  REF_flag_flag         Has reference A coding region variation where one
                        allele in the set is identical to the reference
                        sequence. FxnCode = 8
  SYN_flag              Has synonymous A coding region variation where one
                        allele in the set does not change the encoded
                        amino acid. FxnCode = 3
  U3_flag               In 3' UTR Location is in an untranslated region
                        (UTR). FxnCode = 53
  U5_flag               In 5' UTR Location is in an untranslated region
                        (UTR). FxnCode = 55
  ASS_flag              In acceptor splice site FxnCode = 73
  DSS_flag              In donor splice-site FxnCode = 75
  INT_flag              In Intron FxnCode = 6
  R3_flag               In 3' gene region FxnCode = 13
  R5_flag               In 5' gene region FxnCode = 15
  OTH_flag              Has other variant with exactly the same set of
                        mapped positions on NCBI refernce assembly.
  CFL_flag              Has Assembly conflict. This is for weight 1 and 2
                        variant that maps to different chromosomes on
                        different assemblies.
  ASP_flag              Is Assembly specific. This is set if the variant
                        only maps to one assembly
  MUT_flag              Is mutation (journal citation, explicit fact): a
                        low frequency variation that is cited in journal
                        and other reputable sources
  VLD_flag              Is Validated. This bit is set if the variant has
                        2+ minor allele count based on frequency or
                        genotype data.
  G5A_flag              >5% minor allele frequency in each and all
                        populations
  G5_flag               >5% minor allele frequency in 1+ populations
  HD_flag               Marker is on high density genotyping kit (50K
                        density or greater). The variant may have
                        phenotype associations present in dbGaP.
  GNO_flag              Genotypes available. The variant has individual
                        genotype (in SubInd table).
  KGValidated_flag      1000 Genome validated
  KGPhase1_flag         1000 Genome phase 1 (incl. June Interim phase 1)
  KGPilot123_flag       1000 Genome discovery all pilots 2010(1,2,3)
  KGPROD_flag           Has 1000 Genome submission
  OTHERKG_flag          non-1000 Genome submission
  PH3_flag              HAP_MAP Phase 3 genotyped: filtered, non-redundant
  CDA_flag              Variation is interrogated in a clinical diagnostic
                        assay
  LSD_flag              Submitted from a locus-specific database
  MTP_flag              Microattribution/third-party
                        annotation(TPA:GWAS,PAGE)
  OM_flag               Has OMIM/OMIA
  NOC_flag              Contig allele not present in variant allele list.
                        The reference sequence allele at the mapped
                        position is not present in the variant allele
                        list, adjusted for orientation.
  WTD_flag              Is Withdrawn by submitter If one member ss is
                        withdrawn by submitter, then this bit is set. If
                        all member ss' are withdrawn, then the rs is
                        deleted to SNPHistory
  NOV_flag              Rs cluster has non-overlapping allele sets. True
                        when rs set has more than 2 alleles from different
                        submissions and these sets share no alleles in
                        common.
  CAF                   An ordered, comma delimited list of allele
                        frequencies based on 1000Genomes, starting with
                        the reference allele followed by alternate alleles
                        as ordered in the ALT column. Where a 1000Genomes
                        alternate allele is not in the dbSNPs alternate
                        allele set, the allele is added to the ALT column.
                        The minor allele is the second largest value in
                        the list, and was previuosly reported in VCF as
                        the GMAF. This is the GMAF reported on the RefSNP
                        and EntrezSNP pages and VariationReporter
  COMMON                RS is a common SNP. A common SNP is one that has
                        at least one 1000Genomes population with a minor
                        allele of frequency >= 1% and for which 2 or more
                        founders contribute to that minor allele
                        frequency.
  CLNHGVS               Variant names from HGVS. The order of these
                        variants corresponds to the order of the info in
                        the other clinical INFO tags.
  CLNALLE               Variant alleles from REF or ALT columns. 0 is REF,
                        1 is the first ALT allele, etc. This is used to
                        match alleles with other corresponding clinical
                        (CLN) INFO tags. A value of -1 indicates that no
                        allele was found to match a corresponding HGVS
                        allele name.
  CLNSRC                Variant Clinical Chanels
  CLNORIGIN             Allele Origin. One or more of the following values
                        may be added: 0 - unknown; 1 - germline; 2 -
                        somatic; 4 - inherited; 8 - paternal; 16 -
                        maternal; 32 - de-novo; 64 - biparental; 128 -
                        uniparental; 256 - not-tested; 512 -
                        tested-inconclusive; 1073741824 - other
  CLNSRCID              Variant Clinical Channel IDs
  CLNSIG                Variant Clinical Significance, 0 - unknown, 1 -
                        untested, 2 - non-pathogenic, 3 -
                        probable-non-pathogenic, 4 - probable-pathogenic,
                        5 - pathogenic, 6 - drug-response, 7 -
                        histocompatibility, 255 - other
  CLNDSDB               Variant disease database name
  CLNDSDBID             Variant disease database ID
  CLNDBN                Variant disease name
  CLNACC                Variant Accession and Versions
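Several of the dbSNP 138 fields described above are encoded rather than stored as plain values. SSR and CLNORIGIN are sums of power-of-two codes, CAF is a comma-delimited frequency list whose second-largest value is the minor allele frequency, and GENEINFO holds symbol:id pairs separated by a vertical bar. The following is a minimal Python sketch of how such values could be decoded once they have been retrieved as integers or strings. The function names and sample inputs are illustrative assumptions and are not part of variant tools or dbSNP; the optional square brackets around CAF are also an assumption about how the value may be stored.

```python
# Code tables copied from the SSR and CLNORIGIN field descriptions above.
SSR_CODES = {
    1: "Paralog", 2: "byEST", 4: "oldAlign", 8: "Para_EST",
    16: "1kg_failed", 1024: "other",
}
CLNORIGIN_CODES = {
    1: "germline", 2: "somatic", 4: "inherited", 8: "paternal",
    16: "maternal", 32: "de-novo", 64: "biparental", 128: "uniparental",
    256: "not-tested", 512: "tested-inconclusive", 1073741824: "other",
}


def decode_sum(value, codes, zero_label):
    """Split a value made of added power-of-two codes into its labels."""
    if value == 0:
        return [zero_label]
    return [label for bit, label in sorted(codes.items()) if value & bit]


def minor_allele_frequency(caf):
    """Return the second-largest frequency in a CAF string, or None.

    Assumes a comma-delimited list, optionally wrapped in square
    brackets, with '.' marking a missing frequency.
    """
    items = (x.strip() for x in caf.strip("[]").split(","))
    freqs = sorted((float(x) for x in items if x not in ("", ".")),
                   reverse=True)
    return freqs[1] if len(freqs) > 1 else None


def parse_geneinfo(geneinfo):
    """Parse 'SYMBOL:ID|SYMBOL:ID' into a list of (symbol, id) tuples."""
    pairs = []
    for item in geneinfo.split("|"):
        symbol, gene_id = item.rsplit(":", 1)
        pairs.append((symbol, int(gene_id)))
    return pairs


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(decode_sum(17, SSR_CODES, "unspecified"))
    print(decode_sum(5, CLNORIGIN_CODES, "unknown"))
    print(minor_allele_frequency("[0.6,0.3,0.1]"))
    print(parse_geneinfo("GENEA:1|GENEB:22"))
```

SAO and CLNSIG, by contrast, are plain enumerations (for SAO, 3 means "Both" rather than 1 + 2), so a direct dictionary lookup is the appropriate decoding for them.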
version 135 and earlier
% vtools show annotation dbSNP-hg19_135 -v2
Annotation database dbSNP (version hg19_137)
Description:            dbSNP version 137
Database type:          variant
Number of records:      58,008,911
Number of distinct variants: 56,738,705
Reference genome hg19:  ['chr', 'start', 'refNCBI', 'alt']

Field:                  chr
Type:                   string
Missing entries:        0
Unique Entries:         93

Field:                  start
Type:                   integer
Comment:                start position in chrom (1-based)
Missing entries:        0
Unique Entries:         46,982,076
Range:                  55 - 249239663

Field:                  end
Type:                   integer
Comment:                end position in chrom (1-based). start=end means
                        zero-length feature
Missing entries:        0
Unique Entries:         46,756,124
Range:                  55 - 249239663

Field:                  name
Type:                   string
Comment:                dbSNP reference SNP identifier
Missing entries:        0
Unique Entries:         53,109,372

Field:                  strand
Type:                   string
Comment:                which DNA strand contains the observed alleles
Missing entries:        0
Unique Entries:         2

Field:                  refNCBI
Type:                   string
Comment:                Reference genomic sequence from dbSNP
Missing entries:        0
Unique Entries:         165,544

Field:                  refUCSC
Type:                   string
Comment:                Reference genomic sequence from UCSC lookup of
                        chrom,chromStart,chromEnd
Missing entries:        0
Unique Entries:         160,817

Field:                  observed
Type:                   string
Comment:                Strand-specific observed alleles
Missing entries:        0
Unique Entries:         250,691

Field:                  alt
Type:                   string
Comment:                alternate allele on the '+' strand
Missing entries:        0
Unique Entries:         112,097

Field:                  molType
Type:                   string
Comment:                sample type, can be one of unknown, genomic or cDNA
Missing entries:        0
Unique Entries:         3

Field:                  class
Type:                   string
Comment:                Class of variant (single, in-del, het, named,
                        mixed, insertion, deletion etc
Missing entries:        0
Unique Entries:         9

Field:                  valid
Type:                   string
Comment:                validation status, can be unknown, by-cluster,
                        by-frequency, by-submitter, by-2hit-2allele,
                        by-hapmap, and by-1000genomes
Missing entries:        0
Unique Entries:         62

Field:                  avHet
Type:                   float
Comment:                Average heterozygosity from all observations
Missing entries:        0
Unique Entries:         158,839
Range:                  0 - 0.904364

Field:                  avHetSE
Type:                   float
Comment:                Standard error for the average heterozygosity
Missing entries:        0
Unique Entries:         106,224
Range:                  0 - 0.305748

Field:                  func
Type:                   string
Comment:                Functional cetegory of the SNP (coding-synon,
                        coding-nonsynon, intron, etc.)
Missing entries:        0
Unique Entries:         445

Field:                  locType
Type:                   string
Comment:                Type of mapping inferred from size on reference.
Missing entries:        0
Unique Entries:         7