After some preliminary analysis, we find one particular variant (
179248034 on chromosome 5) in gene
SQSTM1 that is likely to be associated with a phenotype. Let us try to find more information about this variant, which is in a variant table named
MyVar in our project.
Gene and mRNA
We first would like to download and use the ref gene database to our project. To learn what fields are available in this annotation database, we can use command
vtools show annotation to print the details of it.
vtools use refGene vtools show annotation refGene
The following command output the corresponding records in refGene database for this variant:
vtools output myVar chr pos ref alt refGene.name txStart txEnd cdsStart cdsEnd exonCount name2
5 179248034 C T NM_001142298 179233388 179265077 179250005 179263593 9 SQSTM1 5 179248034 C T NM_001142299 179234003 179265077 179250005 179263593 9 SQSTM1 5 179248034 C T NM_003900 179247842 179265077 179247937 179263593 8 SQSTM1
If we use the
ccdsGene database, we would get:
vtools use ccdsGene vtools output myVar chr pos ref alt ccdsGene.name ccdsGene.txStart ccdsGene.txEnd ccdsGene.cdsStart ccdsGene.cdsEnd ccdsGene.exonCount ccdsGene.name2
5 179248034 C T CCDS34317.1 179247937 179263593 179247937 179263593 8
We had to use full name
ccdsGene.txStart etc to avoid ambiguity because these fields also exist in the
The results show that
- This variant locates in consensus CDS gene CCDS34317.1, with common name
- This gene can transcribed two three mRNAs
NM_003900, which corresponds to CCDS id
CCDS4735.1(first two), and
CCDS34317.1. Searching in CCDS gene only displays
CCDS34317.1because this variant falls out of the coding region of
Now we know that this variant is within the coding region of
CCDS34317.1, is it in one of the exon region? We can use the
ccdsGene_exon database to find this out.
vtools use ccdsGene_exon vtools output MyVar chr pos ref alt ccdsGene_exon.exon_start ccdsGene_exon.exon_end
5 179248034 C T 179247937 179248141
So we know this variant stays in an exon starting from
179247937, according to here, this is the first exon of this gene.
Potential of damaging?
Because this variant belongs to one of the CCDS genes, we can find the SIFT score and PolyPhen2 (and other scores) of this variant from the dbNSFP database.
vtools use dbNSFP vtools output MyVar chr pos ref alt SIFT_score polyphen2_score
5 179248034 C T 0.49 0.893
This database also tells you how this mutation changes the aminoacid:
vtools output MyVar chr pos ref alt aaref aaalt refcodon codonpos aapos
5 179248034 C T A V GCG 2 33
So this variant mutates
T, and changes aminoacid
- Protein identifier: leave blank or use name NP_003891.1
- Protein sequence: copy and paste with leading line
>NP_003891.1or leave blank
- Position: 33
- From A -> V
Note that we can use either protein name or sequence, but not both.
The results show that
RecName: Full=Sequestosome-1; AltName: Full=EBI3-associated protein of 60 kDa; Short=EBIAP; Short=p60; AltName: Full=Phosphotyrosine-independent ligand for the Lck SH2 domain of 62 kDa; AltName: Full=Ubiquitin-binding protein p62; LENGTH: 440 AA HumDiv: (preferred model of evaluating rare alleles, dense mapping and analysis of natural selection) This mutation is predicted to be POSSIBLY DAMAGING with a score of 0.915 (sensitivity: 0.81; specificity: 0.94) HumVar: (preferred model for diagnostics of Mendelian disease) This mutation is predicted to be BENIGN with a score of 0.399 (sensitivity: 0.84; specificity: 0.78)
The PolyPhen2 score obtained from PolyPhen2 website differs from what is reported by dbNSFP, which used a previous version of PolyPhen2 to generate scores for these variants.