MTB - PCDB Help / FAQs

 

 

1. Why use MTB-PCDB?
2. Is the database organism specific or not?
3. How can I use MTB-PCDB?
4. What can I do with MTB-PCDB?
5. What type of sequences does MTB-PCDB align?
6. Is this database applicable for proteome comparison?
7. Is there any limit on the size of the strains that can be compared in MTB-PCDB?
8. How many completely sequenced strains of Mycobacterium tuberculosis are available?
9. Do these completely sequenced strains differ from each other, and how?
10. What are the other recent tools available for whole Proteome and Genome Comparison?
11. What are the advantages of our tool in comparison to other available tools?
12. Which algorithm is used in MTB-PCDB for comparison between the Query and the Comparison strains?
13. What is the StandAlone WWW BLAST Server, and why use StandAlone BLAST?
14. How does MTB-PCDB work?
15. What are the different parameters found in MTB-PCDB?
16. What do the terms Identity, Similarity, Gaps, Query coverage, Query Length, and Subject Length mean? How are Score, Bits and E-Value generated in MTB-PCDB using these terms?
17. What is the difference between "Search", "Advanced Search" and "Advanced Comparison"?

 

 

1. Why use MTB-PCDB?

Comparing multiple protein sequences is a basic step in studying them. Such comparisons identify conserved sequence regions, which is useful when designing experiments to predict the function and structure of proteins. Proteomics also plays an important role in the detailed understanding of the role of proteins in health and disease. This database connects proteomics to the similarities and dissimilarities found when comparing the completely sequenced strains of Mycobacterium tuberculosis, which is useful when designing further research experiments.

2. Is the database organism specific or not?

Yes, this database is organism-specific. It includes only strains of the bacterial species Mycobacterium tuberculosis for which a fully sequenced genome is available. Mycobacterium tuberculosis is the causative agent of tuberculosis, a chronic infectious disease in humans with a growing incidence worldwide.

3. How can I use MTB-PCDB?

MTB-PCDB provides several links, such as Home, Statistics, Advanced Search, Advanced Comparison, and Tools and Analysis. Following these links gives detailed information about how to use the tool. The homepage gives a brief introduction to the species Mycobacterium tuberculosis, along with an image.

4. What can I do with MTB-PCDB?

MTB-PCDB is a user-friendly proteome comparison tool, used mainly to compare strains of Mycobacterium tuberculosis on the basis of their similarity and dissimilarity. The user can compare strains pairwise or compare multiple strains with one another, as needed. Comparisons are viewed by selecting among the parameters available in the database, such as E-Value, Bits, Score, Query coverage, Sequence length, Query length, Identities, and Similarities.

5. What type of sequences does MTB-PCDB align?

It aligns only protein sequences, not nucleotide sequences.

6. Is this database applicable for proteome comparison?

Yes, this database is applicable for whole proteome comparison.

7. Is there any limit on the size of the strains that can be compared in MTB-PCDB?

No, there is no limit on the size of the strains that can be compared in MTB-PCDB, because it compares whole proteomes. A proteome is the full complement of proteins produced by a particular genome.

8. How many completely sequenced strains of Mycobacterium tuberculosis are available?

To date, complete genome sequences are available at NCBI for five strains: Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis H37Ra, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis F11, and Mycobacterium tuberculosis KZN 1435.

9. Do these completely sequenced strains differ from each other, and how?

Yes, the strains differ from each other phenotypically and genotypically, and they vary in their virulence in humans. Mycobacterium tuberculosis nevertheless exhibits very little phenotypic variation in immunologic and virulence factors. Genotypic variation includes single-nucleotide polymorphisms (SNPs) or mutations, indels, and repeats. The virulence of different strains can be compared in terms of their growth rates in the livers and lungs of humans, their ability to cause lung pathology, and the time they take to cause death.

10. What are the other recent tools available for whole Proteome and Genome Comparison?

Recent tools available for proteome and genome comparison include ABWGAT, GenomeVISTA, LAGAN, PiPMaker, PROCOM, MGDD, JCVI, and GenoMycDB.

11. What are the advantages of our tool in comparison to other available tools?

Our tool is user-friendly and differs from the other available tools: apart from the JCVI tool, none of them reports a measure of similarity and dissimilarity. Studying the percent similarity and dissimilarity between strains makes it possible to quantify the variation between them and, in further research, to investigate what causes that variation and what makes the strains differ in virulence in humans. MTB-PCDB also differs from the JCVI tool in that it offers many parameters, so the user can quickly view many comparison outputs as needed, which is not available in any other tool. This makes MTB-PCDB more advantageous.

12. Which algorithm is used in MTB-PCDB for comparison between the Query and the Comparison strains?

The stand-alone BLAST algorithm is used to compare the query strain with the comparison strains. It produces a pairwise alignment for each pair of sequences from the different strains, displays the positions of gaps, identities, and positives in the consensus line, and calculates the E-Value, Bits, and Score for each alignment.

13. What is the StandAlone WWW BLAST Server, and why use StandAlone BLAST?

The StandAlone WWW BLAST Server allows you to set up your own in-house version of the NCBI BLAST Web pages. This can be accessed through web browsers on intranet web servers. Stand-alone BLAST is very useful as it allows creation of custom databases. It also increases computational efficiency and increases specificity of results. It facilitates high-throughput analyses and can automate searches.
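
As a rough illustration of the custom-database workflow described above, the sketch below builds the two commands a local protein search would need. It assumes the NCBI BLAST+ command-line programs (`makeblastdb` and `blastp`); the text does not say which BLAST release MTB-PCDB actually runs, and the file and database names are hypothetical. The code only assembles the commands and does not execute them.

```python
from typing import List


def build_db_command(proteome_fasta: str, db_name: str) -> List[str]:
    """Command that formats one strain's protein FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", proteome_fasta, "-dbtype", "prot", "-out", db_name]


def build_search_command(query_fasta: str, db_name: str, out_file: str,
                         evalue: float = 10.0) -> List[str]:
    """Command that aligns every query protein against the comparison database."""
    return ["blastp", "-query", query_fasta, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_file]


# Hypothetical file names for a query strain and a comparison strain.
db_cmd = build_db_command("H37Ra.faa", "H37Ra_db")
search_cmd = build_search_command("H37Rv.faa", "H37Ra_db", "H37Rv_vs_H37Ra.tsv")
print(" ".join(db_cmd))
print(" ".join(search_cmd))
```

With BLAST+ installed, each list could be passed to `subprocess.run(cmd, check=True)` to automate the search.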

14. How does MTB-PCDB work?

This database gives users a general overview of the available completely sequenced strains of Mycobacterium tuberculosis through several links. The "Statistics" link shows overall statistics for the strain comparisons: the genome size, total number of genes, total number of proteins, and total number of identities found when comparing the strains with one another at different ranges.
The "Search", "Advanced Search" and "Advanced Comparison" links let the user compare two or more strains and view how similar or different they are, at which positions the similarities or differences occur, and how many gaps were added to improve the alignment. "Advanced Comparison" additionally reports the number of identities between the strains within any chosen range.
The "Useful Links" option lists other available tools that also compare whole proteomes of different organisms. Following each link lets the user see how our tool differs from the others and how it benefits proteome comparison.

15. What are the different parameters found in MTB-PCDB?

The parameters are set by the user at search time. The user can choose one or all of the parameters at once, as needed. The available parameters are: identities, similarities, query_coverage, query_length, subject_length, Bits, Score and E-Value.

16. What do the terms Identity, Similarity, Gaps, Query coverage, Query Length, and Subject Length mean? How are Score, Bits and E-Value generated in MTB-PCDB using these terms?

Each match and mismatch in an alignment is assigned a score using a PAM or BLOSUM scoring matrix, and each gap is assigned a penalty.
Similarity is the extent to which nucleotide or protein sequences are related.
Identity is the extent to which two nucleotide or amino acid sequences are invariant.
A gap is a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another.
Query_coverage is the measure of how much of the query sequence is aligned with the comparison sequence.
Query_length is the total length of the query sequence.
Subject_length is the total length of the subject sequence.
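
The definitions above can be made concrete with a small sketch that derives identities, gaps and query coverage from one gapped pairwise alignment. The function name, the toy sequences and the exact percentage conventions (identity over alignment columns, coverage over the full query length) are assumptions for illustration, not necessarily the conventions MTB-PCDB uses. Similarity is left out because it needs a full scoring matrix.

```python
def alignment_stats(query_aln: str, subject_aln: str, query_length: int) -> dict:
    """Simple statistics for two equal-length aligned strings ('-' marks a gap)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = len(query_aln)
    pairs = list(zip(query_aln, subject_aln))
    # Identity: columns where both sequences carry the same residue.
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # Gaps: columns where either sequence has a gap character.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    # Query residues that take part in the alignment.
    aligned_query_residues = sum(1 for q in query_aln if q != "-")
    return {
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / columns,
        "query_coverage": 100.0 * aligned_query_residues / query_length,
    }


# Toy alignment: 6 of the 8 query residues are aligned, with one gap in the query.
stats = alignment_stats("MKT-AYL", "MKTGAFL", query_length=8)
print(stats)
```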
The raw score (S) of an alignment is calculated as the sum of the match, mismatch and gap scores. Gap scores are typically calculated as the sum of G, the gap opening penalty, and L, the gap extension penalty; for a gap of length n, the gap cost is G + Ln. The formula for the raw score (S) is:


S = sum (match) + sum (mismatch) - sum (gap costs)
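
A minimal sketch of this sum is given below. It assumes flat match and mismatch scores rather than a real PAM or BLOSUM matrix, and the penalty values are made up for illustration; a gap of length n costs G + Ln, with G charged once per gap and L for every gap position.

```python
def raw_score(query_aln: str, subject_aln: str, match: int = 2, mismatch: int = -1,
              gap_open: int = 5, gap_extend: int = 1) -> int:
    """Raw score S = sum(match) + sum(mismatch) - sum(gap costs) for one alignment."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_side = None  # sequence holding the current gap run: "query", "subject" or None
    for q, s in zip(query_aln, subject_aln):
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != gap_side:
                score -= gap_open   # G, charged once when a new gap starts
            score -= gap_extend     # L, charged for every position in the gap
            gap_side = side
        else:
            score += match if q == s else mismatch
            gap_side = None
    return score


# Five matches (+10), one mismatch (-1), one gap of length 2 (5 + 2*1 = 7).
print(raw_score("MKT--AYL", "MKTGGAFL"))
```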

Bit score (S') is the value derived from the raw alignment score (S), in which the statistical properties of the scoring system used have been taken into account. Because bit scores are normalized with respect to the scoring system, they can be used to compare alignment scores from different searches. The formula for the bit score (S') is:


S' = (λS - ln K) / ln 2

Here, "λ" and "K" are the Karlin-Altschul statistical parameters used in calculating BLAST scores; they can be thought of as a natural scale for the scoring system. They are used to convert a raw score (S) to a bit score (S'), with λ = 0.347 and K = 0.711. The logarithms are natural logarithms, so ln 2 ≈ 0.693.
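
The conversion can be sketched as follows, assuming the standard Karlin-Altschul form of the bit score with natural logarithms and the λ and K values quoted in the text. The function name and the example raw score are illustrative only.

```python
import math

LAMBDA = 0.347  # Karlin-Altschul lambda, as quoted in the text
K = 0.711       # Karlin-Altschul K, as quoted in the text


def bit_score(raw: float, lam: float = LAMBDA, k: float = K) -> float:
    """Convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


# Hypothetical raw score of 100.
print(round(bit_score(100), 2))
```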
E-Value is the expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-Value, the more significant the score. The formula for the E-Value (E) is:


E = m × n × 2^(-S')

Here, "S'" is the bit score of the alignment,
"m" is the query sequence length, and
"n" is the total length of the database.

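A short sketch of the E-Value calculation, assuming the usual form in which the exponent is the bit score: E = m × n × 2^(-S'). The function name and the lengths used in the example are hypothetical.

```python
def e_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance alignments scoring at least this well."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against 1,500,000 database residues.
for bits in (30.0, 50.0, 70.0):
    print(bits, e_value(bits, 300, 1_500_000))
```

Each additional bit halves the E-Value, which is why higher bit scores correspond to more significant alignments.
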
17. What is the difference between "Search", "Advanced Search" and "Advanced Comparison"?

In the "Search" option, you can select any two strains at a time and view the comparison on the basis of either identities or similarities. The result page shows the total number of comparisons found along with detailed information about them. For example, if one chooses "MTB H37Rv vs MTB H37Ra" with similarities < 99, the output displays '56 items found' and lists all comparison IDs with similarities < 99, together with their % similarities, query coverage, bit score and E-value. At the top left, the "sort by" option orders the results by any one of six parameters. Next to it, the "show" option sets how many results are displayed on a single page. Clicking on any comparison ID shows the properties of the matched proteins in the two strains: protein name, protein length, locus, strand, start, end, accession number, and gene ID. It also shows the whole alignment between the query sequence and the subject sequence, including the consensus line and the identities, similarities and gaps found.
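
The filter, "sort by" and "show" behaviour described above can be sketched in a few lines. The record layout, field names and values below are hypothetical and are not taken from the actual MTB-PCDB schema.

```python
from typing import Any, Dict, List

Hit = Dict[str, Any]


def search(hits: List[Hit], parameter: str, below: float,
           sort_by: str, show: int) -> List[Hit]:
    """Keep hits whose `parameter` is below a threshold, sort them, return one page."""
    selected = [hit for hit in hits if hit[parameter] < below]
    selected.sort(key=lambda hit: hit[sort_by])
    return selected[:show]


# Hypothetical hit records for "MTB H37Rv vs MTB H37Ra"; the values are made up.
hits = [
    {"id": "cmp_001", "similarities": 100.0, "query_coverage": 100.0, "bits": 812.0},
    {"id": "cmp_002", "similarities": 98.5, "query_coverage": 97.0, "bits": 540.0},
    {"id": "cmp_003", "similarities": 91.2, "query_coverage": 88.0, "bits": 230.0},
    {"id": "cmp_004", "similarities": 96.0, "query_coverage": 99.0, "bits": 410.0},
]

page = search(hits, parameter="similarities", below=99, sort_by="similarities", show=2)
for hit in page:
    print(hit["id"], hit["similarities"])
```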

The output of the "Advanced Search" option is the same as that of "Search". The only difference is that in "Advanced Search" one can select any or all of the six parameters at a time. For example, to view the comparison between two strains by 'Bit Score' only, the user selects that option and enters the desired value in the space provided; the output then shows all comparisons with the specified value.
Both "Search" and "Advanced Search" show only pairwise alignments between strains. The "Advanced Comparison" option compares all the strains. The user selects one reference strain, one or all of the four comparison strains, and any one or all of the three parameters available on the page, each over any range. The output page shows only the total number of comparisons found for the selected parameters and the names of the proteins of each strain that match the selected identities, similarities or query coverage.
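
A rough sketch of the "Advanced Comparison" idea: one reference strain is compared against several comparison strains, and for each of them the proteins whose chosen parameter falls inside a range are listed. The data layout, protein names and values are hypothetical.

```python
from typing import Any, Dict, List


def advanced_comparison(results: Dict[str, List[Dict[str, Any]]],
                        parameter: str, low: float, high: float) -> Dict[str, List[str]]:
    """For each comparison strain, list proteins whose `parameter` lies in [low, high]."""
    return {
        strain: [hit["protein"] for hit in hits if low <= hit[parameter] <= high]
        for strain, hits in results.items()
    }


# Hypothetical results of comparing reference strain H37Rv with two comparison strains.
results = {
    "H37Ra": [{"protein": "protA", "identities": 100.0},
              {"protein": "protB", "identities": 92.0}],
    "CDC1551": [{"protein": "protA", "identities": 99.0},
                {"protein": "protB", "identities": 85.0}],
}

in_range = advanced_comparison(results, "identities", low=90, high=99.5)
for strain, proteins in in_range.items():
    print(strain, len(proteins), proteins)
```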

 

 
Copyright © BIC,JBTDRC 2010. All Rights Reserved

Bioinformatics Centre
Jamnalal Bajaj Tropical Disease Research Centre
Mahatma Gandhi Institute of Medical Sciences, Sevagram - 442 102
 