Microbiology

Open Reading Frames (ORFs)
* ORFs are regions with no stop codons. All genes reside in long open reading frames.
* Stop codons in other reading frames have no effect on the gene.
* ORFs can be found by searching the genome sequence. This approach is valid only for prokaryotes and lower eukaryotes, because introns interrupt the coding sequence of most genes in higher eukaryotes.
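
The ORF definition above (a stretch with no stop codons in one reading frame) can be sketched in a few lines of Python. This is a minimal illustration, not a gene finder: the function name `find_orfs` and the `min_codons` cutoff are hypothetical, only the three forward-strand frames are scanned, and no start codon is required, to match the definition given in these notes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(dna, min_codons=3):
    """Return (frame, start, end) for each stop-free stretch in the forward frames.

    start and end are 0-based positions in `dna`; end is exclusive.
    `min_codons` is an illustrative cutoff for what counts as a "long" ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        # Step through complete codons in this frame only.
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3  # next ORF starts after the stop codon
        # Close the stretch that runs to the end of the sequence.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, end))
    return orfs


# TAG (frame 0) and TGA (frame 1) do not interrupt the stretch in frame 2.
print(find_orfs("ATGAAATAGCCCGGGTTT", min_codons=4))
# → [(1, 4, 16), (2, 2, 17)]
```

The example sequence contains stop codons in frames 0 and 1, yet frame 2 stays open along its whole length, which is the point made above about stop codons in other reading frames.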

Protein Sequence vs. DNA Sequence Comparison
* We compare protein sequences, not DNA sequences, because protein is more conserved in evolution than DNA.
* The organism’s survival depends on the protein being functional, which means having the proper amino acid sequence.
* Since the genetic code is degenerate, many different DNA sequences give identical proteins.
* The protein’s 3-dimensional structure is even more conserved, because it is more closely related to enzyme activity than the amino acid sequence is.
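
To make the degeneracy point concrete, here is a small Python sketch. The two DNA sequences are made-up examples, `translate` and `percent_identity` are hypothetical helper names, and the codon table is only the subset of the standard genetic code needed for this example.

```python
# Subset of the standard genetic code (Met, Leu, Ser, Gly only).
CODON_TABLE = {
    "ATG": "M",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}


def translate(dna):
    """Translate DNA codon by codon using the small table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


dna_a = "ATGCTTTCTGGT"  # made-up example sequence
dna_b = "ATGTTGAGCGGA"  # same protein, different codons

print(percent_identity(dna_a, dna_b))                        # → 50.0
print(translate(dna_a), translate(dna_b))                    # → MLSG MLSG
print(percent_identity(translate(dna_a), translate(dna_b)))  # → 100.0
```

The two DNA sequences agree at only half of their positions, yet they encode the same four amino acids, so a comparison at the protein level still recognizes them as identical.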
BLAST
* BLAST (Basic Local Alignment Search Tool) is the standard sequence alignment tool.
* BLAST is based on the concept that if you compare the same (that is, homologous) protein from many different species, you can see that some amino acids readily substitute for each other and others almost never do.
* Results are arranged with the best ones on top.
* The most important score is the Expect value, or E-value, which can be defined as the number of hits at least this good that a random sequence (with the same length as yours) would be expected to have in the database by chance.
* E-values for good hits are usually written something like 3e-42, which is the same as 3 x 10^-42, a very small number.
* Bad hits are very common, and they have E-values in a more familiar form: for example, 0.004 or 1.2.
* A really good E-value is less than 1e-180, which is too small for the program to represent, so it is written as 0.0.
* E-values are affected by the length of the query sequence as well as the size of the database, so even perfect matches with short sequences give poor E-values.
* Before we can conclude that our protein is a homologue of the proteins BLAST matches it with, we would like them to have roughly the same length and a high percentage of identical amino acids.
* The lengths of the query and subject sequences should be within 20% of each other.
* There should be at least 30% identical amino acids.
* If both conditions hold, we can be quite sure we have a good match.
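
The E-value notation and the two rule-of-thumb checks above can be written out as a short Python sketch. All function names here are hypothetical, and two interpretations are assumptions rather than anything stated in the notes: "within 20%" is measured relative to the longer sequence, and percent identity is counted over the aligned length.

```python
import math


def parse_e_value(text):
    """Convert BLAST-style E-value text such as '3e-42' or '0.0' to a float."""
    return float(text)


def length_ok(query_len, subject_len, tolerance=0.20):
    # Assumption: the 20% difference is taken relative to the longer sequence.
    longer = max(query_len, subject_len)
    return abs(query_len - subject_len) <= tolerance * longer


def identity_ok(identical, aligned_len, minimum=0.30):
    # Assumption: identity is the fraction of identical residues in the alignment.
    return identical / aligned_len >= minimum


def probable_homologue(query_len, subject_len, identical, aligned_len):
    """Apply both rule-of-thumb checks from the notes."""
    return length_ok(query_len, subject_len) and identity_ok(identical, aligned_len)


# 3e-42 is scientific notation for 3 x 10^-42.
print(math.isclose(parse_e_value("3e-42"), 3 * 10 ** -42))  # → True

# Smaller E-values are better, so the best hits sort to the top.
print(sorted(["1.2", "3e-42", "0.004", "0.0"], key=parse_e_value))
# → ['0.0', '3e-42', '0.004', '1.2']

print(probable_homologue(300, 280, 120, 280))  # → True
print(probable_homologue(300, 150, 120, 150))  # → False (lengths differ too much)
```

These checks are a filter applied after looking at the E-value, not a replacement for it: a hit that passes both but has an E-value near 1 is still likely to be a chance match.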
