MSA Project

Topics: DNA, Sequence alignment, Bioinformatics Pages: 14 (2861 words) Published: June 1, 2014
COBALT, webPRANK, DbClustal

Kamer Burak İŞÇİ*
Dept. of Molecular Biology and Genetics
Izmir Institute of Technology
Izmir, Turkey

Abstract—Multiple sequence alignment (MSA) tools identify similarities among three or more biological sequences such as DNA, RNA or proteins. A wide range of MSA tools is available, and comparing their output helps to obtain results that are as precise as possible. This study describes the general working principles of three multiple sequence alignment tools, COBALT, webPRANK and DbClustal, and compares their results both internally and with each other.

Index Terms—COBALT, webPRANK, DbClustal

The alignment of three or more biological sequences, which may be protein, DNA or RNA, is called multiple sequence alignment (MSA) [1]. MSA is generally used to infer evolutionary relationships: the aligned sequences are assumed to share a lineage and to descend from a common ancestor. Because such alignments are difficult to produce by hand, computational algorithms are used to produce and analyze them.
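As a rough illustration of how an alignment feeds into questions of relatedness, the sketch below computes pairwise distances from sequences that are already aligned and then joins the closest clusters first using average linkage (in the spirit of UPGMA). The sequences, the names `p_distance` and `average_linkage_order`, and the choice of a simple mismatch fraction as the distance are assumptions made for this example; none of them comes from the tools discussed in this study.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of mismatched columns, skipping columns where either row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)

def average_linkage_order(aligned):
    """Return merge steps [(cluster_a, cluster_b, distance), ...], closest first."""
    dist = {frozenset((p, q)): p_distance(aligned[p], aligned[q])
            for p, q in combinations(aligned, 2)}

    def avg(c1, c2):
        # Mean of all original pairwise distances between the two clusters.
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in aligned]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical toy alignment (rows must have equal length).
toy_alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "ACG-TCGA",
    "seqD": "TTGTTCCA",
}
merge_steps = average_linkage_order(toy_alignment)
for left, right, d in merge_steps:
    print(left, right, round(d, 3))
```

With this toy input the two most similar rows, seqA and seqB, are joined first and the most divergent row, seqD, is joined last. A real analysis would use a substitution model and a proper tree-building method rather than a raw mismatch fraction.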

Most MSA tools use heuristic methods rather than global optimization, because finding the optimal alignment of more than a few sequences of moderate length is computationally prohibitive. There are two main heuristic approaches to MSA: progressive and iterative. A progressive method first computes pairwise distances between the sequences, collects them in a distance matrix and builds a guide tree from that matrix. It then aligns the most similar pair of sequences and adds the remaining sequences one at a time, with the guide tree determining which sequence is added next. This is much faster than extending exact pairwise alignment to many sequences at once, which becomes very slow even for a few sequences [2].

COBALT

COBALT (constraint-based alignment tool) is one of the more recently announced MSA algorithms. It lets the user enter pairwise constraints directly. Alternatively, the user can ask COBALT to generate the constraints itself from sequence similarity, from searches of the conserved domain database (CDD) and from pattern searches of PROSITE, a protein-motif database. COBALT can also build partial profiles from any CDD search result [3]. In addition, CDD contains auxiliary information that allows partial profiles to be created for the input sequences before progressive alignment starts, which makes profile construction computationally cheaper.
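To make the idea of pattern-derived constraints more concrete, the following sketch translates a pattern written in PROSITE-style syntax into a Python regular expression and pairs up the sequences that match it. This is only one plausible reading of how a pattern hit could become a pairwise constraint; it is not COBALT's implementation, and the pattern, the sequences and the function names `prosite_to_regex` and `pattern_constraints` are invented for the example.

```python
import re
from itertools import combinations

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern, e.g. C-x(2,3)-C-[LIVM]-{P}-H, to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                  # x means any residue
            core = "."
        elif core.startswith("{"):       # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex as written.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def pattern_constraints(sequences, pattern):
    """Pair up the first hit in every sequence that matches the pattern.

    Returns tuples (name_a, (start, end), name_b, (start, end)), read as
    "these two regions should be aligned with each other" (an assumption).
    """
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        found = regex.search(seq)
        if found:
            hits[name] = found.span()
    return [(a, hits[a], b, hits[b]) for a, b in combinations(hits, 2)]

# Hypothetical pattern and sequences, for illustration only.
toy_pattern = "C-x(2,3)-C-[LIVM]-{P}-H"
toy_sequences = {"s1": "MKCAACLGHTT", "s2": "GCTTSCVAHK", "s3": "MKLLPPGG"}
constraints = pattern_constraints(toy_sequences, toy_pattern)
print(prosite_to_regex(toy_pattern))
print(constraints)
```

Here s1 and s2 both contain the motif, so a single constraint linking their matched regions is produced, while s3 contributes nothing. Because `x(2,3)` has variable length, the two matched regions differ in length, which is one reason a real tool would need residue-level constraints rather than whole-region pairs.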

COBALT provides a general framework that uses progressive multiple alignment to incorporate pairwise constraints from different sources into a multiple alignment. It does not use every available constraint, only a high-scoring consistent subset; a set of constraints is called consistent if all of the constraints in the set can be satisfied simultaneously by a single multiple alignment [4]. COBALT represents each group of conserved columns as an all-against-all collection of pairwise constraints. These columns may include gaps, but a sequence that has a gap in a conserved column does not take part in the pairwise constraints for that column. The conserved columns are then used in most profile-profile alignments. In summary, COBALT finds pairwise constraints through database searches, combines them and incorporates them into a progressive multiple alignment.
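The notion of a consistent set of constraints can be illustrated for the simplest case of two sequences. In the sketch below a constraint is a triple `(i, j, score)` meaning "position i of the first sequence should align with position j of the second". Such a set can hold in one alignment only if the constraints do not cross, that is, if both positions strictly increase along the set. The dynamic program then picks the highest-scoring non-crossing chain. This is a simplified stand-in chosen for clarity, not the procedure COBALT uses, and all names and numbers are hypothetical.

```python
def is_consistent(constraints):
    """True if all (i, j, score) constraints can hold in a single pairwise alignment."""
    ordered = sorted(constraints)
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(ordered, ordered[1:]))

def best_consistent_subset(constraints):
    """Highest-scoring chain of mutually consistent constraints (O(n^2) dynamic program)."""
    ordered = sorted(constraints)
    if not ordered:
        return []
    best = [c[2] for c in ordered]   # best total score of a chain ending at index k
    prev = [None] * len(ordered)     # predecessor of k in that best chain
    for k, (i, j, score) in enumerate(ordered):
        for h in range(k):
            pi, pj, _ = ordered[h]
            if pi < i and pj < j and best[h] + score > best[k]:
                best[k] = best[h] + score
                prev[k] = h
    # Trace back from the chain end with the highest total score.
    k = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while k is not None:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]

# Hypothetical constraints between two sequences; (4, 9) crosses (5, 4) and (8, 6).
toy_constraints = [(2, 3, 5.0), (5, 4, 2.0), (4, 9, 4.0), (8, 6, 3.0), (9, 12, 1.0)]
chosen = best_consistent_subset(toy_constraints)
print(is_consistent(toy_constraints), chosen)
```

For this input the full set is inconsistent, and the selected subset drops the single crossing constraint `(4, 9, 4.0)` even though it has a high individual score, because keeping it would exclude two other constraints with a higher combined score. With more than two sequences, consistency also has to hold across every pair at once, which this sketch does not attempt.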

Researchers showed that constraints derived from CDD and PROSITE improve COBALT's alignment quality, and they found that COBALT has reasonable runtime performance and alignment accuracy. The alignments reported by different alignment algorithms vary significantly, which underlines the importance of the choice of alignment method [5]. The runtime...

References:
Budd, A. (10 February 2009). "Multiple sequence alignment exercises and demonstrations". European Molecular Biology Laboratory. Retrieved June 30, 2010.
Mount, D.W. (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Papadopoulos, J.S. and Agarwala, R. (2007) COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics, 23(9), 1073–1079.
Zhang, X. and Kahveci, T. (2006) A new approach for alignment of multiple proteins. Pac. Symp. Biocomput., 11, 339–350.
Ogden, T.H. and Rosenberg, M.S. (2006) Multiple sequence alignment accuracy and phylogenetic inference. Systematic Biol., 55, 314–328.
[1] Bahr, A. et al. (2001) BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Res., 29, 323–326.
[2] Kececioglu, J.D. and Starrett, D. (2004) Aligning alignments exactly. In Proceedings of the 8th ACM Conference Research in Computational Molecular Biology, pp. 85–96.
[4] Loytynoja, A. and Goldman, N. (2010) webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinformatics, 11(1), 579.