Cédric Debès1, Minglei Wang2, Gustavo Caetano-Anollés2*, Frauke Gräter1,3*
1 Heidelberg Institute for Theoretical Studies, Heidelberg, Germany, 2 Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America, 3 CAS-MPG Partner Institute and Key Laboratory for Computational Biology, Shanghai, China
Abstract
Nature has shaped the makeup of proteins since their appearance, ~3.8 billion years ago. However, the fundamental drivers of structural change responsible for the extraordinary diversity of proteins have yet to be elucidated. Here we explore whether protein evolution affects folding speed. We estimated folding times for the present-day catalog of protein domains directly from their size-modified contact order. These values were mapped onto an evolutionary timeline of domain appearance derived from a phylogenomic analysis of protein domains in 989 fully sequenced genomes. Our results show a clear overall increase of folding speed during evolution, with known ultra-fast downhill folders appearing rather late in the timeline. Remarkably, folding optimization depends on secondary structure. While alpha-folds showed a tendency to fold faster throughout evolution, beta-folds exhibited a trend toward increasing folding times during the last ~1.5 billion years, which began during the "big bang" of domain combinations. As a consequence, these domain structures are on average slow folders today. Our results suggest that fast and efficient folding of domains shaped the universe of protein structure. This finding supports the hypothesis that optimization of the kinetic and thermodynamic accessibility of the native fold reduces protein aggregation propensities that hamper cellular functions.
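As an illustration of the kind of quantity the abstract refers to, the following is a minimal sketch of a contact-order calculation. It assumes one representative coordinate per residue (for example the Cα atom), a distance cutoff of 6 Å, and a size-modified contact order of the form SMCO = CO × L^p, where CO is the relative contact order and L the chain length. The cutoff, the default exponent, and the function names are illustrative assumptions and are not taken from this text.

```python
import math
from itertools import combinations


def relative_contact_order(coords, cutoff=6.0):
    """Relative contact order: mean sequence separation of contacting
    residue pairs, divided by the chain length L.

    coords: one (x, y, z) tuple per residue (assumed representation).
    cutoff: contact distance threshold in the same units (assumed 6.0).
    """
    n_res = len(coords)
    total_separation = 0
    n_contacts = 0
    for i, j in combinations(range(n_res), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            total_separation += j - i  # sequence separation of the pair
            n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_res * n_contacts)


def size_modified_contact_order(coords, exponent=0.7, cutoff=6.0):
    """Assumed form SMCO = CO * L**exponent; the exponent is a
    hypothetical default, not a value stated in this text."""
    return relative_contact_order(coords, cutoff) * len(coords) ** exponent


# Toy example: a straight four-residue chain with 3.8 spacing, so only
# sequence neighbours are in contact.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
co = relative_contact_order(chain)
smco = size_modified_contact_order(chain)
```

With `exponent=1.0` the second function reduces to the absolute contact order (the mean sequence separation of contacts), which makes the role of the size correction explicit.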
Citation: Debès C, Wang M, Caetano-Anollés G, Gräter F (2013) Evolutionary Optimization of Protein Folding. PLoS Comput Biol 9(1): e1002861. doi:10.1371/
PLOS Computational Biology | www.ploscompbiol.org | January 2013 | Volume 9 | Issue 1 | e1002861