Protein classification

Proteins today are very diverse and ordering them into a taxonomy based on natural descent (homology), akin to the taxonomy of organisms, would be of great benefit, as homology offers a rich source of structural, functional and mechanistic information. Current classification schemes often include analogous traits and lack hierarchical depth. It is tempting, but unfortunately not possible, to extend the taxonomy of organisms to their constituent proteins because proteins, or rather the genes encoding them, are no more tied to an organism than organisms are tied to an ecosystem. Thus, many genetic events (displacement, duplication, lateral transfer) cause protein phylogenies to differ from those of their parent organisms.

We pursue a classification approach based on natural descent, proceeding from structural and sequence similarities. Challenges arise in detecting these similarities and interpreting them phylogenetically. There are further difficulties in coordinating taxonomic levels (subfamilies, families, superfamilies, clans) across proteins with vastly different evolutionary histories and in handling the huge number of proteins in need of classification.

Towards this goal, we have developed a Perl-based phylome generation and analysis tool, PhyloGenie [1], and a Java application, CLANS (Cluster Analysis of Sequences), for the rapid visualization of pairwise relationships in large datasets [2]. In its most common application, CLANS uses all-against-all BLAST scores of unaligned, even non-homologous, proteins and displays their pairwise similarities as clusters in 2D or 3D graphs [Figure 1]. CLANS is, however, a versatile tool that can also be used to visualize similarities in many other kinds of data, for example languages [3].
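To illustrate the general idea of turning pairwise similarity scores into a clustered 2D picture, here is a minimal sketch in Python. It is not the CLANS implementation: the function names (`attraction_weights`, `layout_2d`, `distance`), the force law, the P-value cutoff and the example hits are all assumptions chosen for illustration. Every pair of points repels, while pairs with a significant hit also attract in proportion to the negative logarithm of their P-value, so that related sequences settle into clusters.

```python
import math
import random


def attraction_weights(hits, p_cutoff=1e-3):
    """Turn (i, j, p_value) hits into symmetric weights in (0, 1].

    Hits above the (assumed) P-value cutoff are ignored.
    """
    raw = {}
    for i, j, p in hits:
        if i == j or p > p_cutoff:
            continue
        key = (min(i, j), max(i, j))
        score = -math.log10(max(p, 1e-200))  # guard against log(0)
        raw[key] = max(raw.get(key, 0.0), score)
    top = max(raw.values(), default=1.0) or 1.0
    return {key: score / top for key, score in raw.items()}


def layout_2d(n, weights, steps=400, step=0.05, max_move=0.1, seed=0):
    """Simple force-directed layout: universal repulsion, weighted attraction."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Attraction grows with distance, repulsion decays with it.
                pull = weights.get((i, j), 0.0) * dist - 1.0 / dist
                fx, fy = pull * dx / dist, pull * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        for p, f in zip(pos, force):
            mag = math.hypot(f[0], f[1])
            if mag == 0.0:
                continue
            scale = min(step, max_move / mag)  # cap the move per iteration
            p[0] += f[0] * scale
            p[1] += f[1] * scale
    return pos


def distance(pos, a, b):
    """Euclidean distance between two laid-out points."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical hits for six sequences forming two families, {0, 1, 2} and {3, 4, 5}.
hits = [
    (0, 1, 1e-40), (0, 2, 1e-30), (1, 2, 1e-35),
    (3, 4, 1e-50), (3, 5, 1e-20), (4, 5, 1e-25),
    (2, 3, 0.5),  # not significant, so it contributes no attraction
]
weights = attraction_weights(hits)
positions = layout_2d(6, weights)
```

In this toy input, members of the same family end up close together while the two families drift apart, because only significant hits contribute attraction. The all-pairs loop scales quadratically with the number of sequences, which is acceptable for a sketch but not for large datasets.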

With these tools in hand, we have classified the superfamily of AAA+ ATPases [4,5] [Figure 2], HAMP domains [6], β-propellers [7], TULIP domains [8] and porins [9]. We have also used a combined computational and experimental approach to classify the domains of Trimeric Autotransporter Adhesins [10] and to explore the evolutionary relationships among cradle-loop barrels [11]. This work has led us to propose the concept of "metafold" to classify proteins that are homologous, but have evolved into topologically different folds [11]. More generally, we have found through the analysis of sequence similarity between proteins of known structure that many evolutionary relationships can be inferred among protein domains currently considered to be analogous [12].

Most recently, we have defined a new fold, the β-tent, based on an in-depth analysis of the Thalidomide-binding domain of Cereblon, CULT [13]. 

Figure 1: Galaxy of folds colored by classes.
Figure 2: Phylogenetic tree of AAA proteins.


[1] Frickey T., Lupas AN. (2004) PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res 32(17):5231-8
PMID: 15459293

[2] Frickey T., Lupas AN. (2004) CLANS: A Java Application for Visualizing Protein Families based on pairwise similarity. Bioinformatics 20(18):3702-3704
PMID: 15284097

[3] Buch A., Erschler D., Jäger G., Lupas A. (2013) Towards automated language classification: A clustering approach. In "Approaches to Measuring Linguistic Differences" (Lars Borin and Anju Saxena, eds.), pp. 303-328. De Gruyter

[4] Frickey T., Lupas AN. (2004) Phylogenetic analysis of AAA proteins. J Struct Biol 146(1-2):2-10 
PMID: 15037233

[5] Ammelburg M., Frickey T., Lupas AN. (2006) Classification of AAA+ proteins. J Struct Biol 156(1):2-11
PMID: 16828312

[6] Dunin-Horkawicz S., Lupas AN. (2010) Comprehensive Analysis of HAMP Domains: Implications for Transmembrane Signal Transduction. J Mol Biol 397(5):1156-74
PMID: 20184894

[7] Chaudhuri I., Söding J., Lupas AN. (2008) Evolution of the beta-propeller fold. Proteins 71(2):795-803
PMID: 17979191

[8] Kopec KO., Alva V., Lupas AN. (2011) Bioinformatics of the TULIP domain superfamily. Biochem Soc Trans 39(4):1033-8
PMID: 21787343

[9] Remmert M., Biegert A., Linke D., Lupas AN., Söding J. (2010) Evolution of outer membrane beta-barrels from an ancestral beta beta hairpin. Mol Biol Evol 27(6):1348-58
PMID: 20106904

[10] Szczesny P., Lupas A. (2008) Domain annotation of trimeric autotransporter adhesins--daTAA. Bioinformatics 24(10):1251-6
PMID: 18397894

[11] Alva V., Koretke KK., Coles M., Lupas AN. (2008) Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr Opin Struct Biol (3):358-65
PMID: 18457946

[12] Alva V., Remmert M., Biegert A., Söding J., Lupas AN. (2010) A Galaxy of Folds. Protein Sci 19(1):124-30
PMID: 19937658

[13] Lupas AN., Zhu H., Korycinski M. (2015) The thalidomide-binding domain of cereblon defines the CULT domain family and is a new member of the β-tent fold. PLoS Comput Biol 11(1):e1004023
PMID: 25569776