Difference between revisions of "Sequence-based classification"

From CAZypedia
(21 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
* Responsible Curator:  ^^^Spencer Williams^^^
 
* Responsible Curator:  ^^^Spencer Williams^^^
 
----
 
----
Sequence classification methods require knowledge of at least part of the amino acid sequence for an protein. Algorithmic methods are then used to compare sequences. Each of the resulting families contain proteins that are related by sequence, and by corollary, 3D fold. An obvious shortcoming of sequence-based classifications is that they can only be applied to proteins for which sequence information is available. On the other hand sequence-based classification schemes allow classification of proteins for which no biochemical evidence has been obtained such as the thousands of uncharacterized sequences of [[carbohydrate-active enzymes]] that originate from genome sequencing efforts worldwide. Sequence based classification methods are rather different (and in many ways complimentary) to the Enzyme Commission classification scheme, which assigns proteins to groups based on the nature of the reactions that they catalyze.  
+
Sequence classification methods require knowledge of at least part of the amino acid or nucleotide sequence for a protein. Algorithmic methods are then used to compare and classify sequences (''e.g.'' the original classification of [[Glycoside Hydrolase Families]] relied largely on hydrophobic cluster analysis and multiple sequence alignment <cite>Henrissat1989 Henrissat1991</cite>, while sequence alignment and Hidden Markov Model methods have become dominant with the evolution of the [[carbohydrate-active enzymes]] classification <cite>Cantarel2009 Lombard2013</cite>). Each of the resulting sequence-based families contain proteins that are related by sequence, and by corollary, three-dimensional fold. An obvious shortcoming of sequence-based classifications is that they can only be applied to proteins for which sequence information is available. On the other hand sequence-based classification schemes allow classification of proteins for which no biochemical evidence has been obtained such as the thousands of uncharacterized sequences of [[carbohydrate-active enzymes]] that originate from genome sequencing efforts worldwide. Sequence-based classification methods are rather different (and in many ways complementary) to the Enzyme Commission classification scheme, which assigns proteins to groups based on the nature of the reactions that they catalyze <cite>ECWikipedia</cite>.  
  
 
== Classification of glycoside hydrolases ==
 
== Classification of glycoside hydrolases ==
 
=== Families ===
 
=== Families ===
Using a combination of comparison algorithms the [[glycoside hydrolases]] have been classified into more than 100 GH families <cite>1</cite>. This classification is permanently available through the Carbohydrate Active enZyme database<cite>2</cite>. Classification of [[glycoside hydrolases]] into families allows many useful predictions to be made since it has long been noted that the catalytic machinery and molecular mechanism is conserved for the vast majority of the GH families <cite>3</cite> as well as the geometry around the glycosidic bond (irrespective of naming conventions)<cite>4</cite>. Usually, the mechanism used (ie [[retaining]] or [[inverting]]) is conserved within a GH family. One notable exception is the [[glycoside hydrolases]] of family [[GH97]], which contains both [[retaining]] and [[inverting]] enzymes; a glutamate acts as a [[general base]] in inverting members, whereas an aspartate likely acts as a [[catalytic nucleophile]] in retaining members <cite>9</cite>. Another mechanistic curiosity are the [[glycoside hydrolases]] of familes [[GH4]] and [[GH109]] which operate through an [[NAD-dependent hydrolysis]] mechanism that proceeds through oxidation-elimination-addition-reduction steps via anionic [[transition state]]s <cite>10</cite>. This allows a single enzyme to hydrolyze both alpha- and beta-glycosides.
+
Using a combination of comparison algorithms the [[glycoside hydrolases]] have been classified into more than 100 GH families <cite>Henrissat1991</cite>. This classification is permanently available through the [http://www.cazy.org/ Carbohydrate Active enZyme database] <cite>Lombard2013</cite>. Classification of [[glycoside hydrolases]] into families allows many useful predictions to be made since it has long been noted that the catalytic machinery and molecular mechanism is conserved for the vast majority of the GH families <cite>Gebler1992</cite> as well as the geometry around the glycosidic bond (irrespective of naming conventions) <cite>Henrissat1995</cite>. Usually, the mechanism used (ie [[retaining]] or [[inverting]]) is conserved within a GH family. One notable exception is the [[glycoside hydrolases]] of family [[GH97]], which contains both [[retaining]] and [[inverting]] enzymes; a glutamate acts as a [[general base]] in inverting members, whereas an aspartate likely acts as a [[catalytic nucleophile]] in retaining members <cite>Gloster2008</cite>. Another mechanistic curiosity are the [[glycoside hydrolases]] of familes [[GH4]] and [[GH109]] which operate through an [[NAD-dependent hydrolysis]] mechanism that proceeds through oxidation-elimination-addition-reduction steps via anionic [[transition state]]s <cite>Yip2007</cite>. This allows a single enzyme to hydrolyze both alpha- and beta-glycosides.
 +
 
 +
As a consequence of the evolution of the classification, several GH families have been deleted. Once deleted, family numbers are never reused in order to prevent confusion.
 +
 
 +
A current list of all GH families is available on the [[Glycoside Hydrolase Families]] page.  Also see the [http://www.cazy.org/Glycoside-Hydrolases.html list of GH pages on the CAZy Database].
  
 
=== Clans ===
 
=== Clans ===
 +
Classification of GH families into larger groups, termed "clans", has been proposed <cite>Henrissat1996 DaviesSinnott2008</cite>. A clan is a group of families that possess significant similarity in their tertiary structure, catalytic residues and mechanism. Thus knowledge of three-dimensional structure and the functional assignment of catalytic residues is required for classification into clans. Families within clans are thought to have a common evolutionary ancestry. Please  see the CAZy Database for a current table of [http://www.cazy.org/Glycoside-Hydrolases.html glycoside hydrolase clans].
 +
 +
== Classification of glycosyltransferases ==
 +
Using sequence comparison algorithms [[glycosyltransferases]] that use nucleotide diphospho-sugar, nucleotide monophospho-sugars and sugar phosphates have been grouped into over 90 GT families <cite>Campbell1997 Countinho2003</cite>. This classification is permanently available through the Carbohydrate Active enZyme database<cite>CAZyURL</cite>. As for the GH families above, the same three-dimensional fold is expected to occur within each of the GT families. Just as for the glycoside hydrolases, several of the families defined on the basis of sequence similarities turn out to have similar three-dimensional structures.
  
Classification of GH families into larger groups, termed "clans", has been proposed <cite>5</cite>. A clan is a group of families that possess significant similarity in their tertiary structure, catalytic residues and mechanism. Thus knowledge of three-dimensional structure and the functional assignment of catalytic residues is required for classification into clans. Families within clans are thought to have a common evolutionary ancestry. For an updated table of glycoside hydrolase clans see the CAZy Database <cite>6</cite>.
+
As a consequence of the evolving classification, GT families may be deleted; one example is [[GT36]], which has been reclassified as [[GH94]]. Once deleted, family numbers are never reused in order to prevent confusion.
  
== Classification of glycosyltransferases ==
+
A current list of all GT families covered in ''CAZypedia'' is available on the [[Glycosyltransferase Families]] page.  Also see the [http://www.cazy.org/GlycosylTransferases.html list of GT pages on the CAZy Database].
Using sequence comparison algorithms [[glycosyltransferases]] that use nucleotide diphospho-sugar, nucleotide monophospho-sugars and sugar phosphates have been grouped into over 90 GT families <cite>7 8</cite>. This classification is permanently available through the Carbohydrate Active enZyme database<cite>2</cite>. As for the GH families above, the same three-dimensional fold is expected to occur within each of the GT families. Just as for the glycoside hydrolases, several of the families defined on the basis of sequence similarities turn out to have similar three-dimensional structures.  
 
  
 
== References ==
 
== References ==
 
<biblio>
 
<biblio>
 +
#Henrissat1991 pmid=1747104
 +
#Gebler1992 pmid=1618761
 +
#Henrissat1995 pmid=7624375
 +
#Henrissat1996 pmid=8687420
 +
#Campbell1997 pmid=9334165
 +
#Countinho2003 pmid=12691742
 +
#Gloster2008 pmid=18848471
 +
#Yip2007 pmid=17676871
 +
#ECWikipedia http://en.wikipedia.org/wiki/Enzyme_Commission_number
 +
#Henrissat1989 pmid=2806912
 +
#Cantarel2009 pmid=18838391
 +
#Lombard2013 pmid=24270786
 +
#DaviesSinnott2008 Davies, G.J. and Sinnott, M.L. (2008) Sorting the diverse: the sequence-based classifications of carbohydrate-active enzymes. ''The Biochemist'', vol. 30, no. 4., pp. 26-32. [http://www.biochemist.org/bio/03004/0026/030040026.pdf Download PDF version].
 +
</biblio>
  
#1 pmid=1747104
 
#2 Carbohydrate Active Enzymes database; URL http://www.cazy.org/
 
#3 pmid=1618761
 
#4 pmid=7624375
 
#5 pmid=8687420
 
#6 Carbohydrate Active Enzymes database, glycoside hydrolase classification; URLhttp://www.cazy.org/Glycoside-Hydrolases.html
 
#7 pmid=9334165
 
#8 pmid=12691742
 
#9 pmid=18848471
 
 
#10 pmid=17676871
 
</biblio>
 
 
[[Category:Definitions and explanations]]
 
[[Category:Definitions and explanations]]
[[Category:Curator approved]]
 

Revision as of 09:27, 6 June 2019


This page has been approved by the Responsible Curator as essentially complete. CAZypedia is a living document, so further improvement of this page is still possible. If you would like to suggest an addition or correction, please contact the page's Responsible Curator directly by e-mail.

  • Authors: Steve Withers, Spencer Williams
  • Responsible Curator: Spencer Williams

Sequence-based classification methods require knowledge of at least part of the amino acid sequence of a protein, or of the nucleotide sequence that encodes it. Algorithmic methods are then used to compare and classify sequences (e.g. the original classification of Glycoside Hydrolase Families relied largely on hydrophobic cluster analysis and multiple sequence alignment [1, 2], while sequence alignment and Hidden Markov Model methods have become dominant with the evolution of the carbohydrate-active enzymes classification [3, 4]). Each of the resulting sequence-based families contains proteins that are related by sequence and, as a corollary, by three-dimensional fold. An obvious shortcoming of sequence-based classifications is that they can only be applied to proteins for which sequence information is available. On the other hand, sequence-based classification schemes allow classification of proteins for which no biochemical evidence has been obtained, such as the thousands of uncharacterized sequences of carbohydrate-active enzymes that originate from genome sequencing efforts worldwide. Sequence-based classification methods are rather different from (and in many ways complementary to) the Enzyme Commission classification scheme, which assigns proteins to groups based on the nature of the reactions that they catalyze [5].
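
The general idea of comparing sequences and grouping related ones can be illustrated with a deliberately simplified sketch. It is not the procedure used to build the CAZy families (which, as noted above, relies on hydrophobic cluster analysis, multiple sequence alignment and Hidden Markov Models); instead it substitutes a much cruder technique, shared k-mer (Jaccard) similarity with single-linkage grouping. The function names, the toy sequences and the threshold of 0.3 are all assumptions made for illustration only.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers found in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_families(sequences, k=3, threshold=0.3):
    """Single-linkage grouping of sequences by k-mer similarity.

    Hypothetical toy procedure: two sequences land in the same group if
    they are connected by a chain of pairs whose similarity meets the
    (arbitrary) threshold.
    """
    names = list(sequences)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(sequences[name], k) for name in names}
    for a, b in combinations(names, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented sequences: seqA and seqB differ at one position, seqC is unrelated.
toy = {
    "seqA": "MKTLLVAGGSDWE",
    "seqB": "MKTLLVAGGTDWE",
    "seqC": "PQRNNHHYYFCCW",
}
families = group_into_families(toy)
print(families)
```

Note that such a grouping needs only sequence data, which is why uncharacterized sequences from genome projects can be placed in a family before any biochemical evidence exists.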

Classification of glycoside hydrolases

Families

Using a combination of comparison algorithms, the glycoside hydrolases have been classified into more than 100 GH families [2]. This classification is permanently available through the Carbohydrate Active enZyme database [4]. Classification of glycoside hydrolases into families allows many useful predictions to be made, since it has long been noted that the catalytic machinery and molecular mechanism are conserved for the vast majority of the GH families [6], as is the geometry around the glycosidic bond (irrespective of naming conventions) [7]. Usually, the mechanism used (i.e. retaining or inverting) is conserved within a GH family. One notable exception is family GH97, which contains both retaining and inverting enzymes; a glutamate acts as a general base in inverting members, whereas an aspartate likely acts as a catalytic nucleophile in retaining members [8]. Other mechanistic curiosities are the glycoside hydrolases of families GH4 and GH109, which operate through an NAD-dependent hydrolysis mechanism that proceeds through oxidation-elimination-addition-reduction steps via anionic transition states [9]. This allows a single enzyme to hydrolyze both alpha- and beta-glycosides.

As a consequence of the evolution of the classification, several GH families have been deleted. Once deleted, family numbers are never reused in order to prevent confusion.

A current list of all GH families is available on the Glycoside Hydrolase Families page. Also see the list of GH pages on the CAZy Database.

Clans

Classification of GH families into larger groups, termed "clans", has been proposed [10, 11]. A clan is a group of families that possess significant similarity in their tertiary structure, catalytic residues and mechanism. Thus knowledge of three-dimensional structure and the functional assignment of catalytic residues is required for classification into clans. Families within clans are thought to have a common evolutionary ancestry. Please see the CAZy Database for a current table of glycoside hydrolase clans.

Classification of glycosyltransferases

Using sequence comparison algorithms, glycosyltransferases that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates have been grouped into over 90 GT families [12, 13]. This classification is permanently available through the Carbohydrate Active enZyme database [14]. As for the GH families above, the same three-dimensional fold is expected to occur within each of the GT families. Just as for the glycoside hydrolases, several of the families defined on the basis of sequence similarities turn out to have similar three-dimensional structures.

As a consequence of the evolving classification, GT families may be deleted; one example is GT36, which has been reclassified as GH94. Once deleted, family numbers are never reused in order to prevent confusion.
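
The rule that deleted family numbers are never reused, which applies to both GH and GT families, can be modelled with a small bookkeeping sketch. The class FamilyRegistry and its methods are hypothetical names invented for this illustration and do not correspond to any CAZy software; the only fact taken from the text is that GT36 was reclassified as GH94.

```python
class FamilyRegistry:
    """Toy registry for one class of families, e.g. "GT" (hypothetical)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.active = set()   # family numbers currently in use
        self.retired = {}     # deleted number -> new name, or None
        self._next = 1        # only ever increases, so numbers never recur

    def create(self):
        """Open a new family under the next never-used number."""
        number = self._next
        self._next += 1
        self.active.add(number)
        return f"{self.prefix}{number}"

    def delete(self, number, reclassified_as=None):
        """Retire a family; its number stays reserved permanently."""
        self.active.remove(number)  # raises KeyError if not active
        self.retired[number] = reclassified_as

    def status(self, number):
        """Describe whether a family number is active, deleted or unassigned."""
        name = f"{self.prefix}{number}"
        if number in self.active:
            return f"{name}: active"
        if number in self.retired:
            target = self.retired[number]
            note = f" (reclassified as {target})" if target else ""
            return f"{name}: deleted{note}"
        return f"{name}: not assigned"


gt = FamilyRegistry("GT")
names = [gt.create() for _ in range(36)]   # GT1 ... GT36
gt.delete(36, reclassified_as="GH94")      # the example given in the text
newest = gt.create()                       # takes the next number, not 36
print(gt.status(36))
print(newest)
```

Because the counter only moves forward, a family name found in older literature can never silently come to mean a different family.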

A current list of all GT families covered in CAZypedia is available on the Glycosyltransferase Families page. Also see the list of GT pages on the CAZy Database.

References

  1. Henrissat B, Claeyssens M, Tomme P, Lemesle L, and Mornon JP. (1989). Cellulase families revealed by hydrophobic cluster analysis. Gene. 1989;81(1):83-95. DOI:10.1016/0378-1119(89)90339-9 | PubMed ID:2806912 [Henrissat1989]
  2. Henrissat B. (1991). A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J. 1991;280 (Pt 2):309-16. DOI:10.1042/bj2800309 | PubMed ID:1747104 [Henrissat1991]
  3. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, and Henrissat B. (2009). The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 2009;37(Database issue):D233-8. DOI:10.1093/nar/gkn663 | PubMed ID:18838391 [Cantarel2009]
  4. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, and Henrissat B. (2014). The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42(Database issue):D490-5. DOI:10.1093/nar/gkt1178 | PubMed ID:24270786 [Lombard2013]
  5. Enzyme Commission number. Wikipedia. http://en.wikipedia.org/wiki/Enzyme_Commission_number [ECWikipedia]
  6. Gebler J, Gilkes NR, Claeyssens M, Wilson DB, Béguin P, Wakarchuk WW, Kilburn DG, Miller RC Jr, Warren RA, and Withers SG. (1992). Stereoselective hydrolysis catalyzed by related beta-1,4-glucanases and beta-1,4-xylanases. J Biol Chem. 1992;267(18):12559-61. PubMed ID:1618761 [Gebler1992]
  7. Henrissat B, Callebaut I, Fabrega S, Lehn P, Mornon JP, and Davies G. (1995). Conserved catalytic machinery and the prediction of a common fold for several families of glycosyl hydrolases. Proc Natl Acad Sci U S A. 1995;92(15):7090-4. DOI:10.1073/pnas.92.15.7090 | PubMed ID:7624375 [Henrissat1995]
  8. Gloster TM, Turkenburg JP, Potts JR, Henrissat B, and Davies GJ. (2008). Divergence of catalytic mechanism within a glycosidase family provides insight into evolution of carbohydrate metabolism by human gut flora. Chem Biol. 2008;15(10):1058-67. DOI:10.1016/j.chembiol.2008.09.005 | PubMed ID:18848471 [Gloster2008]
  9. Yip VL, Thompson J, and Withers SG. (2007). Mechanism of GlvA from Bacillus subtilis: a detailed kinetic analysis of a 6-phospho-alpha-glucosidase from glycoside hydrolase family 4. Biochemistry. 2007;46(34):9840-52. DOI:10.1021/bi700536p | PubMed ID:17676871 [Yip2007]
  10. Henrissat B and Bairoch A. (1996). Updating the sequence-based classification of glycosyl hydrolases. Biochem J. 1996;316 (Pt 2):695-6. DOI:10.1042/bj3160695 | PubMed ID:8687420 [Henrissat1996]
  11. Davies GJ and Sinnott ML. (2008). Sorting the diverse: the sequence-based classifications of carbohydrate-active enzymes. The Biochemist. 2008;30(4):26-32. http://www.biochemist.org/bio/03004/0026/030040026.pdf [DaviesSinnott2008]
  12. Campbell JA, Davies GJ, Bulone V, and Henrissat B. (1997). A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem J. 1997;326 (Pt 3):929-39. DOI:10.1042/bj3260929u | PubMed ID:9334165 [Campbell1997]
  13. Coutinho PM, Deleury E, Davies GJ, and Henrissat B. (2003). An evolving hierarchical family classification for glycosyltransferases. J Mol Biol. 2003;328(2):307-17. DOI:10.1016/s0022-2836(03)00307-3 | PubMed ID:12691742 [Countinho2003]
