

Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases

Gado, Japheth E. and Harrison, Brent E. and Sandgren, Mats and Ståhlberg, Jerry and Beckham, Gregg T. and Payne, Christina M. (2021). Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases. Journal of Biological Chemistry. 297, 100931
[Research article]


Abstract

Family 7 glycoside hydrolases (GH7) are among the principal enzymes for cellulose degradation in nature and industrially. These enzymes are often bimodular, including a catalytic domain and carbohydrate-binding module (CBM) attached via a flexible linker, and exhibit an active site that binds cello-oligomers of up to ten glucosyl moieties. GH7 cellulases consist of two major subtypes: cellobiohydrolases (CBH) and endoglucanases (EG). Despite the critical importance of GH7 enzymes, there remain gaps in our understanding of how GH7 sequence and structure relate to function. Here, we employed machine learning to gain data-driven insights into relationships between sequence, structure, and function across the GH7 family. Machine-learning models, trained only on the number of residues in the active-site loops as features, were able to discriminate GH7 CBHs and EGs with up to 99% accuracy, demonstrating that the lengths of loops A4, B2, B3, and B4 strongly correlate with functional subtype across the GH7 family. Classification rules were derived such that specific residues at 42 different sequence positions each predicted the functional subtype with accuracies surpassing 87%. A random forest model trained on residues at 19 positions in the catalytic domain predicted the presence of a CBM with 89.5% accuracy. Our machine learning results recapitulate, as top-performing features, a substantial number of the sequence positions determined by previous experimental studies to play vital roles in GH7 activity. We surmise that the yet-to-be-explored sequence positions among the top-performing features also contribute to GH7 functional variation and may be exploited to understand and manipulate function.
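The abstract mentions classification rules in which a specific residue at a single sequence position predicts the functional subtype. The sketch below illustrates what such a single-position rule could look like; it is not the authors' procedure, which the abstract does not describe. The function name `best_residue_rule`, the four-column toy alignment, and its labels are all invented for illustration and are not GH7 data. The 0.87 cutoff only echoes the accuracy figure quoted in the abstract.

```python
def best_residue_rule(column, labels, positive="CBH"):
    """Return (residue, accuracy) for the rule
    'predict `positive` if this residue is present, otherwise the other class'
    that scores highest on the given alignment column."""
    best_residue, best_acc = None, 0.0
    for residue in sorted(set(column)):
        # A prediction is correct when "has residue" agrees with "is positive".
        correct = sum(
            (aa == residue) == (label == positive)
            for aa, label in zip(column, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:
            best_residue, best_acc = residue, acc
    return best_residue, best_acc


# Hypothetical toy alignment: NOT real GH7 sequences.
toy_alignment = [
    "WYDA", "WYNA", "WFDA", "WYDG",  # labelled CBH (assumption)
    "AYDA", "GYNA", "AFDG", "WYNG",  # labelled EG (assumption)
]
toy_labels = ["CBH"] * 4 + ["EG"] * 4

# Derive one rule per alignment position.
rules = []
for pos in range(len(toy_alignment[0])):
    column = [seq[pos] for seq in toy_alignment]
    residue, acc = best_residue_rule(column, toy_labels)
    rules.append((pos, residue, acc))

# Keep only positions whose rule exceeds the illustrative cutoff.
strong_positions = [pos for pos, _, acc in rules if acc > 0.87]
print(rules)
print(strong_positions)
```

On this toy input only the first column passes the cutoff, since a tryptophan there separates seven of the eight invented sequences correctly. A real analysis would evaluate such rules on held-out sequences instead of the data used to derive them.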

Authors/Creators:Gado, Japheth E. and Harrison, Brent E. and Sandgren, Mats and Ståhlberg, Jerry and Beckham, Gregg T. and Payne, Christina M.
Title:Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases
Series Name/Journal:Journal of Biological Chemistry
Year of publishing:2021
Volume:297
Article number:100931
Number of Pages:18
Publisher:ELSEVIER
ISSN:0021-9258
Language:English
Publication Type:Research article
Article category:Scientific peer reviewed
Version:Published version
Copyright:Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
Full Text Status:Public
Subjects:(A) Swedish standard research categories 2011 > 1 Natural sciences > 106 Biological Sciences (Medical to be 3 and Agricultural to be 4) > Biochemistry and Molecular Biology
URN:NBN:urn:nbn:se:slu:epsilon-p-113533
Permanent URL:
http://urn.kb.se/resolve?urn=urn:nbn:se:slu:epsilon-p-113533
Additional ID:
DOI:10.1016/j.jbc.2021.100931
Web of Science (WoS):000690879300001
ID Code:25329
Faculty:NJ - Fakulteten för naturresurser och jordbruksvetenskap
Department:(NL, NJ) > Department of Molecular Sciences
Deposited By: SLUpub Connector
Deposited On:14 Sep 2021 14:25
Metadata Last Modified:14 Sep 2021 14:31
