Expanding Genome3D and disseminating the structural annotations via InterPro and PDBe (Grant)

description

  • The structure of a protein dictates how it interacts with other proteins and whether, and how, it binds and changes the compounds it is exposed to. Knowing a protein's structure can help rationalise the mechanism by which it performs its biological role. It is also important for understanding how genetic changes, such as mutations in the residues that make up the protein, can destroy or modify the way in which it performs that role. Revolutionary new technologies in biology, known as next generation sequencing, are now allowing biologists to collect vast amounts of genetic variation data. Examples include changes in the sequences of proteins collected from humans suffering from diseases such as cancer or heart disease, and sequences of proteins from species important in an agricultural context, such as different strains of wheat that may be more resistant to frost or produce higher yields. However, it is much harder and more expensive to determine the 3D structure of a protein than its sequence. It is particularly difficult for human, mouse, chicken, plants and other eukaryotic organisms that we need to study to understand disease or ensure food security. Currently, on average, fewer than 15% of proteins from these important model organisms have an experimentally determined 3D structure. To address this deficit of structural data, algorithms have been developed for predicting the structure of a protein. The most successful approaches identify a relative with a known structure and inherit 3D information by exploiting the known conservation of structural features between evolutionarily related proteins. Five of the world-leading resources generating such annotations are based in the UK (SUPERFAMILY, Gene3D, Phyre, FUGUE, pDomTHREADER).
These exploit structural relatives in the SCOP and CATH structural classifications, the two world-leading resources capturing information on domain structures, as templates for predicting the structures of uncharacterised relatives. The Genome3D resource, which was launched in 2012, integrates domain structure predictions from all five resources for ten model organisms that are used to study biological systems and are important for the study of human health (e.g. human, mouse) or agriculture and food security (e.g. plants). Although the algorithms used by the resources are powerful enough to recognise even very remote relationships and to inherit structural information between relatives, their accuracy is below 90%. However, by combining all the data in a single resource and identifying positions in the protein where all the methods agree, it is possible to provide much more reliable annotations. Since it is easier to find these consensus regions if equivalent sets of relatives (i.e. families) in SCOP and CATH have been identified, a large part of the project involves mapping between these resources. We now wish to continue this project, improving the mapping between SCOP and CATH and using this to increase the amount of reliable consensus data that Genome3D provides. We will include additional organisms important for health and agriculture. A major benefit of this project will be the integration of the Genome3D structural data with structurally uncharacterised sequences in InterPro, a world-leading resource that combines information on protein families from 11 different resources worldwide. By including Genome3D data for families in InterPro, we will be able to increase ten-fold the number of proteins for which we can provide structural data. In addition, we will provide a very intuitive web-based viewer for looking at the structures and assessing the likely impact of any changes in the sequence on the function of the protein.
Since many biologists are unfamiliar with the value of structural data in assessing genetic variations, we will develop web-based training material and arrange workshops, both in our institutes and at international meetings.

date/time interval

  • July 1, 2016 - June 30, 2019

total award amount

  • 386521 GBP

sponsor award ID

  • BB/N019253/1