A Reference Transcript Database for improved analysis of RNA-seq data from barley Completed Project

description

  • The term 'gene expression' refers to the biological process by which a gene gives rise to a protein. In eukaryotes, gene expression is complex. The DNA sequence of the gene is first copied into a precursor messenger RNA (pre-mRNA) by the process of transcription, and the pre-mRNA is then subjected to several processing steps to form a mature messenger RNA (mRNA) that is the template for synthesis of the corresponding protein. The post-transcriptional processing steps can generate different mRNA transcripts from the same gene (i.e. transcript isoforms), effectively modulating individual transcript abundance and potentially protein function. Having multiple transcript isoforms from a single gene is problematic in terms of 1) defining the expression levels of individual transcript isoforms and how they change under different conditions, and 2) determining their characteristics, such as whether or not they encode protein isoforms. As gene expression data is widely used to derive biological inference, for example by grouping genes according to common patterns of expression, failure to take account of the relative abundance of alternative transcripts will unavoidably generate false conclusions. In this project, we focus on the development of a resource, or tool, that will allow the accurate detection and quantification of mRNA transcript isoforms in barley. The tool will enable high-resolution analysis of dynamic changes in gene expression at the individual transcript level and, as a recognised and accessible reference, will help unify and structure such analyses across the research community. One of the main approaches scientists use to associate genes with functions is to monitor patterns of gene expression: i.e. where and when genes are switched on or off, and at what level.
Current approaches provide an overall measure of gene expression by counting how often very specific sequences that correspond to a given mRNA occur relative to the whole population of mRNAs in a particular sample, and transforming these counts into relative abundance levels. However, these methods are unable to distinguish the abundance of individual isoform variants, in particular those that determine protein levels, structures and activities. We call the tool a 'Reference Transcript Database' or RTD. The RTD is effectively a library of all of the transcript isoforms that exist in a diverse range of tissues from a single organism. By using the RTD in gene expression studies we can identify and determine the abundance of different transcript isoforms easily and quickly, and these can be used in subsequent functional analyses. We focus this project on the crop plant barley, a model for the small-grain Triticeae cereals that include wheat and rye. The RTD will allow effects on global and specific gene expression to be easily analysed at the transcript level in plants subjected to a range of conditions or treatments, improving our community's ability to explore and understand a wide range of biological processes. The RTD will be refreshed and maintained in the longer term by the barley and computational sciences groups at the James Hutton Institute.
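
As a small illustration of why a gene-level measure can hide changes between isoforms, the sketch below uses invented read counts for two hypothetical isoforms of one hypothetical gene (`geneA.1`, `geneA.2`) in two conditions. None of these names or numbers come from the project; they are assumptions chosen only to make the point visible.

```python
# Hypothetical read counts per transcript isoform (invented for illustration).
counts = {
    "control": {"geneA.1": 90, "geneA.2": 10},
    "treated": {"geneA.1": 10, "geneA.2": 90},
}


def gene_total(sample):
    """Gene-level measure: sum of counts over all isoforms of the gene."""
    return sum(sample.values())


def isoform_fractions(sample):
    """Transcript-level measure: each isoform's share of the gene's total."""
    total = gene_total(sample)
    return {transcript: n / total for transcript, n in sample.items()}


for condition, sample in counts.items():
    print(condition, gene_total(sample), isoform_fractions(sample))
```

In this toy case the gene-level total is the same in both conditions, while the dominant isoform switches from `geneA.1` to `geneA.2`. Only the transcript-level view shows that switch.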

date/time interval

  • September 30, 2018 - September 29, 2020

participant