description
- The start of the 21st century saw the landmark publication of the human genome, which changed the way we do biology and had a huge impact on medicine. This heralded a new era of genomics that was initially dominated by the generation and analysis of genomes of model organisms and economically important species. Concurrently, genome technologies have enabled advances in microbiology, such as disentangling complex communities or, as seen during the pandemic, identifying emerging variants of SARS-CoV-2. These rapid advances have been driven by innovation in high-throughput sequencing technologies and in software to assemble and analyse genomes. Recently, step changes in these areas have enabled the generation of high-quality genomes at scale, making feasible ambitious projects like the Earth BioGenome Project, which aims to generate genomes for all eukaryotic life. Furthermore, rather than being limited to a single genome per species, it is now possible to generate multiple genomes, helping to capture the diversity within a species. However, the scale and complexity of these genomic data present an analytical challenge, and there is a pressing need across the public and private sectors (our stakeholders) for tools, expertise and capacity to translate genomes and long-read technologies into discoveries. The outputs of the Decoding Biodiversity (DECODE) research programme will address this need, contribute to the BBSRC Transformative Technologies theme, and support the government's prioritization of investment and innovation in genomics and bioinformatics (UK Innovation Strategy). DECODE brings together expertise in computational biology, mathematics and genomics. It builds on innovations from our previous core strategic programme "Genomics for Food Security", the cross-Institute Strategic Programme (ISP) "Designing Future Wheat", and the Quadram ISP "Gut Microbes and Health".
In addition, it draws on the experience and networks gained through the research capacity-building programme "Grow Colombia" and as a partner in the Darwin Tree of Life (DToL) consortium. DECODE is delivered through three interconnected work packages. Work package 1 (WP1) will develop tools and techniques to investigate biodiversity. Specifically, this includes developing methods for: comparing multiple genomes within and across species to identify structural changes; using multiple genomes to improve annotation of coding and regulatory regions in the genome; resolving the complexity of bacterial communities and the biological roles within those communities; and deploying sequences as real-time sensors of environmental communities. With our partners IBM and Eagle Genomics, we will ensure that the software and workflows developed are robust, deployable and scalable. Work package 2 (WP2) will use the tools developed in WP1 to investigate biodiversity in publicly available genomes. We will use multiple analytical approaches to: assign function to genomic "dark matter" (genes of currently unknown function); investigate mechanisms underpinning chemical diversity in plants; and identify mechanisms driving genetic diversity in key agricultural crops and aquaculture species. Work package 3 (WP3) will use long-read sequencing technologies and the tools developed in WP1 to uncover and explore biodiversity. Specifically, it will investigate how community structures change over time in increasingly complex systems (the gut, anaerobic digesters and soil). Furthermore, by quantifying changes in gene content, WP3 will aim to identify how biological functions change in a community and link these changes to community health.
To deliver this programme, we have established four key strategic partnerships: RBG Kew will provide expertise in plant metabolism, pangenomics and crop wild relatives; IBERS brings expertise in UK orphan crops; the UK Centre for Ecology & Hydrology will provide soil samples and access to contextual datasets; and IBM Research will support deployment and scalability of tools.