A tool for identifying causative mutations from sequencing data without a reference genome

description

Forward genetic screens are essential for identifying the genes behind desirable traits and putting them to beneficial use. Traditional map-based cloning approaches are extremely labour intensive, and years can elapse between mutagenesis and detection of the polymorphism responsible for the phenotype. The arrival of high-throughput sequencing (HTS) technologies has raised the importance of genomics and offers a number of ways to accelerate discoveries from forward genetic screens. A primary application of HTS in genetics is the detection of DNA sequence polymorphisms among different genotypes within a species, as these polymorphisms can be directly associated with phenotypic variation. HTS approaches have accelerated forward genetic screens by increasing the rate at which mutations are mapped. An important advance is mapping-by-sequencing (MBS), which maps and identifies a causal mutation in a single step by providing allele frequencies from pooled samples and identifying causal mutations at single-nucleotide resolution. MBS requires a complete genome assembly and cannot be used in non-sequenced species or those with only draft genomes. Hence, there is a need for computational tools that identify mutations directly from general whole-genome HTS datasets for organisms with a draft or pre-draft genome assembly. Although the means to induce mutations and to manipulate non-model genomes in order to test and characterise candidate mutations are available, absent or limited genetic and genomic resources restrict the application of HTS methods to forward genetic screens in non-model organisms. New methods are therefore needed that provide fast and cost-effective ways to order genome assemblies for causative mutation mapping, using sequencing data from forward genetic screens in non-model plants and animals.
We have exciting preliminary data from an algorithm we have developed that orders contigs based on the expected density distribution of SNPs in forward genetic mutant data. We have devised a genetic algorithm that effectively searches the space of possible contig arrangements to find the arrangement whose SNP density distribution best matches the distribution expected from the initial genetic screen. We have implemented and tested the Genetic Algorithm To Re-order Contigs (GATROC) using a small simulated dataset generated from Arabidopsis. The major objective of the work proposed here is to develop our proof-of-principle fragment arrangement algorithm so that it is applicable to sequence data from genomes of any size and to data generated by different sequencing technologies. We will also evaluate the algorithm's performance using various published studies, to provide benchmarks and opportunities to extend it to other systems. We will include a variant-calling pipeline that handles a range of sequencing technologies. Additionally, we will implement extensions of the algorithm that analyse data from backcrossed populations as well as variant data from polyploids such as wheat. We aim to provide visualization tools that help design markers to verify the candidate mutation. To ensure the software is as widely used as possible, our algorithm will be provided in various forms: Galaxy pipelines, binaries for various operating systems, and an open-source release of the source code for developers.
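
The contig-ordering idea described above can be illustrated with a minimal Python sketch. This is not the GATROC implementation: the description does not specify its fitness function or genetic operators, so everything here is an assumption made for illustration. In particular, the triangular `expected_density` profile, the squared-error `fitness`, the order crossover, the swap mutation, and the toy data from `simulate_contigs` are all hypothetical choices.

```python
import random


def expected_density(x):
    """Hypothetical expected SNP density at relative position x in [0, 1]:
    a triangular peak centred on the middle of the ordering (an assumption)."""
    return 1.0 - abs(2.0 * x - 1.0)


def fitness(order, contigs):
    """Negative squared error between observed and expected SNP density.
    contigs is a list of (length, snp_count); order is a permutation of indices."""
    total = sum(length for length, _ in contigs)
    # Scale observed densities to [0, 1] so they are comparable to the profile.
    peak = max(snps / length for length, snps in contigs) or 1.0
    offset, error = 0, 0.0
    for i in order:
        length, snps = contigs[i]
        mid = (offset + length / 2) / total  # relative midpoint of this contig
        error += ((snps / length) / peak - expected_density(mid)) ** 2
        offset += length
    return -error


def mutate(order, rng):
    """Swap two randomly chosen contigs."""
    child = list(order)
    a, b = rng.sample(range(len(child)), 2)
    child[a], child[b] = child[b], child[a]
    return child


def crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    segment = p1[a:b]
    kept = set(segment)
    rest = [g for g in p2 if g not in kept]
    return rest[:a] + segment + rest[a:]


def reorder_contigs(contigs, start, generations=200, pop_size=30, seed=0):
    """Evolve contig orderings; the best half survives each generation."""
    rng = random.Random(seed)

    def score(order):
        return fitness(order, contigs)

    population = [list(start)] + [mutate(start, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = parents + children
    return max(population, key=score)


def simulate_contigs(n=12, length=1000):
    """Toy data: equal-length contigs whose SNP counts follow the assumed profile."""
    return [(length, round(50 * expected_density((i + 0.5) / n))) for i in range(n)]


if __name__ == "__main__":
    contigs = simulate_contigs()
    shuffled = list(range(len(contigs)))
    random.Random(1).shuffle(shuffled)
    best = reorder_contigs(contigs, shuffled)
    print("start fitness:", fitness(shuffled, contigs))
    print("best fitness: ", fitness(best, contigs))
```

Because the best half of each generation is carried over unchanged, the best score in this sketch never falls below that of the starting arrangement. A real dataset would need an expected distribution derived from the design of the genetic screen rather than the fixed triangular profile assumed here.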

date/time interval

March 31, 2015 - January 20, 2017

participant

MacLean, Dan (Principal Investigator)
RALLAPALLI, GHANASYAM (Co-Principal Investigator)
University of East Anglia (Administrator)

funding provided via

A tool for identifying causative mutations from sequencing data without a reference genome Grant

WheatVIVO

A tool for identifying causative mutations from sequencing data without a reference genome Completed Project

