Abstract
-
Allopolyploidy poses a major challenge for mapping sequence reads from RNA-seq and resequencing. Although high-quality genome assemblies of polyploid species, such as Chinese Spring of hexaploid wheat, are publicly available, bioinformatic tools still need to be developed for analyzing allopolyploid species. We have developed bioinformatic workflows based on subgenome-classification approaches, named HomeoRoq and EAGLE-RC, and compared them with common mapping tools developed for diploid species. As ground truth for the RNA-seq analysis, we used empirical data from two allopolyploid species with modified ploidy, i.e., the extracted tetra-Chinese Spring of bread wheat and a synthetic allotetraploid of the model polyploid Arabidopsis kamchatica. The two allopolyploid species showed very similar patterns. The mapping error was as high as 10% with the pseudomapping method Kallisto, whereas it was <2% with the subgenome-classification methods (1). As a result, about half of the differentially expressed homeologs identified by Kallisto differed from those identified by the other methods, consistent with the known bias of Kallisto in lowly expressed genes. In the resequencing analysis, we validated the sorting by Sanger sequencing, and the error rate of sorting by HomeoRoq was 0.2% (3/1,375 SNPs) (2). These data suggest that subgenome-classification bioinformatic tools provide much higher quality in analyzing RNA-seq and resequencing data of hexaploid and tetraploid wheats.