MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE ALGORITHMS FOR PREDICTING GENES EXPRESSION IN WHEAT USING METHYLATION PATTERNS

abstract

  • Epigenetic regulation plays an essential role in controlling cell-specific gene expression programs and cell differentiation, and in regulating responses to various biotic and abiotic stresses. Generally, hypermethylation is correlated with gene silencing, while hypomethylation is associated with active transcription. Methylation within gene promoter regions is thought to inhibit the binding of regulatory proteins and repress transcription, whereas methylation within introns and exons is correlated with highly expressed genes. However, it is still unclear which methylation levels and/or sites (promoter, exons, introns, UTR 5’ and UTR 3’) have the greatest impact on gene regulation. The objective of this study was to investigate cytosine methylation patterns across the genome of the wheat cultivar Chinese Spring and their impact on gene expression, using machine learning and artificial intelligence algorithms. We used methylation patterns to predict differentially expressed genes (DEGs) between root and leaf with six machine learning algorithms, namely logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (KNN), classification and regression trees (CART), naïve Bayes (NB) and support vector machine (SVM), and with one artificial intelligence technique, a neural network. Across the genome, cytosine methylation occurred predominantly in the CG and CHH contexts (40% each), whereas only 20% occurred in the CHG context. However, the level of methylation was high (beta-value > 0.8) in the CG context, low (beta-value < 0.2) in the CHH context and highly variable (beta-value 0–1, mean = 0.4) in the CHG context. Methylation levels were strongly correlated (r = 0.7 to 0.9) among all gene features (intron, exon, UTR 3’ and UTR 5’) except the promoter (r = 0.4 to 0.5). All machine learning algorithms and the neural network predicted gene expression classes with high accuracy (0.81), except CART (0.68).
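The classifier comparison described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: each gene is represented by hypothetical mean methylation beta-values (0 to 1) in six regions (promoter, first exon, exons, introns, UTR 5’, UTR 3’), the label is a binary root-versus-leaf expression class, the data are synthetic, `MLPClassifier` stands in for the neural network, and the cross-validation scheme and hyperparameters are illustrative choices.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-gene features: mean methylation beta-value in each region.
FEATURES = ["promoter", "first_exon", "exons", "introns", "utr5", "utr3"]


def make_synthetic_data(n_genes=600, seed=0):
    """Synthetic stand-in for real data: beta-values in [0, 1] and a binary
    expression class (1 = higher in root, 0 = higher in leaf; assumed coding)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_genes, len(FEATURES)))
    # Illustrative rule only: promoter, exon and intron methylation plus noise.
    score = X[:, 0] - 0.5 * X[:, 2] - 0.5 * X[:, 3] + rng.normal(0, 0.2, n_genes)
    y = (score > np.median(score)).astype(int)  # balanced classes
    return X, y


def build_models(seed=0):
    """The six classical classifiers plus a small neural network."""
    return {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "CART": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "NN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
        ),
    }


def compare_models(X, y, n_splits=10, seed=0):
    """Mean stratified k-fold cross-validated accuracy for each model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_models(seed).items()
    }


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, acc in compare_models(X, y).items():
        print(f"{name:5s} accuracy = {acc:.2f}")
```

Scaling is placed inside each pipeline only for the distance- and margin-based models (LR, KNN, SVM, NN), so that it is refit on each training fold; LDA, CART and naïve Bayes are left unscaled because their predictions do not depend on feature scale in the same way. The accuracies printed reflect only the synthetic rule, not the study's data.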
The recursive feature elimination (RFE) analysis showed that methylation in introns, promoters and exons contributes the most to gene expression, with equal weights, followed by methylation in the first exon, UTR 3’ and UTR 5’. By applying various machine learning and artificial intelligence algorithms, this study provides novel insights into DNA methylation patterns in wheat and their role in the regulation of gene expression.
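A feature ranking by recursive feature elimination, in the spirit of the RFE analysis above, might look like the following sketch. The abstract does not state which estimator drove the elimination, so this assumes scikit-learn's `RFE` wrapped around a logistic regression on standardized features, the same six hypothetical per-region beta-value columns, and synthetic data in which only promoter, exon and intron methylation carry signal.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical per-gene methylation features (mean beta-value in each region).
FEATURES = ["promoter", "first_exon", "exons", "introns", "utr5", "utr3"]


def rank_features(X, y, n_keep=1):
    """Repeatedly refit the model and drop the feature with the smallest
    |coefficient| until n_keep remain; rank 1 marks the surviving feature(s)."""
    X_std = StandardScaler().fit_transform(X)  # put coefficients on a common scale
    selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_keep, step=1)
    selector.fit(X_std, y)
    # Pair each feature name with its rank and sort from most to least important.
    return sorted(zip(FEATURES, selector.ranking_), key=lambda pair: pair[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(500, len(FEATURES)))
    # Synthetic rule (assumption): only promoter, exon and intron methylation matter.
    score = X[:, 0] - X[:, 2] - X[:, 3] + rng.normal(0, 0.2, 500)
    y = (score > np.median(score)).astype(int)
    for name, rank in rank_features(X, y):
        print(f"{rank}: {name}")
```

Standardizing first makes the coefficient magnitudes comparable, which matters because RFE ranks features by those magnitudes. For brevity the selector is fit on all data here; a careful analysis would run the elimination inside cross-validation to avoid an optimistic ranking.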

publication date

  • July 2019