abstract
-
In wheat (Triticum spp.) endosperm, the physiologically active proteins known as albumins and globulins (AGs) represent about 20% of total grain proteins. Some AGs may be involved in the synthesis of seed storage proteins (SSPs; gliadins and glutenins), which are key determinants of the end-use quality of flour because they strongly influence the rheological properties of the dough. To identify AGs involved in SSP synthesis, we used multivariate methods implemented in the mixOmics R package to integrate two quantitative datasets. These datasets were produced during grain filling in response to different nitrogen (N) and sulphur (S) supplies. The first dataset consisted of 352 AGs accumulated in the grain. The second comprised the quantity per grain of each SSP fraction. First, relationships between the two datasets were studied using the unsupervised method sparse Partial Least Squares, also known as sparse Projection to Latent Structures (sPLS), a projection-based method that relies on the covariance between the datasets and allows selection of the most relevant variables in each. Second, data were integrated using a supervised approach that takes into account nutrition and grain developmental stage, using the block.splsda procedure developed in DIABLO (Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics studies). This integrative method produced a relevance network. The sPLS analysis revealed that the AGs discriminated the grain developmental stages rather well, whereas the SSPs discriminated the nutrition conditions. The relevance network highlighted 18 AGs highly correlated with SSP variables. Linkage mapping performed using sequence markers within genes or their flanking regions showed that five of these 18 AGs were significantly associated with SSP composition and could be considered robust candidate proteins able to modulate SSP synthesis.
In conclusion, coupling data integration with linkage mapping appears to be a powerful way to identify a small set of relevant candidate proteins for further functional validation.