Improving seed oil and protein content in Brassicaceae: some new genetic insights from Arabidopsis thaliana

Western European oleo-proteaginous species like rapeseed mainly accumulate oil and protein in their seeds. For rapeseed to become competitive with soybean, its seed protein quantity and quality should be improved. The negative correlation between seed protein and oil content apparently makes it impossible to increase protein content without affecting oil content. Exploration of natural and induced genetic variability in the model plant Arabidopsis thaliana has allowed the identification of several genotypes in which this negative correlation is impaired. Different genetic approaches have been undertaken to isolate the genetic factors responsible for the tight control of seed oil and protein homeostasis and for this negative correlation. Once isolated in this model plant, such genetic determinants will be identified in important crops such as rapeseed or other oilseed crops in order to manipulate both components independently and thus produce purpose-designed seeds. In the long term, this research will help breed new varieties that could contribute to reducing Europe's dependence on US soybean imports.


Introduction
Seed storage compounds are of crucial importance for human diet, feed and industrial uses, as they are mainly composed of starch, proteins and oil. In oleo-proteaginous species like rapeseed/canola, seed oil has to date been the main quality determinant conferring economic value on the harvested seed. This oil is used for human consumption and for oleo-chemistry, since it is an excellent alternative to fossil carbon-based products. Nevertheless, it is imperative that these industrial uses do not affect feed/food resources.
In 2030, the world will have to feed 8.4 billion people, i.e. 22% more than in 2010 (United Nations, https://www.un.org/development/desa/publications/world-population-prospects-the-2017-revision.html). Population growth and rising living standards will lead to a diversification of diets in all regions of the world, resulting in an increase in oil and protein consumption. Thus, between 2010 and 2030, an increase of nearly 40% in the food demand for oil is expected, to which would be added a moderate additional demand for industrial uses (+8 to 11%). For plant proteins, the food demand will increase by almost 43%, mainly due to the first nutritional transition still operating in South Asia and sub-Saharan Africa (increase of fruit and vegetable consumption) and to the second nutritional transition (healthier overall diet, i.e. more vegetables and less meat) operating in Europe, North America and Oceania (http://www.terresunivia.fr/sites/default/files/articles/publications/brochures/2016%2006%2016%20-%20BIPE%20%26%20SOFIPROTEOL_GlobalOutlook_bd.pdf). At the same time, a 33% increase in animal protein demand will come from developing countries (mainly East Asia) to achieve their first nutritional transition (increase of meat consumption), which will also have a significant impact on world demand for oleo-proteaginous seed meal for animal feed (+53%).
On the supply side, vegetable oil resources could more or less meet food, energy and chemical demands. However, a deficit in meal of 58 Mt is forecast in 2030 (http://www.terresunivia.fr/sites/default/files/articles/publications/brochures/2016%2006%2016%20-%20BIPE%20%26%20SOFIPROTEOL_GlobalOutlook_bd.pdf), thus limiting meat consumption. Currently, vegetable proteins for animal feed are in short supply in Europe. Consequently, improving crop yield, and in particular protein yield, is still a major challenge for European agriculture, and this has to be achieved while reducing agricultural inputs to minimise the environmental impact of this increased production (Durrett et al., 2008). Oleo-proteaginous crops like rapeseed, sunflower and soybean are good candidates to increase plant protein production, although soybean cultivation is not adapted to northern European regions. The European Union (EU) produces about 21 million tonnes (Mt) of oleo-proteaginous meals, mainly from these three species; however, the EU imports more than 70% of what it needs to feed its livestock (mainly soybean from Brazil and the United States) (de Visser et al., 2014). Furthermore, rapeseed meal has to compete with soymeal, which contains a higher level and better quality of proteins in terms of essential amino acid composition and digestibility. In this context, improving rapeseed meal quality and quantity is a challenge in Western Europe, where rapeseed is the main cultivated oleo-proteaginous crop. Of course, this should be achieved while maintaining seed yield as well as high oil content and quality (fatty acid balance) in rapeseed. During the last four decades in developed countries, genetic gain was an important lever to raise rapeseed yield. However, since 1990, this genetic gain in yield has slowed down, and the current very strong constraints on non-renewable nitrogen inputs further aggravate this situation.
In this context, breakthroughs are needed to deliver innovative solutions, and plant research has a major role to play in this challenge.
The metabolic pathways for the production of storage proteins and oil are already well described, and genes encoding the key enzymes of these pathways have been identified in several oleo-proteaginous species (Shewry et al., 1995; Ohlrogge and Jaworski, 1997; Baud et al., 2008; Baud and Lepiniec, 2010). However, the genes and mechanisms determining the differential partitioning of seed reserves into the major storage components remain largely unknown. These factors are of fundamental importance for the successful engineering, whether by classical breeding or not, of high-yield crops by regulating the production and partitioning of storage compounds.
In addition, a strong negative correlation between oil and protein accumulation has been observed in protein-storing seeds like soybean (Chung et al., 2003), as well as in oil-storing seeds like rapeseed (Grami et al., 1977; Jolivet et al., 2013), sunflower (Li et al., 2017) or the model plant Arabidopsis thaliana (Fig. 4), suggesting that seed filling in these species is highly constrained and that manipulating both components independently may be difficult. This was confirmed by QTL/GWAS studies in these species, in which oil and protein QTL often co-localise but display opposite effects, as expected from the balance between the two main compounds of the seed (Chung et al., 2003; Nichols et al., 2006; Bouchet et al., 2014; Hwang et al., 2014; Jasinski et al., 2016). In addition, some attempts to separate oil and protein QTL in soybean were unsuccessful (Chung et al., 2003; Nichols et al., 2006), reinforcing the hypothesis that the same genes control both traits. However, some results are not in complete agreement with this assertion. Indeed, a SNP at which one allele was associated with both higher protein and higher oil content was identified in soybean (Hwang et al., 2014), and QTL specific to oil content or to protein content were identified in rapeseed as well as in Arabidopsis (Bouchet et al., 2014; Jasinski et al., 2016). These QTL are of great interest for improving both traits independently. Moreover, in Arabidopsis, mutant studies have shown that a decrease in the amount of seed protein or oil does not necessarily lead to a compensating increase in the other storage compounds (Finkelstein and Somerville, 1990; Focks and Benning, 1998), suggesting that protein and oil biosynthesis pathways can be disconnected.
Thus it is of interest to understand why there is such a strong negative correlation between oil and protein accumulation in seed and how to break, or at least weaken, this link in order to manage both components independently.
The first step to address this issue is to identify the genetic factors responsible for the tight control of the relative accumulation of seed oil and protein in oleo-proteaginous species. In a second step, functional analysis of these genes should help to understand this negative correlation and thus provide the tools to manipulate the oil/protein ratio.
At IJPB in Versailles, this question was tackled in Arabidopsis thaliana (Fig. 1) for several reasons. First, it belongs to the same family as rapeseed (Brassicaceae), their seed metabolisms are very similar (Niu et al., 2009) and the close genetic relationship between them allows comparative genetics to be used to predict orthologous genes and alleles within the Brassica genome (Parkin et al., 2005; Sharma et al., 2014). Second, it is a small plant with a small genome, producing several thousand seeds (thousand seed weight of about 20 mg) in 4 months and easily transformed genetically by floral dipping. For these reasons, it has been chosen as a model organism in plant biology. Consequently, substantial genetic resources are available (notably large collections of natural accessions, recombinant inbred line populations and mutant collections obtained by T-DNA insertion or chemical mutagenesis, giving access to considerable genetic variability), as well as genomic resources (many accessions are sequenced or benefit from very dense physical mapping).

Exploration of the variability of oil and protein contents in Arabidopsis
Seed oil and protein contents are quantitative traits whose variation results from the effect of many genes, the environment (climate, agricultural practices, etc.) and the interactions of these genes with the environment. The amplitude of the variation of these two traits within the Arabidopsis species was studied, and their heritability under globally controlled growth conditions was evaluated.
The core collection of 48 Arabidopsis accessions from the IJPB Biological Resource Center, together with the Col-0 accession and minisets of 20 lines from 8 RIL (Recombinant Inbred Line) populations (https://www.observatoire-vegetal.inra.fr/observatoire-vegetal_eng/Scientific-platforms/Arabidopsis-Stock-Center), were cultivated. These lines were chosen to minimize the number of plants to grow while maintaining the widest possible genetic variability (Simon et al., 2008). Each genotype was cultivated in triplicate and three successive and independent cultures were performed in growth chambers with similar global climatic conditions. For each culture, the 624 plants (208 genotypes in 3 replicates) were grown in the same growth chamber, in randomized blocks, and moved every other day in "pilgrim steps" in order to minimize the effect of climatic heterogeneity within the growth chamber, because Arabidopsis is sensitive to climatic variations at the centimetre scale.
In parallel, a fast, accurate and high-throughput method based on near infrared spectrometry (NIRS) was developed at IJPB to measure Arabidopsis seed oil, protein, carbon and nitrogen contents (Jasinski et al., 2016). Developing a NIRS model consists of correlating the NIRS spectra of 100-150 samples with their actual compound content obtained by a reference method, which is generally labour-intensive and time-consuming. Once set up, the model allows the compound content to be estimated from the spectrum of any seed sample (less than 200 mg of seeds, i.e. 8,000 to 10,000 seeds). In addition, this measurement is non-destructive and does not alter seed viability.
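The calibration principle described above can be illustrated with a deliberately simplified sketch. It assumes a single spectral feature per sample (one absorbance value), whereas a real NIRS model uses the whole spectrum and a multivariate regression; the function names and all numerical values are invented for illustration and are not taken from Jasinski et al. (2016).

```python
from math import sqrt
from statistics import mean

def calibrate(signal, reference):
    """Fit reference = intercept + slope * signal by ordinary least squares."""
    ms, mr = mean(signal), mean(reference)
    sxx = sum((s - ms) ** 2 for s in signal)
    sxy = sum((s - ms) * (r - mr) for s, r in zip(signal, reference))
    slope = sxy / sxx
    return mr - slope * ms, slope

def predict(model, s):
    """Estimate the compound content of a new sample from its signal."""
    intercept, slope = model
    return intercept + slope * s

def rmse(model, signal, reference):
    """Root mean square error of prediction on samples with known content."""
    errors = [(predict(model, s) - r) ** 2 for s, r in zip(signal, reference)]
    return sqrt(mean(errors))

# Hypothetical calibration set: absorbance at one band vs. oil content (%)
# measured by a reference method (values invented for this sketch).
cal_signal = [0.40, 0.45, 0.50, 0.55, 0.60]
cal_oil = [30.2, 32.4, 35.1, 37.3, 40.0]
model = calibrate(cal_signal, cal_oil)

# Hypothetical held-out samples used to check the model before routine use.
val_signal = [0.42, 0.58]
val_oil = [31.0, 39.1]
print("validation RMSE (% oil):", round(rmse(model, val_signal, val_oil), 3))
```

Checking the prediction error on samples that were not used for calibration is what justifies applying the model afterwards to seed lots for which no reference measurement exists.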
NIRS phenotyping of the 1872 seed samples from the three cultures showed that Arabidopsis seed oil and protein contents displayed a wide range of variation (Fig. 2). In order to quantify the relative contribution of the genotype (G), the environment (E) and the G × E interaction to the variation of these traits, a global analysis of variance (ANOVA) was carried out on the measurements from the three cultures (Fig. 3). It shows that, under our growth conditions, the studied traits are under the control of genetic factors but also depend on the environment. Representing all these data on a scatter plot allowed us to confirm the negative correlation between seed oil and protein contents (Fig. 4). For these studies, the most interesting genotypes are those located at the periphery of the scatter plot. This is obviously the case for genotypes close to the regression line but showing a high or low O/P ratio (genotypes corresponding to yellow and blue points, respectively, in Fig. 4). It can also be noticed that, even though most of the genotypes are close to the P/O regression line, in agreement with seed oil and protein contents being tightly correlated, some genotypes are distant from this regression line, suggesting that this correlation has been removed or relaxed. These genotypes "break" the negative correlation usually observed between oil and protein contents. Indeed, since the contents are expressed as a percentage of dry seed weight, genotypes "above" the regression line produce seeds with more oil than expected for a given protein content, or more protein than expected for a given oil content (genotypes corresponding to pink points in Fig. 4). Conversely, genotypes "below" the regression line produce seeds with less oil than expected for a given protein content, or less protein than expected for a given oil content (genotypes corresponding to brown points in Fig. 4).
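The idea of genotypes lying "above" or "below" the regression line can be made concrete with a minimal sketch: fit a line to the (oil, protein) points and label each genotype by its vertical distance (residual) to that line. The choice of regressing protein on oil, the threshold, the genotype names and all values are assumptions made for this illustration; they are not the data of Fig. 4.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def classify(genotypes, threshold):
    """Label genotypes by their residual to the protein-on-oil regression.

    genotypes maps a name to (oil %, protein %) of dry seed weight.
    """
    names = list(genotypes)
    oil = [genotypes[n][0] for n in names]
    protein = [genotypes[n][1] for n in names]
    intercept, slope = fit_line(oil, protein)
    labels = {}
    for name, o, p in zip(names, oil, protein):
        residual = p - (intercept + slope * o)
        if residual > threshold:
            labels[name] = "above"    # more protein than expected for this oil
        elif residual < -threshold:
            labels[name] = "below"    # less protein than expected for this oil
        else:
            labels[name] = "on line"  # follows the usual trade-off
    return slope, labels

# Hypothetical genotypes (invented values).
example = {
    "g1": (32.0, 24.0), "g2": (36.0, 22.0),
    "g3": (40.0, 20.0), "g4": (44.0, 18.0),
    "high_sum": (38.0, 24.0), "low_sum": (38.0, 18.0),
}
slope, labels = classify(example, threshold=2.0)
print("slope:", slope)
print(labels)
```

A negative slope reflects the oil/protein trade-off, while genotypes with large residuals are the candidates in which the trade-off appears relaxed.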
The identification of the genetic factors contributing to this distinct pattern could help when defining new ideotypes for oilseed Brassicaceae and breeding for them.
As mentioned, for a given oil (or protein) value, lines encircled in brown in Figure 4 display a lower oil + protein amount than lines close to the regression line, and the opposite is true for lines encircled in pink. This means that one (or more) other component(s) of the seed should vary significantly between these extreme lines. Therefore, the analysis and quantification of the other compounds of the Arabidopsis seed, in particular carbohydrates, were undertaken in order to find at least one molecular "marker" that can be used to characterize these lines. Ideally, a NIRS model could then be developed to measure the content of this marker easily and at high throughput.

Effect of the environment on seed filling
There is an obvious effect of the environment on seed filling, even in Arabidopsis cultivated under controlled conditions (Fig. 3). Moreover, the negative correlation is also affected, as illustrated with the three cultures performed under nearly identical climatic conditions (Fig. 5).
Fig. 3. An ANOVA was performed for seed oil and protein content on 208 genotypes cultivated three times. Histograms show the effects due to genotype, environment, genotype × environment interaction (G × E) and the residual as a percentage of the variation explained.
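The partitioning of variation into genotype, environment, interaction and residual components (Fig. 3) can be sketched for a balanced two-way design with replication. This is a minimal illustration, assuming each effect is expressed as a share of the total sum of squares; the exact ANOVA model used for Fig. 3 is not detailed here, and the data below are invented.

```python
from statistics import mean

def variance_shares(data):
    """Percentage of the total sum of squares due to G, E, GxE and residual.

    data[g][e] is the list of replicate values for genotype g in culture e.
    The design must be balanced (same number of replicates in every cell).
    """
    genotypes = list(data)
    envs = list(data[genotypes[0]])
    r = len(data[genotypes[0]][envs[0]])
    grand = mean([v for g in genotypes for e in envs for v in data[g][e]])
    g_mean = {g: mean([v for e in envs for v in data[g][e]]) for g in genotypes}
    e_mean = {e: mean([v for g in genotypes for v in data[g][e]]) for e in envs}
    cell = {(g, e): mean(data[g][e]) for g in genotypes for e in envs}
    ss = {
        "G": r * len(envs) * sum((g_mean[g] - grand) ** 2 for g in genotypes),
        "E": r * len(genotypes) * sum((e_mean[e] - grand) ** 2 for e in envs),
        "GxE": r * sum(
            (cell[(g, e)] - g_mean[g] - e_mean[e] + grand) ** 2
            for g in genotypes for e in envs
        ),
        "residual": sum(
            (v - cell[(g, e)]) ** 2
            for g in genotypes for e in envs for v in data[g][e]
        ),
    }
    total = sum(ss.values())
    return {k: 100 * v / total for k, v in ss.items()}

# Hypothetical oil contents (%): 2 genotypes x 2 cultures x 2 replicates.
oil = {
    "A": {"culture1": [38.0, 39.0], "culture2": [35.0, 36.0]},
    "B": {"culture1": [33.0, 34.0], "culture2": [32.0, 31.0]},
}
shares = variance_shares(oil)
for effect, pct in shares.items():
    print(f"{effect}: {pct:.1f}%")
```

A large E or G × E share under nominally identical conditions is precisely what makes repeatable phenotyping difficult for these traits.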
The major difficulty in identifying and cloning genes for quantitative traits that are strongly influenced by the environment lies in the accuracy and especially the repeatability of phenotyping. At IJPB, phenotyping robots (Phenoscope; Tisne et al., 2013 and http://www7.inra.fr/vast/Files/PhenoFilm.avi) make it possible to grow several hundred plants simultaneously on a reduced area, moving the plants sequentially so that they all successively occupy the same positions in the growth chamber for the same duration. Consequently, their phenotype mainly reflects differences due to variation in their genotype and not those due to variation in the environment. It is thus possible to repeat experiments under identical global climatic conditions, a very important point for validating and cloning genetic determinants. While the first version of the Phenoscope only allows Arabidopsis plants to be grown until the rosette stage, we are currently building an XL Phenoscope that will allow Arabidopsis to be grown throughout its whole life cycle ("from seed to seed"). Such a device should improve the phenotyping power for our traits of interest and consequently help to identify the genes involved more quickly and with greater precision.

Fig. 4. Seed oil and protein % are negatively correlated. Graph shows the average seed oil and protein content (% of dry seed weight) of 208 Arabidopsis genotypes including RIL from 3 populations and a core-collection of 48 accessions (3 replicates/genotype). All genotypes were grown in one experiment. The black line corresponds to the protein/oil (P/O) regression line. The focus on some remarkable genotypes, corresponding to the highlighted points of different colours, is discussed in the text.
In addition, the Phenoscope also makes it possible to set up complex experimental designs combining several factors in order, for example, to study the effect of abiotic stresses (such as drought and/or nitrogen constraint) on oil and protein accumulation as well as on the slope of the P/O regression line. Indeed, different climatic/nutritional scenarios can easily be applied using this tool.
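The benefit of moving plants through all positions of the growth chamber can be illustrated with a small simulation. It assumes a simple cyclic shift of one position per time step and an arbitrary micro-climate effect per position; the actual movement scheme of the Phenoscope is not modelled, and all values are invented.

```python
def mean_exposure(position_effect, steps, rotate=True):
    """Average micro-climate effect experienced by each plant.

    position_effect[i] is the (hypothetical) local effect at position i.
    With rotate=True, every plant moves one position forward at each step.
    """
    n = len(position_effect)
    totals = [0.0] * n
    for t in range(steps):
        for plant in range(n):
            pos = (plant + t) % n if rotate else plant
            totals[plant] += position_effect[pos]
    return [total / steps for total in totals]

# Hypothetical local effects, e.g. relative light intensity at 4 positions.
effects = [1.0, 2.0, 3.0, 6.0]

static = mean_exposure(effects, steps=4, rotate=False)
rotated = mean_exposure(effects, steps=4, rotate=True)
print("static: ", static)   # each plant keeps the effect of its own position
print("rotated:", rotated)  # each plant experiences the same average effect
```

When the number of steps is a multiple of the number of positions, every plant spends the same time at every position, so positional heterogeneity no longer contributes to the differences between plants.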

Strategies used to identify genetic factors involved in seed filling
In order to understand how oil and protein are partitioned in Brassicaceae seeds, it is necessary to isolate the genes involved in this tight regulatory process. In addition, understanding how plants cope with a broken P/O negative correlation should make it possible to manipulate the accumulation of the storage compounds independently of each other.
To this end, quantitative genetic approaches such as Quantitative Trait Loci (QTL) cloning and Genome Wide Association Studies (GWAS) have been undertaken.
For that purpose, four Arabidopsis RIL populations and a GWAS collection of about 300 Swedish accessions (https://gwas.gmi.oeaw.ac.at/) have been grown and phenotyped by NIRS. QTL for seed oil and protein contents have been identified in the four populations (Jasinski et al., 2016 and unpublished results) and some are currently being fine-mapped in order to identify the underlying genes.
In addition, a forward genetic approach has been undertaken, taking advantage of a collection of about 500 homozygous mutants (obtained after chemical EMS mutagenesis of the Arabidopsis Col-0 accession and made homozygous after 5 successive generations of Single Seed Descent). The seeds of these lines, which present an almost total level of homozygosity, were phenotyped by NIRS. These lines display a wide range of oil and protein contents, similar to that of natural accessions (Fig. 6). Several of them, displaying an interesting phenotype (high or low O/P ratio, broken P/O negative correlation), have been selected (Fig. 6). Mapping-by-sequencing is in progress in order to identify the genes responsible for their phenotype.
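The "almost total level of homozygosity" reached by Single Seed Descent can be estimated with a short calculation. This sketch assumes strict self-fertilisation, Mendelian segregation without selection, and a locus that is heterozygous at the start; under these assumptions the heterozygous fraction is halved at each generation. The function name is hypothetical.

```python
from fractions import Fraction

def expected_homozygosity(generations):
    """Expected fraction of initially heterozygous loci that have become
    homozygous after the given number of selfing generations."""
    return 1 - Fraction(1, 2) ** generations

for n in range(1, 6):
    print(n, float(expected_homozygosity(n)))
```

Under these assumptions, five generations leave an expected 1/32 of the initially heterozygous loci still segregating.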
We therefore hope, through all the approaches and resources developed, to isolate genetic factors involved in the tight control of the relative accumulation of reserve compounds in the Arabidopsis seed. Functional analysis of these genes should help elucidate the molecular mechanisms involved during seed filling. These functional analyses will also give us information on the role of these genes in seed physiology and quality (size, viability, germination, longevity, yield), and it will be possible to test the effects of different inputs (nitrogen nutrition, water availability) on plants carrying different alleles of these genes. The ultimate objective of these studies is to identify these genetic determinants in important crops such as rapeseed or other oilseed crops in order to create ideotypes that better meet the current needs of European agriculture and consumers.