Sustainability criteria
Issues in evaluating sustainability of farming systems with indicators

The growing concern about the side-effects of policies focusing on economic growth or even technological innovation, as well as of agricultural intensification, is leading more and more stakeholders to pay attention to monitoring and evaluating agricultural practices. This evaluation step is now essential in policy decisions, in research and the design of innovative solutions, in NGOs' development projects, and in the improvement process of ISO certification. The aim of this article is to review the steps in evaluating sustainability in agriculture, starting in a first section with the need to develop a conceptual indicator framework that makes explicit the evaluators' own vision of sustainability. In a second section, we address the need to answer preliminary questions that will guide the selection of a set of indicators or an assessment method. In a third section, after discussing how to categorize indicators, we provide an overview of available indicators for two themes of the environmental dimension of sustainability, namely nitrogen management and biodiversity. In a fourth section, we highlight the diversity of sustainability evaluation methods through six examples from France. Finally, we conclude with a general discussion of questions that remain to be addressed.


Introduction
Following the Rio conference in 1992, the environmental issue and, more generally, the question of sustainability became a concern in developed countries and at the global level. Despite its ability to federate, this concept has so far failed to reach a consensus on its implementation (Robinson, 2004), so that Lascoumes (2005) spoke of a "driving illusion". However, the growing concern about the side-effects of policies focusing on economic growth or technological innovation, as well as of agricultural intensification, has led more and more stakeholders to pay attention to monitoring and evaluation. This evaluation step has now become essential in policy decisions, in research and the design of innovative solutions, in NGOs' development projects, and in the improvement process of ISO certification (López-Ridaura et al., 2005; Niemeijer and de Groot, 2008). There is thus general agreement on the need to develop sustainability indicators organized in a conceptual framework to form an evaluation method. The use of indicators is readily explained by the impossibility of routinely measuring environmental impacts directly outside a research context, and by the difficulty of addressing complex systems or concepts such as biodiversity and sustainability (Gras et al., 1989; Maurizi and Verrel, 2002). This has fostered a great development of studies on indicators, especially in the agricultural sector (Riley, 2001; Rosnoblet et al., 2006).
The aim of this article is to review the steps in evaluating sustainability in agriculture, starting in Section 2 with the need to develop a conceptual indicator framework that makes explicit the evaluators' own vision of sustainability. In Section 3, we address the need to answer preliminary questions that will guide the selection of a set of indicators or an assessment method. In Sections 4 and 5, after discussing how to categorize indicators, we provide an overview of available indicators for two themes of the environmental dimension of sustainability, namely nitrogen management and biodiversity, and we highlight the diversity of sustainability evaluation methods through six examples from France. Finally, in Section 6, we conclude with a general discussion of questions that remain to be addressed.

Different sustainability frameworks
Sustainability is by nature a multidimensional issue and hence addresses a set of criteria, which can be simply organized in a list or in a more complex framework (Ledoux et al., 2005). In any case, a general conceptual indicator framework is a prerequisite to any indicator selection, to avoid an ill-considered or even biased assessment of sustainability (Alkan Olsson et al., 2009). Hansen (1996) distinguished between sustainability as an approach to agriculture and sustainability as a property of agriculture. He subdivided the former into (i) an alternative ideology and (ii) a set of strategies, and the latter into (iii) an ability to fulfill goals and (iv) an ability to continue. While the first approach, based on an ideology, remains general and vague, the three others have been translated into operational principles that inspired different evaluation frameworks (Smith and McDonald, 1998). Defining sustainability as a set of strategies or practices led to numerous assessment methods implementing a scoring system of farmers' practices, such as the IDEA method for the environmental pillar of sustainability (Zahm et al., 2008) or the indicator set of Rigby et al. (2001). The goal-based approach to sustainability is also very common; it relies on a framework of general goals, often divided into more operational goals (Bockstaller et al., 1997). An example of a general environmental goal is "preserving water quality", which can be translated into several operational goals such as "reducing nitrate leaching" or "decreasing pesticide transfer to ground or surface water". In some cases, the goals can be quantified (e.g. "reducing nitrate leaching by X%"). In many other systems, goals are expressed more vaguely as themes and sub-themes (e.g. Alkan Olsson et al., 2009; Ledoux et al., 2005). In Life-Cycle Analysis, which focuses mainly on the environmental dimension, goals refer to environmental impacts derived from the cause-effect chain (Payraudeau and van der Werf, 2005). More recently, Life-Cycle Analysis has been applied to the social dimension (Falque et al., 2013; Feschet, 2014). The last approach, which views sustainability as a property of agriculture to continue, was extended by several authors to a set of systemic properties. Bossel (2000) proposed six basic axes linked to systemic properties (existence, effectiveness, freedom of action, security, adaptability and coexistence), applicable to very different systems such as cultural and social systems or ecosystems. López-Ridaura et al. (2005) collected a long list of attributes or properties of sustainability to develop an assessment method for the sustainability of small peasant farms in Mexico. They finally focused on five main attributes: productivity (ability to produce a combination of outputs), stability (ability to reproduce this production over time), reliability (ability to remain at an equilibrium state under normal conditions), resilience (ability to return to a normal state following a perturbation) and adaptability or flexibility (ability to function under new conditions). Although this last approach is interesting for its genericity and because it avoids long lists of indicators, its implementation raises problems when a property has to be translated into concrete indicators (Alkan Olsson et al., 2009).
Most sustainability frameworks are structured across three sustainability dimensions or pillars: economic, environmental and social. Within each dimension, a list of items, goals, themes, etc. is defined. In some cases, these are organized hierarchically, which guides the aggregation of indicators, as in the MASC (Craheix et al., 2012) and DEXiPM (Pelzer et al., 2012) models or in the method of van Asselt et al. (2014). The property-based approach proposes generic properties across the three dimensions of sustainability. Alkan Olsson et al. (2009) presented a similar attempt with a goal-based indicator framework (GOF) that classifies sustainability themes into ultimate goals, processes to achieve them, and means. This classification of themes across sustainability dimensions follows the action chain: a policy is motivated by one or several ultimate goals (e.g. human health, viability; see Bossel, 2000), requiring processes to achieve them (e.g. balance of environmental functions, improvement of economic performance) and means (e.g. protecting environmental compartments, increasing financial capital), across the three sustainability dimensions.

Preliminary choices
The clarification of a sustainability framework that structures the indicator selection has to be complemented by several preliminary choices and assumptions (Bockstaller et al., 2008). An initial diagnosis is required to identify the actual sustainability issues for a given system and to bring out the stakeholders and participants involved, the processes at work, the severity of impacts, etc. (thereby answering the question: why evaluate?).
The identification of the end-users (for whom to evaluate?) and the definition of the practical objectives of the indicator (what to evaluate for?) were pointed out as essential steps by several authors (Brooks and Bubb, 2014; Girardin et al., 1999; Mitchell et al., 1995). These preliminary steps serve as a basis for designing or selecting indicators that meet end-users' needs and requirements. Different user groups can be identified, for example scientists, advisors, farmers, decision makers or consumers. The group of people performing the calculations and the group of people using the results should be differentiated. In many cases, although farmers are the targeted user group, they are not direct users of the evaluation method but end-users of its results (Cerf and Meynard, 2006). It should be noted that the positions of users and end-users of results are in many cases not driven by scientific considerations alone (Bouleau, 2012; Gudmundsson, 2003). Indeed, they may have a stake in the selection of indicators or in their results, in order to defend their interests.
An indicator can be developed for various objectives, which can be grouped into three main uses: (i) gaining knowledge about a system, e.g. ex post evaluation of an action during or at the end of its implementation, monitoring with an alert role, or checking compliance with regulations; (ii) decision support, i.e. ex ante evaluation of actions in a planning phase to select the "best" system (Sadok et al., 2008), or real-time decision support to steer the system; and (iii) communication, which implies a reduced number of indicators that are easy to understand (Mitchell et al., 1995).
The design of a sustainability framework makes it possible to define the issues of concern or criteria, and thus to specify the content of the evaluation (what to evaluate?). These can be presented as a set of strategies, goals, themes, systemic properties, etc., as described in the previous section. The definition of the system boundaries is another important step, directly linked to the previous one (Van Cauwenbergh et al., 2007). It includes the spatial and temporal calculation scales (where and when to evaluate?) and the organizational level, which will be influenced by user needs, the issues of concern, etc. In Life Cycle Analysis (LCA) approaches, users are required to define the system boundaries (Brentrup et al., 2004). The system can be the product, or the farm, with or without upstream activities such as the production of inputs and downstream activities such as waste management. In many other evaluation methods, this definition seems to be neither explicit nor consistent across indicators. For example, several methods include the indirect energy cost of producing fertilizers, pesticides and machinery in the energy calculation, as in LCA, whereas for other issues such as water quality or pollutant emissions to air, this approach is not applied and only direct impacts at field or farm level are covered. Regarding spatial and temporal scales, attention should be paid to the resolution of the calculation, i.e. the level at which basic calculations are carried out. Farm and year are typical resolutions for environmental indicators. The resolution should not be confused with the extent, i.e. the whole area (e.g. the region) or time span (e.g. the crop rotation) covered by the indicator calculation (Purtauf et al., 2005).
Another aspect to consider is the differentiation between the system itself and the encompassing systems, which can be divided into local and global ones (Fig. 1). This corresponds respectively to "on-site" issues (within the system) and "off-site" issues (outside the system) (Smith and McDonald, 1998). "On-site" issues (e.g. soil quality of farm fields) are linked to the sustainability of the studied system (e.g. the farm), while "off-site" issues concern the encompassing system (e.g. the region where the farm is located) or society as a whole (Alkan Olsson et al., 2009). The former relate to the sustainability of agriculture itself, whereas the latter may be considered as the contribution of agricultural systems to sustainable development. For the latter, local and global issues can be distinguished. In any case, from a sustainability perspective, a balanced choice should be made between the direct issues of the system and its contribution to local and global issues (Alkan Olsson et al., 2009).
Last but not least, the means and resources available (budget, time, data availability, etc.) should be assessed before selecting a method, to make sure that they meet its requirements.

Different types of indicators
At the international level, the well-known Pressure/State/Response (PSR) and Driving-force/Pressure/State/Impact/Response (DPSIR) frameworks were inspired by the cause-effect chain. They were developed to ascertain the relevance of environmental indicators for human activities and their consequences at national level. Some authors have extended these frameworks to lower scales (e.g. Maurizi and Verrel, 2002), in spite of criticisms formulated by others (Niemeijer and de Groot, 2008). One major drawback is the impression of linearity between pressure, state and impact given by the framework, whereas reality is more complex and closer to a causal network than to a chain. Another flaw is the ambiguity of the categories: behind a category such as "pressure", several types of indicators can be found (see below). For example, pressure encompasses emission indicators, which can be measured (e.g. nitrogen content at the bottom of the root zone, measured with ceramic cups) or simulated (output of a field leaching model), as well as simple indicators based on farmers' management data (e.g. amount of nitrogen input).
Several authors distinguished between (i) means-based indicators (van der Werf and Petit, 2002), or action-oriented indicators (Braband et al., 2003), and (ii) effect-based indicators. Effect indicators may refer more precisely to different stages of the cause-effect chain: emission, state or impact (Bockstaller et al., 2008). As shown in Figure 2, causal, predictive effect and measured effect indicators do not have the same qualities. Means-based indicators belong to the group of causal indicators and in most cases have poor predictive quality, whereas measured effect indicators may provide more precise information about the state or the impact, without providing information on the causes. Predictive effect indicators are useful for ex ante assessment and for relating effects to causes (Bockstaller et al., 2008). In any case, each type has its utility: causal indicators to highlight changes in management or in environmental sensitivity, measured effect indicators for monitoring, and predictive effect indicators for analyzing cause-effect relations in order to improve the system.

Examples of indicators and of evaluation methods
In this section we illustrate the diversity of indicators for two major environmental issues: the impact of nitrogen management, and biodiversity. We also describe different evaluation methods based on sets of indicators to highlight their variability.

Nitrogen indicators
The typology presented in Figure 3 results from the analysis of a database of 1464 environmental indicators drawn from 112 methods or reviews on indicators. The database was created by Rosnoblet et al. (2006) and extended and analyzed by Schneller et al. (2013b). The range shown in Figure 3 extends from causal indicators based on management data, such as manure storage capacity, to measured effect indicators of water quality, with predictive effect indicators based on operational models in between. The "nitrogen balance" category is the most popular, with more than 30 proposals. These can be differentiated into farm-gate, soil surface and soil system budgets (Oenema et al., 2003). Although the nitrogen balance is recognized as the most commonly used indicator to assess nitrogen management (Langeveld et al., 2007), its ability to predict nitrogen losses, especially nitrate leaching, remains questionable, particularly when it is calculated with annual data in situations with a low surplus (Oenema et al., 2005).
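As an illustration of the simplest of these budgets, the sketch below computes a soil surface balance as nitrogen inputs minus nitrogen exported with harvested products, for single years and as a mean over a rotation. The choice of input terms, the names (`FieldYearNFlows`, `soil_surface_balance`, `rotation_mean_balance`) and all numbers are assumptions for illustration, not the formulation of any particular method in the database; see Oenema et al. (2003) for how farm-gate and soil system budgets differ.

```python
from dataclasses import dataclass


@dataclass
class FieldYearNFlows:
    """Hypothetical annual N flows for one field, all in kg N/ha."""
    mineral_fertilizer: float
    organic_manure: float
    biological_fixation: float
    atmospheric_deposition: float
    harvested_export: float  # N removed with the harvested products


def soil_surface_balance(flows: FieldYearNFlows) -> float:
    """N surplus (positive) or deficit (negative) at the soil surface, kg N/ha."""
    inputs = (flows.mineral_fertilizer + flows.organic_manure
              + flows.biological_fixation + flows.atmospheric_deposition)
    return inputs - flows.harvested_export


def rotation_mean_balance(years: list[FieldYearNFlows]) -> float:
    """Mean annual surplus over a rotation (extent = rotation, resolution = year)."""
    if not years:
        raise ValueError("at least one year is required")
    return sum(soil_surface_balance(y) for y in years) / len(years)


# Made-up numbers for a three-year rotation.
rotation = [
    FieldYearNFlows(180.0, 0.0, 0.0, 15.0, 160.0),   # wheat: surplus 35
    FieldYearNFlows(0.0, 0.0, 220.0, 15.0, 250.0),   # pea: deficit -15
    FieldYearNFlows(150.0, 40.0, 0.0, 15.0, 170.0),  # maize: surplus 35
]
print(soil_surface_balance(rotation[0]))  # 35.0
print(rotation_mean_balance(rotation))    # about 18.3
```

The surplus is only a pressure indicator: nothing in this calculation says how much of it is actually leached, which is the limitation noted above.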

Biodiversity indicators
Since the 1980s, a large number of directly measured biodiversity indicators have been proposed in the literature and extensively discussed (e.g. Lindenmayer and Likens, 2011). Indicators based on species diversity and/or abundance within a given taxon or several taxa (e.g. birds, plants, carabid beetles) are the most commonly used, at scales ranging from the field to the national level. Many proposals also exist for causal or indirect indicators: more than half of the 91 indicators listed for agriculture by Delbaere (2003) belong to this type. Considering the general model explaining biodiversity in farmland (Le Roux et al., 2008), they can be divided into two groups: (i) indicators related to farmland management, such as the percentage of semi-natural area, and (ii) indicators addressing cropping practices, which can be expressed as amounts of inputs per unit area or as the percentage of area disturbed by fertilizers, pesticides, irrigation or tillage. Both groups can be expressed at different scales.
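To contrast the two families, the sketch below computes two measured indicators from species counts (species richness and the Shannon index, one common diversity metric that the text does not itself name) and one causal indicator from land use (the percentage of semi-natural area). The function names and the counts are hypothetical.

```python
import math


def species_richness(abundances: list[int]) -> int:
    """Number of species with at least one recorded individual."""
    return sum(1 for n in abundances if n > 0)


def shannon_index(abundances: list[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed species."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("no individuals recorded")
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


def semi_natural_share(semi_natural_ha: float, total_ha: float) -> float:
    """Causal indicator: percentage of the farm area in semi-natural habitats."""
    if total_ha <= 0:
        raise ValueError("total area must be positive")
    return 100.0 * semi_natural_ha / total_ha


# Hypothetical carabid beetle counts per species from one field.
counts = [12, 7, 0, 3, 1]
print(species_richness(counts))         # 4
print(round(shannon_index(counts), 3))
print(semi_natural_share(12.0, 150.0))  # 8.0
```

The first two functions need field sampling; the third needs only farm records, which illustrates why causal indicators are cheaper but say nothing directly about the species present.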
In contrast to the two previous types of indicators, examples of predictive effect indicators according to the typology in Figure 2 are less numerous. Table 1 provides an overview of initiatives based on an operational model in arable farming. Most of them are part of multi-criteria assessment methods. SALCAbd (Jeanneret et al., 2014) was developed to complement the SALCA method based on Life Cycle Analysis, although it only deals with direct effects at field level and not with upstream or downstream indirect impacts (see Sect. 3). Outputs take the form of a probability of presence or a decrease in the number of species, or of risk or impact scores. Whereas some models explicitly address a broad range of species, for plants (Sanderson et al., 1995) or several taxa (Butler et al., 2009), most of them focus on a small number of species or taxa without explicit information on them.
Dossier C. Bockstaller et al.: OCL 2015, 22 (1) D102

Examples of different French evaluation methods of sustainability
Providing an exhaustive overview of the "explosion" of evaluation methods in recent decades is beyond the scope of this article; the reader may refer to syntheses such as those of Rosnoblet et al. (2006) or Singh et al. (2012). Table 2 gives an overview of six French methods for evaluating sustainability in agriculture. Each method uses more than 30 indicators, DAESE and EVAD having the highest numbers. Strictly speaking, DAESE and EVAD are not evaluation methods ready for implementation by end-users: the former offers a broad list of indicators, and the latter an indicator list together with a methodological framework. At farm level, causal indicators based on management variables are used to make the methods easy to use with farmers. MASC and DEXiPM were developed for research work, to evaluate innovative cropping systems ex ante during the design phase and to help select and improve the best-performing systems for experimentation. MASC is based on quantitative predictive effect indicators (see Fig. 2) and can also be used to evaluate existing cropping systems (ex post evaluation). DEXiPM is based on qualitative causal indicators, which are aggregated into qualitative predictive effect indicators. This makes the evaluation much faster than with MASC, despite a higher number of basic indicators. For some methods, such as IDEA, aggregation is based on a sum of scores, an approach questioned by many authors (Bockstaller et al., 2008). The MASC and DEXiPM decision trees are implemented with the DEXi software (Bohanec et al., 2008), which allows qualitative decision trees to be designed using "if-then" rules and input variables organized in classes.
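To make the idea of a qualitative decision tree concrete, the sketch below discretizes two basic indicators into classes and combines them with an explicit "if-then" rule table for one node of a tree. It does not use or reproduce the DEXi software or the actual MASC and DEXiPM trees; the class thresholds, the rule table and the names (`Level`, `to_level`, `WATER_QUALITY_RULES`, `water_quality`) are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Qualitative classes; a higher class means better preservation."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def to_level(value: float, thresholds: tuple[float, float],
             higher_is_better: bool = True) -> Level:
    """Discretize a quantitative basic indicator into three classes."""
    lower, upper = thresholds
    if value < lower:
        level = Level.LOW
    elif value < upper:
        level = Level.MEDIUM
    else:
        level = Level.HIGH
    # When a large raw value is bad (e.g. leaching), invert the class order.
    return level if higher_is_better else Level(Level.HIGH - level)


# Hypothetical rule table for one node:
# (nitrate control, pesticide control) -> water quality preservation.
WATER_QUALITY_RULES: dict[tuple[Level, Level], Level] = {
    (Level.LOW, Level.LOW): Level.LOW,
    (Level.LOW, Level.MEDIUM): Level.LOW,
    (Level.LOW, Level.HIGH): Level.MEDIUM,
    (Level.MEDIUM, Level.LOW): Level.LOW,
    (Level.MEDIUM, Level.MEDIUM): Level.MEDIUM,
    (Level.MEDIUM, Level.HIGH): Level.MEDIUM,
    (Level.HIGH, Level.LOW): Level.MEDIUM,
    (Level.HIGH, Level.MEDIUM): Level.MEDIUM,
    (Level.HIGH, Level.HIGH): Level.HIGH,
}


def water_quality(nitrate: Level, pesticide: Level) -> Level:
    """IF nitrate is X AND pesticide is Y THEN water quality is Z."""
    return WATER_QUALITY_RULES[(nitrate, pesticide)]


nitrate = to_level(35.0, (20.0, 50.0), higher_is_better=False)  # kg N/ha leached
pesticide = to_level(0.8, (1.0, 2.0), higher_is_better=False)   # hypothetical risk index
print(nitrate.name, pesticide.name, water_quality(nitrate, pesticide).name)
# MEDIUM HIGH MEDIUM
```

Each combination of input classes is mapped explicitly, so the rules stay readable, but the table grows multiplicatively with the number of inputs and classes; for complex trees, the sensitivity of the aggregated result to the basic indicators therefore deserves attention.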

Discussion
The clarification of the preliminary choices described in Section 3 is important to guide users in their selection of an evaluation method and to spare them a contingent choice driven by the availability of a method in their organization or surroundings. Indeed, such a choice can lead to the use of an unsuitable method that meets neither their needs nor their means, or even to a biased evaluation of sustainability. In any case, the large number of indicators within some themes such as nitrogen, as well as the variety of evaluation methods, makes this preliminary clarification essential to avoid a random approach akin to a lottery. Consequently, some authors have proposed lists of criteria to select the "best" method (Bockstaller et al., 2009; Feschet and Lairez, 2015; Niemeijer and de Groot, 2008), or even an interactive decision-aid tool such as the PLAGE web platform (Surleau-Chambenoit et al., 2013).
However, for some themes, the number of indicators may be low. This is particularly true for predictive effect indicators based on operational models. Such indicators require substantial design work, integrating knowledge on the effect of farmers' management in interaction with soil and climate variables, to produce an output expressing an effect that can be linked to emissions, to the state of the environment, or even to the impact, as in LCA. Topics such as biodiversity (see Sect. 5.2), soil compaction, and nuisance due to noise and odours present gaps for predictive effect indicators. A new challenge will be to evaluate ecosystem services in such a predictive way, from management practices to biodiversity and from biodiversity to ecosystem services. The second step is addressed, for example, by an indicator assessing the pollination value of floral diversity in field margins (Ricou et al., 2014).
In any case, users also need to know the quality of the information delivered by an indicator, especially when it is used to evaluate the effect on a sustainability issue such as an environmental impact. Because of the many simplifications involved, a direct correlation can rarely be expected, except over a broad range of landscape conditions. Such correlations were found, for example, between diversity within a taxon (birds, bees, etc.) and causal indicators based on management variables such as nitrogen input or the percentage of semi-natural area (Billeter et al., 2008). Specific tests to identify more complex relations have also been proposed (Bockstaller et al., 2008). In every case, this raises the question of the uncertainty of indicator results. For an indicator based on the nitrogen balance multiplied by a coefficient (see Fig. 3), Mertens and Huwe (2002) handled data uncertainty with an approach based on fuzzy logic, so that the user is given an interval rather than a single value.
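Mertens and Huwe's fuzzy-logic procedure is not reproduced here. As a much simpler stand-in, the sketch below uses plain interval arithmetic, which corresponds to propagating a single α-cut of fuzzy numbers: uncertain inputs and an uncertain coefficient yield an interval for the indicator instead of a single value. The `Interval` class, the leaching coefficient and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high] for an uncertain quantity."""
    low: float
    high: float

    def __post_init__(self) -> None:
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.low + other.low, self.high + other.high)

    def __sub__(self, other: "Interval") -> "Interval":
        # Extremes: smallest minus largest, largest minus smallest.
        return Interval(self.low - other.high, self.high - other.low)

    def __mul__(self, other: "Interval") -> "Interval":
        # Sign-safe product: take the extremes of the four corner products.
        corners = (self.low * other.low, self.low * other.high,
                   self.high * other.low, self.high * other.high)
        return Interval(min(corners), max(corners))


# Hypothetical uncertain data (kg N/ha) and leaching coefficient (fraction).
n_inputs = Interval(190.0, 210.0)
n_exports = Interval(150.0, 170.0)
leaching_coefficient = Interval(0.4, 0.6)

surplus = n_inputs - n_exports             # Interval(low=20.0, high=60.0)
leaching = surplus * leaching_coefficient  # roughly 8 to 36 kg N/ha
print(surplus, leaching)
```

Interval arithmetic treats every input as independent, so it can overstate the width of the result when inputs are correlated; the fuzzy approach additionally keeps a degree of membership for each value inside the interval.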
Many indicators are available at field and farm level, as emerges from the overview given in this article. However, water quality indicators, for example, should be used at the scale of the water catchment or landscape. Emissions can be assessed at the lower scale of cropping and farming systems. For indicators assessing emissions, results can be upscaled by computing, at the higher scale, an average of the lower-scale results weighted by the size or number of the lower-scale entities. Such aggregation at a higher scale such as a nation is not relevant for local impacts (e.g. water quality, erosion), whereas it is possible for global impacts such as greenhouse gas emissions. Upscaling requires some statistical skills for data management, but must also integrate new processes (Stein et al., 2001) and new environmental components (e.g. non-cropped areas). In any case, even without an upscaling procedure, being able to work at a fine resolution (e.g. field) over a large extent (e.g. a water catchment) is an important challenge for agronomists, as it enables them to work on realistic scenarios of management change, with a finer description of cropping systems than only the crop type and average fertilizer rates (Leenhardt et al., 2010).
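The weighted averaging described above can be written in a few lines. This minimal sketch uses hypothetical names and numbers; weighting by the number of entities instead of their size only means passing counts (e.g. all ones) as weights. It covers only the statistical part of upscaling, not the new processes and environmental components mentioned above.

```python
def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Average of lower-scale results weighted by area (or by number of entities)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and aligned")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def total_emission(rates_per_ha: list[float], areas_ha: list[float]) -> float:
    """Sum of emissions over all fields; meaningful for global impacts (e.g. GHG)."""
    if len(rates_per_ha) != len(areas_ha):
        raise ValueError("rates and areas must be aligned")
    return sum(r * a for r, a in zip(rates_per_ha, areas_ha))


# Hypothetical field-level emission rates (kg/ha) and field areas (ha).
rates = [30.0, 60.0, 10.0]
areas = [10.0, 5.0, 5.0]
print(weighted_mean(rates, areas))   # 32.5 (kg/ha, farm level)
print(total_emission(rates, areas))  # 650.0 (kg, whole farm)
```

For a local impact such as water quality, the farm-level mean of 32.5 kg/ha hides the fact that one field emits twice as much as the others, which is why such averages are not meaningful for local issues.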
As already mentioned, sustainability is by nature a multidimensional issue, addressing a set of criteria that can be simply organized in a list or in a more complex framework. Sooner or later, the question arises of whether aggregation is needed to facilitate the interpretation of a set of results, especially when more than 30 indicators are used (see Tab. 2).

The relevance of composite aggregation (i.e. aggregation of indicators addressing totally different issues) is often debated, because of the resulting loss of information but also because of the methodological problems it raises. A major problem is "adding apples and pears" in the case of composite indicators, which can occur in scoring methods (Rigby et al., 2001). Several methods are available to avoid this problem, such as normalization in monetary or physical units, multivariate approaches, or decision trees based (or not) on fuzzy logic (Bockstaller et al., 2008). The DEXi software tool (Bohanec et al., 2008) makes the design of decision trees (without fuzzy logic) quite easy but remains entirely qualitative. For complex trees, special attention should therefore be paid to the sensitivity of the aggregated indicator to the variability of the basic ones (Carpani et al., 2012). Another flaw is a certain lack of transparency when the automatic weighting procedure of the software is used. A totally different approach is the use of multi-criteria methods based on outranking (Cinelli et al., 2014), which have the drawback of relying on relative comparisons rather than on an absolute assessment. More recently, another approach addressing compensation between indicators in a transparent way has been proposed (van Asselt et al., 2014). In every case, we advise using both aggregated and individual indicators, the former to compare systems or to select the best-performing ones, the latter to identify weak and strong points.
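To illustrate how normalization avoids adding raw values expressed in different units, the sketch below rescales each indicator to a 0-1 score between a "worst" and a "best" bound and then takes a weighted mean. Min-max rescaling is only one normalization option and is not the monetary or physical normalization referred to above; the bounds, weights and function names are hypothetical. In line with the advice above, the function returns the individual scores alongside the aggregate.

```python
def normalized_score(value: float, worst: float, best: float) -> float:
    """Rescale a raw value to [0, 1] (0 = worst, 1 = best), clipped at the bounds.

    For 'lower is better' indicators, simply pass worst > best.
    """
    if worst == best:
        raise ValueError("worst and best bounds must differ")
    score = (value - worst) / (best - worst)
    return min(1.0, max(0.0, score))


def aggregate(
    indicators: dict[str, tuple[float, float, float, float]],
) -> tuple[float, dict[str, float]]:
    """Weighted mean of normalized scores, returned with the individual scores.

    Each entry maps a name to (raw value, worst bound, best bound, weight).
    """
    scores = {name: normalized_score(value, worst, best)
              for name, (value, worst, best, _) in indicators.items()}
    total_weight = sum(weight for (_, _, _, weight) in indicators.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    overall = sum(scores[name] * weight
                  for name, (_, _, _, weight) in indicators.items()) / total_weight
    return overall, scores


# Hypothetical bounds and weights: kg N/ha leached and EUR/ha gross margin.
overall, scores = aggregate({
    "nitrate_leaching": (35.0, 80.0, 10.0, 0.5),  # lower is better
    "gross_margin": (900.0, 0.0, 1500.0, 0.5),    # higher is better
})
print(round(overall, 3), {k: round(s, 3) for k, s in scores.items()})
```

The weighted mean is fully compensatory: a poor score on one indicator can be offset by a good score on another, which is exactly why the individual scores should be reported as well.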

Fig. 1. Definition of the system and related sustainability issues.
Different typologies have been proposed to categorize the broad variety of indicators developed over recent decades. Means-based indicators use information on farmers' practices or other causal variables, whereas effect-based or result-oriented indicators are based on an assessment of the effect at different stages of the cause-effect chain. Concerning biodiversity, some authors also distinguish between indirect (means-based) and direct (effect-based) indicators. However, observing that these classes still cover very different indicators, such as measurements and model outputs, Bockstaller et al. (2011) and Feschet and Lairez (2015) proposed another typology taking into account the nature and structure of the indicators. Here we propose an adaptation of the former, based on the following classes: (i) causal indicators, based on a causal variable or a simple combination of variables of the same nature (sum, product, ratio); causal indicators can be based on management variables or on environmental variables (soil, climate, etc.), and the term "means" is relevant for the former but not for the latter; (ii) predictive effect indicators, based on the output of a model that can be operational (with a small number of readily available input variables) or complex (from the research point of view, without considering the number and availability of input data); (iii) measured effect indicators, based on field assessment or observation.

Table 1. Examples of predictive biodiversity indicators with their main characteristics (adapted from Bockstaller et al.,