Study population

The present study used observational data from the NutriNet-Santé study, an Internet-based cohort launched in May 2009 [24]. Its purpose is to study the determinants of diets, nutritional status and physical activity, as well as their associations with health. The participants, recruited on a voluntary basis, are adults living in France with access to the Internet. Participants complete annual or biannual questionnaires on socioeconomic status, lifestyle, anthropometry, dietary intake and physical activity. Specific questionnaires are also proposed regularly. Gender, occupational status, income, place of residence, physical activity and smoking habits are self-reported using validated questionnaires [25]. The NutriNet-Santé study is in line with the principles of the Helsinki Declaration [26], and the protocol has been approved by both the INSERM Ethical Evaluation Committee (CEEI) (no. 0000388FWA00005831) and the National Committee for Information Technology and Freedom (CNIL) (nos. 908450 and 909216). Informed consent was obtained from all participants. The study is registered in ClinicalTrials.gov (NCT03335644).

Assessment of food consumption and protein intake in total and by food groups

Food consumption data were collected via an Organic Food Frequency Questionnaire (Org-FFQ) developed in 2014, including 264 organic and conventional food items [27]. In the present study, a total of 23 food groups were defined based on their protein content as follows: meat (including beef and pork), processed meat, poultry, seafood, eggs, milk, dairy (including all dairy products except for milk), fast food, sweetened and fatty foods (SFF), fat (including animal fat and margarine), dressing, potatoes, legumes, whole-grain products, cereals (including all cereal products), nuts, soya-based products (including substitutes), vegetables, fruits, fruit juice, beverages (including all non-alcoholic beverages), oil (including vegetable oils) and alcohol (including all alcoholic beverages). Nutrient values were derived from the food composition table developed for the NutriNet-Santé study [28]. Detailed information on the composition of the food groups is provided in the legend of Fig. 1.

Environmental data

Three environmental pressures, namely greenhouse gas (GHG) emissions (CO2-eq), cumulative energy demand (MJ) and land use (m2), were assessed using the DIALECTE tool [29]. Developed by Solagro, this French diagnostic tool evaluates the environmental performance of French farms using a comprehensive approach. The Life Cycle Assessment (LCA) method was applied to 60 raw agricultural products. The scope of the analysis was limited to the agricultural production stage, but organic and conventional products were distinguished. Details are provided in Supplemental Material 1.

In addition, the pReCiPe score, a synthetic impact indicator combining the three indicators above, was calculated [30]. It derives from the ReCiPe method, an LCA method developed in the Netherlands that balances conflicting environmental indicators by considering both midpoint and endpoint measures and aligning them to provide a comprehensive view [30]. ReCiPe focuses on 18 indicators, three of which are oriented towards final impacts: resource availability, human health and ecosystem diversity. In practice, some authors have found that the environmental impact of food products and diets can be assessed by measuring greenhouse gas emissions, primary energy consumption and land occupation, which together account for about 90% of the total environmental dimension of the ReCiPe model. The environmental impact of a food product or diet can therefore be calculated using the partial ReCiPe score (pReCiPe), with normalization and weighting factors [31].

The pReCiPe score is computed as follows:

$$pReCiPe = 0.0459 \times GHGe + 0.0025 \times CED + 0.0439 \times LO$$

where GHGe (greenhouse gas emissions) is expressed in kg of CO2eq/d, CED (cumulative energy demand) in MJ/d and LO (land occupation) in m2/d. The higher the pReCiPe score, the higher the environmental impact.
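As an illustration only, the pReCiPe formula can be written as a short Python function. The function name, argument names and the example diet values are hypothetical; only the three weighting factors come from the formula given here.

```python
def precipe_score(ghge_kg_co2eq: float, ced_mj: float, lo_m2: float) -> float:
    """Partial ReCiPe score from daily GHG emissions (kg CO2eq/d),
    cumulative energy demand (MJ/d) and land occupation (m2/d)."""
    return 0.0459 * ghge_kg_co2eq + 0.0025 * ced_mj + 0.0439 * lo_m2


# Hypothetical daily diet: 4 kg CO2eq, 18 MJ and 10 m2 per day.
example_score = precipe_score(4.0, 18.0, 10.0)
```

With these illustrative inputs, land occupation contributes the largest share of the score (0.439 out of roughly 0.668), which shows how the weights translate indicators with different units onto one scale.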

Nutritional quality data

Three dietary indexes were computed. The PANDiet (Diet Quality Index based on the Probability of Adequate Nutrient Intake) is a nutritional adequacy score based on the nutritional reference values [32,33]. The PNNS-GS2 (Programme National Nutrition Santé-Guidelines Score 2) measures the adherence of individuals to the French dietary guidelines established by the High Council of Public Health in 2017 [34]. The cDQI (Comprehensive Diet Quality Index) assesses the quality of the plant and animal foods consumed [35]. Further details are provided in Supplemental Material 2.

Health risk data

Health risk was assessed using a “Health Risk Score” (HRS) of the diet, computed using the distance to the Theoretical Minimum-Risk Exposure Level (TMREL) provided in the GBD study in 2019 [12]. It reflects the overall risk of death associated with the individual dietary pattern, resulting from a suboptimal consumption of each food group. The computation of the HRS, which ranges from 0 to 1, is provided in Supplemental Material 3.

Economic data

The economic data used were participants’ monthly income, and their estimated food expenditures for their whole diet and each food group.

Participants’ income was collected as part of the socio-economic status questionnaire, where each participant provided the income class corresponding to his or her monthly income. Income per consumption unit (C.U.) was estimated using household composition and the age of family members, according to the INSEE procedure [36]. In the NutriNet-Santé study, the monetary cost of the diet (€/d) was calculated for each participant using prices (€/g) from several databases. Further details are provided in Supplemental Material 4.
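The income per consumption unit can be sketched as follows. This is a minimal example, not the study's code: it assumes the widely used INSEE equivalence scale (1 unit for the first person aged 14 or over, 0.5 for each additional person aged 14 or over, 0.3 for each child under 14), which the text does not restate, and the function names and household values are hypothetical.

```python
def consumption_units(ages):
    """Consumption units of a household from the ages (in years) of its members.

    Assumed weights: 1 for the first person aged 14+, 0.5 for each other
    person aged 14+, 0.3 for each child under 14.
    """
    older = sum(1 for age in ages if age >= 14)
    children = sum(1 for age in ages if age < 14)
    if older == 0:
        raise ValueError("expected at least one household member aged 14 or over")
    return 1.0 + 0.5 * (older - 1) + 0.3 * children


def income_per_consumption_unit(monthly_income, ages):
    """Monthly household income divided by the number of consumption units."""
    return monthly_income / consumption_units(ages)


# Hypothetical household: two adults, one teenager (15) and one child (8).
units = consumption_units([40, 38, 15, 8])
income_cu = income_per_consumption_unit(3450.0, [40, 38, 15, 8])
```

In this example the household counts for 2.3 consumption units, so a monthly income of 3450 € corresponds to about 1500 € per consumption unit.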

Statistical analysis

Among the participants in the NutriNet-Santé cohort, 29,210 individuals were selected for this study: those with Org-FFQ data, no missing sociodemographic data (except for monthly income, which is a non-mandatory question) and available information on place of purchase. Those considered under- or over-reporters of energy intake were excluded, as previously published [27]. A flowchart is provided in Supplemental Figure 1.

Construction of the protein-source-typology and description of clusters

The contribution (in %) of each of the 23 food groups to total protein intake was calculated for each individual, to focus on protein sources independently of total intake. The typology, aiming to identify groups of individuals with similar protein sources, was built using a two-step procedure. First, a Principal Component Analysis [37] was applied to the 23 protein-contributing food groups (the list is available in Fig. 1). This multivariate method reduces the dimensionality of the data while retaining as much of the variance as possible. Nine dimensions were retained according to the Kaiser criterion (eigenvalues > 1). Then, on the basis of the retained dimensions, an Ascending Hierarchical Classification (AHC) was performed, with data preprocessing using the k-means algorithm reiterated 100 times. As this study used a large database, the complementary use of the k-means and AHC methods helped stabilize the solution. Further details are provided in Supplemental Material 5.
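The first step of this procedure can be illustrated with a small NumPy sketch. It is not the authors' code (the analyses were run in R): the function names are hypothetical, the PCA is assumed to be run on the correlation matrix (the usual setting for the Kaiser criterion), and the k-means and AHC steps are not reproduced.

```python
import numpy as np


def protein_contributions(protein_g):
    """Share (%) of each food group in total protein intake.

    Rows are individuals, columns are food groups (grams of protein per day).
    """
    protein_g = np.asarray(protein_g, dtype=float)
    return 100.0 * protein_g / protein_g.sum(axis=1, keepdims=True)


def kaiser_retained(data):
    """Number of principal components with an eigenvalue above 1.

    The eigenvalues are those of the correlation matrix of the columns.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))


# Toy example: two individuals, two food groups.
shares = protein_contributions([[10.0, 30.0], [5.0, 5.0]])

# Toy matrix with two independent pairs of perfectly correlated columns.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, -1.0, -1.0, 1.0])
n_components = kaiser_retained(np.column_stack([a, 2 * a, b, 3 * b]))
```

In the toy matrix, the four columns carry only two independent sources of variation, so two components pass the eigenvalue threshold; in the study, the same rule retained nine dimensions out of 23 food groups.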

Description and comparison of clusters

The clusters were named according to the food groups contributing the most to the protein intake of each cluster compared to the whole sample. First, means (SD) of the protein contribution of each food group were computed (%/day) for the whole sample. Then, as each cluster may have a different energy intake from the whole sample, energy-adjusted means (SEM) of the protein contributions of the 23 food groups were calculated for each cluster, using ANCOVA models.

The identified clusters were described according to their socio-demographic characteristics, reporting mean (SD) for continuous variables and % for categorical variables. Comparisons across clusters were performed using Pearson’s Chi-square test for categorical variables and ANOVA for continuous variables. For food group consumption, means (SD) were presented for the whole sample and, for each cluster, energy-adjusted means of food group intakes (g/day) and standard errors of the mean (SEM) were calculated using ANCOVA models.

Percentages of total energy intake were calculated for macronutrients. For vitamins, minerals and fiber, the energy-adjusted intake of each nutrient was calculated using the “residual method” [38]. The prevalence of adequate protein intake was computed as defined in the PANDiet score [33].
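For readers unfamiliar with the residual method, a minimal sketch is given below. It assumes the standard formulation (residuals of a simple linear regression of nutrient intake on energy intake, to which the intake predicted at the mean energy intake is added back); the function name and toy values are hypothetical, and the study's exact implementation may differ.

```python
from statistics import mean


def residual_energy_adjust(nutrient, energy):
    """Energy-adjusted nutrient intakes using the residual method.

    Fits nutrient = intercept + slope * energy by least squares, then returns
    each residual plus the intake predicted at the mean energy intake.
    """
    mean_energy, mean_nutrient = mean(energy), mean(nutrient)
    sxx = sum((e - mean_energy) ** 2 for e in energy)
    sxy = sum((e - mean_energy) * (n - mean_nutrient)
              for e, n in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_nutrient - slope * mean_energy
    predicted_at_mean = intercept + slope * mean_energy
    return [n - (intercept + slope * e) + predicted_at_mean
            for n, e in zip(nutrient, energy)]


# Toy data: fiber intake (g/d) and energy intake (kcal/d) for four individuals.
fiber = [15.0, 22.0, 24.0, 19.0]
kcal = [1500.0, 2000.0, 2500.0, 2000.0]
adjusted_fiber = residual_energy_adjust(fiber, kcal)
```

The adjusted values keep the same mean as the raw intakes but are, by construction, uncorrelated with energy intake, so differences between individuals reflect diet composition rather than the amount eaten.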

To allow comparison of clusters with the whole sample in relative values for all indicators, standardized means were computed for the whole sample, corresponding to the mean that the whole sample would have if its energy intake were that of the cluster (\(overall_{i}\)). Relative values of the energy-adjusted indicators were then calculated with the following formula:

$$\text{Relative value of indicator}_{i}\ (\%) = \frac{\text{Energy-adjusted mean}_{i} - \text{Standardized mean}_{overall_{i}}}{\text{Standardized mean}_{overall_{i}}} \times 100$$

where i denotes clusters.
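The relative value is a simple percentage deviation, as the short sketch below shows. The function name and the numeric values are hypothetical and serve only to illustrate the formula.

```python
def relative_value(energy_adjusted_mean, standardized_overall_mean):
    """Relative deviation (%) of a cluster's energy-adjusted mean from the
    whole-sample mean standardized to that cluster's energy intake."""
    return ((energy_adjusted_mean - standardized_overall_mean)
            / standardized_overall_mean * 100.0)


# Hypothetical indicator: cluster mean 5.5 versus standardized overall mean 5.0.
example_relative = relative_value(5.5, 5.0)
```

A positive value means the cluster scores higher than the whole sample would at the same energy intake; here the cluster is about 10% above the standardized overall mean.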

Multicriteria analysis

For each sustainability indicator considered, the mean (SD) was calculated for the whole sample, and energy-adjusted means (SEM) were calculated for each cluster via ANCOVA models. Comparison between clusters was based on relative values computed as defined above. A comparison of means across clusters was performed using ANCOVA models.

Economic analysis

The objective was to analyze the structure of both food and protein expenditure across clusters. The economic analysis included 27,244 of the 29,210 participants, for whom there were no missing income data (since the question was optional). The monthly income variable, collected as categories, was transformed into a numeric variable by assigning each individual the class center of his or her income category, as previously done [39], and converted to euros per day.

The expenditure structure analysis across clusters was conducted using a budget coefficient approach [40]. This approach makes the share of food expenditure comparable between individuals with different incomes and different diets [40].

To do so, we first computed for each participant, the budget coefficients of both the overall diet and the food groups, using the following formulas:

$$\begin{aligned} & {\rm{Budget \,coefficient \,of \,the \,overall \,diet_{i}}} = \frac{\rm{Overall \,diet \,expenditure_{i} }}{{Income_{i} }} \times 100 \\ & {\rm{Budget \,coefficient \,of \,food \,group_{i,j}}} = \frac{\rm{Food \,group \,expenditure_{i,j} }}{\rm{Overall \,diet \,expenditure_{i} }} \times 100 \\ \end{aligned}$$

where i denotes individuals and j denotes food groups.
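The two budget coefficients can be sketched as follows for a single participant. The function names, food groups and amounts are hypothetical; expenditure and income are assumed to be expressed over the same period (for example euros per day).

```python
def diet_budget_coefficient(diet_expenditure, income):
    """Share (%) of income spent on the overall diet."""
    return diet_expenditure / income * 100.0


def food_group_budget_coefficients(group_expenditure, diet_expenditure):
    """Share (%) of the overall diet expenditure allocated to each food group."""
    return {group: spent / diet_expenditure * 100.0
            for group, spent in group_expenditure.items()}


# Hypothetical participant: 5 euros/day on food, 50 euros/day of income.
spending = {"meat": 2.0, "vegetables": 1.5, "dairy": 1.5}
overall = sum(spending.values())
diet_share = diet_budget_coefficient(overall, 50.0)
group_shares = food_group_budget_coefficients(spending, overall)
```

Note the different denominators: the overall coefficient relates food spending to income, whereas the food group coefficients relate each group to total food spending and therefore sum to 100% when the groups cover the whole diet.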

Insofar as we assume that the production mode (organic/conventional) affects food expenditure, the analysis was detailed by distinguishing expenditures allocated to organic products from those allocated to conventional products. To do so, budget coefficients of organic and conventional foods, for the overall diet and for each food group were computed for each individual. For the overall diet, budget coefficients of the overall diet by production mode were computed with respect to the overall diet budget. The budget coefficients of the food groups by production mode were calculated in relation to the overall diet budget allocated to foods from the corresponding production mode.

We defined the protein expenditure as the share of the food group expenditure allocated to the daily protein intake. It was calculated, for each food group and for each participant, using the following formula:

$${\text{Protein expenditure}}_{i,j}\ (\text{euro}) = \frac{{\text{Food expenditure}}_{i,j}\ (\text{euro}) \times {\text{Protein intake}}_{i,j}\ (\text{g})}{{\text{Quantity consumed}}_{i,j}\ (\text{g})}$$

where i denotes individuals and j denotes food groups.

Then, the “Total protein expenditure” was calculated for each participant by summing the protein expenditures for all food groups.

The budget coefficients of protein intake were then computed, using the following formulas:

$$\begin{aligned} & \text{Budget coefficient of protein intake per food group}_{i,j} = \frac{\text{Protein expenditure}_{i,j}}{\text{Overall diet expenditure}_{i}} \times 100 \\ & \text{Budget coefficient of the total protein intake}_{i} = \frac{\text{Total protein expenditure}_{i}}{\text{Overall diet expenditure}_{i}} \times 100 \\ \end{aligned}$$

where i denotes individuals and j denotes food groups.
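The protein expenditure and its budget coefficients can be illustrated with the sketch below for one participant. All names and values are hypothetical, and returning zero for a food group that was not consumed is an assumption made here to avoid dividing by zero, not something stated in the text.

```python
def protein_expenditure(food_expenditure, protein_intake_g, quantity_g):
    """Part of a food group's expenditure attributed to its protein content."""
    if quantity_g == 0:
        return 0.0  # assumption: a group that is not consumed costs nothing
    return food_expenditure * protein_intake_g / quantity_g


# Hypothetical participant: (expenditure in euros, protein in g, quantity in g).
groups = {"meat": (2.0, 20.0, 100.0), "legumes": (0.5, 10.0, 100.0)}
overall_diet_expenditure = 5.0

protein_spend = {name: protein_expenditure(*values)
                 for name, values in groups.items()}
total_protein_spend = sum(protein_spend.values())

# Budget coefficients (%), relative to the overall diet expenditure.
protein_coefficients = {name: spent / overall_diet_expenditure * 100.0
                        for name, spent in protein_spend.items()}
total_protein_coefficient = total_protein_spend / overall_diet_expenditure * 100.0
```

In this example, 0.40 € of the 2 € spent on meat is attributed to protein, and protein accounts for about 9% of the participant's overall food expenditure.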

Afterwards, non-adjusted means (SD) were computed for all the calculated budget coefficients for the whole sample, and for each cluster, means and standard error of the mean (SEM) adjusted for energy intake were estimated using ANCOVA models. Comparison between clusters was based on relative percentage values computed using standardized means as defined above. Comparison of means across clusters was performed using the ANCOVA test.

Data management and statistical analyses were performed using RStudio software (RStudio, Version 1.4.1717, © 2009–2021 RStudio, PBC).
