In this section, we first present results for the bipartite networks linking spices and herbs, phytochemicals, and health indications, and systematically analyze the associations between health indications and phytochemicals. Second, we present results on spice usage across Indian regional cuisines using recipe corpora. We analyze the relationships between cuisines and health indications, and we compare the minimum number of spices needed to cover a spectrum of indications in real recipes with that in recipes generated by copy-mutate and other random models.

Network analysis of spices & herbs, phytochemicals, and health indications

Two bipartite networks were created: one between spices and phytochemicals, and another between spices and health indications. From these two networks, a third bipartite network was constructed, linking indications and phytochemicals (see Fig. 1a). The first three sub-sections provide a descriptive analysis of the three bipartite networks, and the fourth section makes specific predictions of the indication-phytochemical associations.
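One simple way to derive the indication-phytochemical network from the two spice-centered networks is to link an indication and a phytochemical whenever some spice carries both, and to weight the link by the number of such spices. The sketch below assumes this count weighting (the exact weighting behind Fig. 1a is not specified here), and its identifiers are made-up toy data rather than database records.

```python
from collections import defaultdict


def compose_bipartite(spice_to_chems, spice_to_inds):
    """Link (indication, phytochemical) pairs through shared spices.

    Assumption: the link weight is the number of spices that both contain
    the phytochemical and are recorded for the indication.
    """
    weights = defaultdict(int)
    for spice, chems in spice_to_chems.items():
        for ind in spice_to_inds.get(spice, set()):
            for chem in chems:
                weights[(ind, chem)] += 1
    return dict(weights)


# Toy data (hypothetical identifiers, not taken from Duke or the handbooks)
spice_to_chems = {"spice_a": {"chem_x", "chem_y"}, "spice_b": {"chem_x"}, "spice_c": {"chem_z"}}
spice_to_inds = {"spice_a": {"ind_1", "ind_2"}, "spice_b": {"ind_2"}, "spice_c": {"ind_2", "ind_3"}}

links = compose_bipartite(spice_to_chems, spice_to_inds)
print(links[("ind_2", "chem_x")])  # 2: spice_a and spice_b both link ind_2 to chem_x
```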

We first built a spice-indication bipartite network on two sets of nodes: (i) 1094 spices and herbs and (ii) 1597 medical indications. The data were obtained from the Handbook of Medicinal Herbs37 and the Handbook of Medicinal Spices38, which provide extensive information about herbs and spices and their associated medical indications. Next, we created the bipartite network projection onto spices and herbs, in which two nodes are connected if they share at least one indication. The Wakita–Tsurumi algorithm39 was applied to this projection to detect clusters of spices and herbs. For ease of visualization, we used a backbone extraction method40,41 to retain only statistically significant edges, as shown in Fig. 2a. The central role of garlic is evident. To characterize the clusters, we plotted the prevalence scores of indications (see “Prevalence score” section) in each cluster (see Fig. 2b). Indications from several disease categories, including respiratory ailments, gastrointestinal disorders, infectious diseases, musculoskeletal conditions, and various forms of cancer, have the highest prevalence scores across all clusters.
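The projection and the per-cluster prevalence can be sketched as follows. The projection rule (connect two spices that share at least one indication) follows the text; weighting edges by the number of shared indications, and defining prevalence as the fraction of a cluster's spices linked to an indication, are assumptions for illustration, since the “Prevalence score” section is not reproduced here. The clustering step itself (Wakita–Tsurumi) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def project_spices(spice_to_inds):
    """Spice-spice projection: an edge exists if two spices share >= 1 indication.

    Edge weight (assumption) = number of shared indications.
    """
    edges = {}
    for a, b in combinations(sorted(spice_to_inds), 2):
        shared = len(spice_to_inds[a] & spice_to_inds[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def prevalence(cluster, spice_to_inds):
    """Hypothetical prevalence: fraction of spices in `cluster` linked to each indication."""
    counts = defaultdict(int)
    for spice in cluster:
        for ind in spice_to_inds[spice]:
            counts[ind] += 1
    return {ind: n / len(cluster) for ind, n in counts.items()}


# Toy data with hypothetical spices and indications
spice_to_inds = {
    "spice_a": {"cough", "pain"},
    "spice_b": {"cough"},
    "spice_c": {"constipation"},
    "spice_d": {"constipation", "cough"},
}
edges = project_spices(spice_to_inds)
cluster_scores = prevalence(["spice_c", "spice_d"], spice_to_inds)
print(edges)
print(cluster_scores)
```

In this toy cluster, every spice is linked to constipation (prevalence 1.0) and half are linked to cough (0.5), which is the kind of per-cluster profile plotted in Fig. 2b.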

Fig. 2: Spice and herb association network with indication clustering.

a Backbone network visualizing connections between spices and herbs. Two spices or herbs are connected if they share at least one indication. The node color represents the cluster obtained from the Wakita–Tsurumi algorithm. b The bar plots display the prevalent indications associated with each cluster of spices and herbs.

In Fig. 2b, cancer is the most prevalent indication in cluster 2, represented by onion (Allium cepa) and opium poppy (Papaver somniferum), and in cluster 3, represented by thyme (Thymus vulgaris) and green or black tea (Camellia sinensis), suggesting that these spices and herbs contain phytochemicals beneficial for cancer prevention and management. Respiratory diseases, including asthma, mucososis, cough, and bronchitis, are the most prevalent in cluster 7, represented by banana (Musa spp.) and peppermint (Mentha × piperita), and in cluster 8, represented by garlic (Allium sativum) and black pepper (Piper nigrum). Most clusters have a strong association with at least one gastrointestinal disease. Roughly 80% of the spices in cluster 6 (Fig. 2b), including holy basil (Ocimum tenuiflorum) and vervain (Verbena officinalis), are linked to alleviating constipation. Cluster 4 is strongly associated with gastrosis and hepatosis, whereas cluster 8 is strongly associated with hepatosis and constipation. At the level of individual indications, pain, cough, and diarrhea are covered by almost all spices in cluster 5 (represented by licorice (Glycyrrhiza glabra) and goldenseal (Hydrastis canadensis)), cluster 7 (banana and peppermint), and cluster 8 (garlic and black pepper), respectively. This approach identifies not only individual spices with therapeutic potential but also groups of spices that collectively cover a range of health indications.

A second bipartite network, between (i) 742 spices and herbs and (ii) 2993 bioactive phytochemicals, was obtained from the Duke Phytochemical Database42 to explore the relationships between spices and their constituent phytochemicals. Figure 3 shows the projection onto phytochemicals, clustered using the Wakita–Tsurumi algorithm; the projection onto spices and herbs is provided as Supplementary Fig. 1 in Supplementary Section 1. The blue cluster on the right of Fig. 3 consists primarily of terpenes, essential-oil components of plants with antibacterial and anti-inflammatory properties. The light green cluster contains mostly antioxidants, including vanillic acid, quercetin, p-coumaric acid, and caffeic acid. The yellow cluster comprises phytosterols such as campesterol and stigmasterol, plant sterols that are beneficial for cardiovascular health. The US Food and Drug Administration (FDA) has authorized a health claim that foods containing at least 0.65 g of plant sterol esters per serving, eaten twice daily with meals for a total daily intake of at least 1.3 g, may reduce the risk of heart disease23. The brown cluster at the bottom consists of the essential amino acids phenylalanine, methionine, leucine, histidine, and lysine; a diet lacking these can lead to decreased immunity, muscle loss, and even mental dysfunction. The dark blue cluster contains a small group of fatty acids commonly found in oils: the polyunsaturated fatty acid (PUFA) linoleic acid, the saturated palmitic and stearic acids, and the monounsaturated oleic acid. PUFAs support immunity at low intake, but consuming large amounts of PUFAs together with starch may contribute to disease, particularly heart disease and weight gain.

Fig. 3: Phytochemical association network based on shared spices and herbs.

A unipartite backbone network visualizing connections between phytochemicals. Two phytochemicals are connected if they occur in at least one common spice or herb. The node color represents the cluster obtained from the Wakita–Tsurumi algorithm.
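The backbone extraction step can be illustrated with the disparity filter of Serrano, Boguñá, and Vespignani, a common choice for weighted networks. Whether refs. 40,41 use exactly this filter, and the significance level alpha = 0.05, are assumptions of this sketch. An edge is kept if its normalized weight is unlikely under a uniform null model from the viewpoint of at least one endpoint.

```python
from collections import defaultdict


def disparity_backbone(edges, alpha=0.05):
    """Disparity-filter backbone of an undirected weighted graph.

    edges: {(u, v): weight}. For endpoint n with strength s_n and degree k_n,
    the edge p-value is (1 - w / s_n) ** (k_n - 1); the edge is kept if the
    p-value is below alpha for at least one endpoint. A degree-1 endpoint
    gives p = 1 (since 0 ** 0 == 1), so it never retains an edge by itself.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), w in edges.items():
        for n in (u, v):
            strength[n] += w
            degree[n] += 1

    backbone = {}
    for (u, v), w in edges.items():
        pvals = [(1 - w / strength[n]) ** (degree[n] - 1) for n in (u, v)]
        if min(pvals) < alpha:
            backbone[(u, v)] = w
    return backbone


# Toy weighted projection: a hub with one dominant edge and three weak ones
edges = {("hub", "a"): 10, ("hub", "b"): 1, ("hub", "c"): 1, ("hub", "d"): 1}
print(disparity_backbone(edges))  # {('hub', 'a'): 10}
```

Unlike a global weight threshold, this filter judges each edge relative to its endpoints' other edges, which preserves locally important links around low-strength nodes.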

To understand the therapeutic properties of spices, we aimed to identify the constituent phytochemicals that contribute to their disease associations, using a third bipartite network between indications and phytochemicals (Fig. 4). We defined a specificity score to quantify how uniquely a phytochemical is associated with an indication (see “Specificity score” section). In the endocrine diseases category (Fig. 4a), dianethole and p-anisaldehyde show high specificity for andropause. These phytochemicals are present in fennel and anise and have been reported to be effective against endocrine and other diseases43. 1,2,6-Tri-O-galloyl-beta-D-glucose, found in Cornus officinalis, has been shown to inhibit protein glycation and has been reported to reduce blood pressure44. Other molecules in the hematological (blood) disease category (Fig. 4b) did not show high specificity. Capsaicin and its precursor vanillylamine are used as analgesics in ointments for musculoskeletal pain management45, which is reflected in the musculoskeletal diseases specificity plot (Fig. 4c). Other compounds with high specificity in this category include capsorubin and capsanthin, carotenoids found in red bell peppers that are used for pain management46, and dihydrocapsaicin, a member of the capsaicin family. In the metabolic diseases specificity plot (Fig. 4d), high values were observed for vitamin K, glucosamine (found in many plants, including aloe vera and Cannabis sativa), daidzein (found in soybean), coumestrol (found in soybean, spinach, Brussels sprouts, and legumes), and imperatorin (found in Ammi majus and Angelica archangelica). These molecules have been reported to be effective against fatty liver, steatosis, and hyperuricemia47,48,49.
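The specificity score itself is defined in the “Specificity score” section, which is not reproduced here. As a purely hypothetical stand-in, the sketch below scores a phytochemical-indication pair by the share of the phytochemical's total link weight that goes to that indication, so a compound linked to a single indication scores 1. Ranking pairs by such a score mimics how top-k inferences are selected; it is not the authors' formula.

```python
from collections import defaultdict


def specificity(links):
    """Hypothetical specificity: w(ind, chem) / sum over indications of w(., chem).

    links: {(indication, phytochemical): weight}, e.g. shared-spice counts.
    """
    totals = defaultdict(float)
    for (_, chem), w in links.items():
        totals[chem] += w
    return {(ind, chem): w / totals[chem] for (ind, chem), w in links.items()}


def top_k(scores, k):
    """Return the k highest-scoring (indication, phytochemical) pairs."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy link weights with hypothetical identifiers
links = {("ind_1", "chem_x"): 3, ("ind_2", "chem_x"): 1, ("ind_2", "chem_y"): 5}
scores = specificity(links)
print(top_k(scores, 2))  # [('ind_2', 'chem_y'), ('ind_1', 'chem_x')]
```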

Fig. 4: Heatmaps showing the specificity scores obtained for phytochemical-indication pairs for six disease categories.

a Endocrine diseases. b Hematological diseases. c Musculoskeletal diseases. d Metabolic diseases. e Neurological diseases. f Cardiovascular diseases.

Galanthamine, found in Galanthus nivalis and other sources, showed high specificity for myasthenia gravis and Alzheimer’s disease in the neurological diseases category (Fig. 4e)50. Similarly, high specificity scores were obtained for tigloidine, periplocymarin, cymarin, cymarol, strophanthidin, and tropine for neurological diseases such as Parkinson’s disease and neurodystonia51,52. The high specificity scores of vanillylamine and other capsaicin-related molecules, as well as the carotenoid capsorubin, for cluster headaches and diabetic neuropathy are also noteworthy (Fig. 4e). In the cardiovascular diseases category (Fig. 4f), most molecules were non-specific, with a few highly specific exceptions such as asarinin found in sesame, nitidine found in Zanthoxylum americanum, and periplocymarin found in Strophanthus hispidus53,54. Some molecules, such as trans-isoasarone found in Acorus calamus, show antifungal properties but are toxic, which makes them difficult to use for therapeutic purposes55.

To further assess the ability of the specificity score to discover new associations and validate known relationships, we conducted a systematic analysis of the top-100 inferred indication-phytochemical relationships ranked by specificity score. For each inference, we first checked the indication-chemical relationships in the Comparative Toxicogenomics Database (CTD)56, a reliable public database containing both curated and inferred relationships. If an inferred relationship was not found in CTD, we manually searched Google Scholar for supporting evidence in the literature. Of the top-100 inferences (see Fig. 5), 60 indication-phytochemical relationships could be validated through CTD or the literature. Among these 60, 20 are inferred in CTD from gene-chemical interactions and gene-disease associations but have not yet been experimentally proven, and the remaining 40 are confirmed by experimental literature. Thus, 20 of our top inferences are high-confidence new discoveries that CTD also predicts through independent means, 40 are correct predictions backed by experimental evidence, and the remaining 40 are new hypotheses that can be tested with molecular experiments. Indeed, a key use of our specificity score is to distill novel scientific hypotheses from traditional knowledge of herbs and spices.
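The bookkeeping behind the EV/IF/NH labels in Fig. 5 can be sketched as a priority rule: experimental evidence first, then CTD inference, otherwise a new hypothesis. The lookup sets are hypothetical inputs; in practice they would come from CTD exports and the manual literature search. The synthetic IDs below are chosen only to reproduce the 40/20/40 split reported above.

```python
from collections import Counter


def label_inferences(top_pairs, experimental, ctd_inferred):
    """Label each pair EV (experimentally verified), IF (inferred in CTD) or NH (new hypothesis)."""
    labels = {}
    for pair in top_pairs:
        if pair in experimental:
            labels[pair] = "EV"
        elif pair in ctd_inferred:
            labels[pair] = "IF"
        else:
            labels[pair] = "NH"
    return labels


# Synthetic integer IDs standing in for indication-phytochemical pairs
top_pairs = list(range(100))
experimental = set(range(40))
ctd_inferred = set(range(30, 60))  # overlaps EV; EV takes priority
counts = Counter(label_inferences(top_pairs, experimental, ctd_inferred).values())
print(counts["EV"], counts["IF"], counts["NH"])  # 40 20 40
```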

Fig. 5: Validation of indication-phytochemical relationships.

The string map represents the number and nature of relationships extracted from the top-100 indication-phytochemical relationships, categorized as Inferred (IF), Experimentally Verified (EV), and New Hypotheses (NH), and sorted based on specificity scores.

Understanding spice usage and their health implications

We use public Indian recipe corpora obtained from Sanjeev Kapoor57 and Tarla Dalal58 websites, comprising 18 regional cuisines, to understand spice usage patterns across India and their association with disease categories.

To understand the similarity between the regional cuisines of India, we calculated the usage frequency of each spice in each cuisine (see “Usage and authenticity of spices” section). The principal component analysis (PCA) bi-plot (Fig. 6a) of Indian cuisines and spice usage frequencies, with spices projected as loadings on the principal components (PCs), reveals a clear north-south geographical orientation. The plot highlights the important role of coconut and curry leaves in South Indian cuisines. The dendrogram (Fig. 6c) shows South Indian cuisines splitting into distinct regional cuisines, with Andhra and Kerala cuisines exhibiting higher similarity to each other. Gujarati and Jain cuisines, characterized by extensive use of asafoetida (Ferula assa-foetida) and the absence of onion and garlic, cluster together and resemble Maharashtrian and South Indian cuisines in their spice usage.
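A minimal sketch of the usage-frequency profile and of the cosine similarity used for the cluster map follows. It assumes usage frequency means the fraction of a cuisine's recipes that contain a spice, since the “Usage and authenticity of spices” section is not reproduced here. For brevity, cosine similarity is computed directly on frequency vectors; in the analysis it is computed on PC scores, which would first be obtained by running PCA (for example scikit-learn's `PCA`) on the cuisine-by-spice frequency matrix.

```python
import math


def usage_frequency(recipes, spices):
    """Fraction of a cuisine's recipes (sets of spices) that contain each spice."""
    return [sum(spice in recipe for recipe in recipes) / len(recipes) for spice in spices]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


spices = ["coconut", "curry leaves", "asafoetida"]
# Toy recipe corpora for three hypothetical cuisines
cuisine_a = [{"coconut", "curry leaves"}, {"coconut"}]
cuisine_b = [{"coconut", "curry leaves"}]
cuisine_c = [{"asafoetida"}]

fa, fb, fc = (usage_frequency(c, spices) for c in (cuisine_a, cuisine_b, cuisine_c))
print(fa)                        # [1.0, 0.5, 0.0]
print(round(cosine(fa, fb), 3))  # 0.949
print(cosine(fa, fc))            # 0.0
```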

Fig. 6: Regional cuisine analysis of spice usage and indication coverage.

a PCA bi-plot obtained from the frequency of spice usage in different cuisines, representing regions as scores and spices as loadings. PC1 and PC2 account for 37.86% and 24.79% of the total variance, respectively. b Heatmap showing the authenticity of spices in Indian cuisines. The darker regions indicate more frequent use of certain spice pairs. c A cluster map was obtained by calculating the cosine similarities between the PCs of different regions. Cuisines that are closer together in the dendrogram have more similar spice usage profiles. d Cluster heatmap showing the indication coverage for the different regional cuisines of India.

The cluster map (Fig. 6c) offers insight into possible historical relationships among Indian cuisines based on spice usage. The divergence in spice combinations across Indian cuisines may also be traced to early Vedic traditions and dietary norms59. For example, Brahmin communities often avoided onion and garlic, which are classified as tamasic foods, a norm also reflected in Jain, Gujarati, and some Maharashtrian cuisines, which embrace asafoetida (Ferula assa-foetida) as a substitute flavoring agent. This is reflected in their clustering in our spice usage analysis (Fig. 6c). Punjabi and Sindhi cuisines show a lineage to Kashmiri and Mughlai cuisines in their spice usage patterns. The similarity between Kashmiri and Mughlai cuisines can be attributed to their shared use of saffron (Crocus sativus), cardamom (Amomum subulatum), and clove (Syzygium aromaticum) (Fig. 6a). This lineage may reflect the spread of Mughal culinary practices across northern India during the 16th to 18th centuries. The use of saffron, cardamom, and clove in these cuisines mirrors the emphasis on aromatic richness in royal Mughal kitchens, as documented in historical manuscripts such as the Ni’matnama and Ain-i-Akbari59. Similarly, Hyderabadi and Parsi cuisines resemble each other through their pronounced use of garlic and onion. Goan, Hyderabadi, and Parsi cuisines also exhibit culinary patterns shaped by historical trade and colonization. The introduction of ingredients such as chili, tomato, and vinegar during the Portuguese colonial era influenced dishes such as vindaloo and xacuti, which later diffused into regional adaptations59. These historical layers have shaped ingredient availability, regional taste preferences, preparation styles, and the symbolic role of spices in culinary identity.
However, the similarities observed here are based on spice usage data alone, and further evidence is required to corroborate the cultural or historical aspects of these connections.

Regional variation in spice usage across Indian cuisines may also align with underlying genetic differences in taste perception. For instance, South Indian cuisines such as those of Kerala, Tamil Nadu, and Andhra Pradesh make extensive use of bitter-tasting ingredients like mustard seeds (Sinapis alba) and curry leaves (Murraya koenigii) (see Fig. 6a). This may be related to population-level variation in the CA6 gene, which affects bitter taste sensitivity through the rs2274333 polymorphism. The ancestral A allele is associated with higher gustatory sensitivity (supertasters), while the derived G allele is linked to reduced bitter perception (non-tasters). According to Prakriti et al.60, the A allele is more prevalent in western Indian populations, whereas northern and northeastern populations exhibit higher frequencies of the G allele. Although allele frequency data for southern India remain sparse, the strong presence of bitter ingredients in southern cuisines suggests a possible role for chemosensory adaptation. Similarly, the garlic- and onion-rich Hyderabadi and Parsi cuisines may reflect reduced sensitivity to pungent sulfur compounds mediated by TRPV1 polymorphisms. Mughlai, Kashmiri, and Punjabi cuisines, characterized by aromatic spices such as saffron, cardamom, and clove, may correlate with population-level variation in olfactory receptor genes such as OR7D4. A similar genetic influence is evident in cilantro preference: variants in the OR6A2 olfactory receptor gene (e.g., rs72921001) are associated with heightened perception of a soapy flavor in coriander leaf, contributing to population-level differences in its acceptance. These observations suggest a tentative link between chemosensory genotypes and regional spice practices. However, cuisine evolution is complex and is shaped by historical, ecological, and cultural factors beyond genetics61,62,63.

Figure 6b presents a heatmap of authentic spices, defined by their distinctive use in each regional Indian cuisine. Although spice usage overlaps substantially across cuisines, the analysis reveals several interesting observations, some well known and others less so. The prominence of asafoetida in Jain and Gujarati cuisines is well established: Jains and many Gujaratis exclude onion and garlic from their diet for religious reasons, and asafoetida contains pungent sulfur compounds similar to those in garlic and onion, making it an ideal substitute64. As noted earlier, curry leaves and coconut are integral to South Indian cuisines. Mughlai and Hyderabadi cuisines heavily feature cardamom and clove, while Kashmiri cuisine is uniquely characterized by saffron and fennel. A lesser-known finding is that peanut emerges as an authentic spice/herb in Maharashtrian cuisine. These findings highlight the diversity and complexity of Indian cuisines and the interplay of regional preferences, religious influences, and unique spices that define the authentic flavors of each culinary tradition. The three most frequently used spices and the three most authentic spices across Indian cuisines, used to generate the culinary mappings in Fig. 6a, b, are listed in Supplementary Table 1 in the Supplementary Information. Notably, chili (Capsicum annuum) is the most frequently used spice across all regional cuisines, reflecting its central role in Indian culinary practice.
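Authenticity can be sketched with the relative-prevalence definition used in earlier food-pairing studies: a spice's usage frequency in one cuisine minus its mean usage frequency in all other cuisines. That this matches the “Usage and authenticity of spices” section is an assumption, and the frequencies below are made up.

```python
def authenticity(freq_by_cuisine):
    """Relative prevalence of each spice: frequency in a cuisine minus its mean in the others.

    freq_by_cuisine: {cuisine: {spice: usage frequency}}; needs at least two cuisines.
    Spices missing from a cuisine are treated as frequency 0.
    """
    cuisines = list(freq_by_cuisine)
    spices = {s for freqs in freq_by_cuisine.values() for s in freqs}
    result = {}
    for c in cuisines:
        others = [freq_by_cuisine[o] for o in cuisines if o != c]
        result[c] = {
            s: freq_by_cuisine[c].get(s, 0.0) - sum(o.get(s, 0.0) for o in others) / len(others)
            for s in spices
        }
    return result


# Made-up frequencies for three hypothetical cuisines
freqs = {
    "cuisine_a": {"asafoetida": 0.6, "chili": 0.8},
    "cuisine_b": {"asafoetida": 0.1, "chili": 0.8},
    "cuisine_c": {"asafoetida": 0.1, "chili": 0.8, "saffron": 0.4},
}
auth = authenticity(freqs)
print(round(auth["cuisine_a"]["asafoetida"], 2))  # 0.5
print(round(auth["cuisine_c"]["saffron"], 2))     # 0.4
```

A spice used equally everywhere (chili in this toy example) gets an authenticity near zero even if it is very frequent, which is why the most frequent and the most authentic spices of a cuisine can differ.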

Figure 6d presents a heatmap with hierarchical clustering of Indian cuisines based on their indication coverage. Regional cuisines show the strongest coverage for five disease categories: cancer, respiratory diseases, general symptoms, gastrointestinal diseases, and infectious diseases. Hyderabadi, Goan, Parsi, Punjabi, and Mughlai cuisines cover the indication spectrum more broadly and strongly than the other cuisines. Hyderabadi and Goan cuisines have the highest scores for alleviating infectious diseases, followed closely by Parsi cuisine.
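As an illustration of what indication coverage can mean, the sketch below takes the fraction of a disease category's indications that are linked to at least one spice used in a cuisine. This definition is a hypothetical simplification; the heatmap in Fig. 6d may weight spices by usage or normalize differently.

```python
def coverage(cuisine_spices, spice_to_inds, category_inds):
    """Fraction of a category's indications linked to at least one spice in the cuisine."""
    covered = set()
    for spice in cuisine_spices:
        covered |= spice_to_inds.get(spice, set()) & category_inds
    return len(covered) / len(category_inds)


# Toy data: hypothetical spices and a four-indication category
category = {"ind_1", "ind_2", "ind_3", "ind_4"}
spice_to_inds = {"spice_a": {"ind_1", "ind_2"}, "spice_b": {"ind_2", "other"}}
print(coverage(["spice_a", "spice_b"], spice_to_inds, category))  # 0.5
```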

Each cuisine has a unique profile of herbs and spices, and some, such as Mughlai and Hyderabadi cuisines, combine spices with greater potential for disease mitigation. With increasing globalization, new fusion cuisines have emerged as ingredients from different cultures are blended into new recipes. Here, we study how well combinations of spices from culinary practice cover a spectrum of diseases, using a minimum set-cover algorithm to find the minimum set of spices required to cover the range of indications in each disease category. We then compare the disease coverage of recipes generated under four settings: the real setting (recipes from Tarla Dalal and Sanjeev Kapoor) and three random settings simulating culinary globalization, namely the uniform copy-mutate (U-CM) model, the frequency-conserved copy-mutate (FC-CM) model14,36, and the random uniform (RU) model (see “Random recipe generation” section). To ensure a fair comparison with the real recipes, each recipe in the random settings contains six spices, the median number of spices per recipe in the real recipe dataset. We generated 50 sets of 5636 recipes for each random model and used the mean size of the minimum spice sets for comparison. Figure 7 compares the size of the minimum set of spices needed to cover health indications in the random settings and in the real recipe datasets for 12 disease categories. For example, to cover gastrointestinal indications comprehensively, traditional recipes frequently include cumin, ginger, and fennel, spices known to aid digestion. In contrast, in randomly generated recipes (e.g., under the RU model), fewer spices, typically dominated by garlic and turmeric, achieve similar coverage because of their multi-functional medicinal properties.
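The three random settings can be sketched as below. The “Random recipe generation” section is not reproduced here, so the mechanics are assumptions: U-CM copies a random existing recipe and replaces one spice with a uniformly drawn spice; FC-CM does the same but draws the replacement with probability proportional to its observed usage frequency; RU draws each recipe's six spices uniformly at random without copying. The seed pool size, the single mutation per copy, and the function names are hypothetical.

```python
import random


def copy_mutate(spices, n_recipes, size=6, n_seed=5, freq=None, rng=None):
    """Copy-mutate recipe generation (sketch; requires len(spices) > size).

    freq=None -> U-CM-like (uniform replacement); otherwise FC-CM-like
    (replacement weighted by the usage frequencies in `freq`).
    """
    rng = rng or random.Random()
    pool = [rng.sample(spices, size) for _ in range(n_seed)]  # hypothetical seed recipes
    while len(pool) < n_recipes:
        recipe = list(rng.choice(pool))  # copy an existing recipe
        idx = rng.randrange(size)        # position to mutate
        candidates = [s for s in spices if s not in recipe]
        if freq is None:
            recipe[idx] = rng.choice(candidates)
        else:
            weights = [freq[s] for s in candidates]
            recipe[idx] = rng.choices(candidates, weights=weights, k=1)[0]
        pool.append(recipe)
    return pool[:n_recipes]


def random_uniform(spices, n_recipes, size=6, rng=None):
    """RU model: each recipe is `size` distinct spices drawn uniformly at random."""
    rng = rng or random.Random()
    return [rng.sample(spices, size) for _ in range(n_recipes)]


spices = [f"spice_{i}" for i in range(12)]
rng = random.Random(0)
u_cm = copy_mutate(spices, 100, rng=rng)
fc_cm = copy_mutate(spices, 100, freq={s: i + 1 for i, s in enumerate(spices)}, rng=rng)
ru = random_uniform(spices, 100, rng=rng)
```

Repeating a generator 50 times with `n_recipes=5636` would give the ensemble sizes described above. Because replacements are drawn only from spices not already in the recipe, every generated recipe keeps six distinct spices.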
Similarly, infectious diseases are typically covered by extensive spice combinations, such as turmeric, garlic, and black pepper in Mughlai and Hyderabadi cuisines, highlighting their comprehensive therapeutic applications within culinary traditions. The size of the minimum spice set under FC-CM is close to that of the original recipe datasets, which can be attributed to FC-CM conserving spice usage frequencies. For most disease categories, the U-CM and RU models need fewer spices to cover the spectrum of indications; in particular, their randomly generated recipes require fewer spices to cover infectious, gastrointestinal, and cancer indications than traditional Indian cuisines do. This efficiency may arise from spices such as garlic that address multiple health concerns and are used extensively in various Indian regional cuisines. However, actual spice usage in cuisines is also shaped by flavor, ingredient interactions, availability, and cultural factors, not just health benefits. A natural next step is to explore spice mixing that preserves or enhances the flavor of recipes65.

Fig. 7: Minimum spice set distribution across disease categories via different generative models.

Each plot shows the distribution of the minimum number of spices obtained with the minimum set-cover algorithm for the three random models (FC-CM, U-CM, and RU) in 12 disease categories. The red line marks the size of the minimum set of spices obtained for the recipe dataset.
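Minimum set cover is NP-hard, so it is typically either solved exactly with an integer-programming solver on small instances or approximated greedily. The sketch below uses the standard greedy approximation, which repeatedly picks the spice covering the most still-uncovered indications; whether the analysis used an exact or a greedy solver is not stated here, so treat this choice as an assumption. Applied to a generated recipe set, `spice_to_inds` would be restricted to the spices appearing in those recipes, and the resulting sizes averaged over the 50 sets.

```python
def greedy_set_cover(universe, spice_to_inds):
    """Greedy approximation to minimum set cover.

    Returns (chosen spices, indications that no spice can cover).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered and spice_to_inds:
        best = max(spice_to_inds, key=lambda s: len(spice_to_inds[s] & uncovered))
        gain = spice_to_inds[best] & uncovered
        if not gain:
            break  # remaining indications are not covered by any available spice
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Toy category with five indications and hypothetical spices
spice_to_inds = {
    "spice_a": {1, 2, 3},
    "spice_b": {3, 4},
    "spice_c": {4, 5},
    "spice_d": {5},
}
print(greedy_set_cover({1, 2, 3, 4, 5}, spice_to_inds))  # (['spice_a', 'spice_c'], set())
```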
