IngID: a framework for parsing and systematic reporting of ingredients used in commercially packaged foods

2021 
Abstract There is a lack of information in the scientific literature on the types of ingredients used in packaged foods. USDA’s Global Branded Food Products Database makes publicly available a compiled dataset of ingredient lists for over a quarter million commercial food products. This paper reports on the development of a framework for parsing and systematic reporting of ingredients used in commercially packaged foods (IngID) in the US and delineates the complexity and challenges of current ingredient lists, using baked products as an illustration. The major steps in developing the IngID prototype were 1) identifying top-selling baked products, 2) obtaining their ingredient lists, 3) parsing individual ingredients after several pre-processing steps, which were needed because ingredient lists were inconsistent and varied, 4) building a thesaurus by assigning a preferred descriptor to equivalent terms such as synonyms and misspellings, and 5) assigning broader terms such as flour and sweeteners. The current version of IngID includes 3 main files: an input Food details file, an output file of parsed text strings, and a thesaurus of ∼6500 parsed ingredients. IngID can help improve our understanding of commercial ingredients, characterize foods in dimensions other than the traditional nutrient profiles, and support the development of food ontologies, computer programs, and artificial intelligence tools.
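
Steps 3 to 5 above (pre-processing, parsing, thesaurus lookup, and assignment of broader terms) can be illustrated with a minimal Python sketch. It is not the IngID implementation: the function names, the cleaning rules, and the four-entry thesaurus are hypothetical and chosen only for illustration, and sub-ingredients in parentheses are simply dropped here, which the actual framework may handle differently.

```python
import re

# Hypothetical thesaurus (not taken from IngID):
# parsed term -> (preferred descriptor, broader term).
THESAURUS = {
    "enriched flour": ("enriched flour", "flour"),
    "sugar": ("sugar", "sweeteners"),
    "suger": ("sugar", "sweeteners"),  # spelling error mapped to the preferred term
    "high fructose corn syrup": ("high fructose corn syrup", "sweeteners"),
}


def preprocess(raw):
    """Lowercase, drop a leading 'ingredients:' label, and tidy whitespace."""
    text = raw.strip().lower()
    text = re.sub(r"^ingredients\s*:\s*", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.rstrip(". ")


def split_top_level(text):
    """Split on commas that are not nested inside brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_ingredients(raw):
    """Return (parsed term, preferred descriptor, broader term) tuples."""
    rows = []
    for part in split_top_level(preprocess(raw)):
        # Simplification: keep only the text before any bracketed sub-list.
        term = re.sub(r"\s*[(\[{].*$", "", part).strip()
        # Unknown terms keep their own text and get no broader term.
        preferred, broader = THESAURUS.get(term, (term, None))
        rows.append((term, preferred, broader))
    return rows


if __name__ == "__main__":
    label = ("INGREDIENTS: Enriched Flour (Wheat Flour, Niacin), Sugar,  "
             "Suger, High Fructose Corn Syrup.")
    for row in parse_ingredients(label):
        print(row)
```

Tracking bracket depth while splitting keeps a compound entry such as "enriched flour (wheat flour, niacin)" together as one ingredient; a plain split on commas would break it into fragments.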