ChatGPT as a Semantic Engineering Assistant: Lessons from Ontology Design in the Agricultural Biodiversity Domain
URL: http://creativecommons.org/licenses/by/
Further Information
Modeling species names in biodiversity ontologies is particularly difficult in multilingual contexts, where semantic conflation often occurs. A good example is the common name "pimenta". In Brazilian Portuguese, experts usually use it to refer to Capsicum spp. (chili peppers), while its direct English translation, "pepper", often denotes Piper nigrum (black pepper) (Soares et al. 2025a). In Brazilian markets, however, Piper nigrum is more accurately associated with "pimenta-do-reino" ("pimenta-negra"). The issue can be observed on Wikipedia: when translating the Portuguese page for "pimenta" into English, the entry switches from Capsicum spp. to black pepper (Piper nigrum), showing how easily semantic drift can appear in multilingual data modeling. Correctly associating the common names used in agricultural markets with species would help avoid misunderstandings caused by these cultural differences. A second challenge in vocabulary management then emerges: how to keep species names in ontologies up to date as the taxonomy itself changes. Some controlled agricultural vocabularies, such as Agrotermos (Telles et al. 2024), lack automated mechanisms for updating taxonomic classifications. For example, Prochilodus cearensis, Prochilodus scrofa, and Prochilodus margravii are all listed in Agrotermos as preferred terms (the authorized, standard terms selected to represent a concept in a controlled vocabulary), whereas according to the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy (GBIF Secretariat 2023) these names are synonyms, as shown in Table 1.
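The kind of check described above, flagging preferred terms that GBIF treats as synonyms, can be sketched in Python. This is a minimal illustration, not the authors' tooling. It assumes the public GBIF species match endpoint (`/v1/species/match`) and its `matchType` and `status` response fields; the helper names `fetch_match`, `classify`, and `flag_outdated_preferred_terms` are hypothetical, and `SAMPLE_RECORDS` holds hand-written stand-ins rather than real API output.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF name-matching endpoint (assumed here; check the GBIF API docs).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def fetch_match(name: str) -> dict:
    """Query GBIF for one scientific name. Requires network access."""
    url = GBIF_MATCH_URL + "?" + urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def classify(record: dict) -> str:
    """Map one match record to 'accepted', 'synonym', or 'unresolved'."""
    if record.get("matchType", "NONE") == "NONE":
        return "unresolved"  # GBIF found no usable match
    status = record.get("status", "")
    if status == "ACCEPTED":
        return "accepted"
    if "SYNONYM" in status:  # also covers e.g. HETEROTYPIC_SYNONYM
        return "synonym"
    return "unresolved"  # anything else (e.g. DOUBTFUL) goes to expert review


def flag_outdated_preferred_terms(records: dict) -> list:
    """Return the names whose GBIF record is a synonym, sorted by name."""
    return sorted(n for n, rec in records.items() if classify(rec) == "synonym")


# Hand-written stand-ins for API responses (NOT real GBIF output).
SAMPLE_RECORDS = {
    "Prochilodus scrofa": {"matchType": "EXACT", "status": "SYNONYM"},
    "Euterpe edulis": {"matchType": "EXACT", "status": "ACCEPTED"},
}

if __name__ == "__main__":
    print(flag_outdated_preferred_terms(SAMPLE_RECORDS))
```

In practice, `SAMPLE_RECORDS` would be replaced by `{name: fetch_match(name) for name in preferred_terms}`, and every flagged name would still be reviewed by a domain expert before the vocabulary is changed.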
When developing the Agricultural Product Types Ontology (APTO), which was designed to represent products traded in Brazilian agricultural markets based on Agrotermos and AGROVOC, we proposed two approaches using generative AI, specifically OpenAI's ChatGPT-4, as a semantic engineering assistant to automate the inclusion of scientific names in the ontology: (1) prompt-based queries with a plugin accessing the GBIF API, and (2) a ChatGPT-generated Python script that converted GBIF taxonomy data into Web Ontology Language (OWL) format. These AI-supported methods automated the construction of APTO's "Organism" module, integrating taxonomic hierarchies and managing synonyms. ChatGPT effectively identified synonymy (e.g., see Table 1) and reduced manual labor in ontology development. The first approach is no longer reproducible, since OpenAI has replaced plugins with GPTs. We are therefore developing a GPT named Taxonomy OWLizer 2.0, an evolution of the first approach described above. Concerns about scalability, reproducibility, and hallucinations (false, made-up information) remain, highlighting the need for expert oversight throughout the process. When ChatGPT was used without API access, hallucinations appeared more frequently. For instance, when asked to check a list of plant species names for typos, it incorrectly suggested that Euterpe edulis was a synonym of Euterpe oleracea, even though both are recognized as distinct species in widely used catalogues such as the GBIF Backbone Taxonomy (Soares et al. 2025a). This case study demonstrates that generative AI can support, but not yet replace, human-led ontology development. It also emphasizes AI's potential contribution to biodiversity informatics, particularly for managing evolving and multilingual vocabularies.
All tools and source code related to our work are archived on Zenodo (Soares et al. 2025b). Detailed protocols are provided in Soares et al. (2025a).