Treffer: A Conversational Large-Language-Model Tutor that Accelerates Machine-Learning Method Development in Routine Bioanalytical Workflows.

Title:
A Conversational Large-Language-Model Tutor that Accelerates Machine-Learning Method Development in Routine Bioanalytical Workflows.
Authors:
Le ATH; Department of Chemistry and Centre for Research on Biomolecular Interactions, York University, 4700 Keele Street, Toronto, M3J 1P3, Ontario, Canada., Shvekher T; Department of Chemistry and Centre for Research on Biomolecular Interactions, York University, 4700 Keele Street, Toronto, M3J 1P3, Ontario, Canada., Nguyen L; Department of Chemistry and Centre for Research on Biomolecular Interactions, York University, 4700 Keele Street, Toronto, M3J 1P3, Ontario, Canada., Krylov SN; Department of Chemistry and Centre for Research on Biomolecular Interactions, York University, 4700 Keele Street, Toronto, M3J 1P3, Ontario, Canada.
Source:
Chembiochem : a European journal of chemical biology [Chembiochem] 2025 Nov 08; Vol. 26 (21), pp. e202500678. Date of Electronic Publication: 2025 Sep 29.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Wiley-VCH Verlag Country of Publication: Germany NLM ID: 100937360 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1439-7633 (Electronic) Linking ISSN: 14394227 NLM ISO Abbreviation: Chembiochem Subsets: MEDLINE
Imprint Name(s):
Original Publication: Weinheim, Germany : Wiley-VCH Verlag, c2000-
References:
Patterns (N Y). 2025 May 08;6(6):101260. (PMID: 40575123)
JMIR Med Inform. 2024 Apr 8;12:e55318. (PMID: 38587879)
J Chem Inf Model. 2021 Jul 26;61(7):3197-3212. (PMID: 34264069)
Chem Sci. 2023 Nov 22;14(48):14003-14019. (PMID: 38098730)
Front Genet. 2022 Jan 27;13:824451. (PMID: 35154283)
Anal Chem. 2025 Apr 08;97(13):7352-7358. (PMID: 40146944)
Small Methods. 2024 Dec;8(12):e2400305. (PMID: 38682615)
Biotechnol Adv. 2021 Jul-Aug;49:107739. (PMID: 33794304)
Int J Surg. 2025 Jan 01;111(1):1669-1670. (PMID: 39041947)
Nat Commun. 2025 Apr 02;16(1):3165. (PMID: 40175414)
J Hum Genet. 2024 Oct;69(10):487-497. (PMID: 38424184)
Nat Commun. 2019 Dec 20;10(1):5811. (PMID: 31862874)
Bioengineering (Basel). 2023 Jul 27;10(8):. (PMID: 37627775)
BMJ. 2021 Oct 20;375:n2281. (PMID: 34670780)
Analyst. 2011 Apr 21;136(8):1703-12. (PMID: 21350755)
Chembiochem. 2025 Nov 8;26(21):e202500678. (PMID: 41021828)
NPJ Digit Med. 2024 Feb 20;7(1):41. (PMID: 38378899)
Inf Fusion. 2019 Oct;50:71-91. (PMID: 30467459)
Anal Chem. 2025 Mar 4;97(8):4461-4472. (PMID: 39972614)
Commun Med (Lond). 2022 Jul 6;2:78. (PMID: 35814295)
Entropy (Basel). 2023 Jun 01;25(6):. (PMID: 37372232)
Genome Biol. 2022 Mar 25;23(1):83. (PMID: 35337374)
Grant Information:
Grant RGPIN-2022-04563 Natural Sciences and Engineering Research Council of Canada; S.N.K and York University
Contributed Indexing:
Keywords: generative AI in biochemical science; machine learning education tool; machine learning in biochemical science; machine learning model design; prompt engineering
Entry Date(s):
Date Created: 20250929 Date Completed: 20251112 Latest Revision: 20251112
Update Code:
20251113
PubMed Central ID:
PMC12596919
DOI:
10.1002/cbic.202500678
PMID:
41021828
Database:
MEDLINE

Weitere Informationen

As machine learning (ML) becomes increasingly relevant in experimental chemistry, many scientists face barriers to adoption due to limited training in ML. While AutoML platforms offer powerful capabilities, they lack the instructional scaffolding needed by users without an ML background. To address this gap, a lightweight, conversational assistant is presented that guides users through ML workflow design using plain-language dialog. Powered by OpenAI's GPT-4o and deployed via a Gradio interface, the assistant operates under a structured system prompt that simulates pedagogical reasoning. It behaves like a domain-specific tutor: helping users define ML goals, assess data structure, select models, evaluate metrics, and generate annotated Python code. A complete documentation of the development process is provided, allowing researchers to adapt the system for other domains. Herein, its utility is demonstrated in two representative case studies: 1) image classification of lateral flow immunoassay test strips for diagnostic readout; and 2) regression-based prediction of liquid chromatography-mass spectrometry retention times from molecular descriptors for small molecules. In both cases, lab members with no ML experience successfully developed working models guided solely by the assistant. By lowering the barrier to ML adoption in data-rich analytical workflows, this system offers a customizable workflow for building domain-specific assistants across experimental science.
(© 2025 The Author(s). ChemBioChem published by Wiley‐VCH GmbH.)