AGDES: a Python package and an approach to generating synthetic data for differential equation solving with LLMs.
The rapid development of large language models (LLMs), including their successful application to mathematical problems that require complex reasoning, opens a potential avenue for using LLMs to solve differential equations. While these equations are currently solved successfully both numerically and symbolically, it is possible that fine-tuned LLMs, if they treat equation solving as a text-to-text translation problem, could find analytical solutions for a broader range of equations, including those that currently have no known solution. However, high-quality fine-tuning of LLMs requires datasets of differential equation-solution pairs that are considerably larger than the collections published in reference books and textbooks. Consequently, datasets of synthetic data must be generated. This paper introduces AGDES, a ready-to-use open-source Python package for constructing large datasets that contain differential equations and their solutions in LaTeX format. The package generates linear differential equations of the second and third order, linear inhomogeneous equations, polynomial equations, and separable equations (equations with separable variables), together with their corresponding analytical solutions. To generate equations of the second and third order and their solutions, we use the known theoretical relations between the equation coefficients and the solution, and we vary the values of the coefficients to obtain synthetic data. The generation of polynomial equations and their solutions is based on the basic rules of differentiation and on variation of the coefficients. Finally, the generation of inhomogeneous equations and separable equations and their solutions relies on the Python SymPy library and on varying the elementary functions on the right-hand side of the equations.
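The coefficient-solution relation mentioned above for second-order equations can be illustrated with a minimal sketch. It is not the AGDES implementation or its API: the function names `second_order_pair` and `generate_pairs`, the integer root range, and the restriction to distinct real roots are all assumptions made for illustration. The idea is to pick the roots r1 and r2 of the characteristic polynomial first, derive the coefficients from them (p = -(r1 + r2), q = r1*r2), and then write down the known general solution, so no solver call is needed.

```python
import random

import sympy as sp

x = sp.Symbol("x")
y = sp.Function("y")


def second_order_pair(r1, r2):
    """Build y'' + p*y' + q*y = 0 and its general solution from two
    distinct real characteristic roots (hypothetical helper, not AGDES API)."""
    # (r - r1)(r - r2) = r**2 - (r1 + r2)*r + r1*r2
    p = -(r1 + r2)
    q = r1 * r2
    C1, C2 = sp.symbols("C1 C2")
    equation = sp.Eq(y(x).diff(x, 2) + p * y(x).diff(x) + q * y(x), 0)
    solution = sp.Eq(y(x), C1 * sp.exp(r1 * x) + C2 * sp.exp(r2 * x))
    return equation, solution


def generate_pairs(n, seed=0, low=-5, high=5):
    """Return n (equation, solution) pairs as LaTeX strings.
    The root range [low, high] is an assumed, illustrative choice."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        r1, r2 = rng.randint(low, high), rng.randint(low, high)
        if r1 == r2:
            continue  # repeated roots need a different solution form
        eq, sol = second_order_pair(r1, r2)
        pairs.append((sp.latex(eq), sp.latex(sol)))
    return pairs


if __name__ == "__main__":
    for eq_tex, sol_tex in generate_pairs(3):
        print(eq_tex, "|", sol_tex)
```

Each generated pair can be checked independently with `sp.checkodesol(equation, solution)`, which substitutes the solution back into the equation. Repeated and complex roots would need their own solution templates and are left out of this sketch.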
The novel aspects of the proposed tool are its speed, its flexibility in parameter tuning, and its capacity to handle a diverse range of equations. Using AGDES, we generated a dataset of 53,516 equation-solution pairs in LaTeX format, which can be used directly for fine-tuning LLMs to solve differential equations. [ABSTRACT FROM AUTHOR]