Result: Detecting machine-generated texts with adaptive quantile regression

Title:

Detecting machine-generated texts with adaptive quantile regression

Source:

МОДЕЛИРОВАНИЕ, ОПТИМИЗАЦИЯ И ИНФОРМАЦИОННЫЕ ТЕХНОЛОГИИ. 12

Publisher Information:

Voronezh Institute of High Technologies, 2024.

Publication Year:

2024

Subject Terms:

gradient descent, text classification, quantile regression, adaptive algorithm, numerical methods, mathematical modeling

Document Type:

Academic journal Article

Language:

Russian

ISSN:

2310-6018

DOI:

10.26102/2310-6018/2024.44.1.033

Accession Number:

edsair.doi...........fc3fc00874123a13d179ab37d4f4885c

Database:

OpenAIRE

Further Information

This paper considers the problem of detecting machine-generated texts using several regression modeling tools: classical linear regression, logistic regression, and quantile regression. Advances in machine learning make it possible to generate increasingly realistic texts, which opens the door to misuse. As text generation algorithms become more sophisticated, detecting such texts becomes harder, which in turn calls for more sophisticated mathematical modeling methods and more efficient numerical methods. The adaptive quantile regression algorithm under consideration builds models with emphasis on different quantiles, which makes it particularly useful for detecting atypical values that may indicate the artificial nature of a text. The paper also gives a detailed description of the open dataset used for model training, which consists of texts generated with the ChatGPT 3 model and random human-written texts from various forums, and analyzes the computational experiments performed. The results show that the proposed method is highly effective in this application domain.
