Result: Comparative Analysis of AI Models for Python Code Generation: A HumanEval Benchmark Study.

Title:

Comparative Analysis of AI Models for Python Code Generation: A HumanEval Benchmark Study.

Authors:

Bayram, Ali; Menekse Dalveren, Gonca Gokce; Derawi, Mohammad

Source:

Applied Sciences (2076-3417); Sep 2025, Vol. 15, Issue 18, p9907, 17p

Subject Terms:

ARTIFICIAL intelligence, PYTHON programming language, COMPARATIVE studies, EVALUATION methodology, BENCHMARK problems (Computer science), GENERATIVE pre-trained transformers, MACHINE learning, COMPUTER software development

Database:

Complementary Index

Further Information

This study conducts a comprehensive comparative analysis of six contemporary artificial intelligence models for Python code generation using the HumanEval benchmark. The evaluated models include GPT-3.5 Turbo, GPT-4 Omni, Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. A total of 164 Python programming problems were utilized to assess model performance through a multi-faceted methodology incorporating automated functional correctness evaluation via the Pass@1 metric, cyclomatic complexity analysis, maintainability index calculations, and lines-of-code assessment. The results indicate that Claude Sonnet 4 achieved the highest performance with a success rate of 95.1%, followed closely by Claude Opus 4 at 94.5%. Across all metrics, models developed by Anthropic Claude consistently outperformed those developed by OpenAI GPT by margins exceeding 20%. Statistical analysis further confirmed the existence of significant differences between the model families (p < 0.001). Anthropic Claude models were observed to generate more sophisticated and maintainable solutions with superior syntactic accuracy. In contrast, OpenAI GPT models tended to adopt simpler strategies but exhibited notable limitations in terms of reliability. These findings offer evidence-based insights to guide the selection of AI-powered coding assistants in professional software development contexts. [ABSTRACT FROM AUTHOR]
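The arithmetic behind the reported Pass@1 rates can be illustrated with a short Python sketch. It is not taken from the article. It assumes one generated solution per problem, in which case the unbiased pass@k estimator commonly used with HumanEval reduces to the plain fraction of problems solved. The solved counts it derives (156 and 155 of 164) are inferred here from the rounded percentages in the abstract and are not stated in the record. The function names are hypothetical.

```python
from math import comb

TOTAL_PROBLEMS = 164  # number of HumanEval problems, as stated in the abstract


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[bool]) -> float:
    """Mean pass@1 over problems, ASSUMING one sample per problem."""
    return sum(pass_at_k(1, int(ok), 1) for ok in results) / len(results)


def solved_counts(rate_percent: float, total: int = TOTAL_PROBLEMS) -> list[int]:
    """Solved counts consistent with a rate rounded to one decimal place."""
    return [c for c in range(total + 1)
            if abs(100 * c / total - rate_percent) < 0.05]


if __name__ == "__main__":
    # Inferred (not reported) solved counts for the two best models.
    for name, rate in [("Claude Sonnet 4", 95.1), ("Claude Opus 4", 94.5)]:
        print(name, rate, solved_counts(rate))
```

Under these assumptions, the 0.6-point gap between the two leading models corresponds to a single problem out of 164, which is worth keeping in mind when reading "followed closely".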

Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
