Title:
Proyecto de código abierto: desarrollo de una herramienta para la detección de contenido generado por IA en textos ; Open Source Project: Development of a tool for detecting AI-generated content in texts
Contributors:
Cubillos Delgado, Alfonso
Publication Year:
2024
Document Type:
other/unknown material
File Description:
application/pdf
Language:
Spanish; Castilian
Relation:
https://hdl.handle.net/20.500.12495/14069
Rights:
Attribution 4.0 International ; http://creativecommons.org/licenses/by/4.0/
Accession Number:
edsbas.6C4A4F9B
Database:
BASE

Abstract:
The development of the tool proceeds in stages. It begins with data preparation, in which different data sources are identified for training the model. Next, a programming language is chosen that suits the project and is easy to maintain; Python 3 was selected. With the language fixed, the next step is to choose the model, together with its libraries, that best fits the task and can evolve with future revisions; GPT-2 was chosen as the base model. The code was then designed and configured around GPT-2. Once the code was designed, the implementation and training phase began, during which the model's parameters were adjusted to improve the precision of its decisions. The final phase was an interactive interface that lets users interact with and test the tool, first on a local instance and later on a web host. Besides analyzing and classifying the content of uploaded files, the tool can export a CSV file with the results of the analyses. Finally, the model is evaluated with performance metrics to analyze its precision and accuracy in different situations, and the same analyses are compared against detectors already released to production, such as GPTZero, ZeroGPT and Copyleaks.
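The abstract does not say how GPT-2's output is turned into a classification. One common heuristic, assumed here and not confirmed by the source, is to compute a text's perplexity under GPT-2 and flag low-perplexity text as likely AI-generated. The minimal Python sketch below assumes the per-token natural-log probabilities have already been obtained from GPT-2 (for example through the Hugging Face transformers library); the threshold value, the function names `perplexity`, `classify` and `export_csv`, and the CSV column names are all hypothetical.

```python
import csv
import io
import math
from typing import Iterable, List, TextIO, Tuple

# Hypothetical decision threshold; a real tool would tune it on labeled data.
PERPLEXITY_THRESHOLD = 30.0


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify(token_logprobs: List[float],
             threshold: float = PERPLEXITY_THRESHOLD) -> Tuple[float, str]:
    """Assumed heuristic: text GPT-2 finds very predictable (low perplexity) is labeled 'ai'."""
    ppl = perplexity(token_logprobs)
    return ppl, ("ai" if ppl < threshold else "human")


def export_csv(results: Iterable[Tuple[str, float, str]], out: TextIO) -> None:
    """Write one row per analyzed file: file name, perplexity, predicted label."""
    writer = csv.writer(out)
    writer.writerow(["file", "perplexity", "label"])
    for name, ppl, label in results:
        writer.writerow([name, f"{ppl:.2f}", label])


if __name__ == "__main__":
    # Made-up log-probabilities standing in for GPT-2 output on two uploaded files.
    docs = {
        "essay_a.txt": [-1.2, -0.8, -1.5, -0.9],
        "essay_b.txt": [-4.1, -3.7, -5.2, -4.4],
    }
    rows = [(name, *classify(lps)) for name, lps in docs.items()]
    buf = io.StringIO()
    export_csv(rows, buf)
    print(buf.getvalue())
```

Writing to any text stream (here an in-memory `io.StringIO`) keeps the export function testable; a web front end could pass an open file or an HTTP response body instead.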
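The evaluation step measures precision and accuracy, but the record does not list the exact metrics or their definitions. A minimal sketch, assuming binary labels `"ai"` and `"human"` with `"ai"` as the positive class, computes the standard confusion-matrix metrics; the `evaluate` function and `Metrics` class are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "ai") -> Metrics:
    """Binary-classification metrics, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # AI text flagged as AI
    fp = sum(t != positive and p == positive for t, p in pairs)  # human text flagged as AI
    fn = sum(t == positive and p != positive for t, p in pairs)  # AI text missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for six test texts.
    y_true = ["ai", "ai", "ai", "human", "human", "human"]
    y_pred = ["ai", "ai", "human", "ai", "human", "human"]
    print(evaluate(y_true, y_pred))
```

To make the comparison the abstract describes, the same function could be applied to the labels that GPTZero, ZeroGPT or Copyleaks assign to an identical test set, so that every detector is scored with the same definitions.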