Stylistics analysis and authorship attribution algorithms based on self-organizing maps : Advances in Self-Organizing Maps
Institute for Molecular Medicine Finland, Tukholmankatu 5, 00270 Helsinki, Finland
Faculty of Telematics, Universidad de Colima, Mexico
CINVESTAV IDS, México D.F., Mexico
Postgraduate Program in Complex Systems, Universidad Autónoma de la Ciudad de México, Mexico
Faculty of Literary Creation, Universidad Autónoma de la Ciudad de México, Mexico
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Further Information
The style followed by an author can be thought of as a collection of attributes that defines a stylistics space. Texts by the same author tend to lie close together in that space. However, identifying stylistics spaces has proven challenging. Closely tied to the stylistics space is the authorship attribution task, in which a text of unknown authorship is presented to a system and the system is expected to identify its author. An authorship attribution algorithm consists of two modules: the stylistics space and a classifier. We present a methodology that includes both: a module that allows the identification of novel stylistics spaces, and a classifier that addresses the authorship attribution task using the features that define the space. The methodology combines feature selection, anomaly detection, classification, and visualization algorithms. We applied self-organizing maps not only for visualization but also for anomaly detection, which forms the basis of the classifier. We compared our authorship attribution algorithm with two existing ones. Our methodology achieved similar or better results under bag-of-words-related stylistics spaces, and it presented the lowest error under a novel stylistics space based on the rate of introduction of new words.
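As an illustration of what a stylistics space based on the rate of introduction of new words might look like, the following is a minimal sketch only. The record does not give the authors' exact definition, so the function name `new_word_rate`, the tokenization rule, and the fixed `window` size are all assumptions made for this example.

```python
import re


def new_word_rate(text, window=50):
    """Return, for each consecutive window of tokens, the fraction of
    tokens that have not appeared earlier in the text.

    Hypothetical feature definition: the windowing scheme and the
    tokenizer are assumptions, not taken from the source.
    """
    # Lowercase and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    rates = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        new = 0
        for tok in chunk:
            if tok not in seen:
                seen.add(tok)
                new += 1
        rates.append(new / len(chunk))
    return rates
```

Under these assumptions, each text maps to a sequence of rates that could serve as a feature vector, for instance as input to a self-organizing map.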