Result: Fast Stylometry

Title:
Fast Stylometry
Contributors:
Wood, Thomas
Publisher Information:
Zenodo
Publication Year:
2024
Collection:
Zenodo
Document Type:
Electronic resource (software)
Language:
English
DOI:
10.5281/zenodo.11096941
Rights:
Accession Number:
edsbas.C76A7750
Database:
BASE

Further Information

Fast Stylometry is a Python library for calculating Burrows' Delta, an algorithm for measuring how similar the writing styles of documents are; this kind of analysis is known as forensic stylometry. The library can also calculate the probability that two books were written by the same author. I wrote this library to improve my understanding, and also because the existing libraries I could find focused on generating graphs but did not go as far as calculating probabilities.

Burrows' Delta algorithm

Burrows' Delta is a statistic that expresses the distance between two authors' writing styles. A high value such as 3 implies that the two authors are very dissimilar, whereas a low value such as 0.2 implies that two books are very likely to be by the same author.

[Figure: Explanation of the maths and thinking behind Burrows' Delta and how it works.]

Burrows' Delta is calculated by comparing the relative frequencies of function words such as “inside”, “and”, etc., in the two texts, taking into account their natural variation between authors.
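The calculation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general Burrows' Delta idea, not the Fast Stylometry API: the function names `relative_frequencies` and `burrows_delta` and the `vocab_size` parameter are hypothetical. It also assumes that the most frequent words in the corpus stand in for function words, that the "natural variation" of each word is measured by its population standard deviation across the corpus, and that words with zero variation are skipped.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(text, vocabulary):
    """Return the relative frequency of each vocabulary word in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]


def burrows_delta(corpus, name_a, name_b, vocab_size=50):
    """Mean absolute difference of z-scored word frequencies of two texts.

    `corpus` maps a text name to its full text. Hypothetical helper,
    not part of any library.
    """
    # Assumption: the most frequent words overall approximate function words.
    all_counts = Counter()
    for text in corpus.values():
        all_counts.update(re.findall(r"[a-z']+", text.lower()))
    vocabulary = [word for word, _ in all_counts.most_common(vocab_size)]

    freqs = {
        name: relative_frequencies(text, vocabulary)
        for name, text in corpus.items()
    }

    differences = []
    for i in range(len(vocabulary)):
        column = [f[i] for f in freqs.values()]
        sigma = pstdev(column)  # variation of this word across the corpus
        if sigma == 0:
            continue  # the word does not discriminate between texts
        mu = mean(column)
        z_a = (freqs[name_a][i] - mu) / sigma
        z_b = (freqs[name_b][i] - mu) / sigma
        differences.append(abs(z_a - z_b))
    return mean(differences) if differences else 0.0


if __name__ == "__main__":
    toy_corpus = {
        "a": "the cat and the dog and the bird",
        "b": "a cat in a hat in a house of cards",
        "c": "the cat and the dog and the bird",
    }
    print(burrows_delta(toy_corpus, "a", "b"))
    print(burrows_delta(toy_corpus, "a", "c"))
```

Dividing by each word's standard deviation keeps very common words from dominating the distance, which is one way to read "taking into account their natural variation between authors". A real analysis would need far longer texts and a larger reference corpus than this toy example.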