Result: NLP and Graph-Theory-based Coding for Text (Corpus) Analysis. A Comparative Poetry and Philosophy Implementation Case Study
boreal:249800
urn:ISSN:1938-4122
urn:EISSN:1938-4122
1288278690
From OAIster®, provided by the OCLC Cooperative.
Further information
This case study comparatively analyzes a Python script, written by the first author and adapted by the second for her own tasks, and its extra-functional relevance to two different research projects (extra-functional in the sense foregrounded in the CFP). The code combines natural language processing (NLP) and graph theory applications to represent textual corpora as networks and to analyze those networks for features (mainly centralities) that reflect on the corpora and on the relationships between their texts or between several corpora. The projects are the Graph Poem (UCLouvain and UOttawa) and The Normalization of Philosophy (U of Groningen). The code is used for analyzing poetry corpora in either English or French, and early modern European philosophical corpora in English and/or French and/or Latin, respectively. Yet while the poetry project explores English or French corpora that are compared cross-lingually only when the former consist of translations of the poems in the latter, the philosophy project deals with multilingual corpora on a regular basis, since there are either authors who wrote in two or three languages, or regions, communities, and institutions in early modern Europe with significant bilingual or trilingual output. The comparative analysis therefore comprises two main categories of aspects: genre-related and programming-language-related ones. For the former, the discussion briefly reviews the relevant facets of a couple of the NLP tools developed by the first author and his team for automated poetry analysis (mainly the diction and the rhyme classifiers), as well as the work done by the second author to adapt the code for her philosophy corpora (processing obsolete language or spellings, processing and analyzing multilingualism, etc.). In terms of the programming language used (Python), the study features a brief critique of the coding libraries and resources the script taps into, such as NLTK, a GitHub repo for bilingual NLP (which the second author repurposed for generally