Title:
TEI Analytics: converting documents into a TEI format for cross-collection text analysis
Source:
Digital Humanities 2008, University of Oulu, Finland, June 25-29; Literary and Linguistic Computing, 24(2):187-192
Publisher Information:
Oxford: Oxford University Press, 2009.
Publication Year:
2009
Physical Description:
print, 1/4 p
Original Material:
INIST-CNRS
Document Type:
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, United States
ISSN:
0268-1145
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Notes:
Sciences of information and communication. Documentation
FRANCIS
Accession Number:
edscal.21801389
Database:
PASCAL Archive

Abstract:
For large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup is a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics or TEI-A (a TEI subset designed to facilitate interoperability of text archives), and a command-line tool, Abbot, that performs the conversion. Abbot relies on schema harvesting, a new technique developed by the author, to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More broadly, the author hopes that these techniques will lead to greater interoperability of text documents for text analysis in a wider context.