Result: POLIcy design ANNotAtions (POLIANNA): A dataset for analysing legal texts

Title:
POLIcy design ANNotAtions (POLIANNA): A dataset for analysing legal texts
Publication Year:
2023
Document Type:
dataset
Language:
English
Rights:
undefined
Accession Number:
edsbas.98604DA4
Database:
BASE

Further Information

The POLIANNA dataset is a collection of legislative texts from the European Union (EU) that have been annotated based on theoretical concepts of policy design. The dataset includes 412 annotated articles, drawn from 18 EU climate change mitigation and renewable energy laws, and can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset includes a novel coding scheme for annotating text spans, a description of the annotated corpus, an analysis of inter-annotator agreement, and a discussion of potential applications. The ultimate goal is to use this dataset to build tools that assist with manual coding of policy texts by automatically identifying relevant paragraphs.

Detailed instructions and further guidance about the dataset, as well as all the code used for this project, can be found on the GitHub project page. The repository also contains code to calculate various inter-annotator agreement measures, which can also be used to process text annotations generated by INCEpTION.

Dataset Description
We provide the dataset in three formats:
JSON: Each article corresponds to a folder, where the Tokens and Spans are stored in a separate JSON file. Each article folder also contains the raw policy text as a text file and the metadata about the policy. This is the most human-readable format.
JSONL: Same folder structure as the JSON format, but the Spans and Tokens are stored in a JSONL file, where each line is a valid JSON document.
Pickle: We provide the dataset as a Python object. This is the recommended format when using our Python framework, which is provided on GitHub.
For more information, check out the GitHub project page.

License
The POLIANNA dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use the POLIANNA dataset in your research in any form, please cite the dataset.
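The JSONL format described under Dataset Description stores one valid JSON document per line, so it can be read line by line with the Python standard library. The following is a minimal sketch under stated assumptions, not the project's own loader: the function names `parse_jsonl` and `read_jsonl`, the sample records, and the field names `text`, `start`, `stop`, and `tag` are all hypothetical and are not taken from the dataset. The actual file names and schema are documented on the GitHub project page.

```python
import json
from pathlib import Path
from typing import Any, Iterable

# Hypothetical sample records; the real POLIANNA field names may differ.
SAMPLE_LINES = [
    '{"text": "Member States", "start": 0, "stop": 13, "tag": "Addressee"}',
    "",  # blank lines are skipped
    '{"text": "shall ensure", "start": 14, "stop": 26, "tag": "Obligation"}',
]


def parse_jsonl(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse JSONL content: each non-empty line is one JSON document."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line failed instead of a bare decode error.
            raise ValueError(f"line {line_number}: invalid JSON") from err
    return records


def read_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read a JSONL file from disk and return its records as a list."""
    with path.open(encoding="utf-8") as handle:
        return parse_jsonl(handle)
```

Calling `parse_jsonl(SAMPLE_LINES)` returns a list of two dictionaries, one per non-empty line. `read_jsonl` applies the same parsing to a file path, for example a spans file inside one article folder.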
Citation
[PLACEHOLDER FOR CITATION OF OUR PAPER]

This work was also supported by ETH Career Seed Grant SEED-24 19-2, funded by the ETH.