Result: Github BPMN Artifacts Dataset 2021
Title:
Github BPMN Artifacts Dataset 2021
Publisher Information:
Zenodo
Publication Year:
2022
Collection:
Zenodo
Subject Terms:
Document Type:
dataset
Language:
unknown
Relation:
https://zenodo.org/records/5903352; oai:zenodo.org:5903352; https://doi.org/10.5281/zenodo.5903352
DOI:
10.5281/zenodo.5903352
Rights:
Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:
edsbas.8609FAA4
Database:
BASE
Further Information:
Information about 327,436 potential BPMN artifacts identified in all public GitHub repositories referenced in the GHTorrent dump from March 2021.

The data file is in line-delimited JSON format, with each row containing an array with the following six elements:

- GHTorrent project ID
- GitHub user name
- GitHub repository name
- GitHub branch name
- Path to file inside repository
- SHA1 hash of the file's contents

To get a list of retrievable URLs, use e.g. the following Python one-liner:

python3 -c 'import json; import sys; print(*[f"https://raw.githubusercontent.com/{u}/{r}/{b}/{f}" for _, u, r, b, f, _ in map(json.loads, sys.stdin)], sep="\n")' < bpmn-artifacts.jsonl > urls.txt

Using the hashes to filter out duplicates first is recommended, though.
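The recommended deduplication step is not spelled out in the record, so the following is only a minimal sketch of one way to do it, assuming the six-element row layout described above. The function name `unique_urls`, the constant `RAW_BASE`, and the sample rows are hypothetical and not part of the dataset; percent-encoding the branch and path with `urllib.parse.quote` is an added assumption to cope with spaces and similar characters in file names.

```python
import json
from urllib.parse import quote

# Same host as in the one-liner from the dataset description.
RAW_BASE = "https://raw.githubusercontent.com"


def unique_urls(lines):
    """Yield one raw-content URL per distinct SHA1 hash, in input order.

    `lines` is any iterable of line-delimited JSON rows of the form
    [project_id, user, repo, branch, path, sha1].
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        _project_id, user, repo, branch, path, sha1 = json.loads(line)
        if sha1 in seen:
            continue  # identical file contents already covered
        seen.add(sha1)
        # quote() keeps "/" unescaped by default, so nested paths survive.
        yield f"{RAW_BASE}/{user}/{repo}/{quote(branch)}/{quote(path)}"


# Made-up sample rows for illustration only (not taken from the dataset).
sample = [
    json.dumps([1, "alice", "shop", "main", "models/order flow.bpmn", "aaa"]),
    json.dumps([2, "bob", "shop-fork", "main", "models/order flow.bpmn", "aaa"]),
    json.dumps([3, "carol", "hr", "dev", "leave.bpmn", "bbb"]),
]
for url in unique_urls(sample):
    print(url)
```

To process the real file, the same function can be given an open file handle, e.g. `unique_urls(open("bpmn-artifacts.jsonl"))`. Keeping the first occurrence of each hash is an arbitrary choice; any occurrence points to the same file contents.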