Result: Transformation from human-readable documents and archives in arc welding domain to machine-interpretable data.
More information
• Transforming human-readable documents to machine-interpretable data for implementing advanced knowledge discovery via artificial intelligence methods.
• An integrated method for extracting and migrating data from multi-format sources into a structured JSON format.
• Good feasibility, reproducibility and efficiency in code compilation, with the package available in a Python environment.
• Relatively low technical barriers and opportunity cost of learning extraction skills, with further optimisation for ad-hoc features in the coding script.
• Case study for practical implementation of data migration using real industry (welding) documents and archives.

The capability of extracting useful information from documents and further transferring it into knowledge is essential to advance technology innovations in industries. However, the overwhelming majority of scientific literature is published primarily in unstructured human-readable formats and is incompatible with machine analysis via contemporary artificial intelligence (AI) methods that effectively discover knowledge from data. Therefore, extraction approaches that transform unstructured data are fundamental to establishing state-of-the-art digital knowledge-based platforms. In this paper, we integrated multiple Python libraries and developed a method as a cohesive package for automated data extraction and quick processing to convert unstructured documents into machine-interpretable data. Transformed data can be further incorporated with AI analytical methods. The output files showed excellent quality of digitalised data, without major flaws in terms of context inconsistency. All scripts were written in Python, with functional modules providing easy accessibility and proficiency to achieve the objectives. Eventually, the finalised well-structured data can be used for further knowledge discovery. [ABSTRACT FROM AUTHOR]
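As a rough illustration of the kind of migration the abstract describes (unstructured text into structured JSON), the sketch below uses only the Python standard library. It is not the authors' package: the sample text, the field names, the "name: value" line layout, and the helper functions `parse_value` and `text_to_record` are all assumptions made for this example.

```python
import json
import re

# Hypothetical sample, standing in for text already extracted from a welding document.
SAMPLE_TEXT = """\
Welding process: GMAW
Current: 220 A
Voltage: 24.5 V
Travel speed: 350 mm/min
"""

# Matches "name: value" lines; this layout is an assumption, not taken from the paper.
FIELD_PATTERN = re.compile(r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")
# Matches a number followed by a unit, e.g. "220 A".
QUANTITY_PATTERN = re.compile(r"^(?P<number>-?\d+(?:\.\d+)?)\s*(?P<unit>\S+)$")


def parse_value(raw):
    """Return {"value": float, "unit": str} for quantities, else the raw string."""
    match = QUANTITY_PATTERN.match(raw)
    if match:
        return {"value": float(match.group("number")), "unit": match.group("unit")}
    return raw


def text_to_record(text):
    """Collect every "name: value" line of the text into a dictionary."""
    record = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            # Normalise the field name into a JSON-friendly key.
            key = match.group("name").lower().replace(" ", "_")
            record[key] = parse_value(match.group("value"))
    return record


record = text_to_record(SAMPLE_TEXT)
json_output = json.dumps(record, indent=2)
print(json_output)
```

Separating numeric values from their units at this stage keeps the resulting JSON directly usable by downstream analysis code, which would otherwise have to re-parse strings such as "220 A".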
Copyright of Computers in Industry is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)