Rule-Based Information Extraction from Multi-format Resumes for Automated Classification.
With the expansion of the Internet, many people now publish their resumes online and on social media networks. Large companies receive hundreds of resumes per day in several formats, such as Joint Photographic Experts Group (JPG), Portable Document Format (PDF), and Word files. Information extraction from resumes can therefore be automated, and several methods exist for doing so. In this research, the details extracted from resumes are name, date of birth, email, phone number, GPA, gender, nationality, and address. The private resume dataset was collected from different sources, including open-source data and personally annotated resumes. Information extraction proceeds in phases: pre-processing, conversion of the resume files to PDF, and rule-based extraction of the eight elements. The experiments are implemented in Python, using the spaCy library and the word2vec technique. The experimental results show that the testing phase achieved an information extraction precision of 96.4%, which compares favorably with techniques in the literature. The scheme is then extended to classify resumes based on the extracted information fields, achieving classification accuracy, precision, recall, and F1-score of 98.02%, 98.01%, 98%, and 98%, respectively. [ABSTRACT FROM AUTHOR]
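The abstract does not give the actual extraction rules, so the following is only a minimal sketch of what rule-based extraction of a few of the eight fields might look like once a resume has been converted to plain text. The patterns, the `extract_fields` function, and the sample resume text are all assumptions made for illustration. The sketch uses only Python's standard `re` module, not the spaCy or word2vec components the authors mention.

```python
import re

# Hypothetical rules for illustration only; the paper's own rules are not
# given in the abstract.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", a digit, at least 7 digits/separators on one line, a final digit.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")
# "GPA" label followed by a one-digit value with up to two decimals.
GPA_RE = re.compile(r"\bGPA\b\s*[:=]?\s*(\d(?:\.\d{1,2})?)", re.IGNORECASE)
# "Gender" label followed by an explicit value.
GENDER_RE = re.compile(r"\bGender\b\s*[:=]?\s*(male|female)\b", re.IGNORECASE)


def extract_fields(text: str) -> dict:
    """Return the fields that the simple rules above find in plain resume text."""
    fields = {}

    match = EMAIL_RE.search(text)
    if match:
        fields["email"] = match.group(0)

    match = PHONE_RE.search(text)
    if match:
        fields["phone"] = match.group(0)

    match = GPA_RE.search(text)
    if match:
        fields["gpa"] = match.group(1)

    match = GENDER_RE.search(text)
    if match:
        fields["gender"] = match.group(1).lower()

    return fields


if __name__ == "__main__":
    # Made-up sample text, not taken from the paper's dataset.
    sample = (
        "Jane Doe\n"
        "Email: jane.doe@example.com\n"
        "Phone: +1 (555) 123-4567\n"
        "GPA: 3.8\n"
        "Gender: Female\n"
    )
    print(extract_fields(sample))
```

Fields such as name, nationality, and address rarely follow a fixed surface pattern, so a real system would likely need dictionaries or a named-entity component in addition to patterns like these.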
Copyright of Mathematical Modelling of Engineering Problems is the property of International Information & Engineering Technology Association (IIETA) and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)