Reflecting Design Considerations: An End-to-End Case Study on Preparing Cricket Data Available on Net Analysis Ready.
The use of the Internet as a source of secondary data is becoming more popular day by day. Websites are made up of webpages that contain a huge volume of useful information in textual form. However, webpages are coded in text-based markup languages (e.g., HTML, XHTML, XML) to facilitate end-user viewing rather than automated use. This has led to a new discipline called web scraping, which fetches webpages and then extracts data from them for future use. Many organizations have seized this business opportunity and built efficient web scraping tools. The paper shows readers how data can be sourced from the Internet for scientific or commercial purposes. It elaborates on the available design options for fetching, extracting, validating and transforming data, either in the absence of an end-to-end tool or to supplement one. This is followed by a specific case study that deals with reactive analysis of structured data from multiple predetermined sources/pages. The paper concludes that design considerations for web scraping have to be dynamic: none of traditional copy-and-paste, trapping feeds through Application Programming Interfaces (APIs), programming in Java, Python or R, or the available end-to-end tools is uniformly better than the rest. [ABSTRACT FROM AUTHOR]
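The extract, validate and transform stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample table, the player names and numbers, and the helper names (`TableExtractor`, `extract`, `validate`, `transform`) are all assumptions made for illustration. It uses only the standard library, and it omits the fetch stage so that it runs without network access; in practice the HTML string would come from a downloaded page.

```python
from html.parser import HTMLParser

# Hypothetical sample page; names and numbers are made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Runs</th><th>Balls</th></tr>
  <tr><td>Player A</td><td>54</td><td>40</td></tr>
  <tr><td>Player B</td><td>n/a</td><td>12</td></tr>
  <tr><td>Player C</td><td>7</td><td>10</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Extract: turn the markup into a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def validate(rows):
    """Validate: keep rows whose numeric columns hold only digits."""
    header, *body = rows
    good, bad = [], []
    for row in body:
        ok = len(row) == len(header) and all(v.isdigit() for v in row[1:])
        (good if ok else bad).append(row)
    return header, good, bad


def transform(rows):
    """Transform: cast numbers and derive strike rate (runs per 100 balls)."""
    records = []
    for name, runs, balls in rows:
        runs, balls = int(runs), int(balls)
        rate = round(100 * runs / balls, 2) if balls else None
        records.append(
            {"player": name, "runs": runs, "balls": balls, "strike_rate": rate}
        )
    return records


header, good, bad = validate(extract(SAMPLE_HTML))
records = transform(good)
for record in records:
    print(record)
print("rejected rows:", bad)
```

Keeping the stages as separate functions reflects the abstract's point that each stage is its own design choice: any one of them could be replaced by an API feed, a dedicated tool, or manual copy-and-paste without changing the others.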
Copyright of IUP Journal of Information Technology is the property of IUP Publications and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)