A Characterization of Search Engine Results
Montana State University - Billings
Copyright Elizabeth McShane 2023

Background: According to a Pew Internet Survey, 91% of online adults use some form of web search. Companies commonly run search engine optimization studies to gauge their visibility in search results, but few studies have characterized those results from the user's perspective. We wanted to explore how the choice of search engine may affect the results a user sees by characterizing the top results from several search engines.

Aim: Previous research has relied on manual review of search results. Instead, we began developing and testing a set of tools to gather, analyze, and characterize search engine results automatically.

Approach: Selenium will be used to run searches and record the top ten organic results. A Python-based program will strip the result URLs down to their domains, which will then be categorized using a URL Lookup API. Finally, the results will be analyzed with a Python-based program.

Results and Conclusions: To date, we have gathered search results from Bing, Google, and DuckDuckGo for 50 random search terms and stripped the URLs down to their domains. We have also identified a service that categorizes websites using the IAB taxonomy. The development done so far has allowed us to identify the following targets for future development.

Data Gathering: Some search engines, such as Google, proved difficult to scrape, and some scrapes returned irregular results, such as null values. We would like to explore web scraping methods in addition to Selenium and develop several approaches that may overcome the scraping challenges specific to each search engine. We also want to extend scraping to other, lesser-known search engines. Due to time constraints, the categorization API has not yet been fully integrated into the program, so automated API integration is another target for future development.
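The URL-stripping step could look something like the following minimal sketch, which uses only the standard library's `urllib.parse.urlsplit`. The function name `strip_to_domain`, the choice to drop a leading `www.`, and returning `None` for empty or malformed inputs (such as the null values some scrapes produced) are assumptions made for illustration, not details of the authors' program.

```python
from typing import Optional
from urllib.parse import urlsplit


def strip_to_domain(url: Optional[str]) -> Optional[str]:
    """Reduce a result URL to its host name (hypothetical helper).

    Returns None for empty or unparseable input so that irregular
    scrape results can be filtered out before categorization.
    """
    if not url:
        return None
    # urlsplit only recognizes a host after "//", so add it when the scheme is missing.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    try:
        host = urlsplit(url).hostname  # lowercased, with any port removed
    except ValueError:
        # Raised for some malformed inputs, e.g. an unclosed IPv6 bracket.
        return None
    if not host:
        return None
    # Treat "www.example.com" and "example.com" as the same domain (an assumption).
    return host[4:] if host.startswith("www.") else host
```

For example, `strip_to_domain("https://www.Example.com/path?q=1")` gives `"example.com"`. This keeps subdomains such as `en.wikipedia.org` intact; collapsing hosts to their registrable domain (for example `bbc.co.uk`) would require the Public Suffix List, for instance via the third-party `tldextract` package.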
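Since the categorization API was not yet integrated, one way to keep that step pluggable is to pass the lookup in as a function and cache its answers, so each distinct domain is sent to the service only once. In this sketch, `categorize_domains` and its `lookup` parameter are hypothetical: `lookup` stands in for whatever client the chosen URL Lookup API provides, and the category labels in the usage example are placeholders, not actual IAB taxonomy entries.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def categorize_domains(
    domains: Iterable[Optional[str]],
    lookup: Callable[[str], Optional[str]],
) -> Counter:
    """Count result categories for a list of domains (hypothetical helper).

    `lookup` maps a domain to a category label, or None if the service
    has no answer. Answers are cached so a repeated domain costs one call.
    """
    cache: Dict[str, str] = {}
    counts: Counter = Counter()
    for domain in domains:
        if domain is None:  # skip null values left over from scraping
            continue
        if domain not in cache:
            cache[domain] = lookup(domain) or "Uncategorized"
        counts[cache[domain]] += 1
    return counts


# Usage with a dictionary standing in for the real API (placeholder labels).
fake_api = {"wikipedia.org": "Reference", "nytimes.com": "News"}
print(categorize_domains(
    ["wikipedia.org", "nytimes.com", None, "nytimes.com", "unknown.example"],
    fake_api.get,
))
```

Swapping `fake_api.get` for a function that calls the real service is the only change the counting logic would need, which keeps API integration separate from the analysis.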
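To compare engines, one possible analysis (not necessarily the one the authors intend) is the Jaccard similarity between the sets of domains two engines return for the same search term: the size of the intersection divided by the size of the union, ranging from 0 (no shared domains) to 1 (identical sets). The sketch below computes it for every pair of engines; the function names are hypothetical, and the engines' domain lists in the usage example are made up for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B|, treating two empty result lists as identical."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


def pairwise_overlap(
    results_by_engine: Dict[str, List[str]],
) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity of top-result domains for each pair of engines."""
    engines = sorted(results_by_engine)
    return {
        (e1, e2): jaccard(results_by_engine[e1], results_by_engine[e2])
        for e1, e2 in combinations(engines, 2)
    }


# Made-up top results for a single search term, already reduced to domains.
example = {
    "bing": ["wikipedia.org", "example.com", "nytimes.com"],
    "google": ["wikipedia.org", "example.com", "reddit.com"],
    "duckduckgo": ["wikipedia.org", "nytimes.com"],
}
for pair, score in pairwise_overlap(example).items():
    print(pair, round(score, 2))
```

With this made-up data the scores are 0.67 for Bing/DuckDuckGo, 0.5 for Bing/Google, and 0.25 for DuckDuckGo/Google. Sets ignore rank order; a rank-aware measure such as rank-biased overlap would weight agreement near the top of the list more heavily.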
We would also like to identify any data, such as ...