Result: A Chatbot for Football Analytics : A deep dive into RAG, LLM Orchestration and Function Calling ; Chatbot för fotbollsanalys : En djupdykning inom RAG, LLM-orkestrering och function calling

Title:
A Chatbot for Football Analytics : A deep dive into RAG, LLM Orchestration and Function Calling ; Chatbot för fotbollsanalys : En djupdykning inom RAG, LLM-orkestrering och function calling
Authors:
Publisher Information:
KTH, Skolan för elektroteknik och datavetenskap (EECS)
Publication Year:
2025
Collection:
Royal Inst. of Technology, Stockholm (KTH): Publication Database DiVA
Document Type:
Dissertation bachelor thesis
File Description:
application/pdf
Language:
English
Relation:
TRITA-EECS-EX; 2025:471
Rights:
info:eu-repo/semantics/openAccess
Accession Number:
edsbas.BFDCAB91
Database:
BASE

Further information

This thesis presents a small, modular chatbot that lets novice users talk directly to structured football data, lowering the entry barrier for analytics work. Its goal is to replace a 45-minute, multi-step workflow with a one-minute conversation and thereby make advanced insights available to journalists, coaches, and other football-focused professionals and enthusiasts. The assistant supports three key tasks: (i) creating an opponent-analysis dashboard for a team, (ii) creating a match analysis for a game, and (iii) explaining complex Key Performance Indicators in plain language. The pipeline follows a multilayer Retrieval-Augmented Generation design. Data on football matches is embedded with a multilingual-E5 model and stored in a Chroma vector database. The user query is cleaned, classified and, if needed, corrected, and a hybrid retriever ranks candidates against the cleaned query. Lastly, an OpenAI assistant validates the fetched data (LLM-as-a-judge) and either answers directly with the analysis or calls backend functions that return JSON dashboards, which keeps token usage and latency low. Benchmarking shows the retriever ranks the correct dataset in 92 % of synthetic queries (MRR = 0.92), and in an eight-person user study the average dashboard-creation time fell from 45 min to 17 s, with a BUS-15 usability score of 76/100. Building a domain-specific RAG stack that works on nearly identical football tables and runs on consumer hardware combines hard research questions (retrieval disambiguation, function-calling orchestration, latency–cost trade-offs) with industrial constraints (proprietary data, zero GPU budget). Key limitations are CPU-only hosting, the absence of fine-tuning data and a bias toward men’s competitions, which constrain latency and domain coverage. Even so, the work demonstrates that a lightweight RAG + LLM stack can democratise football analytics on commodity hardware.
Future steps, namely adding GPU hosting, richer metadata filters and a fine-tuned analysis model, will most likely lift accuracy, reduce variance and turn ...
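
Note on the retrieval metric: the abstract reports the retriever's quality as a mean reciprocal rank (MRR). As a minimal sketch of how such a figure is typically computed, assuming each synthetic query has exactly one correct dataset, the Python below averages 1/rank over a set of queries. The function names, dataset identifiers and toy rankings are hypothetical illustrations and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank (1-based) of the relevant item, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (ranking, relevant_id) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(ranking, rel) for ranking, rel in runs) / len(runs)


# Hypothetical toy rankings: each tuple is (retriever output, correct dataset).
runs = [
    (["match_a", "match_b", "match_c"], "match_a"),  # correct item at rank 1
    (["match_b", "match_a", "match_c"], "match_a"),  # correct item at rank 2
    (["match_c", "match_b", "match_a"], "match_a"),  # correct item at rank 3
    (["match_b", "match_c"], "match_a"),             # correct item not retrieved
]

mrr = mean_reciprocal_rank(runs)
print(f"MRR = {mrr:.3f}")
```

An MRR close to 1 means the correct dataset is usually ranked first; results at lower ranks contribute 1/2, 1/3 and so on, and missed results contribute 0.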