Title:
Proudové algoritmy pro Lp vzorkování velkých dat [Streaming algorithms for Lp sampling of large data]
Authors:
Contributors:
Veselý, Pavel; Vu, Tung Anh
Publisher Information:
2024.
Publication Year:
2024
Subject Terms:
Document Type:
Dissertation/ Thesis
Bachelor thesis
Language:
Czech
Access URL:
Accession Number:
edsair.od......2186..ea5e8f2fe94395c76895a7672d59b4b8
Database:
OpenAIRE
Further Information
Large-scale computations often require working with datasets far larger than the available memory, which creates the need to summarise large data in small space. One possible technique is Lp sampling: given a stream of data that defines a vector of frequencies, the goal is to randomly sample an index with probability proportional to the p-th power of its frequency. In this work we describe the main existing algorithms for Lp sampling with p = 0 and p = 2. In the process we introduce a slight algorithmic improvement for Distinct Sampling and extend the Truly Perfect Sampler algorithm with frequency estimation. We then implement these algorithms and experimentally evaluate their efficiency.
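To make the sampling target in the abstract concrete, the following is a minimal offline sketch of the distribution an Lp sampler aims for. It is not one of the thesis's algorithms: it stores the whole frequency vector in memory, which is exactly what a streaming algorithm avoids. The function names, the `(index, delta)` update format, and the example stream are assumptions made for illustration only.

```python
import random
from collections import Counter


def frequency_vector(stream):
    """Accumulate hypothetical (index, delta) updates into a frequency vector."""
    freq = Counter()
    for index, delta in stream:
        freq[index] += delta
    # Keep only indices whose final frequency is nonzero.
    return {i: f for i, f in freq.items() if f != 0}


def lp_distribution(freq, p):
    """Exact target distribution: Pr[i] = |f_i|^p / sum_j |f_j|^p.

    For p = 0 every nonzero index gets weight 1, so the distribution is
    uniform over the distinct indices present in the stream.
    """
    if not freq:
        raise ValueError("frequency vector has no nonzero entries")
    weights = {i: (1.0 if p == 0 else abs(f) ** p) for i, f in freq.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def lp_sample(freq, p, rng=random):
    """Draw one index from the exact Lp distribution (offline reference only)."""
    dist = lp_distribution(freq, p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


# Example stream (made up): "a" appears 3 times, "b" twice, "c" once.
stream = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
freq = frequency_vector(stream)
l0 = lp_distribution(freq, 0)  # uniform: 1/3 for each of a, b, c
l2 = lp_distribution(freq, 2)  # a: 9/14, b: 4/14, c: 1/14
```

A reference like this is useful mainly as a ground truth: the empirical output of a small-space sampler can be compared against `lp_distribution` on inputs small enough to fit in memory.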