Result: Didžiųjų duomenų klasterizavimas ir klasifikavimas

Title:

Didžiųjų duomenų klasterizavimas ir klasifikavimas

Authors:

Norkevičiūtė, Lina

Contributors:

Kurasova, Olga

Publisher Information:

Institutional Repository of Vilnius University, 2020.

Publication Year:

2020

Document Type:

Dissertation Bachelor thesis

File Description:

application/pdf

Language:

Lithuanian

Access URL:

https://repository.vu.lt/VU:ELABAETD81706279&prefLang=en_US

Accession Number:

edsair.od......4036..c88745706b28629a31a6c6b3e143fdb5

Database:

OpenAIRE

Further Information

Every day, digital devices collect and generate more and more data. These data are characterized not only by their exceptional volume and the velocity at which they must be stored and processed, but also by their variety. Most of the data are unstructured or only semi-structured, so Big Data technologies and techniques have to be used to preserve their veracity and value. Various data mining tasks, such as clustering and classification, can be used to extract information from the collected material. However, most conventional clustering and classification algorithms are not well suited to Big Data analysis: before they can be applied, the data have to be preprocessed, either by reducing the feature set or by selecting only a sample of the available material. Clustering and classification algorithms can also be applied to Big Data by running them in parallel or across a distributed network of multiple devices. Big Data technologies such as the MapReduce programming model, the Apache Hadoop framework and the Apache Spark engine can be used for this purpose as well. They make it possible to analyse Big Data without much effort spent on distributing data or computations, so that development can focus on the functionality for finding useful information.
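
To illustrate how a clustering algorithm can be split into MapReduce-style steps, the following is a minimal sketch of one k-means iteration in plain Python. It is not taken from the thesis: the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_step`) and the choice of k-means are assumptions made for illustration, and the chunks are processed sequentially here, whereas a framework such as Hadoop or Spark would distribute the map step across workers.

```python
from functools import reduce


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(chunk, centroids):
    """Map step (hypothetical helper): per-cluster coordinate sums and counts for one chunk."""
    partial = {}
    for point in chunk:
        idx = nearest(point, centroids)
        sums, count = partial.get(idx, ([0.0] * len(point), 0))
        partial[idx] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_phase(left, right):
    """Reduce step (hypothetical helper): merge two partial results cluster by cluster."""
    merged = dict(left)
    for idx, (sums, count) in right.items():
        if idx in merged:
            old_sums, old_count = merged[idx]
            merged[idx] = ([a + b for a, b in zip(old_sums, sums)], old_count + count)
        else:
            merged[idx] = (sums, count)
    return merged


def kmeans_step(chunks, centroids):
    """One k-means iteration: map over chunks, reduce the partials, recompute centroids."""
    # In a real cluster each chunk would be mapped on a different worker.
    partials = [map_phase(chunk, centroids) for chunk in chunks]
    totals = reduce(reduce_phase, partials, {})
    # A centroid that received no points keeps its previous position.
    return [
        [s / totals[i][1] for s in totals[i][0]] if i in totals else list(centroids[i])
        for i in range(len(centroids))
    ]
```

Because each chunk only contributes sums and counts, the reduce step never needs the raw points, which is what makes this formulation suitable for data that do not fit on a single machine. Calling `kmeans_step` repeatedly until the centroids stop moving would give the full algorithm.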
