Result: Implementation of text mining techniques using NLTK in Python programming language

Title:
Implementation of text mining techniques using NLTK in Python programming language
Contributors:
University of Belgrade [Belgrade]
Source:
2018 26th Telecommunications Forum (TELFOR), Nov 2018, Belgrade, Serbia. ⟨10.1109/TELFOR44991.2018⟩
Publisher Information:
HAL CCSD; IEEE, 2018.
Publication Year:
2018
Collection:
collection:SHS
collection:AO-LINGUISTIQUE
Original Identifier:
HAL: hal-03091167
Document Type:
Conference (conferenceObject); Conference papers
Language:
Serbian
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1109/TELFOR44991.2018
DOI:
10.1109/TELFOR44991.2018
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.03091167v1
Database:
HAL

Further Information

This paper examines basic text mining techniques using the NLTK (Natural Language Toolkit) library for natural language processing in the Python programming language. It demonstrates algorithms for extracting documents, generating concordances, computing lexical frequency, diversity and dispersion, listing the most frequently used words, and retrieving bigrams, collocations and words of a certain length, in order to provide new insight into texts. The results of these methods illustrate only a small part of the wide range of functionality in the NLTK package, on which the computing community often relies in diverse computational linguistics projects.
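The measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code, which this record does not include. It uses only the Python standard library so that it runs without downloading any corpus data; the sample text and all function names are hypothetical. In NLTK the assumed counterparts would be `nltk.FreqDist`, `nltk.bigrams`, and the `concordance`, `dispersion_plot` and `collocations` methods of `nltk.Text`.

```python
import re
from collections import Counter

# Hypothetical sample text; any plain-text document could be used instead.
SAMPLE = (
    "The quick brown fox jumps over the lazy dog. "
    "The lazy dog sleeps while the quick fox runs."
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_diversity(tokens):
    """Ratio of distinct words to total words (type-token ratio)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def bigrams(tokens):
    """Adjacent word pairs, in order of appearance."""
    return list(zip(tokens, tokens[1:]))


def words_of_length(tokens, min_len):
    """Sorted distinct words with at least min_len characters."""
    return sorted({w for w in tokens if len(w) >= min_len})


def concordance(tokens, word, width=2):
    """Each occurrence of word with `width` tokens of context on both sides."""
    return [
        " ".join(tokens[max(0, i - width): i + width + 1])
        for i, t in enumerate(tokens)
        if t == word
    ]


def dispersion(tokens, word):
    """Token offsets at which word occurs (the data behind a dispersion plot)."""
    return [i for i, t in enumerate(tokens) if t == word]


tokens = tokenize(SAMPLE)
freq = Counter(tokens)                  # lexical frequency
bigram_freq = Counter(bigrams(tokens))  # repeated pairs hint at collocations

print(freq.most_common(3))
print(round(lexical_diversity(tokens), 3))
print(bigram_freq.most_common(3))
print(words_of_length(tokens, 5))
print(concordance(tokens, "fox"))
print(dispersion(tokens, "fox"))
```

Counting raw bigram frequency is only a rough stand-in for collocation detection; NLTK's collocation finders score pairs with association measures instead, which this sketch does not attempt to reproduce.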