Investigation of Automatic Part-of-Speech Tagging using CRF, HMM and LSTM on Misspelled and Edited Texts

Farhad Aydinov, Igbal Huseynov, Sofiya Sayadzada, Samir Rustamov

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Part-of-speech tagging is the process of assigning the words in a given text to their appropriate parts of speech in order to reduce the ambiguity that may arise from the contextual usage of the words. In this paper, the problem of word sense disambiguation in the Azerbaijani language is addressed by applying part-of-speech tagging to two different data corpora, misspelled text and edited (clean) text, using three machine learning algorithms: Hidden Markov Model, Long Short-Term Memory, and Conditional Random Fields. The paper presents a comparative analysis of the outcomes of these algorithms and their accuracy scores. The misspelled dataset for the experiments was provided by Unibank from its chatbot dialogues, while the clean textual data was retrieved from books and newspapers in Azerbaijani. The experiments showed that the Bidirectional LSTM has the highest accuracy scores on both the edited (98.2%) and the noisy (96.2%) data corpora. The suggested models can be used in applications of algorithms that focus on part-of-speech tags and the syntactic structure of Azerbaijani, an agglutinative language belonging to the Turkic language family, thus enabling the research to be extended to other agglutinative languages with similar grammatical structure.

Original language: English
Title of host publication: Proceedings of the 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022
Publisher: Association for Computing Machinery
Publication date: 2022
Pages: 21-28
ISBN (Electronic): 978-1-4503-9874-9
Publication status: Published - 2022
Event: 5th Artificial Intelligence and Cloud Computing Conference - Osaka, Japan
Duration: 17 Dec 2022 to 19 Dec 2022

Conference

Conference: 5th Artificial Intelligence and Cloud Computing Conference
Country/Territory: Japan
City: Osaka
Period: 17/12/2022 to 19/12/2022

Keywords

  • CRFs
  • HMMs
  • LSTM
  • Part of Speech Tagging
  • PoS Tagging for agglutinative languages
