Information and Innovations
Informaciâ i innovacii

ISSN 1994-2443 (Print)
ISSN 2949-2157 (Online)

Problems of Algorithms Development to Determine Quality of Topic Models Ensembles for Make Rubricators

A. P. Shiryaev, A. R. Fedorov, P. A. Fedorov, L. G. Gagarina, E. M. Portnov

https://doi.org/10.31432/1994-2443-2018-13-3-53-58

Abstract

Intelligent data mining is one of the most active areas of current research. Its range of applications is extremely wide and covers practically all scientific disciplines. A pressing task is the analysis of text collections in order to establish thematic headings, to be classified as separate articles in accordance with the principle of systematization “from the general to the particular”, and to form a list of “core” categories. Clustering, and topic modeling in particular, is one of the methods of intelligent text analysis. The problem of clustering text collections is fundamentally ambiguous, for several reasons. First, there is no single, clearly best criterion of clustering quality: many reasonable criteria exist, but they can give different results. Second, the number of clusters is usually unknown in advance and is determined by some subjective criterion. Third, the clustering result depends significantly on the distance metric, whose choice is usually subjective and made by an expert. Ensembles of models are becoming increasingly widespread among data mining techniques and can significantly improve the accuracy of modeling results. The main purpose of this research is to increase the effectiveness of clustering textual information by using ensembles of topic models. The article describes the use of a voting algorithm based on a group of different evaluation algorithms. The voting algorithm makes it possible to select the most appropriate solution, to assess the quality of a topic model accurately, and to generate a set of relevant topics. A computational experiment shows that the results agree, in general, with expert assessments and with evaluations by formal criteria. A concept for evaluating the quality of an ensemble of topic models, based on a simple voting algorithm, was explored and is proposed for further research.
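
The abstract does not specify which evaluation algorithms take part in the vote or how ties are resolved, so the following is only a minimal sketch of one possible simple voting scheme, not the authors' implementation. It assumes each criterion casts one vote for the candidate model it scores best. The function name `simple_vote`, the criterion names, the model names, and all numeric scores are hypothetical values chosen for illustration.

```python
from collections import Counter


def simple_vote(scores, higher_is_better):
    """Pick a model by simple majority voting over quality criteria.

    scores: {criterion: {model_name: value}} (hypothetical inputs)
    higher_is_better: {criterion: bool}, direction of each criterion
    Returns (winning model name, vote counts per model).
    """
    votes = Counter()
    for criterion, per_model in scores.items():
        pick = max if higher_is_better[criterion] else min
        # Iterate over sorted names so ties inside one criterion
        # resolve to the alphabetically first model.
        winner = pick(sorted(per_model), key=per_model.get)
        votes[winner] += 1
    # Most votes wins; ties between models are broken by name (assumption).
    best = min(votes, key=lambda m: (-votes[m], m))
    return best, dict(votes)


# Hypothetical scores for three candidate topic models.
scores = {
    "perplexity": {"lda_20": 950.0, "lda_50": 870.0, "plsa_50": 910.0},
    "coherence": {"lda_20": 0.41, "lda_50": 0.47, "plsa_50": 0.52},
    "expert_score": {"lda_20": 3.1, "lda_50": 4.2, "plsa_50": 3.8},
}
# Lower perplexity is better; the other two criteria are maximized.
higher_is_better = {"perplexity": False, "coherence": True, "expert_score": True}

best, votes = simple_vote(scores, higher_is_better)
print(best, votes)
```

With these illustrative numbers, two of the three criteria prefer the same model, so it wins the vote even though one criterion disagrees. A weighted vote or a rank-aggregation rule could be substituted where criteria are not equally trusted.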

Keywords

Cluster analysis, voting algorithm, quality of topic models, perplexity

About the Authors

A. P. Shiryaev

National Research University of Electronic Technology, Moscow, Russia
Russian Federation

A. R. Fedorov

National Research University of Electronic Technology, Moscow, Russia
Russian Federation

P. A. Fedorov

National Research University of Electronic Technology, Moscow, Russia
Russian Federation

L. G. Gagarina

National Research University of Electronic Technology, Moscow, Russia
Russian Federation

E. M. Portnov

National Research University of Electronic Technology, Moscow, Russia
Russian Federation

For citations:

Shiryaev A.P., Fedorov A.R., Fedorov P.A., Gagarina L.G., Portnov E.M. Problems of Algorithms Development to Determine Quality of Topic Models Ensembles for Make Rubricators. Information and Innovations. 2018;13(3):53-58. (In Russ.) https://doi.org/10.31432/1994-2443-2018-13-3-53-58

This work is licensed under a Creative Commons Attribution 4.0 License.

Editor-in-Chief

Lonchakov Yury V.
