Information Potential of a Corpus of Scientific Texts
https://doi.org/10.31432/1994-2443-2023-18-4-21-37
Abstract
The article surveys publicly available text corpora on the internet, characterises them, and considers the potential of corpus linguistics for analysing the development of scientific trends, discourse and changes in terminology. A dataset is presented that is based on a corpus of scientific articles from a petroleum transport trade journal and on the Google Books Corpus. The dataset makes it possible to examine changes in term usage frequencies from 1940 to 2019.
The results of the term frequency analysis are presented, and technological changes in the industry are compared with the development of its key vocabulary. The results show that studies based on corpora of scientific and technical texts have good potential for understanding trends in technological development and the dynamics of change in industry and terminology.
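To make the notion of term usage frequency by year concrete, here is a minimal Python sketch. It is not the author's method: the function name, the `(year, text)` input format, the regex tokenisation and the per-million normalisation are all assumptions made for illustration.

```python
import re
from collections import Counter


def relative_term_frequency(docs, term):
    """Return {year: occurrences of `term` per million tokens}.

    `docs` is an iterable of (year, text) pairs. This layout is a
    hypothetical one chosen for the example, not the article's dataset format.
    """
    term = term.lower()
    term_counts = Counter()
    token_counts = Counter()
    for year, text in docs:
        # Naive tokenisation (an assumption): lowercase runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        token_counts[year] += len(tokens)
        term_counts[year] += tokens.count(term)
    # Normalise by corpus size per year so that years with more text
    # do not dominate simply because they contain more tokens.
    return {
        year: 1e6 * term_counts[year] / token_counts[year]
        for year in sorted(token_counts)
        if token_counts[year] > 0
    }


# Toy input, invented for the example only.
sample_docs = [
    (1940, "Pipeline pump pipeline"),
    (2019, "Pump station valve pump"),
]
frequencies = relative_term_frequency(sample_docs, "pipeline")
```

Normalising by the number of tokens per year is what makes frequencies comparable across decades of very different corpus size. Multi-word terms would need an n-gram count rather than the single-token match used here.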
About the Author
V. N. Komaritsa
Russian Federation
Valentin Nikolaevich Komaritsa - Cand. Sci. (Eng.), Deputy Head of the Department of Publishing Projects and Media Communications
Moscow
For citations:
Komaritsa V.N. Information Potential of a Corpus of Scientific Texts. Information and Innovations. 2023;18(4):21-37. (In Russ.) https://doi.org/10.31432/1994-2443-2023-18-4-21-37