Author information:
Csanád Temesvári https://orcid.org/0009-0002-5724-4287: Magyar Nemzeti Bank, Analyst. E-mail: temesvarics@mnb.hu
Beáta Horváth https://orcid.org/0009-0001-0269-4881: Magyar Nemzeti Bank, Senior Economic Analyst. E-mail: horvathbea@mnb.hu
Lívia Réka Ónozó https://orcid.org/0009-0002-7595-8531: Magyar Nemzeti Bank, Supervisory Advisor; Budapest University of Technology and Economics, PhD Student. E-mail: onozol@mnb.hu
Abstract:
Economic text data, such as news articles or retail trade item names, are an alternative, feature-rich, high-frequency information source that can provide insight into economic trends and support timelier and more accurate estimates. We trained multiple deep learning models for two distinct research tasks: 1) the creation of a sentiment index derived from the categorisation of financial and economic articles into three sentiment categories; and 2) the classification of retail trade item names into appropriate tariff categories. Our models consistently outperformed their baseline counterparts for retail trade item classification, while our sentiment index was able to accurately predict economic downturns where high-frequency data were not available.
Cite as (APA):
Temesvári, C., Horváth, B., & Ónozó, L.R. (2026). Natural Language Processing-Driven Use-Cases for Economic Analysis Using Unstructured Data. Financial and Economic Review, 25(1), 27–52. https://doi.org/10.33893/FER.25.1.27
Column:
Study
Journal of Economic Literature (JEL) codes:
C43, C45, C60
Keywords:
Natural Language Processing, Deep Learning, macroeconomic nowcasting, classification