Natural Language Processing-Driven Use-Cases for Economic Analysis Using Unstructured Data

Temesvári, C.; Horváth, B.; Ónozó, L.R.

doi:10.33893/FER.25.1.27

31 March 2026

Author information:

Csanád Temesvári (ORCID: https://orcid.org/0009-0002-5724-4287): Magyar Nemzeti Bank, Analyst. E-mail: temesvarics@mnb.hu

Beáta Horváth (ORCID: https://orcid.org/0009-0001-0269-4881): Magyar Nemzeti Bank, Senior Economic Analyst. E-mail: horvathbea@mnb.hu

Lívia Réka Ónozó (ORCID: https://orcid.org/0009-0002-7595-8531): Magyar Nemzeti Bank, Supervisory Advisor; Budapest University of Technology and Economics, PhD Student. E-mail: onozol@mnb.hu

Abstract:

Economic text data, such as news articles or retail trade item names, are an alternative, feature-rich, high-frequency information source that can provide insight into economic trends and can be used to produce timelier and more accurate estimates. We trained multiple deep learning models for two distinct research tasks: 1) the creation of a sentiment index derived from the categorisation of financial and economic articles into three sentiment categories; and 2) the classification of retail trade item names into the appropriate tariff categories. For retail trade item classification, our models consistently outperformed their baseline counterparts, while our sentiment index accurately predicted economic downturns in cases where high-frequency data were not available.

Cite as (APA):

Temesvári, C., Horváth, B., & Ónozó, L.R. (2026). Natural Language Processing-Driven Use-Cases for Economic Analysis Using Unstructured Data. Financial and Economic Review, 25(1), 27–52. https://doi.org/10.33893/FER.25.1.27


The works on this site are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Column:

Study

Journal of Economic Literature (JEL) codes:

C43, C45, C60

Keywords:

Natural Language Processing, Deep Learning, macroeconomic nowcasting, classification
