A Comparative Study on the Financial Term Comprehension Capabilities of Different Types of Large Language Models
DOI: https://doi.org/10.62517/jike.202604227
Author(s)
Qirui Yang
Affiliation(s)
Beijing University of Science and Technology, Beijing, China
Abstract
Since 2023, the application of large language models (LLMs) in the financial sector has moved from the proof-of-concept stage to practical, large-scale implementation. How well a model understands financial terminology determines its ability to perform practical tasks such as risk control, investment research, and customer service, and it also serves as a critical criterion by which regulators assess the compliance of LLMs. This study selected eight representative models (GPT-4o, Claude Sonnet 4.5, Gemini 1.5 Pro, Qwen2.5-72B, ERNIE-Speed, Spark 3.5, FinGPT 4.0 Instruct, and FinBERT-Large) and established a unified evaluation system for financial terminology, under which each model performed explanation tasks on 200 financial terms. The research employed quantitative metrics such as text similarity, coverage, and response length, complemented by qualitative observation and budget analysis, to analyze the strengths, weaknesses, and applicable scope of each model. The results show that financial domain-specific models hold significant advantages in semantic alignment and term recognition. International general-purpose models perform stably on complex long-text tasks owing to their multimodal and long-context processing capabilities. Domestic general-purpose models are more cost-effective in large-scale deployment scenarios owing to their low cost and strong adaptability to Chinese corpora. This paper also summarizes model combination strategies and provides recommendations for establishing industry evaluation standards, offering a reference for financial institutions in model selection, deployment, implementation, and management.
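The abstract names three quantitative metrics (text similarity, coverage, and response length) without defining them. The following is a minimal sketch of how such metrics might be computed for a single term explanation. It is an illustration under stated assumptions, not the paper's implementation: text similarity is approximated here by bag-of-words cosine similarity, coverage by the fraction of hand-listed key points found in the answer, and the function names, the reference definition, and the key points are all hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(answer, reference):
    """Bag-of-words cosine similarity (an assumed stand-in for 'text similarity')."""
    a, b = Counter(tokenize(answer)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def coverage(answer, key_points):
    """Fraction of key points whose tokens all appear in the answer (assumed definition)."""
    if not key_points:
        return 0.0
    tokens = set(tokenize(answer))
    hits = sum(1 for point in key_points if set(tokenize(point)) <= tokens)
    return hits / len(key_points)


def response_length(answer):
    """Response length measured in tokens."""
    return len(tokenize(answer))


def score(answer, reference, key_points):
    """Collect the three metrics for one model answer."""
    return {
        "similarity": cosine_similarity(answer, reference),
        "coverage": coverage(answer, key_points),
        "length": response_length(answer),
    }


# Hypothetical example for the term "duration"
reference = "Duration measures the sensitivity of a bond price to changes in interest rates."
answer = "Duration is the sensitivity of a bond price to interest rates."
key_points = ["bond price", "interest rates", "sensitivity", "maturity"]
result = score(answer, reference, key_points)
```

In this example the answer mentions three of the four hypothetical key points, so its coverage is 0.75. An evaluation in practice would more likely use embedding-based similarity and tokenization suited to Chinese text; the word-level approach here is chosen only to keep the sketch dependency-free.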
Keywords
Large Language Models; Financial Terminology; Model Evaluation; FinGPT; Compliance Governance