Automated semantic ontology construction for foresight studies using large language models

Authors

DOI:

https://doi.org/10.20535/SRIT.2308-8893.2026.2.09

Keywords:

foresight, large language models, semantic ontology, scenario analysis, weak signals, hierarchical clustering

Abstract

Recent advances in large language models (LLMs) enable the automated discovery of semantic structures and emerging signals within text streams, offering an opportunity to redesign foresight workflows into continuous, data-driven systems. This study aims to develop and validate an automated framework for extracting, structuring, and comparing semantic ontologies using LLMs. A parallelized approach was used to mine data from social media platforms and to filter out non-domain data. The key semantic elements, namely goals and their corresponding hypernyms, were extracted using multiple LLM configurations, with a consensus mechanism to improve semantic reliability and reduce hallucination. The extracted elements were embedded in a high-dimensional vector space, clustered iteratively using cosine similarity, and merged hierarchically. The convergence process and structural stability were analyzed using the elbow criterion and similarity metrics. The proposed approach provides a cost-efficient alternative to traditional expert-based foresight analysis. By integrating LLM-driven semantic extraction with quantitative clustering, it enables the identification of emerging trends, weak signals, and long-term thematic structures. The results highlight the potential of LLM-based semantic modeling as a foundation for automated foresight systems.
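The consensus and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the voting rule or the merging algorithm, so the majority vote, the greedy centroid merging, the thresholds, and all function names below are assumptions made for illustration. Toy 2-D vectors stand in for real LLM embeddings, and the elbow analysis is not covered.

```python
from collections import Counter
import numpy as np


def consensus_terms(runs, min_votes):
    """Keep terms returned by at least `min_votes` extraction runs (assumed voting rule)."""
    votes = Counter(term for run in runs for term in set(run))
    return sorted(term for term, n in votes.items() if n >= min_votes)


def unit(rows):
    """Scale each row to unit length so that a dot product equals cosine similarity."""
    rows = np.asarray(rows, dtype=float)
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)


def merge_pass(vectors, threshold):
    """One greedy pass over the vectors (hypothetical merging rule).

    Each vector joins the most similar existing cluster if the cosine
    similarity to that cluster's centroid reaches `threshold`; otherwise it
    opens a new cluster. Returns (labels, unit-length centroids).
    """
    vectors = unit(vectors)
    sums = []    # running sum of member vectors, one entry per cluster
    labels = []
    for v in vectors:
        if sums:
            sims = unit(np.array(sums)) @ v
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                sums[best] = sums[best] + v
                labels.append(best)
                continue
        sums.append(v.copy())
        labels.append(len(sums) - 1)
    return labels, unit(np.array(sums))


def hierarchy(vectors, thresholds):
    """Re-cluster the centroids of each level at the next, looser threshold."""
    levels = []
    current = vectors
    for t in thresholds:
        labels, current = merge_pass(current, t)
        levels.append(labels)
    return levels


# Three hypothetical extraction runs over the same text.
runs = [["drone", "battery", "antenna"], ["drone", "battery"], ["drone", "radar"]]
print(consensus_terms(runs, min_votes=2))  # ['battery', 'drone']

# Toy embeddings: two near-horizontal and two near-vertical vectors.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(hierarchy(toy, thresholds=[0.95, 0.0]))  # [[0, 0, 1, 1], [0, 0]]
```

In the toy run, the strict threshold yields two clusters and the loose one merges them into a single parent, which mirrors the idea of building a hierarchy level by level. The number of clusters per level is the kind of quantity an elbow analysis would then inspect.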

Published

2026-06-30

Issue

Section

Methods, models, and technologies of artificial intelligence in system analysis and control