Cross-Model Evaluation of LLMs for Generating Formal Specification of Distributed Industrial Control Systems
Conference
Raptis, GE, Khan, MT, Koulamas, C et al. (2025). Cross-Model Evaluation of LLMs for Generating Formal Specification of Distributed Industrial Control Systems
. 270 5846-5854. 10.1016/j.procs.2025.10.053
Raptis, GE, Khan, MT, Koulamas, C et al. (2025). Cross-Model Evaluation of LLMs for Generating Formal Specification of Distributed Industrial Control Systems
. 270 5846-5854. 10.1016/j.procs.2025.10.053
The increasing complexity and decentralization of Industrial Control Systems (ICS) have expanded the attack surface for cyber-security threats, particularly in critical infrastructure domains. Inline monitoring using formal annotations like the Java Modeling Language (JML) offers a lightweight yet precise method to detect behavioral anomalies. However, the manual creation of such specifications is resource-intensive and requires domain expertise. This paper explores generative artificial intelligence (AI), specifically large language models (LLMs), to automate the synthesis of inline formal security monitors. We benchmark three state-of-the-art LLMs (GPT-4o, DeepSeek-V3, and Gemini 2.5 Flash) on their ability to generate JML annotations for ICS software drawn from the ASM2S water distribution system. Our evaluation across five dimensions (syntax, semantics, property coverage, alarm semantics, and effort savings) reveals distinct trade-offs. GPT-4o demonstrates strong syntactic and structural alignment with ASM2S, while DeepSeek-V3 offers richer behavioral modeling. Gemini 2.5 Flash showcases conceptual depth but introduces non-verifiable constructs. These findings demonstrate the potential of LLMs as co-pilots in secure-by-design ICS development and underscore the need for syntax-aware fine-tuning and interactive verification workflows.