
Beyond Checkbox Compliance: Why AI Agents on Mission-Critical Infrastructure Need Formal Behavioral Verification

A few weeks ago, IBM i security expert Carol Woodbury raised an alarm that should concern every enterprise running AI agents on mission-critical infrastructure: MCP — the Model Context Protocol, which connects AI agents to tools and data sources — was deployed with "very little messaging around security."

Her concern is precise and well-founded. AI agents on enterprise systems behave differently from human users. They operate faster, they delegate to sub-agents, they cross system boundaries without the social friction that slows down human actors. And the security tooling built for human users — audit journals, exit points, authority profiles — was never designed to verify that autonomous agents are behaving correctly in real time.

This isn't hypothetical. IBM itself acknowledges the problem. Their 2026 cybersecurity predictions note that autonomous AI agents "access sensitive data with minimal human oversight" and "replicate and evolve without leaving clear audit trails." Their solution — watsonx.governance — monitors agent lifecycles and provides compliance accelerators. PowerSC monitors file integrity, hardens configurations, and produces audit-ready reports against STIG, PCI, SOX, and HIPAA.

These are valuable tools. They are also insufficient for the problem AI agents create. Here's why.

The Verification Gap in Enterprise AI Security

Current enterprise security tools verify three things: configuration (is the system hardened correctly?), access (does this user have the right permissions?), and activity (what did this user do?). They answer "who did what" and "was the system configured correctly."

AI agents create a fourth question that existing tools don't answer: is the agent behaving correctly right now?

Configuration hardening tells you the system is locked down. It doesn't tell you that an AI agent with valid credentials is making decisions that violate your business rules. Access control tells you the agent has permission to read customer records. It doesn't tell you the agent is reading customer records at 3 AM in a pattern that looks like data exfiltration but technically falls within its permission scope. Activity logging tells you what happened. It doesn't tell you whether what happened was correct — whether the agent's behavior was consistent with what it was supposed to do.

This is the verification gap: the distance between monitoring what agents do and proving that what they do is right.

PowerSC can tell you that an AIX system is configured according to STIG requirements. It cannot tell you that an AI agent running on that system is making decisions consistent with your compliance policies. The audit journal records every SQL query an agent executes through Mapepire or ODBC. It does not verify that the pattern of queries represents legitimate business logic rather than an emergent behavior the agent's designers didn't anticipate.

What Formal Behavioral Verification Looks Like

We've been working on this problem from a different angle. Our framework, substrate-guard, was built to protect autonomous AI ecosystems — systems where agents operate continuously without human supervision. The problem we faced is the same one enterprise infrastructure faces with AI agents: how do you prove, not just monitor, that autonomous systems are behaving correctly?

The approach has six layers:

Layer 1: Behavioral Observation. Every action an agent takes — every API call, every data access, every decision — is captured as an event. Not through polling or periodic audits, but through continuous observation at the system level. On Linux, this uses eBPF for kernel-level tracing. The principle applies to any platform: observe everything, in real time, without modifying the system being observed.
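To make the event shape concrete, here is a minimal sketch of observation at the application level. Real deployments observe at the kernel level (eBPF on Linux) without modifying the observed system; this decorator, with hypothetical field names, only illustrates what a captured event might contain.

```python
import functools
import time

# Global event sink. Illustrative only: the field names below are
# assumptions, not substrate-guard's actual event schema.
EVENTS: list[dict] = []

def observed(action: str):
    """Wrap an agent-facing function so every call emits a structured event."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record even on failure: observation must be unconditional.
                EVENTS.append({
                    "action": action,
                    "fn": fn.__name__,
                    "latency_ms": (time.monotonic() - start) * 1000,
                    "ts": time.time(),
                })
        return inner
    return wrap

@observed("data_access")
def read_record(record_id: int) -> dict:
    # Stand-in for a real data access performed by an agent.
    return {"id": record_id}
```

The key design point survives the change of mechanism: every action becomes a timestamped, structured event in a stream, which is what the later layers consume.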

Layer 2: Policy Evaluation. Each observed event is evaluated against a set of declarative safety rules. These aren't configuration checks — they're behavioral assertions. "No single agent should access more than N customer records per hour." "API costs should not exceed $X per day." "Response latency should not exceed Y milliseconds." The policies describe what correct behavior looks like, and every event is checked against them continuously.
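A policy layer of this shape can be sketched as named predicates over events plus running state. The rule names and thresholds below mirror the examples in the text but are illustrative, not substrate-guard's actual rule set.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    agent: str
    action: str            # e.g. "read_customer_record", "llm_call"
    cost_usd: float = 0.0
    latency_ms: float = 0.0

@dataclass
class Policy:
    name: str
    check: Callable[[Event, dict], bool]   # returns True if compliant

def evaluate(event: Event, state: dict, policies: list[Policy]) -> list[str]:
    """Check one event against every policy; return names of violated rules."""
    return [p.name for p in policies if not p.check(event, state)]

# Declarative behavioral assertions, one per rule from the text.
policies = [
    Policy("max_records_per_hour",           # "no more than N records per hour"
           lambda e, s: s.get(("reads", e.agent), 0) <= 1000),
    Policy("daily_cost_budget",              # "API costs under $X per day"
           lambda e, s: s.get("cost_today", 0.0) + e.cost_usd <= 50.0),
    Policy("latency_ceiling",                # "latency under Y milliseconds"
           lambda e, s: e.latency_ms <= 120_000),
]
```

Because each rule is a pure predicate, the same set can be run against every event continuously, and the rules themselves stay readable as statements of what correct behavior looks like.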

Layer 3: Formal Verification. For critical properties, we go beyond rule checking to mathematical proof. Using Z3 SMT solvers, we can formally verify that certain behavioral properties hold — not just that they haven't been violated yet, but that they cannot be violated given the current configuration. This is the difference between "we haven't seen a fire" and "the building is made of materials that cannot burn."

Layer 4: Tamper-Evident Audit Trail. Every event, every policy decision, and every verification result is recorded in an HMAC-SHA256 chain. Each entry contains the hash of the previous entry, making the chain tamper-evident: any modification, deletion, or insertion of entries breaks the chain and is immediately detectable. This isn't blockchain — it's simpler, faster, and doesn't require distributed consensus. It's cryptographic proof that the audit record hasn't been altered.
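The chain construction fits in a few lines of standard-library Python. The key handling and record layout below are illustrative (a real deployment would load the key from a secrets store), but the mechanism is the one described: each entry's MAC covers the previous entry's MAC, so any edit breaks every subsequent link.

```python
import hashlib
import hmac
import json

KEY = b"audit-signing-key"   # illustrative; load from a secrets store in practice

def append(chain: list[dict], payload: dict) -> None:
    """Append an event, chaining its HMAC to the previous entry's MAC."""
    prev = chain[-1]["mac"] if chain else "genesis"
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(KEY, (prev + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "prev": prev, "mac": mac})

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any modification, deletion, or insertion fails."""
    prev = "genesis"
    for entry in chain:
        body = json.dumps(entry["payload"], sort_keys=True)
        expected = hmac.new(KEY, (prev + body).encode(),
                            hashlib.sha256).hexdigest()
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True
```

Verification is a single linear pass with a shared key — which is why this is simpler and faster than a blockchain: there is no consensus protocol, only cryptographic linkage.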

Layer 5: Attestation. The system can produce signed attestations — cryptographic statements that a specific set of properties held at a specific time, verifiable by any third party without revealing the underlying data.
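One way to get "verifiable without revealing the underlying data" is to commit to the evidence by hash and sign the statement. The sketch below is hypothetical in both field names and signing scheme: a production attestation would use an asymmetric signature so third parties can verify with a public key, whereas HMAC here merely stands in for the signing step.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"attestation-key"   # stand-in; real attestations use key pairs

def attest(properties: list[str], audit_chain_head: str) -> dict:
    """Produce a signed statement that the given properties held."""
    stmt = {
        "properties": properties,          # what held, e.g. ["no_violations"]
        "issued_at": int(time.time()),     # when it held
        # Commit to the audit evidence by hash: binds the claim to the
        # data without revealing any of it.
        "evidence_commitment": hashlib.sha256(
            audit_chain_head.encode()).hexdigest(),
    }
    body = json.dumps(stmt, sort_keys=True).encode()
    stmt["signature"] = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return stmt
```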

Layer 6: Offline Operation. The entire verification stack works without internet connectivity, using CRDT-based synchronization when reconnected. For air-gapped environments or systems where network isolation is a security requirement, verification continues uninterrupted.
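The reason CRDTs suit this reconnection problem can be shown with the simplest one, a grow-only counter: each node counts events locally while isolated, and states merge by element-wise maximum, so the result is the same regardless of sync order and nothing is lost or double-counted. This is a textbook G-counter, not substrate-guard's specific data structure.

```python
def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two G-counter states: per-node element-wise max."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def total(counter: dict[str, int]) -> int:
    """Total events observed across all nodes."""
    return sum(counter.values())

# Two nodes count independently while disconnected...
node_a = {"node_a": 5}
node_b = {"node_a": 3, "node_b": 2}   # node_b last saw node_a at 3

# ...and converge on reconnect, in either merge order.
merged = merge(node_a, node_b)
```

Merge is commutative, associative, and idempotent, which is exactly what an air-gapped verifier needs: it can sync whenever connectivity returns, in any order, without coordination.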

Real Numbers

This isn't a whitepaper or a proposal. substrate-guard is deployed in production on two independent ecosystems simultaneously:

Deployment 1 (AI Research Agency): 7 OPA-based policy rules, daily audit cycles at 04:00 UTC, 2,943+ events processed, 0 violations, running for 6+ days.

Deployment 2 (SUBSTRATE Ecosystem): 10 Python-based safety rules, hourly audit cycles (46+ completed), 12,794 events processed (4,270 artifacts + 8,524 LLM trace spans), 70 violations detected (65 historical duplicates + 5 latency timeouts), HMAC chain verified intact from genesis.

Combined: 15,737+ events across two independent ecosystems with different rule sets, different audit frequencies, and different operational contexts. The same verification framework adapted to both. 0.02% violation rate on current production after initial audit.

The violations matter as much as the clean records. The 65 duplicate title detections prove the system catches real problems rather than merely confirming silence. The 5 latency violations caught LLM calls that exceeded 120-second thresholds — operationally normal, but important to flag and document.

The framework generates compliance evidence mapped to SOC 2 Type II controls (CC6.1, CC7.2, CC8.1, CC4.1), ISO 27001:2022 (A.8.15, A.8.16, A.5.23), and ISO/IEC 42001:2023 (AI Management System). Not as checkbox compliance, but as continuous, cryptographically verifiable evidence.

Why This Matters for Mission-Critical Infrastructure

Enterprises running AI agents on IBM Power, AIX, IBM i, or any mission-critical platform face a specific version of this problem. Their existing security stack — PowerSC, audit journals, exit points, SIEM integration — monitors activity and enforces configuration. But AI agents create behavioral patterns that configuration monitoring cannot catch.

An agent that has correct permissions, accesses data through approved channels, and never triggers a configuration violation can still behave incorrectly — making decisions that violate business logic, accessing data in patterns that indicate drift, or interacting with other agents in ways that create emergent risks nobody designed for.

Carol Woodbury's advice to "treat AI agents like users with profiles and minimum authority" is exactly right as a starting point. But authority management tells you what an agent can do. Behavioral verification tells you what an agent is doing — and proves whether it's correct.

The gap between "can do" and "is doing correctly" is where the real risk lives. Closing it requires moving from configuration compliance to behavioral verification — from monitoring what's allowed to proving what's correct.

The Approach

We're not proposing that enterprises replace their existing security infrastructure. PowerSC, audit journals, SIEM integration — these are essential and should remain. What we're proposing is an additional layer: formal behavioral verification that sits alongside existing monitoring and adds mathematical proof to what is currently observational evidence.

The verification framework is open source. The paper describing its architecture is published on arXiv (2603.21149). The code is on GitHub. The production data — 15,737+ events across two independent deployments — is real and growing daily.

For enterprises evaluating how to secure AI agents on mission-critical infrastructure, the question is not whether to monitor agent behavior — that's table stakes. The question is whether monitoring alone is sufficient, or whether the consequences of incorrect agent behavior on systems that run banking, healthcare, insurance, and government operations demand something stronger: formal proof that agents are doing what they're supposed to do, continuously, verifiably, and tamper-evidently.

We believe the answer is obvious. But we're researchers, not salespeople. The data speaks for itself.


Octavian Untila is the founder of Aisophical SRL and creator of the SUBSTRATE autonomous AI ecosystem platform. substrate-guard is open source at github.com/octavuntila-prog/substrate-guard. The formal verification paper is available at arXiv:2603.21149.

For enterprises interested in evaluating behavioral verification for AI agents on mission-critical infrastructure, contact: octav@aisophical.com
