r/ClaudeAI 15d ago

Claude Code Workflow MCP server for the TLA+ model checker tla-rs

Hi all,

Just shipped an MCP server some of you might find useful: **tla-mcp**.

TLA+ is a formal-spec language for designing concurrent and distributed
systems. You describe what your protocol should do and a model checker
tries every reachable state to catch invariant violations, deadlocks,
race conditions you didn't see coming. With tla-mcp registered, Claude
Code can call the checker as a first-class tool: validate a spec, run a
bounded check with a counterexample trace, replay specific scenarios, all from inside the chat.

Tool descriptions are deliberately opinionated about how the model
should use the checker (budget all limits upfront, treat `limit_reached`
as inconclusive, look at the last transition of a trace first) so the
guidance survives context truncation.

Install + client config snippet + tour of the four tools is on the
landing page: **https://fabracht.github.io/tla-rs/**

It's an experiment. Feedback and bug reports welcome.
1 Upvotes

5 comments sorted by

2

u/[deleted] 15d ago

[removed] — view removed comment

1

u/Anxious_Tool 14d ago

Thanks. The "specify, prove, then implement" framing isn't accidental in the tool descriptions. It took a few iterations to land. Early sessions showed agents defaulting to generate-first even when the checker was available. They treated check_spec as a post-hoc review step rather than a design step. The current descriptions push the other way: validate_spec is explicitly described as something to call after every edit, and check_spec tells the model to budget all three limits upfront and read counterexample traces back-to-front. It's prompt engineering happening inside the tool registry. The fact that it survives context truncation is the side benefit you mentioned.

The other thing that makes this combination promising: LLMs are good at writing TLA+ specs. The historical barrier was always the syntax, which is unforgiving and an acquired taste, plus the worldview shift of specifying what holds rather than what runs. With the model handling the translation from intent to TLA+ and the checker keeping it honest, the activation energy for formal methods drops a lot. The bet behind tla-mcp is that the bottleneck for adoption was never the math. It was the keystrokes.

2

u/pquattro 15d ago

his is a great use of MCP for formal verification workflows. I’ve used TLA+ for protocol design and found that integrating model checking into dev tools (like this) significantly improves iteration speed. One thing to watch: tla-rs’s output format can vary between versions, so pinning the CLI version in the MCP server might help avoid brittle parsing. Have you considered adding a tool to extract the ‘Next’ actions from counterexample traces? That’s often the most actionable part of a failure report.

1

u/Anxious_Tool 14d ago

Thanks. Both good points, and both already non-issues by design.

Version pinning / brittle parsing: tla-mcp doesn't wrap the tla CLI or parse its text output. It links tla-rs as a library and returns typed Rust structs serialized to a versioned JSON schema (schema_version: "1", frozen, bumped explicitly on breaking changes). The failure mode you're warning about, CLI output drift between versions, can't happen here.

Extracting Next actions from counterexamples: already there. check_spec's invariant_violation outcome returns:

{
  "status": "invariant_violation",
  "invariant": "<name>",
  "trace": [ <StateSnapshot>, ... ],
  "actions": [ null, "<action>", "<action>", ... ],
  "stats": { ... }
}

The actions array is index-aligned with trace. actions[i] is the action name (the disjunct of Next) that produced trace[i]. actions[0] is null because the initial state has no predecessor. The tool description tells the agent to read actions[len-1] first when investigating a violation, since that's almost always where the bug is.

One thing that isn't there yet: action parameters. The array gives you the action name but not its parameter bindings (Reader(p1) vs Reader(p2), which process, which message). If you're doing protocol design that's almost certainly what you want. Worth opening an issue for.