Nomograph Labs

Benchmark Results

sysml-bench · 132 tasks · 4 models · 40+ conditions · N=3–5

On the Horizon

Two arXiv preprints in preparation from this data. Paper A argues that representation matters more than retrieval, with O4 and O12 as the designated primary hypothesis pair and a pre-registered confirmatory design on a second corpus. Paper B argues that aggregate benchmarks hide task-level structure, with O1, O10, and O8 as lead evidence and a methodological contribution on per-task analysis with paired effect sizes. Both frame the current study as exploratory. We will link them here when they are posted.

Overview

Corpus

Eve Online Mining Frigate SysML v2 model. 19 files, 798 elements, 1,515 relationships.

Models

Claude 3.5 Sonnet, GPT-4o, GPT-4o-mini, o3-mini.

Replication

N=3 exploratory sweeps. N=5 for key comparisons (O1, O3, O4, O12). T=0.3 for all CLI runs.

Caveat

All results exploratory. None survive multiple comparison correction across 14 observations.

sysml-bench is an exploratory benchmark evaluating how tool-augmented LLMs perform on structured engineering tasks in SysML v2. 132 tasks across 8 categories (discovery, reasoning, explanation, layer, boundary, vector-sensitive, structural trace, corpus scaling), tested with 4 models and 40+ experimental conditions.

The study generated 14 observations. Three achieved nominal statistical significance. None survive correction for running 14 tests simultaneously. We want to be clear about what this study is and isn't: it is a well-characterized exploratory study that identifies patterns and estimates effect sizes. It is not a confirmatory study that proves those patterns are real. The contribution is the benchmark methodology, the identification of task-tool interaction as a key variable, and the effect size estimates that make confirmatory follow-up designable.

That said, the patterns are consistent and the effect sizes are large. We think they are worth sharing and worth testing further.

Two Central Theses

The 14 observations, three significance tests, and extensive null results converge on two theses. These are the study's intellectual contribution: specific, testable claims that a second corpus can confirm or refute.

Thesis 1

Representation > Retrieval

Lead evidence: O4 (d=0.83), O12 (d=0.75), O8 (d=0.64)

Null results that constrain the claim: O5, O6, O9, O14

Representation matters more than retrieval

For AI systems working with structured knowledge, the form in which information is presented to the model matters more than the mechanism used to find it.

Most of the energy in the tool-augmented LLM space is going into retrieval infrastructure: vector databases, graph traversal engines, multi-hop reasoning chains. On our benchmark, every retrieval intervention we tested produced null results: vector search (O5), planning tools (O6), graph traversal at 2–3 hops (O9), and graph traversal at 4–5 hops (O14).

Every representation and guidance intervention produced large effects: pre-rendered views (O4, d=0.83), guided tool selection (O12, d=0.75), and CLI search output over RAG chunks on discovery tasks (O8, d=0.64).

The implication, if these patterns hold on other corpora, is that the marginal investment in retrieval infrastructure has near-zero return on structured engineering corpora of this scale. The marginal investment in representation infrastructure (views, summaries, structured renderings of domain artifacts) has large return. This is testable across every domain where LLMs operate on structured data.

Thesis 2

Aggregate Benchmarks Lie

Lead evidence: O1 (per-task d: −0.400 to +0.800)

Supporting: O10 (bimodal scaling), O8 (task-type interaction), O4 (range +1.000 to −0.200)

Aggregate benchmarks hide task-level structure

Standard benchmark reporting (mean accuracy across tasks) actively misleads when task-level variance dominates condition-level variance.

Our aggregate tool comparison showed no significant difference (O1, p=0.391). A naive reading: tools don't matter. But per-task analysis revealed effect sizes ranging from −0.400 to +0.800. Enormous, opposite-signed effects that cancel in the mean. The aggregate null is not "no effect." It is "large effects in both directions, hidden by averaging."

This pattern recurs throughout the data: O10's bimodal scaling (easy tasks stay easy, hard tasks become impossible), O8's task-type interaction (CLI wins discovery, RAG edges ahead on reasoning), and O4's per-task range of +1.000 to −0.200.

Every benchmark that reports a single accuracy number is potentially hiding this structure. The methodological contribution is demonstrating that per-task analysis with paired effect sizes is necessary to surface real patterns in tool-augmented LLM evaluation.
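The paired effect sizes used throughout this analysis can be computed with a small helper. This is a sketch (the function name is ours, not the benchmark's): Cohen's d for paired samples is the mean of the per-item score differences divided by the standard deviation of those differences.

```python
from statistics import mean, stdev

def paired_cohens_d(cond_a: list[float], cond_b: list[float]) -> float:
    """Cohen's d for paired samples (d_z): mean of the per-item
    score differences over the SD of those differences."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / stdev(diffs)

# A condition that is consistently a little better yields a large
# paired d even when the raw score gap is modest:
d = paired_cohens_d([0.9, 0.8, 0.7, 0.6], [0.5, 0.5, 0.5, 0.5])
```

Note that the denominator is the SD of the differences, not of either condition: consistent per-task pairing is what lets modest raw gaps produce large paired effect sizes.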


Observations

14 observations from the benchmark. We present the five most interesting ones in detail, then summarize the remaining nine. All p-values are from individual tests and have not been corrected for running 14 tests simultaneously. When corrected, none remain significant across the full set.

0.887

Guided graph score vs 0.750 unguided. Matches the 2-tool baseline at 0.880.

p=0.009 (paired t, uncorrected)

d=0.75, N=16 tasks

Power: 0.80

O12 — Context engineering outperforms tool restriction

The naive response to "too many tools hurt performance" is to restrict the tool set. The better response is a sentence in the system prompt. When agents are instructed to start with search and read_file, escalating to graph tools only when search is insufficient, the 13-point discovery penalty from over-tooling disappears entirely. Performance with 6 tools matches and marginally exceeds the 2-tool baseline (0.887 vs 0.880).

The affected tasks (D11, D12, D16, D6) are those where unguided agents select structurally complex tools for attribute-lookup tasks that search handles trivially. The agent doesn't need graph traversal to find a part's mass. It needs search. But without guidance, it reaches for the most powerful tool available, and the overhead of using it (more tokens, more turns, more opportunities to go off track) costs accuracy.

This is the only adequately powered observation in the study (power=0.80). It is also the lowest nominal p-value (0.009). If we had to pick one finding to bet on replicating, this would be it.
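The intervention itself is tiny. A hypothetical version of the guidance hint, wired in as a prompt suffix rather than a tool-set restriction, might look like this (the study's exact wording is not reproduced here; the sentence below is illustrative):

```python
# Illustrative ~50-token tool-selection hint of the kind O12 describes.
# The benchmark's exact prompt wording may differ.
GUIDANCE = (
    "Start with search and read_file. Escalate to graph tools "
    "(trace, query, inspect) only when search is insufficient."
)

def build_system_prompt(base: str, guided: bool) -> str:
    """Append the guidance sentence instead of removing tools."""
    return f"{base}\n\n{GUIDANCE}" if guided else base
```

The design point O12 makes: the full 6-tool set stays available, and one sentence of context engineering recovers the 13-point penalty that tool restriction would otherwise be used to avoid.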

Config           Score   $/task
Pre-rendered     0.873   $0.008
Agent-assembled  0.490   $0.053

Tasks E4 and E7: 1.000 with render, 0.000 with agent assembly.

p=0.047 (Wilcoxon, uncorrected)

d=0.83, N=8 tasks

O4 — Pre-rendered views outperform agent-assembled context

On explanation tasks, pre-rendered model views scored 0.873 vs 0.490 for letting the agent assemble its own context. A 38-point gap. Two tasks collapsed entirely without rendering: 1.000 with a pre-rendered view, 0.000 with agent assembly. The agent-assembled approach exhausts turns building context that a single render call provides.

The advantage is explanation-specific. On discovery tasks, pre-rendering scored 0.719, worse than search (0.880). Pre-rendering the wrong view adds noise, not signal. This matters: it means pre-rendering is not a universal improvement. It is a task-dependent one, and the task type determines whether it helps or hurts.

The cost difference is striking: $0.008 per task with rendering vs $0.053 with agent assembly. Better results at 6.6× lower cost. The pre-rendered view does the work at index time that the agent would otherwise do at query time, and it does it once instead of per-query.

This has the largest effect size in the study (d=0.83) but is underpowered (power=0.53, needs 14 tasks for 80%). The effect is real but we can't be confident in its magnitude yet.

Task type   CLI     RAG
Discovery   0.855   0.566
Reasoning   0.323   0.459

Discovery: p=0.021 (paired t), d=0.64

Reasoning: p=0.403 (not significant)

O8 — Retrieval strategy interacts with task type

CLI tool-based search dominated structured lookup (+29 points over RAG, p=0.021, d=0.64, N=16 tasks). RAG edged ahead on cross-file reasoning (+14 points, p=0.403, not significant), likely because it injects all relevant context at once, avoiding the problem where the agent runs out of turns before it can chain together enough tool calls to answer multi-step questions.

The CLI advantage on discovery is driven by 5 tasks where RAG scores 0.000: tasks requiring iterative tool-mediated retrieval that single-shot context injection cannot perform. The model needs to search, read a result, search again based on what it found, and repeat. RAG gives it everything at once, which is the right coverage but the wrong format for these tasks.

Neither retrieval architecture is universally better. This is interesting because it suggests that the right approach is not to pick one, but to route queries to the right strategy based on task type. Whether that routing can be done cheaply enough to be practical is an open question.
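A routing layer of the kind this suggests could be as simple as a per-task-type dispatch. This is a hypothetical sketch, not something the benchmark implements, and it glosses over the hard part (classifying the incoming query cheaply):

```python
def route(task_type: str) -> str:
    """Dispatch to the strategy that won on each task type in O8:
    CLI tool search for structured lookup, single-shot RAG context
    injection for cross-file reasoning. Hypothetical sketch only."""
    return "cli_search" if task_type == "discovery" else "rag"
```

The open question from the text remains: this only pays off if the task-type classifier is cheaper than the accuracy it recovers.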

Task   Search   Graph
D11    1.000    0.200
D6     1.000    0.400
D13    0.600    1.000
D10    0.700    1.000

Aggregate: p=0.391 (not significant). Per-task: up to 0.80 difference.

O1 — Tool-task interaction is heterogeneous

Graph tools hurt discovery tasks, help layer tasks, and are near-neutral on reasoning. The aggregate difference is not statistically significant (paired t-test p=0.391, N=16) because the effect is task-dependent: graph tools help on tasks requiring structural completeness checking (D10, D13: +0.300 to +0.400) and hurt on tasks where search retrieves the answer directly (D11, D6: −0.600 to −0.800).

The pattern holds across all four models tested, making it one of the most robust qualitative observations in the benchmark despite the null aggregate test. This is the lead evidence for Thesis 2: the aggregate null is not "tools don't matter." It is "tools matter enormously, but in opposite directions on different tasks, and the average hides everything interesting."
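The cancellation is easy to reproduce from the per-task scores in the table above (a sketch over the four tasks shown; the full analysis covers all 16):

```python
from statistics import mean

# Per-task scores from the O1 table (search vs graph tool sets)
search = {"D11": 1.000, "D6": 1.000, "D13": 0.600, "D10": 0.700}
graph  = {"D11": 0.200, "D6": 0.400, "D13": 1.000, "D10": 1.000}

per_task = {t: round(graph[t] - search[t], 3) for t in search}
aggregate = round(mean(per_task.values()), 3)
# per_task swings from -0.8 to +0.4; the aggregate is a muted -0.175
```

Per-task differences of −0.8 and +0.4 average to −0.175: exactly the "large effects in both directions, hidden by averaging" pattern of Thesis 2.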

Corpus              Score
19 files            0.880
95 files, search    0.423
95 files, graph     0.389
95 files, +vectors  0.409

Failure modes: 55% budget exhaustion, 27% reasoning errors, 0% search failure.

O10 — Corpus scale is the dominant difficulty factor

Performance roughly halves from 19 to 95 files (0.880 to 0.423). Graph tools and vector search make things worse at scale: schema overhead and retrieval noise compound without compensating signal. 11 of 20 scaling tasks fall below 0.333. The distribution is bimodal: easy tasks remain easy, hard tasks become impossible.

The failure mode is revealing: 55% of the time the agent ran out of turns before finishing. 27% were reasoning errors. 0% were search failures. The agent can find the information. It just can't process enough of it within the turn budget to reach the right answer. This suggests the path forward is better orchestration (smarter turn allocation, hierarchical planning) rather than better search.

This is the observation that keeps us honest. Our other results come from a 19-file corpus. Real engineering repositories are hundreds or thousands of files. The scaling problem is unsolved by any method we tested, and small-corpus benchmarks produce optimistic estimates that may not transfer.

Remaining observations summarized. Full details in the benchmark repository.

Other observations

ID    Summary                                                                          Classification
O2    Model quality gap: Sonnet consistently outperformed OpenAI models                Descriptive
O3    o3-mini is the only model where graph tools help on reasoning (+0.056)           Exploratory (power=0.08)
O5    Vector search: exact tie with keyword search on small corpus (0.880 vs 0.880)    Null
O6    Planning tools (sysml_stat, sysml_plan): +0.035 on hard tasks, not significant   Null
O7    RFLP layer tasks: cli_full showed slight advantage (~0.25 effect)                Exploratory
O9    Graph tools at 2–3 hops: no benefit (d=0.16, power=0.07)                         Null
O11   Turn budget is a partial bottleneck but not the whole story                      Descriptive
O13   Few-shot examples hurt mini models (GPT-4o-mini, o3-mini)                        Exploratory
O14   Graph tools at 4–5 hops: not significant (d=0.44, power=0.19)                    Null (underpowered)

Methodology

Scoring

Per-field structured scoring. Each task defines expected fields with typed scorers: Bool (exact match), Float (numeric within tolerance), Str (exact string match), StrContains (case-insensitive substring), ListStr (F1 score with 0.8 threshold, binarized). Task score = mean of field scores. Condition score = mean of task scores across N runs.

Known issue

ListStr binarization at 0.8 creates cliff effects. A response scoring 0.79 on F1 receives 0.0; one scoring 0.80 receives 1.0. This amplifies variance.
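The ListStr cliff is easy to see in a reconstruction of the scorer (a sketch from the description above; the benchmark's actual implementation lives in the repository):

```python
def score_list_str(expected: list[str], got: list[str],
                   threshold: float = 0.8) -> float:
    """ListStr scorer as described: F1 over items, then binarized.
    The cliff: F1 just below the threshold scores 0.0, at it, 1.0."""
    exp, pred = set(expected), set(got)
    tp = len(exp & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(exp)
    f1 = 2 * precision * recall / (precision + recall)
    return 1.0 if f1 >= threshold else 0.0

# Recalling 4 of 5 items (F1 ≈ 0.89) scores full marks;
# recalling 3 of 5 (F1 = 0.75) scores zero.
```

One missed item moves the field score from 1.0 to 0.0 with nothing in between, which is the variance amplification the Known issue describes.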

Tool sets

Tool Set    Tools   Schema Tokens  Description
cli_search  2       ~250           search + read_file
cli_graph   6       ~1120          search + trace + check + query + inspect + read_file
cli_render  7       ~1200          cli_graph + sysml_render
cli_full    9       ~1500          cli_render + sysml_stat + sysml_plan
+guided     varies  +~50           System prompt with tool selection hint
+vectors    varies  +0             Adds fastembed HNSW vector index

Known confounds

cli_search (2 tools, ~250 schema tokens) vs cli_graph (6 tools, ~1120 schema tokens) confounds tool count, schema overhead, and selection complexity. When graph tools "hurt," the cause could be the tools themselves, the schema overhead consuming context window space, or the difficulty of choosing the right tool from a larger set. O12 (guided selection) partially disentangles selection difficulty. A schema ablation experiment (same tools, reduced schema) would isolate overhead.

Ground truth

Created by the primary author from SysML v2 model inspection. Two corrections applied during experimentation: D16 (35.0→37.0), R5 (3→2). Structural trace scoring schema corrected in v2 (ST2/ST7 scores changed 0.542→0.865). No independent verification. Single-author ground truth is a limitation.

Statistical Context

Holm-Bonferroni

Step-down correction at α=0.05 across 14 observations. Controls family-wise error rate.

Effect size conventions

Cohen's d: 0.2 = small, 0.5 = medium, 0.8 = large. Calibrated for behavioral science; benchmark score differences may have different practical significance.

Multiple comparison correction

14 observations tested at α=0.05 yields a family-wise error rate of about 51%. When corrected (Holm-Bonferroni step-down), no observation remains significant:

Rank  Observation     Raw p  Holm threshold  Survives?
1     O12 (guided)    0.009  0.0036          No
2     O8 (discovery)  0.021  0.0038          No
3     O4 (render)     0.047  0.0042          No

If O4 and O12 are designated as the only two hypotheses under test (Holm-Bonferroni at m=2), both survive: O12 adjusted p=0.018, O4 adjusted p=0.047. This designation was made after seeing the data. It becomes pre-registered only if declared before collecting new data on a second corpus.
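Both the family-wise error rate and the step-down correction are a few lines to verify (a sketch; the function name is ours):

```python
def holm_adjusted(pvals: list[float]) -> list[float]:
    """Holm-Bonferroni step-down adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])  # (m - rank) * p, monotone
        adj[i] = min(1.0, running)
    return adj

# 14 uncorrected tests at alpha=0.05: FWER = 1 - 0.95^14, about 51%
fwer = 1 - (1 - 0.05) ** 14

# At m=14 even the best p-value fails (0.009 * 14 = 0.126 > 0.05);
# at m=2 (O12 and O4 designated in advance) both survive.
adj2 = holm_adjusted([0.009, 0.047])  # [0.018, 0.047]
```

This reproduces the numbers in the text: O12 adjusted p=0.018 and O4 adjusted p=0.047 at m=2, and nothing surviving at m=14.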

Power analysis

Only one observation (O12) has enough statistical power to reliably detect its effect. The power analysis tells us exactly how large a follow-up study needs to be. This is itself a contribution: it makes the confirmatory work designable.

Observation               Effect size (d)  Current tasks  Power  Tasks for 80%  Tasks for 80% (α=0.025)
O12 (guided selection)    0.75             16             0.80   17             21
O8 (CLI vs RAG)           0.64             16             0.70   20             25
O4 (render vs assembly)   0.83             8              0.53   14             17
O1 (heterogeneity)        0.22             16             0.13   163            210
O14 (graph 4–5 hops)      0.44             8              0.19   42             53

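The "Tasks for 80%" column can be approximated from first principles. This is a normal-approximation sketch; the exact values above come from the noncentral-t calculation, which demands a few more tasks at small n:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(d: float, n: int, z_crit: float = 1.96) -> float:
    """Normal-approximation power of a two-sided paired test at
    alpha=0.05. Optimistic at small n vs the exact t-based numbers."""
    return norm_cdf(d * sqrt(n) - z_crit)

def tasks_for_power(d: float, target: float = 0.80,
                    z_crit: float = 1.96) -> int:
    """Smallest n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, z_crit) < target:
        n += 1
    return n

# d=0.75 gives 14 here vs 17 in the table (the t-distribution penalty
# at small n); pass z_crit=2.24 to approximate the alpha=0.025 column.
```

The gap between the approximation and the table is largest exactly where it matters, at small task counts, which is one more reason the O4 power estimate (8 tasks) should be treated cautiously.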
Threats to Validity

Six identified threats. The single-corpus limitation is the most fundamental.

T1: Single corpus. All primary observations derive from one 19-file SysML v2 model. The corpus is small enough that exhaustive search may substitute for structured traversal, potentially explaining why graph tools show no advantage. All claims are scoped to "on our benchmark corpus."

T2: Confounded tool sets. cli_search vs cli_graph differs in tool count, schema overhead, and selection complexity simultaneously. O12 partially disentangles selection difficulty.

T3: Multiple comparisons. 14 observations at α=0.05 yields ~51% family-wise error rate. No observation survives full correction.

T4: Underpowered tests. Only O12 achieves 80% power. Most observations would need 30–300+ tasks to detect their effects reliably.

T5: Scoring methodology. ListStr binarization at 0.8 creates cliff effects. StrContains scoring for explanation tasks may be too lenient.

T6: Ground truth. Created and verified by a single author. Two corrections applied mid-experiment. No inter-rater reliability assessment.

What's Next

Confirmatory study

Pre-register O4 and O12 as primary hypotheses before collecting data on a second SysML v2 corpus. Design task sets for 80% power: 20+ explanation tasks, 20+ discovery tasks. This converts the post-hoc designation into genuine pre-registration.

Additional experiments we'd like to run: continuous scoring (remove the 0.8 binarization threshold), schema overhead ablation (isolate whether the graph tool penalty is from schema tokens vs selection difficulty), and deeper analysis with model explanation techniques to understand why these effects occur, not just that they occur.

Publishing

Two arXiv preprints in preparation. The first argues that representation matters more than retrieval (O4, O12 as the designated primary hypothesis pair). The second argues that aggregate benchmarks hide task-level structure (O1, O10, O8). Both frame the current study as exploratory with a confirmatory design for follow-up.

The benchmark harness, task definitions, ground truth, and scoring code will publish as a community artifact for future comparison.