7 Ways Mimir Makes LLMs Safe Enough for People Who Don’t Trust Each Other

John W8MEJ Menerick · April 9, 2025

Most LLM architectures quietly assume one thing: somebody, somewhere, is trusted.

The cloud provider can see your prompts. The model host can see your weights. The infrastructure team can see everything if they really want to.

Mimir is built for the world where that assumption breaks.

It’s a proof-of-concept framework for running collaborative LLM inference across mutually distrustful parties—using a shared Transformer model—without exposing prompts or weights to anyone who shouldn’t see them. In other words: inference as a cryptographic treaty, not a friendly API call.

“Collaborative LLM inference where secrets stay secret.”

Here are the seven most interesting ideas hiding in the repo.


1. Inference Becomes a Multi-Party Treaty, Not a Single API Call

Most LLM stacks assume a simple picture: one client, one model host, one request. Mimir’s mental model is closer to a negotiation table.

The README describes it as enabling multiple mutually distrustful parties to jointly run autoregressive inference on a shared model, without revealing private inputs or parameters.

Parties A and B each submit prompts into a Coordinator that orchestrates the session. Instead of “call /v1/chat/completions and hope for the best,” inference becomes a structured protocol requiring participation and agreement from all sides.
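To make the shape of that concrete, here’s a minimal Python sketch of such a session. Every name in it (CoordinatorSession, submit_prompt_share, the two-party setup) is illustrative, not Mimir’s actual gRPC surface:

```python
# Minimal sketch of inference-as-protocol. All names here are
# illustrative, not Mimir's actual API.
from dataclasses import dataclass, field

@dataclass
class CoordinatorSession:
    """A session that refuses to step until every party has contributed."""
    required_parties: set
    submitted: dict = field(default_factory=dict)

    def submit_prompt_share(self, party_id: str, share: bytes) -> None:
        if party_id not in self.required_parties:
            raise PermissionError(f"{party_id} is not part of this session")
        self.submitted[party_id] = share

    def step(self) -> str:
        # No unilateral inference: every required party must have submitted.
        missing = self.required_parties - self.submitted.keys()
        if missing:
            raise RuntimeError(f"waiting on parties: {sorted(missing)}")
        return "<next-token>"  # stand-in for the jointly computed token

session = CoordinatorSession(required_parties={"A", "B"})
session.submit_prompt_share("A", b"\x01\x02")  # A alone is not enough...
session.submit_prompt_share("B", b"\x03\x04")  # ...B must also join
print(session.step())
```

The point is structural: no single party can drive the model forward on its own.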

Why this is interesting:
It reframes LLMs from “services you consume” into shared infrastructure you don’t fully trust. That’s much closer to how large enterprises, coalitions, or cross-org collaborations actually operate.


2. MPC Does the Math, TEEs Guard the Keys

Mimir doesn’t pick between Multiparty Computation (MPC) and Trusted Execution Environments (TEEs); it layers them.

  • All the heavy math (attention, MLP, projections) runs over secret shares using MPC; a toy version of the sharing appears after this list.
  • Critical secrets (like decryption keys) live inside an enclave, which handles KMS decrypt and enforces mTLS-bound identity.
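As a toy illustration of the “MPC hides values” half of that split: with plain additive secret sharing over a prime field, each party’s share is uniformly random on its own. (Mimir’s shares additionally carry MACs; this sketch omits them.)

```python
import secrets

P = 2**61 - 1  # a Mersenne prime as a toy field modulus

def share(x: int, n: int = 2) -> list:
    """Split x into n additive shares; any n-1 of them are uniformly random."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reveal(parts: list) -> int:
    """Reconstruction needs every share; no proper subset learns anything."""
    return sum(parts) % P

activation = 123_456
s_a, s_b = share(activation)
assert reveal([s_a, s_b]) == activation
# Linearity for free: shares of x plus shares of y are shares of x + y,
# which is why additions inside the model cost no communication.
```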

The architecture diagram is explicit: a Coordinator with gRPC APIs, a Secure Attention module, MPC matmul, and an Enclave that anchors identity and key usage.

Why this is interesting:
MPC alone protects data from the host, but not from whoever controls keys. TEEs alone protect keys, but still force you to trust the enclave operator. Mimir’s design says: use MPC to hide values, TEEs to control capabilities. That’s a more realistic threat model for shared, cross-org AI infrastructure.


3. Attention and MLP Are Re-Engineered for Secrecy, Not Just Speed

Most ML papers treat attention and MLP layers as performance problems. Mimir treats them as privacy attack surfaces.

In the “How It Works” section:

  • All attention, MLP, and projection computations are done over secret shares.
  • Secure matrix multiplication uses Beaver triples with MACed shares, i.e., SPDZ-style authenticated MPC (a miniature simulation, minus the MACs, follows this list).
  • Even the exponential in softmax is approximated via a Chebyshev minimax polynomial to reduce information leakage (sketched at the end of this section).
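Here is that miniature Beaver-triple simulation: one secure multiplication in a single process, with the MACs, networking, and malicious-security machinery of real SPDZ elided. This is the textbook protocol shape, not Mimir’s code:

```python
import secrets

P = 2**61 - 1

def share(x, n=2):
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reveal(parts):
    return sum(parts) % P

def beaver_mul(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Multiply secret-shared x and y using a triple (a, b, c) with c = a*b."""
    d = reveal([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])  # d = x - a
    e = reveal([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])  # e = y - b
    z = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z[0] = (z[0] + d * e) % P  # the public d*e correction, added once
    return z

# Dealer-style triple for the demo; real deployments generate triples via MPC.
a, b = secrets.randbelow(P), secrets.randbelow(P)
x, y = 31_337, 271_828
z_sh = beaver_mul(share(x), share(y), share(a), share(b), share(a * b % P))
assert reveal(z_sh) == (x * y) % P
```

The trick is that only the masked differences d and e are ever opened, and because a and b are uniformly random, those openings reveal nothing about x or y.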

Why this is interesting:
It’s easy to talk about “confidential inference” in marketing copy. It’s much harder to go all the way down and ask: how does our choice of approximation for exp() affect what can leak through side channels or output distributions?

Mimir bakes that concern directly into the math, not just into the policy.
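To get a feel for what a polynomial stand-in for exp() looks like, here’s a cleartext sketch. It uses numpy’s least-squares Chebyshev fit in place of a true minimax construction, and the degree and interval are assumptions, not the repo’s parameters:

```python
import numpy as np

# Fit exp() on a bounded interval. Degree and interval are assumptions
# for this sketch, not the repo's actual parameters.
LO, HI = -8.0, 0.0                    # range after the usual max-subtraction
xs = np.linspace(LO, HI, 2001)
exp_poly = np.polynomial.Chebyshev.fit(xs, np.exp(xs), deg=8)  # least-squares

logits = np.array([2.0, 0.5, -1.0])
shifted = logits - logits.max()       # keeps inputs inside [LO, HI]
num = exp_poly(shifted)               # adds/multiplies only: MPC-friendly
approx = num / num.sum()
exact = np.exp(shifted) / np.exp(shifted).sum()
print(approx, exact)                  # the two should agree closely
```

A fixed polynomial matters for MPC because evaluating it is nothing but additions and multiplications, which work over secret shares, with no data-dependent branching or table lookups.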


4. Only the Next Token Escapes—The Rest Stays Secret-Shared

One of the most striking design choices: only the final predicted token is revealed. The key/value cache and intermediate activations never appear in plaintext; they remain secret-shared across parties, as the toy sketch after the list below illustrates.

That means:

  • No party sees the full internal state of the model.
  • You can’t trivially reconstruct another party’s prompt or the underlying weights from cached activations.
  • The “observable surface area” of each step is intentionally tiny.
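Here’s that asymmetry in toy form: the only value the parties open is the winning token index, while the logit shares stay split and individually meaningless. The secure_argmax below is a stand-in; real protocols implement it with secure comparisons rather than a trusted combiner:

```python
import secrets

P = 2**61 - 1

def share_vec(v):
    """Two-party additive shares of a non-negative integer vector."""
    a = [secrets.randbelow(P) for _ in v]
    b = [(x - ai) % P for x, ai in zip(v, a)]
    return a, b

def secure_argmax(sh_a, sh_b):
    """Stand-in: real protocols do this with secure comparisons and open
    only the winning index. Here a trusted combiner simulates the result."""
    logits = [(ai + bi) % P for ai, bi in zip(sh_a, sh_b)]
    return max(range(len(logits)), key=logits.__getitem__)

logits = [17, 3, 42, 8]               # pretend fixed-point logits, kept small
sh_a, sh_b = share_vec(logits)

token_id = secure_argmax(sh_a, sh_b)  # the only value anyone opens
print(token_id)                       # -> 2

# Everything else (KV cache, activations) stays like sh_a and sh_b:
# each half is uniformly random on its own, so neither party can read
# the other's prompt or the model's internal state from it.
```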

Why this is interesting:
Most inference APIs leak a lot of structure—logits, full probability distributions, or detailed intermediate traces for debugging. Mimir goes the other way: minimal disclosure by default, and you have to justify any extra observability you want.

In a world of prompt-injection, membership inference, and model extraction attacks, that feels like the right default.


5. A Dedicated Triple Service Turns Cryptography into a Utility

MPC protocols live and die by their Beaver triples: precomputed correlated randomness that makes secure multiplication cheap in the online phase. Mimir doesn’t hide this complexity; it elevates it into its own service.

The README’s architecture calls out a Triple Service responsible for:

  • Generating triples,
  • Performing sacrifice checks (i.e., sanity checks that triples weren’t tampered with; simulated below),
  • Feeding them into the Coordinator’s MPC matmul engine.
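The sacrifice check is worth seeing in miniature: one triple is spent to verify another, so a corrupted triple is caught before it ever touches live data. Below is a plaintext simulation of the standard SPDZ-style check; in the real protocol, rho, sigma, and tau are opened from authenticated shares rather than computed directly:

```python
import secrets

P = 2**61 - 1

def make_triple(tampered=False):
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    c = (a * b) % P
    if tampered:
        c = (c + 1) % P            # an adversary nudges c away from a*b
    return a, b, c

def sacrifice_check(t_keep, t_spend):
    """Consume t_spend to verify t_keep; a nonzero tau exposes a bad triple."""
    (a, b, c), (a2, b2, c2) = t_keep, t_spend
    r = secrets.randbelow(P)       # public random challenge
    rho = (r * a - a2) % P
    sigma = (b - b2) % P
    tau = (r * c - c2 - sigma * a2 - rho * b2 - rho * sigma) % P
    return tau == 0

assert sacrifice_check(make_triple(), make_triple())
assert not sacrifice_check(make_triple(tampered=True), make_triple())
```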

Why this is interesting:
Instead of burying triple generation in some helper function, Mimir treats it like a first-class infrastructure dependency, much like a key management service or feature flag system. It makes explicit that high-assurance cryptography isn’t free—you provision it, monitor it, and scale it like any other critical component.


6. The README Reads Like a Threat Model, Not a Sales Pitch

Tucked right after the happy-path description is a blunt warning: this is a research PoC, not a production-ready system.

The repo calls out limitations such as:

  • MPC math is correct but not formally malicious-secure.
  • No padding to hide sequence lengths.
  • Attestation is placeholder-only—no SEV-SNP/TDX verifier integration yet.
  • Containers are non-root, but not fully sandboxed with AppArmor/Seccomp.
  • Timing is tested, but the kernels are not fully constant-time.
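That last point is easy to underestimate, so here’s the classic illustration of the problem in Python (byte-string comparison rather than an MPC kernel, but the failure mode is the same: running time that depends on secret data):

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    # Leaks timing: returns at the first mismatch, so latency tells an
    # attacker how long the matching prefix was.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Touches every byte no matter where the mismatch is.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# The stdlib already ships this exact tool for byte comparison:
assert hmac.compare_digest(b"secret", b"secret")
```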

And then it lists very specific hardening work needed for a real deployment: constant-time kernels, full attestation verification, differential privacy or padding for length, SLSA-compliant CI, signed images, SBOMs, encrypted FS, key rotation, revocation…

Why this is interesting:
A lot of “confidential AI” projects hand-wave around these details. Mimir does the opposite—it documents its own shortcomings so you can have an honest conversation about where research ends and engineering begins.


7. Deployment Targets Look Like a Threat-Model Wishlist

For an early-stage PoC, the range of planned and supported deployment targets is… ambitious:

  • Local simulation via Docker Compose,
  • AWS Nitro Enclaves (EIF builds),
  • OCI Confidential VMs,
  • Torrents for distributing artifacts,
  • Hooks for Ethereum and Bitcoin anchoring,
  • A future Kubernetes deployment.

This isn’t just “run it on your laptop.” It’s a roadmap for running confidential inference in regulated, multi-cloud, and even on-chain contexts, where auditability and provenance matter as much as latency.

Why this is interesting:
It hints at a future where “prove how you ran this model” might involve checking an enclave attestation, verifying MPC parameters, and following a Merkle-anchored log on a public chain—all for a single inference service.


The Big Idea: Inference Provenance as a First-Class Citizen

At first glance, Mimir looks like a very specific thing: a proof-of-concept for confidential, multi-party LLM inference that combines MPC and TEEs.

But zoom out a bit, and it’s about something bigger:

  • Inference as a protocol, not a function call.
  • Secrecy as a default, not a bolt-on.
  • Cryptography and hardware security sharing the trust load instead of outsourcing it to “the cloud.”

The question it quietly asks is simple and uncomfortable:

If we had to design LLM systems for a world where nobody fully trusts anyone else, would they look more like Mimir than what we’re shipping today?

Because once you’ve seen an architecture where multiple parties can share a model without sharing their secrets, it’s hard to un-see how exposed most current inference stacks really are.
