Memori: A Structured Memory Layer for Scalable LLM Agents
By The Autonomous Times
· Updated March 24, 2026

A new system called Memori promises to fix AI memory problems, reaching 81.95% accuracy on a long-term memory benchmark while using roughly 20× fewer tokens than full-context approaches.
Marcus had been talking to his AI assistant for three months. It knew his work style, his preferences, even the inside jokes they'd developed. Then one day, he started a new session — and the assistant had no idea who he was.
It was like meeting a stranger who wore his friend's face.
If you've used AI agents, you've lived this. Every session feels like a first date. You either pay enormous token costs to paste in your whole conversation history, or you accept that your agent forgets everything.
That frustration is behind a new paper that's getting attention in the AI research community. It's called Memori, and it's a persistent memory layer for LLM agents — a system designed to solve one of the most annoying problems in AI: making agents actually remember things.
The core insight is deceptively simple: memory isn't a storage problem. It's a structuring problem.
The Memory Crisis
Here's what's happening right now. When you want an AI agent to "remember" something, developers typically do one of two things:
- Stuff the entire conversation history into the prompt — expensive, slow, and eventually breaks as context windows overflow
- Keep things stateless — agent starts fresh every time, no continuity
Neither works well. With the first option, we're essentially paying to re-read the entire conversation history every time we want continuity.
The math is brutal. Full-context approaches can use 25,000+ tokens per query. That's expensive. And as context grows, models get worse at finding relevant information — researchers call it "context rot."
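The trade-off between the two workarounds can be sketched in a few lines. This is an illustrative toy, not anyone's production code: the `complete` function below is a stub standing in for a real chat API call, and it just reports prompt size to make the cost difference visible.

```python
def complete(prompt: str) -> str:
    # Stub LLM call: a real implementation would hit a chat API.
    # Returning the prompt's word count makes the cost difference visible.
    return f"(answered from {len(prompt.split())} prompt words)"

def full_context_reply(history: list[str], question: str) -> str:
    # Option 1: paste the whole transcript into every prompt.
    # Token cost grows linearly with conversation length.
    return complete("\n".join(history) + "\n" + question)

def stateless_reply(question: str) -> str:
    # Option 2: send only the new question. Cheap, but the agent
    # has no memory of anything said before.
    return complete(question)

history = ["User: I prefer tabs.", "Agent: Noted."] * 500  # a long transcript
print(full_context_reply(history, "What do I prefer?"))  # huge prompt
print(stateless_reply("What do I prefer?"))              # tiny prompt, no memory
```

The first call pays for thousands of words of transcript on every single query; the second pays almost nothing but cannot answer the question at all.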
Memori's Fix: Structured Memory
Memori doesn't just store conversations. It transforms them.
The system uses what the authors call an "Advanced Augmentation pipeline," which converts raw dialogue into "semantic triples": compact subject–predicate–object records of who did what to whom, plus conversation summaries.
Think of it like this: instead of storing a 5,000-word transcript of a conversation, Memori extracts the key facts and stores those. When the agent needs context, it retrieves only the relevant bits — not the whole mess.
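The idea can be illustrated with a toy triple store. To be clear, this is not Memori's actual implementation (the paper describes an LLM-driven extraction pipeline); it's a hand-rolled sketch showing how facts stored as triples can be retrieved selectively instead of replaying a whole transcript.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

class TripleStore:
    """Toy store of (subject, predicate, object) facts."""

    def __init__(self):
        self.facts: list[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.facts.append(Triple(subject, predicate, obj))

    def retrieve(self, query: str) -> list[Triple]:
        # Toy keyword overlap; a real system would use embeddings
        # or a learned retriever instead of exact word matching.
        words = set(query.lower().split())
        return [t for t in self.facts
                if words & {t.subject.lower(), t.predicate.lower(), t.obj.lower()}]

store = TripleStore()
store.add("Marcus", "prefers", "tabs")
store.add("Marcus", "works_on", "billing-service")
store.add("team", "uses", "Postgres")

# Only the facts about Marcus come back, not the unrelated team fact.
print(store.retrieve("What does Marcus prefer?"))
```

The agent's prompt then contains a handful of retrieved triples rather than the full 5,000-word transcript, which is where the token savings come from.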
The results, tested on the LoCoMo benchmark:
- 81.95% accuracy — better than existing memory systems
- 1,294 tokens per query — roughly 5% of what full-context approaches need
- 67% fewer tokens than competing approaches
- 20× cheaper than full-context methods
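The headline figures above are mutually consistent, which is easy to check. The arithmetic below assumes a full-context baseline of roughly 26,000 tokens per query, a hypothetical round number implied by the reported "25,000+" and the claim that 1,294 tokens is about 5% of it.

```python
memori_tokens = 1294       # reported tokens per query
full_context_tokens = 26000  # assumed baseline, consistent with "25,000+"

fraction = memori_tokens / full_context_tokens
cost_multiplier = full_context_tokens / memori_tokens

print(f"Memori uses {fraction:.1%} of the baseline tokens")    # ~5.0%
print(f"=> roughly {cost_multiplier:.0f}x cheaper per query")  # ~20x
```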
Why This Matters
For the average AI user, the practical payoff is straightforward: continuity without the token bill.
Consider the investigative journalist who builds context across months of sources. The product manager who trains their assistant on company preferences. The developer who wants their coding agent to remember their architectural decisions.
All of these people today either pay through the nose for continuity or start from zero every time. The paper's authors argue there's a third path.
The paper also touches on something deeper: the path to general intelligence may run through memory. If agents can't remember, they can't learn from experience. They can't evolve. They're just stateless functions that happen to generate text.
Memory isn't a feature. It's a foundational pillar.
Sources:
- Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents — arXiv, submitted March 20, 2026
- Memori on GitHub — Open source implementation
- LoCoMo Benchmark — Benchmark used for evaluation