Letta and Mem0: What AI Memory Looks Like When You Actually Need It

Memory is the agent feature everyone thinks they need until they sit down and try to specify what they mean by it. The word covers at least four different things: keeping a chat history under the context limit, recalling facts about a user, recovering state across deploys, and learning over time. Most apps need only the first one and call it a day with a vector database.

This post is about the cases where you genuinely need more, and the two projects that handle them well. Letta and Mem0 come at the problem from different directions. Letta treats the LLM like an operating system with a hierarchical memory layout. Mem0 is a memory layer you bolt onto whatever framework you already use.

If you want the broader agent framework picture first, the How to Build with AI Agents post covers where memory fits in the stack.

When memory matters and when it does not

Skip the memory layer if any of these are true:

Your conversations are short enough to fit in a single context window.
You can stuff a system prompt with the user's profile.
The application has no notion of returning users.
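The second bullet is usually as simple as it sounds: render a short profile into the system prompt on every request. Here is a minimal sketch. It assumes the profile is a small dict you already keep in your own database, and `build_system_prompt` and the field names are hypothetical.

```python
def build_system_prompt(base_instructions: str, profile: dict[str, str]) -> str:
    """Render a small user profile into the system prompt; no memory layer involved."""
    # Sort keys so the same profile always yields the same prompt text.
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(profile.items()))
    return f"{base_instructions}\n\nKnown facts about the user:\n{facts}"


prompt = build_system_prompt(
    "You are a coding assistant.",
    {"stack": "Python, Postgres, FastAPI", "name": "Riley"},
)
print(prompt)
```

You outgrow this pattern when the profile no longer fits comfortably into every call, or when it has to be updated from the conversation itself.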

Reach for a memory layer when:

Conversations span sessions and the user expects continuity.
You need to extract durable facts from chat history (preferences, history, decisions).
You want the agent to evolve, picking up new patterns from interactions.
Multiple agents need to share knowledge.

The line is fuzzier than it sounds. A customer support bot may want both: a recent-message buffer for the current ticket and a user-level memory for past tickets and preferences.
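Here is a rough sketch of that two-tier split, built around a hypothetical `SupportMemory` class. A bounded buffer holds the current ticket's messages and is cleared when a new ticket opens. A list of durable user facts persists across tickets. The sketch counts messages rather than tokens and drops old turns instead of summarizing them. Both are simplifications.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SupportMemory:
    """Hypothetical two-tier memory for a support bot.

    recent: bounded buffer of the current ticket's messages (short-term).
    user_facts: durable facts about the user that survive across tickets (long-term).
    """
    max_recent: int = 6
    recent: deque = field(default_factory=deque)
    user_facts: list[str] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})
        # Drop the oldest turns once the buffer is full (a real system might summarize them).
        while len(self.recent) > self.max_recent:
            self.recent.popleft()

    def remember(self, fact: str) -> None:
        if fact not in self.user_facts:
            self.user_facts.append(fact)

    def start_new_ticket(self) -> None:
        # Short-term state resets per ticket; long-term facts are kept.
        self.recent.clear()

    def build_messages(self) -> list[dict]:
        facts = "\n".join(f"- {fact}" for fact in self.user_facts)
        system = {"role": "system", "content": f"Known about this user:\n{facts}"}
        return [system, *self.recent]


mem = SupportMemory(max_recent=2)
mem.remember("Prefers email over phone")
mem.add_message("user", "My CSV export keeps failing")
mem.add_message("assistant", "Which browser are you using?")
mem.add_message("user", "Firefox")
print(mem.build_messages())
```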

Letta: hierarchical memory blocks

Letta started life as MemGPT, the project that demonstrated treating LLM context like a paged memory system. Today the framework is the production form of that research. The core abstraction is the memory block: a discrete, named region of state that the agent can read from and write to during a run.
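The paging analogy is easier to see in a toy model. This is not Letta's implementation. Messages live in a small in-context window. When the window exceeds its budget, the oldest messages are paged out to an archive, and a search can bring them back later. The character budget stands in for a token budget, and `PagedContext` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PagedContext:
    """Toy model of context paging (illustrative only, not Letta's implementation)."""
    budget_chars: int = 200  # stand-in for a token budget
    window: list[str] = field(default_factory=list)   # what the model sees this turn
    archive: list[str] = field(default_factory=list)  # out-of-context storage

    def _size(self) -> int:
        return sum(len(m) for m in self.window)

    def append(self, message: str) -> None:
        self.window.append(message)
        # Page the oldest messages out until the window fits the budget again.
        while self._size() > self.budget_chars and len(self.window) > 1:
            self.archive.append(self.window.pop(0))

    def page_in(self, keyword: str) -> list[str]:
        """Naive substring search over archived messages, to bring them back on demand."""
        return [m for m in self.archive if keyword.lower() in m.lower()]


ctx = PagedContext(budget_chars=40)
ctx.append("We picked a token bucket for the rate limiter.")
ctx.append("Deploy is on Friday.")
print(ctx.window)                   # only the newest message still fits
print(ctx.page_in("rate limiter"))  # the older decision is recoverable from the archive
```

In the MemGPT design, the model itself decides when to search out-of-context storage, via tool calls, rather than the application doing it on the model's behalf.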

You typically define blocks for things like persona (who the agent is), human (what the agent knows about the user), and any number of custom blocks for domain knowledge. The agent can edit these blocks via tools the framework gives it, which means continuity is built in rather than an afterthought.
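Letta ships its own memory-editing tools, and their exact names and behavior depend on the version, so the sketch below does not try to reproduce them. It is a framework-free, hypothetical model of the idea: named blocks with an assumed size cap, plus `append` and `replace` operations an agent could call as tools.

```python
from dataclasses import dataclass


@dataclass
class MemoryBlock:
    label: str
    value: str
    limit: int = 2000  # assumed character cap so the block stays small enough to keep in context


class BlockMemory:
    """Hypothetical model of agent-editable memory blocks (not Letta's API)."""

    def __init__(self, blocks: list[MemoryBlock]):
        self.blocks = {block.label: block for block in blocks}

    # append/replace play the role of tools the agent calls; they return strings
    # instead of raising so an LLM can read the outcome as a tool result.
    def append(self, label: str, text: str) -> str:
        block = self.blocks[label]
        new_value = f"{block.value}\n{text}" if block.value else text
        if len(new_value) > block.limit:
            return f"error: block '{label}' would exceed {block.limit} characters"
        block.value = new_value
        return "ok"

    def replace(self, label: str, old: str, new: str) -> str:
        block = self.blocks[label]
        if old not in block.value:
            return f"error: '{old}' not found in block '{label}'"
        block.value = block.value.replace(old, new, 1)
        return "ok"

    def render(self) -> str:
        # In this sketch every block is rendered into the prompt on every turn.
        return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in self.blocks.values())


memory = BlockMemory([MemoryBlock("human", "Name: Riley. Stack: Python, Postgres, FastAPI.")])
memory.replace("human", "Postgres", "Postgres 16")
memory.append("human", "Decided: token-bucket rate limiter.")
print(memory.render())
```

Returning error strings instead of raising exceptions is a deliberate choice in this sketch. The result goes back to the model as a tool response, so the model can see why an edit failed and try again.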

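from letta_client import Letta

# Replace "..." with your Letta API key.
client = Letta(token="...")

# Memory blocks are named, editable regions of state the agent carries between runs.
agent = client.agents.create(
    memory_blocks=[
        {"label": "persona", "value": "You are a coding assistant who remembers the user's stack."},
        {"label": "human", "value": "Name: Riley. Stack: Python, Postgres, FastAPI."},
    ],
    model="openai/gpt-4o-mini",
)

# Messages are sent to the persisted agent by ID, so its memory carries over between calls.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What did we decide about the rate limiter?"}],
)

# The response lists the messages the agent produced during this step.
for message in response.messages:
    print(message)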

Letta is the right pick when memory is a first-class concern of your product: tutors, coaches, ongoing project assistants, and support agents that span multiple sessions. The framework is model-agnostic and ships with a CLI for local development and SDKs for Python and TypeScript.

Mem0: a memory layer you plug in

Mem0 takes the inverse approach. Instead of being your agent framework, it is a memory layer you compose into whatever stack you already have. You can run it as a library, a self-hosted server, or a managed cloud service. The README cites benchmark scores of 91.6 on LoCoMo and 93.4 on LongMemEval.

The architecture splits into two operations: extract and recall. Extract is a single-pass, ADD-only operation that pulls facts out of conversation turns and stores them with entity links. Recall combines semantic search, BM25 keyword matching, and entity-based boosting to surface the right context for the next turn. Memories accumulate continuously rather than being overwritten, which is a deliberate design choice.
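Here is a toy version of the extract-and-recall split, to show how three recall signals can combine into one ranking. Everything in it is an assumption for illustration. Word-count cosine stands in for embedding similarity, entities are passed in by hand rather than extracted by an LLM, and the weights are arbitrary. It is not Mem0's code or scoring formula.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Fact:
    text: str
    entities: set[str]


@dataclass
class HybridMemory:
    """Toy ADD-only fact store with three-signal recall (illustrative, not Mem0's code)."""
    facts: list[Fact] = field(default_factory=list)

    def add(self, text: str, entities: set[str]) -> None:
        # ADD-only: facts are appended, never updated or overwritten in place.
        self.facts.append(Fact(text, {e.lower() for e in entities}))

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        # Word-count cosine similarity, standing in for embedding similarity.
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def search(self, query: str, limit: int = 3, w_semantic: float = 1.0,
               w_keyword: float = 0.5, w_entity: float = 0.5) -> list[str]:
        if not self.facts:
            return []
        docs = [tokenize(f.text) for f in self.facts]
        q_tokens = tokenize(query)
        q_counts = Counter(q_tokens)
        # Corpus statistics for BM25 (k1=1.5, b=0.75 are commonly used values).
        k1, b = 1.5, 0.75
        n = len(docs)
        avgdl = sum(len(d) for d in docs) / n
        df = Counter(term for d in docs for term in set(d))
        scored = []
        for fact, doc in zip(self.facts, docs):
            tf = Counter(doc)
            bm25 = 0.0
            for term in set(q_tokens):
                if tf[term] == 0:
                    continue
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                bm25 += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            semantic = self._cosine(q_counts, tf)
            # Entity boost: how many of the fact's linked entities the query mentions.
            entity = sum(1 for e in fact.entities if e in query.lower())
            score = w_semantic * semantic + w_keyword * bm25 + w_entity * entity
            scored.append((score, fact.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:limit]]


mem = HybridMemory()
mem.add("Riley prefers dark mode in Cursor", {"Riley", "Cursor"})
mem.add("Riley orders an oat milk flat white", {"Riley"})
mem.add("The staging database is Postgres 16", {"Postgres"})
print(mem.search("Does Riley like dark mode?", limit=1))
print(mem.search("Which Postgres version runs on staging?", limit=1))
```

Because this store is ADD-only, a changed preference becomes a new fact that sits next to the old one. Ranking, not deletion, then has to decide which of the two surfaces.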

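from mem0 import Memory

# The default configuration calls OpenAI for extraction and embeddings, so OPENAI_API_KEY must be set.
m = Memory()

# Each add() runs fact extraction on the text and stores the results under this user_id.
m.add("I prefer dark mode and my IDE is Cursor", user_id="riley")
m.add("My usual coffee order is oat milk flat white", user_id="riley")

# search() is scoped to the same user; matches come back under the "results" key.
memories = m.search("What does Riley like?", user_id="riley")
for entry in memories["results"]:
    print(entry["memory"])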

Mem0 has SDKs for Python and TypeScript, a REST API, and integrations for LangGraph, CrewAI, and the Vercel AI SDK. There are also pre-built skills for Claude, Cursor, and other coding assistants. If you want memory that stays out of the way of your existing architecture, this is the pick.

How they actually differ

Letta is opinionated about your agent shape. You build agents inside Letta, and memory is the spine of the framework. Mem0 is unopinionated. You keep your agent framework, you keep your tool calls, you just call Mem0 when you want to remember or recall.
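The "just call Mem0" pattern fits in a small wrapper around whatever agent you already have. Here is a minimal sketch. It assumes the memory object exposes `add(messages, user_id=...)` and `search(query, user_id=...)`, with hits returned under `"results"` as in the Mem0 snippet earlier. It passes `add` a list of role/content messages, which Mem0 also accepts in place of a single string. `with_memory`, `FakeMemory`, and `echo_agent` are hypothetical. The fake store lets the example run without an API key, and unlike Mem0 it stores raw user text and ignores the query.

```python
from typing import Callable, Protocol


class MemoryStore(Protocol):
    """The two calls the wrapper relies on, shaped like the Mem0 usage above."""
    def add(self, messages: list[dict], *, user_id: str) -> object: ...
    def search(self, query: str, *, user_id: str) -> dict: ...


def with_memory(agent: Callable[[str, list[str]], str], memory: MemoryStore) -> Callable[[str, str], str]:
    """Wrap an existing agent so it recalls before each turn and remembers after it."""
    def run(user_id: str, message: str) -> str:
        recalled = [hit["memory"] for hit in memory.search(message, user_id=user_id)["results"]]
        reply = agent(message, recalled)  # the agent itself is unchanged
        memory.add([{"role": "user", "content": message},
                    {"role": "assistant", "content": reply}], user_id=user_id)
        return reply
    return run


class FakeMemory:
    """Hypothetical in-memory stand-in: stores raw user messages, returns all of them on search."""
    def __init__(self) -> None:
        self.items: dict[str, list[str]] = {}

    def add(self, messages: list[dict], *, user_id: str) -> None:
        for m in messages:
            if m["role"] == "user":
                self.items.setdefault(user_id, []).append(m["content"])

    def search(self, query: str, *, user_id: str) -> dict:
        return {"results": [{"memory": text} for text in self.items.get(user_id, [])]}


def echo_agent(message: str, recalled: list[str]) -> str:
    return f"recalled {len(recalled)}"


run = with_memory(echo_agent, FakeMemory())
print(run("riley", "I use FastAPI"))             # recalled 0
print(run("riley", "What framework do I use?"))  # recalled 1
```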

Letta's memory blocks are explicitly structured and writable by the agent. The agent knows it has a human block and can update it. Mem0's memories are extracted automatically by the LLM, stored as facts with entity links, and surfaced via search. There is less explicit structure, more emergent organization.

In practice the choice often comes down to where you start. Already shipping a CrewAI or LangGraph agent and need long-term recall? Add Mem0. Building a conversational product where memory is the differentiator? Build it on Letta from day one.

Integration patterns that work

A few patterns I have seen ship cleanly:

A customer support bot uses Mem0 to extract user preferences and past issues, layered on top of a LangGraph state machine that handles the ticket workflow.
A tutoring product builds the entire app on Letta because the agent's evolving understanding of the student is the product.
An engineering copilot uses Mem0 to remember which files a developer has worked on and what conventions a project follows, layered on top of a coding agent.
A multi-agent research workflow uses Letta to give each agent a persistent identity and accumulated knowledge across runs.

What to watch for

Memory layers introduce a privacy surface. You are persisting user data, often verbatim, in a database. Plan for deletion, export, and access control before you ship. Both projects have docs on this, but the responsibility is yours.
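Here is a sketch of those three controls on a hypothetical store. Exports are JSON, deletion is per user, and both check who is asking. With Mem0 specifically, the `Memory` class has user-scoped calls such as `get_all` and `delete_all`; check the current docs for exact signatures. The hard part is making deletion reach every copy of the data, including vector indexes, history tables, logs, and backups.

```python
import json
from dataclasses import dataclass, field


@dataclass
class UserMemoryStore:
    """Hypothetical store showing access control, export, and deletion per user."""
    records: dict[str, list[str]] = field(default_factory=dict)

    def add(self, user_id: str, fact: str) -> None:
        self.records.setdefault(user_id, []).append(fact)

    @staticmethod
    def _authorize(requester: str, user_id: str) -> None:
        # Simplest possible policy: a user may only read or delete their own memories.
        if requester != user_id:
            raise PermissionError(f"{requester!r} may not access memories of {user_id!r}")

    def export(self, requester: str, user_id: str) -> str:
        self._authorize(requester, user_id)
        return json.dumps({"user_id": user_id, "memories": self.records.get(user_id, [])})

    def delete_all(self, requester: str, user_id: str) -> int:
        self._authorize(requester, user_id)
        return len(self.records.pop(user_id, []))  # number of memories removed


store = UserMemoryStore()
store.add("riley", "Prefers dark mode")
print(store.export("riley", "riley"))
print(store.delete_all("riley", "riley"))  # 1
```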

Memory also has a recall quality problem at scale. Once you have thousands of memories per user, naive vector search starts surfacing irrelevant facts. Mem0's multi-signal retrieval helps. Letta's structured blocks help in a different way: the facts the agent has curated stay in its context instead of depending on search to surface them. Either way, evaluate recall quality on your own data before committing.
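A small hand-labeled set is enough to start. Here is a minimal sketch of recall@k. It assumes you can list, for each test query, the stored memories a good retriever should return. `recall_at_k`, `keyword_search`, and the toy corpus are hypothetical, and `search` can wrap whichever retriever you are evaluating.

```python
from typing import Callable


def recall_at_k(search: Callable[[str, int], list[str]],
                labeled: list[tuple[str, set[str]]], k: int = 5) -> float:
    """Mean fraction of each query's expected memories found in the top-k results."""
    if not labeled:
        raise ValueError("need at least one labeled query")
    scores = []
    for query, expected in labeled:
        if not expected:
            raise ValueError(f"query {query!r} has no expected memories")
        retrieved = set(search(query, k)[:k])
        scores.append(len(retrieved & expected) / len(expected))
    return sum(scores) / len(scores)


corpus = ["Riley prefers dark mode", "Riley uses Cursor", "Staging runs Postgres 16"]


def keyword_search(query: str, k: int) -> list[str]:
    # Hypothetical baseline retriever: rank by count of shared lowercase words.
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda m: len(words & set(m.lower().split())), reverse=True)
    return ranked[:k]


labeled = [
    ("what editor does riley use", {"Riley uses Cursor"}),
    ("which postgres version is on staging", {"Staging runs Postgres 16"}),
]
print(recall_at_k(keyword_search, labeled, k=1))  # 0.5
print(recall_at_k(keyword_search, labeled, k=2))  # 1.0
```

At k=1 the word-overlap baseline ties the two Riley facts and returns the wrong one. That kind of miss only shows up when you measure.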

Finally, do not assume memory makes the agent smarter. It makes the agent more consistent. If the underlying reasoning is bad, persistent memory just makes it consistently bad.

External references: the Letta repository and the Mem0 repository both have current architecture docs.

Tools mentioned in this post

Letta: Stateful agents with hierarchical memory blocks, formerly MemGPT.
Mem0: Universal memory layer with extraction and multi-signal recall.
