GitHub All-Stars #2: Mem0 - Creating memory for stateless AI minds
Artur Skowroński
Head of Java/Kotlin Space
Published: Sep 3, 2025 | 17 min read
I’ve always claimed there’s no better way to learn anything than to build something… and the second best is to review someone’s code. As part of our GitHub All-Stars series, where we examine open-source gems, I stumbled upon a project that strikes at the heart of one of the most fundamental problems in modern AI.
So we’re continuing the trend that treats models as a Boring Black Box and shows that the interesting things happen around them in Context Engineering.
I’ll start with a hot (or perhaps more lukewarm) take - the current development path for AI agents, based on blindly chasing ever-larger context windows, is a dead end. It’s a brute-force approach that only delays the problem instead of solving it.
We’ve all experienced this. Modern AI agents seem incredibly smart. They hold fluid conversations, adapt tone, and even refer to recent statements. It feels personalized - until it doesn’t. Refresh the tab, close the session, or come back the next day - and for the agent it’s as if we never existed. This fundamental amnesia, the innate statelessness of language models, is the biggest obstacle to truly personalized and useful AI experiences.
The industry’s initial reaction was simple and seemingly logical: just make the context window bigger! And so we have models with context windows exceeding 200,000 tokens. The problem is that it’s like treating a headache with an ever-bigger hammer. It leads to two critical issues:
Performance and cost degradation: Latency and cost are directly proportional to the number of input tokens. Processing hundreds of thousands of tokens on every query is slow and astronomically expensive in production.
The “Lost in the Middle” problem: A model’s attention quality drops with very long contexts. Key information that appeared at the beginning of a conversation can be ignored or misinterpreted because it gets lost in a sea of other data.
No, this is not fine - this is where Mem0 enters the stage. Its philosophy is diametrically different. Instead of shoving the entire conversation history into the prompt, mem0 proposes an elegant memory-centric architecture that dynamically extracts, consolidates, and recalls only the information that’s truly relevant. It’s a shift from a model-centric paradigm to a system-centric approach - the very thing that interested me in deepagents.
To fully understand mem0, you need to know its creators and their motivation. As with deepagents and Harrison Chase, context is key (pun intended, couldn't resist). mem0 isn’t an academic experiment but a pragmatic tool born of real pain experienced by practitioners.
The main creators are Taranjeet Singh and Deshraj Yadav, and their story matters because they’re the founders of Embedchain, a popular RAG framework with over 2 million downloads. As they themselves admit, mem0 was created to solve a huge problem they faced while working on Embedchain: LLMs are stateless and forget everything after each session, leading to repetitive and inefficient interactions. That makes the project’s mission crystal clear and relatable to any developer who’s tried to build something more than a simple chatbot.
The project is accompanied by a formal research paper titled Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory, co-authored by, among others, Prateek Chhikara, Dev Khant, and Saket Aryan. This adds credibility and shows their solution is not only practical but also rigorously validated.
Looking at this genesis, it becomes clear that mem0 is the logical evolution of the RAG paradigm. Standard RAG retrieves information from a static knowledge base to enrich the prompt. Its basic limitation is that the knowledge base rarely learns from user interactions. mem0 goes a step further. You could say mem0 is RAG for conversation history. It applies the same retrieval-and-enrichment pattern, but not to external documents - rather to the dynamic, personal context generated during a conversation with the user.
Let’s get to the heart of mem0 - how does it actually work? The elegance of this system lies in its central mechanism: a two-phase pipeline consisting of Extraction and Update. This is what differentiates mem0 from a simple database and makes it an intelligent memory system.
Phase 1: Extraction - The art of distilling salient facts
It all starts with a new interaction, typically a pair of messages: the user’s query and the assistant’s response. Instead of blindly saving the entire exchange, mem0 intelligently decides what’s worth remembering.
The system draws context from three sources:
The latest exchange between the user and the assistant.
A rolling summary of the entire conversation so far.
The last X messages, which provide the immediate, short-term context.
This context is then fed to an LLM that’s instructed via a carefully designed system prompt. In the open-source version, that prompt (MEMORY_DEDUCTION_PROMPT) tells the model to focus on facts, preferences, and memories, and then generate a concise bullet list of “memory candidates.” It’s pure Context Engineering, just like we saw last week in deepagents.
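To make that flow tangible, here's a minimal sketch of what such an extraction step could look like. This is my own illustration, not mem0's actual code: the EXTRACTION_PROMPT text, the gpt-4o-mini model choice, and the bullet-list parsing are all assumptions; only the idea of combining the latest exchange, a rolling summary, and the last few messages into a single LLM call comes from the description above.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical stand-in for mem0's MEMORY_DEDUCTION_PROMPT
EXTRACTION_PROMPT = (
    "You maintain long-term memory for an assistant. From the context below, "
    "list only durable facts, preferences, and memories worth keeping, "
    "as short bullet points."
)

def extract_memory_candidates(latest_exchange, rolling_summary, recent_messages):
    """Distill 'memory candidates' from the three context sources."""
    recent = "\n".join(recent_messages)
    context = (
        f"Conversation summary so far:\n{rolling_summary}\n\n"
        f"Recent messages:\n{recent}\n\n"
        f"Latest exchange:\n{latest_exchange}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": EXTRACTION_PROMPT},
            {"role": "user", "content": context},
        ],
    )
    # One bullet per line -> one memory candidate per line
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("-* ").strip() for line in lines if line.strip()]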
Phase 2: The A.U.D.N. cycle - The LLM as a database operator
This is where the real “magic” happens (in Arthur C. Clarke’s sense) and where mem0’s most innovative pattern reveals itself. For each “memory candidate” extracted in the previous phase, the system runs a cycle called A.U.D.N. (Add, Update, Delete, No-op).
Semantic Search: The system first performs a semantic search in its vector store to find the k most similar existing memories.
Decision Delegated to the LLM: Instead of implementing complicated, brittle if/else logic to decide what to do with the new information, mem0 delegates that decision to the LLM. It presents the language model with similar found memories and the new candidate, and asks it to choose the appropriate “tool” to execute.
Operation Selection: The available tools are basic memory operations:
ADD: Add a new memory if it’s a completely new fact that didn’t exist before.
UPDATE: Update an existing memory if the new information complements, corrects, or refreshes it.
DELETE: Delete an existing memory if the new information contradicts it.
NOOP (No Operation): Do nothing if the new information is a repeat or irrelevant.
This pattern - in which the LLM serves as a decision component for data operations - is extremely powerful and flexible.
Consider how a traditional programmer would approach memory updates. They’d likely write a complicated state machine with dozens of rules:
if new_fact.topic == old_fact.topic
    AND new_fact.timestamp > old_fact.timestamp
    AND is_not_contradictory(...) then update(...)
else if ...
At first glance, everyone can see such code would be a maintenance nightmare. mem0 flips the problem. Instead of coding the logic, it gathers the right data (the new fact, similar old facts) and asks the LLM a simple question:
Given this context, which of these four actions (ADD, UPDATE, DELETE, NOOP) should I take?
The LLM’s ability to understand semantics and linguistic nuance lets it make decisions that would be extremely hard to explicitly program. For example, the model can understand that the statement “Actually, I don’t like cheese” is an update to a previously neutral state, not a completely new, unrelated fact.
It’s a brilliant example of shifting complex business logic into a configurable LLM. Complexity moves from procedural code to a declarative prompt (UPDATE_MEMORY_PROMPT). This simplifies the developer’s work and makes the system far more resilient to unforeseen conversational scenarios.
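Here is a heavily simplified sketch of that delegation step, meant to show the shape of the pattern rather than mem0's real internals. The DECISION_PROMPT wording, the JSON contract, and the vector_store interface below are all my own assumptions; in mem0 this logic sits behind UPDATE_MEMORY_PROMPT and its tool-calling machinery.

import json
from openai import OpenAI

client = OpenAI()

# Hypothetical decision prompt; the real logic lives in UPDATE_MEMORY_PROMPT
DECISION_PROMPT = (
    "You manage a memory store. Given a new candidate fact and the most similar "
    "existing memories, reply with JSON: "
    '{"op": "ADD"|"UPDATE"|"DELETE"|"NOOP", "memory_id": <id or null>, "text": <text or null>}'
)

def reconcile(candidate, vector_store):
    """Run one A.U.D.N. step for a single memory candidate."""
    # Hypothetical store interface: returns [{"id": ..., "text": ...}, ...]
    similar = vector_store.search(candidate, limit=5)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": DECISION_PROMPT},
            {"role": "user", "content": json.dumps({"candidate": candidate, "similar": similar})},
        ],
    )
    decision = json.loads(response.choices[0].message.content)

    # Procedural code only executes the decision; the "business logic" lives in the prompt
    if decision["op"] == "ADD":
        vector_store.add(candidate)
    elif decision["op"] == "UPDATE":
        vector_store.update(decision["memory_id"], decision["text"])
    elif decision["op"] == "DELETE":
        vector_store.delete(decision["memory_id"])
    # NOOP: nothing to do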
Now that we know how mem0 thinks, let’s look at where it stores those thoughts. Persistence is another area where the project shines. mem0 doesn’t rely on a single database technology. It uses a hybrid architecture combining various storage types for optimal performance: a vector store for semantic search, a graph database to model relationships (in the Mem0g variant), and potentially a key-value store for structured data.
Vector store: The foundation of semantic recall
The vector store is the core mechanism that lets mem0 find “semantically similar” memories, both in the Update phase and during on-demand retrieval by the user.
This is where mem0’s flexibility really shows. The system isn’t hard-wired to a single database. Instead, it has a pluggable architecture based on a Provider pattern. That means developers can integrate mem0 with their existing tech stack instead of being forced to deploy and maintain a new, unfamiliar technology. For example, configuration for Azure AI Search:
from mem0 import Memory

memory_config = {
    "vector_store": {
        "provider": "azure_ai_search",
        "config": {
            "service_name": "your-search-service",
            "api_key": "your-api-key",
            "collection_name": "memories",
        },
    },
    # ... embedder and llm configs also required
}
memory = Memory.from_config(memory_config)
This plugin architecture drastically lowers the barrier to entry. Imagine a developer at a large enterprise standardized on Azure. If mem0 only supported, say, ChromaDB or another specific database, deploying it would require approvals, provisioning new infrastructure, and managing it. That’s a huge hurdle. With the plugin architecture, a developer can simply point mem0 at an existing service. The importance of this can’t be overstated.
Graph database (Mem0g): Weaving facts into a knowledge network
Mem0g is an advanced variant that goes a step further. While vector search is great at answering “what’s similar?”, graph search excels at “what’s related?”. Mem0g translates conversations into a structured, directed graph where entities (people, places, preferences) become nodes, and relations between them (“lives in,” “likes,” “met with”) become edges. This structure enables much more complex, multi-hop reasoning. An agent can answer questions like: “Find Italian restaurants near the hotel my friend stayed at.” This is where Mem0g shows its strength, as evidenced by higher effectiveness on temporal and relational queries in the LOCOMO benchmark.
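To give a feel for the idea (and nothing more), here's a toy sketch of facts stored as a directed graph and answered with a two-hop traversal. The triples, the entity names, and the use of networkx are my own illustrative choices; Mem0g's actual extraction pipeline and graph schema are more sophisticated.

import networkx as nx

# Directed knowledge graph: entities are nodes, relations are labeled edges
graph = nx.DiGraph()

# Triples as they might be distilled from a conversation
# (in Mem0g the extraction itself is again delegated to an LLM)
triples = [
    ("Alice", "lives_in", "Berlin"),
    ("Alice", "likes", "Italian food"),
    ("Alice", "met_with", "Bob"),
    ("Bob", "stayed_at", "Hotel Adlon"),
]
for subject, relation, obj in triples:
    graph.add_edge(subject, obj, relation=relation)

# Multi-hop question: "Where did the person Alice met with stay?"
friends = [v for _, v, d in graph.out_edges("Alice", data=True) if d["relation"] == "met_with"]
hotels = [v for friend in friends
          for _, v, d in graph.out_edges(friend, data=True) if d["relation"] == "stayed_at"]
print(hotels)  # ['Hotel Adlon']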
Let’s go a level deeper and look at the code - this is where architectural concepts translate into concrete patterns.
The Memory class: The system’s central hub
The main entry point for developers is the Memory class, defined in mem0/memory/main.py. It orchestrates the entire memory-management process. You can initialize the class in two ways. A simple Memory() call creates a default, in-memory ephemeral instance. But the real power lies in the factory method Memory.from_config(config), which lets you precisely configure every system component.
The interface is deceptively simple. The two primary methods are add and search. The add(messages, user_id, ...) method triggers the entire two-phase extraction and update pipeline described above.
from mem0 import Memory

m = Memory()

messages = [
    {
        "role": "user",
        "content": "I like to drink coffee in the morning and go for a walk"
    },
]

# Triggers the full extraction + update pipeline for this user
m.add(messages, user_id="demo_user")
Meanwhile, the search(query, user_id, ...) method retrieves relevant memories on demand. Under the hood, it runs a semantic search against the vector store.
search_results = m.search(
    "What are this user's travel plans?",
    user_id="demo_user",
    limit=3
)
for i, result in enumerate(search_results['results'], 1):
    print(f"{i}. {result['memory']}")
As mentioned, mem0 heavily uses a Provider design pattern for external dependencies. This isn’t limited to vector stores. For instance, the system isn’t tied to OpenAI. It has an internal abstraction layer (likely a base class in mem0/llms/base.py, per standard practice) that lets you plug in different LLM providers. Want to use a local model via Ollama? No problem.
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "test",
            "host": "localhost",
            "port": 6333,
            "embedding_model_dims": 768,  # Adjust to your model's dimensions
        },
    },
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "llama3.1:latest"
        }
    },
    # An "embedder" config matching the 768-dim model would typically be added here as well
}
memory = Memory.from_config(config)
This dependency-abstraction pattern is the foundation of good software engineering. It makes the system testable, modular, and - most importantly - easy to extend. It’s one of the key lessons engineers can take from this project.
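Reduced to its essentials, the pattern looks roughly like the sketch below - an illustration built around a simple abstract base class and a factory, not mem0's actual class hierarchy (which, as noted, likely lives in modules such as mem0/llms/base.py).

from abc import ABC, abstractmethod

class LLMBase(ABC):
    """Common interface every LLM provider must implement."""

    @abstractmethod
    def generate(self, messages: list[dict]) -> str:
        ...

class OllamaLLM(LLMBase):
    """Toy provider; a real one would call Ollama's local HTTP API."""

    def __init__(self, model: str):
        self.model = model

    def generate(self, messages: list[dict]) -> str:
        raise NotImplementedError("left as a stub in this sketch")

def llm_factory(provider: str, config: dict) -> LLMBase:
    """Pick a concrete provider from configuration instead of hard-coded imports."""
    providers = {"ollama": OllamaLLM}
    return providers[provider](**config)

llm = llm_factory("ollama", {"model": "llama3.1:latest"})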
Two truisms to start: no AI tool lives in a vacuum, and theory is important, but nothing beats practice. To see how these abstract integration concepts perform in the wild, we applied them to a concrete, demanding business scenario.
We built a prototype agent-assistant for an insurance underwriter, using mem0 as its long-term memory and LangChain as the operational brain. This exercise showed us how deep and multifaceted the symbiosis between these two tools can be - and how to leverage it to build a truly useful system.
Before we continue, a quick explainer for the uninitiated. Underwriting is at the heart of any insurance company. It’s the process of evaluating and quantifying the risk associated with a prospective policyholder. Analysts (underwriters) review a wide range of information - from application forms to medical records to financial statements - to precisely determine the likelihood of a claim. Based on this, the underwriter decides whether to offer coverage, under what terms and conditions, and at what premium rate.
We based our approach on three key integration levels.
1. mem0 as the agent’s central memory module
In the simplest setup, the underwriter agent built in LangChain uses mem0 as its primary, persistent memory. This replaces default, often ephemeral memory mechanisms with an advanced, contextual knowledge store. Every piece of information - policy details, client claim history, specific exclusions, notes from prior assessments - is saved in mem0 and tied to a specific client or risk type. When the agent receives a new application from the same client, it doesn’t start from scratch. It immediately “remembers” the entire history.
# The agent-assistant fetches key client info from the past
# (a sketch from our prototype; the query wording and user_id are illustrative)
past_claims = memory.search(
    "claim history and prior risk assessments for this client",
    user_id="client_acme_ltd",
    limit=5,
)["results"]

# Format retrieved memories for use in the prompt
serialized_history = ' '.join([mem["memory"] for mem in past_claims])
# ... further logic to build the agent prompt
2. mem0 as an analytical toolset
At a more advanced level, the underwriter agent gains autonomy in using its memory. mem0’s functions are exposed as dedicated tools (StructuredTool) that the agent can use deliberately. That means the agent decides for itself when to consult its knowledge base. For example, when analyzing a complex commercial property insurance application, the agent may first use a search_underwriting_memory tool to check whether it previously assessed similar risks in the same industry or location. Only if the internal knowledge base doesn’t yield enough information will the agent turn to other, slower or more expensive tools - like external flood-risk APIs or financial-data verification systems.
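The tool itself can be a thin wrapper around memory.search. The helper below is our own prototype code rather than anything shipped with mem0 - the query handling, the user_id scheme, and the result formatting are assumptions:

def search_underwriting_memory(query: str) -> str:
    """Look up past risk assessments and policy notes in long-term memory."""
    results = memory.search(query, user_id="underwriting_team", limit=5)
    if not results["results"]:
        return "No relevant past assessments found."
    return "\n".join(f"- {item['memory']}" for item in results["results"])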
from langchain.tools import StructuredTool

# Expose memory search as a tool the agent can decide to call on its own
underwriting_memory_tool = StructuredTool.from_function(
    name="search_underwriting_memory",
    description="Useful for finding information on past risk assessments and policies.",
    func=search_underwriting_memory,
)
3. mem0 as a flexible component in existing infrastructure
The third integration level shows mem0’s flexibility and is crucial from an enterprise deployment perspective. If an insurance company already uses LangChain to manage its tech stack - for example, to connect to an internal corporate LLM or to an existing vector store containing risk reports - mem0 can plug seamlessly into that ecosystem. Instead of forcing a separate data silo, mem0 can use the company’s approved vector database as its provider. This drastically lowers barriers and deployment cost.
from mem0 import Memory

# An existing LangChain vector store built elsewhere in the codebase, e.g.:
# risk_reports_db = SomeLangChainVectorStore(
#     ...,
#     collection_name="mem0"  # Required collection name
# )

# Configure mem0 to use the existing DB
config = {
    "vector_store": {
        "provider": "langchain",
        "config": {
            "instance": risk_reports_db
        }
    },
    # ... other configs, e.g., pointing to the corporate LLM
}
memory_component = Memory.from_config(config)
Our underwriter-agent experiment showed that this multi-level integration strategy is compelling. mem0 isn’t a rigid tool but a flexible component you can fit into existing workflows. As a result, we designed an assistant that not only answers questions but builds durable, institutional knowledge - which, in a real scenario, would translate into a faster, cheaper, and more consistent risk-assessment process.
Analyzing mem0 is more than a code review. It’s a case study in modern AI systems engineering that yields several important lessons.
First: The LLM as a configurable logic engine. The most revealing pattern in mem0 is using an LLM not just for text generation but as the core decision engine. The A.U.D.N. cycle is a paradigm shift - instead of writing hundreds of lines of code to handle conflicts and data updates, developers can recast the problem as a tool-selection task for the LLM, guided by a precise prompt.
Second: Composition and abstraction are king. mem0 isn’t a monolithic, super-intelligent model. It’s a well-arranged system composed of interchangeable, composable parts (LLM, vector store, graph DB). It reiterates deepagents’ lesson of “composition over innovation” and proves that classic software-engineering principles matter more than ever in the AI era.
Third: The future is universal, portable memory. Finally, the broader vision. mem0 isn’t just a library; it’s a step toward something much bigger. The creators’ introduction of OpenMemory MCP (which I’m happy to cover another time) signals an ambition to create a unified memory layer that follows the user across different AI applications. This leads to the idea of a “Universal Memory Passport.” Your AI assistant shouldn’t have to relearn your preferences every time you switch tools. This vision positions mem0 not just as a tool but as a potential standard for interoperable, personalized AI - something the creators are already trying to codify under the openmemory label.
mem0 delivers a powerful, pragmatic, production-ready architecture to solve one of AI’s most fundamental problems. It’s a project that shows the key to overcoming the “shallowness” of current systems isn’t more compute, but smarter architecture. A well-deserved GitHub star from me 😉