unimem is a research-grade, memory-augmented generation layer for LLM applications. Built from industry-standard open-source components (FastAPI, PostgreSQL + pgvector, SQLAlchemy, sentence-transformers, and Ollama with llama2), it provides strict per-user context augmentation to solve the "amnesia" problem inherent in base LLMs.
Features:
- REST API: POST /add, POST /chat, GET /memory/{user_id}, GET /explain
- Strict per-user isolation: each user_id gets its own vector space, and scoped keys such as food:pizza natively isolate overlapping semantic scopes
- Prompt-injection hardening: "ignore previous..."-style jailbreaks are kept out of the memory mapping
- Embeddings via sentence-transformers (all-MiniLM-L6-v2)
- Configurable composite ranking (MemoryConfig; default 0.6 * similarity + 0.3 * recency + 0.1 * frequency)
- Debug mode (debug=True) and LLM-free operation (use_llm=False compatible)
- Interactive chatbot.py interface featuring ANSI coloring, hot-swappable user scopes, and interactive debug rankings

Architecture:

+----------------+     [HTTP API / FastAPI]     +------------------+
|    End User    | <--------------------------> |   MemoryClient   |
+----------------+                              +--------+---------+
                                                         |
         +-----------------------------------+-----------+----------------------+
         |                                   |                                  |
  [MemoryService]                   [RetrievalService]                    [LLMService]
  - Ingestion                       - Semantic Search                     - Prompt Gen
  - Deduplication                   - Recency Scaling                     - Fallback
  - Deletion                        - Composite Ranking                   - Ollama Call
         |                                   |                                  |
         +-----------------------------+-----+                                  |
                                       |                                        |
+----------------------+      +--------v---------+                   +----------v-----------+
| sentence-transformers| <--> |    PostgreSQL    | [Local Ollama] <->|    LocalLLMClient    |
| all-MiniLM-L6-v2     |      |  with pgvector   |                   +----------------------+
+----------------------+      +------------------+
Start the Postgres container:
docker compose up -d
Default connection (override with DATABASE_URL): postgresql://mem0:mem0@localhost:5432/unimem
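The override can be expressed as a small helper; resolve_database_url is a hypothetical name for illustration, but the fallback string matches the default above:

```python
import os

DEFAULT_DB_URL = "postgresql://mem0:mem0@localhost:5432/unimem"

def resolve_database_url() -> str:
    # Prefer an explicit DATABASE_URL, falling back to the Docker default.
    # (Hypothetical helper mirroring how the connection string is resolved.)
    return os.environ.get("DATABASE_URL", DEFAULT_DB_URL)

os.environ["DATABASE_URL"] = "postgresql://user:pw@db.internal:5432/prod"
print(resolve_database_url())  # → postgresql://user:pw@db.internal:5432/prod
```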
Install the library and pull the default model:
pip install unimem
ollama pull llama2
Run the API server:
uvicorn unimem.api.app:app --reload --host 0.0.0.0 --port 8000
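With the server up, the endpoints can be exercised from Python. The JSON field names (text, user_id, message) are assumptions inferred from the library API, not a confirmed request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    # Prepare a JSON POST request; dispatch it with urllib.request.urlopen(req).
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Store a memory, then chat against it (field names are assumed):
add_req = build_request("/add", {"text": "I specialize in Python.", "user_id": "dev_1"})
chat_req = build_request("/chat", {"message": "What do I specialize in?", "user_id": "dev_1"})
print(add_req.full_url, chat_req.full_url)  # → http://localhost:8000/add http://localhost:8000/chat
```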
Access via the library (MemoryClient):
The orchestration layer unites MemoryService, RetrievalService, and LLMService.
from unimem.db.session import init_engine, get_session_factory
from unimem.db.bootstrap import ensure_pgvector_extension, create_all_tables
from unimem.core.memory_client import MemoryClient
from unimem.config.config import MemoryConfig
init_engine()
ensure_pgvector_extension()
create_all_tables()
db = get_session_factory()()
try:
    client = MemoryClient(db, config=MemoryConfig(top_k=5))
    # Deduplicate on ingest and store the memory for this user
    client.add("I specialize in Python systems architecture.", user_id="dev_1")
    # Retrieve the top-ranked memories and generate a reply via Ollama
    print(client.chat("What do I specialize in?", user_id="dev_1"))
    # Query the semantic index directly
    print(client.search("Architecture", user_id="dev_1"))
finally:
    db.close()
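The deduplication that happens on ingest can be sketched with plain cosine similarity: a new memory is merged rather than inserted when its embedding nearly coincides with an existing one. The 0.9 threshold and the function names below are illustrative assumptions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec: list[float], existing_vecs: list[list[float]],
                 threshold: float = 0.9) -> bool:
    # Flag a new memory as a duplicate if it is nearly identical to any
    # stored embedding (the threshold value is an assumed default).
    return any(cosine(new_vec, v) >= threshold for v in existing_vecs)

stored = [[1.0, 0.0], [0.0, 1.0]]
print(is_duplicate([0.99, 0.05], stored))  # → True  (nearly parallel to the first vector)
print(is_duplicate([0.7, 0.7], stored))    # → False (45° from both, below the threshold)
```

In the real system the vectors would be 384-dimensional all-MiniLM-L6-v2 embeddings fetched from pgvector, and a detected duplicate would update the existing row's recency and frequency instead of inserting a new one.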
A MemoryConfig object controls vector dimensions, retrieval thresholds, and ranking weights. Structured diagnostics (ranking traces, merges, updates, and failover notifications) are emitted to stdout via the hooks established in logger.py.
Our primary objective was to transform the original unimem concept into a professional, production-ready, memory-augmented generation layer for local LLM runtimes such as Ollama.
Here are the key upgrades we successfully engineered:
- Service-oriented architecture: MemoryService (ingestion/deduplication), RetrievalService (semantic search/ranking), and LLMService (prompt generation/fallback logic).
- Centralized configuration (MemoryConfig) that lets you easily dictate vector dimensions and retrieval thresholds, plus custom logging hooks for clean stdout diagnostics.
- A rewritten testing CLI (chatbot.py), outfitted with ANSI colors and built-in commands (switch user <id>, show memory, clear memory, debug on).

Based on the repository's history and structure, you will notice three distinct "versions" or states of the project within the filesystem:
- package#UNIMOM (The Original Prototype): built around a chatbot.py loop. It acts as the initial "rough draft" that proved we could connect PostgreSQL + pgvector + Ollama, though it lacks the advanced deduplication and clean architecture.
- unimem_release_v1 (The First Packaged Release): includes the dist folder and the wheel files, making it ready for someone to type pip install unimem and use it as a standard library, but it predates some of our latest architectural deep-cleaning.
- The current codebase: modular packages (unimem/core, unimem/services, unimem/retrieval), dynamic algorithmic ranking, Docker readiness (docker-compose.yml for the Postgres instance), and the feature-rich, colorful command-line chatbot you can run to visualize the AI's thought processes.

Use and modify freely.