A Topic-Scoped Memory Architecture for Persistent Context in Large Language Models
Souren Khetcho | February 2026
TAG (Topological Adaptive Graphs) is a memory architecture for LLMs that organizes conversational context into topic-scoped vector spaces with a lightweight metadata routing layer. Instead of searching a single flat embedding space (standard RAG) or paging context by token budget (MemGPT), TAG partitions memory by semantic coherence and routes queries through a two-tier retrieval mechanism.
- Topic Vector Spaces (TVS): Memory is partitioned into independent, per-topic vector stores that grow without bound as their topics accumulate content, rather than being capped by a token budget.
- Summary-Based Routing: A continuously updated summary document per topic lets a small, fast router model classify queries without scanning the full memory store (a sketch of this two-tier layout follows this list).
- Relevancy Engine: A hot/warm/cold tiering system, loosely inspired by PageRank, promotes and demotes topics based on user interaction signals, especially correction events ("I already told you this"), keeping routing latency near-constant regardless of total history size (a tiering sketch appears further below).
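To make the two-tier mechanism concrete, here is a minimal, self-contained Python sketch. It is illustrative rather than taken from the paper: `TopicStore`, `Router`, the toy hash embedding, and the sample topics are all assumed names, and a real deployment would use a learned encoder and a proper vector index.

```python
import math
from dataclasses import dataclass, field

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words hash embedding; stands in for a learned encoder."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

@dataclass
class TopicStore:
    """One topic-scoped vector space: its own entries plus a routing summary."""
    name: str
    summary: str  # continuously updated routing summary for this topic
    entries: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Tier 2: local similarity search inside this topic only.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

class Router:
    """Tier 1: match the query against per-topic summaries, not full stores."""
    def __init__(self, topics: list[TopicStore]):
        self.topics = topics

    def route(self, query: str) -> TopicStore:
        q = embed(query)
        return max(self.topics, key=lambda t: cosine(q, embed(t.summary)))

# Two-tier retrieval: route to one topic, then search only inside it.
work = TopicStore("work", "work projects api migration deadline code reviews")
diet = TopicStore("diet", "meals allergies calories vegetarian recipes")
work.add("The API migration deadline is March 3.")
diet.add("User is allergic to peanuts.")

router = Router([work, diet])
topic = router.route("api migration deadline")
print(topic.name, topic.search("api migration deadline"))
```

Routing first means the expensive similarity search runs over a single topic's entries instead of the entire history, which is the local-search property argued for below.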
Memory management in conversational AI is not primarily a storage or retrieval problem; it is a routing problem. Knowing where to look before looking turns global search into local search, improving precision while reducing latency and cost.
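The relevancy engine can be read as a scored tiering loop over topics. The sketch below assumes invented signal weights, thresholds, and decay (`WEIGHTS`, `PROMOTE_AT`, `DEMOTE_AT`, and `DECAY` are all illustrative, not values from TAG); its only grounded behavior is that correction events promote a topic aggressively.

```python
from dataclasses import dataclass

HOT, WARM, COLD = "hot", "warm", "cold"

@dataclass
class TopicState:
    name: str
    score: float = 0.0  # behavioral relevancy score
    tier: str = WARM

# Hypothetical signal weights; corrections are weighted heavily because
# "I already told you this" means a retrieval miss just happened.
WEIGHTS = {"mention": 1.0, "retrieval_hit": 0.5, "correction": 5.0}
PROMOTE_AT, DEMOTE_AT, DECAY = 4.0, 1.0, 0.9  # illustrative constants

def retier(topic: TopicState) -> None:
    if topic.score >= PROMOTE_AT:
        topic.tier = HOT
    elif topic.score >= DEMOTE_AT:
        topic.tier = WARM
    else:
        topic.tier = COLD

def observe(topic: TopicState, signal: str) -> None:
    """Fold one interaction signal into a topic's score, then re-tier it."""
    topic.score += WEIGHTS[signal]
    retier(topic)

def decay_all(topics: list[TopicState]) -> None:
    """Periodic decay so topics that stop appearing drift toward cold."""
    for t in topics:
        t.score *= DECAY
        retier(t)

# The router consults hot topics first, warm next, and cold rarely, so the
# set it scans stays small even as total history grows.
taxes = TopicState("taxes")
observe(taxes, "correction")  # user: "I already told you this"
print(taxes.tier)             # -> "hot"
```

Weighting corrections far above ordinary mentions encodes the idea that a miss has just occurred, so the topic is promoted immediately rather than waiting for its score to accumulate.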
| System | Limitation TAG Addresses |
|---|---|
| Standard RAG | Context pollution from flat, undifferentiated embedding space |
| MemGPT | Token-budget-driven paging rather than semantic coherence |
| Mem0 | Coarse user/session scoping, no behavioral relevancy scoring |
| A-Mem | Rich linking but no scalable routing mechanism |
If you use or reference this work, please cite:
@article{khetcho2026tag,
  title={Topological Adaptive Graphs (TAG): A Topic-Scoped Memory Architecture for Persistent Context in Large Language Models},
  author={Khetcho, Souren},
  year={2026}
}

This work is licensed under CC BY-NC 4.0. You are free to share and adapt this work with appropriate credit, but not for commercial purposes.