Filesystem as Context: Building an AI Detective with bash-tool
Instead of stuffing documents into prompts, give your AI agent a filesystem and let it retrieve its own context. Here's how, using a murder mystery detective as the demo.
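The core idea can be sketched as a single read-only "bash tool" the agent calls to explore a case directory on demand, rather than having every document pasted into the prompt. This is a minimal illustration, not the article's implementation: the names `runBashTool`, `ALLOWED`, and the demo case files are all assumptions, and a real agent would wire this function into an LLM tool-use loop.

```typescript
// Minimal sketch of a "bash tool" an agent can call to retrieve its own
// context from a case directory. Hypothetical names; not the article's code.
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Whitelist read-only commands so the agent can browse but not mutate.
const ALLOWED = new Set(["ls", "cat", "grep", "head", "wc"]);

function runBashTool(caseDir: string, command: string, args: string[]): string {
  if (!ALLOWED.has(command)) {
    return `error: command "${command}" is not allowed`;
  }
  try {
    // Run inside the case directory and return stdout as the tool result.
    return execFileSync(command, args, { cwd: caseDir, encoding: "utf8" });
  } catch (err) {
    return `error: ${(err as Error).message}`;
  }
}

// Demo: build a tiny case directory and let the "detective" poke around.
const caseDir = mkdtempSync(join(tmpdir(), "case-"));
writeFileSync(join(caseDir, "suspects.txt"), "Col. Mustard\nMs. Scarlet\n");
writeFileSync(join(caseDir, "alibi.txt"), "Mustard: at the library\n");

console.log(runBashTool(caseDir, "ls", ["."]));            // lists the files
console.log(runBashTool(caseDir, "cat", ["suspects.txt"])); // reads one file
console.log(runBashTool(caseDir, "rm", ["alibi.txt"]));     // rejected
```

In a full agent loop, the model would receive only the tool's output for the files it chose to open, keeping the context window small while still giving it access to the whole case file.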
This article concludes the series by showing how a deliberately designed CLI becomes a powerful interaction layer, giving users precise control over an AI system’s conversational context, short-term memory, and long-term semantic knowledge.
This piece introduces background processors as autonomous AI agents that summarise conversations and extract critical facts to continuously enrich Long-Term Semantic Memory. By running asynchronously and optimising token usage, these processors enable a self-improving, increasingly personalised AI system that learns from every interaction.
In this article, we show how dynamic prompt engineering, via a `SessionManager` that intelligently layers short-term context, long-term semantic memory, and system instructions, turns stateless LLM calls into genuinely contextual and personalised conversations.
In this article, we explain how Long-Term Semantic Memory uses vector embeddings and semantic search to give AI meaningful, persistent memory across conversations.
Learn how to build multi-agent systems with vector search, tool orchestration, and semantic understanding using Google's Agent Development Kit (JS/TS version).
In this article, we explore how Short-Term Conversational Memory creates the illusion of memory in otherwise stateless LLMs through careful context persistence and structured prompt reconstruction. We also show how token limits, cost, and context degradation are managed using asynchronous, AI-driven summarisation that preserves meaning while keeping conversations efficient and coherent.
This article explains why LLMs often “forget” earlier messages and how naive full-history prompting is costly and inefficient. It introduces a dual-memory architecture: a short-term store for immediate conversation flow and a long-term semantic store for durable knowledge across sessions. Together, these systems let an AI maintain coherent dialogue without overloading the model’s context window or budget.
These are the answers to the questions asked via Slido during my workshop on MCP at DevFest Taipei 2025.
A practical look at why hallucinations occur in modern language models, why current evaluation methods make them persist, and how to detect them using Natural Language Inference in JavaScript.