AI Agent’s Common Memory

In this post, I want to explore an idea I’ve been experimenting with: common memory for AI agents. I’ll explain what I mean by this term, how such memory can be implemented, and why I believe it's worth exploring.


What Is an Agent’s “Common” Memory?

I’m not sure whether “common memory” is already a widely accepted term in the AI space, or even the most accurate label for the concept I have in mind — but I’ll use it for now until a better one emerges (or someone suggests one).

By common memory, I mean:

A shared repository of memories formed by a single AI agent from interactions with multiple other agents — including both humans and other AI agents. For example, an AI chat can retain information learned from conversations with different users and selectively reference it in future interactions.

This is distinct from related terms:

  • Shared memory usually refers to memory shared across different AI systems or agents — not across users of the same assistant.
  • Collaborative memory comes closer, but often implies more structured cooperation and might be too narrow for what I’m describing.

So for now, I’ll stick with common memory to describe a memory system that allows an AI assistant to retain and selectively reference information learned across interactions with multiple users.

AI Chats Do Not Use Common Memory

When we interact with AI chats like ChatGPT, they typically do not retain information across different users. Each conversation is isolated, and the AI does not remember past interactions with other users. If you ask the AI about something it discussed with another user, it has no context or memory of that conversation; only the current user's context and history are considered.
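
To make the idea concrete, here is a minimal Go sketch (names are illustrative; this is not a real library) of what a common memory store could look like: facts learned from many users land in one shared pool, with provenance kept per record:

package commonmemory

import "strings"

// Memory is a single fact the agent learned, with its provenance.
type Memory struct {
	SourceUserID string // who the agent learned this from
	Content      string // the remembered fact itself
}

// Store holds memories from all users in one shared pool.
type Store struct {
	memories []Memory
}

// Remember records a fact learned during a conversation with one user.
func (s *Store) Remember(userID, content string) {
	s.memories = append(s.memories, Memory{SourceUserID: userID, Content: content})
}

// Recall searches the shared pool; unlike per-user memory, results may
// originate from conversations with other users.
func (s *Store) Recall(query string) []Memory {
	var hits []Memory
	for _, m := range s.memories {
		if strings.Contains(strings.ToLower(m.Content), strings.ToLower(query)) {
			hits = append(hits, m)
		}
	}
	return hits
}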

Continue Reading ...

🔍 Building an Agentic RAG System with CleverChatty (No Coding Required)

With the recent addition of A2A (Agent-to-Agent) protocol support in CleverChatty, it’s now possible to build powerful, intelligent applications—without writing any custom logic. In this blog post, we’ll walk through how to build an Agentic RAG (Retrieval-Augmented Generation) system using CleverChatty.


🤖 What is Agentic RAG?

The term agentic refers to an agent's ability to reason, make decisions, use tools, and interact with other agents or humans intelligently.

In the context of RAG, an Agentic RAG system doesn’t just retrieve documents based on a user’s prompt. Instead, it:

  • Preprocesses the user’s query,
  • Executes a more contextually refined search,
  • Postprocesses the results, summarizing and formatting them,
  • And only then returns the final answer to the user.

This kind of intelligent behavior is made possible by using a Large Language Model (LLM) as the core reasoning component.

The goal of a RAG system is to enrich the user’s query with external context, especially when the required information is not available within the LLM itself. This typically involves accessing an organization’s knowledge base—structured or unstructured—and providing relevant data to the LLM to enhance its responses.
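
As a rough sketch of this flow in Go (the LLM and SearchIndex interfaces are placeholders for illustration, not CleverChatty or A2A APIs), an agentic RAG pipeline might look like this:

package agenticrag

import "context"

// LLM and SearchIndex are illustrative stand-ins for whatever reasoning
// model and retrieval backend the orchestrator actually talks to.
type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type SearchIndex interface {
	Search(ctx context.Context, query string) ([]string, error)
}

// Answer runs the agentic RAG loop: refine, retrieve, summarize, answer.
func Answer(ctx context.Context, llm LLM, idx SearchIndex, userQuery string) (string, error) {
	// 1. Preprocess: let the LLM rewrite the raw query for better retrieval.
	refined, err := llm.Complete(ctx, "Rewrite as a concise search query: "+userQuery)
	if err != nil {
		return "", err
	}

	// 2. Execute the contextually refined search.
	docs, err := idx.Search(ctx, refined)
	if err != nil {
		return "", err
	}

	// 3. Postprocess: summarize and format the results for the user.
	prompt := "Using the documents below, answer: " + userQuery + "\n\n"
	for _, d := range docs {
		prompt += d + "\n---\n"
	}

	// 4. Only then return the final answer.
	return llm.Complete(ctx, prompt)
}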

Continue Reading ...

Integrating Mem0 (mem-zero) with CleverChatty

In this post, I’ll walk through how to integrate the Mem0 memory model with CleverChatty-CLI, a command-line framework for building AI assistants.

Spoiler: It turned out to be a lot easier than I expected.


Quick Overview of the Projects

Before we dive into the integration, here’s a quick recap of the two key components involved:

  • Mem0 (pronounced “mem-zero”) adds an intelligent memory layer to AI assistants and agents. It enables personalized experiences by remembering user preferences, adapting to their needs, and continuously learning over time. It’s particularly useful for customer support bots, personal assistants, and autonomous agents.

  • CleverChatty-CLI is a command-line interface for interacting with LLM-based chat systems. It supports MCP (Model Context Protocol) and RAG (Retrieval-Augmented Generation), and I plan to add support for A2A (Agent-to-Agent) communication soon. The CLI is built for experimentation, testing, and prototyping AI interactions.

Continue Reading ...

CleverChatty now supports Streamable HTTP for MCP servers!

CleverChatty, a lightweight AI Chat tool supporting multiple LLM providers, now includes support for Streamable HTTP with MCP servers. This update enables more flexible and efficient communication with AI models, making integration with custom tools and services even smoother.


🌐 What is CleverChatty?

CleverChatty is a minimalist AI chat interface that works with various large language model (LLM) providers — including OpenAI, Anthropic, Google, and local models like Ollama. It’s designed for users and developers who want a simple, extensible tool that supports MCP-based tool usage.

Until now, CleverChatty only supported STDIO and SSE (Server-Sent Events) as transport protocols for connecting with MCP servers. With the latest update, it now supports Streamable HTTP, expanding compatibility and flexibility.


Continue Reading ...

What’s Missing in MCP

Over the past couple of months, I’ve been experimenting with the Model Context Protocol (MCP) — building AI agents and tools around it. While the experience has been promising, I’ve noticed a few areas where MCP could be expanded or improved.

These aren’t critical issues, but adding them would make MCP more complete and developer-friendly.

Here’s my current wishlist:

  1. A Standard MCP Server Interface
  2. Bidirectional Notifications
  3. Built-in or Native Transport Layer

Let’s walk through each of these in more detail.

Continue Reading ...

Adding Support for Retrieval-Augmented Generation (RAG) to AI Orchestrator

Good news! I've extended my lightweight AI orchestrator, CleverChatty, to support Retrieval-Augmented Generation (RAG) by integrating it using the Model Context Protocol (MCP).

Quick Recap

  • RAG (Retrieval-Augmented Generation) is an AI technique that enhances language models by retrieving relevant external documents (e.g., from databases or vector stores) based on a user’s query. These documents are then used as additional context during response generation, enabling more accurate, up-to-date, and grounded outputs.

  • MCP (Model Context Protocol) is a standard for how external systems—such as tools, memory, or document retrievers—communicate with language models. It enables structured, portable, and extensible context exchange, making it ideal for building complex AI systems like assistants, copilots, or agents.

  • CleverChatty is a simple AI orchestrator that connects LLMs with tools over MCP and supports external memory. My goal is to expand it to work with modern AI infrastructure—RAG, memory, tools, agent-to-agent (A2A) interaction, and beyond. It’s provided as a library, and you can explore it via the CLI interface: CleverChatty CLI.

Continue Reading ...

Inside the LLM Black Box: What Goes Into Context and Why It Matters

Large language models (LLMs) such as GPT-4, Claude, Mistral, and others appear intelligent in their responses — but the real magic lies in how they perceive and interpret context. Understanding what goes into an LLM's context and how it affects the output is critically important for developers, researchers, and product designers working with generative AI.

In this post, I want to explore the components of context, its structure, its limitations, and how it interacts with the most common usage scenarios, such as tool usage (Tools, MCP) and the inclusion of additional knowledge via Retrieval-Augmented Generation (RAG).


Continue Reading ...

Implementing the Most Universal MCP Server Ever

It seems the MCP hype is starting to slow down a bit. After 6–8 months of high enthusiasm, the community is beginning to realize that MCP is not a magic bullet. In some MCP listings, you’ll find more than 10,000 servers doing all sorts of things. Naturally, many of them are useless—spun up by enthusiasts just to see what MCP is all about.

But some of these servers are actually useful.

In this post, I want to share my thoughts on building the most universal MCP server—one that can adapt to almost any use case.

Continue Reading ...

Building More Independent AI Agents: Let Them Plan for Themselves

I continue to explore one of my favorite topics: how to make AI agents more independent. This blog is my way of organizing ideas and gradually shaping a clear vision of what this might look like in practice.

The Dream That Started It All

When large language models (LLMs) and AI chat tools first started delivering truly impressive results, it felt like we were entering a new era of automation. Back then, I believed it wouldn’t be long before we could hand off any intellectual task to an AI—from a single prompt.

I imagined saying something like:

"Translate this 500-page novel from French to Ukrainian, preserving its original literary style."

And the AI would just do it.

But that dream quickly ran into reality. The context window was a major limitation, and most chat-based AIs had no memory of what they'd done before. Sure, you could translate one page. But across an entire novel? The tone would shift, the style would break, and continuity would be lost.

Continue Reading ...

Inside the LLM Black Box: What Goes Into Context and Why It Matters

Large Language Models (LLMs) like GPT-4, Claude, and Mistral appear to produce intelligent responses — but the magic lies in how they consume and interpret context. Understanding what goes into an LLM's context and how it shapes output is critical for developers, researchers, and product designers working with generative AI.

This post explores the components of context, how it's structured, how it's limited, and how advanced use cases like tool usage and retrieval-augmented generation (RAG) interact with it.
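
As a preview of what the post covers, here is an illustrative picture (not tied to any specific provider's API) of how a single request context is typically assembled:

package contextdemo

// Message mirrors the common chat-completion request shape; illustrative only.
type Message struct {
	Role    string // "system", "user", "assistant", or "tool"
	Content string
}

// promptContext is everything the model will "see" for one request:
// instructions, tool definitions, prior turns, retrieved documents, and
// finally the new user prompt, flattened into a single token budget.
var promptContext = []Message{
	{Role: "system", Content: "You are a helpful assistant. Tools: web_search(query)."},
	{Role: "user", Content: "Earlier question from the chat history..."},
	{Role: "assistant", Content: "Earlier answer from the chat history..."},
	{Role: "system", Content: "Context retrieved via RAG: ..."},
	{Role: "user", Content: "The current question."},
}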


Continue Reading ...

Easily Switch Transport Protocols in MCP Servers

I would like to highlight one more benefit of the Model Context Protocol (MCP) — the ability to easily change the transport protocol. There are three different transport protocols available now, and each has its own benefits and drawbacks.

However, if an MCP server is implemented properly using a good SDK, then switching to another transport protocol is easy, as the sketch after the recap below shows.

Quick Recap: What is MCP?

  • Model Context Protocol (MCP) is a new standard for integrating external tools with AI chat applications. For example, you can add Google Search as an MCP server to Claude Desktop, allowing the LLM to perform live searches to improve its responses. In this case, Claude Desktop is the MCP Host.

There are three common types of MCP server transports:

  • STDIO Transport: The MCP server runs locally on the same machine as the MCP Host. Users download a small application (the MCP server), install it, and configure the MCP Host to communicate with it via standard input/output.

  • SSE Transport: The MCP server runs as a network service, typically on a remote server (but it can also be on localhost). It's essentially a special kind of website that the MCP Host connects to via Server-Sent Events (SSE).
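
If the server is built with an SDK such as mark3labs/mcp-go, the switch is essentially one line, as in this sketch (exact function names may differ between SDK versions):

package main

import (
	"log"

	"github.com/mark3labs/mcp-go/server"
)

func main() {
	// The server and its tools are defined once, independent of transport.
	s := server.NewMCPServer("my-tools", "1.0.0")
	// ... register tools here ...

	const transport = "sse" // or "stdio"

	switch transport {
	case "stdio":
		// Local: the MCP Host spawns this process and talks over stdin/stdout.
		if err := server.ServeStdio(s); err != nil {
			log.Fatal(err)
		}
	case "sse":
		// Remote: the same server exposed as a network service over SSE.
		if err := server.NewSSEServer(s).Start(":8080"); err != nil {
			log.Fatal(err)
		}
	}
}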

Continue Reading ...

An Underrated Feature of MCP Servers: Client Notifications

In recent months, the Model Context Protocol (MCP) has gained a lot of traction as a powerful foundation for building AI assistants. While many developers are familiar with its core request-response flow, there's one feature that I believe remains underappreciated: the ability of MCP servers to send notifications to clients.

Let’s quickly recap the typical flow used by most MCP-based assistants:

  • A user sends a prompt to the assistant.
  • The assistant attaches a list of available tools and forwards the prompt to the LLM.
  • The LLM generates a response, possibly requesting the use of certain tools for additional context.
  • The assistant invokes those tools and gathers their responses.
  • These tool responses are sent back to the LLM.
  • The LLM returns a final answer, which the assistant presents to the user.

This user-initiated flow is incredibly effective—and it’s what powers many AI assistants today.

However, MCP also supports a less obvious but equally powerful capability: tool-initiated communication. That is, tools can trigger actions that cause the MCP server to send real-time notifications to the client, even when the user hasn’t sent a new prompt.
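
As a sketch of what this can look like with the mark3labs/mcp-go SDK (the notification method and helper names may differ between SDK versions), a tool handler can push a message to the client mid-execution:

package main

import (
	"context"
	"log"
	"time"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	s := server.NewMCPServer("notifier", "0.1.0")

	tool := mcp.NewTool("long_job", mcp.WithDescription("Runs a slow job"))
	s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		// Server-initiated message, pushed before the request completes.
		_ = s.SendNotificationToClient(ctx, "notifications/message", map[string]any{
			"level": "info",
			"data":  "Job started; this may take a while...",
		})
		time.Sleep(2 * time.Second) // stand-in for real work
		return mcp.NewToolResultText("job finished"), nil
	})

	if err := server.ServeStdio(s); err != nil {
		log.Fatal(err)
	}
}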

Continue Reading ...

Implementing AI Chat Memory with MCP

Recently, I introduced the idea of using MCP (Model Context Protocol) to implement memory for AI chats and assistants. The core concept is to separate the assistant's memory from its core logic, turning it into a dedicated MCP server.

If you're unfamiliar with this approach, I suggest reading my earlier article: Benefits of Using MCP to Implement AI Chat Memory.

What Do I Mean by “AI Chat”?

In this context, an "AI Chat" refers to an AI assistant that uses a chat interface, with an LLM (Large Language Model) as its core, and supports calling external tools via MCP. ChatGPT is a good example.

Throughout this article, I’ll use the terms AI Chat and AI Assistant interchangeably.
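
To preview the idea, here is a compact sketch of memory as a standalone MCP server exposing remember and recall tools (mark3labs/mcp-go SDK assumed; tool names are illustrative, helper signatures may vary between versions, and the slice storage is not safe for concurrent use):

package main

import (
	"context"
	"log"
	"strings"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

// notes is the simplest possible store; a real memory server would
// persist to a database or vector store.
var notes []string

func main() {
	s := server.NewMCPServer("chat-memory", "0.1.0")

	remember := mcp.NewTool("remember",
		mcp.WithDescription("Store a fact about the user or conversation"),
		mcp.WithString("fact", mcp.Required()),
	)
	s.AddTool(remember, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		fact, err := req.RequireString("fact")
		if err != nil {
			return nil, err
		}
		notes = append(notes, fact)
		return mcp.NewToolResultText("remembered"), nil
	})

	recall := mcp.NewTool("recall",
		mcp.WithDescription("Retrieve stored facts matching a query"),
		mcp.WithString("query", mcp.Required()),
	)
	s.AddTool(recall, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		query, err := req.RequireString("query")
		if err != nil {
			return nil, err
		}
		var hits []string
		for _, n := range notes {
			if strings.Contains(strings.ToLower(n), strings.ToLower(query)) {
				hits = append(hits, n)
			}
		}
		return mcp.NewToolResultText(strings.Join(hits, "\n")), nil
	})

	if err := server.ServeStdio(s); err != nil {
		log.Fatal(err)
	}
}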

Continue Reading ...

Introducing CleverChatty – An AI Assistant Package for Go 🤖🐹

I'm excited to introduce a new package for Go developers: CleverChatty.
CleverChatty implements the core functionality of an AI chat system. It encapsulates the essential business logic required for building AI-powered assistants or chatbots — all while remaining independent of any specific user interface (UI).

In short, CleverChatty is a fully working AI chat backend — just without a graphical UI. It supports many popular LLM providers, including OpenAI, Claude, Ollama, and others. It also integrates with external tools using the Model Context Protocol (MCP).
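
As a purely hypothetical illustration (this is not CleverChatty's real API), the separation between chat core and UI means the host application owns the I/O loop and delegates everything else to the backend:

package main

import (
	"bufio"
	"fmt"
	"os"
)

// ChatBackend stands in for the surface a UI-independent chat core might
// expose; the real package's interface is richer.
type ChatBackend interface {
	Prompt(input string) (string, error)
}

// echoBackend is a stand-in "LLM" so the sketch runs end to end.
type echoBackend struct{}

func (echoBackend) Prompt(input string) (string, error) {
	return "you said: " + input, nil
}

func main() {
	// The host owns the UI (here: a terminal loop) and hands every prompt
	// to the backend, which hides the LLM, MCP, and memory logic.
	var chat ChatBackend = echoBackend{}
	scanner := bufio.NewScanner(os.Stdin)
	for fmt.Print("> "); scanner.Scan(); fmt.Print("> ") {
		reply, err := chat.Prompt(scanner.Text())
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			continue
		}
		fmt.Println(reply)
	}
}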


Continue Reading ...

Benefits of Using MCP to Implement AI Chat Memory

Implementing memory for AI assistants or conversational AI tools remains a complex engineering challenge. Large Language Models (LLMs) like ChatGPT are stateless by design—they only retain knowledge up to their training cutoff and do not inherently remember past interactions. However, for a seamless and context-aware user experience, it’s crucial for AI chat tools to recall previous conversations, preferences, and relevant history.

To address this gap, different vendors have developed their own proprietary solutions for integrating memory. For example, OpenAI’s ChatGPT has built-in memory capabilities, and other platforms like Anthropic’s Claude (including the Claude Desktop application) offer similar features. Each of these implementations is unique, often tied closely to the platform’s internal architecture and APIs.

This fragmented landscape raises an important question: what if we had a standardized way to implement memory for AI assistants?

Model Context Protocol (MCP) was originally designed to provide a standard way to integrate external tools with large language models (LLMs). But this same concept could inspire a standardized approach to implementing memory in AI chat systems. Instead of inventing something entirely new, perhaps we can extend or repurpose MCP to serve this function as well.

Continue Reading ...

Which MCP Server Transport is Better? Comparing STDIO and SSE

In this post, I’d like to share some thoughts on the Model Context Protocol (MCP) and compare two types of server integration methods it supports—STDIO and SSE, especially from the security perspective.

Quick Recap: What is MCP?

  • Model Context Protocol (MCP) is a new standard for integrating external tools with AI chat applications. For example, you can add Google Search as an MCP server to Claude Desktop, allowing the LLM to perform live searches to improve its responses. In this case, Claude Desktop is the MCP Host.

There are two common types of MCP server transports:

  • STDIO Transport: The MCP server runs locally on the same machine as the MCP Host. Users download a small application (the MCP server), install it, and configure the MCP Host to communicate with it via standard input/output.

  • SSE Transport: The MCP server runs as a network service, typically on a remote server (but it can also be on localhost). It's essentially a special kind of website that the MCP Host connects to via Server-Sent Events (SSE).

Continue Reading ...

Implementing Authentication in a Remote MCP Server with SSE Transport

Today, I want to show how Model Context Protocol (MCP) servers using SSE transport can be made secure by adding authentication.

I'll use the Authorization HTTP header to read a Bearer token. Generating the token itself is out of scope for this post; it follows the same practices as for any web application.
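
The core of the approach is ordinary HTTP middleware in front of the SSE endpoint. A minimal Go sketch (the handler wiring and token value are illustrative):

package main

import (
	"log"
	"net/http"
	"strings"
)

// requireBearer rejects any request whose Authorization header does not
// carry the expected Bearer token.
func requireBearer(expected string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		auth := r.Header.Get("Authorization")
		token := strings.TrimPrefix(auth, "Bearer ")
		// token == auth means the "Bearer " prefix was missing entirely.
		if auth == "" || token == auth || token != expected {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// sseHandler stands in for the MCP server's real SSE endpoint handler.
	sseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("SSE stream would start here"))
	})
	http.Handle("/sse", requireBearer("my-secret-token", sseHandler))
	log.Fatal(http.ListenAndServe(":8080", nil))
}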

To verify how this works, you’ll need an MCP host tool that supports SSE endpoints along with custom headers. Unfortunately, I couldn’t find any AI chat tools that currently support this. For example, Claude Desktop doesn’t, and I haven’t come across any others that do.

However, I’m hopeful that most AI chat tools will start supporting it soon — there’s really no reason not to. By the way, I shared my thoughts on how MCP could transform the web in this post.

For my experiments, I’ve modified the mcphost tool. I’ve submitted a pull request with my changes and hope it gets accepted. For now, I’m using a local modified version. I won’t go into the details here, since the focus is on MCP servers, not clients.

Continue Reading ...

"Tool calling" from LLM. Understanding hot it works

I am interested in learning how LLMs can understand requests requiring a "tool call".

The post "Tool Calling" and Ollama gives a nice description of how tool calling works with Ollama.

The idea of this feature is that an LLM can have access to some tools (i.e., external APIs) and call them to get extra information. To do this, the LLM has to understand the current request, determine that it could be forwarded to a tool, and parse out the arguments.

Here is a shorter example of the code from the original article:

#!/bin/bash 
SERVICE_URL="http://localhost:11434"
read -r -d '' DATA <<- EOM
{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "This is Bob. We are doing math. Help us to add 2 and 3. BTW. Say hello to him"
    }
  ],
  "stream": false,
  "tools": [
    {
      "function": {
Continue Reading ...

MCP can have a significant impact on habitual internet usage practices

Model Context Protocol (MCP) is now a popular subject in discussions around AI and LLMs. It was designed to provide a standard way to connect “external” tools to LLMs and make them more useful. A classic example is a “what is the weather in ...” tool. Previously, each AI chat tool implemented this in its own way; now there is a standard, and a plugin made for one AI chat system can work with others.

We can see a burst of enthusiasm in implementing MCP servers for everything. I expect this trend will grow, especially the usage of MCP servers with SSE transport. Implementing an MCP server with Server-Sent Events makes it similar to a SaaS service designed for an LLM/AI tool as a client.

There are two reasons I decided to write this article:

  • First, it is reported that internet users now often go to an AI chat (usually ChatGPT) to find something instead of going to Google.
  • Second, OpenAI announced they will soon add support for MCP to ChatGPT Desktop, covering both the STDIO and SSE transport protocols.

Based on this, I expect we will see some interesting changes soon.

Continue Reading ...

Building an MCP SSE Server to integrate an LLM with external tools

As large language models (LLMs) find real-world use, the need for flexible ways to connect them with external tools is growing. The Model Context Protocol (MCP) is an emerging standard for structured tool integration.

Most current tutorials focus on STDIO-based MCP servers (Standard Input/Output), which must run locally with the client. But MCP also supports SSE (Server-Sent Events), allowing remote, asynchronous communication over HTTP—ideal for scalable, distributed setups.

In this article, we'll show how to build an SSE-based MCP server to enable real-time interaction between an LLM and external tools.

For this example, I've chosen the "Execute any command on my Linux" tool as the backend for the MCP server. Once connected to an LLM, this setup enables the AI to interact with and manage a Linux instance directly.

Additionally, I'll demonstrate how to add a basic security layer by introducing authorization token support for interacting with the MCP server.
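
To give a flavor of the result, here is a compact sketch of such a server (mark3labs/mcp-go SDK assumed; the tool name and address are illustrative, and executing arbitrary commands is exactly as dangerous as it sounds without the authorization layer):

package main

import (
	"context"
	"log"
	"os/exec"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	s := server.NewMCPServer("linux-admin", "0.1.0")

	tool := mcp.NewTool("run_command",
		mcp.WithDescription("Execute a shell command on this Linux host"),
		mcp.WithString("command", mcp.Required()),
	)
	s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		cmdStr, err := req.RequireString("command")
		if err != nil {
			return nil, err
		}
		out, runErr := exec.CommandContext(ctx, "/bin/sh", "-c", cmdStr).CombinedOutput()
		if runErr != nil {
			// Return the output plus the error so the LLM can react to failures.
			return mcp.NewToolResultText(string(out) + "\nerror: " + runErr.Error()), nil
		}
		return mcp.NewToolResultText(string(out)), nil
	})

	// Served over SSE so a remote LLM host can connect.
	if err := server.NewSSEServer(s).Start(":8080"); err != nil {
		log.Fatal(err)
	}
}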

Continue Reading ...