It seems the MCP hype is starting to slow down a bit. After 6–8 months of high enthusiasm, the community is beginning to realize that MCP is not a magic bullet. In some MCP listings, you’ll find more than 10,000 servers doing all sorts of things. Naturally, many of them are useless—spun up by enthusiasts just to see what MCP is all about.
But some of these servers are actually useful.
In this post, I want to share my thoughts on building the most universal MCP server—one that can adapt to almost any use case.
## Quick Recap: What Is MCP?
MCP stands for **Model Context Protocol**, an open protocol designed to standardize how language models communicate with clients, tools, memory systems, and even with each other.
The excitement around MCP is driven by its potential to enable more powerful and flexible AI systems—systems that can adapt to a wide range of tasks and environments.
Long story short, MCP allows an LLM to **do** something, not just **talk** about something. It gives large language models metaphorical hands, legs, eyes, and ears.
Sure, this kind of functionality was possible before MCP—but it was more complicated. There was no single standard. And without a standard, there weren’t 10,000 enthusiasts building tools and servers around a shared protocol.
## What Is the Most Powerful Tool?
So, the most powerful MCP server must be able to operate powerful tools. In fact, the true power of an MCP server is defined by the power of the tools it exposes.
But what tool are we talking about?
There’s no single answer in general. But if we reframe the question as:
**"What is the most powerful tool accessible to an ordinary person?"**
—then the answer becomes obvious: **a computer**. Just a regular desktop or laptop.
With a computer, I can write code, automate tasks, communicate with others, create art, and control remote devices. I can do pretty much anything.
So, the most powerful MCP server must be able to **control a computer**.
When an LLM — or more broadly, an AI assistant — can control a computer, it becomes incredibly powerful. It can do anything a human can do with a computer: write scripts, design workflows, generate content, operate hardware, and so much more.
That’s what we’re aiming for—a server that turns an AI assistant into a full-fledged operator of the digital world.

## Connecting an LLM to a Computer
I started with the basics. I installed an Ubuntu Desktop virtual machine and wrote the simplest MCP server to run on it.
The initial version of the server has a very limited set of tools—but even in this early stage, it can already do some genuinely useful things.
### Command Execution
* **Run command**: The server can execute any command on the host machine. This is the most fundamental capability, and it opens up a world of possibilities. The tool returns the command’s output.
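As a rough sketch, the command-execution tool can be a thin wrapper around `subprocess`. The function below is hypothetical (the real server registers it as an MCP tool; here it is shown standalone), but it illustrates the core behavior: run the command, capture both stdout and stderr, and hand everything back to the LLM so it can see errors too.

```python
import subprocess

def run_command(command: str, timeout: int = 60) -> str:
    """Execute a shell command on the host machine and return its output.

    In the actual MCP server this would be exposed as a tool; the LLM
    passes a command string and receives the combined output.
    """
    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        timeout=timeout,  # guard against commands that never return
    )
    output = result.stdout
    if result.stderr:
        # Include stderr so the assistant can diagnose failures.
        output += "\n[stderr]\n" + result.stderr
    return output
```

A timeout is important here: without one, a single hanging command would block the whole tool call.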
### File System Operations
* **Read file from disk**: The server can read files from the disk. This is useful for accessing configuration files, data files, or any other type of file. The tool takes a filename and returns the file contents.
* **Save file to disk**: The server can save content to a file. This is helpful for persisting data generated by the LLM or other tools. The tool accepts a filename and file contents. It also works well for saving code scripts.
* **List files in a directory**: The server can list files in a specified directory. This is useful for navigating the file system and finding files. The tool accepts a directory path and returns a list of files in that location.
### Browser Operations
All browser operations are implemented using **Selenium**, enabling powerful automation and interaction with web pages.
* **Start a browser window**: Opens a new browser window and navigates to a specified URL.
* **Navigate to a URL**: Navigates an existing browser window (by ID) to a given URL.
* **List active browser windows and current pages**: Returns a list of all active browser windows and the current page open in each. This helps track what the assistant is doing.
* **Get HTML content of a page**: Retrieves the full HTML of the current page in a specified browser window. Useful for scraping and analysis.
* **Click an element in the browser**: Clicks an element identified by a CSS selector in a specific browser window. The LLM determines the selector based on the page’s HTML.
* **Extract text from an element**: Gets the text content of a specific element on the page, based on a CSS selector.
* **Input text into an element**: Inputs specified text into an element (e.g., a form field) identified by a CSS selector.
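One practical design question is how the LLM refers to a specific browser window across tool calls. A simple answer is a manager that assigns an integer ID to each window. The class below is a hypothetical sketch of that bookkeeping layer around Selenium; the `driver_factory` parameter is my own addition so the Selenium driver can be swapped out (e.g. for testing), and the real server may be structured differently.

```python
import itertools

class BrowserManager:
    """Tracks open browser windows by ID so the LLM can address them."""

    def __init__(self, driver_factory=None):
        # driver_factory allows injecting a fake driver; by default a
        # real Selenium Chrome driver is created lazily on first use.
        self._factory = driver_factory or self._default_factory
        self._windows = {}
        self._ids = itertools.count(1)

    @staticmethod
    def _default_factory():
        from selenium import webdriver  # imported lazily
        return webdriver.Chrome()

    def start_browser(self, url: str) -> int:
        """Open a new browser window at `url` and return its ID."""
        driver = self._factory()
        driver.get(url)
        window_id = next(self._ids)
        self._windows[window_id] = driver
        return window_id

    def navigate(self, window_id: int, url: str) -> None:
        """Point an existing window (by ID) at a new URL."""
        self._windows[window_id].get(url)

    def list_windows(self) -> dict[int, str]:
        """Map each active window ID to its current URL."""
        return {wid: d.current_url for wid, d in self._windows.items()}

    def get_html(self, window_id: int) -> str:
        """Return the full HTML of the current page in a window."""
        return self._windows[window_id].page_source

    def click(self, window_id: int, css_selector: str) -> None:
        """Click the element matching a CSS selector in a window."""
        from selenium.webdriver.common.by import By
        driver = self._windows[window_id]
        driver.find_element(By.CSS_SELECTOR, css_selector).click()
```

Each tool (start, navigate, list, get HTML, click) then becomes a one-line call into this manager, and the window ID is the only state the LLM has to carry between steps.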
This short list of tools is already enough to create a powerful and flexible MCP server:
* The AI assistant can automate tasks by generating and executing code on the host machine.
* It can scrape data from websites.
* It can interact with web apps: logging in, filling out forms, and more.
## More Ideas for the Future
One of my goals is to let the AI assistant use **any installed application** on the host machine—not just command-line tools or browsers.
To do this, the assistant needs the ability to receive **screenshots** of application windows and interact with UI elements based on what it sees. This would involve:
* Analyzing screenshots to identify interface elements.
* Mapping visual elements to coordinates.
* Simulating clicks or other inputs at the right positions.
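The loop those three steps describe could be sketched as follows. Everything here is hypothetical: `take_screenshot`, `locate_element`, and `click_at` are placeholders for real implementations (for example, a screenshot library plus a vision model that returns coordinates), passed in as parameters so the skeleton stays concrete.

```python
def click_ui_element(description: str,
                     take_screenshot,
                     locate_element,
                     click_at) -> bool:
    """Find a UI element matching `description` on screen and click it.

    The three callables are placeholders:
      take_screenshot() -> screen image
      locate_element(image, description) -> (x, y) or None
      click_at(x, y) -> simulates a mouse click at those coordinates
    """
    image = take_screenshot()
    coords = locate_element(image, description)  # e.g. a vision-model call
    if coords is None:
        return False  # element not found; the assistant should retry or replan
    x, y = coords
    click_at(x, y)
    return True
```

The hard part is, of course, `locate_element`: mapping a natural-language description to pixel coordinates reliably is exactly the complex challenge mentioned above.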
It’s a complex challenge, but it’s definitely possible. And it would massively expand what the assistant can do.
## Where Can This Be Used?
Short answer: **everywhere**.
If this MCP server is built properly, it can be embedded into any workflow where a computer is involved. The real question becomes: *How smart does the AI assistant need to be to use it effectively?*
To truly take advantage of this server, the assistant must be capable of executing **complex, multi-step tasks**. I’ve written about this in more detail in my blog post:
👉 [Building More Independent AI Agents: Let Them Plan for Themselves](/blog/post/building-more-independent-ai-agents-let-them-plan-for-themselves/)
At the moment, my MCP server can perform some simple tasks—but it still requires constant human supervision.
My hope is that soon I’ll be able to ask something like,
**“Create a new Instagram account for me,”**
and the assistant will complete the entire process on its own—no additional hints or hand-holding required.
The tools are there: it can run commands, use a browser, and access the internet. Now it just needs the intelligence to use them effectively.
## Summary
In this post, I explored how to connect a large language model to a computer using a custom MCP server. Starting from a basic Ubuntu setup, I implemented a minimal yet powerful toolset: command execution, file operations, and browser automation via Selenium.
Even with this simple foundation, the assistant can already automate real-world tasks—like browsing the web, scraping data, and writing or running code. The next step is expanding the assistant's capabilities to interact with any GUI-based application by analyzing screenshots and simulating mouse clicks.
Ultimately, the goal is to create an AI assistant that doesn’t just respond with words—but one that can *act*, *build*, and *do*. With the right tools and enough autonomy, we move one step closer to assistants that can truly handle complex tasks with minimal supervision.