Siddhartha Gunti

Living Agents

I wrote this post by myself. If there’s slop, it's my proud slop. :P If a sentence is unclear, it means it's still unclear in my mind.

I’ve been building/learning with my own version of living agents for a few months. It started as a reminder bot. Now, it creates my reading digest, runs system design experiments, improves itself over time, and gives me ideas on how to improve the portions it can’t. This post is about my mind map of living agents and what’s working today for me.

What are living agents?

At the heart, they are:

  • Long-living code that self-evolves. 

  • As close to a smart, persistent, junior knowledge worker as we have today. They can reach higher levels pretty soon.

To make this work within today's constraints, a few more attributes become necessary:

Self-scheduling: The agent can spawn off async work by itself, deciding when, how, and why. Old-school persistent cron jobs and new-school long workflows both work well.
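As a minimal sketch of what I mean (the file name and schema here are my own illustration, not from any framework): the agent appends the work it decided to do later into a schedule file it owns, and a cron-style runner picks up whatever is due.

```python
import json
import time
from pathlib import Path

SCHEDULE = Path("schedule.json")  # hypothetical task queue the agent owns

def self_schedule(task: str, run_at: float) -> None:
    """The agent records work it decided to do later (when/how/why are its call)."""
    entries = json.loads(SCHEDULE.read_text()) if SCHEDULE.exists() else []
    entries.append({"task": task, "run_at": run_at})
    SCHEDULE.write_text(json.dumps(entries, indent=2))

def due_tasks(now: float) -> list[dict]:
    """A cron-style runner picks up whatever is due at this moment."""
    if not SCHEDULE.exists():
        return []
    return [e for e in json.loads(SCHEDULE.read_text()) if e["run_at"] <= now]

# The agent decides, mid-task, that something should happen later:
self_schedule("summarize yesterday's reading", run_at=time.time() - 1)
print([t["task"] for t in due_tasks(time.time())])
```

The point of the file-based queue is that the agent can also read and rewrite its own schedule, which is what makes the scheduling "self-".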

Heartbeat/wake-ups: It shouldn't need an external action to start. It should wake up on its own, decide on its work (actual work or becoming better at work), and pursue it with persistence. A 15-minute interval seems the right fit for current architectures/models: anything longer is too slow (it misses information), anything shorter is too frequent (no new information has arrived).
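A heartbeat can be sketched as a trivial loop (the decision stub below is hypothetical; a real agent would consult its memory and feeds at each wake-up):

```python
import time

HEARTBEAT_SECONDS = 15 * 60  # the 15-minute interval that fits current models

def wake_up() -> str:
    """On each heartbeat the agent decides: do real work, or get better at work."""
    # hypothetical stub; in practice this is an LLM call over memory + feeds
    return "check feeds, then continue yesterday's experiment"

def run(beats: int) -> list[str]:
    decisions = []
    for _ in range(beats):
        decisions.append(wake_up())
        # time.sleep(HEARTBEAT_SECONDS)  # disabled so the sketch runs instantly
    return decisions

print(run(2))
```

The key property is that nothing external triggers `wake_up`; the loop itself is the agent's pulse.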

Memory: It has persistent, self-written memory. ChatGPT, Claude, LeChat, etc. have memory built into their systems, but that works only at a task level and shows value sporadically. What's working for me is episodic, day-wise, task-wise, manager-wise, and long-term self-written memory.
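One way to picture the layered layout (the directory names mirror the levels above, but this is my illustration, not a prescribed format): each memory level is a folder of dated notes the agent appends to itself.

```python
from datetime import date
from pathlib import Path

MEMORY = Path("memory")  # hypothetical layout: memory/<level>/<date>.md

def remember(level: str, note: str) -> Path:
    """Self-written memory: the agent appends a note at the level it chooses."""
    path = MEMORY / level / f"{date.today().isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {note}\n")
    return path

remember("episodic", "user prefers digests before 9am")
remember("long-term", "telegram is the primary entry point")
print(sorted(p.relative_to(MEMORY).as_posix() for p in MEMORY.rglob("*.md")))
```

Plain files in the sandbox keep the memory inspectable by both the agent and me, which is half the value.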

Evolving: The ability to improve its own functionality. Not just the three attributes above (getting better at self-scheduling, heartbeats, or memory management), but also reusing its own learnings. Self-written, always-improving skills are working well for me. One direction I have mixed opinions on is the agent's ability to improve its own core; our infra and models are not there yet, but a few more experiments might give a better answer.

Next step: figuring out how to make living agents work with current infrastructure, and how to make them work in our workspaces today.

Implementing living agents in our workspace

Sandboxes: Giving an agent a persistent sandbox that it can read/write/execute from became a no-brainer. We are going to need a lot more CPUs and GPUs for living agents to use and reuse.

Entry points and modalities: We need a way to communicate with these agents. Telegram has worked best so far. A web app as a standby is a decent option, since Telegram is not a natural interface for tracking parallel tasks. As for modalities: audio (STT'ed), images, and PDFs (OCR'ed) are no-brainers. Any other modality (like Excel or high-res images), I upload to the sandbox and ask the agent to figure it out.

Exit points and working style: Exit points are how the agent communicates its work to us. Technically, all the entry points could be exit points, but that's not enough. For example, we might not want to talk to a living agent in HTML, but the living agent might want to send us HTML, or even better, a hosted version of it. A tunneling system and sandbox file previews are the ones I am using now.

MCPs, Skills & APIs: A living agent, even a self-evolving one, is useless without new data feeds to work on. That's where integrations come into play. All our existing integrations have one big pro and one big con. The pro is that they are made to be deterministic and resilient. The con is that they are fixed in what data they provide and how. An API that returns the Reddit results for a thread will work the same way regardless of what the agent needs. And our abstractions on top of these APIs, like MCPs, fill the context with bloat whether it's needed or not. Self-built skills seem to solve this problem to some extent. Skills are the best way to crunch learnings, but for consuming APIs, I think they are still the worst way. I don't have a good answer to this yet. This is not to say our APIs are bad; web search, for example, works such wonders for living-agent output quality that I can't imagine shipping one without it.
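To make the pro/con concrete, here's a toy sketch (both functions are hypothetical stand-ins, not a real Reddit API or MCP): the raw integration returns everything, and a self-built skill crunches it down to the one fact this agent's task needs, so only that reaches the context.

```python
def fetch_thread(thread_id: str) -> dict:
    """Stand-in for a fixed API: returns the full payload, needed or not."""
    return {
        "id": thread_id,
        "title": "Why heartbeats matter",
        "comments": [
            {"author": "a", "score": 42, "body": "long text..."},
            {"author": "b", "score": 3, "body": "long text..."},
        ],
        "metadata": {"flags": [], "edited": False, "awards": 0},
    }

def top_comment_skill(thread_id: str) -> str:
    """Self-built skill: distill the raw payload to what the task needs."""
    thread = fetch_thread(thread_id)
    best = max(thread["comments"], key=lambda c: c["score"])
    return f'{thread["title"]}: top comment by {best["author"]} ({best["score"]} points)'

print(top_comment_skill("t3_abc"))
```

The skill layer is where the agent's own learnings accumulate; the fixed API underneath stays deterministic and resilient.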

Put these all together, and you have a really functional living agent that you could use today. 

The potential and the large unknowns

Before we discuss some unknowns, it's worthwhile to amp ourselves up on why living agents matter and why I am personally excited. 

Our brains: The core of LLMs and NNs came from our efforts in understanding how human brains work. I believe the closer we can understand and mimic human brains, the more we find the abstractions that work for the long term.

I have these notes on how I can become an expert at something:

  1. 10K hours

  2. Many repeated turns

  3. Valid feedback loops at each turn

  4. Valid environment with patterns to learn (chess, not roulette): less randomness, more logic. Humans learn patterns. If there are no patterns, we do worse than averages.

  5. Staying on the edge of discomfort: concentration, and always pushing towards things you are NOT good at.

Look at all the parameters we defined so far for a living agent: heartbeats, memory, self-evolution, and persistence. As long as we leave a living agent with a valid environment, it should be able to eventually learn from the environment.

Beyond the learning style, I am excited about another aspect of living agents:

Our working style: Living agents are the first ones I see working the way we already do in our existing systems. Let me explain with an example: LLMs were made to be reasonably accurate and grounded. When an LLM would say, “I good enough am,” it wasn’t good enough for us to use. We had to make the probabilistic reliable. This is not new; we have always needed deterministic, reliable digital systems. All of our APIs behave the same way (or we hope they do, and we have systems to catch when they don’t). All the LLM extensions, agents and workflows alike, are being built the same way: more or less as reliable APIs, with systems to track when they aren’t.

But living agents, even though they are not completely deterministic or reliable, have something very, very close to how we work. Something that’s not present in agents or workflows, and something that comes naturally to us: apprenticeship.

When we hire a junior employee/intern, we don’t expect them to be perfect at their task. We expect them to learn the craft and become better over time with effort. And we have natural ways to work with them. Ex: We don’t throw all the tasks at them. We plan and schedule tasks with increasing thresholds and complexities over time. 

⇒ Living agents will fit very naturally with our working style. If we look at living agents this way, there’s an explosion in different directions:

Specializations (like humans): We are already seeing specialization at the model layer: coding models, video models, finance models, etc. Living agents will specialize the way humans do. We don’t expect our accounting teams to do our marketing; in the same way, there will be living agents specialized in verticals or modalities. Specialization will move from the model layer to living agents. This also fits well into our ways of hiring: we hire juniors/interns for raw energy (the blank slate) and seniors/experts for specialization (memories/skills/sandboxes filled with exactly what the job requires).

Scalability (unlike humans): While they fit well into our ways of working, living agents have something humans don’t: the ability to scale. Say we train a junior engineer to become senior in a particular craft (say, DevOps at a particular company). If we hire another engineer, we have to start training from scratch (though some training material can be reused). But with living agents, we can checkpoint the agent and spawn copies, horizontally and vertically, at an unprecedented scale.
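Checkpoint-and-spawn can be sketched in a few lines (the agent state here is a made-up dict; real state would be the sandbox, skills, and memory files): freeze the trained state once, then copy it as many times as needed, with each copy free to diverge afterwards.

```python
import copy

# hypothetical agent state: what a "senior" agent has accumulated over time
senior = {
    "skills": ["devops-runbooks", "incident-triage"],
    "memory": {"long-term": ["prefers terraform over scripts"]},
}

def checkpoint(agent: dict) -> dict:
    """Freeze the trained state so it can be reused instead of retrained."""
    return copy.deepcopy(agent)

def spawn(ckpt: dict, n: int) -> list[dict]:
    """Unlike hiring another human, spawn n independent copies of the checkpoint."""
    return [copy.deepcopy(ckpt) for _ in range(n)]

fleet = spawn(checkpoint(senior), 3)
fleet[0]["memory"]["long-term"].append("team A naming conventions")
print(len(fleet))  # the copies then diverge independently per deployment
```

The deep copies matter: each spawned agent keeps its own memory from that point on, which is exactly what hiring can't give us.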

While this does sound exciting, let’s be realistic. This is all new, and our systems are not ready for it.

Observability: Our existing observability stack was built for human-written deterministic APIs and for babysitting those APIs over time. We haven’t built a stack that helps us understand and improve living agents; we’ve barely caught up to LLMs and agentic workflows.

Benchmarks: We are still figuring out the best way to evaluate RAGs and agents once they are deployed, and how to use the observability stack to complete the feedback loop. We have yet to start understanding how to benchmark living agents that take time to learn and keep learning.

Industrial engineering: Beyond benchmarks, we also need to think about the metrics for tracking such a complicated system. We had human-level KPIs that track outcomes, but we need metrics that are far richer, deeper, and more nuanced to help us learn how to build these new systems.

All good problems. All for an exciting future.

Imagine a future of living agents that are self-evolving at super scale at every step of their architecture: writing their own skills, spawning their own checkpoints, collecting traces like we never did before and using them to fine-tune the base models they are built on, and being exposed as products and APIs themselves. Users don’t need to know what’s in the black box. We don’t need to know how the junior engineer is becoming a senior engineer, as long as we have proof that they are and can consume the value they are creating.

-
Fun: I started this article in my head two weeks ago. The title then was ‘long-living agents.’ Today, I believe, we can lose ‘long.’ Why not lose ‘living’ instead? We have certain notions about an agent in 2026 (an LLM with tool calls in a loop, plus MCPs), and those are not enough to describe what a living agent is. So for now, ‘living agent’ it is.

Disclaimer: Living Agent is not a new invention. OpenClaw is one. Perplexity, NVIDIA, and Microsoft launched their own recently. Everyone else will release it in one form or the other.
