Siddhartha Gunti

Living Agents

I wrote this post by myself. If there’s slop, it’s my proud slop :P If a sentence is unclear, it means it’s still unclear in my mind.

I’ve been building/learning with my own version of Living Agents. It started as a reminder bot a few months ago. Now, it creates my reading digest, runs system design experiments, improves itself over time, and gives me ideas on how to improve the portions it can’t. This post is about my mind map of Living Agents and what’s working today.

What are Living Agents?

At the heart, they are:

  • Long living code that self-evolves. 

  • As close to a smart, persistent, junior knowledge worker as we have today. They can get to higher levels pretty soon.

To make this work with the constraints of today, a few more attributes appear:

Self-scheduling: Able to spawn off async work by themselves. They should be able to decide when, how, and why. Old-school persistent crons or new-school long-running workflows both work well.
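A minimal sketch of what self-scheduling could look like, assuming the agent keeps its own due-time queue (all names here are illustrative, not from any framework):

```python
import heapq
import time

class SelfScheduler:
    """Hypothetical task queue an agent appends to on its own."""

    def __init__(self):
        self._queue = []  # min-heap of (due_at, description)

    def schedule(self, due_at: float, description: str) -> None:
        # The agent decides when/how/why: it pushes future work for itself.
        heapq.heappush(self._queue, (due_at, description))

    def due_tasks(self, now: float) -> list[str]:
        # Pop everything whose due time has passed.
        due = []
        while self._queue and self._queue[0][0] <= now:
            due.append(heapq.heappop(self._queue)[1])
        return due

# Usage: the agent schedules a digest for later and a retry for now.
scheduler = SelfScheduler()
scheduler.schedule(due_at=time.time() + 3600, description="build reading digest")
scheduler.schedule(due_at=time.time() - 1, description="retry failed fetch")
print(scheduler.due_tasks(time.time()))  # only the overdue retry comes back
```

A persistent cron is the same idea with the queue living outside the process.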

Heartbeat: It shouldn’t need an external trigger to start. It should wake up on its own, decide on its work (actual work or getting better at work), and work on it with persistence. A 15-min interval seems the right fit for the current arch/models. Anything longer is too slow (missing out on info) and anything shorter is too quick (no new info).
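Roughly, a heartbeat loop could look like this (a minimal sketch; the decision logic is my illustration, not a spec):

```python
import time

HEARTBEAT_SECONDS = 15 * 60  # 15-min interval, per the note above

def decide_work(inbox: list[str]) -> str:
    # On each wake-up the agent picks actual work if there is new info,
    # otherwise it spends the beat getting better at its own work.
    return inbox.pop(0) if inbox else "self-improvement: review memory"

def heartbeat(inbox: list[str], beats: int, interval: float = HEARTBEAT_SECONDS) -> list[str]:
    """Wake up on its own, decide, work, sleep. Returns what it did."""
    done = []
    for _ in range(beats):
        done.append(decide_work(inbox))
        time.sleep(interval)
    return done

# Usage: simulate three beats with a zero-second sleep.
print(heartbeat(["summarize new articles"], beats=3, interval=0))
```

The point is that nothing outside the loop pokes the agent: the wake-up and the decision both live inside it.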

Memory: Has persistent, self-written memory. ChatGPT, Claude, etc. have this built into their systems, but it works only at a task level and shows value sporadically. What’s working for me is episodic, day-wise, task-wise, manager-wise, and long-term self-written memory.
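The layered memory above can be as simple as append-only files per layer, read back on each wake-up (the file layout is my assumption, not a standard):

```python
import datetime
import tempfile
from pathlib import Path

def remember(root: Path, layer: str, note: str) -> Path:
    """Append a self-written note to one memory layer.

    Layer names mirror the post (episodic, day-wise, task-wise,
    manager-wise, long-term); 'day' gets a dated file.
    """
    root.mkdir(parents=True, exist_ok=True)
    name = (datetime.date.today().isoformat() if layer == "day" else layer) + ".md"
    path = root / name
    with path.open("a", encoding="utf-8") as f:
        f.write(note + "\n")
    return path

def recall(root: Path, layer: str) -> list[str]:
    """Read a layer back, one note per line; empty if never written."""
    name = (datetime.date.today().isoformat() if layer == "day" else layer) + ".md"
    path = root / name
    return path.read_text(encoding="utf-8").splitlines() if path.exists() else []

# Usage: write to two layers and read today's log back.
root = Path(tempfile.mkdtemp())
remember(root, "day", "built the reading digest")
remember(root, "long-term", "user prefers short summaries")
print(recall(root, "day"))
```

Because the agent writes these files itself, improving memory management is just another thing it can evolve.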

Evolving: The ability to improve its own functionality. Not just the above three: getting better at self-scheduling, heartbeats, or memory management. But also re-using its own learnings. Self-written and always-improving skills are working well for me. One direction I have mixed opinions on is its ability to improve its own core. Our infra and models are not there yet, but a few more experiments might give a better answer.
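A self-written skill can be as bare as a Python file the agent writes once and reloads on later heartbeats. A tiny sketch (the `run()` convention and names are mine, not from any skill framework):

```python
import tempfile
from pathlib import Path

def save_skill(skills_dir: Path, name: str, source: str) -> Path:
    """The agent crunches a learning into a reusable skill file."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{name}.py"
    path.write_text(source, encoding="utf-8")
    return path

def load_skill(skills_dir: Path, name: str):
    """Re-load a previously self-written skill as a callable."""
    namespace: dict = {}
    exec((skills_dir / f"{name}.py").read_text(encoding="utf-8"), namespace)
    return namespace["run"]  # convention: every skill exposes run()

# Usage: persist a tiny skill today, reuse it on a later heartbeat.
skills = Path(tempfile.mkdtemp())
save_skill(skills, "tldr", "def run(text):\n    return text.split('.')[0] + '.'\n")
tldr = load_skill(skills, "tldr")
print(tldr("Keep the first sentence. Drop the rest."))
```

Improving a skill is then just the agent rewriting the file; improving its own core would mean rewriting the loader itself, which is the part I’m still unsure about.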

Next step: figuring out how we can make living agents work with today’s infra, and make them work in our workspace.

Implementing Living Agents in our workspace

Sandboxes: Giving an agent a persistent sandbox that it can read/write/execute from became a no-brainer. We are going to need a lot more CPUs and GPUs for living agents to use and reuse.

Entry points and modalities: We need a way to communicate with these agents. Telegram has worked best so far. A web app as a standby is a decent option, since Telegram is not a natural interface for tracking parallel tasks. As for modalities: audio (STT’ed), images, and PDFs (OCR’ed) are no-brainers. Any other modality (like Excel or high-res images), I upload to the sandbox and ask the agent to figure it out.
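The modality handling above boils down to a small router: normalize what you can into text, and drop everything else into the sandbox. A sketch, with the STT/OCR calls stubbed out as placeholders (not any specific API):

```python
from pathlib import Path

def transcribe(path: str) -> str:
    # Placeholder for a real STT call; an assumption, not a specific API.
    return f"[transcript of {Path(path).name}]"

def ocr(path: str) -> str:
    # Placeholder for a real OCR call.
    return f"[ocr text of {Path(path).name}]"

def route_input(kind: str, payload: str, sandbox: list[str]) -> str:
    """Normalize every entry-point modality into plain text for the agent."""
    if kind == "text":
        return payload
    if kind == "audio":
        return transcribe(payload)
    if kind == "pdf":
        return ocr(payload)
    # Anything else (Excel, high-res images) goes to the sandbox
    # for the agent to figure out.
    sandbox.append(payload)
    return f"uploaded {payload} to sandbox; figure it out"

# Usage
sandbox: list[str] = []
print(route_input("audio", "note.ogg", sandbox))
print(route_input("xlsx", "budget.xlsx", sandbox))
```

Telegram, the web app, or any future channel then just feeds this router.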

Exit points and working style: Exit points are how the agent communicates its work to us. Technically, all the entry points could themselves be exit points, but that’s not enough. For example, we might not want to talk to a living agent in HTML, but the living agent might want to send us HTML, or even better, a hosted version of it. A tunneling system and sandbox file previews are the ones I am using now.

MCPs, Skills & APIs: A Living Agent, even if it’s self-evolving, is useless without new data feeds to work on. That’s where integrations come into the picture. All our existing integrations have one big pro and one big con. The pro is that they are made to be deterministic and resilient. The con is that they are fixed in what data they provide and how. An API that returns the Reddit results for a thread will work the same way irrespective of the agent’s need. Also, our abstractions like MCPs on top of these APIs fill the context with bloat whether it’s needed or not. Self-built skills seem to solve this problem to some extent. Skills are the best way to crunch the learnings, but for consuming APIs, I think, they are still the worst. I don’t have a good answer to this yet. This is not to say our APIs are bad; web search, for example, does such wonders for Living Agent output quality that I can’t imagine shipping one without it.

Put all these together and you have a really functional living agent that you could use today.

The potential and the large unknowns

But as usage grows, there are quite a few unknowns. Before we discuss them, it’s worthwhile to amp ourselves up on why living agents matter and why I am personally excited about this.

Our brains: The core of LLMs and NNs came from our efforts to understand how human brains work. I believe the closer we can get to understanding and mimicking human brains, the more we find abstractions that work for the long term.

I have these notes on how I can become an expert at something:

  1. 10K hours

  2. Many repeated turns

  3. Valid feedback loops at each turn

  4. Valid environment with patterns to learn (chess, not roulette: less randomness, more logic. Humans always try to learn patterns. If there are no patterns, we do worse than average)

  5. Edge of being uncomfortable. Concentration, and always pushing towards things you are NOT good at.

Look at all the parameters we defined so far for a Living Agent: heartbeats, memory, self-evolving, persistence. As long as we leave a Living Agent in a valid environment, it should eventually be able to learn from that environment.

It’s not just the learning style; I am excited about another aspect of living agents:

Our working style: Living Agents are the first ones I see working like how we do in our existing systems. Let me explain with an example: LLMs were made to be reasonably accurate and grounded. If one wasn’t, it was useless. For example, when an LLM would say “I good enough am”, it wasn’t good enough for us to use. We had to make the probabilistic reliable. This is not new. We have always needed deterministic, reliable digital systems. All of our APIs behave the same way (or we hope they do, and we have systems to catch when they don’t). All the LLM extensions like agents and workflows are being built in the same way: more or less like reliable APIs, with systems to track when they aren’t.

But living agents, even though they are not completely deterministic or reliable, have something very close to how we work. Something that’s not present in agents or workflows, and something that comes naturally to us: apprenticeship.

When we hire a junior employee or intern, we don’t expect them to be perfect at their task. We expect them to learn the craft and, with effort, get better at what we need over time. And we have natural ways to work with them. For example: we don’t throw all the tasks at them; we plan and schedule tasks with increasing thresholds and complexity over time.

What this implies is that living agents will fit very naturally with our working style. If we look at living agents this way, there’s an explosion in different directions:

Specializations (like humans): We are already seeing specialization at the model layer: coding models, video models, finance models, etc. Living agents will have specializations like humans do. We don’t expect our accounting teams to do our marketing. In the same way, there will be specialized living agents by vertical or modality. Specialization will move from the model layer to living agents. This also fits super well into our ways of hiring. We hire juniors/interns for raw energy (the blank slate) or seniors/experts for specialization (memories/skills/sandboxes filled with exactly what the job requires).

Scalability (unlike humans): While this fits well into our ways of working, living agents have something humans don’t: the ability to scale. Say we train a junior engineer to become senior at a particular craft (say, devops at a particular company). If we hire another engineer, we have to start training from scratch (probably some training material can be reused). But with living agents, we can checkpoint the agent and spawn it off horizontally and vertically at unprecedented scale.
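If the agent’s state is just files (memory, skills, sandbox), checkpoint-and-spawn is a directory copy. A sketch under that assumption:

```python
import shutil
import tempfile
from pathlib import Path

def checkpoint(agent_dir: Path, store: Path, tag: str) -> Path:
    """Freeze an agent's state (memory, skills, sandbox) under a tag."""
    dest = store / tag
    shutil.copytree(agent_dir, dest)
    return dest

def spawn(ckpt: Path, fleet_dir: Path, n: int) -> list[Path]:
    """Clone one trained agent horizontally into n fresh copies."""
    clones = []
    for i in range(n):
        dest = fleet_dir / f"agent-{i}"
        shutil.copytree(ckpt, dest)
        clones.append(dest)
    return clones

# Usage: train one agent, checkpoint it, spawn three "seniors" from it.
base = Path(tempfile.mkdtemp())
agent = base / "agent"
(agent / "memory").mkdir(parents=True)
(agent / "memory" / "long-term.md").write_text("devops patterns learned\n")
ckpt = checkpoint(agent, base / "checkpoints", "senior-devops-v1")
fleet = spawn(ckpt, base / "fleet", 3)
print(len(fleet), (fleet[0] / "memory" / "long-term.md").read_text().strip())
```

Every clone starts with the senior’s memory instead of from scratch, which is exactly the asymmetry with human training.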

While this sounds exciting, let’s be realistic. This is all new. Our systems are not ready for this.

Observability: Our existing observability stack was built for human-written, deterministic APIs and for babysitting those APIs over time. We haven’t built a stack that helps us understand and improve living agents. We barely caught up to LLMs and agentic workflows.

Benchmarks: We are still figuring out the best way to evaluate RAG and agents once they are deployed, and how to use the observability stack to close the feedback loop. We have to start understanding how to benchmark living agents that take time to learn and continue to learn.

Industrial engineering: Beyond benchmarks, we also need to think about the metrics to track in such a complicated system. We had human-level KPIs that track outcomes, but we need metrics that are far richer, deeper, and more nuanced to help us learn how to build these new systems.

All good problems. All for an exciting future.

Imagine a future of living agents that are self-evolving at super scale at every step of their architecture. They are writing their own skills, spawning their own checkpoints, collecting traces like we never did before, and using those to fine-tune the base models they are built on. And eventually, living agents themselves get exposed as products and APIs. Users don’t need to know what’s in the black box. We don’t need to know how the junior engineer is becoming a senior engineer, as long as we have proof that they are and we can consume the value they are creating.

-
Fun: I started this article in my head two weeks ago. The title then was ‘long living agents’. Today, I believe, we can lose ‘long’. Why not lose ‘living’ too? We have certain notions about an agent in 2026: an LLM with tool-calls in a loop with MCPs. That is not enough to describe what a living agent is. So for now, ‘living agent’ it is.

Disclaimer: Living Agent is not a new invention. OpenClaw is one. Perplexity, NVIDIA and Microsoft launched their own recently. Everyone else will release it in one form or the other.
