Your Codebase Is Not Ready for AI Agents
Agentic engineering only works when you've already done the boring architecture work. Here's how to audit your system — and why AI agents are the best architectural linter you've never hired.
The vibe coding honeymoon is over. I know because I've lived through it — and now I'm dealing with the aftermath.
For the past several months, I've watched teams (including, occasionally, my own) chase the dopamine hit of spinning up entire services in an afternoon using AI pair programmers. It works. Until it doesn't. And when it breaks, it breaks in ways that make you genuinely question what you actually own.
Karpathy coined "agentic engineering" as the mature successor to vibe coding, and he's right that the terminology shift matters. This isn't just a rebrand. Vibe coding is a workflow — you and an LLM riffing until something ships. Agentic engineering is a discipline — you design a system where autonomous agents can safely read, reason about, modify, and deploy code with minimal human hand-holding per task. The difference is architectural. And most codebases I've seen, including the one I inherited when I joined Berlapak, are architecturally hostile to agents.
Here's my hot take up front: AI coding agents are the best architectural linter you've never hired. When an agent struggles with your codebase, so do your junior engineers. The agent is just louder about it.
The 60-Second Context Test
There's a simple diagnostic I use now before adding any new service or major feature area. I call it the 60-second context test.
If an AI agent needs more than 60 seconds of codebase context to add a single endpoint to an existing service, your service boundaries are broken.
That's it. That's the test.
In practice, this means: drop into a service directory, open the README and the route definitions, and ask an agent to add a new GET /orders/:id/timeline endpoint. If the agent immediately asks you to explain what pkg/shared/utils/db_helpers.go does, why there's a BaseCRUDController being extended three levels deep, or which of the four different auth middleware implementations to use here — you have a boundary problem. The service doesn't know what it is.
I hit this exact wall in our order management service. We run a B2B marketplace at Berlapak, and this Go service had quietly become a dependency hotel. It imported from six other internal packages, called three downstream services synchronously, and had a utils folder with 47 functions, each used by one or two callers. When I asked Cursor to add a timeline endpoint, it hallucinated a database table that didn't exist but should have existed, given the naming patterns it saw elsewhere. Confidently. With working SQL.
The problem wasn't the agent. The problem was the service had no coherent identity. It was doing too many things for too many people, and the agent correctly inferred "more things happen here" without being able to find the seam.
After we broke it apart — separating order state management from order enrichment from order reporting — the same context test took 15 seconds. The agent had a clear bounded context to work within.
Why Event-Driven Architecture Is Naturally Agent-Friendly
Synchronous RPC chains are cognitive debt. Every time you call Service A, which calls Service B, which calls Service C to complete a user-facing request, you've created a dependency graph that lives only in the heads of the people who built it. There's no artifact. No contract. Just institutional knowledge and a lot of Slack history.
Async event-driven systems force you to make that graph explicit. When order creation publishes an order.created event to Kafka, and the inventory service subscribes to it, the contract is the event schema. The event schema is a document. Documents are things agents can read.
I've been deliberately pushing our Berlapak platform toward more event-driven flows, and the side effect I didn't expect was how much cleaner the agent interactions became. When I ask an agent to implement a new downstream reaction to an existing event, the work is bounded: here's the event schema, here's the consumer pattern we use in this repo, add a handler. The agent doesn't need to understand the entire synchronous call tree because there isn't one.
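Here's roughly what that bounded unit of work looks like in Go. This is a sketch, not our production code: the payload fields and the Consumer interface are stand-ins for whatever your repo actually uses. The point is that the event schema plus one handler is the entire surface the agent has to reason about.

// OrderCreated is the machine-readable contract for the order.created event.
// Field names here are illustrative.
type OrderCreated struct {
	OrderID    string    `json:"order_id"`
	BuyerID    string    `json:"buyer_id"`
	TotalCents int64     `json:"total_cents"`
	CreatedAt  time.Time `json:"created_at"`
}

// Consumer stands in for whatever Kafka wrapper the repo already uses.
type Consumer interface {
	Subscribe(topic string, handler func(ctx context.Context, msg []byte) error)
}

func RegisterInventoryHandlers(c Consumer) {
	c.Subscribe("order.created", func(ctx context.Context, msg []byte) error {
		var evt OrderCreated
		if err := json.Unmarshal(msg, &evt); err != nil {
			return fmt.Errorf("decode order.created: %w", err)
		}
		// Reserve stock, publish inventory.reserved, return. That's the whole
		// job; there is no synchronous call tree to trace.
		return nil
	})
}

The handler owns exactly one reaction to exactly one event, so there's nothing upstream or downstream the agent has to hold in its head.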
To audit your own system: draw your synchronous RPC dependencies. If the diagram looks like spaghetti, that's what your agent is trying to navigate. Each synchronous hop an agent must trace to understand impact is context window tax and hallucination surface. Every event boundary is a place where the agent can stop reasoning and start working.
That said, async isn't free. Eventual consistency failures, out-of-order events, dead letter queues — these failure modes are less obvious and require explicit runbook documentation. More on that in a minute.
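To make one of those failure modes concrete before moving on, here's a rough sketch of the dead-letter pattern. The Publisher interface, the retry count, and the topic name are placeholders for whatever your messaging layer provides.

// Publisher is a stand-in for your event-publishing client.
type Publisher interface {
	Publish(ctx context.Context, topic string, msg []byte) error
}

// withDLQ wraps a handler: retry a few times, then park the message on a
// dead-letter topic instead of blocking the partition forever.
func withDLQ(handle func(context.Context, []byte) error, dlq Publisher, maxRetries int) func(context.Context, []byte) error {
	return func(ctx context.Context, msg []byte) error {
		var err error
		for attempt := 0; attempt < maxRetries; attempt++ {
			if err = handle(ctx, msg); err == nil {
				return nil
			}
		}
		// The message is poisoned or a dependency is down. Park it, and make
		// sure the runbook says who drains this topic and how.
		return dlq.Publish(ctx, "order.created.dlq", msg)
	}
}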
The Cognitive Debt Problem Nobody's Talking About
In 2026, the bottleneck in software development is not generating code. It's understanding what happens when generated code breaks in production at 2 AM.
I've seen AI-generated services that are aesthetically clean — good structure, reasonable naming, passing tests — but have no observable internals. No structured logging. No meaningful spans. No error context beyond 500 Internal Server Error. When something goes wrong, you're debugging by intuition in a codebase written by a model whose working context vanished the moment the session ended.
This is cognitive debt, and it compounds. The original developer (human or agent) held context that isn't encoded anywhere. Every fix requires reconstructing that context from scratch.
My current rule: any service that passes through an AI agent's hands must log structured events at every domain boundary, not just errors. For our Go services, this means slog with a consistent schema — every request gets a correlation ID, every external call gets a duration and a result code, every state transition gets a before/after snapshot at DEBUG level.
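Here's a compressed sketch of that pattern with slog, assuming Go 1.22-style net/http routing and a handler struct that carries a *slog.Logger. The field names and the fetchTimeline helper are placeholders; what matters is that every request carries a correlation ID and every external call reports its duration and outcome.

func (h *OrderHandler) GetTimeline(w http.ResponseWriter, r *http.Request) {
	// One correlation ID per request, attached to every log line that follows.
	log := h.logger.With("correlation_id", r.Header.Get("X-Correlation-ID"))

	start := time.Now()
	timeline, err := h.fetchTimeline(r.Context(), r.PathValue("id"))
	log.Info("downstream call: fetch_timeline",
		"duration_ms", time.Since(start).Milliseconds(),
		"result", resultOf(err),
	)
	if err != nil {
		log.Error("get timeline failed", "error", err)
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	_ = json.NewEncoder(w).Encode(timeline)
}

func resultOf(err error) string {
	if err != nil {
		return "error"
	}
	return "ok"
}

Keeping the helper dumb and the schema consistent is the point: once the pattern exists in one handler, an agent can replicate it across forty more without inventing field names.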
For the NestJS services, we're stricter — interceptors that automatically emit a structured log at the entry and exit of every controller method, with the actor identity and a request fingerprint. This is tedious to enforce manually. It's exactly the kind of task you can delegate to an agent once you've written the pattern once. And it means the next agent — or engineer — that debugs a production incident has actual evidence to work with.
Observability isn't just operational hygiene anymore. It's how you maintain intellectual ownership of code you didn't fully write yourself.
A Practical Agent Readiness Checklist
Here's what I actually audit before I let an agent loose on a service:
1. Explicit API contracts exist as code, not comments. OpenAPI specs, Protobuf definitions, Zod schemas — it doesn't matter which. The contract must be machine-readable. If your API contract is "look at the controller and infer it," the agent will infer it differently than you would.
2. Test coverage is meaningful, not just metric-chasing. I don't care if you're at 80% line coverage. I care if you have integration tests that exercise real database state and unit tests for every non-trivial business rule. Agents will break things confidently when there are no tests to catch them. Coverage is the agent's safety net, not yours.
3. Bounded contexts are enforced structurally. If your Go service can import anything from anywhere in the monorepo, it will. And so will the agent. Use explicit module boundaries. In our case, we enforce this with linting rules — cross-domain imports outside of defined integration points fail CI.
4. Failure modes are documented in runbook format. Every service should have a RUNBOOK.md that enumerates: what can fail, how you know it's failing (alert name or log pattern), and what the recovery steps are. This document doubles as agent context. When I ask an agent to implement retry logic or a circuit breaker, I paste the runbook section. The quality of the output is noticeably better when the agent knows what failure looks like.
5. Dependency injection is explicit. In NestJS this is natural. In Go it requires discipline. But if your service constructs its own database connections, HTTP clients, and config readers inline, an agent cannot easily swap implementations for testing or reasoning. Explicit DI is the difference between a service that's testable in isolation and one that requires a running environment to think about.
// Hostile to agents: implicit global state
func HandleCreateOrder(w http.ResponseWriter, r *http.Request) {
	db := getGlobalDB() // where does this come from? who knows
	// ...
}

// Agent-friendly: explicit dependencies
type OrderHandler struct {
	db     *pgxpool.Pool
	events EventPublisher
	logger *slog.Logger
}

func NewOrderHandler(db *pgxpool.Pool, events EventPublisher, logger *slog.Logger) *OrderHandler {
	return &OrderHandler{db: db, events: events, logger: logger}
}
The agent that reads the second version immediately knows what this handler needs. The first version requires archaeology.
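That explicitness is also the testing seam. A minimal sketch, assuming EventPublisher is the one-method interface implied by the constructor above; the fake is all an agent, or a human, needs to exercise the handler without a broker running.

// EventPublisher is the seam: one interface, two implementations.
type EventPublisher interface {
	Publish(ctx context.Context, topic string, payload []byte) error
}

// fakePublisher records published topics in memory for tests.
type fakePublisher struct {
	topics []string
}

func (f *fakePublisher) Publish(_ context.Context, topic string, _ []byte) error {
	f.topics = append(f.topics, topic)
	return nil
}

Hand the fake to NewOrderHandler in a test and every publish becomes an assertable fact instead of a side effect you observe in staging. It's also exactly the kind of scaffolding you can have an agent generate once and reuse everywhere.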
Agent-Readiness Is an Architectural Proxy
Here's what I've come to believe, and I'll say it plainly: the agent-readiness of your system is a direct proxy for how well-designed it actually is.
Every property on that checklist is a property good architecture has always demanded. Explicit contracts. Bounded contexts. Observable behavior. Documented failure modes. We've been preaching these things for years. We just didn't have a forcing function that made the absence of them immediately, visibly painful.
AI agents are that forcing function.
When a junior engineer struggles to add a feature to a service, we say the codebase is complex. When an AI agent struggles to add the same feature, we suddenly notice that maybe the service has 11 responsibilities and no coherent ownership boundary. The agent's confusion is just less polite.
The teams I've seen get real, sustained value from AI coding agents aren't the ones who adopted the best tools. They're the ones who had already done the boring architecture work — the service decomposition, the contract documentation, the testing discipline, the observability investment. The agents made them faster. For everyone else, the agents made the underlying problems more visible and slightly more dangerous.
If you're a tech lead being asked by management to "add AI agents to the development process," my honest advice is this: before you evaluate tools, run the 60-second context test on your five most-changed services in the last quarter. If they fail it, you don't have an AI tooling problem. You have an architecture problem, and AI is about to make it louder.
Fix the architecture first. The agents will thank you — and so will your junior engineers.