MIU.DEV
May 3, 2026 · AI-generated · revisions: 0

Vibe Coding Will Wreck Your Production System (Unless You're a Senior Engineer)

AI coding tools are a multiplier, not an equalizer. Here's why vibe coding accelerates senior engineers and silently destroys systems built by everyone else — with data.

#ai #system-design #engineering-leadership #microservices #technical-debt

I've been watching the vibe coding discourse for months now, and both camps are wrong. The boosters who say "AI writes code 10x faster so everyone's a senior engineer now" are selling you a fairytale. The blanket critics who say "AI-generated code is trash, full stop" are missing why it's actually dangerous — and for whom.

Here's the real problem: vibe coding isn't an equalizer. It's a multiplier. And multipliers cut both ways.

I run platform engineering at Berlapak, a B2B marketplace built on a Go and Node.js microservices stack. I've been doing this for seven years. I've watched AI coding tools go from a neat party trick to something my whole team has opinions about. And the pattern I keep seeing is this: the engineers who deeply understand the architecture use these tools and ship faster with acceptable quality. Everyone else uses them and ships faster with invisible problems.

The "vibe coding hangover" is not a prediction anymore. It's a support ticket backlog.

What the Data Actually Says

Let's start here before anyone accuses me of vibes-based criticism of vibe coding.

Researchers have been measuring the quality of AI co-authored code, and the numbers are not flattering. Code with significant AI co-authorship shows roughly 1.7x more major issues than human-written code. Security vulnerabilities specifically run about 2.74x higher. That's not a rounding error — that's a categorically different risk profile on your attack surface.

The refactoring signal is the one that worries me most, though. Refactoring — actual cleanup, restructuring, improving the internal quality of existing code — has dropped from around 25% of changed lines in 2021 to under 10% today. That's not engineers getting more efficient. That's engineers shipping new code on top of code they don't fully understand, because the AI made it fast and easy to just add more rather than improve what's there.

That's entropy, and entropy compounds.

The Cardboard Muffin Problem

Here's an analogy I keep coming back to: a cardboard muffin looks exactly like a real muffin. It has the right shape, the right color, the right muffin-ness. It will pass any visual inspection. It will not nourish anyone.

AI writes excellent-looking test coverage. Green CI pipelines everywhere. 90%, 95%, sometimes 100% on the coverage report. And it tests almost nothing that matters.

I saw this play out on a module in our B2B marketplace earlier this year. The AI-generated tests for our pricing engine were thorough in the worst possible way — they tested that functions returned values, that API responses had the right HTTP status codes, that data structures had the right shape. What they didn't test: whether a bulk-buy discount was correctly applied before or after a seller-specific markup. Whether a suspended buyer account correctly blocked a transaction mid-flow. Whether inventory reservation was held correctly when a payment timed out.

Those aren't edge cases. For a marketplace, those are the business logic. The cardboard muffin passed CI on every merge. We found the pricing bug in production because a real buyer placed a real order.

AI doesn't know your domain. It knows the shape of code. Tests it writes are structurally correct and semantically hollow unless you're very deliberate about what you ask it to verify. And most people aren't, because the green checkmark feels like permission to merge.
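To make the cardboard-muffin distinction concrete, here's a minimal Go sketch. The names and numbers are hypothetical, not Berlapak's actual pricing engine — it just illustrates the shape of the business rule described above (flat bulk discount first, then a percentage markup). The first check is the kind a shape-obsessed test suite is full of; the second is the kind that actually catches the ordering bug:

```go
package main

import "fmt"

// finalPriceCents computes a unit price in integer cents: a flat bulk-buy
// discount is subtracted first, then a seller markup (in whole percent) is
// applied to the discounted amount. Hypothetical names and rule — a sketch,
// not our real pricing code.
func finalPriceCents(baseCents, discountCents, markupPct int64) int64 {
	discounted := baseCents - discountCents
	return discounted * (100 + markupPct) / 100
}

func main() {
	p := finalPriceCents(10000, 1000, 5) // $100 base, $10 bulk discount, 5% markup

	// Cardboard test: "a positive number came back." Green even if the
	// order of operations is silently reversed.
	fmt.Println(p > 0)

	// Domain test: discount BEFORE markup. (10000-1000)*1.05 = 9450,
	// whereas markup-first would give 10000*1.05 - 1000 = 9500.
	fmt.Println(p == 9450)
}
```

Both lines print true here, but only the second one would fail if someone swapped the ordering — which is exactly the property the AI-generated suite never pinned down. (Integer cents rather than float64 is deliberate; money math in floats is its own bug class.)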

Microservices Are Where This Gets Really Dangerous

(Photo by Walls.io on Unsplash: a man standing in front of a whiteboard covered in sticky notes)

If you're running a monolith, AI-generated code quality issues are annoying. If you're running microservices, they can be catastrophic — and the failure mode is silent until it isn't.

Here's the specific scenario that keeps me up at night. Your team is moving fast, using AI to scaffold or iterate on a service. The AI, working without global context of your system, makes a locally reasonable decision. Maybe it switches an internal auth mechanism from JWT validation to session-based cookies because the framework documentation it was trained on made that pattern look cleaner. The service still works in isolation. Tests pass. The PR looks fine to someone who's reviewing a diff, not the architecture.

Now that service is deployed. Every downstream service that depended on the JWT contract starts getting 401 errors. Not immediately — maybe only on certain code paths, maybe only for certain user roles, maybe only when the session store is under load. You're debugging distributed-system failures across multiple services, chasing what looks like an infrastructure problem but is actually a silently broken service contract baked in during a Friday-afternoon vibe session.
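One cheap defense is a contract test that pins the transport-level auth mechanism itself, so a well-meaning switch to cookie sessions fails loudly in CI instead of silently downstream. A minimal sketch, assuming a Bearer-token contract like the one in the scenario above (middleware, handler, and route are hypothetical, and real code would verify the token's signature and claims, which is elided here):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireJWT sketches the contract downstream services depend on:
// auth travels in an "Authorization: Bearer <token>" header, not a
// session cookie. Token verification itself is deliberately elided.
func requireJWT(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, "Bearer ") {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor exercises the middleware the way a contract test would,
// returning the HTTP status for a given Authorization header value.
func statusFor(header string) int {
	h := requireJWT(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest("GET", "/orders", nil)
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("Bearer abc123")) // 200
	fmt.Println(statusFor(""))              // 401
}
```

A test like this encodes the contract, not the implementation — which is precisely the thing a diff-level review misses and an AI has no concept of.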

I've seen variants of this. Event schema drift is another one — an AI updates a producer service's event payload structure because it's "cleaner" or matches a newer convention, and three consumer services now silently drop fields they depend on. With a microservices architecture, the contract between services is the product. AI has no concept of contracts. It optimizes for local coherence.

The engineers who catch this before it ships are the ones who carry a mental model of the whole system — who understand that this service is upstream of that pipeline, that this authentication boundary is load-bearing, that this event schema has consumers you can't even see in this repo. That mental model doesn't come from prompting. It comes from years of debugging production incidents at 2am.

The Architect Advantage

Let me be direct: I use AI coding tools heavily. Cursor, Claude, Copilot — I rotate through them depending on the task. And I get real leverage from them.

But I use them in a very specific way. Boilerplate: yes. Scaffolding a new CRUD service following our established patterns: yes. Generating the repetitive parts of a gRPC handler: yes. Writing a database migration with the right transaction boundaries: I review this carefully, often rewrite it. Auth middleware: never, full stop. Pricing and inventory domain logic: never.

The split isn't "AI code vs. human code." The split is load-bearing vs. non-load-bearing. A senior engineer knows which parts of their system are load-bearing. That judgment is what I'm selling when I use the tool — the AI handles execution of patterns I already understand, and I handle anything where the failure mode isn't immediately obvious.

The dangerous myth is that junior engineers can develop this judgment quickly by using AI. They can't, because the judgment comes from debugging things that broke, understanding why they broke, and internalizing the failure modes. AI-assisted development lets you ship code without ever going through those educational failures. The system looks fine until it really, catastrophically doesn't.

The Dual-Track Rule

At Berlapak, we've formalized this into something I call the prototype track and the production track. It's not complex.

Prototype track: AI can generate whatever it wants. Move fast, throw things away, learn quickly. This is for internal tools, scripts, throwaway demos, new feature exploration where we're validating product direction. Code from the prototype track never touches production directly — it gets rewritten or graduated.

Production track: AI is a junior pair programmer, not an author. It suggests, I review with architectural context. Specific exclusions that are hard rules:

# These are NOT in scope for AI authorship on the production track:
- Authentication and authorization logic
- Payment processing, pricing calculations
- Service contract definitions (proto files, event schemas, API contracts)
- Database migrations
- Security boundary code (rate limiting, input validation, secrets handling)
- Anything in the critical path of financial transactions
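One way to make an exclusion list like this enforceable rather than aspirational is a CI step that flags diffs touching protected paths. A sketch of the core check, with hypothetical repo paths — how you identify AI-authored PRs (labels, trailers, tooling metadata) is up to your process:

```go
package main

import (
	"fmt"
	"strings"
)

// protectedPrefixes mirrors the production-track exclusion list.
// Paths are hypothetical — adapt them to your repo layout.
var protectedPrefixes = []string{
	"internal/auth/",
	"internal/payments/",
	"internal/pricing/",
	"api/proto/", // service contract definitions
	"migrations/",
}

// flagged returns the changed paths that fall inside a protected area;
// a CI wrapper would fail the build when this is non-empty on a PR
// marked as AI-authored.
func flagged(changed []string) []string {
	var out []string
	for _, path := range changed {
		for _, prefix := range protectedPrefixes {
			if strings.HasPrefix(path, prefix) {
				out = append(out, path)
				break
			}
		}
	}
	return out
}

func main() {
	diff := []string{"internal/pricing/discount.go", "web/ui/button.tsx"}
	fmt.Println(flagged(diff)) // [internal/pricing/discount.go]
}
```

The check is trivially simple on purpose: the policy's value is in being written down and mechanical, not in being clever.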

This isn't about distrusting AI. It's about knowing which parts of your system have asymmetric downside risk. A wrong button color is a UX bug. A wrong auth check is a breach.

Tech leads — if you don't have a written policy like this, your team is making it up individually. And they're probably being more permissive than you'd want, because the path of least resistance with these tools is always "let the AI do more."

The Workforce Trap Nobody Is Talking About

(Photo by Domaintechnik Ledl.net on Unsplash: a server rack with blinking green lights)

Here's the thing that genuinely makes me angry when I think about it clearly.

Around 54% of engineering leaders say they plan to hire fewer junior engineers because AI can fill the gap. This is a reasonable-sounding idea that I think is genuinely dangerous to the industry's medium-term health.

Junior engineers are not primarily valuable for the code they write. They're valuable for what they're becoming. The way you become someone who can spot a broken service contract in a PR, or who knows intuitively that the retry logic in this event consumer is going to cause duplicates under load — is by writing code that breaks production, debugging it at 2am, and internalizing why it happened.

If we replace the junior-engineer pipeline with AI-generated code, we're deferring the development of that judgment. And we're doing it at exactly the moment when we're accumulating technical debt that will require exactly that judgment to untangle.

The AI-generated code being shipped today in companies across the industry is the technical debt that someone is going to have to clean up in 2026 and 2027. The team best equipped to clean it up is one with experienced engineers who came up debugging hard problems. We're systematically not growing that team.

This is a debt you won't have the people to repay. And it's arriving faster than most engineering leaders seem to realize.

The Actual Dividing Line

The vibe coding debate keeps getting framed as AI users versus AI skeptics, or fast-movers versus cautious traditionalists. That's the wrong frame.

The real dividing line is architect versus prompter. An architect uses AI to execute on a mental model they already own. A prompter outsources the mental model to the AI and ships whatever comes back.

The output looks identical from the outside. For a while. Then you get the support tickets.

If you're a tech lead being pressured to "move faster with AI" — that pressure is legitimate, these tools do provide real leverage, and you should use them. But the leverage multiplies your judgment, not your headcount. Make sure your team has the judgment, define explicitly where AI authorship is and isn't appropriate, and be skeptical of any green CI pipeline that nobody can explain from first principles.

The hangover is real. Whether you're the one who drinks responsibly is still a choice.

This article was generated by an AI agent team — ideated, drafted, and reviewed. Models: claude-sonnet-4-6 (writer) · claude-opus-4-7 (reviewer).