Not Everything Can Be Vibe-Coded

Magesh Ravi on June 20, 2025

Two months ago, I set out to build what I assumed would be a simple pipeline: fetch a few hundred files from a specific Google Drive folder, process their contents, and load them into a database to support a RAG (Retrieval-Augmented Generation) system.

The folder had about 300 documents buried in a nest of subfolders — a mix of Google Docs, Slides, and Spreadsheets. I’d ignore everything else. Easy, right?

It wasn’t.

I sketched a basic DB schema and used GitHub Copilot to scaffold a Python CLI with typer. The core steps were simple in theory:

  1. Read the files,
  2. Generate vector embeddings from their contents,
  3. Store embeddings with metadata in the database.
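For context, the scaffold looked roughly like this. A minimal sketch only: the command and option names are placeholders, not the actual CLI.

```python
# Minimal sketch of the CLI scaffold (names are illustrative, not the real tool).
import typer

app = typer.Typer(help="Load a Google Drive folder into a RAG database.")

@app.command()
def ingest(
    folder_id: str = typer.Argument(..., help="Drive folder to crawl"),
    batch_size: int = typer.Option(10_000, help="Spreadsheet rows per read"),
) -> None:
    """Read files, embed their contents, store embeddings with metadata."""
    # 1. Read the files (Docs, Slides, Sheets; ignore everything else)
    # 2. Generate vector embeddings from their contents
    # 3. Store embeddings with metadata in the database
    typer.echo(f"Ingesting folder {folder_id} in batches of {batch_size} rows")

if __name__ == "__main__":
    app()
```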

Step 1: Reading the Files

The CLI worked fine on my local machine — at first. Docs and Slides went through cleanly. But some spreadsheets threw timeout errors.

❌ Problem 1: Timeouts on Large Spreadsheets

Some files had 160K to 200K rows. Reading them in one go was out of the question.

Fix: Batch reads in chunks of 10,000 rows. That solved the timeouts locally.
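The batching itself is nothing exotic. Here is a sketch of the idea against the Sheets API; the sheet name, column bound, and credentials handling are assumptions, not the original code.

```python
# Read spreadsheet rows in fixed-size batches via A1 ranges.
# Sheet name, column bound, and credential handling are assumptions.
from googleapiclient.discovery import build

def read_rows_in_batches(creds, spreadsheet_id, sheet="Sheet1",
                         last_col="Z", batch_size=10_000):
    service = build("sheets", "v4", credentials=creds)
    start = 1
    while True:
        end = start + batch_size - 1
        rng = f"{sheet}!A{start}:{last_col}{end}"   # e.g. Sheet1!A1:Z10000
        result = (service.spreadsheets().values()
                  .get(spreadsheetId=spreadsheet_id, range=rng)
                  .execute())
        rows = result.get("values", [])
        if not rows:
            break
        yield rows
        start = end + 1
```

Dropping batch_size turns out to be the only change Problem 2 needs.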

But when I tested the same script in production, inside a containerized Azure environment, it crashed. No errors in the logs. Just a silent failure.

❌ Problem 2: Memory Constraints

The container was limited to 1.5GB of memory — a limit I couldn’t change. Reading 10K rows at a time exhausted it.

Fix: Reduce batch size. Eventually settled on 1,000 rows per batch. It worked — until it didn’t.

❌ Problem 3: Rate Limits

At 1,000-row batches, Google Drive’s API rate limits (60 requests/sec) started kicking in.

Fix: Implemented a rolling-window rate limiter in code. Wait, retry, resume.
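The limiter itself is a few lines. A sketch of the rolling-window idea, with the 60-per-second budget from above; everything else is illustrative:

```python
# Sliding-window rate limiter: allow at most `max_calls` per `window` seconds.
import time
from collections import deque

class RollingWindowLimiter:
    def __init__(self, max_calls: int = 60, window: float = 1.0):
        self.max_calls = max_calls
        self.window = window
        self.calls: deque[float] = deque()   # timestamps of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest request exits the window, then re-check.
            time.sleep(self.window - (now - self.calls[0]))
            return self.wait()
        self.calls.append(now)

# Call limiter.wait() before every Drive/Sheets request; retry on 429 responses.
```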

Stable now.

Step 2: Creating Embeddings

Using OpenAI’s embedding API was straightforward. The call itself is only a couple of lines:
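```python
# Sketch of the embedding call (model name is illustrative, not prescriptive).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # any embedding model; named for illustration
        input=texts,
    )
    return [item.embedding for item in response.data]
```

Until…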

❌ Problem 4: Token Limits

Even 1,000-row batches exceeded the model’s ~8K token limit.

Fix: Used tiktoken to estimate token size and split batches into chunks.
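Roughly like this; the encoding name is an assumption (it matches OpenAI’s current embedding models):

```python
# Estimate token counts with tiktoken and split a batch that exceeds the limit.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding name is an assumption

def split_by_token_limit(rows: list[str], max_tokens: int = 8_000) -> list[list[str]]:
    chunks, current, used = [], [], 0
    for row in rows:
        n = len(enc.encode(row))
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(row)
        used += n
    if current:
        chunks.append(current)
    return chunks
```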

But…

❌ Problem 5: Lost Context

Some rows got split mid-sentence or mid-table, losing coherence across chunks.

Fix: Introduced overlap between chunks (100 tokens) to preserve continuity.
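On top of the tiktoken encoding, the overlap is a small change. A sketch, with the chunk size picked for illustration (only the 100-token overlap comes from the actual fix):

```python
# Token-level chunks with a 100-token overlap between consecutive chunks.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # same encoding as above

def chunk_with_overlap(text: str, chunk_tokens: int = 2_000,
                       overlap: int = 100) -> list[str]:
    tokens = enc.encode(text)
    step = chunk_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_tokens]))
        if start + chunk_tokens >= len(tokens):
            break
    return chunks
```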


Reflections: More Than Code Completion

This pipeline ended up being one of the most complex pieces of software I’ve written in recent memory — and not because the logic was inherently difficult. It’s because robust software has to survive real-world constraints:

  - API timeouts on files with hundreds of thousands of rows,
  - hard memory ceilings inside containers,
  - third-party rate limits,
  - model token limits,
  - and failures that never show up in the logs.

And yet, from the outside, it’s just "read files → embed → save." This is the paradox of production systems: they look simple only when they’re built right.


The Myth of “Vibe-Coding”

There’s a trend in tech to assume that you can just prompt your way to a working solution. And while AI tools are powerful, they don’t replace engineering judgment.

You can’t prompt your way around:

  - a 1.5GB memory cap you don’t control,
  - a third-party API’s rate limits,
  - a model’s token window,
  - or a container that fails without writing a single log line.

These aren’t "AI problems." They’re engineering realities.


Closing Thought

You don’t need to write everything from scratch. But you do need people who can recognize complexity when it’s hiding behind a seemingly “simple” task.

Some systems can be vibe-coded.

But the systems that matter? They need engineers.