Do vibe-coding they said, it will be fun they said.
I wanted to test vibe coding seriously. I'd used it for toys like boardgames, simulations and terminal space shooters, but what if you tried to go beyond that? Could it handle production-ready code?
What follows is a story of a Flask app that was functional, but also kept breaking somewhat randomly.
A Flask app
Imagine that I have a Flask webapp, one that starts like this:
from flask_sqlalchemy import SQLAlchemy
from flask import Flask
# Init app
app = Flask(__name__)
app.config = {...}
# Init DB
db = SQLAlchemy()
db.init_app(app)
# Init the rest
...
Then I might also have some tests. These use fixtures and might look a little bit like this:
import pytest
from flask import session

from app import app as flask_app
from app import db, User  # assumption: db and User are importable from the app module


@pytest.fixture
def client():
    flask_app.config['TESTING'] = True
    flask_app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'  # Use in-memory SQLite for tests

    with flask_app.app_context():
        db.create_all()  # Create all tables
        # Create a test user if needed for your tests
        test_user = User(email='test@example.com', is_verified=True, is_paying=False, message_interval_days=7)
        db.session.add(test_user)
        db.session.commit()

    with flask_app.test_client() as client:
        yield client

    with flask_app.app_context():
        db.session.remove()
        db.drop_all()  # Clean up after tests
This client fixture is used all over the place in the test suite, and what's nice here is that it makes sure that each test starts with a fresh database. This feels good, vibe-coding checks out, so let's move on and have it write some more code.
Odd behavior
What could go wrong, you might ask?
A lot! It's very subtle, but consider the Flask app one more time.
from flask_sqlalchemy import SQLAlchemy
from flask import Flask
# Init app
app = Flask(__name__)
app.config = {...}
# Init DB
db = SQLAlchemy()
db.init_app(app)
# Init the rest
...
The fixture overrides app.config values, but these changes happen after db is initialized, by which point the database URI has already been read. So if you accidentally have a SQLALCHEMY_DATABASE_URI environment variable pointing to production... the fixture will still use it! And notice the fixture's cleanup code:
    ...
    with flask_app.test_client() as client:
        yield client

    with flask_app.app_context():
        db.session.remove()
        db.drop_all()  # Clean up after tests
Imagine debugging this. Tests don't fail - instead, production user tables drop randomly.
I initially suspected a misconfigured migration. The AI assistant suggested Docker deployment issues. But it was simpler: an environment variable problem. The app didn't protect against environment variables, leaving the door open to accidentally dropping production tables.
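To make the ordering problem concrete, here is a toy sketch. ToyDB is a hypothetical stand-in, not Flask-SQLAlchemy itself. It only assumes that the extension captures the database URI at the moment init_app runs, which is my understanding of how recent Flask-SQLAlchemy versions behave. The production-looking URI is made up.

```python
class ToyDB:
    """Hypothetical stand-in for a DB extension that binds its URI in init_app()."""

    def __init__(self):
        self.bound_uri = None

    def init_app(self, config):
        # Snapshot the URI once; later changes to the config are never seen.
        self.bound_uri = config["SQLALCHEMY_DATABASE_URI"]


# Import time: the config was filled from the environment (a made-up production URI).
config = {"SQLALCHEMY_DATABASE_URI": "postgresql://prod.example.invalid/app"}
db = ToyDB()
db.init_app(config)

# Fixture time: the override lands in the config, but too late for the binding.
config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"

print("config says:", config["SQLALCHEMY_DATABASE_URI"])
print("db is bound to:", db.bound_uri)
```

The config reports the in-memory database while the toy extension still points at the original URI, which is the same mismatch the fixture ran into.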
The quick fix, suggested by the bot, was to add this to the conftest.py file:
import os


# This runs before any tests are collected
def pytest_configure(config):
    # Explicitly override DATABASE_URL to ensure we never connect to a real database
    os.environ["DATABASE_URL"] = "sqlite:///:memory:"
This prevents the database from losing tables when running pytest, but it doesn't address the fact that the project itself needs to be stricter about separating local and production environments.
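One way to be stricter, sketched here under the assumption that tests should only ever touch an in-memory SQLite database, is a guard that runs before any destructive cleanup. The function name assert_safe_test_database and the allow-list are hypothetical, not something from the project.

```python
# Assumption: tests are only ever allowed to use in-memory SQLite.
IN_MEMORY_SQLITE_URIS = {"sqlite://", "sqlite:///:memory:"}


def assert_safe_test_database(uri: str) -> None:
    """Raise unless the URI is on the in-memory SQLite allow-list."""
    if uri not in IN_MEMORY_SQLITE_URIS:
        raise RuntimeError(
            f"Refusing to run destructive test cleanup against {uri!r}"
        )


# Passes silently for the throwaway database.
assert_safe_test_database("sqlite:///:memory:")

# Anything else stops the test run instead of dropping tables.
try:
    assert_safe_test_database("postgresql://prod.example.invalid/app")
except RuntimeError as exc:
    print(exc)
```

In the fixture, the idea would be to call this with the URI the engine is actually bound to (for instance str(db.engine.url) inside the app context) right before db.drop_all(), rather than with whatever app.config currently claims.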
My conclusion for now
So whose fault is it that the setup broke? It is easy to blame the AI here, but the more I think about it, the sillier that feels. The project was set up the wrong way because I did not pay attention to it. I could easily have been safer and more pragmatic by strictly keeping two separate .env.prod and .env.local files. The fact is that I was allowing the LLM to YOLO around because that was the vibe I was in when I let it do that.
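A minimal sketch of what that separation might look like: pick the env file from a single switch and default to the local one, so production has to be requested explicitly. The APP_ENV variable name and the pick_env_file helper are assumptions for illustration. Only the two file names come from the setup described above.

```python
import os

# The file names are from the post; the "local"/"prod" keys are an assumption.
ENV_FILES = {"local": ".env.local", "prod": ".env.prod"}


def pick_env_file(environ=None) -> str:
    """Return the env file for APP_ENV (hypothetical switch), defaulting to local."""
    environ = os.environ if environ is None else environ
    app_env = environ.get("APP_ENV", "local")
    if app_env not in ENV_FILES:
        raise ValueError(f"Unknown APP_ENV {app_env!r}; expected one of {sorted(ENV_FILES)}")
    return ENV_FILES[app_env]


print(pick_env_file({}))                   # no switch set: falls back to local
print(pick_env_file({"APP_ENV": "prod"}))  # production only when asked for
```

Actually loading the chosen file is left out on purpose. The point is only that nothing reaches production settings by accident.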
So in short, vibe-coding ... it is both extremely impressive and extremely untrustworthy at the same time. It's stuff like the story above, but also things like "forgetting about CSRF tokens" in web forms, that constantly remind me of this. But at the end of the day, the lesson here isn't that "AI tools suck", but rather that I need to get better at steering them.
I'm still eager to play around with vibe coding agents, if only because they offer a new way to learn about code, but after trying to use them for "production" I find myself increasingly eager to code defensively. Maybe I'll start with more TAB-completions and only let the agent vibe around when I am sure the structure of the project allows for it.