
The $0 Architecture: Building a Serverless ETL Pipeline (and the Key I Leaked Along the Way)

Everyone says serverless is cheap. Spin up a Lambda, process your data, scale to zero. What nobody tells you is that "cheap" is not a default state. It is an engineered outcome, and getting there usually means breaking a few things first.

Building Bill-E — an automated financial ETL pipeline — taught me that the hard way. Looking back at the commit history, the path to a hardened, zero-idle-cost architecture ran straight through a monolithic local script, a dependency nightmare, and one genuinely stressful security incident.

The V1 monolith and why it had to go

Bill-E did not start in the cloud. It started as a single Python script running on my machine. It worked in isolation, but local scripts do not scale, and they cannot back a web dashboard. The real problem showed up when I thought through what happens if a user uploads a massive financial CSV: a synchronous server would block the main thread, keep the user waiting, and rack up uptime costs the entire time.

So I scrapped V1 entirely and re-architected it as an event-driven serverless pipeline. The new flow looks like this:

1. S3 bucket: a user drops a CSV file.
2. SQS queue: the S3 event is queued for processing.
3. Lambda: runs the ETL script, then exits.
4. DynamoDB: the transformed data is written and stored.

No persistent server. No idle capacity. If no file is uploaded for a week, the AWS bill for that week is exactly $0.00. The Lambda spins up, does its job, and disappears.
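In code terms, the Lambda entry point is a handler that unpacks the S3 notification from each SQS message and processes the referenced object. Here is a minimal sketch of that event plumbing, assuming the standard SQS-wraps-S3 event shape; the actual download, transform, and DynamoDB write are omitted, and the function names are illustrative rather than Bill-E's real ones:

```python
import json

def extract_s3_objects(sqs_event):
    """Pull (bucket, key) pairs out of an SQS event whose message
    bodies are S3 event notifications."""
    objects = []
    for record in sqs_event.get("Records", []):
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            objects.append((bucket, key))
    return objects

def handler(event, context):
    # For each uploaded CSV: download it, run the transform, and write
    # the result to DynamoDB. (boto3 calls omitted; this sketch only
    # covers turning the queued event back into concrete S3 objects.)
    uploads = extract_s3_objects(event)
    for bucket, key in uploads:
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(uploads)}
```

Keeping the event-parsing logic separate from the AWS calls also makes the handler trivially unit-testable without mocking any cloud services.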


The Christmas Eve security panic

Before I could feel good about the new architecture, I learned a blunt lesson in cloud security. While rushing to test my AWS integration on December 24th, I pushed a commit containing hardcoded credentials directly to GitHub. Active AWS keys, sitting in a public repository.


Within minutes I deleted the commit, revoked the keys in the AWS Console, and rotated everything on the spot. I then spent the rest of the day refactoring the entire pipeline to pull secrets through environment variables only, with nothing sensitive ever touching source control.

The real lesson was not "be more careful before you push." That kind of promise does not hold under pressure. The lesson was to architect the project so that secrets are structurally impossible to commit in the first place. Environment variables enforced at the pipeline level, not left to memory.
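The pattern that enforces this is small: every secret is fetched from the environment at startup, and a missing credential fails loudly instead of falling back to anything hardcoded. A sketch of that idea (`require_env` is a hypothetical helper, not necessarily what Bill-E uses):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required secret from the environment, failing loudly at
    startup rather than silently running without a credential. Because
    nothing sensitive lives in source, there is nothing to leak in a
    rushed commit."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

# Usage: credentials are resolved at runtime, never written in code.
# aws_key = require_env("AWS_ACCESS_KEY_ID")
```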

The deployment grind

Getting the backend working locally was one thing. Deploying the frontend to the cloud was a different fight entirely. Here is roughly how those last few days of December went:

Dec 30, attempt 1: deployed the Streamlit dashboard. The build crashed immediately; the cloud environment had no dependency list to install from.
Dec 30, attempt 2: added requirements.txt. The build was still missing a backend dependency, so I pushed the fix again.
Dec 31: hit a state management wall. Streamlit re-runs the whole script synchronously on each interaction, but the Lambda was processing data asynchronously. I engineered a polling loop so the dashboard updated as the pipeline finished.

The "it works on my machine" problem is not about the code being wrong. It is about the environment being different in ways you did not think to account for. Writing a requirements file feels tedious until you have deployed a broken build twice in the same afternoon.

Locking it down

Once the dashboard was live, I wanted to make it publicly accessible for my portfolio. The problem was that any visitor could trigger a Lambda invocation by uploading a file, which would cost real money and pollute my data. I spent the first week of January building a custom Role-Based Access Control system that split the app into two layers: a public read-only view and a locked upload layer that requires authentication.
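Conceptually the two layers reduce to a small policy: reads are public, writes require credentials. A sketch of that shape, assuming the password check compares against a hash kept in the environment (the function names and env variable are illustrative, not Bill-E's actual implementation):

```python
import hashlib
import hmac
import os

def verify_upload_credentials(password: str) -> bool:
    """Gate for the write layer: hash the supplied password and compare
    it against a hash stored in the environment, using a constant-time
    check to avoid timing leaks."""
    expected = os.environ.get("UPLOAD_PASSWORD_HASH", "")
    supplied = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(supplied, expected)

def allowed(action: str, authenticated: bool) -> bool:
    """Two-layer policy: anyone may read the dashboard, only an
    authenticated user may trigger an upload (and thus a Lambda run)."""
    if action == "read":
        return True
    if action == "upload":
        return authenticated
    return False
```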


I also added a silent admin alert that pings me whenever someone logs into the upload layer. The public can explore the dashboard freely. Any actual write operations require credentials and notify me immediately.

What it actually taught me

Serverless is not just about skipping server management. It is a shift from paying for capacity to paying strictly for execution. That distinction matters a lot when you are building something that needs to be free to run but fast when it does.

More than any specific AWS service, building Bill-E taught me that real engineering does not follow a straight line. You write a monolith, realise it does not scale, tear it apart, leak a credential, fix the pipeline, fight a dependency, and eventually lock it all down. That is not a sign you are doing it wrong. That is just how production software gets made.