What I learned while building and debugging AWS infrastructure from scratch with Terraform

Introduction
Recently, I took on a hands-on challenge to simulate a real-world lift-and-shift cloud migration using AWS.
My goal was to build modular infrastructure from scratch using Terraform. I wanted this to be more than a demo: a production-style, reusable foundation for deploying applications in any environment.
In this blog, I’ll walk you through what I built, how I modularized everything, and most importantly, the real errors I hit and how I resolved them.
What I Built (Scope of the Infra)
Using Terraform, I created:
- Custom VPC with public & private subnets and a NAT Gateway (honestly, due to pricing, I commented out the NAT Gateway code block 🤫)
- EC2 Auto Scaling Group via Launch Templates
- Versioned S3 Bucket with optional force-delete toggle
- All built with separate modules and environment-specific configuration
This infrastructure supports auto-scalable workloads while keeping services modular and cost-effective.
Note: Since this was a personal POC, I chose not to include RDS or Load Balancers to keep costs minimal 😬. The infra can be easily extended to support those.
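To make the "versioned bucket with a force-delete toggle" idea concrete, here is a minimal sketch of what such a module could look like. The file path, variable names, and resource labels are assumptions for illustration, not the exact code from my repo; it also assumes AWS provider v4 or later, where versioning is its own resource.

```hcl
# modules/s3/main.tf (hypothetical path and names)

variable "bucket_name" {
  type        = string
  description = "Globally unique name for the bucket"
}

variable "force_destroy" {
  type        = bool
  default     = false
  description = "When true, terraform destroy may delete the bucket even if it still holds objects"
}

resource "aws_s3_bucket" "this" {
  bucket        = var.bucket_name
  force_destroy = var.force_destroy
}

# In AWS provider v4+, versioning is configured through a separate resource
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

Defaulting the toggle to false is the safer choice: a versioned bucket that still holds objects will refuse to be destroyed unless an environment explicitly opts in.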

Folder & Module Structure
I followed a clean multi-environment layout:
modules/: Reusable logic for each AWS service
environments/dev/: Binds modules together for a specific environment (dev, stage, or prod)
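As a rough sketch of how an environment folder binds the modules together, a root configuration might look like the following. The module paths and the subnet_ids input name are assumptions for illustration; only vpc_id, azs, and public_subnet_ids appear in my actual setup.

```hcl
# environments/dev/main.tf (illustrative; paths and input names are assumed)

variable "azs" {
  type        = list(string)
  description = "Availability zones, set per environment in terraform.tfvars"
}

module "vpc" {
  source = "../../modules/vpc"

  azs = var.azs
}

module "ec2" {
  source = "../../modules/ec2"

  # Values flow from one module to another through the root module
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnet_ids
}
```

A stage or prod folder would reuse the same module sources and differ only in the values it passes in.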
Architectural Overview

Implicit Dependencies: No depends_on Used
I loved that Terraform automatically figures out the order of execution using references between modules.
For example, when EC2 needed vpc_id, I simply used:
vpc_id = module.vpc.vpc_id
No need to explicitly write depends_on. Terraform builds the graph behind the scenes using a DAG (Directed Acyclic Graph).
I even visualized it with:
terraform graph | dot -Tpng > graph.png
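A reference like module.vpc.vpc_id only resolves if the VPC module exports that value and the consuming module declares a matching input. A minimal sketch of both sides, assuming hypothetical resource labels, a placeholder CIDR, and a security group as the example consumer:

```hcl
# --- modules/vpc (resource label and CIDR are placeholders) ---
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

output "vpc_id" {
  description = "ID of the VPC, exposed so the root module can pass it on"
  value       = aws_vpc.this.id
}

# --- modules/ec2 ---
variable "vpc_id" {
  type        = string
  description = "VPC to place instance-related resources in"
}

resource "aws_security_group" "instances" {
  name   = "asg-instances" # hypothetical name
  vpc_id = var.vpc_id      # this reference is what creates the dependency edge
}
```

Because the security group's vpc_id ultimately traces back to aws_vpc.this.id, Terraform orders the VPC first without any depends_on.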

Troubleshooting & Errors I Faced
This was the most fun (and painful!🥲) part of the process. Here are some real errors and how I solved them:
- Missing azs variable: Forgot to pass availability zones; fixed with terraform.tfvars
- Undeclared variable errors: Tried referencing var.vpc_id without declaring or passing it
- Terraform prompting for input: Resolved by using module.vpc.public_subnet_ids instead of var.subnet_ids
- Module access violations: Accidentally tried to access one module from inside another. Terraform doesn’t allow that!
- terraform destroy failed: State mismatch; deleted resources manually and cleaned up local state
Every mistake taught me something new, and I documented them all in a TROUBLESHOOTING.md to help others.
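The first three errors share a root cause: a variable has to be declared where it is used and given a value somewhere, otherwise Terraform either errors out or prompts for input. A small sketch of the azs fix, with the zone names as example values rather than the ones I actually used:

```hcl
# modules/vpc/variables.tf
variable "azs" {
  type        = list(string)
  description = "Availability zones to spread subnets across"
}
```

```hcl
# environments/dev/terraform.tfvars (example values)
azs = ["us-east-1a", "us-east-1b"]
```

The root module also needs its own declaration of azs and must pass it into the VPC module. A variable with no default and no supplied value is what triggers the interactive prompt.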
Safe Cleanup Strategy
Before destroying infrastructure, I always run:
terraform plan -destroy -out=destroy.tfplan
terraform apply destroy.tfplan
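Saving the destroy plan to a file lets me review exactly what will be removed before applying it, which keeps the teardown controlled, especially when resources like NAT Gateways or EC2 instances are involved.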
GitHub Repo Link
Want to try this yourself?
Next Steps
- Add monitoring with CloudWatch
- Introduce ALB + Route53
- Extend to stage and production environments
- CI/CD pipeline (Jenkins or GitHub Actions) in a future blog
Final Thoughts
This wasn’t just about learning Terraform — it was about learning how real infrastructure comes together, component by component.
If you’re looking to learn Terraform or AWS, I highly recommend doing a real project like this. It’s the fastest way to go from “I know the basics” to “I can deploy production-ready infra.”
Let me know what you think, and feel free to connect or follow me for more AWS + DevOps content!