Terraform Modules: How to Build Scalable AWS Infrastructure

What I learned while building and debugging AWS infrastructure from scratch with Terraform

Introduction

Recently, I took on a hands-on challenge to simulate a real-world lift-and-shift cloud migration using AWS.

My goal was to build modular infrastructure from scratch using Terraform. I wanted this to be more than a demo: a production-style, reusable foundation for deploying applications in any environment.

In this blog, I’ll walk you through what I built, how I modularized everything, and most importantly, the real errors I hit and how I resolved them.

What I Built (Scope of the Infra)

Using Terraform, I created:

Custom VPC with public and private subnets and a NAT Gateway (honestly, because of NAT Gateway pricing, I commented that code block out 🤫)
EC2 Auto Scaling Group via Launch Templates
Versioned S3 Bucket with optional force-delete toggle
All built with separate modules and environment-specific configuration
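To make the S3 item concrete, here is a minimal sketch of a versioned bucket with a force-delete toggle written as a module. It assumes AWS provider v4 or later, where versioning is configured through the separate `aws_s3_bucket_versioning` resource. The variable and output names are hypothetical, not taken from the repo.

```hcl
# modules/s3/variables.tf (hypothetical names)
variable "bucket_name" {
  description = "Globally unique bucket name"
  type        = string
}

variable "force_destroy" {
  description = "If true, terraform destroy also deletes all objects and versions so the bucket can be removed"
  type        = bool
  default     = false # safe default: refuse to delete a non-empty bucket
}

# modules/s3/main.tf
resource "aws_s3_bucket" "this" {
  bucket        = var.bucket_name
  force_destroy = var.force_destroy
}

# Versioning lives in its own resource in AWS provider v4+
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}

# modules/s3/outputs.tf
output "bucket_arn" {
  value = aws_s3_bucket.this.arn
}
```

With `false` as the default, each environment has to opt in explicitly, for example turning it on only in dev, where tearing everything down is routine.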

This infrastructure supports auto-scalable workloads while keeping services modular and cost-effective.

Note: Since this was a personal proof of concept (POC), I left out RDS and load balancers to keep costs minimal 😬. The infrastructure can easily be extended to include them.

Folder & Module Structure

I followed a clean multi-environment layout:

modules/: reusable logic for each AWS service

environments/dev/: binds the modules together for a specific environment (dev here; the same pattern applies to stage or prod)
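Here is a minimal sketch of how `environments/dev/main.tf` might wire the modules together under this layout. The module input names (`vpc_cidr`, `azs`, `ami_id`, `bucket_name`) and the `region` variable are assumptions. They would also need declarations in `environments/dev/variables.tf` and values in `terraform.tfvars`. Only `module.vpc.vpc_id` and `module.vpc.public_subnet_ids` come from this post.

```hcl
# environments/dev/main.tf (illustrative; input names are hypothetical)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

module "vpc" {
  source = "../../modules/vpc" # relative path from environments/dev/

  vpc_cidr = var.vpc_cidr
  azs      = var.azs
}

module "ec2" {
  source = "../../modules/ec2"

  # Outputs of one module feed inputs of another, all at the root level
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnet_ids
  ami_id     = var.ami_id
}

module "s3" {
  source = "../../modules/s3"

  bucket_name   = var.bucket_name
  force_destroy = true # dev only: allow destroy even if the bucket still has objects
}
```

A stage or prod folder would reuse the same `source` paths and differ only in its variable values.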

Architectural Overview

Implicit Dependencies — No `depends_on` Used

I loved that Terraform automatically figures out the order of execution using references between modules.

For example, when the EC2 module needed the VPC ID, I simply passed in the VPC module's output:

vpc_id = module.vpc.vpc_id

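There's no need for an explicit depends_on. Behind the scenes, Terraform builds a dependency graph from these references, a DAG (directed acyclic graph). It then creates resources in that order and runs independent ones in parallel.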
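`module.vpc.vpc_id` only resolves because the VPC module exports that value as an output. The EC2 module, in turn, can only receive it through a declared input variable. Here is a minimal sketch of both sides. It assumes the VPC module names its resources `aws_vpc.this` and `aws_subnet.public` (created with `count`). The `ami_id`/`instance_type` inputs and the Auto Scaling sizes are illustrative.

```hcl
# modules/vpc/outputs.tf (assumes aws_vpc.this and a counted aws_subnet.public)
output "vpc_id" {
  value = aws_vpc.this.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id # splat collects the ID of every counted subnet
}

# modules/ec2/variables.tf
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "ami_id" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

# modules/ec2/main.tf
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id # fed by module.vpc.vpc_id, so this waits for the VPC
}

resource "aws_launch_template" "app" {
  name_prefix            = "app-"
  image_id               = var.ami_id
  instance_type          = var.instance_type
  vpc_security_group_ids = [aws_security_group.app.id]
}

resource "aws_autoscaling_group" "app" {
  min_size            = 1
  max_size            = 3
  desired_capacity    = 1
  vpc_zone_identifier = var.subnet_ids # subnets from the VPC module

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}
```

None of these resources mentions `depends_on`. Each reference (`var.vpc_id`, `aws_security_group.app.id`, `aws_launch_template.app.id`) is an edge in the graph that Terraform builds.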

I even visualized the graph with Graphviz's dot tool:

terraform graph | dot -Tpng > graph.png

Troubleshooting & Errors I Faced

This was the most fun (and most painful! 🥲) part of the process. Here are some real errors and how I solved them:

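Missing azs variable: I forgot to pass the availability zones; fixed by setting them in terraform.tfvars
Undeclared variable errors: I referenced var.vpc_id without declaring it in variables.tf or passing it in from the module call
Terraform prompting for input: Terraform asks interactively for any root-level variable that has no value; resolved by passing module.vpc.public_subnet_ids directly instead of an unset var.subnet_ids
Module access violations: I accidentally tried to reference one module from inside another. Terraform doesn't allow that; values have to flow through the root module as outputs and input variables
terraform destroy failed: the state no longer matched what actually existed in AWS, so I deleted the leftover resources manually and cleaned up the local state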
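The azs fix comes down to declaring the variable in both the module and the root, then giving the root-level one a value. The root module then passes it down with `azs = var.azs`. Here is a minimal sketch, assuming the VPC module creates one public subnet per availability zone using `count`. The region, zone names, default CIDR, and subnet math are illustrative assumptions.

```hcl
# modules/vpc/variables.tf
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "azs" {
  description = "Availability zones to spread subnets across"
  type        = list(string)
}

# modules/vpc/main.tf (excerpt)
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr
}

resource "aws_subnet" "public" {
  count                   = length(var.azs) # one subnet per AZ
  vpc_id                  = aws_vpc.this.id
  availability_zone       = var.azs[count.index]
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index) # 10.0.0.0/24, 10.0.1.0/24, ...
  map_public_ip_on_launch = true
}

# environments/dev/variables.tf
variable "azs" {
  type = list(string)
}

# environments/dev/terraform.tfvars
# Terraform loads this file automatically, so it no longer prompts for azs
azs = ["us-east-1a", "us-east-1b"]
```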

Every mistake taught me something new, so I documented them all in a TROUBLESHOOTING.md file to help others.

Safe Cleanup Strategy

Before destroying infrastructure, I always save a destroy plan, review it, and then apply exactly that plan:

terraform plan -destroy -out=destroy.tfplan
terraform apply destroy.tfplan

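Because terraform apply runs exactly the saved plan, nothing gets destroyed that I haven't reviewed first. That controlled teardown matters most when resources like NAT Gateways or EC2 instances are involved.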

GitHub Repo Link

Want to try this yourself?

View the full code on GitHub

SourceCode

Next Steps

Add monitoring with CloudWatch
Introduce an ALB and Route 53
Extend the setup to stage and production environments
Add a CI/CD pipeline (Jenkins or GitHub Actions), covered in a future post

Final Thoughts

This wasn’t just about learning Terraform — it was about learning how real infrastructure comes together, component by component.

If you’re looking to learn Terraform or AWS, I highly recommend doing a real project like this. It’s the fastest way to go from “I know the basics” to “I can deploy production-ready infra.”

Let me know what you think, and feel free to connect or follow me for more AWS + DevOps content!