AWS Services Every DevOps Engineer Should Know: Security & FinOps

If you read Part 1 of this series, you already know how I think about classifying AWS services: by the type of work they belong to and by how deep you actually need to go. We covered Infrastructure, Application, and Observability there.

Part 2 goes into two categories that I find most DevOps engineers underinvest in early in their careers: Security and FinOps.

Security, because it’s no longer just the security team’s job, especially in cloud-native environments where DevOps engineers provision resources, manage credentials, and configure access controls every single day.

And FinOps, because cost awareness has quietly become a core DevOps responsibility. Nobody gets praised for the infrastructure that runs perfectly, but everyone hears about the unexpected $4,000 AWS bill.

As mentioned in Part 1, this comes from my experience.

I’m a Cloud Consultant with 7+ years in AWS. The depth labels below reflect the kinds of production environments and projects I’ve worked on, including the infrastructure build I’m documenting at github.com/kbrepository/aws-infra-terraform. Your context may shift these labels. Use this as a starting point, not a definitive rulebook.

Quick reminder on depth labels:

  • 🟢 Daily Driver — You’ll touch this constantly. Know it well enough to configure, debug, and explain it without looking everything up.
  • 🟡 Weekly Touch — Shows up regularly. You don’t need to memorize every API, but you need a confident working knowledge.
  • 🔵 Specialist Territory — Situational. When the situation calls for it, shallow knowledge won’t cut it. Go deep when you need to.

🔐 Category 1: Security

Security in AWS isn’t a single service — it’s a layer that runs across everything you build. The services in this category aren’t always the most visible, but they’re the ones that quietly determine whether your infrastructure is trustworthy or one misconfiguration away from a bad day.

I used to treat security services as something to “add later, once the core infrastructure is done.” That mindset cost me more rework than I’d like to admit. Now security is wired in from the first module, not bolted on at the end.

Secrets Manager 🟢  Daily Driver

Secrets Manager is AWS’s managed service for storing, rotating, and retrieving sensitive credentials: database passwords, API keys, OAuth tokens, and anything else you would otherwise be tempted to put in an environment variable or a config file.

And I’ll be direct: if you’re still storing secrets in environment variables hardcoded into your deployment pipeline, Secrets Manager is the first thing you should change.

How I actually use it:

In my Terraform project, Secrets Manager shows up immediately alongside RDS. AWS has native integration that handles automatic credential rotation without any application downtime.

  • The application fetches the secret at runtime via the Secrets Manager API.
  • Rotation happens transparently in the background.
  • No more manually updating passwords across environments and hoping you didn’t miss one.

I also use it for third-party API keys that Lambda functions need:

  • Rather than passing them as environment variables, I store them in Secrets Manager.
  • The Lambda execution role is granted permission to fetch them at runtime.
  • Cleaner, auditable, and rotatable.
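The runtime-fetch pattern can be sketched roughly like this, assuming Python and boto3 (the post doesn't depend on a particular language). `fetch_secret`, the stub client, and the secret name `prod/app/db` are hypothetical; in a real Lambda you would pass `boto3.client("secretsmanager")` instead of the stub, and the execution role would need `secretsmanager:GetSecretValue` on that secret.

```python
import json


def fetch_secret(client, secret_id):
    """Return a secret as a dict, assuming it was stored as a JSON string.

    `client` is anything exposing boto3's Secrets Manager
    `get_secret_value` method, e.g. boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    # Text secrets arrive in "SecretString"; binary ones use "SecretBinary".
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        return {
            "Name": SecretId,
            "SecretString": json.dumps(self._secrets[SecretId]),
        }


# "prod/app/db" and its contents are made up for illustration.
stub = StubSecretsClient(
    {"prod/app/db": {"username": "app", "password": "example-only"}}
)
credentials = fetch_secret(stub, "prod/app/db")
print(credentials["username"])
```

Passing the client in as a parameter keeps the function easy to test and keeps the secret value out of the function's configuration entirely.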

The pattern that surprised me early on: Secrets Manager integrates with CloudTrail, so every secret access is logged. That audit trail has been useful more than once when tracing which service was calling which credential and when.

What I was wrong about early on: I used to think Secrets Manager was overkill for smaller projects and that environment variables were “fine for now.” The problem with “fine for now” is that secrets stored insecurely have a way of staying insecure indefinitely. The friction of switching later (updating application code, CI/CD pipelines, and IAM policies) is always higher than just doing it right from the start.

What you need to know:

  • Secret types – credentials, API keys, arbitrary JSON blobs.
  • Automatic rotation – native integration with RDS, Redshift, DocumentDB; Lambda-based rotation for everything else.
  • Resource-based policies – control which principals can access which secrets.
  • Secrets Manager vs Parameter Store – Secrets Manager for sensitive credentials that need rotation; SSM Parameter Store for configuration values (it can also hold encrypted SecureString parameters, but without built-in rotation).
  • VPC endpoints – for accessing Secrets Manager from private subnets without internet routing.
  • Cross-account access – for sharing secrets across AWS accounts securely.

Practical tip:

  • Use the Secrets Manager SDK’s built-in caching client in your applications.
  • It caches secret values locally and only refreshes when the cache TTL expires.
  • This significantly reduces API call costs and latency compared to fetching the secret on every request.
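To make the caching idea concrete, here is a simplified stand-in, not the AWS caching client itself (AWS publishes one for Python as the `aws-secretsmanager-caching` package). `TtlSecretCache`, the `fetch` callable, and the 300-second TTL are assumptions chosen for illustration.

```python
import time


class TtlSecretCache:
    """Cache secret values in memory and refetch only after the TTL expires."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # callable: secret_id -> secret value
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the TTL can be exercised in tests
        self._entries = {}       # secret_id -> (value, expires_at)

    def get(self, secret_id):
        now = self._clock()
        entry = self._entries.get(secret_id)
        if entry is not None and now < entry[1]:
            return entry[0]      # still fresh: no API call
        value = self._fetch(secret_id)
        self._entries[secret_id] = (value, now + self._ttl)
        return value


# Demo with a fake fetcher that records how often the "API" is called.
calls = []


def fake_fetch(secret_id):
    calls.append(secret_id)
    return f"value-for-{secret_id}"


fake_now = [0.0]
cache = TtlSecretCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("api-key")   # miss: calls fake_fetch
cache.get("api-key")   # hit: served from memory
fake_now[0] = 301.0    # move the fake clock past the TTL
cache.get("api-key")   # expired: calls fake_fetch again
print(len(calls))      # 2
```

The trade-off is staleness: after a rotation, a cached value can be out of date for up to one TTL, so the TTL should be shorter than the overlap window your rotation strategy allows.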

KMS — Key Management Service 🟡 Weekly Touch

KMS manages encryption keys — the cryptographic material used to encrypt and decrypt data across AWS services. S3 server-side encryption, RDS storage encryption, EBS volume encryption, Secrets Manager encryption — all of it can run through KMS.

It’s one of those services that’s easy to overlook because it works invisibly in the background, right up until you need to audit, rotate, or control access to your encryption keys.

How I actually use it:

My default approach in any production environment: create customer-managed KMS keys for sensitive data stores rather than relying on AWS-managed keys.

The reason is control. With a customer-managed key, I can:

  • Define exactly who can use the key.
  • Audit every encrypt and decrypt operation via CloudTrail.
  • Rotate the key material on a schedule.

AWS-managed keys are convenient but give you far less visibility and control.

In my Terraform project, KMS keys are provisioned early, one per environment, with key policies that follow least privilege.

S3 buckets holding sensitive data, RDS instances, and Secrets Manager all reference the appropriate customer-managed key. It sounds like overhead, but setting it up in Terraform takes very little effort and pays dividends in auditability.

My opinion: KMS is one of those services that feels optional until you’re in a compliance conversation or a security review. At that point, not having customer-managed keys becomes a gap you have to explain. Building the habit early is significantly easier than retrofitting encryption key management into an existing environment.

What you need to know:

  • AWS-managed keys vs customer-managed keys – know the difference and when each is appropriate.
  • Key policies vs IAM policies – KMS uses both; key policies are the primary access control mechanism.
  • Envelope encryption – how KMS actually encrypts data at scale.
  • Key rotation – optional automatic rotation for customer-managed keys (yearly by default once enabled).
  • Multi-region keys – for disaster recovery and cross-region replication scenarios.
  • KMS grants – for temporary, delegated key access without modifying the key policy.
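Envelope encryption is easier to see in code than in prose. The sketch below follows the shape of the boto3 KMS calls `generate_data_key` and `decrypt`, but runs against a hypothetical in-memory `StubKms` so it works offline. The local cipher is a deliberate placeholder and is not secure; real code would use AES-GCM from a vetted cryptography library. The key alias `alias/example-key` is made up.

```python
import hashlib
import os


def _toy_cipher(key, data):
    """PLACEHOLDER cipher (XOR with a SHA-256 keystream). Not secure.

    It only stands in for a real algorithm such as AES-GCM so the
    sketch stays dependency-free. Applying it twice restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class StubKms:
    """Hypothetical in-memory stand-in for boto3.client("kms")."""

    def __init__(self):
        self._master = os.urandom(32)  # the KMS key never leaves "KMS"

    def generate_data_key(self, KeyId, KeySpec):
        plaintext = os.urandom(32)
        return {
            "Plaintext": plaintext,
            "CiphertextBlob": _toy_cipher(self._master, plaintext),
        }

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": _toy_cipher(self._master, CiphertextBlob)}


def envelope_encrypt(kms, key_id, data):
    # 1. Ask KMS for a fresh data key: a plaintext copy and an encrypted copy.
    key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    # 2. Encrypt the payload locally; the bulk data never goes to KMS.
    ciphertext = _toy_cipher(key["Plaintext"], data)
    # 3. Store only the encrypted data key next to the ciphertext.
    return {"encrypted_key": key["CiphertextBlob"], "ciphertext": ciphertext}


def envelope_decrypt(kms, envelope):
    # KMS unwraps the data key; the payload is then decrypted locally.
    data_key = kms.decrypt(CiphertextBlob=envelope["encrypted_key"])["Plaintext"]
    return _toy_cipher(data_key, envelope["ciphertext"])


kms = StubKms()
envelope = envelope_encrypt(kms, "alias/example-key", b"customer record")
print(envelope_decrypt(kms, envelope))
```

The point of the pattern is that KMS only ever handles the small data key, so large objects can be encrypted locally while access control and auditing still happen on the KMS key.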

Practical tip:

  • Always define a key deletion waiting period when creating KMS keys via Terraform.
  • KMS key deletion is irreversible. If you delete a key with data encrypted under it, that data is gone permanently.
  • The waiting period exists to prevent accidents. Treat it as mandatory, not optional.
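In Terraform this is the `deletion_window_in_days` argument on `aws_kms_key`. The same guardrail expressed against the KMS API looks roughly like the sketch below, which wraps boto3's `schedule_key_deletion` and refuses anything outside the 7 to 30 day range KMS accepts. `RecordingKms` and the key ID are hypothetical stand-ins so the example runs without AWS access.

```python
def schedule_key_deletion(kms, key_id, pending_days=30):
    """Schedule deletion with an explicit waiting period (KMS allows 7 to 30 days)."""
    if not 7 <= pending_days <= 30:
        raise ValueError("pending_days must be between 7 and 30")
    return kms.schedule_key_deletion(KeyId=key_id, PendingWindowInDays=pending_days)


class RecordingKms:
    """Hypothetical stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def schedule_key_deletion(self, KeyId, PendingWindowInDays):
        self.calls.append((KeyId, PendingWindowInDays))
        return {"KeyId": KeyId, "KeyState": "PendingDeletion"}


kms_stub = RecordingKms()
result = schedule_key_deletion(kms_stub, "example-key-id", pending_days=30)
print(result["KeyState"])
```

While the key is pending deletion, the deletion can still be cancelled, which is exactly the safety margin the waiting period buys you.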

IAM Identity Center 🟡 Weekly Touch

IAM Identity Center (previously called AWS SSO) is how you manage human access to multiple AWS accounts from a single place. Instead of creating IAM users in every account, you define permission sets centrally and assign them to users or groups across your entire AWS Organization.

If you’re working in a multi-account environment, this is the right way to handle human access.

How I actually use it:

In any environment with more than two or three AWS accounts, manually managing IAM users per account becomes a maintenance nightmare fast. IAM Identity Center solves this cleanly:

  • Connect it to your identity provider.
  • Define permission sets — essentially IAM policies packaged for assignment.
  • Assign them to accounts.

Engineers get a single login portal, temporary credentials per session, and access only to what they need.
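The "assign them to accounts" step is essentially a cross product of groups, permission sets, and accounts. As a rough sketch, the helper below builds one request per account in the shape boto3's `sso-admin` `create_account_assignment` call expects. Every ARN, group ID, and account ID shown is a placeholder, and `build_assignments` is a hypothetical name.

```python
def build_assignments(instance_arn, group_id, permission_set_arn, account_ids):
    """Return one create_account_assignment request per target account."""
    return [
        {
            "InstanceArn": instance_arn,
            "TargetId": account_id,
            "TargetType": "AWS_ACCOUNT",
            "PermissionSetArn": permission_set_arn,
            "PrincipalType": "GROUP",  # assign to groups rather than individual users
            "PrincipalId": group_id,
        }
        for account_id in account_ids
    ]


# Placeholder identifiers for illustration only.
requests = build_assignments(
    instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
    group_id="example-group-id",
    permission_set_arn="arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLE",
    account_ids=["111111111111", "222222222222"],
)
for request in requests:
    print(request["TargetId"], request["PrincipalType"])

# With real values, each request could be sent with:
#   boto3.client("sso-admin").create_account_assignment(**request)
```

Assigning to groups instead of individual users keeps onboarding and offboarding in the identity provider, where group membership already lives.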

The operational benefit I didn’t fully appreciate until I was using it: when someone leaves the team, you remove them from the identity provider and they lose sign-in access to every AWS account at once (any temporary credentials already issued only last until the session expires). With per-account IAM users, that offboarding process is a manual checklist across every account, and checklists get missed.

What you need to know:

  • Permission sets – reusable IAM policy packages assigned to accounts and users or groups.
  • Identity sources – built-in directory, Active Directory, or external identity provider.
  • SCIM provisioning – for automatic user sync from your identity provider.
  • AWS CLI integration – the aws configure sso workflow for developer access.
  • Account assignment – mapping permission sets to specific accounts for specific users or groups.
  • Integration with AWS Organizations – the foundation for multi-account access management.

Practical tip:

  • Set up IAM Identity Center even if you only have two or three AWS accounts right now.
  • The effort to configure it early is low; migrating from a tangled web of per-account IAM users later is high.
  • It’s one of those foundational decisions that compounds — either positively or painfully — over time.

WAF — Web Application Firewall 🔵  Specialist Territory

WAF sits in front of your web-facing resources — CloudFront distributions, Application Load Balancers, API Gateway endpoints — and filters HTTP traffic based on rules you define.

  • Block requests from specific IPs, countries, or user agents.
  • Throttle requests that exceed rate limits.
  • Filter out common attack patterns like SQL injection, cross-site scripting, and path traversal.

It’s your application-layer defense line.

How I actually use it:

WAF is not something I configure on every project from day one — that’s what makes it Specialist Territory. But when it’s needed, it needs to be configured properly or it provides false confidence.

I typically introduce WAF when a public-facing API or web application is moving toward production and needs protection beyond what security groups and NACLs provide at the network level.

AWS Managed Rule Groups are the right starting point:

  • AWS maintains pre-built rule sets for common threats.
  • Attach them to a WAF web ACL without writing rules from scratch.
  • Layer custom rate-based rules on top for specific endpoints like login flows, password reset, and public APIs without authentication.

The thing that catches people off guard with WAF: it generates a significant volume of logs if you enable full logging, and those logs go to CloudWatch Logs, S3, or Kinesis Data Firehose, all of which cost money at scale. Plan your logging strategy before you enable it, not after the first bill.

What you need to know:

  • Web ACLs – the container for your WAF rules, associated with a resource.
  • AWS Managed Rule Groups – pre-built rule sets maintained by AWS; start here before writing custom rules.
  • Rate-based rules – throttle requests from a single IP exceeding a threshold per five-minute window.
  • Custom rules – for application-specific filtering beyond what managed rules cover.
  • WAF logging – essential for understanding what’s being blocked.
  • Count vs Block mode – use Count mode first to understand traffic impact before switching to Block.

Practical tip:

  • Always run new WAF rules in Count mode for at least a few days before switching to Block mode.
  • WAF rules applied directly to Block mode in production can cause legitimate traffic outages.
  • Count first, analyze the logs, then block.
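Here is roughly what a rate-based rule for a login path looks like as a WAFv2 rule definition, with a single flag to move it from Count to Block once the logs look sane. The dict follows the rule structure used by the WAFv2 API (for example in the `Rules` list passed to boto3's `wafv2` `create_web_acl`). The rule name, the `/login` prefix, and the limit of 1000 are assumptions for illustration.

```python
def rate_limit_rule(name, priority, limit, path_prefix, block=False):
    """Build a WAFv2 rate-based rule dict scoped to one URI path prefix.

    With block=False the rule only counts matching requests.
    """
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,  # requests per IP per evaluation window
                "AggregateKeyType": "IP",
                "ScopeDownStatement": {
                    "ByteMatchStatement": {
                        "SearchString": path_prefix.encode(),
                        "FieldToMatch": {"UriPath": {}},
                        "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                        "PositionalConstraint": "STARTS_WITH",
                    }
                },
            }
        },
        "Action": {"Block": {}} if block else {"Count": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Start in Count mode; flip block=True only after reviewing the logs.
observe = rate_limit_rule("login-rate-limit", 10, 1000, "/login")
enforce = rate_limit_rule("login-rate-limit", 10, 1000, "/login", block=True)
print(observe["Action"], enforce["Action"])
```

Keeping the mode as a single parameter means the Count-to-Block switch is a one-line, reviewable change rather than a rewrite of the rule.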

Security Hub 🔵  Specialist Territory

Security Hub is AWS’s centralized security findings aggregator. It pulls findings from across AWS security services — GuardDuty, Inspector, Macie, Config, IAM Access Analyzer — and third-party tools, normalizes them into a standard format, and gives you a single pane of glass for your security posture.

It also runs automated checks against security standards like CIS AWS Foundations and AWS Foundational Security Best Practices.

How I actually use it:

Security Hub earns its place in environments with meaningful compliance requirements or multiple AWS accounts generating security findings across different services.

In a single-account project, it’s probably overkill. In a production multi-account environment with GuardDuty, Inspector, and Config all generating findings, it’s genuinely useful — without it, you’re manually checking each service’s console separately.

The automated standard checks are where I’ve gotten the most immediate value:

  • Enable Security Hub and run the AWS Foundational Security Best Practices standard.
  • Get a prioritized list of security gaps immediately.
  • Move faster than doing a manual audit service by service.

My opinion: Security Hub is one of those services that pays for itself in the first week if your account has been running without systematic security checks. The findings it surfaces in a fresh audit are usually equal parts useful and uncomfortable.

What you need to know:

  • Security standards – CIS AWS Foundations, AWS FSBP, PCI DSS.
  • Findings aggregation – how Security Hub normalizes findings from GuardDuty, Inspector, Macie, and others.
  • Cross-account aggregation – centralizing findings from multiple accounts into a delegated administrator account.
  • Automated response – integration with EventBridge for triggering Lambda remediations.
  • Suppression rules – for muting known false positives or accepted risks.

Practical tip:

  • When you first enable Security Hub, don’t try to fix every finding immediately.
  • Filter by severity first, pick the top findings by count, and work through them systematically.
  • Treat it like a backlog, not a fire drill.
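The "filter by severity, then pick the top findings by count" approach is simple enough to script. This sketch works on finding dicts that carry the `Title` and `Severity.Label` fields from the AWS Security Finding Format, such as the `Findings` list returned by boto3's `securityhub` `get_findings`. The sample findings and the `triage` helper are invented for illustration.

```python
from collections import Counter

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL"]


def triage(findings, min_severity="HIGH", top_n=5):
    """Return the most frequent finding titles at or above min_severity."""
    cutoff = SEVERITY_ORDER.index(min_severity)
    allowed = set(SEVERITY_ORDER[: cutoff + 1])
    counts = Counter(
        (f["Severity"]["Label"], f["Title"])
        for f in findings
        if f["Severity"]["Label"] in allowed
    )
    # Sort by severity first, then by how many resources share the finding.
    return sorted(
        counts.items(),
        key=lambda item: (SEVERITY_ORDER.index(item[0][0]), -item[1]),
    )[:top_n]


# Made-up findings for illustration.
sample_findings = [
    {"Title": "S3 bucket allows public read", "Severity": {"Label": "CRITICAL"}},
    {"Title": "Security group allows 0.0.0.0/0 on port 22", "Severity": {"Label": "HIGH"}},
    {"Title": "Security group allows 0.0.0.0/0 on port 22", "Severity": {"Label": "HIGH"}},
    {"Title": "EBS default encryption disabled", "Severity": {"Label": "MEDIUM"}},
]

backlog = triage(sample_findings)
for (severity, title), count in backlog:
    print(f"{severity:<9} x{count}  {title}")
```

Grouping by title matters because one misconfigured pattern often produces dozens of findings; fixing the pattern once clears them all.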

💰 Category 2: FinOps

FinOps (financial operations for cloud) is the practice of understanding, optimizing, and taking ownership of cloud costs. A few years ago, this felt like something only finance teams cared about. Now, in most organizations I’ve worked with, DevOps engineers are expected to understand the cost implications of the infrastructure they provision.

I’ll be honest — I didn’t take FinOps seriously early in my career. I cared about whether things worked, not what they cost. The first time I watched a staging environment rack up a significant bill because someone forgot to turn off a NAT gateway over a long weekend, my perspective shifted.


Cost Explorer 🟢  Daily Driver

Cost Explorer is AWS’s built-in cost visualization and analysis tool. It lets you view your AWS spending over time and break it down by service, account, region, tag, or usage type.

It’s not glamorous — but it’s the first place I go when I want to understand where money is going in an AWS account.

How I actually use it:

I check Cost Explorer at the start of every week on any active project. Not because I’m paranoid, but because cost surprises in cloud are almost always the result of not noticing something early enough.

  • A NAT gateway processing more traffic than expected.
  • A CloudWatch Logs group growing faster than anticipated.
  • An EC2 instance left running in a non-production environment over the weekend.

Cost Explorer surfaces these patterns before they become billing problems.

The grouping options are where Cost Explorer gets genuinely powerful:

  • Grouping by tag lets you see cost per environment, team, or project.
  • But only if you’ve been tagging resources consistently.

This is the reason I’m so insistent on tagging from day one in every project. Without tags, Cost Explorer gives you totals. With tags, it gives you accountability.
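The same tag-based grouping is available through the Cost Explorer API. The sketch below uses the request and response shape of boto3's `ce` `get_cost_and_usage`, run against a hypothetical stub with invented numbers. The tag key `Environment` is an assumption, and pagination (`NextPageToken`) is ignored for brevity.

```python
from collections import defaultdict


def cost_by_tag(ce, tag_key, start, end):
    """Sum unblended cost per tag value between two ISO dates (end exclusive)."""
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Keys look like "Environment$prod"; untagged spend is "Environment$".
            value = group["Keys"][0].split("$", 1)[1] or "(untagged)"
            totals[value] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)


class StubCostExplorer:
    """Hypothetical stand-in for boto3.client("ce") with made-up amounts."""

    def get_cost_and_usage(self, **kwargs):
        def group(key, amount):
            return {
                "Keys": [key],
                "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}},
            }

        return {
            "ResultsByTime": [
                {
                    "Groups": [
                        group("Environment$prod", "120.50"),
                        group("Environment$staging", "30.25"),
                        group("Environment$", "7.00"),
                    ]
                }
            ]
        }


totals = cost_by_tag(StubCostExplorer(), "Environment", "2024-01-01", "2024-02-01")
print(totals)
```

The "(untagged)" bucket is the number to watch: it is the spend nobody is accountable for yet.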

What you need to know:

  • Cost breakdown by service, region, account, tag, and usage type.
  • Daily vs monthly granularity.
  • Rightsizing recommendations.
  • Savings Plans and Reserved Instance recommendations.
  • Cost anomaly detection.
  • Forecasting.

Practical tip:

  • Enable Cost Anomaly Detection and wire it to an SNS topic that sends to your Slack channel.
  • It uses machine learning to detect unusual spending patterns.
  • Setup takes very little time and can catch unexpected costs early.

AWS Budgets 🟢  Daily Driver

AWS Budgets lets you set spending thresholds and receive alerts when your actual or forecasted costs cross them.

Think of it as the guardrail layer on top of Cost Explorer — Cost Explorer tells you what you’ve spent, Budgets tells you when you’re about to spend more than you intended.

How I actually use it:

Every AWS account I work on gets at least two budgets on day one:

  • A monthly cost budget with alerts at 80% and 100% of expected spend.
  • A zero-spend budget on any account that should be idle.

The zero-spend budget is particularly useful for sandbox or testing accounts — the moment any charge appears, an alert fires. It’s caught forgotten resources more times than I can count.

For production environments, I also set up forecasted spend alerts. Budgets will alert you when AWS forecasts that you’ll exceed your budget by end of month, even if you haven’t exceeded it yet. That early warning gives you time to investigate and act rather than react to a bill after the fact.
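As a sketch of that setup, the helper below builds the arguments for boto3's `budgets` `create_budget` call: one monthly cost budget with actual-spend alerts at 80% and 100%, plus a forecasted-spend alert at 100%. The account ID, budget name, limit, and email address are placeholders, and `monthly_budget_request` is a hypothetical name.

```python
def monthly_budget_request(account_id, name, limit_usd, email):
    """Build kwargs for the Budgets create_budget call with three alerts."""

    def alert(kind, percent):
        return {
            "Notification": {
                "NotificationType": kind,  # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(percent),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }

    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            alert("ACTUAL", 80),
            alert("ACTUAL", 100),
            alert("FORECASTED", 100),
        ],
    }


# Placeholder values for illustration only.
request = monthly_budget_request(
    "111111111111", "monthly-total", 500, "alerts@example.com"
)
for item in request["NotificationsWithSubscribers"]:
    notification = item["Notification"]
    print(notification["NotificationType"], notification["Threshold"])

# With real values: boto3.client("budgets").create_budget(**request)
```

Swapping the email subscriber for an SNS one is what lets the same alerts reach Slack or PagerDuty.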

My opinion: AWS Budgets is one of the lowest-effort, highest-value things you can set up in any AWS account. Five minutes to configure. There is genuinely no good reason not to have it running everywhere.

What you need to know:

  • Budget types – cost budgets, usage budgets, Savings Plans budgets, reservation budgets.
  • Alert thresholds – actual spend alerts and forecasted spend alerts.
  • Budget scope – filter by service, account, tag, or region.
  • Budget actions – automatically apply IAM policies or stop EC2/RDS instances when a threshold is crossed.
  • SNS integration – for routing budget alerts to Slack, email, or PagerDuty.

Practical tip:

  • Use Budget Actions for non-production accounts.
  • You can configure a budget action to automatically apply a restrictive policy or stop running instances when a threshold is crossed.
  • It acts as a real safety net for sandbox and learning environments.

Trusted Advisor 🟡 Weekly Touch

Trusted Advisor is AWS’s built-in best practices checker. It inspects your environment across five categories — Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits — and surfaces recommendations for improvement.

Think of it as a periodic health check for your AWS account that runs automatically in the background.

How I actually use it:

I review Trusted Advisor findings during infrastructure reviews or before major deployments, not daily. The most immediately useful checks are in Cost Optimization:

  • Idle EC2 instances.
  • Underutilized EBS volumes.
  • Unassociated Elastic IPs.
  • Old snapshots that nobody is using.

These are the quiet cost leaks that accumulate over time in any active AWS account.

The Service Limits checks are underappreciated. Trusted Advisor monitors your usage against AWS service quotas and warns you when you’re approaching a limit. Running into an EC2 or VPC quota limit mid-deployment because nobody checked is an avoidable incident.

One honest caveat: the depth of Trusted Advisor checks depends on your AWS Support plan. Basic and Developer plans get a limited set of checks. Business and Enterprise plans unlock the full catalog. If your organization is on Basic support, Trusted Advisor will feel underwhelming.

What you need to know:

  • Five check categories – Cost Optimization, Performance, Security, Fault Tolerance, Service Limits.
  • Support plan dependency – full checks require Business or Enterprise support.
  • Refresh cadence – most checks refresh every 24 hours, not real time.
  • CloudWatch integration – surface Trusted Advisor findings as CloudWatch metrics and alarm on them.
  • Organizational view – for aggregating findings across multiple accounts in an AWS Organization.

Practical tip:

  • Set a recurring calendar reminder — monthly for stable environments, weekly for actively changing ones — to review Cost Optimization findings.
  • The idle resources it surfaces are almost always things that got forgotten, not things anyone consciously decided to keep running.

Compute Optimizer 🔵  Specialist Territory

Compute Optimizer uses machine learning to analyze your actual resource utilization and recommend optimal configurations for EC2 instances, EBS volumes, Lambda functions, ECS services on Fargate, and Auto Scaling Groups.

Where Trusted Advisor tells you something is underutilized, Compute Optimizer tells you exactly what to change it to — and models the cost and performance impact of each recommendation.

How I actually use it:

Compute Optimizer earns its place in environments that have been running long enough to generate meaningful utilization data — typically at least two weeks of consistent workload. I pull it out during cost optimization cycles, not as part of daily operations.

It’s particularly useful for EC2 instance rightsizing in environments that were initially over-provisioned “to be safe” and never revisited.

The Lambda recommendations are genuinely useful and often overlooked:

  • Compute Optimizer analyzes memory utilization patterns across Lambda invocations.
  • Recommends optimal memory settings, which directly affect both cost and performance.
  • Lambda allocates CPU proportionally to memory. A function configured with 1GB that consistently uses 200MB is an easy win.
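A quick back-of-the-envelope calculation shows why memory settings are worth revisiting. Lambda's compute charge scales with GB-seconds (memory times duration times invocations). The price, invocation count, and durations below are hypothetical, and the sketch assumes duration stays the same after lowering memory, which may not hold because less memory also means less CPU.

```python
def monthly_compute_cost(memory_mb, avg_duration_ms, invocations, price_per_gb_second):
    """Lambda compute charge = GB-seconds consumed x price per GB-second."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


# All numbers are assumptions; look up the current price for your region.
PRICE_PER_GB_SECOND = 0.0000166667
INVOCATIONS = 5_000_000

current = monthly_compute_cost(1024, 200, INVOCATIONS, PRICE_PER_GB_SECOND)
proposed = monthly_compute_cost(512, 200, INVOCATIONS, PRICE_PER_GB_SECOND)

print(f"current:  ${current:.2f}")
print(f"proposed: ${proposed:.2f}")
print(f"saving:   ${current - proposed:.2f}")
```

Because lower memory can lengthen each invocation, measure the real duration at the new setting before trusting the saving; request charges are also separate and unaffected by memory.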

What you need to know:

  • Supported resource types – EC2 instances, EBS volumes, Lambda functions, ECS on Fargate, Auto Scaling Groups.
  • Enrollment – must be explicitly enabled per account or organization.
  • Recommendation categories – Over-provisioned, Under-provisioned, Optimized, Not enough data.
  • Enhanced infrastructure metrics – opt-in feature using CloudWatch metrics for more accurate recommendations.
  • Savings estimation – projected monthly savings if recommendations are applied.

Practical tip:

  • Don’t apply Compute Optimizer recommendations blindly, especially for production EC2 instances.
  • The recommendations are based on historical utilization patterns, which may not account for seasonal traffic spikes or planned growth.
  • Validate recommendations against your knowledge of the workload before downsizing anything load-bearing.

The Full Picture — Part 2

Here’s everything in one reference table:

Category | Service            | Depth Label
Security | Secrets Manager    | 🟢 Daily Driver
Security | KMS                | 🟡 Weekly Touch
Security | IAM Identity Center | 🟡 Weekly Touch
Security | WAF                | 🔵 Specialist Territory
Security | Security Hub       | 🔵 Specialist Territory
FinOps   | Cost Explorer      | 🟢 Daily Driver
FinOps   | AWS Budgets        | 🟢 Daily Driver
FinOps   | Trusted Advisor    | 🟡 Weekly Touch
FinOps   | Compute Optimizer  | 🔵 Specialist Territory

The Bigger Picture Across Both Parts

If you put Part 1 and Part 2 together, you now have a classification across five categories: Infrastructure, Application, Observability, Security, and FinOps, covering 20 AWS services with honest depth labels based on real DevOps work.

NOTE: What would you move or add in the Security or FinOps categories? I’m particularly curious whether others are treating Security Hub as a Daily Driver — it feels like it could go either way depending on your environment. Drop a comment below.

If you haven’t read Part 1 of this series yet, that’s the place to start — it covers Infrastructure, Application, and Observability with the same format. And the production-grade Terraform infrastructure project I reference throughout both posts is publicly documented at github.com/kbrepository/aws-infra-terraform.

