VPC Dealers
“Wheeler Dealers” is a British television series in which two middle-aged men restore cars. Every episode follows the same plot: first they buy a cheap classic car in need of repair. Then they bring it to the workshop and demonstrate repair techniques, showcasing the work involved in bringing the car back to sellable condition. And then they list the car for sale, detailing the repairs and improvements made.
But let’s face it, the format is getting old and fixing cars is a bit of a niche topic these days. So here is my proposal for a face-lift, and a synopsis for the pilot episode of …
VPC Dealers
In this episode I’ve inherited an old web service built on a common VPC design for its time. The architecture is basically OK and fit for purpose. Public and private resources are in their separate subnets. Only the application load balancer (ALB) is exposed to the internet, and EC2 instances in private subnets are protected from direct internet access. But today we could do better, and with little effort, make it run better, be cheaper and more secure. Pick any 3 ;-)
More Secure
Security is job zero, so let’s start there. If you could remove the public subnets and not expose any part of your VPC to the internet, it would mitigate the possibility of a misconfiguration exposing a server or database to the internet. As the current configuration uses NAT gateways in the public subnets as the route to the internet, you need to have an alternative route in place before decommissioning the NAT gateways and removing the public subnets. Luckily, centralised egress is easy to set up with Transit Gateway. If you don’t already have that in your organisation, here is how you can configure a shared route to the internet for all your VPCs. And it comes with a CDK build too.
The great thing about a central egress VPC is that it doesn’t affect your developer experience, and it gives you a perfect location to add egress filtering later as an additional layer of security. See Centralized egress in AWS Prescriptive Guidance for best practices for securing egress traffic.
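Hooking an existing VPC up to the shared Transit Gateway is just two API calls. A minimal sketch with the AWS CLI, assuming the TGW is already shared into your account; all IDs below are placeholders:

```shell
# Placeholders: replace with the shared TGW, your VPC, its private
# subnets and the private route table.
TGW_ID=tgw-0123456789abcdef0
VPC_ID=vpc-0123456789abcdef0
RTB_ID=rtb-0123456789abcdef0

# 1. Attach the VPC to the Transit Gateway via the private subnets.
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id "$TGW_ID" \
  --vpc-id "$VPC_ID" \
  --subnet-ids subnet-0aaaaaaaaaaaaaaa1 subnet-0bbbbbbbbbbbbbbb2

# 2. Point the private subnets' default route at the TGW attachment
#    (replaces the existing route to the NAT gateway).
aws ec2 replace-route \
  --route-table-id "$RTB_ID" \
  --destination-cidr-block 0.0.0.0/0 \
  --transit-gateway-id "$TGW_ID"
```

The attachment takes a minute or two to become available; run the `replace-route` only after it does, or egress traffic will blackhole.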
Cheaper
As we managed to replace multiple NAT gateways with a single TGW attachment, the VPC will be cheaper to run. This is especially important when you have tens or hundreds of VPCs, but every VPC counts. At the same time we also removed two public Elastic IPs (for the NAT gateways) and two or more public IPs for the ALB. As you might remember, all public IPv4 addresses now cost you. Removing those will keep your FinOps team happy.
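A back-of-the-envelope calculation of the monthly hourly charges, using example list prices (roughly us-east-1; check current AWS pricing and plug in your own numbers). Data-processing charges apply to both NAT GW and TGW and are left out here:

```shell
# Hourly charges only, example prices: NAT GW $0.045/h, public IPv4
# $0.005/h, TGW VPC attachment $0.05/h. 730 hours in an average month.
awk 'BEGIN {
  hours = 730
  nat   = 2 * 0.045 * hours   # two NAT gateways
  eip   = 4 * 0.005 * hours   # 2 NAT EIPs + 2 ALB public IPv4s
  tgw   = 0.05 * hours        # one TGW VPC attachment
  printf "before: $%.2f after: $%.2f saving: $%.2f\n", nat + eip, tgw, nat + eip - tgw
}'
```

For this two-AZ example VPC the saving is modest, but it is per VPC, per month, forever; with tens of VPCs it adds up fast.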
Better
But what happened to the public ALB, and why didn’t I delete the internet gateway along with the public subnets?
In November 2024 a new feature, CloudFront VPC Origins, was announced. This was such a big thing that there was not just one but also a second announcement of the same feature on that very same day.
You should read the details of how to deploy a VPC Origin in the above blog posts, but in short, this allows us to serve content via CloudFront from an EC2 instance, ALB or NLB in private subnets. The origin is deployed as AWS-managed ENIs in your subnets.
For this to work you must have an internet gateway attached to the VPC, but you don’t have to have a route pointing to it. This is just a safety measure preventing unintended access from the internet, similar to the Global Accelerator case in the past. There is also the usual small print you should read before deploying.
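Creating the VPC origin itself is a single call. A sketch with the AWS CLI; the parameter shape is from memory and the ALB ARN is a placeholder, so verify against `aws cloudfront create-vpc-origin help` before use:

```shell
# Create a VPC origin pointing at the internal ALB. CloudFront places
# its managed ENIs in the ALB's subnets; the returned VPC origin ID is
# then referenced from the distribution's origin configuration.
aws cloudfront create-vpc-origin \
  --vpc-origin-endpoint-config \
    'Name=internal-alb-origin,Arn=arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/app/internal-alb/0123456789abcdef,HTTPPort=80,HTTPSPort=443,OriginProtocolPolicy=https-only'
```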
Obviously, adding CloudFront will enhance the user experience by cutting down the latency between the browser and the service. But I don’t think that is the most important aspect of “better” in this case. This setup solves the problem many development teams have with centralized ingress, where the front-end load balancer is placed in a central VPC, out of their reach.
I believe this way it is possible to both let developers deploy their services independently of the network team, using any tools and workflows they are familiar with, and at the same time have no route (as in routable public IPs) from the internet to every VPC. Keeping both devs and net/sec happy is no small feat!
With Little Effort
All of the above would be nice but not very impressive if it only applied to new VPCs built from the ground up, or required re-deploying all existing resources. Luckily, you can apply these changes to existing VPCs with running services, with only a short glitch.
- First, deploy an internal ALB and register the same targets as the original internet-facing ALB.
- Configure a CloudFront VPC Origin pointing to the internal ALB.
- Attach the TGW for central egress to the private subnets.
- Change the default route of the private subnets from the NAT GWs to the TGW attachment.
- Switch DNS from the ALB to CloudFront and wait until there is no more traffic to the ALB.
- Delete the NAT GWs and their EIPs. If you suspect you have outbound integrations that depend on your IP addresses, you can keep the EIPs for a while. Just don’t forget to release them once you are sure you don’t need them.
- Delete the ALB, public subnets and public route table(s).
- Done.
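The teardown steps at the end of the list can be sketched with the AWS CLI like this. All IDs and the ARN are placeholders; run these only after CloudFront has fully drained traffic from the old ALB:

```shell
# NAT gateway deletion is asynchronous; the EIP can only be released
# once the NAT GW is gone.
aws ec2 delete-nat-gateway --nat-gateway-id nat-0123456789abcdef0
aws ec2 wait nat-gateway-deleted --nat-gateway-ids nat-0123456789abcdef0
aws ec2 release-address --allocation-id eipalloc-0123456789abcdef0

# Remove the old internet-facing ALB.
aws elbv2 delete-load-balancer \
  --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/app/public-alb/0123456789abcdef

# Finally, drop the public subnets and their route table.
aws ec2 delete-subnet --subnet-id subnet-0123456789abcdef0
aws ec2 delete-route-table --route-table-id rtb-0123456789abcdef0
```

Repeat the NAT GW and subnet steps per availability zone; the internet gateway itself stays attached, as the VPC origin requires it.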
Summary
In this episode of VPC Dealers we modernised a classic VPC architecture to run better, be cheaper, and be more resilient to operator errors. All this without sacrificing developer-team independence or causing a service break. I’m sure there will be many happy deployments ahead with the new, improved VPC architecture.
If you liked this, stay tuned for the next episode of VPC Dealers by subscribing to the RSS feed.
Until then. Ta-Da!