How do you disconnect or isolate an instance from a VPC? Typically you have the opposite problem: how to get two services talking to each other and figure out what is blocking the connectivity. But what if you need to disconnect a compromised instance from the VPC and terminate all established connections?

Let’s take this as a challenge and see whether there is a more effective way of isolating a compromised instance than blocking it with a security group. Or whether a security group actually works for the majority of cases, and the problem is limited to specific corner cases where existing connections are not affected by rule changes …

Blocking with security group

The first thing to do is replace the security groups attached to the ENI with one that doesn’t have any rules, i.e. all connections are blocked. This takes care of any new connection attempts. However, existing connections may not get blocked, because security groups are stateful: once a connection is allowed, it won’t be checked again for the next packet.
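As a minimal sketch of that swap, assuming a boto3 EC2 client is passed in as `ec2` and that a rule-less "isolation" security group already exists (the function name and the IDs are hypothetical): note that a freshly created security group comes with a default allow-all outbound rule, so that rule would have to be revoked first for the group to be truly empty.

```python
def isolate_eni(ec2, eni_id, isolation_sg_id):
    """Swap every security group on an ENI for a single rule-less group.

    `ec2` is assumed to be a boto3 EC2 client (boto3.client("ec2")).
    Returns the previously attached group IDs so the change can be reverted.
    """
    described = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
    previous = [
        group["GroupId"]
        for group in described["NetworkInterfaces"][0]["Groups"]
    ]
    # Replaces the full list of groups on the interface, not just one of them.
    ec2.modify_network_interface_attribute(
        NetworkInterfaceId=eni_id,
        Groups=[isolation_sg_id],
    )
    return previous
```

Keeping the returned list around makes it possible to restore the original groups later if the isolation turns out to be a false alarm.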

Blocking with NACL

NACLs are stateless, and changes take immediate effect on present and future connections. Unfortunately NACLs are attached to subnets, so blocking traffic affects the whole subnet, which might do more harm than good. But what if you used a NACL to drop connections by denying traffic only for a short moment, and then let the security group take care of blocking reconnect attempts?

Unfortunately it doesn’t work like that. The NACL just drops packets, but the connection itself remains and continues working after the blocking NACL rules are removed. I believe you would have to keep the NACL block in place for 10 minutes to get a connection timeout from the security group.

However, if the network architecture allows you to use a NACL to disconnect the instance without causing too much collateral damage, it will block both established and new connections.
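A rough sketch of such a subnet-wide block, again assuming a boto3 EC2 client passed in as `ec2`; the helper names and the choice of rule number 1 are assumptions for illustration. It only covers IPv4, and the call fails if an entry with the same rule number already exists in that direction.

```python
def deny_all_entries(nacl_id, rule_number=1):
    """Build parameters for deny-all NACL entries, inbound and outbound."""
    return [
        {
            "NetworkAclId": nacl_id,
            "RuleNumber": rule_number,  # low number, so it is evaluated first
            "Protocol": "-1",           # all protocols
            "RuleAction": "deny",
            "Egress": egress,           # False = inbound, True = outbound
            "CidrBlock": "0.0.0.0/0",   # IPv4 only in this sketch
        }
        for egress in (False, True)
    ]


def block_subnet(ec2, nacl_id, rule_number=1):
    """Add the deny-all entries to the NACL associated with the subnet."""
    for entry in deny_all_entries(nacl_id, rule_number):
        ec2.create_network_acl_entry(**entry)
```

Because NACL rules are evaluated in ascending rule-number order, the deny entries need a lower number than any existing allow rule to take precedence.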

Routing to a black hole

If neither a security group nor a NACL is the optimal solution, maybe you could route the compromised instance’s traffic to a black hole and isolate it that way.

A VPC used to have a local route that couldn’t be modified, and you couldn’t add routes with a destination more specific than the VPC CIDR. This changed in 2021, when VPC routing got new features to support north-south and east-west traffic inspection. However, there are still limitations for routes within the VPC CIDR.

  • When the destination is equal to or more specific than the VPC CIDR(s), the target must be an ENI or an instance.
  • The route destination must match one (or multiple) VPC subnet CIDRs.

The first one isn’t a problem: you can create an orphan ENI and use it as the target for the black hole route. The second one causes problems, because it only allows adding routes between subnets, while you would need to specify a route for a single IP address. Any change to routing would therefore affect at least the whole subnet, which isn’t any better than blocking traffic with NACLs :-(
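To make the second limitation concrete, here is a toy check using only Python’s standard `ipaddress` module. It is a simplified model of the rule as described above (exact match against a single subnet CIDR), not the validation AWS actually performs, and the function name and addresses are made up for illustration.

```python
import ipaddress


def matches_subnet_cidr(destination, subnet_cidrs):
    """Return True if the route destination equals one of the subnet CIDRs."""
    dest = ipaddress.ip_network(destination)
    return any(dest == ipaddress.ip_network(cidr) for cidr in subnet_cidrs)


# Hypothetical VPC with two subnets.
subnets = ["10.0.1.0/24", "10.0.2.0/24"]

# A route covering a whole subnet passes the check ...
whole_subnet_ok = matches_subnet_cidr("10.0.1.0/24", subnets)

# ... but a route for just the compromised instance's address does not.
single_host_ok = matches_subnet_cidr("10.0.1.25/32", subnets)
```

Under this model `whole_subnet_ok` is true and `single_host_ok` is false, which is exactly why the black hole route cannot target one instance.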

Security group connection tracking

Security group connection tracking keeps track of traffic in and out, i.e. it is what makes security groups stateful. A side effect is that when you change rules, tracked connections are not interrupted. But since not all connections are tracked, maybe there is a way …

If a security group rule permits TCP or UDP flows for all traffic (0.0.0.0/0 or ::/0) and there is a corresponding rule in the other direction that permits all response traffic (0.0.0.0/0 or ::/0) for all ports (0-65535), then that flow of traffic is not tracked, unless it is part of an automatically tracked connection. The response traffic for an untracked flow is allowed based on the inbound or outbound rule that permits the response traffic, not based on tracking information.

It looks like if you only limit inbound traffic and allow all outbound traffic, none of the connections would be tracked and all changes to security group rules would have immediate effect! This is also well in line with a simple test case where I SSHed from one instance to another within the VPC. When I had wide-open outbound rules, changes to inbound rules took effect immediately, blocking the connection.

But what are those automatically tracked connections?

Connections made through the following are automatically tracked, even if the security group configuration does not otherwise require tracking. These connections must be tracked to ensure symmetric routing, as there could be multiple valid reply paths.

  • Egress-only internet gateways
  • Gateway Load Balancers
  • Global Accelerator accelerators
  • NAT gateways
  • Network Firewall firewall endpoints
  • Network Load Balancers
  • AWS PrivateLink (interface VPC endpoints)
  • Transit gateway attachments

Unfortunately that is a list of almost all the services you use for connecting outside the VPC, i.e. all important connections to/from a compromised instance would likely go through one of these and therefore be automatically tracked regardless of security group rules. Looks like my test case was too simple, and allowing all outbound traffic won’t leave all connections untracked :-(
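The tracking rules quoted above can be condensed into a small decision function. This is only a toy model of my reading of the documentation; the `Rule` type, the function names and the `via_auto_tracked` flag are all hypothetical and not part of any AWS API.

```python
from dataclasses import dataclass

ANY_ADDRESS = {"0.0.0.0/0", "::/0"}
ALL_PORTS = (0, 65535)


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp", "udp", or "-1" for all protocols
    cidr: str      # source (inbound) or destination (outbound) CIDR
    ports: tuple   # (from_port, to_port)


def permits_all_responses(rule, protocol):
    """True if the rule allows response traffic to anywhere on all ports."""
    if rule.cidr not in ANY_ADDRESS:
        return False
    if rule.protocol == "-1":
        return True  # all protocols implies all ports
    return rule.protocol == protocol and rule.ports == ALL_PORTS


def is_tracked(flow_rule, response_rule, via_auto_tracked=False):
    """Decide whether a flow would be tracked under the simplified model."""
    if via_auto_tracked:
        # NAT gateway, NLB, transit gateway attachment, etc.: always tracked.
        return True
    untracked = (
        flow_rule.protocol in ("tcp", "udp")
        and flow_rule.cidr in ANY_ADDRESS
        and response_rule is not None
        and permits_all_responses(response_rule, flow_rule.protocol)
    )
    return not untracked
```

With an inbound SSH rule open to 0.0.0.0/0 and an allow-all outbound rule, `is_tracked` returns false for direct instance-to-instance traffic, which matches my SSH test, and true as soon as `via_auto_tracked` is set.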

Other dead-ends

I was also thinking about other options, like moving the instance into an isolated subnet, but as you can only attach new network interfaces and never detach the primary one, this wouldn’t be helpful.

Another approach would be to go all-in and deploy a network firewall between all communications, but east-west inspection can only be inserted between subnets. Adding a route to the firewall for the local subnet CIDR would cause a loop when packets come back from the firewall.

The ultimate disconnection tool is stopping the instance. The downside is that you might lose some forensic information when the instance is shut down. Some instances can be hibernated, but while hibernation retains the contents of memory, it just pushes the problem into the future, as you cannot do much with a hibernated instance.

Was it worth it?

So, in the end it seems that a security group is in many cases the most precise and easiest-to-use tool for isolating instances from a VPC. For traffic between instances this is all that is needed when connections are untracked. If a security group isn’t enough, a NACL can disconnect the whole subnet in one go, but you should be careful not to cause collateral damage.

I failed the challenge, but wanted to write this post anyway, as I did learn some new things about AWS networking. I would say the failure was more educational than success could have been :-)

Resources

Here are links to Hacking The Cloud blog posts about connection tracking and to the AWS solution for automating isolation and collecting forensic information from compromised instances, if you want to dive deeper into the topic.