I’ll continue on network topics with a new episode of VPC Dealers ;-)

If you missed the first episode you can find it here.

To improve your mean time to innocence (MTTI)

Those of you who have worked as a network admin, or whose work depends on the network, know that when things stop working, the network is always the first to be blamed until proven innocent. The legal principle of being presumed innocent until proven guilty just doesn’t apply here. To get to the correct root cause, it is important to quickly rule out, or find evidence of, network problems. This is where AWS Reachability Analyzer can help network admins, and it also lets anyone verify that the basic requirements for connectivity from point A to B (and back again) are in place.

What is AWS Reachability Analyzer?

AWS Reachability Analyzer is a network diagnostic service that helps you analyze and troubleshoot network connectivity within your VPC environment. It is designed to verify whether there is a path between two points in the network and to identify potential connectivity blockers on that path. Its hop-by-hop analysis points out configuration issues, which can help non-network specialists fix connectivity problems too. It isn’t actually a new service; it has been around since Dec 2020.

Since then it has gained more features, such as the ability to analyze paths across multiple VPCs connected via AWS Transit Gateway and, most recently, the ability to exclude resources that should not be part of the path. The latter got me interested in taking another look at what it is capable of.

Analysis is performed by looking at the configuration of AWS network resources. Reachability Analyzer doesn’t send any packets. This makes it safe to use in any environment, but it can also limit its ability to test paths that travel through some networks. See the “Small print” section later in this post.

Is there a path from A to B?

To demonstrate how Reachability Analyzer works, I wrote a template you can use to create a VPC similar to the diagram below. The diagram shows just a single availability zone, but you can extend the setup with the AvailabilityZones parameter by listing extra AZs as lower-case letters separated by commas. For the initial deployment you should leave BrokenNACL set to false. We will get back to that later.

The template will not create any EC2 instances but uses ENIs as the source or target of the analyzed paths. From the analysis point of view this is the same thing, as EC2 instances and other resources are seen as one or more network interfaces in the VPC. ENIs themselves are free of charge, but you pay for public IPs and for each analysis.

The template also creates three paths you can test to see how the analysis works.

  • Path from Public ENI to Private ENI.
  • Path from Private ENI to Internet (8.8.8.8).
  • Path from Public ENI to Internet (8.8.8.8).

If you deploy the template for multiple AZs, a similar set of three paths is created for each AZ. You will find Reachability Analyzer under AWS Network Manager in the AWS console. There you can select one of the paths and click Analyze path from the Actions menu as shown below.

Let’s first analyze the path from the private subnet to the Internet. Here is what the results look like. Besides telling you that this path works, the result is worth reading through, as it shows step by step everything that needs to be in place to make that happen. A misconfiguration in any of those steps would block the access. And when you expand “Outbound header” you see that the packet’s source address changes to a public address when it goes out through the Internet Gateway, not at the NAT gateway as you might first think.
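If you prefer to run the same analysis from a script rather than the console, here is a minimal sketch. It assumes a boto3-style EC2 client is passed in and that the path already exists (for example, one created by the template). The helper names `analyze_path` and `summarize` are my own and not part of any AWS SDK; the response fields used are the ones I would expect from the EC2 API, so treat them as assumptions to check against the documentation.

```python
import time


def analyze_path(ec2, path_id, poll_seconds=5, max_polls=60):
    """Start an analysis for an existing path and wait until it finishes.

    `ec2` is assumed to behave like boto3.client("ec2").
    """
    started = ec2.start_network_insights_analysis(NetworkInsightsPathId=path_id)
    analysis_id = started["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]

    for _ in range(max_polls):
        described = ec2.describe_network_insights_analyses(
            NetworkInsightsAnalysisIds=[analysis_id]
        )
        analysis = described["NetworkInsightsAnalyses"][0]
        if analysis["Status"] != "running":
            return analysis
        time.sleep(poll_seconds)

    raise TimeoutError(f"analysis {analysis_id} did not finish in time")


def summarize(analysis):
    """Turn an analysis result into a few human-readable lines."""
    lines = [f"path found: {analysis.get('NetworkPathFound')}"]
    # Hop-by-hop components of the forward path, in order.
    for hop in analysis.get("ForwardPathComponents", []):
        component = hop.get("Component", {})
        lines.append(f"{hop.get('SequenceNumber')}: {component.get('Id')}")
    # Explanation codes are present when something blocks the path.
    for explanation in analysis.get("Explanations", []):
        lines.append(f"blocked: {explanation.get('ExplanationCode')}")
    return lines
```

With boto3 installed and credentials configured, you would call `analyze_path(boto3.client("ec2"), path_id)` with the ID of one of the paths from the template and print the lines returned by `summarize`. Keep in mind that each call starts a billable analysis.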

Next, let’s try the path from the public subnet to the Internet. This should be simpler, as there is no NAT gateway in the path and the route goes directly from the ENI to the Internet Gateway. If you look at how the public subnets are configured, you can see there is MapPublicIpOnLaunch: true. This assigns a public IP to things launched into the subnet, so you would expect the route from the instance to the Internet to work. Right?

Type: AWS::EC2::Subnet
Properties:
  VpcId: !Ref VPC
  AvailabilityZone: !Sub ${AWS::Region}${X}
  # Auto-assign a public IPv4 address to instances launched into this subnet
  MapPublicIpOnLaunch: true

But it fails and says there is no public IP associated for egress?! To understand what happened here you have to read the VPC documentation rather carefully, especially the part that says

Therefore, when you launch an instance into a subnet that has this attribute enabled, a public IP address is assigned to the primary network interface that’s created for the instance. A public IP address is mapped to the primary private IP address through network address translation (NAT).

But since we created just standalone ENIs, these are not considered primary interfaces and therefore don’t get public IPs. The NAT that the text above refers to is not your NAT Gateway but the Internet Gateway, which is a NAT device too. I will leave fixing this as an exercise for the reader ;-)

Now you can run the analysis for the last remaining path, from private ENI to public ENI, to verify that your VPC routing and security groups are properly configured. Once you have verified this path is working, go back to the CloudFormation console and change the BrokenNACL parameter from false to true to add a misconfigured NACL to the VPC.

Then re-run the analysis for the private to public path to see how it finds a common misconfiguration where a NACL allows only one-way traffic. Because a NACL is stateless, return packets are not automatically allowed as they would be with security groups.
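To make the stateless behaviour concrete, here is a small toy model of NACL evaluation. It is a deliberate simplification that only looks at ports, and the rule sets are hypothetical examples of mine, not the ones the template creates. The point it illustrates: the request and the reply are evaluated independently, so allowing the service port in one direction says nothing about the reply heading back to the client’s ephemeral port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower rule numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules, port):
    """Return the verdict of the first matching rule, or deny if none match."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit deny at the end of every NACL


def round_trip(inbound, outbound, service_port, client_port):
    """Evaluate the request (inbound) and the reply (outbound) separately."""
    request_ok = nacl_allows(inbound, service_port)
    reply_ok = request_ok and nacl_allows(outbound, client_port)
    return request_ok, reply_ok


# Hypothetical rule sets for a subnet hosting a service on port 443.
inbound = [NaclRule(100, 443, 443, True)]
broken_outbound = []                                  # nothing lets replies out
fixed_outbound = [NaclRule(100, 1024, 65535, True)]   # ephemeral port range

print(round_trip(inbound, broken_outbound, 443, 50000))  # request in, reply dropped
print(round_trip(inbound, fixed_outbound, 443, 50000))   # both directions pass
```

A security group would not need the second rule, because it tracks the connection and lets the reply through automatically.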

Advanced use-cases

Since its introduction, Reachability Analyzer has gained many new skills. It can analyze paths through Transit Gateway and across multiple AWS accounts, as shown in this demo. Another feature that caught my eye is the ability to exclude components that could be in the path. Being able to say that a certain component cannot be part of the path lets you find paths that you didn’t want to have but that were accidentally created through misconfiguration.

In my simple setup, one such case could be testing the path from the private ENI to the Internet while excluding the NAT Gateway in the same AZ. If this analysis fails, you know your routing is correctly configured. If it succeeds, e.g. because you added a route from the private subnet to the Internet Gateway (and attached a public IP to the private ENI), or because traffic goes via a NAT gateway in another AZ, you know there is something wrong. With this feature you can not only test whether you have connectivity between A and B, but also test various compliance scenarios where you want traffic to be blocked.

So it is good to remember that not every failed analysis means something is broken.
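The idea of a “good failure” can be sketched with a toy graph search. This is only an illustration of the reasoning, not how Reachability Analyzer works internally, and the node names and edges below are made up for the example.

```python
from collections import deque


def find_path(edges, source, destination, excluded=frozenset()):
    """Breadth-first search that refuses to traverse excluded nodes."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt not in excluded:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path: the desired outcome for a compliance check


# Hypothetical topology: the private ENI should only leave via nat-gw-a.
intended = {
    "private-eni": ["nat-gw-a"],
    "nat-gw-a": ["igw"],
    "igw": ["internet"],
}
# Same topology with an accidental route to a NAT gateway in another AZ.
misconfigured = {
    "private-eni": ["nat-gw-a", "nat-gw-b"],
    "nat-gw-a": ["igw"],
    "nat-gw-b": ["igw"],
    "igw": ["internet"],
}

print(find_path(intended, "private-eni", "internet", excluded={"nat-gw-a"}))
print(find_path(misconfigured, "private-eni", "internet", excluded={"nat-gw-a"}))
```

With the local NAT gateway excluded, the intended topology yields no path, which is the result you want. The misconfigured one still finds a way out through the other AZ, and that success is the warning sign.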

Small print

Reachability Analyzer supports most common AWS network resources, but not all. You can find the list of supported resources and scenarios in the documentation.

The service is priced per analysis run ($0.10/analysis). Whether this is cheap or expensive depends on your viewpoint. If it saves a couple of hours of debugging time when your production systems are down, it is very cheap. But if you run continuous analysis of a complex multi-VPC network, it can be a very expensive way to feed data to a network operations center dashboard showing that everything is working.
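A quick back-of-the-envelope calculation shows how the two cases differ. The price is the $0.10 per analysis mentioned above. The number of paths and the polling interval are hypothetical values I picked for illustration.

```python
PRICE_CENTS_PER_ANALYSIS = 10  # $0.10 per analysis, kept in cents to avoid float drift


def monthly_cost_usd(paths, interval_minutes, days=30):
    """Cost of re-analyzing every path at a fixed interval for `days` days."""
    runs_per_path = (days * 24 * 60) // interval_minutes
    return paths * runs_per_path * PRICE_CENTS_PER_ANALYSIS / 100


def incident_cost_usd(analyses):
    """Cost of a handful of ad hoc analyses during an incident."""
    return analyses * PRICE_CENTS_PER_ANALYSIS / 100


print(incident_cost_usd(10))       # ten analyses while debugging an outage
print(monthly_cost_usd(50, 5))     # 50 paths re-checked every 5 minutes for a month
```

Ten analyses during an outage come to $1, while 50 paths checked every five minutes add up to 432,000 analyses, or $43,200, over a 30-day month.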

And if you are using Amazon Q Developer to help you with connectivity issues, it is actually using Reachability Analyzer behind the scenes.

Summary

AWS Reachability Analyzer does a good job in analyzing network paths. It all happens by analyzing configuration and not sending any packets. This can be a good thing, separating network from application level issues, but one should understand it doesn’t actually test if your service is functioning.

Pre-configured paths can help non-network specialists check the most common paths, to either rule out the network or quickly get insight into what the problem might be, without support from the network team. Another tool to keep in mind is Q Developer, which can be a more human-friendly interface for debugging network issues using Reachability Analyzer.