When deploying NSX there are certain points during deployment that lend themselves to spending a couple of extra minutes double-checking configuration and connectivity before moving on to the next deployment step.
A couple of quick minutes here and there can save hours of troubleshooting later on.
Overview
What to Test and When
Typically during a deployment, there are three stages where a quick ping test comes in handy prior to moving on with the NSX deployment. These are:
- Host TEP Connectivity - Typically just after the ESXi hosts have been prepared for NSX - Confirm host TEPs can ping the gateway on the physical network with a recommended MTU of >=1700 bytes.
- Edge TEP Connectivity - Typically just after the edge nodes have been deployed - Confirm edge TEPs can ping the gateway on the physical network with a recommended MTU of >=1700 bytes.
- Tier-0 Uplink Connectivity - Once the Tier-0 has been configured with an External (uplink) interface(s). These interfaces carry North/South traffic in and out of the NSX environment. The recommended MTU for these is 1500 bytes.
All three networks need to be routable.
NSX Gateway Components
Before we get into the testing, we need to appreciate the makeup of an NSX gateway:
A gateway can be either a Tier-0 or a Tier-1 gateway, depending on the design requirements:
- A Tier-0 gateway provides north-south connectivity. In a single-tier topology, the Tier-0 gateway also provides east-west connectivity.
- A Tier-1 gateway provides east-west connectivity.
A Tier-1 and a Tier-0 gateway can have Distributed Router (DR) and Service Router (SR) components.
A Distributed Router (DR) has the following features:
- Provides basic packet-forwarding functionalities
- Spans all transport nodes (host and edge transport nodes)
- Runs as a kernel module in the ESXi hypervisor
- Provides distributed routing functionality
- Provides first-hop routing for workloads
A Service Router (SR) has the following features:
- Provides north-south routing
- Provides centralized services such as NAT and load balancing
- Required for the uplinks to external networks
- Deployed in edge transport nodes
A Distributed Router is always created when creating a gateway.
A Service Router is automatically created on the edge node when you configure the gateway with an edge cluster.
Maximum Transmission Unit (MTU)
OK, with DRs and SRs understood, we need to briefly talk about MTU. Given that our NSX Tunnel Endpoints (TEPs) encapsulate our standard 1500 byte MTU packets using the Geneve protocol, we need to ensure that our overlay networks are able to handle packets larger than 1500 bytes without fragmenting the packet.
The VMware NSX Reference Design Guide v3.2 (pdf), page 236 onwards, covers our situation (emphasis mine):
A minimum required MTU is 1600. However, MTU of 1700 bytes is recommended to address the whole possibility of a variety of functions and future proof the environment for an expanding Geneve header.
So we also need to test MTU with our ping testing.
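To see where those numbers come from, here is a minimal Python sketch of the Geneve overhead arithmetic. It assumes an IPv4 underlay and an untagged inner Ethernet frame; the function names are mine and purely illustrative. With no Geneve options at all, a 1500 byte inner packet already needs 1550 bytes on the underlay, and whatever is left above that is headroom for Geneve options.

```python
# Header sizes in bytes. Assumptions: IPv4 outer header without IP options,
# and an untagged inner Ethernet frame (an inner VLAN tag would add 4 bytes).
OUTER_IPV4 = 20
OUTER_UDP = 8
GENEVE_BASE = 8      # fixed Geneve header; Geneve options are extra
INNER_ETHERNET = 14


def required_underlay_mtu(inner_mtu: int, geneve_options: int = 0) -> int:
    """Smallest underlay IP MTU that carries the inner packet unfragmented."""
    return (inner_mtu + INNER_ETHERNET + GENEVE_BASE
            + geneve_options + OUTER_UDP + OUTER_IPV4)


def option_headroom(underlay_mtu: int, inner_mtu: int = 1500) -> int:
    """Bytes left over for Geneve options at a given underlay MTU."""
    return underlay_mtu - required_underlay_mtu(inner_mtu)


if __name__ == "__main__":
    print("Minimum underlay MTU for 1500 byte packets:", required_underlay_mtu(1500))
    for mtu in (1600, 1700):
        print(f"Underlay MTU {mtu}: {option_headroom(mtu)} bytes of option headroom")
```

So 1600 leaves 50 bytes for options and 1700 leaves 150, which is the "future proofing" the design guide is talking about.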
Host TEP Connectivity
From the GUI, these are the interfaces we are interested in:
OK, let’s SSH to our first ESXi host and list its VMkernel network interfaces:
From the below we can see that our host has two NSX VMkernel interfaces (denoted by them being on the vxlan NetStack, last column):
They also have an MTU configured of 1700 (eighth column).
Let’s construct our command to ping the host TEP physical network gateway and check for an MTU of 1700. (See VMware KB 1003728 for syntax).
The command looks like this:
Nice. That is working perfectly.
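For anyone wanting to adapt the test to their own hosts, here is a small Python sketch that builds a vmkping command line for a given MTU. The key detail is that the ping size is the ICMP payload, not the MTU, so to test 1700 we subtract the 20 byte IPv4 header and the 8 byte ICMP header and send 1672 bytes with the don't-fragment bit set. The interface name `vmk10` and the gateway `192.0.2.1` are placeholders of mine, not values from this environment, and the default count of 3 is just an illustrative choice.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def icmp_payload_for_mtu(mtu: int) -> int:
    """ICMP payload size that produces an IP packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def build_vmkping(gateway: str, interface: str, mtu: int = 1700,
                  count: int = 3, netstack: str = "vxlan") -> str:
    """Build a vmkping command string for an NSX TEP VMkernel interface.

    -I selects the VMkernel interface, -d sets the don't-fragment bit,
    -s is the ICMP payload size and -c is the number of pings.
    """
    size = icmp_payload_for_mtu(mtu)
    return (f"vmkping ++netstack={netstack} -I {interface} "
            f"-d -s {size} -c {count} {gateway}")


if __name__ == "__main__":
    # Hypothetical interface and gateway; substitute your own values.
    print(build_vmkping("192.0.2.1", "vmk10"))
```

If the larger ping fails but a plain one succeeds, something in the path (vDS, physical switch port, or the gateway interface itself) is most likely still set to a smaller MTU.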
Edge TEP Connectivity
From the GUI, these are the interfaces we are interested in:
Given that our edges can span tier-0 and tier-1 gateways and that each of those gateways can have DR and SR components each doing their own job, we need to ensure that not only are we using the correct gateway but the correct router too.
Let’s SSH to our first edge and take a look using the following command:
As you can see, I have SRs and DRs for my Site-A Tier-0 and Tier-1 gateways configured.
When testing Edge TEP connectivity, we are interested in the TUNNEL router as that deals with our TEP traffic.
Using the TUNNEL router UUID, let’s take a look at its forwarding table:
Yep we can see that the tunnel router is using our configured Edge TEP gateway of 192.168.13.1.
Let’s construct our command to ping the edge TEP physical network gateway and check for an MTU of 1700. (See NSX-T Command-Line Interface Reference for syntax).
Which when run:
Nice. That is also working perfectly.
Tier-0 Uplink Connectivity
Finally, let’s check northbound connectivity out of the NSX environment to the gateway out on the physical network.
From the GUI (Tier-0 > Interfaces > External and Service Interfaces), these are the interfaces we are interested in:
Again, we can check that from our edges. Let’s remind ourselves of the logical routers present on our edge:
As we are talking North/South traffic we need to be using the Service Router context of our Tier-0 router. In the case above, that’s VRF ID 1.
A reminder of the command, remember we are not so much worried about the larger MTU value here, 1500 is fine. (Again, see NSX-T Command-Line Interface Reference for syntax):
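As a rough sketch of the shape of that test, here is a Python helper that produces the two edge CLI lines: enter the VRF context, then ping with the don't-fragment bit set. Treat the keyword syntax (`size`, `dfbit enable`, `repeat`) as my assumption of the edge CLI ping options and verify it against the NSX-T Command-Line Interface Reference for your version. The gateway `192.0.2.1` is a placeholder, and the VRF ID needs to be whatever your own edge reports for the Tier-0 Service Router.

```python
IP_AND_ICMP_HEADERS = 28  # 20 byte IPv4 header + 8 byte ICMP header


def edge_ping_commands(vrf_id: int, gateway: str, mtu: int = 1500,
                       repeat: int = 5) -> list[str]:
    """Return the edge CLI lines for a don't-fragment ping from a VRF context.

    Assumption: the edge CLI ping accepts `size`, `dfbit enable` and `repeat`
    keywords; check the CLI reference for your NSX version before relying on it.
    """
    size = mtu - IP_AND_ICMP_HEADERS
    return [
        f"vrf {vrf_id}",
        f"ping {gateway} size {size} dfbit enable repeat {repeat}",
    ]


if __name__ == "__main__":
    # Hypothetical gateway address; VRF ID 1 matches the Tier-0 SR in this post.
    for line in edge_ping_commands(1, "192.0.2.1"):
        print(line)
```

The same helper covers the Edge TEP test too: pass the TUNNEL router's VRF ID and `mtu=1700` to get a 1672 byte ping.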
Conclusion and Wrap Up
As Sean Connery once said:
Yes, I’m old. Referencing a 33-year-old movie at this point (2023) is …yeah… For those that have missed out or simply just forgotten: One Ping Only - YouTube.
Not quite as badass as Sean Connery (RIP) or Sam Neill, using five pings instead of one ping only, but hey I’m OK with that!
-Chris