
Chris Hall

Principal Technical Consultant


Sawing Off the Branch

Overview

Question: When is the worst time to find that the NSX Distributed Firewall System Excluded VMs list is empty?

Hint:

Setting Default Rules

Yep, that’s it: when running the NSX Manager VM(s) on a cluster prepared for NSX, and then setting the default NSX distributed firewall rules to Drop or Reject.

And that’s when all hell doesn’t break loose. Quite the opposite in fact.

Oh F\/£k !!!!!1!!

NSX Manager Unknown

Yeah… a whole lot of nothing going on.

No ping!

Yep. My NSX Manager is completely firewalled from the rest of my network and for all intents and purposes offline.

Choice words were spoken… Very choice words indeed…

Background

Whilst writing an upcoming post on deploying NSX using the vCenter plug-in, I had my vSphere 8.0 and NSX 4.0.1.1 environment all in and spinning nicely. The final thing to talk about was the bottom three rules of the Application category (that is, the last three rules to be evaluated by the distributed firewall), which by default are set to allow any traffic from any source to any destination.

Yep, just set those to drop (or reject) and publish.

Aaaand that is where we find ourselves: a firewalled NSX Manager that we cannot reach over the network to turn off said firewall.
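To make the rule ordering concrete, here is a deliberately simplified Python model of how the distributed firewall treats traffic arriving at one VM's vNIC: rules are checked top to bottom and the first match wins, so the any/any default rule at the bottom catches everything the rules above it don't. This is a sketch, not how NSX is implemented; the `Rule` class, the VM name `nsx-manager-01` and the service labels are hypothetical, and real rules match on groups, services and context profiles rather than plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified DFW rule; None means 'any'."""
    name: str
    action: str                    # "ALLOW", "DROP" or "REJECT"
    dst_vm: Optional[str] = None   # destination VM name
    service: Optional[str] = None  # e.g. "ICMP" or "HTTPS"

    def matches(self, dst_vm: str, service: str) -> bool:
        return self.dst_vm in (None, dst_vm) and self.service in (None, service)


def evaluate(rules: Sequence[Rule], dst_vm: str, service: str,
             excluded: FrozenSet[str] = frozenset()) -> str:
    """Action applied by the filter on dst_vm's vNIC to inbound traffic."""
    if dst_vm in excluded:
        # Excluded VMs have no DFW filter, so nothing is enforced on them.
        return "ALLOW"
    for rule in rules:  # top to bottom, first match wins
        if rule.matches(dst_vm, service):
            return rule.action
    return "ALLOW"  # only reached if there is no any/any default rule


def default_rules(action: str) -> list:
    # The default rule at the bottom matches any source, destination and service.
    return [Rule("Default Layer3 Rule", action)]


# Default rule left at ALLOW: the NSX Manager is reachable.
print(evaluate(default_rules("ALLOW"), "nsx-manager-01", "HTTPS"))  # ALLOW
# Default rule flipped to DROP: the NSX Manager is cut off too.
print(evaluate(default_rules("DROP"), "nsx-manager-01", "HTTPS"))   # DROP
```

The `excluded` parameter stands in for the exclusion list: a VM on it never reaches the rule loop, whatever the default rule says.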

Recovery

So how do we recover from this situation?

How can we disable the NSX distributed firewall without using the NSX web GUI?

From the NSX command line interface (CLI) perhaps?

What is the “magic” command to allow access again?

After much googling and CLI bashing, I found the following command which, when run on the ESXi host running my NSX Manager VM, restores access to NSX Manager (this is a lab, so I only have one Manager; production environments should have three):

vsipioctl clearallfilters -Override

When run on the ESXi host in question, it looks like this (I suggest reading the discussion points below BEFORE running it):

vsipioctl clearallfilters -Override

I was then able to access my NSX Manager and change my default rules back.

For good measure, I also rebooted my ESXi host after recovery.

Recovery - Discussion Points

Some points to consider:

  1. Before running the command, migrate (vMotion) all VMs other than the affected NSX Manager VM(s) away from the ESXi host. The command unilaterally removes ALL firewall rules from ALL VMs on the ESXi host it is run on.

  2. Searching the VMware Knowledge Base for the vsipioctl clearallfilters command returns just one hit: vNICs are disconnected after NSX is uninstalled from ESXi (74556). At the time of writing, the related products section of this KB article lists only NSX for vSphere (aka NSX-v), not NSX-T, or NSX as it is now known.

  3. After rebooting the ESXi host, you may find that firewall rules are still not being applied to the VMs running on it. To resolve this, disconnect each affected VM's network adapter(s) and save, then reconnect and save again:

Network Adapter

Temporarily moving VMs to a different VDS port group and back may help too. Anything to get the ESXi host to rebuild its dvfilter table.
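Discussion point 1 boils down to a simple pre-flight check. The Python sketch below assumes you have already pulled the list of VM names on the host (for example from vCenter); the function name and VM names are hypothetical. It returns the VMs to vMotion away before running `vsipioctl clearallfilters -Override`, and refuses if the NSX Manager isn't actually on that host.

```python
from typing import Iterable, List


def vms_to_migrate(host_vms: Iterable[str], nsx_manager_vms: Iterable[str]) -> List[str]:
    """Return the VMs to move off a host before clearing its DFW filters.

    `vsipioctl clearallfilters -Override` strips the filters from EVERY VM on
    the host it runs on, so only the NSX Manager VM(s) should remain there.
    """
    host_vms = list(host_vms)
    managers = set(nsx_manager_vms)
    if not managers.intersection(host_vms):
        raise ValueError("no NSX Manager VM on this host; run the command on the host that has one")
    return sorted(vm for vm in host_vms if vm not in managers)


print(vms_to_migrate(["nsx-manager-01", "web01", "db01"], ["nsx-manager-01"]))
# ['db01', 'web01']
```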

Making Sure This Does Not Happen Again

How can we make sure that this doesn’t happen again?

By making use of the user-configured Distributed Firewall Exclusion List:

Select Distributed Firewall Exclusion List

Whilst we are here, a quick look at the System Excluded VMs list confirms what we suspected (and experienced): there is nothing in it.

System Excluded VMs List

Back in User Excluded Groups, let’s add, name and save our DFW-Excluded group:

Save Exclusion List
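If you would rather script this than click through it, the NSX Policy API exposes the user exclusion list as a list of group paths. The sketch below assumes the endpoint `/policy/api/v1/infra/settings/firewall/security/exclude-list` with a `members` array of paths of the form `/infra/domains/default/groups/<id>`, which is my reading of the Policy API reference; check it against the API docs for your NSX version. It also assumes the group ID is `DFW-Excluded`, which may differ from the display name for groups created in the UI. It only builds the request body: it starts from the body returned by a GET so existing members (and `_revision`) are kept, and appends the group if it isn't already there.

```python
from typing import Dict, List

# Assumed Policy API endpoint for the user exclusion list (verify for your version).
EXCLUDE_LIST_PATH = "/policy/api/v1/infra/settings/firewall/security/exclude-list"


def group_path(group_id: str, domain: str = "default") -> str:
    """Policy path of a group, as referenced from other objects."""
    return f"/infra/domains/{domain}/groups/{group_id}"


def add_to_exclude_list(current: Dict, group_id: str) -> Dict:
    """Return a copy of the exclude-list body with the group appended.

    `current` is the body read from EXCLUDE_LIST_PATH; existing members and
    any `_revision` field are carried over so nothing is silently removed.
    """
    members: List[str] = list(current.get("members", []))
    path = group_path(group_id)
    if path not in members:
        members.append(path)
    updated = dict(current)
    updated["members"] = members
    return updated


print(add_to_exclude_list({"members": []}, "DFW-Excluded"))
# {'members': ['/infra/domains/default/groups/DFW-Excluded']}
```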

Navigate to Inventory > Groups and edit the newly created DFW-Excluded group. Let’s set some members:

Edit DFW-Excluded Group

I’ll add my NSX Manager by VM name:

Add NSX Manager VM to DFW-Excluded

As my vCenter Server is external to the NSX-prepared environment (it runs on another ESXi host), I’ll add it by IP address as well, for good measure:

Add vCenter IP to DFW-Excluded
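The same group could also be described as a Policy API Group body. This sketch builds one with a `Condition` on VM name for the NSX Manager(s), OR-ed with an `IPAddressExpression` for vCenter. The field names follow my understanding of the Policy API Group schema, and the VM name and IP address (from the 192.0.2.0/24 documentation range) are placeholders. NSX places limits on how many expressions a group can hold and how types can be mixed, so treat this as a starting point and check the API reference.

```python
from typing import Dict, List

OR = {"resource_type": "ConjunctionOperator", "conjunction_operator": "OR"}


def dfw_excluded_group(manager_vm_names: List[str], extra_ips: List[str],
                       display_name: str = "DFW-Excluded") -> Dict:
    """Group body: NSX Manager VMs matched by name, OR the listed IP addresses.

    Intended for PATCH /policy/api/v1/infra/domains/default/groups/<group-id>.
    """
    expression: List[Dict] = []
    for name in manager_vm_names:
        if expression:
            expression.append(dict(OR))  # adjacent expressions need a conjunction
        expression.append({
            "resource_type": "Condition",
            "member_type": "VirtualMachine",
            "key": "Name",
            "operator": "EQUALS",
            "value": name,
        })
    if extra_ips:
        if expression:
            expression.append(dict(OR))
        expression.append({"resource_type": "IPAddressExpression",
                           "ip_addresses": list(extra_ips)})
    return {"display_name": display_name, "expression": expression}


group = dfw_excluded_group(["nsx-manager-01"], ["192.0.2.20"])
print([e["resource_type"] for e in group["expression"]])
# ['Condition', 'ConjunctionOperator', 'IPAddressExpression']
```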

Save (twice) and let’s look at the members of our DFW-Excluded group. First, VMs:

DFW-Excluded Member VMs

Looks good. Checking IP addresses, we can see the IPs of both our vCenter and our NSX Manager listed (remember, production environments should have three NSX Managers deployed; this is just a lab):

DFW-Excluded Member IPs
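This member check can be scripted too. As far as I can tell, the Policy API returns a group's effective IP addresses from `.../groups/<group-id>/members/ip-addresses` with the addresses in a `results` array; assuming that shape, this sketch reports which required addresses are missing. The function names and addresses are hypothetical, and entries that are CIDRs or ranges rather than single IPs are not handled.

```python
from typing import Dict, Iterable, Set


def members_ip_path(group_id: str, domain: str = "default") -> str:
    """Assumed Policy API path for a group's effective IP members."""
    return f"/policy/api/v1/infra/domains/{domain}/groups/{group_id}/members/ip-addresses"


def missing_ips(response: Dict, required: Iterable[str]) -> Set[str]:
    """Required IPs that do not appear in the group's effective membership.

    Exact string comparison only: CIDR or range entries are not expanded.
    """
    present = set(response.get("results", []))
    return set(required) - present


resp = {"results": ["192.0.2.10", "192.0.2.20"], "result_count": 2}
print(missing_ips(resp, ["192.0.2.10", "192.0.2.20"]))  # set()
```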

Making Sure This Does Not Happen Again - Testing

Let’s pop in a distributed firewall rule to drop ping from any source to any destination:

ICMP any/any Drop
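For reference, a test rule like this could be expressed as a Policy API Rule body. In this sketch the security policy ID `lab-tests`, the rule ID `icmp-any-drop` and the predefined service path `/infra/services/ICMP-ALL` are all assumptions (check the service list in your environment); `ANY` in the group and scope fields means any. The policy holding the rule would need to sit in the Application category, above the default rules.

```python
from typing import Dict


def rule_path(policy_id: str, rule_id: str, domain: str = "default") -> str:
    """Policy API path for a rule inside a security policy."""
    return (f"/policy/api/v1/infra/domains/{domain}/security-policies/"
            f"{policy_id}/rules/{rule_id}")


def icmp_drop_rule(sequence_number: int = 10) -> Dict:
    """Drop ICMP from any source to any destination (test rule only)."""
    return {
        "display_name": "ICMP any/any Drop",
        "source_groups": ["ANY"],
        "destination_groups": ["ANY"],
        "services": ["/infra/services/ICMP-ALL"],  # assumed predefined service path
        "scope": ["ANY"],
        "action": "DROP",
        "sequence_number": sequence_number,
    }


# Hypothetical IDs for a lab policy and rule.
print(rule_path("lab-tests", "icmp-any-drop"))
# /policy/api/v1/infra/domains/default/security-policies/lab-tests/rules/icmp-any-drop
```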

Yep, we can still ping our NSX Manager:

ICMP Test

Finally, let’s set the default rules to Drop again, and publish:

Set Default Rules to Drop Again
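Given how this post started, a small guard before flipping a default rule seems worthwhile. This sketch takes the current default rule body (read via GET), the current exclusion list body and the path of the management group, and refuses to produce a DROP or REJECT version of the rule unless that group is already on the exclusion list. The path ending in `default-layer3-section/rules/default-layer3-rule` is my assumption for the ID of the any/any default rule; the other default rules would need the same treatment, and their IDs are not assumed here.

```python
from typing import Dict

# Assumed path of the any/any default rule (verify the IDs in your environment).
DEFAULT_RULE_PATH = ("/policy/api/v1/infra/domains/default/security-policies/"
                     "default-layer3-section/rules/default-layer3-rule")
VALID_ACTIONS = {"ALLOW", "DROP", "REJECT"}


def with_default_action(current_rule: Dict, action: str,
                        exclude_list: Dict, mgmt_group_path: str) -> Dict:
    """Return a copy of the default rule body with a new action.

    Refuses to produce DROP/REJECT unless the management group (NSX Managers,
    vCenter) is already on the DFW exclusion list.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action != "ALLOW" and mgmt_group_path not in exclude_list.get("members", []):
        raise RuntimeError(f"{mgmt_group_path} is not on the exclusion list; "
                           "refusing to lock out NSX Manager")
    updated = dict(current_rule)  # keep the other fields, including _revision
    updated["action"] = action
    return updated
```

Sending the result is left to whatever HTTP client you already use; the point is that the check happens before the change goes anywhere near NSX Manager.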

A refresh of the browser to check that the vCenter NSX plug-in reloads…

All Good

Yep, we are all good! Phew!

Conclusion and Wrap Up

Well, a post that I did not think I would need to write. That said, it is a good one to have in the back pocket should I (or you) need to stop the NSX distributed firewall in a hurry in the future.

According to the Manage a Firewall Exclusion List section of the NSX documentation on the VMware website:

NSX has system excluded virtual machines, and user excluded groups. NSX Manager and NSX Edge node virtual machines are automatically added to the read-only the System Excluded VMs list.

Not in an NSX 4.0.1.1 vSphere 8.0 plug-in environment they are not!

Finally, as proof perhaps of a small world, take a look at the third command listed towards the bottom of my earlier post, NSX-T Nested ESXi Host Preparation Failed or Timed Out.

It would seem that I had the answer all along, yet I didn’t know it.

…story of my life… :wink:

-Chris