HPE Gen9, ESXi 7.0 and NSX-T 3.0: For the love of the homelab!
In my home-lab, I always keep my physical hosts up to date (to the latest) and do versioning in my different nested environments.
Two days ago I upgraded NSX-T to 3.0 and the upgrade went smoothly: edge nodes, hosts and manager all went through the upgrade coordinator without a single error. All I needed to do was add an additional 100GB secondary disk to the manager.
Last night I started planning the ESXi host upgrade. I always check vendor support for OEM servers (in my case HPE), so I went to the Support and Certification Matrix and sadly found no support for Gen9 servers.
What to do? What to do? What to do? The answer is pretty simple: just do it and see what happens. This is the only place where you can safely say “YOU CAN TRY THIS AT HOME %)”.
When I attempted upgrading via HPE’s offline ZIP, it got cranky about a couple of VIBs:
- QLC_bootbank_qfle3f
- HPE_bootbank_scsi-hpdsa
- QLC_bootbank_qedi
- QLC_bootbank_qedf
I didn’t have any devices using the VIBs mentioned, so I removed them: “esxcli software vib remove --vibname qedf --vibname qedi --vibname qfle3f --vibname scsi-hpdsa”.
Then I ran the update “esxcli software vib update -d /<datastore path>/VMware_ESXi_7.0.0_15843807_HPE_700.0.0.10.5.0.108_April2020_depot.zip” and voila, the upgrade was successful and, as expected, required a reboot.
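If you want to repeat the two esxcli commands above on several hosts, a small helper can build them from the VIB labels the installer complained about. This is only a sketch, not what I actually ran: the function names and the depot path in the example are hypothetical, and it adds esxcli's --dry-run flag by default so you can preview the change before committing to it.

```python
import shlex

# VIB labels as reported by the failed upgrade attempt (vendor_bootbank_name).
BLOCKING_VIBS = [
    "QLC_bootbank_qfle3f",
    "HPE_bootbank_scsi-hpdsa",
    "QLC_bootbank_qedi",
    "QLC_bootbank_qedf",
]


def vib_name(label):
    """Return the bare VIB name from a 'vendor_bootbank_name' label."""
    return label.split("_bootbank_", 1)[-1]


def remove_command(labels, dry_run=True):
    """Build a single 'esxcli software vib remove' call covering all labels."""
    args = ["esxcli", "software", "vib", "remove"]
    if dry_run:
        args.append("--dry-run")
    for label in labels:
        args += ["--vibname", vib_name(label)]
    return " ".join(shlex.quote(a) for a in args)


def update_command(depot_path, dry_run=True):
    """Build the 'esxcli software vib update' call for an offline depot ZIP."""
    args = ["esxcli", "software", "vib", "update", "-d", depot_path]
    if dry_run:
        args.append("--dry-run")
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical depot location; substitute your own datastore path.
    depot = "/vmfs/volumes/datastore1/depot.zip"
    print(remove_command(BLOCKING_VIBS))
    print(update_command(depot))
```

The script only prints command strings; paste them into an SSH session on the host, and drop the dry-run flag (dry_run=False) once the preview looks right.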
After the first server rebooted, I checked its status and everything was fine, so I went and did the other two servers in parallel, and both came back online fine.
My mistake was that I didn’t check on NSX-T on the spot. I was so focused on the ESXi hosts coming back online without a purple screen or driver issues that I missed looking at their status from an NSX-T perspective. This is tied to the fact that the management network is separate from the data/applications network, so the hosts looked healthy from the management side.
The symptoms that made me go back to NSX-T were:
- I powered on the first VM and I didn’t have any reachability.
- The vMotion network comes from an NSX-T segment, since the uplinks are 10Gbps and it made sense to have vMotion there. Every attempt to vMotion a VM ended with the error “vMotion fails with the error: A general system error occurred. Invalid fault (1014371)”, which after some research points to an issue with the network.
- The 10Gbps NFS network is also there but access to the datastore was fine.
Before going back to NSX-T, I ran the same test with a VLAN-backed segment rather than a GENEVE-backed segment, and vMotion worked fine there. There is your AHA! moment, where I thought: “So this must be related to an issue with the N-VDS perhaps, so let’s take a look at NSX-T”.
The moment I opened the NSX-T Manager and checked the transport node status, I found that they were all down, yes, all 3 servers. However, when I drilled into the status of each transport node, everything showed as up: “Manager Connectivity, Controller Connectivity, PNIC/Bond Status and Tunnel Status”.
What to do? What to do? What to do? I did some digging in the logs, but I couldn’t get my hands on anything useful, or (most likely) I didn’t pick the right log file or event/error.
Alas, it came down to going non-production [I hope] rogue ;-)…
- I removed the VMKernel interfaces bound to the different segments.
- Dragged the host out of the cluster.
- Removed the NSX-T binaries with “REMOVE NSX” and used the force deletion option.
- Dragged the host back into the cluster and waited until the preparation was complete and the status of the NSX Configuration was “Success”.
- Restored the VMKernel configuration and powered on a virtual machine and got connectivity successfully.
- I went through the same steps for the remaining hosts and results were good.
Lessons learned from this, alhamdulillah that I am now on vSphere 7 and I do not need to buy new hardware %).
Stay safe, stay strong, do what you do best and help out as much as you can =).
Great to know that it works on HPE Gen9.
Where in the HPE VIB depot do you get that ZIP bundle? For instance, if I need the latest 6.7, where do I find it?