VMware VSA: Export VMs from broken appliances

The title sounds disastrous, and it is: I recently faced this issue at one of our customers, where one of the VSA ESXi nodes failed due to a hardware problem and the second VSA node, instead of exporting the replica datastore, went into maintenance mode.

Even after we brought the faulty node back up, both VSA appliances remained in maintenance mode. Whenever we tried to take them out of maintenance mode (through both the GUI and the CLI), the appliances would reboot around five times and then put themselves back into maintenance mode.

After calling VMware support, it turned out that the database on both VSA appliances was corrupted, so the logical question was whether there is a way to simply export the NFS datastores so we could retrieve the virtual machines. After being passed between a couple of engineers, someone from the escalation team finally gave me what I wanted, and I thought I would share it here because it WILL save you a lot of time if you ever get stuck in the same situation.

Let's do this (a consolidated, commented version of all the commands follows the list):

  1. First, log on to any of the ESXi hosts and power off one of the VSA appliances (only one appliance should remain powered on).
  2. Open the console of the remaining VSA appliance and log on using svaadmin:svapass.
  3. Switch to the super user: sudo su -
  4. Type: vgchange -ay (you should get something like [2 logical volume(s) in volume group "SVAVolGroup00" now active]).
  5. Type: ls /dev/SVAVolGroup00/ (you should get something like [V0000000D0000001  V0000001D0000000]; these are the split-datastore volumes).
  6. MDADM[1]: mdadm is the Linux utility for managing software RAID arrays. Examine both volumes; the output will include MD_LEVEL, MD_DEVICES, MD_NAME, MD_UUID, MD_UPDATE_TIME, MD_DEV_UUID and MD_EVENTS:
    1. mdadm --examine --export /dev/SVAVolGroup00/V0000000D0000001
    2. mdadm --examine --export /dev/SVAVolGroup00/V0000001D0000000
  7. Now we need to stop the firewall:
    1. /etc/init.d/SuSEfirewall2_setup stop
    2. /etc/init.d/SuSEfirewall2_init stop
  8. MDADM[2]: Assemble the arrays; each will start degraded, with a result like [mdadm: /dev/md0 has been started with 1 drive (out of 2)]:
    1. mdadm --assemble --run --force /dev/md0 /dev/SVAVolGroup00/V0000000D0000001
    2. mdadm --assemble --run --force /dev/md1 /dev/SVAVolGroup00/V0000001D0000000
  9.  Start the NFS server service: service nfsserver start (output will be something like [Starting kernel based NFS server: idmapd mountd statd nfsd sm-notify done]).
  10. Mount the volumes and export them over NFS:
    1. mkdir /exports/V0000000D0000000
    2. mkdir /exports/V0000001D0000001
    3. mount -rw -o barrier=0,data=ordered,nodelalloc,errors=panic /dev/md0 /exports/V0000000D0000000
    4. mount -rw -o barrier=0,data=ordered,nodelalloc,errors=panic /dev/md1 /exports/V0000001D0000001
    5. exportfs -o rw,no_root_squash,sync,no_all_squash,insecure :/exports/V0000000D0000000
    6. exportfs -o rw,no_root_squash,sync,no_all_squash,insecure :/exports/V0000001D0000001
    7. After running the exportfs commands you will get a warning like [exportfs: No host name given with /exports/V0000000D0000000 …etc… to avoid warning]; this is expected and means the volumes were exported successfully.
  11. Verify volumes before adding them to the ESXi hosts:
    1. Type: ls -als /exports/V0000000D0000000 and ls -als /exports/V0000001D0000001; you should see the virtual machines residing in them.
  12. Finally, go to the host's Configuration -> Storage -> Add Storage and add the new NFS datastores /exports/V0000000D0000000 and /exports/V0000001D0000001 using the IP address of the VSA appliance.
  13. Make sure you move the virtual machines off the NFS datastores and keep them safe, because you will need to delete the VSA cluster and build a new one from scratch.
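
For convenience, here is the whole procedure again as one commented sequence. This is just a sketch of the commands above: the volume names, md device numbers and export paths are the ones from this example, so substitute whatever vgchange and ls report in your environment.

    sudo su -                                  # become root on the running VSA appliance
    vgchange -ay                               # activate the SVAVolGroup00 logical volumes
    ls /dev/SVAVolGroup00/                     # note the two split-datastore volume names
    mdadm --examine --export /dev/SVAVolGroup00/V0000000D0000001   # confirm the RAID metadata is readable
    mdadm --examine --export /dev/SVAVolGroup00/V0000001D0000000
    /etc/init.d/SuSEfirewall2_setup stop       # stop the firewall so the ESXi hosts can reach NFS
    /etc/init.d/SuSEfirewall2_init stop
    mdadm --assemble --run --force /dev/md0 /dev/SVAVolGroup00/V0000000D0000001   # start each array degraded
    mdadm --assemble --run --force /dev/md1 /dev/SVAVolGroup00/V0000001D0000000
    service nfsserver start                    # start the kernel NFS server
    mkdir /exports/V0000000D0000000 /exports/V0000001D0000001
    mount -rw -o barrier=0,data=ordered,nodelalloc,errors=panic /dev/md0 /exports/V0000000D0000000
    mount -rw -o barrier=0,data=ordered,nodelalloc,errors=panic /dev/md1 /exports/V0000001D0000001
    exportfs -o rw,no_root_squash,sync,no_all_squash,insecure :/exports/V0000000D0000000
    exportfs -o rw,no_root_squash,sync,no_all_squash,insecure :/exports/V0000001D0000001
    ls -als /exports/V0000000D0000000 /exports/V0000001D0000001    # the VM folders should be visible here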

Hopefully this sends you home early to sleep rather than keeping you up all night biting your fingernails and drinking caffeine ;-).

(Abdullah)^2



12 Responses

  1. Dennis says:

    Hi,

    you saved my butt!

    Thank you!

    Dennis

  2. Phuoc says:

    Thanks for the help, I hope to make friends with you.

  3. DaGuru says:

    This is excellent; so much digging without any simple answer except yours. If you could fix the few typos it would help others who are following the instructions to the letter (as I was). I don't want this to come across as a complaint, as I am super thrilled by the solution you provided, but I don't use Linux often and it took two rounds through the instructions to get it right. Things like -- instead of - in mdadm --examine --export, and also mdadm --assemble --run --force /dev/md0 with the next line being mdadm --assemble --run --force /dev/md1, not 0.
    Thank You

    • doOdzZZ says:

      Glad I could be of help; I didn't notice those dashes were corrupted. Most probably it's the WP text editor playing smart. I will fix them this evening.

      Thank you :-).

  4. Jan says:

    Hello,

    guys, thanks for the info, but I'm stuck at the mdadm --assemble --run --force /dev/md0 /dev/SVAVolGroup00/…… command.

    mdadm: Cannot open device /dev/SVAVolGroup00/V…………: Device or resource busy
    mdadm: /dev/SVAVolgroup00/V……. has no superblock - assembly aborted

    Can't someone help with this error?

  5. SF says:

    Don't forget to stop the VSA Cluster Service first. ;-)

    • doOdzZZ says:

      It doesn't actually matter, because at that point the VSA won't respond to any of the cluster service commands, and you will always need to have one of the VSA appliances shut down.

  6. Aaron says:

    When I run the mdadm --assemble --run --force /dev/md0 /dev/SVAVolGroup00/V0000000D0000001 command I get the "device or resource busy" result. Is there a workaround?

    • doOdzZZ says:

      Hello,

      The second VSA must be shut down; this might also be the result of the other half of the datastore already being mounted. Can you confirm whether either the VSA-0 or VSA-1 datastore is mounted and accessible?
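
      A quick way to check (just a sketch; the md device and export path are the ones used in the post and may differ on your appliance):

      cat /proc/mdstat        # shows which md arrays are already assembled
      mount | grep /exports   # shows whether either half is already mounted
      mdadm --stop /dev/md0   # if a stale, unused array is holding the volume, stop it before re-assembling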
