VMware Fault Tolerance

srothman

One for the VMware experts out there,

I'm trying to understand how VMware handles fault tolerance in terms of node failure.

Scenario:

I have a 4-node cluster. I rip out one of the nodes. What happens to the virtual machines that were hosted on that node? Obviously, vMotion didn't happen. Do the VMs move to a different host and then start up, the same as in Hyper-V?

What mechanisms exist to allow for fault tolerance? Does this get handled by VMware Fault Tolerance? I assume this is a per-VM setting that gets enabled, the premise being that the VM data is memory-resident on all the nodes at the same time? Or how does it handle memory state? Does it create a second instance of the same VM, i.e. a replica, the same as Hyper-V?
 
Yeah, your VM data should ideally be sitting on shared storage. The ESXi/vSphere hosts are just compute. Usually ESXi is even installed on SD/flash memory on the node, as you need no local disks at all if you have a SAN. The other thing to consider is the licenses in place. You need to have VMware Essentials Plus at a minimum to have vMotion functionality.
 
Haven't worked with VMware in a while, but my understanding is that vMotion is for hardware maintenance and load balancing. For host failures, you set the behavior of each VM should one of the host nodes go down, and then VMware starts them up on a new host. It's as if the VM were a physical machine with the power unplugged: dirty shutdown, no memory state saved.
 
https://pubs.vmware.com/vsphere-4-e...sphere.availability.doc_41/c_useha_works.html

VMware HA provides high availability for virtual machines by pooling them and the hosts they reside on into a cluster. Hosts in the cluster are monitored and in the event of a failure, the virtual machines on a failed host are restarted on alternate hosts.

HA/DR/vMotion is not a clustered, live-migration, memory-aware solution; it's more of a cold-standby, power-up-when-it-fails solution.
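
To make the restart-on-failure behaviour concrete, here is a toy Python model. It is not VMware code and uses no VMware API; the `VM` and `Cluster` classes, the host names, and the "least-loaded host" placement rule are all made up for illustration. It assumes the VM disks live on shared storage, so a surviving host can power the VMs back on, but whatever was in memory is lost.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    memory_state: str = "clean boot"  # stands in for whatever is in RAM


@dataclass
class Cluster:
    # host name -> VMs currently running there (hypothetical toy model)
    hosts: dict = field(default_factory=dict)

    def fail_host(self, failed: str) -> list:
        """Simulate an HA-style restart after an unplanned host failure."""
        orphaned = self.hosts.pop(failed)
        if not self.hosts:
            raise RuntimeError("no surviving hosts to restart VMs on")
        restarted = []
        for vm in orphaned:
            # Assumed placement rule: pick the least-loaded surviving host.
            target = min(self.hosts, key=lambda h: len(self.hosts[h]))
            # The VM is booted again from disk; RAM contents are not preserved.
            vm.memory_state = "clean boot"
            self.hosts[target].append(vm)
            restarted.append((vm.name, target))
        return restarted


cluster = Cluster(hosts={
    "esx1": [VM("web", "in-flight sessions"), VM("db", "warm cache")],
    "esx2": [VM("mail", "running")],
    "esx3": [],
    "esx4": [],
})
placements = cluster.fail_host("esx1")
for name, host in placements:
    print(f"{name} restarted on {host}")
```

The point of the sketch is only that the VMs come back on another host after a reboot, much like a physical box that had its power pulled.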
 
OK, so this is more of an academic question than anything else. I work extensively with Hyper-V and wanted to understand how VMware might handle it, as there isn't such a feature natively available in Hyper-V.

Why didn't vMotion happen?
Which version of VMware are you using?
Are you using a SAN?

We are using a SAN, but I am not referring to vMotion, whose equivalent in Hyper-V is live migration. Live migration/vMotion is a controlled move of VMs to another host. I am talking about an unexpected failure of a host.

HA/DR/vMotion is not a clustered, live-migration, memory-aware solution; it's more of a cold-standby, power-up-when-it-fails solution.

Exactly this. From what I gather, VMware's Fault Tolerance feature is still very limited in functionality and scalability, and is not natively available as part of ESXi, but is rather an additional feature that gets licensed with vSphere.
 
https://www.vmware.com/files/pdf/techpaper/VMware-vSphere6-FT-arch-perf.pdf

vSphere FT enables a virtual machine to survive a physical server failure by creating an exact replica virtual machine on another host that can take over at the time of failure. During failover, the transition of a vSphere FT virtual machine from one physical server to another is similar to a migration using vSphere vMotion®: it is completely seamless. That means there is zero downtime, zero data loss, zero connection loss, continuous service availability, and complete transaction integrity.
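
As a rough contrast with the restart approach, the replica idea in that quote can be sketched as a toy Python model. Again, this is not VMware code: the `FTPair` class, its methods, and the host names are hypothetical, and mirroring a list of events is only a stand-in for however the real product keeps the replica in sync.

```python
from typing import List, Optional


class FTPair:
    """Toy model of a primary VM with an exact replica on another host."""

    def __init__(self, primary_host: str, secondary_host: str) -> None:
        self.primary_host: str = primary_host
        self.secondary_host: Optional[str] = secondary_host
        self.primary_state: List[str] = []    # e.g. committed transactions
        self.secondary_state: List[str] = []  # kept identical to the primary

    def apply(self, event: str) -> None:
        # Every change on the primary is mirrored to the replica.
        self.primary_state.append(event)
        self.secondary_state.append(event)

    def fail_primary(self) -> str:
        """The replica takes over with the state it already holds."""
        if self.secondary_host is None:
            raise RuntimeError("no replica available to take over")
        self.primary_host = self.secondary_host
        self.primary_state = self.secondary_state
        # This toy model leaves the VM unprotected after failover.
        self.secondary_host = None
        self.secondary_state = []
        return self.primary_host


pair = FTPair("esx1", "esx2")
pair.apply("txn-1")
pair.apply("txn-2")
new_host = pair.fail_primary()
print(new_host, pair.primary_state)
```

Unlike the restart model, nothing is rebooted here: the surviving copy carries on with the state it already had.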
 