Hyper-V: alarm on the number of unmerged files, or a way to limit such files
One of our customers has had cases where all the VMs hosted on a Hyper-V cluster hung or stopped responding, causing a major outage. This happened every three months. To resolve it, we had to reboot all the cluster nodes. We investigated the root cause and found the following:
Once a VM backup completes, the .avhd file should be merged into the .vhd. On many VMs this merge was not happening because the backup itself was failing: it kept trying to access the .vhd/.vhdx file but could not complete because of interruptions or disconnections. As a result, many large snapshot files were created and left undeleted on the cluster shared storage. This eventually degraded server and cluster performance.
Just a suggestion to Microsoft: when such snapshots are left behind because of a backup failure, it would be great to have an alarm on their number, or a way to limit such unmerged files. That would help avoid major outages.
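
Until something like this exists in the product, the check can be approximated from outside Hyper-V. The following is only a rough sketch in Python, not a Microsoft tool or API: it walks a folder tree, counts the .avhd/.avhdx files it finds, and raises a flag when their number or total size passes a limit. The function names and the default limits (10 files, 100 GiB) are made up for illustration. It also cannot tell a leftover backup snapshot from a checkpoint someone is keeping on purpose, so the count is an upper bound.

```python
import os

# Differencing-disk extensions that stay on disk until a merge happens.
UNMERGED_EXTENSIONS = (".avhd", ".avhdx")


def find_unmerged_disks(root):
    """Return a list of (path, size_in_bytes) for .avhd/.avhdx files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(UNMERGED_EXTENSIONS):
                full_path = os.path.join(dirpath, name)
                try:
                    found.append((full_path, os.path.getsize(full_path)))
                except OSError:
                    # The file was merged or removed while scanning, or is
                    # not accessible; skip it rather than fail the scan.
                    continue
    return found


def check_unmerged(root, max_files=10, max_total_bytes=100 * 1024**3):
    """Summarise unmerged files under root and flag when a limit is exceeded.

    max_files and max_total_bytes are hypothetical thresholds; pick values
    that fit the environment.
    """
    disks = find_unmerged_disks(root)
    total_bytes = sum(size for _path, size in disks)
    return {
        "count": len(disks),
        "total_bytes": total_bytes,
        "alarm": len(disks) > max_files or total_bytes > max_total_bytes,
    }
```

Run from a scheduled task on a cluster node, something like check_unmerged(r"C:\ClusterStorage") (assuming the cluster shared volumes are mounted at that path) would return the count, the total size, and whether the alarm condition is met. How the alarm is delivered (email, event log, monitoring system) is left open here.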