VDDK errors with a human face



The beauty and the horror of VDDK errors is that it is perfectly clear where things broke, yet completely unclear why they broke and how to fix them. It's like "RPC function call failed" in the Windows world.



Of course, not everything is so terrible. Some errors have very specific causes and cures. Others come with a long-established list of the most common causes and ways to correct them.



Veeam Technical Support naturally accumulates this kind of knowledge, and today we will take a look at their notes. So it is with great pleasure that I present the most common VDDK errors and the ways to eliminate them.

 

VDDK errors: what are they and where do they come from?



As you might guess from the name, these are problems at the level of the VDDK (Virtual Disk Development Kit) API, the best way to interact with a vSphere infrastructure. Whether it is a standalone ESXi host or a sprawling vCenter, if we need to read something from the infrastructure or write something to it, the best way to do so is the free VDDK.



To simplify as much as possible, the interaction looks like this: the Veeam server wants, for example, to read something from the host (or write something to it) and sends it a request. A read call is created that specifies which disk to read from, how much to read, at which offset, and into which memory buffer. A write works the same way, taking its data from the specified buffer. It's simple.



But this is in a perfect world. 



In real life, errors sometimes occur along the way, and the request cannot be completed. Instead of the expected response we get an error number, which is carefully recorded in the logs.
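To make this flow concrete, here is a toy model in Python. It is only a sketch: the names `ReadRequest`, `read_disk` and `describe` are hypothetical, the "disk" is just a bytes object in memory, and the real VDDK is a C library with its own calls. The only things taken from this article are the parameters of the request (disk, offset, length, buffer) and the error numbers and messages from the section headings.

```python
from dataclasses import dataclass
from typing import Dict

# Error numbers and messages as they appear in this article's headings.
VDDK_MESSAGES = {
    1: "Unknown error",
    3: "One of the parameters is invalid",
    13: "You do not have access rights to this file",
    14008: "The specified server could not be contacted",
    14009: "The server refused connection",
    18000: "Cannot connect to the host",
}


@dataclass
class ReadRequest:
    disk: str    # which disk to read from
    offset: int  # where to start reading, in bytes
    length: int  # how much to read, in bytes


def read_disk(disks: Dict[str, bytes], request: ReadRequest, buffer: bytearray) -> int:
    """Toy read: fill `buffer` and return 0, or return an error number."""
    data = disks.get(request.disk)
    bad_request = (
        data is None
        or request.offset < 0
        or request.length < 0
        or request.offset + request.length > len(data)
        or request.length > len(buffer)
    )
    if bad_request:
        # Assumption of this toy model only: any malformed request maps to error 3.
        return 3
    buffer[: request.length] = data[request.offset : request.offset + request.length]
    return 0


def describe(code: int) -> str:
    """Format an error number the way it is referred to in this article."""
    return f"VDDK error {code}: {VDDK_MESSAGES.get(code, 'not covered in this article')}"
```

The caller never sees an exception, only a number, which is exactly why the logs tell you where something broke but not why.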



Today we will talk about the most common of these errors.

 

Important disclaimer!

 

If you're not sure, don't! Don't press anything and don't touch anything at all! Calling or writing to Veeam support is always better than experimenting on your production environment. Fortunately, our support is Russian-speaking and extremely technical.



If you have the slightest doubt, calling and asking "I have this problem and I found this solution online, will it help me?" is normal and correct. What is neither normal nor correct is to do a lot of things without being sure of your actions, and then ask support to rebuild everything from the ruins in five minutes without losing anything.



Yes, of course we will help in that case too, but the best battle is the one that never happened. So always try to evaluate your actions critically, and may your uptime be long.

 

VDDK error 1: Unknown error



In fact, we have a whole HF article on this error. As it says, the error most often occurs when you have too many performance counters installed, and the fix is to download a patch from VMware.



On the one hand, there is nothing to comment on here. There is a problem, a description (even if not a very clear one) and, most importantly, a link to the cure. However, it is not that simple. According to our observations, this error can be caused not only by the boring problem with counters, but also by:



  1. VMDK . , , . — — . , . , , .

  2. datastore. . , .

  3. HBA . , . . ? 

  4. , : ESXi vCenter.



Fine, I get it, you say. But then what? How do I tell whether it is time to run for new disks or whether it is enough to apply a patch and exhale?



And my answer is: keep a set of simple tests at hand that will help you make the right decision when something happens.



  • Run Storage vMotion, or simply clone the suspicious machine to another datastore, and then try the backup again. If cloning fails, there is definitely a problem somewhere in the disk subsystem. Switch paranoia mode to maximum and check everything from disks to controllers.



    If the machine was cloned and backed up successfully, then the original VMDK was damaged: during cloning VMware recreates its contents, so the copy is free of those errors.

  • , . , . « — » .

  • , , , — VMware.

  • , . , . 



VDDK error 2: Value: 0x0000000000000002 



This error almost always goes hand in hand with VDDK error 1. According to our statistics, it usually appears with certain versions of the vCenter / ESXi combination, so the best advice here is to upgrade to at least version 6.7, or better still, to 7.0.



If that doesn't help, move on to plan B.



The error itself appears when the ESXi host runs out of the memory allocated for the NFC read buffer. By default, Veeam uses asynchronous NBD / NFC reads, which may need this buffer to be expanded, and that expansion does not always happen. There is a special registry key that disables this mode:



Name: VMwareDisableAsyncIo
Path: HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication
Type: REG_DWORD
Value: 1


After creating it, restart the Veeam Backup Service and be prepared for performance to drop by about 10%.
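If you prefer to script the key rather than click through the registry editor, the following Python sketch builds the equivalent `reg add` command from the four values above. The helper names `build_reg_add` and `apply_key` are hypothetical, and the script is an illustration rather than an official Veeam tool. By default it only returns the command so you can review it first; it actually runs it only on Windows and only when you ask it to. It does not restart the service for you.

```python
import subprocess
import sys

# Values copied from the key description above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication"
VALUE_NAME = "VMwareDisableAsyncIo"


def build_reg_add(value: int = 1) -> list:
    """Build the `reg add` command that creates the REG_DWORD value."""
    return [
        "reg", "add", KEY_PATH,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(value),
        "/f",  # overwrite an existing value without prompting
    ]


def apply_key(dry_run: bool = True) -> list:
    """Return the command; run it only on Windows and only if dry_run is False."""
    command = build_reg_add()
    if not dry_run and sys.platform == "win32":
        subprocess.run(command, check=True)  # needs an elevated prompt
    return command


if __name__ == "__main__":
    print(" ".join(apply_key(dry_run=True)))
```

Setting the value back to 0 with `build_reg_add(0)`, or deleting the key, is the obvious way to undo the change if the 10% turns out to matter more than the error.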



Another option is to log in to the host and restart the management agents:



/etc/init.d/hostd restart
/etc/init.d/vpxa restart


The procedure is described in detail in a VMware KB, so we will not repeat it here.



And here is the standard set of options that are worth trying during diagnostics:



  • Migrate machines with errors to another host.
  • Try another transport mode: HotAdd with a virtual proxy, or DirectSAN.


VDDK error 3: One of the parameters is invalid



This error almost always occurs when using Virtual Appliance mode (aka HotAdd mode).



There is nothing special to tell here. I will just link to two of our KBs that describe many options. Even if you go straight to support, you will be asked to do everything written there.



KB1218 - General description of possible problems and methods of their elimination.



KB1332 - If your Veeam server works as a proxy for HotAdd mode

 

VDDK error 13: You do not have access rights to this file



For this case we have KB2008. Yes, it lists many ways to fix the problem, but that is the nature of this error. It is almost impossible to say unequivocally what happened in your particular case, so you need to work through the entire list.



One more thing I would like to add: pay close attention to the Additional Troubleshooting section. Yes, what is written there may seem too obvious to many. But even such basics escape the most seasoned professionals. People often spend a week trying to solve everything on their own and then come to support only to find out that they did not read the list of technical requirements carefully, or something similar. It is a shame about the time wasted.



And two timeless tips:



  • Veeam proxy , UUID . - , . , , . 
  • ( — ), , VDDK .
 

 VDDK error 18000: Cannot connect to the host 



In most cases this error is caused by a bug in the VDDK itself. Specifically, the gvmomi.dll library is to blame, and the bug shows itself only under heavy load. For example, when many machines are backed up in parallel, one of the functions becomes 0 and the library may crash. Everything else then goes down with it.



Such is the sad story. 



The worst thing about this story is that the conditions that trigger the bug cannot be reproduced reliably. Testers call this a floating bug. So it is impossible to say exactly how many machines processed in parallel will cause the crash.



However, according to the official release notes, this bug has been completely fixed. So the right way out is to update your host. If for some reason that is impossible, the only help we can offer is the advice to reduce the number of machines processed simultaneously.



No other way.



 

VDDK error 14008: The specified server could not be contacted



If this trouble has befallen you, the first thing to do is check the network. Most likely, communication between vCenter and the Veeam proxy is down. Check that all required ports are open and reachable, and that all DNS names resolve to the expected IP addresses. Be sure to check the specific proxy involved in the failed job, and not the identical one standing next to it (it has happened).

95% of cases with this error are closed with the note "Problem with DNS / ports in client infrastructure".



So once again I urge you to check very carefully that the correct DNS server is specified everywhere, that no required ports are closed, and which IPs your FQDNs resolve to.
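These checks are easy to script. The sketch below uses only Python's standard `socket` module and is meant to be run on the proxy that took part in the failed job. The function names are hypothetical, and which host names, expected IPs and ports to pass in depends on your own infrastructure, so treat it as a starting point rather than a complete diagnostic.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated IP addresses a name resolves to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_endpoint(hostname: str, expected_ip: str, ports, timeout: float = 3.0) -> dict:
    """Check that hostname resolves to expected_ip and that the ports accept connections."""
    try:
        addresses = resolve(hostname)
    except socket.gaierror:
        addresses = []  # the name did not resolve at all
    return {
        "resolved": addresses,
        "dns_ok": expected_ip in addresses,
        "ports": {port: port_open(hostname, port, timeout) for port in ports},
    }
```

A call such as `check_endpoint("vcenter.example.local", "10.0.0.10", [443])` (both values are placeholders) returns a small dictionary you can compare between the failing proxy and a working one.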



Older versions of VDDK had a similar error when a non-default port was used to connect to vCenter, and it accounted for the remaining 5%. VMware has since hidden the KB describing it, which probably means it is no longer relevant, but you can still find it in Internet archives by its number, 2108658 (Backup fails when a non-default port is specified for VMware vCenter Server).

 

VDDK error 14009: The server refused connection



The last error in today's top is The server refused connection. Everything is quite mundane here: something is getting in the way of the connection between the host and the proxy. In most cases the firewall is to blame. But here is the subtle point: the cause is usually not closed ports but the delays the firewall introduces. So first check that port 443 is open, and then look at the timeouts.

If neither check turns anything up, contact support. We will have to examine the host itself: perhaps it is simply too busy to respond in time, or perhaps it is something else.
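To get a rough idea of the delays before calling support, you can time a few TCP connections from the proxy to the host. The sketch below is a minimal illustration using Python's standard library; `connect_latency` is a hypothetical helper, and it measures only how long the TCP handshake takes, not anything VDDK-specific.

```python
import socket
import time


def connect_latency(host: str, port: int = 443, attempts: int = 5, timeout: float = 5.0) -> list:
    """Time several TCP connects; each entry is seconds taken, or None on failure."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.monotonic() - start)
        except OSError:  # refused, timed out, or unreachable
            samples.append(None)
    return samples


def summarize(samples: list) -> dict:
    """Count failures and report the slowest successful connect, if any."""
    ok = [s for s in samples if s is not None]
    return {
        "failed": len(samples) - len(ok),
        "slowest": max(ok) if ok else None,
    }
```

Connects that succeed but take seconds rather than milliseconds, or that fail only some of the time, point at the delay scenario described above rather than at a closed port.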

 

And finally, some useful links:





