Tag Archives: Sizing

vCenter connection limitation and backup in big environments

Hi Team,

Update from 2019-05-20: Since some years the below SOAP modifications within vCenter are not needed anymore as Veeam caches all needed vCenter information in RAM which reduced the vCenter connection count drastically at the backup window. See Broker Service note here: https://helpcenter.veeam.com/docs/backup/vsphere/backup_server.html?ver=95u4

My friend and workmate Pascal Di Marco ran into some VMware connection limitation while backing up 4000VMs in a very short backup window.

If you ran a lot of parallel backup jobs that use the VMware VADP backup API you can run into 2 connection limitations… on vCenter SOAP connections and on some limitation on NFC buffer size on ESXi side.

All backup vendors that use VMware VADP implement in their product the VMware VDDK kit which help the backup vendor with some standard API calls and it also helps to read and write data. So all backup vendors have to deal with the VDDK own vCenter and ESXi connection count in addition to their own connections. VDDK connections vary from VDDK version to version.

So if you try to backup thousands of VMs in a very short time frames you can hit these limitations.

In case you hit that limitation, you can increase the vCenter SOAP connection limitation from 500 to 1000 by this VMware KB 2004663 http://kb.vmware.com/kb/2004663
EDIT: In vCenter Server 6.0, vpxd.cfg file is located at C:\ProgramData\VMware\vCenterServer\cfg\vmware-vpx

As well you can optimze the ESXI Network (NBD) performance by  increasing the NFC buffer size from 16384 to 32768 MB and optimize the Cache Flush interval from 30s to 20s by VMware KB 2052302  http://kb.vmware.com/kb/2052302

Tips & Tricks for Backup & Replication not directly related to Veeam (continuously updated)

Hi,

there are some general tips and tricks for Backup & Replication that are not directly related with Veeam Software. I will update this blog post from time to time to share these tips.

 

1) Format Backup Target disks with “/l” to avoid  that NTFS blocks access to your very large and frequently updated (fragmented) backup files.
format /FS:NTFS /L
This will take a while and will overwrite the selected folume (data loss be carefull).
If you have Win7/Win2008R2 you need to first install the following patch: http://support.microsoft.com/kb/967351/en-us

2) Fix CPU load VMXnet3 network card bug if you use one virtualized backup server/role or an VM with high disk load:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495

Veeam Backup & Replication Target Sizing Calcualtor (non official)

Hi everybody,

kudos to my colleague Timothy Dewin who wrote an excellent Veeam Backup & Replication Restore Point Simulator.

Keep in mind that this is not official and has not support or garantie that it work correct.
Also deduplication and compression is dependent to the data and is from customer to customer different.

The Default values are OK from my Feeling. If you have Domino DBs this is a bit too low because of Domino own compression.

And don´t forget to select Reverse Incremental if you want max space saving.

Here you can find the tool.
http://rps.dewin.me/?strat=fwd&exec

CU Andy

HA? DataProtection? DataRecovery? – IBM San Volume Controller SVC and Veeam Backup & Replication

Many DR scenatios didn´t reflect the need of the companies. Sometimes because of budget problems, sometimes of other things.
Why you want to bring data to another site?
HA? DataProtection? DataRecovery?

If you look at syncron mirrors. This is a typical HA szenario. If you go active-active there it can help to bring the servers back online as fast as of a boot. (VMware HA or othe cluster/failover solutions at legacy systems). Because this is an expensive one for legacy systems, this leeding to a scenario which VMs are ways better protected (HA) than my Tier 1 legacy systems. This and the advantages for maintenance (vmotion), power and cooling savings brings customer to a point that more and more Tier 1 applications are placed on virtualization.
Products that can be helpful here are IBM SVC, Datacore, Netapp, others.

Why this szenario has nothing to do with DP or DR.
You replicate only the disk data, so applications and DBs are not in a consistent state. Also the Applications and DBs are not in a Application Restore aware state. (see below)
Software errors are mirrored as well and if there are bugs in the code of the solution both sides are affected.
If you look at the storage and storage virtualization systems and their bigger and bigger fail domains you need a backup solution that fits your datarecovery needs.
Many customers looking because of the big fail domain at Replication solutions and looking for storage replication that can store more then 1 restore point (replicated snapshots).
Here you need to regard that your Servers/DBs/Applications are in the following states that you have no problems in case of restore/recovery. This is pretty the same demand for BackupSoftware.
a) Consistency (Application/OS/DBs). Basically before you do a replication/snapshot all RAM caches needs to be written to disk and no open write commands of the filesystem are there.
b) Application awareness: You have to set some settings in the OS/Application/DBs that after next boot(after restore/recovery) they jump into a mode that they needed in that secenario to avoid problems. A typical problem that is a good example for this is if you start active directory servers without this from different snapshots/recovery points you end up with an inconsistent and not supported active directory database.

So in most cases you need some software that can do this in addition to the storage snapshot replication base. IBM Flash Copy Manager or Netapp SnapManager for example. Or if you do this on the Software side Veeam (Virtualization) is a good example for this.

If you look at the size of your fail domains and your demand to bring back a lot of servers in a fail szenario. You need tools that can do this with an easy to use solution. Because of budget concerns many go only here with their backup software.

Most Backupsoftware can help to restore a Server in a timewindow you need (SLA) but if more than one server are affected this leeding into the situation where normal SLA are exceeded. There are some Backup Software that can help with this and can start systems directly out of the backup, start the OS and Applications and user can work, while later the data is transfered back to the repaired storage system. Veeam do this for example with VMs since 2010, but there are other solutions that can do this, for example NetBackup has announced that they will have that sort of recovery for VMs. Veeam holds the patents for this Instant VM Recovery.

Because this solutions uses the backup environment ressources to bring up the systems this can only be done with a limited number of systems, depending on the backup environment (20-100 VMs). So the idea was to add software based replication (Veeam for example) that has no direct interaction with storage System (storage fails can not harm this system) to replicate most critical systems to a separated datacenter over IP WAN links. There you can start (application aware) your core systems and you can recover over the next time your not so critical systems.

One cool thing I want to add from Veeam side. You are able to automatically test your Replicas (with v7 or newer) or Backups (v5 or newer) if they are able to restore your workloads. This scheduled restore checks test the  OS boot,  Network Connection and application Response.This include problems that are based on source side server. Simple example is a corrupt windows boot file that you do not detect in production. All backup solutions checks only the data with checksums > if same = backup/replication successfull. And you fail in case of restore. So this can be helpful to detect problems before you run into a failover scenario.

This described scenario reflect most budgets and can dramatically increase your operating safety.

Examples for a small szenario:

2 ESX Hosts with shared storage (redundant controllers)
Third ESX host on separate fire section or hosted datacenter. Which hold allsVMs as replicas.
Backup on the first side for fast restore and application/file restore.
Backup and recovery cheks on the second side of the replicas will help to prevent you from trouble in case you need your DR. 
Implementation of automatic Restore Checks with SureBackup and SureReplica.
 HW: Small Budget (Standard solution)
SW: Hyper-V or VMware Essentials + Veeam Essentials
Powerfull solution I think

Example for midsize – enterprise szenario:
VMware Hosts
Active-Active mirrored Storage (vdisk mirroring) for example with IBM SVC splitted over to datacenter on two fire sections. Third site with SVC Quorum.
Another datacenter many miles away with another Vmware environment which you use with Veeam replication of most critical systems.
Backup to third site (SVC Quorum) with Veeam (Fast Restore of Files/Objects/Servers).
Implementation of automatic Restore Checks with SureBackup and SureReplica.

What do you think? Yes it is very virtualization related but you get more operation safety than in legacy environments. Why not virtualize your biggest DBs on a 1:1 ratio and profit from this DR scenario. If you are concerned about VMware Snapshot commit szenarios or 2TB volume size limit, have a look at Hyper-V 3. The main idea behind this described szenario works there as well.

CU Any