In Version 8, the Veeam Solutions Architect team released a new format of the Best Practice Guide.
You can find the most current version under:
It will be updated for v9 soon.
If you ever wanted to test or demo Veeam Backup & Replication functionality without installing the product, now you can do so here: http://veeam.foonet.be/
We see more and more customers enabling Windows deduplication within a VM to save space on their file servers. With Windows Server 2016 this will become even more common.
With deduplication enabled, you will see a ~20x higher change rate roughly every 28 days in block-level backups (e.g. Veeam). The root cause is the garbage collection run of the Windows deduplication engine.
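As a rough illustration of what this means for backup sizing, here is a sketch. The volume size and daily change rate are made-up assumptions; only the ~20x spike comes from the observation above.

```python
# Sketch of how a Windows dedup garbage-collection run inflates block-level
# incremental backups. volume_gb and daily_change are illustrative assumptions;
# the ~20x multiplier is the observed spike from the text.

def incremental_sizes_gb(volume_gb, daily_change, gc_multiplier=20, cycle_days=28):
    """Per-day incremental backup sizes (GB) for one 28-day GC cycle."""
    sizes = [volume_gb * daily_change] * cycle_days
    sizes[-1] *= gc_multiplier  # garbage collection day (~every 28 days)
    return sizes

sizes = incremental_sizes_gb(volume_gb=1000, daily_change=0.01)
# 27 days of ~10 GB increments, then a ~200 GB increment on the GC day
```

So even with a modest daily change rate, plan repository capacity and the backup window around that one oversized increment per cycle.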
You can find more information here:
…and can discuss the solutions here:
My colleague and friend Tom Sightler created a toolset to back up SAP HANA with Veeam Backup & Replication. He documented everything in the Veeam Forums:
Basically it follows the same approach that storage systems like NetApp use for HANA backups: you configure pre- and post-scripts in Veeam that make HANA aware of the Veeam backups. Log file handling is included as well (how much backup data do you want to keep on the HANA system itself?).
In case of a DB restore, you go to HANA Studio and can access the backup data on the HANA system directly. If you need older versions, you can restore them with the Veeam file-level recovery wizard – or, more comfortably, with the Veeam Enterprise Manager file restore (self service) – and hit the rescan button in the HANA Studio restore wizard. They are detected and you can proceed with the restore.
As you might know, Veeam does not install backup agents in the VMs to create application-aware and file-system-consistent backups. Instead, Veeam looks into the VM and its applications, then registers and starts a matching runtime environment that allows application-aware backups.
We recently had an internal discussion about this topic, and Anton Gostev, Vice President of Product Management at Veeam Software, allowed me to share his thoughts and the ideas behind Veeam’s unique approach.
Andreas Neufert: “Let’s talk first about the definition of agents. According to http://en.wikipedia.org/wiki/S
Anton Gostev: “All problems that cause the issue known as “agent management hell” stem from the persistency requirement
…(of the agents of other solutions)…
– Need to constantly deploy agents to newly appearing VMs
– Need to update agents on all VMs
– Need to babysit agents on all VMs to ensure reliability (make sure it behaves correctly in the long run – memory leaks, conflicts with our software etc.)
An auto-injected temporary process addresses all of these issues, and the server stays clean of third-party code 99.9% of the time.”
Andreas Neufert: “I think we have all been at the point where we needed to install a security patch for an application and had to wait until the backup vendor released a compatible backup agent version. I can also remember having to reboot all servers because of a new version of such an agent (before I joined Veeam). But what happens if the application server/VM is down?”
Anton Gostev: “… Our architecture addresses the following two issues …
– A persistent agent (or in-guest process) requires the VM to be running at the time of backup in order to function. But no VM is running 100% of the time – some can be shut down! We are equally impacted; however, the major difference is that we do not REQUIRE the in-guest process to be operating at the time of backup (all item-level recoveries are still possible, they just require a few extra steps). This is NOT the case with legacy agent-based architectures: a shut-down VM means no item-level recoveries from the corresponding restore point.
– Legacy agent-based architectures require network connectivity from the backup server to the guest OS – rarely available, especially in secure or public cloud environments. We are not impacted, because we can fail over to network-less interaction with our in-guest process. This is NOT the case with legacy agent-based architectures: for them it means no application-aware backup and no item-level recoveries from the corresponding restore point.
Andreas Neufert: “Everyone who operates a DMZ knows the problem. You isolate the whole DMZ from your normal internal network, but the VMs need a network connection to the backup server, which also holds data from other systems. So the Veeam approach can bring additional security to DMZ environments. Thank you, Anton!”
Thanks for reading. Please send me comments if you want more interviews on this blog.
My friend and colleague Pascal Di Marco ran into some VMware connection limits while backing up 4,000 VMs in a very short backup window.
If you run a lot of parallel backup jobs that use the VMware VADP backup API, you can run into two connection limits: the vCenter SOAP connection limit and an NFC buffer size limit on the ESXi side.
All backup vendors that use VMware VADP implement the VMware VDDK kit in their products; it provides the backup vendor with standard API calls and also helps to read and write data. So all backup vendors have to deal with the VDDK’s own vCenter and ESXi connection count in addition to their own connections. The VDDK connection count varies from VDDK version to version.
So if you try to back up thousands of VMs in a very short time frame, you can hit these limits.
In case you hit that limit, you can increase the vCenter SOAP connection limit from 500 to 1000 as described in VMware KB 2004663: http://kb.vmware.com/kb/2004663
EDIT: In vCenter Server 6.0, the vpxd.cfg file is located at C:\ProgramData\VMware\vCenterServer\cfg\vmware-vpx
You can also optimize ESXi network (NBD) performance by increasing the NFC buffer size from 16384 to 32768 and tuning the cache flush interval from 30 s to 20 s as described in VMware KB 2052302: http://kb.vmware.com/kb/2052302
On customer request, I created a video that shows backup and single-mail restore for Lotus Domino with Veeam Backup & Replication.
Lotus Domino is not VSS-aware (at least this is the case under Linux). So you have only two options for consistent backups, as IBM does not support VSS file-system-only backups:
The question is: why should I use a backup that is not based on the Domino backup API?
For Veeam the answer is:
Enjoy the video
I just wanted to share a very cool promo for Hyper-V users.
If you use Microsoft System Center Operations Manager and Microsoft Hyper-V, check out the following link and benefit from the 100 Hyper-V socket promo.
Veeam enhanced its outstanding virtualization Management Pack in v7 to support Hyper-V. Forget the very basic Microsoft Hyper-V Management Pack – this one brings you the best tools to fix a broken environment in a very short time.
Happy hands on experience…
There are some general tips and tricks for Backup & Replication that are not directly related to Veeam software. I will update this blog post from time to time to share these tips.
1) Format backup target disks with “/L” to avoid NTFS blocking access to your very large and frequently updated (fragmented) backup files.
format <drive:> /FS:NTFS /L
This will take a while and will overwrite the selected volume (data loss – be careful).
If you are on Win7/Win2008R2, you first need to install the following patch: http://support.microsoft.com/kb/967351/en-us
2) Fix the VMXNET3 network card CPU load bug if you use a virtualized backup server/role or a VM with high disk load:
Sometimes the easiest things do not work and you have a tough time finding the root cause.
A customer of mine uses a virtual Windows 8.1 machine as a Veeam backup server and also as a VMware HotAdd proxy for branch offices. For security reasons, UAC and the Windows Firewall were enabled and the user name “Administrator” was disabled.
We found 3 major challenges in this situation:
– Backup & Replication was not able to run under a local user other than “Computername\Administrator” (“Cannot access admin$ share” error).
– Random disconnects of VMware Tools, with backup jobs getting stuck.
– After adding the branch office Backup & Replication server itself as a Veeam proxy to the headquarters Backup & Replication server, local hot-add processing was not possible anymore. Manual hot-add of disks was still possible… strange.
Solutions for this situation:
– Enable File & Print Sharing to use a local admin user other than “Administrator”.
– The second one was fixed by enabling “High performance” (“Höchstleistung”) in the Windows power options.
– The hot-add processing problem was related to different patch levels of B&R in the branch office. The headquarters Backup & Replication server was on a higher patch level, and the local branch office server was not able to process hot-add anymore. Running the same patch level solved it.
Happy backup…. Andy
Update 1: Added Exchange & virtualization + NFS support statement. Split the DAG failover / 20 sec timeout tips.
Update 2: On April 1st, Blogger completely destroyed my article as an April Fools’ joke, so I had to rearrange the text again. I added some small updates to the text based on your feedback. Thanks for sending me change ideas.
Update 3: 11 April 2014 – Added some planning information around CPU/RAM/disk to sections 2. and 3.
Update 4: 24 April 2014 – Added Exchange update link with Exchange 2013 CU3 example.
Update 5: 23 June 2014 – Added Transport role and CAS role specific backup information. (See chapters 5. and 6.)
Update 6: 31 July 2014 – Added recommended hotfixes to the DAG recommendations section.
Update 7: 01 August 2014 – Added a tip from a forum member regarding public folder databases and VSS timeouts.
Update 8: 06 November 2014 – Added tip “x)” that addresses VSS timeouts.
Update 9: 27 November 2014 – Corrected a statement in the CAS HA section.
Update 10: 18.05.2015 – Added the IP-less DAG section.
Update 11: 20.09.2015 – Added tips y) and z) regarding existing snapshots.
Update 12: 27.01.2016 – Added the new VMware Exchange on vSphere guide and updated h) because of the snapshot changes in the current vSphere 6 patch level. I also placed h) more prominently at the beginning of the chapter.
Update 13: 04.03.2016 – Changed tips f), i), j), k), l), v), y); changed the priority for the tips; changed the snapshot background story; added tips aa) and ab).
Update 14: 29.03.2016 – Updated the “3. VMware+Exchange design + background information” section with more tips.
Update 15: 20.04.2016 – Added restore tips and tricks ac)–am).
Update 16: 29.06.2016 – Added chapter “7.” with “Storage Spaces are not supported within a VM”.
Update 17: 31.08.2015 – Added new NFS discussions and statements from the internet. Added tip an).
Update 18: 28.02.2017 – Changed the NFS statement slightly to reflect my latest discussions with other Exchange experts, plus minor updates on the other topics to make them easier to understand. Thanks for the feedback here.
Here you can find my updated general recommendations for Exchange/Exchange DAG on VMware together with Veeam Backup & Replication.
1. Check Microsoft Exchange 2013 Virtualization Topic:
– Exchange 2013 DAGs are supported on virtualization platforms.
– Microsoft says: snapshots are not supported, because they are not application-aware and can have unintended and unexpected consequences.
Because Microsoft itself does not support snapshots with Exchange, you need to contact your backup or virtualization vendor for any snapshot-related questions/support requests. So why use a snapshot-based backup instead of an agent-based backup method? You can restore servers in minimal time (Instant VM Recovery can bring the server back in a minute plus boot time); the backup time window is easier to achieve with virtualization backup (so you can perform Exchange full-level backups more often); in the case of Veeam, any new patch/service release is supported by design without a Veeam-specific patch; and with Veeam’s “Veeam Explorer for Exchange” you can restore Exchange objects very quickly and without complications. Compared to many other Exchange object restore products, Veeam is in many cases very affordable.
However, if you use virtualization-based backups (VMware/Hyper-V), you need snapshots for backup. In the case of Veeam, enabling Veeam guest processing makes Veeam use VSS to bring Exchange into an application-aware backup/restore state before the snapshot is taken. You can verify this in the Windows event log. There you can find messages that say “Exchange VSS Writer (instance GUID) has processed pre-restore events successfully.” and “Exchange VSS Writer (instance GUID) has processed post-restore events successfully.”. The event IDs are around 96xx. They differ between Exchange versions, so search for the text to check whether your backup application uses VSS in the Microsoft-recommended way.
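If you want to script that text search instead of clicking through Event Viewer, a minimal sketch follows. The export step and file name are my own assumptions, not a Veeam or Microsoft tool: you could export the Application log to text with something like `wevtutil qe Application /f:text > app-log.txt` and then scan the text.

```python
import re

# Matches the pre-/post- messages of the Exchange VSS writer quoted above,
# regardless of the version-specific event ID (which varies around 96xx).
VSS_WRITER = re.compile(
    r"Exchange VSS Writer \(instance [^)]*\) has processed "
    r"(pre|post)-\w+ events successfully"
)

def exchange_vss_ran(log_text: str) -> bool:
    """True if the exported event log text contains the Exchange VSS writer messages."""
    return bool(VSS_WRITER.search(log_text))

sample = "Exchange VSS Writer (instance 1a2b3c) has processed pre-restore events successfully."
```

Searching for the message text rather than a fixed event ID is the point here, since the IDs differ between Exchange versions.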
Link to the Microsoft compatibility list for VMware: https://www.windowsservercatalog.com/results.aspx?&bCatID=1521&cpID=11779&avc=0&ava=0&avq=0&OR=1&PGS=25&ready=0
2. VMware NFS based Datastores and Exchange?
Back in the old days, Microsoft created a KB entry stating that NFS storage is not supported. Now you will say: NFS and Windows? NFS is used in many cases as the main storage for VMware, and the VM hard disks are written to it as VMDK files.
Originally the statement was created at a time when some immature NFS storage arrays had reliability issues under VMware.
Over time, Microsoft kept this NFS-not-supported statement in its documentation:
“All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2016 doesn’t support the use of network attached storage (NAS) volumes, other than in the SMB 3.0 scenario outlined later in this topic. Also, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.”
There is a deep discussion going on in the industry about why this is the case technically. As a lot of customers have run their Exchange systems on NFS-based VMware datastores without problems over the last years, the question is whether the statement is still there for technical reasons.
At least from my field experience as a backup architect, I can say that there is one specific thing to look for when it comes to reliable VMware backups of Exchange on NFS: the snapshot commit should not lead to any Exchange DAG cluster failovers after you have followed the tuning tips described below. You can easily test this without a backup product. Follow all the recommendations, create a manual VMware snapshot of the production Exchange server (under load), and release it after 15 minutes. There should not be any failover in the DAG cluster. It looks like hyperconverged systems in particular tend to run into issues here, and the backup vendors get blamed for it. As it is easy to reproduce without a backup vendor involved, you should test this when you consider running Exchange on them. There are some newer hyperconverged vendors that, it seems, do not have those issues.
If you want to dig deeper into the NFS unsupported discussion you can read the following:
Please check out this article; it describes the main reasons why NFS datastores for Exchange workloads are not a good idea. (Teasers: Exchange error -1018 (JET_errReadVerifyFailure); NFS can abort transactions only on a best-effort basis and can cause corruption in the database.)
However, Microsoft added the necessary functionality to SMB 3.0, so if you run Exchange on Hyper-V on Windows Server 2012/2012 R2, you can use SMB 3.0 shared storage on a Windows Server 2012/2012 R2 file server. You then have to place the Exchange data inside VHD/VHDX files.
Lately there has been a discussion that the VMware virtual SCSI driver abstracts the transaction abort and answers the request virtually, as there is no such command coming from the underlying NFS storage that could be forwarded.
Check out Eric_W comments on the following website: https://social.technet.microsoft.com/Forums/en-US/c8b4a605-3083-4d0f-b3aa-62ea57cc6d43/support-for-exchange-databases-running-within-vmdks-on-nfs-datastores?forum=exchangesvrgeneral
He describes why this is not a final answer for NFS storage under VMware with Exchange databases: NFS storage cannot abort the transaction, and if something bad happens with Exchange, the virtual SCSI driver will return an answer, but by the design of NFS it does not know whether the transaction was really aborted or was written to the storage. As the Exchange recovery process relies on physically correct data, this is problematic, and the corruption detection process will report JET error -1018. If you have already run into this error, check the following: http://www.exchangerecover.com/blog/exchange-server-error-messages.html
3. VMware+Exchange design + background information:
Please check this link and the whitepapers linked at the bottom. The whitepapers contain outstanding background information for your Exchange platform design.
There are also 2 interesting webinars from VMware
For VMware and Hyper-V
– Do not go higher than a 2-to-1 virtual CPU to physical core ratio. Microsoft strongly recommends a ratio of 1-to-1 on the host where the Exchange server runs.
– Do not count Hyper-Threading resources when you do the Exchange sizing. Calculate with the physical cores.
– Exchange is not NUMA-aware. So if you can, size the Exchange VM so that it runs within a single NUMA node. The idea is not to give Exchange more resources just because they are free and available if the Exchange VM can run within one NUMA node with enough resources. Let Exchange work with direct memory access if you can.
– Use thick provisioning for the disks for better performance but use thin provisioning for backup and restore optimization.
– Give the OS disk enough spare space and place it on a fast storage system as well (do not place it on a datastore with hundreds of other VM boot volumes.)
– Do not enable deduplication (not supported) or compression (not recommended) on your primary storage that runs the virtualized Exchange workload.
– Use anti-affinity rules to avoid members of the same DAG running on the same virtualization host.
– Do not use Dynamic Memory
– Reserve 2 physical cores for the host OS.
– If you perform on-host backups, reserve resources for this as well (RAM/CPU).
– Use VHDX whenever possible (max 64 TB + improved sector alignment + more resistant to corruption on power failures).
– Don’t forget to plan the space for the *.bin file (equal to the memory size).
Check that you have the latest Exchange updates installed. For example:
Cumulative Update 3 for Exchange Server 2013
addresses a random backup problem with Event IDs 2112 and 2180.
4. More Veeam-specific Exchange tips and tricks for DAG and mailbox role backups:
In general, if you use virtualization-based backups for Exchange, there is a chance of hitting one of two problems:
– Exchange DAG cluster failovers
– Exchange VSS timeouts (Exchange has a hard-coded 20-second timeout between the start of consistency processing and its release).
If you design your Exchange environment as described below, you will likely prevent those problems and also address the VMware DRS, Storage DRS and vMotion scenarios in most cases.
Below these tips you can find some insights regarding restore.
Exchange DAG uses a network-based heartbeat between the DAG members. When VMware snapshots are committed, the delta changes are written back to the original storage location. Over time, VMware enhanced this process, starting from “stun the VM, write a chunk of data back, release the VM … repeat” to a process similar to Storage vMotion as of the ESXi 6.0 U1 February 2016 patch level. It also depends on the sizes of the disks. If you want to read more about it, you can use the following 2 resources:
An aggressive DAG cluster heartbeat (default settings) can interpret one or more of these VM stuns as a network outage and may perform a DAG cluster failover. vMotion, Storage vMotion and manual snapshot removal processes can then likewise lead to cluster failovers.
These failovers are usually transparent to the Exchange users but can cause 2 problematic situations:
To address this, check out the following tips and tricks:
Tips for preventing Exchange DAG cluster failovers:
In general you need to prevent big snapshot files. This can be achieved by:
Increase the DAG heartbeat time to avoid cluster failover (no reboot or service restart needed; the settings are active after you press Enter).
On a command line (with admin rights)
cluster /prop SameSubnetThreshold=20:DWORD
cluster /prop SameSubnetDelay=2000:DWORD
cluster /prop CrossSubnetThreshold=20:DWORD
cluster /prop CrossSubnetDelay=4000:DWORD
cluster /prop RouteHistoryLength=40:DWORD (needs to be double the CrossSubnetThreshold value)
You can check the settings with: cluster /prop
After applying these cluster settings, perform a manual snapshot in the VMware client and release it after 3 seconds.
If a DAG failover is performed, please check the tips below and work with VMware Support until this process works without an Exchange failover. After that, you can work on the Veeam side by optimizing the backup infrastructure.
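To see what these values buy you, here is a quick sketch of the arithmetic. The Windows Server 2008 R2 defaults of 1000 ms delay / 5 missed heartbeats are my assumption; verify them in your own cluster.

```python
# Failure detection window = heartbeat interval (delay) x missed-heartbeat
# threshold. The tuned values are the ones from the cluster /prop commands above.

def detection_window_s(delay_ms: int, threshold: int) -> float:
    """Seconds of silence before the cluster declares a node dead."""
    return delay_ms * threshold / 1000.0

default_window = detection_window_s(1000, 5)   # assumed Win2008R2 defaults: ~5 s
tuned_window = detection_window_s(2000, 20)    # SameSubnetDelay=2000, SameSubnetThreshold=20
# tuned_window is 40.0 s -- a snapshot stun now has to last 40 seconds,
# instead of ~5, before a missed-heartbeat failover is triggered
```

In other words, the tuning does not make stuns disappear; it just widens the window the cluster tolerates before failing over.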
If you face [VSS_WS_FAILED_AT_BACKUP_COMPLETE], error code [0x800423f3], and tip a) didn’t help, check for dismounted databases and mount or delete them.
Use the new Veeam storage snapshot feature (HPE StoreVirtual including VSA, HPE 3PAR StoreServ, NetApp ONTAP, EMC VNX(e), Cisco HyperFlex) if you can (Veeam v7 and later) => it reduces the snapshot lifetime to a few seconds => no load and no problems at commit, because there is less data. (This option can be counterproductive if you experience the 20-second VSS timeout.) Veeam’s Cisco HyperFlex integration avoids VMware VM snapshots entirely.
Use at minimum VMware vSphere 5.0 because of changes in snapshot placement and background processing. The latest vSphere 6.0 U1 patch level (January 2016 update) completely changed the way snapshots are committed. I highly recommend updating to it if you have any problems around snapshot commit.
To reduce the snapshot commit time (and the amount of data in the snapshot), try to avoid any changes during the backup time window (users, background processes, antivirus, …). Also try to avoid changes on all other LUNs on the storage system itself (faster writes at snapshot commit).
If you cannot avoid many block-level changes during your backup window, use (forward) incremental with or without synthetic fulls. Reverse incremental will take a bit longer than the other backup methods, as it performs 3x more I/Os at the backup target during the snapshot lifetime. This leads to a longer snapshot lifetime, and in the end the snapshot removal process has to handle more data.
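A quick sketch of where the 3x comes from: for every changed block, reverse incremental roughly does one read and two writes at the target (read the old block from the full, write it into the rollback file, write the new block into the full), versus a single write for forward incremental. The block counts below are illustrative, not Veeam internals.

```python
# Illustrative sketch of target I/O counts per backup method.

def target_ios(changed_blocks: int, method: str) -> int:
    ios_per_block = {
        "forward": 1,  # write the new block into the incremental file
        "reverse": 3,  # read old block + write it to rollback + write new block
    }
    return changed_blocks * ios_per_block[method]

forward = target_ios(100_000, "forward")  # 100000 target I/Os
reverse = target_ios(100_000, "reverse")  # 300000 target I/Os
# 3x the target I/O means the VM snapshot stays open roughly 3x longer,
# which is exactly what you want to avoid during the Exchange backup window.
```
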
To reduce the backup time window and snapshot lifetime, use Direct SAN or Direct NFS mode. If you cannot use one of them, use NBD (network) mode for Exchange backups. Avoid HotAdd processing with NFS datastores in any case (they are not supported for Exchange anyway). Keep an eye on proxies that run in automatic transport mode selection – verify they use the correct mode!
Disable VDDK logging for Direct SAN mode if your backups run stably (ask Veeam support for the registry key and the consequences). This tip is important if you have more than 10 LUNs connected to the Veeam proxy.
It is also very important to install and configure the correct multipath driver for your storage on the proxy. Run the storage with the “VMware” host profile even if the Veeam proxy is a Windows system. (Never use the “Windows Failover Cluster” host profile in the storage host settings.)
To reduce the snapshot lifetime and the amount of data at snapshot commit, use the new Veeam parallel processing with enough resources to back up all of your disks at the same time (v7 and later) => add more proxy resources.
Use current VMware versions (newest VADP APIs and snapshot processes with a lot of updates) and current Veeam versions (newer VDDK integration). And install current ESXi/vCenter patches!
Still having problems? Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage!). It is very important not to place the OS disk on a datastore with hundreds of other OS boot disks.
Keep an eye on the storage latency. On average it should not be higher than 30 ms for any disk (including the OS disk), and the maximum latency should not go above 40 ms. Microsoft’s original latency numbers are 20 ms average and 50 ms maximum, but the maximum latency in particular should not go above 40 ms BEFORE you perform a backup and process snapshots.
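A small sketch of that check against a set of latency samples (thresholds from above; the sample values are made up):

```python
# Verify disk latency samples against the rule of thumb above:
# average <= 30 ms and maximum <= 40 ms before backup/snapshot processing.

def latency_ok(samples_ms, avg_limit=30.0, max_limit=40.0):
    avg = sum(samples_ms) / len(samples_ms)
    return avg <= avg_limit and max(samples_ms) <= max_limit

latency_ok([12, 18, 25, 31])   # True  -- healthy disk
latency_ok([12, 18, 25, 55])   # False -- a single spike above 40 ms fails the check
```

Run the check per disk, OS disk included, on latency collected before the backup window; a disk that already violates the limits at idle will certainly violate them during snapshot commit.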
Fewer VMware disks per VM can help to reduce the snapshot commit time. By default, each disk slows down the snapshot commit even with no delta data. This time was reduced by 4x with the ESXi 6.0 U1 February 2016 patch level.
If all disks are in the same datastore, increasing Veeam’s maximum parallel snapshot commit setting can help to reduce the snapshot commit time (parallel vs. sequential). Check the registry settings and the required storage performance with Veeam support.
In a worst-case scenario, when no other tips help, you can check the following VM setting. This is an undocumented VM setting, and you have to check the support statement with VMware. This was a tip from one of my customers with a 13 TB+ Exchange environment, after a long run with VMware Support.
snapshot.maxConsolidateTime = “1” (in seconds) (again, do this only together with VMware Support). This setting was used with ESXi 4, and I think at least for vSphere 6 it is no longer usable, as the snapshot process has changed.
If you have problems with cluster failover at backup, one option is to back up DAG member(s) that hold only inactive databases (no cluster failover because there are no active databases; log file truncation is replicated by Exchange across the whole DAG). This also gives you the option to restart the server or services so that Exchange processes VSS consistency faster afterwards. If you restart the services, make sure you wait long enough for the Exchange VSS writers to come up again before you back up.
If you add an additional DAG member server for this, check the situation with your Exchange architect, because you change the member count for the quorum failover decision. (E.g., if you have 2 Exchange DAG members in different datacenters and a witness disk in datacenter 3, and you add an additional Exchange server on one side, the failover behavior is affected because of the different server count in that datacenter.)
To check whether replication happened, verify that Windows Logs \ MSExchangeRepl event 2046 occurs (“Backup is happening for database XXX”); there will also be other messages stating that the replication of the log truncation happens. If you perform a backup of an active database, there are other log messages for replication:
Applications and Services Logs \ Microsoft \ Exchange HighAvailability \ TruncationDebug
• Event 224 – The replication service decides which logs to truncate
• Event 225 – No logs will be truncated (either not enough logs, or circular logging is enabled)
• Event 299 – The replication service truncates the logs (or it will tell you that the minimum amount of logs for truncation has not been reached)
Install the Microsoft-recommended fixes to avoid cluster failover problems caused by differing storage latency at VM snapshot commit: http://blogs.technet.com/b/exchange/archive/2011/11/20/recommended-windows-hotfix-for-database-availability-groups-running-windows-server-2008-r2.aspx
Keep in mind that there are hotfixes that are recommended but not yet rolled out via Windows Update!
Delete all existing VM snapshots before you start. Existing snapshots can slow down I/O-heavy storage processes like the snapshot commit. They also prevent the enablement of VMware Changed Block Tracking, which leads to a 100% data read at backup (snap-and-scan backup) and causes longer snapshot lifetimes with longer snapshot commit phases.
Maybe not directly related, but some customers reported that they had high CPU/RAM usage and were able to fix log file truncation problems by adding more CPU/RAM resources to the VM.
Use the newest available VMware Tools version within the VMs. This is NOT optional.
Tips for preventing VSS (timeout) problems:
If you perform VSS-based consistency processing on an Exchange server, a hard-coded 20-second timeout releases the Exchange VSS writer state automatically if the consistency state is held longer than those 20 seconds. The result is that you cannot perform consistent backups.
In detail, you have to complete the Exchange VSS writer consistency processing, the VM snapshot and the Exchange VSS writer release within these 20 seconds hard-coded by Microsoft.
With newer Veeam Backup & Replication versions, Veeam will detect this automatically and perform another kind of VSS snapshot. See tip x).
If you see Exchange VSS timeout event 1296 => change the logging setting =>
Set-StorageGroup -Identity “<yourstoragegroup>” -CircularLoggingEnabled $false
Use at least Veeam Backup & Replication v8. If standard VSS processing of a Microsoft Exchange server times out, the job will retry processing using a persistent in-guest VSS snapshot, which should prevent the VSS processing timeouts commonly observed with Microsoft Exchange 2010. Persistent snapshots are removed from the production VMs once backups are finished. All VM, file-level and item-level restore options have been enhanced to detect and properly handle restores from Exchange backups created with the help of persistent VSS snapshots.
In many cases Exchange can complete consistency processing faster if you add more CPU/memory to the VM. Based on customer feedback, this solved many of the VSS timeout problems.
Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage as well!).
The worst thing you can do is place the OS disk on a datastore with hundreds of other boot VMDK volumes on RAID 5/RAID 6 storage.
Use at minimum VMware vSphere 5.0 because of changes in the snapshot creation area.
Use current VMware versions (newest VADP/VDDK kits with a lot of updates) and current Veeam versions (newer VDDK integration). And install current ESXi/vCenter patches! => Snapshots are created faster.
An important one on the VMware side: fewer VM disks will reduce the snapshot creation time. Check how long it takes to start a snapshot of the VM in the VMware vSphere (Web) Client. Keep in mind that you need to complete the Exchange VSS writer consistency processing + the VM snapshot within 20 seconds.
To optimize snapshot creation time:
Check your vcenter load and optimize it (or use direct ESX(i) Connections for Veeam VM selection, so that the snapshot creation took less time.)
Check your health an configuration of Exchange itself. I saw some installations where different problems ended up with a high cpu utilization at indexing service. This prevented VSS to work correct. Check also all other mail transport-cache settings. Sometimes the Transport Service cache replicate shadows of the mails over and over again and nobody commit them (if you have multiple transport services together with firewalls between them).
Veeam specific: Veeam performs VSS processing over the network. Check with Veeam UserGuide TCP Port Matrix that B&R Server can perform Veeam Guest Processing over the network (open Firewall Ports).
If this is not possible Veeam failback (after a timeout) to networkless VMware Tools VIX communication channel (Veeam own) In-Guest processing. If you use networkless In-Guest processing, change the veeam registry key, so that VIX based processing is performed before network In Guest processing. => No wasted time because of waiting for timeout.
However network based In Guest processing is performed faster and I recommend it.
One of the Forum member report VSS problems when a PublicFolder database is present on the server. If the above mentioned tips did not help, temporarly disable your PublicFolder DB or replicate (DAG) it to another server. If the backup then runs fine, check with your Exchange Architect the PublicFolder DB configuration. Another option can be to upgrade to Exchange 2013 because no special PublicFolder DB needed (PublicFolder are covered by normal mailbox DBs).
Delete all existing VM snapshots before you start. Existing snapshots can slow down snapshot processing.
Tips for restore planning with Veeam:
For a Veeam restore it does not matter whether the VM holds an active or an inactive DAG member. Enable Veeam in-guest processing so that Veeam can interact with the OS and Exchange to bring them into a consistent state, and so that Veeam can set restore-awareness settings that tell the OS and Exchange they were recovered when you perform a VM restore.
– Instant VM Recovery/VM Restore/Quick Rollback: the OS boots and, because of the restore-awareness settings, Exchange performs automatic recovery steps on the DB. You can find this step documented in the event log; depending on the Exchange version the event ID can differ. For example: Event ID 9618, Event Source MSExchangeIS, "Exchange VSS Writer (instance GUID) has processed post-restore events successfully." To enable this automatic step, boot the VM with network connectivity at restore.
– Veeam Explorer for Exchange single-item restore (mail/calendar/…) is likewise compatible with inactive and active databases.
Veeam Explorer for Exchange is started from the Veeam console when you want to restore a single item such as mails or calendar entries. It loads the ese.dll from the Exchange backup and accesses the Exchange database files directly out of the backup. This way Veeam is automatically compatible with all Exchange updates and fixes, even if the DB structure changes (ese.dll abstracts this for Veeam).
With this process you have to keep two things in mind at design time:
– ese.dll, the way Veeam uses it, can load "only" 64 databases; database number 65 will not be loaded. But you can unload a database and add further databases without problems.
– Veeam needs a certain amount of time per database for mounting. The time depends on the backup target storage you use. For example, at 10 seconds per database it takes 1 minute to mount 6 databases, but 10 minutes to mount 60 databases. So a kind of best practice is to keep the database count at a reasonable level. If you have 50+ databases per server, a workaround is to start the Windows file-level recovery wizard, launch Veeam Explorer for Exchange from there, and manually select only the databases you need to shorten the restore process.
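The mount time scales linearly with the database count, which is why the count matters so much at design time. A tiny sketch of that math (the 10-seconds-per-database figure is just the illustrative number from the text, not a measured value; your backup target storage determines the real per-database time):

```python
def total_mount_time(db_count, seconds_per_db=10):
    """Mount time grows linearly with the number of databases.

    seconds_per_db depends on your backup target storage; 10 s is
    only the illustrative figure used in the text above.
    """
    return db_count * seconds_per_db

# 6 databases -> 60 s (1 minute); 60 databases -> 600 s (10 minutes)
print(total_mount_time(6), total_mount_time(60))
```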
If you "only" have a crash-consistent backup or snapshot, you can still use Veeam Explorer for Exchange. Start the Veeam Windows file-level recovery wizard and launch the Explorer from there. As Veeam had no chance to collect the database locations at backup time (crash-consistent), you add each database and its log files manually.
Veeam supports single-object restore of public folders as well. Depending on the version and settings, you may have to export data to a PST file with the Veeam Explorer for Exchange wizard if you cannot send the data back to the public folder database/mailbox. The PST can be mounted in Outlook, and you can copy and paste the data back if needed. If you see an empty public folder in Veeam Explorer for Exchange, please contact Veeam support; a hotfix is available.
If you want to use Export to PST, Veeam needs a 64-bit Outlook on the server/workstation that runs Veeam Explorer for Exchange.
PST export places documentation in the root of the PST that meets the demands of courts and lawyers. A restore protocol documenting all restores can be found in the Veeam UI under "History – Restore".
If you are not allowed to access mails directly at restore, you can use Veeam Enterprise Manager and its item-restore capabilities to restore only changed/deleted items from a specified time period (without getting access to, or seeing, the mails themselves). This is, for example, sometimes needed to comply with the law and the employee council (Betriebsrat).
Veeam Explorer for Exchange uses Exchange Web Services (EWS) for restore. Please check that EWS works correctly and that the Explorer can reach a CAS server on TCP 443; in most cases DNS is needed as well. As the Explorer writes directly into the user's mailbox, the user defined at restore needs write access to that mailbox. You can define the restore user that writes to the mailbox at restore time: the original user, an admin user, or a user to whom you grant temporary access to the mailbox for the restore.
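A quick way to pre-check the TCP 443 part of this from the machine running the Explorer is a plain socket probe; a minimal sketch (the hostname "cas01.example.local" is a placeholder, and this only verifies the socket is reachable, not that EWS itself is healthy):

```python
import socket

def port_open(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# "cas01.example.local" is a placeholder for your CAS server name.
print(port_open("cas01.example.local", 443))
```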
Veeam does not use any recovery database process, so you can save that space on the Exchange server. As this process is not used for any restore, you also save a lot of time at restore and reduce complexity dramatically. On the other hand, Veeam cannot restore a single database completely with the wizard; you have to create a new database with the user mailboxes and let Veeam Explorer for Exchange restore the mailbox data. This protects you from database corruption that may be inside the backup but would go undetected in a legacy restore of the database. However, if you want to perform a database restore, you can use the Veeam file restore wizard to restore all needed database and log files and perform the remaining steps manually according to the Microsoft recovery steps: https://technet.microsoft.com/en-us/library/dd876954(v=exchg.150).aspx There are also many illustrated recovery examples available on the internet.
For a Veeam restore it does not matter whether you have one DB per VMDK or multiple. If you place the Exchange server on VMware, you should keep the VM disk count low to streamline the snapshot commit process used at backup (see above). In most cases keep the count below 8-10 and use ESXi 6 U1 with the February 2016 patch level, which dramatically reduces snapshot commit overhead.
The amount of data per DB does not matter to Veeam, apart from the data you select for restore. The database mount in Veeam Explorer for Exchange is not really affected by the database size.
5. More Veeam specific Exchange Tips and Tricks for transport role backups
Shadow redundancy helps protect your undelivered messages by replicating them immediately to a second transport role. This is even more important with Exchange 2013, because DAG members are transport role servers as well.
Check with your Exchange architect that you back up at least one transport server that holds an original or a shadow of your undelivered mails.
If you back up only one Exchange 2013 DAG member that holds only inactive databases (or Exchange 2010 DAG members that have the transport role enabled), check the requirements section of this article to analyse whether you are backing up everything you need.
You also need to check that your backup software works application-aware (e.g. Veeam guest processing) and that, in case of a restore, the transport role detects the restore scenario and does not deliver already-sent messages a second time. This is done by VSS processing at restore. VMware Tools quiescence at backup cannot achieve this by design.
A common configuration is to use a full-blown Exchange transport role server instead of an Exchange Edge server as the SMTP gateway to the internet. In many cases this is done to save public IP addresses by running the internet-facing CAS server on the same machine.
You will likely place this server in your DMZ and open only port 25 for mail transfer. This is not enough if you have shadow redundancy enabled, because the shadows are never committed. Disabling shadow redundancy just because of this is not acceptable, as it can lead to lost mails if you lose one of the transport servers. Imagine losing a server that holds the mail with "the opportunity of your (your company's) life" in it.
Backup of IP Less DAG with Exchange 2013 SP1:
There is a cluster operation mode for the Exchange mailbox role where you do not need a cluster IP address or name. You need to check whether your backup software can handle this. Veeam Backup & Replication uses the Exchange VSS writer to bring the information store into a consistent state, so no change is needed for an IP-less DAG. The restore is done via EWS and is likewise not affected. Good news for Veeam users.
6. More Veeam specific Exchange Tips and Tricks for CAS role backups.
With Exchange 2010 you had two options to achieve CAS redundancy.
1) External Layer 7 load balancer
Two or more CAS servers are placed behind a (redundant) load balancer that holds a cluster IP address. The load balancer knows whether your CAS server answers on a service port and how long the response takes. This is the recommended way of providing CAS redundancy, and nothing special is needed for CAS backup.
2) Windows Network Load Balancing (NLB).
Windows is used to create a cluster IP address, and all servers can answer requests. The main problem here is that it is not service-aware: if your server is up and answering on the network but your Exchange CAS service is down, your users randomly get no answers. => Invest in external load balancers!
If you want to try the Windows NLB cluster approach, you need to keep an eye on the configuration when using Veeam for backup. Veeam guest processing reads the VMware Tools IP address list and contacts the VM at the first address found. If the first address is the cluster address, guest processing might connect to another server, because the NLB cluster routes it to another VM.
So you can do one of the following:
1) As VMware Tools lists IPs in ascending order, the NLB cluster IP needs to be higher than the normal addresses.
2) If you cannot change the IP addresses (public IPs, for example), you can ask Veeam support for a hotfix that inverts the IP order for guest processing (Inverseipprder.zip).
3) Another option is to use VIX processing instead of a direct network connection for Veeam guest processing. The easiest way to force this is to block all network communication to this VM (Windows firewall/your company firewall/…). Leave port 443 open for Veeam Explorer for Exchange restores.
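The ordering assumption behind option 1 can be sketched in a few lines (the addresses are made-up examples, and this only models the ascending-order behavior described above; the real VMware Tools reporting may vary by version): if guest processing contacts the first, i.e. lowest, address in the list, the node's own IP must sort below the NLB cluster IP.

```python
import ipaddress

def first_reported_ip(addresses):
    """Mimic an ascending-sorted IP list: the lowest address comes
    first, and that is the one guest processing would contact."""
    return str(min(ipaddress.ip_address(a) for a in addresses))

# Made-up example: the node IP .10 sorts below the cluster IP .100,
# so guest processing would contact the node directly.
print(first_reported_ip(["192.168.1.100", "192.168.1.10"]))
```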
In general you can get some background information about this here:
7. Windows Storage Spaces are not supported within a VM
I recently saw two customers running Exchange databases on Storage Spaces within a VM. They did this because of a 2 TB storage system limitation.
It is not supported, based on a recently updated article from Microsoft:
“Storage layers that abstract the physical disks are not compatible with Storage Spaces. This includes VHDs and pass-through disks in a virtual machine, and storage subsystems that layer a RAID implementation on top of the physical disks. iSCSI and Fibre Channel controllers are not supported by Storage Spaces.”
Veeam Explorer for Exchange maps the VM volumes out of the backup to a Windows system. Storage Spaces volumes are not imported automatically by Windows, so the Explorer cannot access the file system holding the Exchange databases.
When you start the Veeam Windows file-level recovery wizard for a VM that contains Storage Spaces volumes, you see only the System Volume Information folder on the drive.
Workaround if you really need mails from such a scenario:
Boot the VM with SureBackup in a Veeam "On-Demand" virtual lab. Stop the Exchange services on the virtual lab Exchange server and use Veeam Explorer for Exchange to access the databases for restore.
If there is no way to migrate the data to a single big volume, or to spread the user mailboxes across individual small databases that fit the volume size limits, you can use dynamic disks to expand the drives. At least this is supported from the Veeam side for Exchange restores. According to Microsoft https://technet.microsoft.com/en-us/library/ee832792.aspx this is a supported way for Exchange as well: "Dynamic disk: A disk initialized for dynamic storage is called a dynamic disk. A dynamic disk contains dynamic volumes, such as simple volumes, spanned volumes, striped volumes, mirrored volumes, and RAID-5 volumes."
8. Additional Tips:
Use the Veeam forums http://forums.veeam.com and search for specific Exchange topics; you can find additional tips and feedback there. Keep in mind that the Veeam forum is not an official support channel. If you need urgent help, please open a support ticket at http://www.veeam.com/support . Testing and proof-of-concept environments are supported with lower priority.
Do you have feedback?
Was one of the tips helpful?
Please leave a comment.
All the best to you and success… Andy
VMware Backup from NFS (File) Datastores:
For most common VMs (90%) I would use Veeam's Direct Storage Access (the new Veeam Direct NFS) backup mode for backup and restore. Direct NFS is the fastest restore method within Veeam, as it was written from scratch by Veeam and does not rely on the VMware VDDK kit.