Hi … found that cool RFC-1855 and thought it belongs here…
You can fix some of the layer 8 problems with RFC-1855. This is the third most common solution after RTFM and Brain restart as solution… ;o)
Hi … found that cool RFC-1855 and thought it belongs here…
You can fix some of the layer 8 problems with RFC-1855. This is the third most common solution after RTFM and Brain restart as solution… ;o)
on customer request I created a video that shows backup and single mail restore for lotus domino with Veeam Backup & Replication.
A Lotus Domino is non VSS aware (anyway this is the case under Linux). So you have only 2 options for consistent backups as IBM do not support VSS Filesystem only backups:
The question is why should I use a non Domino Backup API based backup?
For Veeam the answer is:
Enjoy the video
HP StoreVirtual VSA is a beast at cloning, because it detects Network and Storage as new devices. On the network side static MAC addresses do a good job but offline RAID because of lost disks is ugly. Glad that we have our fellow Veeam VMCT Trainer Eric Rossmanith, he found the following way to fix it:
Go to your Cluster-Storage Systems-YourNode-Storage
Click the RAID Setup Tasks Button and select “Reconfigure Tiers…”
Eric, again thanks for this great solution that is very helpful in Lab situations.
just wanted to share a very cool promo for Hyper-V Users.
If you use Microsoft System Center Operations Manager and Microsoft Hyper-V, check out the following link and profit from 100 Hyper-V Socket promo.
Veeam enhanced it´s outstanding virtualization Management Pack in v7 to support Hyper-V now. Forget you very basic Microsoft Hyper-V Management Pack. This one brings you the best tools to fix your broken environments in a very short time.
Happy hands on experience…
there are some general tips and tricks for Backup & Replication that are not directly related with Veeam Software. I will update this blog post from time to time to share these tips.
1) Format Backup Target disks with “/l” to avoid that NTFS blocks access to your very large and frequently updated (fragmented) backup files.
format /FS:NTFS /L
This will take a while and will overwrite the selected folume (data loss be carefull).
If you have Win7/Win2008R2 you need to first install the following patch: http://support.microsoft.com/kb/967351/en-us
2) Fix CPU load VMXnet3 network card bug if you use one virtualized backup server/role or an VM with high disk load:
sometime the easiest things do not work and you have a tough time to find the root cause.
A customer of mine uses an virtual Windows 8.1 as a Veeam Backup Server and also as a VMware HotAdd Proxy for Branch Offices. For security reasons the UAC and Windows Firewall was enabled and username administrator disabled.
We found 3 major challenges in this situation:
– Backup & Replication was not able to run on an other local user than “Computername\Administrator” (“Can not access admin$ share” error.
– Random disconnect of VMware Tools with stunning Backup Jobs.
– After adding the Branch office Backup & Replication Server itself as a Veeam proxy to Head Quarters Backup & Replication server, local hot add processing was not possible anymore. Manual hotadd of disks was still possible… strange
Solutions for this situation:
– Enable File&Print sharing to use another local admin user than “Administrator.
– The second one was fixed by enabling “high performance” or “Höchstleistung” at the windows power options.
– Hotadd processing problem was related to different patch levels of B&R in the branch office. The HQs Backup & Replication Server was on a higher patch level and local branch office server was not able to process hotadd anymore. Running same patch level solved it.
Happy backup…. Andy
welcome to my new blog. I will focus here round about Backup, Storage, Virtualization and tips from the field. Also I will open my blog activities to some new stuff (to be continued).
Thanks for reading and all of your support.
My old blog is still there (read only) and continue to deliver content to hundreds of visitors each day. You can find it under http://neufert-at-veeam.blogspot.com
All the best to you… have fun… Andy
maybe this is an well known thing, but I never used an “!” at vSphere Password before and experienced some “inconvenience”. I installed vCenter Server and was not able to see the vcenter and also not able to access SSO configuration, because it was not there. (Remeber SSO configuration in 5.5 was rewritten and it is normaly found directly in vSphere Web Client – Administration tab.
“!” are not an allowed character at vSphere SSO Administrator password, but Setup process allow it.
If you used it in you password, you are able to logon to Web Client but you see no vSphere Server nor are you able to see the SSO configuration area. If you do not want to install your Server from scratch, you can use the following command to change the password:
Open SSH connection
The funny thing is, that you need to take care as well, that there is not an “!” in the auto generated password ;o)
After that login to Web Client and you can access SSO configuration now. In my case the vSphere Server showed up automatically after this as well.
in the last time more and more customers test Hyper-V and migrate their workloads.
Many of them are not aware, that Hyper-V need some critical updates for a stable operation (and Backup).
Please check the following Links:
Update 1: Added Exchange & Virtualization + NFS Support statement. Split of DAG failover / 20 sec timeout tips.
Update 2: At 1.4. blogger completely destroyed my article as a April-Joke, so I had to rearrange the text again. I added some small updates to the text based on your feedback. Thanks for sending me change ideas.
Update 3: 11.April 2014 Added some planning informations round about CPU/RAM/DISK to section 2. and 3.
Update 4: 24. April 2014 Added Exchange Update Link with Exchange 2013 CU3 example.
Update 5: 23. June 2014 Added Transport Role and CAS role specific backup informations. (See chappter 5. and 6.)
Update 6: 31. July 2014 Added recommended Hotfixes to DAG recommendations section.
Update 7: 01. August 2014 Added a tip from a forum member regarding Public Folder databases and VSS timeouts.
Update 8: 06. November 2014 Added Tip “x)” that addresses VSS Timeouts.
Update 9: 27. November 2014 Corrected a statement at the CAS HA section.
Update 10: 18.05.2015 Added the IP Less DAG section.
Update 11: 20.09.2015 Added Tip y) and z) regarding existing SnapShots
Update 12: 27.01.2016 Added new VMware Exchange on vSphere guide and updated h) because of actual SnapShot changes in vSphere6 actual Patch level. As well I placed h) more prominent on the beginning of the chappter.
Update 13: 04.03.2016 Changed tips f), i) j) k) l) v) y); Changed Priority for Tips; Changed SnapShot background story; Added tips: aa), ab)
Update 14: 29.03.2016 Update the “3. VMware+Exchange Design + Background informations:” section with more tips.
Update 15: 20.04.2016 Added Restore Tips and tricks ac)-am)
Update 16: 29.06.2016 Added chappter “7.” with “Storage Spaces are not supported within a VM”.
Update 17: 31.08.2015 Added new NFS discussions and statements from the internet. Added tip an)
Update 18: 28.02.2017 Changed the NFS statement slightly to reflect my latest discussions with other Exchange experts. + Minor updates on the other topics make them better understandable. Thanks for the feedback here.
Update 19 12.10.2017 Updated most part of the text and links. Added tip “ao)” regarding vRDMs.
Update 20 02.11.2017 Tip j) was corrected. There is no reg key to influence snapshot commit parallel disk processing within same VM.
Update 15.06.2018 Tip ap) Not all Logfiles are deleted at Logfile Truncation in a DAG?
Here you can find my updated general recommendations for Exchange/Exchange DAG on VMware together with Veeam Backup & Replication.
1. Check Microsoft Exchange 2013 Virtualization Topic:
– Exchange 2013 DAG are supported on virtualization platforms.
– Microsoft say: Snapshots are not supported, because they are not application aware and can have unintended and unexpected consequences.
Because Microsoft do not support Snapshots at Exchange, you need to contact your backup or virtualization vendor for any snapshot related questions/support requests. The question is why should someone then use Snapshots (for backups) with Exchange when Microsoft do not support it? Why not use agent based backup methods? The data amount is critical. When you do Agent based backups you need to process full backups regularly. With VMware based or similar backup methods you can use Incremental Forever backup methods to minimize the backup window. Another really game changer is the restore part. With an Agent you have to transport back your TBs of data and with Image level backup methods you can restore an Exchange service with Instant VM Recovery in some minutes (usually 2-3min + boot time). Next argument is that Agents usually depends on the Exchange Version and Update. If Microsoft creates a new security patch/update and you have to wait until your Agent vendor support this, you run into potentially security issues. With Veeams Agentless backup method, it will automatically process exchange backups without a change as the VSS framework will not change for consistency. The Exchange Restore part with Veeam Explorer for Exchange will automatically load the ese.dll from the backed up exchange server and can access the data usually without an update as it uses the standard ese.dll to access the data. And last thing Veeams agentless backup is affordable compared with many other agent based Exchange backup solutions.
However if you use Virtualization based backups (VMware/Hyper-V) you need snapshots for backup. In case of Veeam by enabling Veeam Guest Processing, Veeam uses VSS to bring Exchange in an Application aware backup/restore state before snapshot are started. You can check this by Windows Event Log. There you can find Messages that say “Exchange VSS Writer <instance GUID> has processed pre-restore events successfully.” and “Exchange VSS Writer (instance GUID) has processed post-restore events successfully.”. The Event IDs are around number 96xx. They are different in each Exchange Version, so search for the text to check if your Backup application use VSS on a Microsoft recommended way.
Link to the Microsoft compatibility list for VMware: https://www.windowsservercatalog.com/results.aspx?&bCatID=1521&cpID=11779&avc=0&ava=0&avq=0&OR=1&PGS=25&ready=0
Another request that I hear frequently is that the backup vendor needs to be “certified” for Exchange backups. Over the years I did not find any kind of certification program from Microsoft for Exchange. A backup vendor can only certify the platform (Windows) and usually all backup vendors have those.
2. VMware NFS based Datastores and Exchange?
Back in the old times Microsoft had created a KB entry that stated that NFS storage is not supported. Now you will say NFS and Windows? NFS is used in many cases as main storage for VMware and the VM hard disks are written as vmdk files to it.
Originally the statement was created at a time when some non matured NFS storage had some reliability issues under VMware.
Over time Microsoft kept this NFS non supported statement in their documentation:
“All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2016 doesn’t support the use of network attached storage (NAS) volumes, other than in the SMB 3.0 scenario outlined later in this topic. Also, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.”
There is a deep discussion going on in the industry why this is case technically. As a lot of customers run their Exchange systems on NFS based VMware Datastores without a problem in the last years, the question is if the statement is still there for technical reasons.
At least from my field experience as backup architect I can say that there is a specific thing to look for when it comes to reliable VMware Backups when Exchange run on NFS. VM Snapshot commit processes on NFS datastores can lead into higher VM stun times and DAG failovers are more likely. Please check below the tips arround HotAdd processing and Latency optmizations. It looks like that specifically Hyperconverged Systems tend to run into issues there and the vendor blame the backup vendors to fix. You can easily do a test to sort this out, by creating just VMware Snapshots manually and release it later. Follow all the recommendations below and create a manual VMware Snapshot at the production exchange (load) and release it after 15 min. There should not be any fail over in the DAG cluster.
If you want to dig deeper into the NFS unsupported discussion you can read the following:
Please check out this article, it describe the main idea why NFS Datastores for Exchange loads are not an good idea. (Teasers: Exchange Error -1018 (JET_errReadVerifyFailure); NFS can abort transactions only on a best effort base and can cause corruptions in the database.)
However Microsoft added neccessary functionallity to SMB3.0, so in case you run Exchange on Hyper-V Win2012/2012R2 you can use SMB3.0 shared storage on Windows File Server 2012/2012R2. You have to place Exchange data into VHD/VHDX then.
There is lately a discussion that VMware virtualized SCSI driver will abstract the abort transaction and answers to the request virtually.
Check out Eric_W comments on the following website: https://social.technet.microsoft.com/Forums/en-US/c8b4a605-3083-4d0f-b3aa-62ea57cc6d43/support-for-exchange-databases-running-within-vmdks-on-nfs-datastores?forum=exchangesvrgeneral
He describes why this is not a final answer for NFS Storage under VMware and Exchange Databases. Exchange send abort commands to the storage and block storage follow those commands. Within NFS this command can ignore the command. So the discussed solution is that the VMware virtual SCSI driver emulate the abort command and tell the application that it was aborted. But the stage on the NFS disk side is potentially not reflecting this (corruption). As Exchange recovery process relay on the physical correct data this is problematic and the corruption detection process will report a Jet Error 1018. If you already ran into this error, check the following: http://www.exchangerecover.com/blog/exchange-server-error-messages.html
3. VMware+Exchange Design + Background informations:
Please check there the links and the whitepapers on the bottom of the page. The whitepapers containing outstanding background information for your exchange platform design.
There are also 2 interessting Webinars from VMware
Universal Tips for VMware and Hyper-V
– Do not go higher than 2-to-1 in virtual CPU to physical CPU Core ratio. Microsoft strongly recommend a ratio of 1-to-1 on the host where the Exchange Server runs.
– Do not use Hyper-Threading at the Exchange sizing. Calculate with the physical cores.
– Exchange is not NUMA aware. So if you can, please size the Exchange VM in a way, that it can run within a single NUMA node. The idea is to not give Exchange more resources just because they are free and available. Let the Exchange Server run in a single NUMA node if the resources are enough for your size. This would allow Exchange to work with the direct memory access.
– Use thick provisioning for the disks for better performance but use thin provisioning for backup and restore optimization. Now you have to choose :-)
– Give the OS disk enough spare space and place it on a fast storage system as well (do not place it on a datastore with hundreds of other VM boot volumes.). This is a very critical an important step.
– Do not enable deduplication (not supported by Microsoft) or compression (not recommended by Microsoft) on your primary storage that runs the virtualized Exchange workload.
– Use virtualization infrasturcture anti-affinity rules to avoid that members of the same DAG run on the same virtualization host.
– Place DAG members on different storages to avoid issues when a whole storage system is not accessible.
– Exchange Server usually uses a lot of memory. Don´t forget to plan enough disk space for the Pagefiles. I saw so many disk designs that have forgotten that part.
– Do not use Dynamic Memory
– Reserve 2 physical Cores for the Host OS
– If you perform on Veeam Host Backups don´t forget to add or plan the needed resources for this on the host (RAM/CPU)
– Use VHDX whenever it is possible as it has up to 64TB volumes, improved sector alignment and more corruption resistance on power failures.
– Don´t forget to plan the space for the *.bin file (equal to memory size)
Check if you have the latest Exchange Updates installed. For Example:
Cumulative Update 3 for Exchange Server 2013
to address random backup problem with Event ID 2112 and 2180
4. More Veeam specific Exchange Tips and Tricks for DAG and mailbox role backups:
In general, if you use virtualization based backups for Exchange, there is a chance to hit one of 2 problems:
– Exchange DAG cluster failovers
– Exchange VSS timeouts (Exchange hard coded 20 second timeout between start of consistency processing and release of it).
If you design your Exchange environment in the below described way you will likely prevent those problems and you address the needed technical requirements/settings for optmized VMware DRS, Storage DRS and vMotion automatically.
Below the list of tips, you can find some insights regarding restore.
Exchange DAG uses a network based heartbeat between the DAG members. When VMware Snapshots are committed, the delta changes are written back to the original storage space. Over time VMware enhanced the process starting from a “Hold on and write a data chunks back and release the VM … repeat” to a process that is similar to Storage vMotion. It was finally implemented with the ESXi 6 U1 Februar 2016 patch levels. It depends as well on the sizes of the disks. If you want to read more about it, you can use the following 2 resources:
An aggressive DAG Cluster heartbeat (default settings) can detect one or more of that VM holds/stuns as a network outtake and may lead into a DAG cluster failover. Typically vMotion, Storage vMotion or manual SnapShot releases lead then into the same issue.
These failovers are likely transparent to the Exchange users but can cause 2 problematic situations:
For DAG clusters there are 2 ways in case of Job design:
– Repository: Per Job chain with all DAG members in same job.
– One DAG member per Job
The idea of the first one is to use Veeam Inline Deduplication to reduce data amount significant (as the DAG memebers have a lot of replicated data). But in case you run into Snapshot commit issues with Cluster failovers, this can be negative as it is likely that shortly after a cluster failover the next snapshot commit happens on the other member. The result would be double cluster failover within short time. In theory this should not be a problem, but I saw customers struggling with it because the index service consume then 100% of the CPU for a long time and User connections timed out. If you want to be 100% on save side or use per VM chain Repository setting anyway, create one Job per DAG member (or site) and schedule the Jobs at different times within the night.
To address all the above described things, check out the following tips and tricks:
Tips for preventing Exchange DAG cluster failovers:
In general you need to prevent big snapshot files. This can be achieved by:
Increase the DAG heartbeat time to avoid cluster fail over (no reboot or service restart needed, they are online after you press enter).
On a command line (with admin rights)
cluster /prop SameSubnetThreshold=20:DWORD
cluster /prop SameSubnetDelay=2000:DWORD
cluster /prop CrossSubnetThreshold=20:DWORD
cluster /prop CrossSubnetDelay=4000:DWORD
cluster /prop RouteHistoryLength=40:DWORD
RouteHistoryLength need to be double the amount of CrossSubnetTrashhold.
You can check the settings with:
After changing those settings, perform an manual Snapshot at VMware Client and release it after 3 seconds. If a DAG failover is performed, please check tips below and work with VMware Support till this process works without exchange failover. After that you can work on the Veeam side by optimizing backup infrastructure.
If you face [VSS_WS_FAILED_AT_BACKUP_COMPLETE]. Error code: [0x800423f3] and tip a) didn´t helped, check for dismounted databases and mount or delete them.
Do not use vRDM. Sometimes architect think that the last bit of performance enhancement is really needed but forget that there are several downsides of that decisions. Specifically if the VM run in Snapshot mode all writes will go to snapshot files that are usually placed next to the vmx file. You can change the folder by changing the work dir to another storage area, but overall all disks will have the snapshot file in the same folder on same storage. You will see a write performance reduction during backup (or during manual created snapshots) and snapshot commit is hell when fast IOs are neeed with low latency. => Use vmdks as the snapshot files are placed in the same storage area. If you really think that this performance difference is huge (beside that VMware had shown it is not the case 8 years ago) add 1-2 more spindles or a SSD to cover that performance gap. Your system would work much more stable and you get all the typical VMware benefits.
Use new Veeam Storage Snapshot Feature (HPE StoreVirtual including VSA, HPE 3PAR StoreServ, NetApp ONTAP, EMC VNX(e), Cisco HyperFley) if you can (after Veeam v7 release) => Reduces Snapshot Lifetime to some seconds => No load and problems at commit because of less data. (This option can be counterproductive if you experience the 20 sec VSS timeout). Veeams Cisco HyperFlex integration avoid usage of VMware VM Snapshots at all.
Use at minimum VMware vSphere 5.0 because of changes in the snapshot places and Background things. vSphere 6u1 lates patch level (Jan2016 upated) completely changed the way Snapshots are committed. I highly recommend to update to it if you have any problems round about Snapshot commit.
To reduce Snapshot commit time (and to reduce data in the snapshot), try to avoid any changes at the backup time window (User, Background processes, Antivirus, ….). Also try to avoid that on all LUNs on the storage System itself (faster writes at snapshot commit).
If you cannot avoid many changes on block level at your backup window, use (Forward) Incremental with or without synthetic fulls. Reverse Incremental will take a bit longer than the other backup methods as they perform 3x more IOs at backup target within the snapshot lifetime. This lead to longer snapshot lifetime and at the end the Snapshot removal process has to handle more data.
To reduce backup time window and snapshot lifetime, use Direct SAN or Direct NFS Mode. If you can not use one of them, use NBD-Network mode for Exchange backups. Avoid in any way HotAdd processing with NFS Datastores (which are not supported by Exchange in any way either). Keep an eye on Proxies that run on AutoDetect transport mode if they use the correct mode!
Disable VDDK Logging for Direct SAN Mode if your backups themselves run stable (ask Veeam support for the registry key and consequences). This tip is important if you have more than 10 LUNs connected to the Veeam Proxy.
As well very important is that you install and configure the correct multipath driver for your storage on the Proxy. Run the storage with host profile VMware even if the Veeam Proxy is a Windows System. (In no way use Windows Failover Cluster host profile at the Host access setting on the storage system)
To reduce Snapshot lifetime and reduce amount of data to snapshot commit, use new Veeam parallel processing with enough resources to backup all of your disks at the same time (after v7 release). => Potentially add more Proxy resources. This tip is somehow outdated, as parallel processing is now enabled by default.
Use actual VMware Versions as the newest VMware VADP/Storage API version are then available and the SnapShot commit process will get latest optimizations.
Use actual Veeam Versions (newer VDDK integration).
Install latest patches/updates for VMware ESXi/vCenter and Veeam.
Still problems: Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage !!!). It is very important to not place the OS disk on a datastore with hundreds of other OS boot disks.
Keep an eye on the Storage Latency. On average it should not be higher than 30ms for all disks (including OS) and maximum latency should not go above 40ms. Original latency numbers from Microsoft are 20ms and max. 50ms, but specifically the max latency should not go above 40ms BEFORE you perform a backup and process Snapshots.
Less VMware VM disks can help to reduce snapshot commit time. By default even with no delta change data each disk will slow down snapshot commit. This time was reduced by 4x with ESXi6U1Feb2016 patch level.
In a worst case scenario and no other tips help, you can check the following VM setting. This an undocumented VMware VM setting and you have to check with VMware the support statement. This was a tip from one of my customers with 13TB+ Exchange environment, who had a long run with VMware Support.
snapshot.maxConsolidateTime = “1” (in seconds) (again do this only together with VMware support). This setting was used with ESXi 4 and I think at least for vSphere 6 it is not usable as the SnapShot process was changed.
If you have problems with cluster failover at Backup, one option is to backup DAG member(s) that hold only inactive databases (no cluster failover because of no active databases). The Logfile Truncation will be replicated by Exchange in whole DAG in that case. This give you also the option to restart the server or services and Exchange process VSS consistency more faster afterwards. If you restart the services, take care that you wait long enough afterwards that also the VSS Exchange writers come up again, before you backup. I saw some customers restarting VSS framework and immediately had performed a successfull backup as no VSS writers where registered at that point in time.
If you add an additional DAG member server for this, you have to check with you Exchange Architect the failover planning, because you change the member count for quorum failover selection. (e.g. you have 2 Exchange DAG members on different datacenters and a whiteness disk on datacenter 3. You add an additional Exchange Server on one side, the failover is affected because of different server count at the one of the datacenters.)
If you do so please monitor the Logfile Truncation replication processes in windows ecent log. Check that Windows Logs\ MSExchangeRepl Event 2046 occurs (Backup is happening for Database XXX) and there will be as well other messages that state that replication of log truncation happened. If you perform backup of an active Database there are other log messages for replication:
Applications and Services Logs \ Microsoft \ Exchange HighAvailability \ TruncationDebug
• Event 224 – The replication service decides which logs to truncate
• Event 225 – No logs will be truncated (either not enough logs or Circular Logging is enabled )
• Event 299 – The replication service truncates the logs ( or will tell you that there is no minimum amount of logs for truncation)
Not all Logfiles are deleted in a DAG at successfull Logfile Runctaion.
Please read: https://blogs.technet.microsoft.com/timmcmic/2012/03/12/exchange-2010-log-truncation-and-checkpoint-at-log-creation-in-a-database-availability-group/
Install MS recommended fixes to avoid cluster failover problems.
Keep in mind that there are hotfixes that are recommended from Microsoft, but they are sometimes not in Windows update included!!!
Windows 2012 R2
Delete all existing VM Snapshots before you start. Existing SnapShot can slow I/O at Storage processes like Snapshot commit. As well existing snapshots will make it impossible to enable VMware Change Block Tracking. This could lead into 100% data read at any backup(Snap and Scan Backup), which cause longer Snapshot lifetimes with longer snapShot commit phases.
Maybe not directly related but some of the customer reported that they had high CPU/RAM usage and where able to fix logfile truncation problems by adding more CPU/RAM resources to the VM.
Use newest avilable VMware Tools version within the VMs. This is NOT optional.
Tips for preventing VSS (timeout) problems:
If you perform an VSS based consistency on an exchange server, hard coded 20 second timeout release the Exchange VSS writer state automatically if the consistency state are hold longer than these 20 seconds. The result is that you cannot perform consistent backups.
In detail, you have to perform Exchange VSS Writer consistency, VM snapshot and Exchange VSS writer release in these from Microsoft hard coded 20 seconds.
With newer Veeam Backup & Replication versions, Veeam will detect this automatically and perform another kind of VSS snapshot to avoid this completely. See tip x)
So if you have another backup application and run into the timeout:
If you see Exchange VSS Timeout EventLog 1296 => Change Log setting =>
Set-StorageGroup -Identity “<yourstoragegroup>” -CircularLoggingEnabled $false
Use at least Veeam Backup & Replication v8. If standard VSS processing of a Microsoft Exchange Server times out, the job will retry processing using persistent in-guest VSS snapshot, which should prevent VSS processing timeouts commonly observed with Microsoft Exchange 2010. Persistent snapshots are removed from the production VMs once backups are finished. All VM, file-level and item-level restore options have been enhanced to detect and properly handle restores from Exchange backups created with the help of persistent VSS snapshots.
In many cases Exchange can perform consistency more faster if you add more CPU/Memory to the VM. Based on customer feedback this solved many of the VSS timeout problems.
Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage as well !!!)
The worst thing you can do is to place the OS disk on a datastore with hundreds of other boot vmdk volumes on a Raid5/Raid6 storage.
Use at minimum VMware vSphere 5.0 because of changes in the snapshot creation area.
Use actual VMware Versions (newest VADP/VDDK Kits with a lot of updates in it) and actual Veeam Versions (newer VDDK Integration). And install actual ESXi/vCenter patches! => to perform snapshot processing faster
Important one on VMware side: Less VM disks will reduce snapshot creation time. Check how long it take to start a snapshot of the VM in VMware vSphere (web) client. Think about that you need to perform Exchange VSS writer consistency + VM snapshot in 20 seconds.
To optimize snapshot creation time:
Check your vcenter load and optimize it (or use direct ESX(i) Connections for Veeam VM selection, so that the snapshot creation took less time.)
Check your health an configuration of Exchange itself. I saw some installations where different problems ended up with a high cpu utilization at indexing service. This prevented VSS to work correct. Check also all other mail transport-cache settings. Sometimes the Transport Service cache, replicate shadows of the mails over and over again and nobody commit them. This is typcially the case if firewalls are not set up correctly. Sometimes only SMTP is allowed but not the Transport Service windows service interaction.
Veeam specific: Veeam performs VSS processing over the network. Check with Veeam UserGuide TCP Port Matrix that B&R Server can perform Veeam Guest Processing over the network (open Firewall Ports).
If this is not possible Veeam failback (after a timeout) to network-less VMware Tools VIX communication channel (Veeam own) In-Guest processing. If you use network-less In-Guest processing, change the veeam registry key, so that VIX based processing is performed before network In Guest processing. => No wasted time because of waiting for timeout.
However network based In Guest processing is performed faster and I recommend it.
One of the Forum member report VSS problems when a PublicFolder database is present on the server. If the above mentioned tips did not help, temporarily disable your PublicFolder DB or replicate (DAG) it to another server. If the backup then runs fine, check with your Exchange Architect the PublicFolder DB configuration. Another option can be to upgrade to Exchange 2013 because no special PublicFolder DB needed (PublicFolder are covered by normal mailbox DBs).
Delete all existing VM Snapshots before you start. Existing SnapShot can slow down SnapShot processing.
Tips for restore planning with Veeam:
It doesn´t matter if the VM that you had backed up holds active or inactive DAG enabled databases for Veeam Explorer for Exchange based restore. Enable InGuest processing of Veeam so that Veeam can interact with the OS and Exchange to bring it into a consistent state. As well it allows Veeam to set some restore awareness settings, so that the OS and Exchange detect/knows when a restore is performed.
– Instant VM Recovery/VM Restore/QuickRollback. OS will boot and because of the restore awareness settings Exchange will perform automatic recovery steps at the DB. You can find this step documented in the Event Log. Depending of the Exchange Version it can have another EventID. For Example Event ID 9618; Event Source MSExchangeIS; General Exchange VSS Writer (instance GUID) has processed post-restore events successfully.
To enable this automatic step, you need to boot the VM at restore with network.
– Veeam Explorer for Exchange based Single Iteam (Mail/Calendar/…) Restore is as well compatible with inactive and active Databases.
Veeams Explorer for Exchange will be started within the Veeam console if you want to restore a single item like mails, calendar entries,… . It will load the actual ese.dll from the Exchange Backup and access the Exchange Database Files directly out of the backup. This way Veeam is automatically compatible with all Exchange Updates and fixes even if there are changes within the DB structure (ese.dll abstract this to Veeam).
With this process you have to think about 2 things at the design:
– ese.dll can load (by the way Veeam uses the dll) “only” 64 databases. database number 65 will not be loaded. But you can unload one or more of the other 64 database and add the others manually.
– Veeam need a specific time per database for the mounting. The time depends on the backup target storage that you use. For example if it will take 10 seconds per database, it will take 1 Minute to mount 6 databases, but it will take 10 minutes to mount 60 databases. So a kind of best practices is to keep the database count at a reasonable level. A workaround can be to start the Win File Level Recovery and to start the Veeam Explorer for Exchange from there and select only the needed databases manually to shorten up the restore process.
If you “only” have a crash consistent backup or snapshot, you can use the Veeam Explorer for Exchange as well. Start the Veeam Win File Level Recovery Wizard and from there the Veeam Explorer for Exchange. As Veeam didn´t had the chance to collect the database places at backup (crash consistent) you can add each database and it´s logfile manually.
Veeam support single object restore of Public Folders as well. Depending on the version and settings you maybe have to export data to a pst file with the Veeam Explorer for Exchange wizard if you can not send the data back to the public folder database/mailbox. The PST data can be mounted to a Outlook and you can copy and paste back the data if needed. If you face an empty Public Folder with Veeam Explorer for Exchange, please contact Veeam support there is a hotfix available.
If you want to use Export to PST, Veeam needs an Outlook 64bit on the server/workstation which run the Veeam Explorer for Exchange.
PST Explort places a documentation in the root of the pst that is compatible with the demand of court and lawyers. A restore protocoll which document all restores can be found at the Veeam UI at the “History-Restore” part.
If you are not allowed to access mails directly at restore, you can use the Veeam Enterprise Manager and it´s iteam restore possibilities to restore only changed/deleted files from a specified time period (without to get access or see the mails themself). This is for example sometime needed to be inline with the law and employee council (Betriebsrat).
Veeam Explorer for Exchange uses Exchange Web Services (EWS) for restore. Please check that EWS is working correctly and the Veeam Explorer can acess a CAS Server by TCP 443. DNS is in most cases needed. As the Explorer write directly into the mailbox of the user, the at restore defined user needs to have write access to the mailbox. You can define the restore user that write to the mailbox at restore. It can be the original user, and admin user or a user that you give temporary access to the mailbox for restore.
Veeam do not use any recovery database processes. So you can save this space on the Exchange Server. As these process isn´t used for any restore you save a lot of time at restore as well and reduce complexity dramatically. On the other hand Veeam can not restore a single database compleatly with the wizard. You have to create a new database with the user mailboxes and let the Veeam Explorer for Exchange restore the mailbox data. This protects you from any database corruption that is maybe within the backup but is not detected when you do a legacy restore of the database. However if you want to perform a database restore, you can use the Veeam file restore wizard to restore all needed database and logfiles and perform the other needed steps manually according to the Microsoft recovery steps: https://technet.microsoft.com/en-us/library/dd876954(v=exchg.150).aspx There are as well a lot of more illustrated recovery examples available at the internet. Just google for them.
For Veeam restore it doesn´t matter if you have one DB by vmdk or multiple. If you place the Exchange Server on VMware, you shold keep the VM disk count low to streamline snapshot commit processes that are used at backup (see above). In most cases you should keep the count below 8-10 and use ESXi6 U1 with Feb 2016 patch level that reduces overhead at snapshot commit dramatically.
The data amount per DB do not matter for Veeam, beside the data amount that you have selected for restore. The mount of the database at Veeam Explorer for Exchange is not really affected by the database size.
5. More Veeam specific Exchange Tips and Tricks for transport role backups
Shadow redundancy can help to protect your undelivered messages by replicating them immediately to a second transport role. Even this is more important with Exchange 2013 because DAG members are transport role servers as well.
Check with your Exchange Architect, that you backup at least one Transport server that holds an original or a shadow of your undelivered mails.
If you backup only one Exchange 2013 DAG member that holds only offline databases (or Exchange 2010 DAG members that has transport role enabled), check the requirements section of this article to analyse if you backup everything what you need.
You also need to check that your backup software works application aware (e.g. Veeam Guest processing) and that in case of a restore the transport role detect the restore scenario as well and do not deliver already sent messages a second time. This is done by VSS processing at restore. VMware Tools quiescence at backup cannot achieve this by design.
A common configuration is to use a full blown Exchange Transport Role Server, instead of an Exchange Edge as an SMTP Gateway to the internet. In many cases, this is done to save some public IP addresses for running CAS Server on the internet on the same server.
You will likely place this Server in your DMZ and open only needed port 25 for mail transfer. This isn´t enough if you have enabled shadow redundancy because the shadows are not being committed anymore. Disabling shadow redundancy only because of this isn´t acceptable as they potentially lead to some lost mails if you lose one of the transport servers. Think about you lose one Server with and mail that has “the opportunity of your (your companies) life” mail in it.
Backup of IP Less DAG with Exchange 2013 SP1:
There is an cluster operation mode for exchange mailbox role where you do not need any Cluster IP address and name. You need to check if your backup software can handle this. Veeam Backup & Replication uses the VSS Exchange writer to bring the Information Store in a consistent state. No change needed IP Less DAG. The restore is done by EWS and as well not affected. Good news for Veeam users.
6. More Veeam specific Exchange Tips and Tricks for CAS role backups.
You had 2 options with Exchange 2010 to achieve CAS redundancy.
1) External Layer 7 load balancer
Two or more CAS Server are placed behind an (redundant) load balancer that hold a cluster IP address. The Load balancer is aware if your CAS Server answers on a service port and how long the answer time is. This is the recommended way of doing CAS redundancy and no special things are needed for CAS backup.
2) Windows Network Load Balancing (NLB).
Windows is used to create a cluster IP address and all servers can answer to requests. Main problem here is that it is not service aware. So if your server is up and answering in the network but your Exchange CAS Service is down, your users do not get answers on a random base. => Invest in External Load Balancers!
If you want to try Windows NLB cluster way, you need to keep an eye on the configuration when using Veeam for backup. Veeam Guest Processing will read out VMware Tools IP address list and will use the first found IP address in the list and contact the VM. If the first IP address is the cluster address the guest processing might be connect to another server because NLB cluster route it to another VM.
So you can do one of the following:
1) As VMware Tools list IPs ascending, the NLB Cluster IP needs to be higher than the normal addresses.
2) If you cannot change the IP addresses (Public IPs for example) you can ask Veeam support for a hotfix that “Invert IP order guest processing Inverseipprder.zip”
3) Another option is to use VIX processing instead of direct network connection for Veeam Guest processing. The easiest way to do this is by blocking all communication to this VM over the network (Windows Firewall/your company firewall/…). Let port 443 open for Veeam explorer for Exchange restores.
In general you can get some background information about this here:
7. Windows Storage Spaces are not supported within a VM
I saw lately 2 customers running Exchange databases on Storage Spaces within a VM. They did this because of 2TB Storage system limitation.
It is not supported based on a lately updated articel from Microsoft:
” Storage layers that abstract the physical disks are not compatible with Storage Spaces. This includes VHDs and pass-through disks in a virtual machine, and storage subsystems that layer a RAID implementation on top of the physical disks. iSCSI and Fibre Channel controllers are not supported by Storage Spaces.”
Veeam Explorer for Exchange maps the VM volumes out of the backup to a windows system. Storage Spaces volumes are not imported automatically from windows, so the Veeam Explorer can not access the filesystem which hold the Exchange databases.
When you start a Veeam Window File Level Recovery Wizard for a VM that contains storage spaces volumes, you can see only System Volume Information folder on the drive.
Workaround if you really need mails from such a sceanrio:
Boot the VM with SureBackup in a Veeam “OnDemand” Virtual Lab. Stop the exchange service on the Virtual Lab Exchange Server and use Veeam Explorer for Exchange to access the databases for restore.
If there is no way to migrate the data to a single big volume or to spread the User Mailboxes accross individual small databases that fits the volume size needs, you can use Dynamic Disks to expand the drives. At least this is supported from Veeam side for Exchange restores. Based on Microsoft https://technet.microsoft.com/en-us/library/ee832792.aspx this is as well a supported way from Microsoft for Exchange. “Dynamic disk: A disk initialized for dynamic storage is called a dynamic disk. A dynamic disk contains dynamic volumes, such as simple volumes, spanned volumes, striped volumes, mirrored volumes, and RAID-5 volumes.”
8. Additional Tips:
Use the Veeam Forums http://forums.veeam.com and search for specific Exchange Topics, there you can find additional tips and feedback. Keep in mind that Veeam Forum is not a official support forum. If you need urgent help, please open a support ticket http://www.veeam.com/support . Testing and Proof of concept environments have support with lower priority.
Do you have feedback?
Was one of the tips helpful?
Please leave a comment.
All the best to you and success… Andy
VMware Backup from NFS (File) Datastores:
For most common VMs (90%) I would use Veeams Direct Storage (new Veeam Direct NFS) backup mode for backup and restore. Direct NFS is the fastest restore method within Veeam as it is written from scratch by Veeam and do not leverage the VMware VDDK kit.
Veeam Backup & Replication Proxy Mode Autodetection process works like this:
I did a new updated Exchange Backup Video for Veeam in German.
There are a lot of tips and tricks for general Exchange Backup at VMware, but also cover Veeam Exchange Backup and Restore.
Specific Exchange DAG settings are also discussed.
I hope to get a lot of feedback from you ;-)
got an email today from IBM – CH with 2 interessting numbers.
They say that each day 888PB will be created.
Also 90% of all data was produced in the last 2 years.
Nice Infographic about Veeam Backup & Replication v7
Funny backward compatibility to DOS … Try to create a folder with name “con”
kudos to my colleague Timothy Dewin who wrote an excellent Veeam Backup & Replication Restore Point Simulator.
Keep in mind that this is not official and has not support or garantie that it work correct.
Also deduplication and compression is dependent to the data and is from customer to customer different.
The Default values are OK from my Feeling. If you have Domino DBs this is a bit too low because of Domino own compression.
And don´t forget to select Reverse Incremental if you want max space saving.
Here you can find the tool.
Many DR scenatios didn´t reflect the need of the companies. Sometimes because of budget problems, sometimes of other things.
Why you want to bring data to another site?
HA? DataProtection? DataRecovery?
If you look at syncron mirrors. This is a typical HA szenario. If you go active-active there it can help to bring the servers back online as fast as of a boot. (VMware HA or othe cluster/failover solutions at legacy systems). Because this is an expensive one for legacy systems, this leeding to a scenario which VMs are ways better protected (HA) than my Tier 1 legacy systems. This and the advantages for maintenance (vmotion), power and cooling savings brings customer to a point that more and more Tier 1 applications are placed on virtualization.
Products that can be helpful here are IBM SVC, Datacore, Netapp, others.
Why this szenario has nothing to do with DP or DR.
You replicate only the disk data, so applications and DBs are not in a consistent state. Also the Applications and DBs are not in a Application Restore aware state. (see below)
Software errors are mirrored as well and if there are bugs in the code of the solution both sides are affected.
If you look at the storage and storage virtualization systems and their bigger and bigger fail domains you need a backup solution that fits your datarecovery needs.
Many customers looking because of the big fail domain at Replication solutions and looking for storage replication that can store more then 1 restore point (replicated snapshots).
Here you need to regard that your Servers/DBs/Applications are in the following states that you have no problems in case of restore/recovery. This is pretty the same demand for BackupSoftware.
a) Consistency (Application/OS/DBs). Basically before you do a replication/snapshot all RAM caches needs to be written to disk and no open write commands of the filesystem are there.
b) Application awareness: You have to set some settings in the OS/Application/DBs that after next boot(after restore/recovery) they jump into a mode that they needed in that secenario to avoid problems. A typical problem that is a good example for this is if you start active directory servers without this from different snapshots/recovery points you end up with an inconsistent and not supported active directory database.
So in most cases you need some software that can do this in addition to the storage snapshot replication base. IBM Flash Copy Manager or Netapp SnapManager for example. Or if you do this on the Software side Veeam (Virtualization) is a good example for this.
If you look at the size of your fail domains and your demand to bring back a lot of servers in a fail szenario. You need tools that can do this with an easy to use solution. Because of budget concerns many go only here with their backup software.
Most Backupsoftware can help to restore a Server in a timewindow you need (SLA) but if more than one server are affected this leeding into the situation where normal SLA are exceeded. There are some Backup Software that can help with this and can start systems directly out of the backup, start the OS and Applications and user can work, while later the data is transfered back to the repaired storage system. Veeam do this for example with VMs since 2010, but there are other solutions that can do this, for example NetBackup has announced that they will have that sort of recovery for VMs. Veeam holds the patents for this Instant VM Recovery.
Because this solutions uses the backup environment ressources to bring up the systems this can only be done with a limited number of systems, depending on the backup environment (20-100 VMs). So the idea was to add software based replication (Veeam for example) that has no direct interaction with storage System (storage fails can not harm this system) to replicate most critical systems to a separated datacenter over IP WAN links. There you can start (application aware) your core systems and you can recover over the next time your not so critical systems.
One cool thing I want to add from Veeam side. You are able to automatically test your Replicas (with v7 or newer) or Backups (v5 or newer) if they are able to restore your workloads. This scheduled restore checks test the OS boot, Network Connection and application Response.This include problems that are based on source side server. Simple example is a corrupt windows boot file that you do not detect in production. All backup solutions checks only the data with checksums > if same = backup/replication successfull. And you fail in case of restore. So this can be helpful to detect problems before you run into a failover scenario.
This described scenario reflect most budgets and can dramatically increase your operating safety.
Examples for a small szenario:
2 ESX Hosts with shared storage (redundant controllers)
Third ESX host on separate fire section or hosted datacenter. Which hold allsVMs as replicas.
Backup on the first side for fast restore and application/file restore.
Backup and recovery cheks on the second side of the replicas will help to prevent you from trouble in case you need your DR.
Implementation of automatic Restore Checks with SureBackup and SureReplica.
HW: Small Budget (Standard solution)
SW: Hyper-V or VMware Essentials + Veeam Essentials
Powerfull solution I think
Example for midsize – enterprise szenario:
Active-Active mirrored Storage (vdisk mirroring) for example with IBM SVC splitted over to datacenter on two fire sections. Third site with SVC Quorum.
Another datacenter many miles away with another Vmware environment which you use with Veeam replication of most critical systems.
Backup to third site (SVC Quorum) with Veeam (Fast Restore of Files/Objects/Servers).
Implementation of automatic Restore Checks with SureBackup and SureReplica.
What do you think? Yes it is very virtualization related but you get more operation safety than in legacy environments. Why not virtualize your biggest DBs on a 1:1 ratio and profit from this DR scenario. If you are concerned about VMware Snapshot commit szenarios or 2TB volume size limit, have a look at Hyper-V 3. The main idea behind this described szenario works there as well.
Excellent article about VMFS locking. Everything what you need to know what VMFS on 5.0 do now and how VAAI is using it. Please check:
Check out this old ESX3.5 article about performance differences between vRDM, pRDM and VMDK. You can see that even in these good old ESX3.5 days there was no significant performance gap. As actual VMware Volumes do not have the 2TB limitation anymore, there is no real blocker to use VMDK (and vRDM).
If you use vRDM don´t forget to reserve some space next to the vmx file for snapshots (e.g. Backup-Snapshot-helper) :
found an interessting article about raid level and impact on IOps
http://www.symantec.com/connect/articles/getting-hang-iops Please forgive me the link to the yellow ones ;o)