This article is intended for administrators wishing to use SvSAN as the storage for their backup repository.
Any storage device can be used as a repository with applications such as Veeam Backup & Replication, and SvSAN is no different; its value proposition here is synchronously mirrored storage.
We typically think of SvSAN as the storage for our production workloads; however, it can also be attached via iSCSI directly to a Windows server (bare metal or virtual) and used with Veeam as a backup repository.
Having a software-defined-storage platform such as SvSAN serve up some space for these backups brings benefits in terms of data resiliency.
The SvSAN storage will likely be mirrored, meaning two “copies” of this backup data exist, potentially cached in memory and on SSD. The volume can also be encrypted, and the platform is flexible enough to be tailored for this use, with hardware selected to balance performance against cost.
Having reliable storage from which to restore in the event of a catastrophe is as important as backing up in the first place. Gone are the days of offsite tape backups and rotating disks being taken home by staff, and in their place are live backup repositories with fully functional connectivity able to replace the core infrastructure either in full or in part with just a few clicks and with no physical movement of data.
With this in mind, what differences are there between these use cases, and what settings/features should be considered when troubleshooting problems or optimizing efficiency?
SvSAN was tested and validated as the storage for a backup solution, acting as both the source and the target of the investigation.
StorMagic recently applied for and passed Veeam Ready alliance partnership testing, which consisted of running a suite of tests against SvSAN storage in a controlled way.
The tests ranged from a simple backup of VMs to a full synthetic backup (comprising data reconstructed from existing backups rather than re-copied from the source), alongside a simple restore and an instant recovery, where the VM actually runs from the backup repository on the host it originally ran on. Together these test network connectivity as well as the suitability of the backup storage itself to support these features in the real world.
Veeam Ready Test Setup
Veeam test infrastructure setup
2x HPE DL380 Gen9 servers
- Intel(R) Xeon(R) CPU E5-2699 v4 22C@2.20GHz
- 768GB RAM
- HPE Smart Array P440
--2x 1.2TB RAID1 – VMware and StorMagic
--6x 1.2TB RAID10 – Source Datastore and Target Repository
2x1GbE management connection (10.10.80.0/24)
2x10GbE connection (direct attached) (192.168.111.0/24 & 192.168.222.0/24 respectively)
RAID and performance settings
The submitted results quoted throughout this document were run using a 6x 10K SAS drive RAID10 setup for both source and repository. Two other setups were also tested: an initial RAID5 and a 4x drive RAID10.
Initially the server was set up with 8 disks in RAID5, with two virtual disks provisioned from that, similar to the documentation below:
This leveraged a small ~250GB boot drive hosting ESXi and the SvSAN VSA VM, and a ~7.5TB LUN assigned to each StorMagic VSA. After running the full set of tests, neither the backup nor the restore performance numbers were sufficient to pass.
The system was then reconfigured with two 4-drive RAID10 arrays: one for ESXi/StorMagic, and the other as source and repository. This came closer, but still failed the synthetic full backup test, which relies heavily on reads and writes within the same physical disks.
Finally the servers were configured with a 2x drive RAID1 for ESXi/StorMagic and a 6x drive RAID10 for source/repository.
This table demonstrates that while read speeds increase greatly as the number of drives increases, write IOPS increase dramatically less for a RAID5 setup.
Striping data across multiple drives spreads the locations where a block's data resides, and hence increases the speed at which it can be read.
Every RAID5 write, however, requires a parity calculation and a parity bit to be written, relying on the controller's write-back cache; a sustained pure-write workload such as a backup will fill that cache and performance slows significantly.
Moving first to a 4-drive RAID10 offered twice the write performance, and going to a 6-drive RAID10 increased that by a further 50%, for a 3x write performance increase overall.
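The read/write scaling described above can be sketched with the classic RAID write-penalty model. This is an illustrative calculation only, not figures from the test runs; the ~150 random IOPS per 10K SAS drive is an assumed ballpark value.

```python
# Illustrative sketch (assumed figures, not from the test results): the
# classic RAID write-penalty model applied to the three layouts tried here.
PER_DRIVE_IOPS = 150  # assumed ballpark for one 10K SAS drive

# RAID5 turns each write into 4 back-end IOs (read data, read parity,
# write data, write parity); RAID10 simply mirrors each write (2 IOs).
LAYOUTS = {
    "8x RAID5":  {"drives": 8, "write_penalty": 4},
    "4x RAID10": {"drives": 4, "write_penalty": 2},
    "6x RAID10": {"drives": 6, "write_penalty": 2},
}

def raid_iops(drives: int, write_penalty: int, per_drive: int = PER_DRIVE_IOPS):
    """Return (read_iops, write_iops) under the simple write-penalty model."""
    raw = drives * per_drive
    return raw, raw // write_penalty

for name, cfg in LAYOUTS.items():
    reads, writes = raid_iops(cfg["drives"], cfg["write_penalty"])
    print(f"{name}: ~{reads} read IOPS, ~{writes} write IOPS")
```

Under this model the 6-drive RAID10 delivers 50% more write IOPS than the 4-drive RAID10 (450 vs 300 with the assumed per-drive figure), matching the scaling described above, while RAID5 reads scale with drive count but its writes are held back by the 4x penalty.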
Results obtained for all three setups are included and labelled for comparison.
VMware ESXi 7.0.3
VMware vCSA 7.0.3
StorMagic SvSAN 6.3
Veeam Backup and Replication 11 (Windows Server 2022, 16 cores, 32GB RAM)
SvSAN Storage configuration
Server 1: 6x 1.2TB 10K SAS RAID10 providing 3.27TB, presented to SvSAN via Raw Device Mapping, then shared as a simple target to both hosts as the “VM Store” VMFS datastore
Server 1 will host all the Guest VMs included in the test, including the managing vCenter, and a Linux Jump VM created to access the system.
Server 2: 6x 1.2TB 10K SAS RAID10 providing 3.27TB, presented to SvSAN via Raw Device Mapping, then shared as a simple target to the Veeam VM as “VeeamRepository – V:\”
Server 2 will host the Veeam Backup and Replication Windows VM, running on the SvSAN shared datastore from Server1.
VeeamRepository Target on svsan-b
This is just a simple target presented only to Windows (note the initiator added)
Target mounted in Windows iSCSI initiator
The Windows VM, with a virtual NIC on the same vSwitch as the pair of VSAs used for iSCSI, was able to mount the target over a 10GbE connection using a single path.
Disk initialised, and a new simple volume created, ReFS 64K allocation size
The Veeam test conditions stipulated a ReFS-formatted disk with a 64K allocation unit size, as above. Veeam uses a 1MB block size before compression; at the typical 2x compression ratio, each block written is 512KB or less. Aligning this with the file system, and in turn the storage subsystem, can have a significant impact on write performance.
For example, the 64KB allocation unit used here means Veeam requires 8 IOs per 512KB block (8 x 64KB), half the 16 IOs a 32KB allocation unit file system would need for the same 512KB block (16 x 32KB).
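The allocation-unit arithmetic above can be checked with a quick back-of-the-envelope calculation. The 512KB figure is the assumption from the text (1MB Veeam block at ~2x compression):

```python
# Back-of-the-envelope check of the allocation-unit arithmetic.
# Assumption (from the text): Veeam writes ~512KB blocks after compression.
BLOCK_KB = 512

def ios_per_block(allocation_unit_kb: int, block_kb: int = BLOCK_KB) -> int:
    """Filesystem IOs needed to write one backup block."""
    # Ceiling division, in case the block is not an exact multiple of the AU.
    return -(-block_kb // allocation_unit_kb)

print(ios_per_block(64))   # 8 IOs per 512KB block with a 64KB AU
print(ios_per_block(32))   # 16 IOs with a 32KB AU -- twice as many
```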
Veeam file system settings
Once created and formatted, the volume can be selected as the backup repository to use for testing, in this case V:\ within Windows.
Use this newly created volume as a backup repository
On Server 1, the storage is presented more traditionally as a simple target to both hosts using the same underlying disk configuration.
Simple target from svsan-a on Server 1
Datastore mounted on both hosts
Veeam Ready – Repository
The Veeam Ready partnership we applied for involves testing the storage solution to ensure it meets the standards required to be considered a “Veeam Ready – Repository”.
Veeam Ready alliance partnership categories
There are other categories, shown above; for more information, see: https://www.veeam.com/alliance-partner-technical-programs.html?page=1
Part1- VM-Based Testing
Four test VMs are deployed and a standard backup job is created.
Backup test job of the four VMs
Populating tests VMs with data
Each of the four test VMs is populated with random data using the console of each machine, as above, creating 100GB of data and making each VM just over 105GB in total, for a total backup size of 421GB.
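As a hypothetical stand-in for the data-population step (the actual tool is part of the Veeam Ready test suite; the function name, path, and chunk size here are illustrative only), filling a disk with random data might look like this:

```python
# Hypothetical sketch of populating a test VM with random data.
# populate() and its arguments are illustrative, not the Veeam Ready tool.
import os

def populate(path: str, total_mb: int, chunk_mb: int = 1) -> None:
    """Write total_mb of os.urandom() data to a single file at path."""
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(os.urandom(chunk_mb * 1024 * 1024))

# e.g. populate("D:/testdata.bin", 100 * 1024) for ~100GB of random data
```

Random (incompressible) data matters here: it defeats compression and deduplication, so the backup sizes and throughput numbers reflect the storage's real write capability rather than the efficiency of Veeam's data reduction.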
Test 1- VM backup
Full backup run, test results as below:
6x disk RAID10
6x drive RAID10 backup Job statistics
The pass threshold was 35 minutes. In particular I was looking for a consistent write rate, and watching the bottleneck reported by the job. With 10GbE of bandwidth for the Veeam server to both read the source VM data and write it to the local target, the average throughput of 490MB/s is close to the theoretical maximum. Test results for the other RAID configurations on this server are included below.
RAID5 backup job statistics
4x drive RAID10 backup job statistics
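How close is 490MB/s to what the link can deliver? A rough budget, under assumed figures (the ~10% protocol overhead and the worst-case assumption that reads and writes share one direction of the fabric are illustrative, not measured):

```python
# Rough link budget (assumed figures, not from the test logs) for the
# 490MB/s average throughput seen on the 6x drive RAID10 run.
LINK_GBPS = 10          # nominal 10GbE line rate
PROTO_OVERHEAD = 0.90   # assume ~10% lost to TCP/iSCSI/VMFS framing

line_rate_mbs = LINK_GBPS * 1000 / 8          # 1250 MB/s raw
usable_mbs = line_rate_mbs * PROTO_OVERHEAD   # ~1125 MB/s usable
# Worst case: source reads and repository writes contend on the same
# direction of the fabric, halving the per-stream budget.
per_stream_mbs = usable_mbs / 2               # ~562 MB/s

print(f"usable: ~{usable_mbs:.0f} MB/s, per stream: ~{per_stream_mbs:.0f} MB/s")
```

Against a worst-case per-stream budget of roughly 560MB/s, a sustained 490MB/s suggests the network, not the RAID10 storage, was the limiting factor on this run.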
Test 2- VM restore
Next was a full VM restore of all four test VMs. The first step was to create a backup chain of incremental delta backups on which the full restore would be based. Restoring from an incremental backup puts additional strain on the backup storage: it must reconstruct the data from the initial full backup combined with the subsequent changes, then replace the original VM in its original location on the ESXi host.
incremental data creation
First, an additional 10GB of test data was created on each test VM, then a backup run to capture that changed data (approx. 40GB). This was repeated three more times, for a total of 5 restore points in the repository.
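The chain arithmetic above works out as follows (figures taken from the text):

```python
# Quick check of the backup chain described above (figures from the text).
vms, delta_gb_per_vm = 4, 10
incremental_runs = 4                  # the first run plus three repeats

per_run_gb = vms * delta_gb_per_vm    # changed data captured per run
restore_points = 1 + incremental_runs # the initial full plus four increments

print(per_run_gb, restore_points)     # 40 5
```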
Repository confirming 5 restore points per VM
Restore job set to original location
The pass threshold for this test was again 35 minutes.
VM restore 6x drive RAID10
Results for the previous configurations are included below:
VM restore RAID5
The RAID5 results show how much slower this process was, essentially copying four 140GB VMs over the network; the test took 42 minutes for the longest-running VM.
VM restore 4x drive RAID10
Both RAID10 setups passed the test, with the 6-drive setup taking 21 minutes and the 4-drive setup 28 minutes.
Test 3- Instant VM Recovery
In this test, a further four test VMs were created, this time with no randomized data. All of the test VMs (the four with populated data and the four newly created) are then instantly recovered by mounting the Veeam repository to the ESXi host and running them directly from the backup storage.
Instant VM recovery
Each VM is registered as a new VM alongside the original with a “_restored” suffix in its name.
Instant recovery job settings
Adding suffix to newly registered VMs
Each recovered VM takes on the original VM’s UUID, effectively replacing it in the infrastructure. This is particularly useful if the production storage is offline but the compute is up and running, allowing the Veeam repository to stand in for the storage in place. The restore process is also reversible, after which the original VM can be powered back on.
Instant VM recovery status
Once all 8 VMs are running from the backup storage as instant recovery VMs, a test is run from the console of one of them to measure storage latency. Results are included below:
Instant VM Recovery 6x drive RAID10
Instant VM Recovery RAID5
Instant VM Recovery 4x drive RAID10
The initial RAID5 results were not enough to pass: the test looked for a maximum average latency of 40ms at 4k and 70ms at 64k. Only the 6-drive RAID10 passed this test. The 4-drive RAID10 test was not repeated after the erroneous 64k results recorded above. The final submitted results were much better, at around 30ms for both 4k and 64k.
Test 4- Synthetic Full Backup
This was the toughest test in the suite: it not only required some setup to get running (adding a registry key to activate the feature), but also put the biggest strain on the infrastructure and, with a pass time of 60 minutes, was the longest of all the tests completed.
In terms of data, a synthetic full backup is identical to a regular full backup: it produces a VBK file containing data from the whole VM. The difference between an active and a synthetic full backup lies in the way VM data is retrieved. When a synthetic full backup is performed, Veeam Backup & Replication does not retrieve VM data from the source datastore. Instead, it synthesizes a full backup from data already in the backup repository: it accesses the previous full backup file and the chain of subsequent incremental backup files, consolidates the VM data from these files, and writes the consolidated data into a new full backup file. The resulting synthetic full backup file contains the same data it would have had an active full backup been created. Because the new backup file is synthesized from existing backups already stored in the repository, this workload has very different I/O characteristics from a typical active full backup.
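The consolidation logic can be modelled in a few lines. This is a toy model of the general technique, not Veeam's actual on-disk format: each backup is just a mapping of block IDs to data, and the newest version of each block wins.

```python
# Toy model (assumption: not Veeam's real format) of building a synthetic
# full backup: merge the last full with its incremental chain, newest wins,
# reading only data already in the repository.
def synthesize_full(full: dict, increments: list) -> dict:
    """Each backup is {block_id: data}; increments are ordered oldest first."""
    synthetic = dict(full)      # start from the previous full backup file
    for inc in increments:      # fold in each incremental delta in turn
        synthetic.update(inc)   # later versions of a block override earlier
    return synthetic

full = {1: "a0", 2: "b0", 3: "c0"}
incs = [{2: "b1"}, {3: "c1"}, {2: "b2"}]
print(synthesize_full(full, incs))  # {1: 'a0', 2: 'b2', 3: 'c1'}
```

The I/O implication is visible in the model: every block is both read from and written back to the same repository disks, which is why this test hammers mixed read/write performance rather than sequential writes.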
Synthetic full backup 6x drive RAID10
Synthetic full backup RAID5
Synthetic full backup 4x drive RAID10
In this test, the only setup to pass was the 6-drive RAID10, where write performance was significantly higher across the backup job. Sustaining that write performance alongside the heavy reads needed to reconstruct the backup proved essential.
Part 2- NAS-Based Testing
This second part of testing involved deploying another Veeam Ready test VM and populating it with 23GB across 1 million files, which is then presented on the network as an NFS share. Note that the 10Gb storage network was used to mount the NFS share from the Veeam server, to improve access and performance.
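As an illustrative sketch (the paths, directory layout, and file size here are assumptions; the test suite uses its own tooling), a corpus of this shape works out to an average of roughly 24KB per file:

```python
# Illustrative sketch (assumed layout) of building a small-file corpus like
# the one used here: ~23GB spread across 1,000,000 files.
import os

def make_corpus(root: str, n_files: int, file_bytes: int, per_dir: int = 1000):
    """Create n_files files of file_bytes each, per_dir files per subdirectory."""
    for i in range(n_files):
        d = os.path.join(root, f"dir{i // per_dir:04d}")
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, f"f{i:07d}.bin"), "wb") as f:
            f.write(os.urandom(file_bytes))

# 23GB / 1,000,000 files is ~24KB average, so e.g.:
# make_corpus("/export/nas", 1_000_000, 24 * 1024)
```

A workload of a million tiny files stresses per-file metadata handling (enumeration, open/close, attribute reads) far more than raw throughput, which is exactly what distinguishes NAS backup testing from the VM-image tests in Part 1.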
Test 1- Full Backup of NAS Share
NAS Backup 6x drive RAID10
NAS Backup RAID5
The two results differ most on this test, with the RAID5 setup unable to meet the test requirement and taking over twice as long as the RAID10 setup.
Next were three incremental updates to the NAS data, where the application simply “touches” all 1 million files, forcing them all to be checked and backed up incrementally. Results for both RAID10 and RAID5 are included below for all three incremental runs.
NAS incremental backup 1 - RAID10
NAS incremental backup 2 - RAID10
NAS incremental backup 3 – RAID10
NAS incremental backup 1 - RAID5
NAS incremental backup 2 - RAID5
NAS incremental backup 3 - RAID5
Test 2- NAS File Restore
This final test was a functional test, restoring a file from the backup to the local C: drive on the Windows server. It completed successfully in all testing scenarios.
NAS File Restore