Sunday, July 31, 2016

VSAN 6.2 Deduplication and Compression Ratios

One question that comes up in almost every VSAN discussion is about the deduplication technology and the amount of savings it can help us achieve.
Well, as in many similar discussions around dedupe in general, the answer is "It depends"!
I know that may sound like a cliche, but it's a fact, and to understand it better we need to look at how deduplication actually happens in a bit more depth.

Deduplication in VSAN

While deduplication is not a new concept, the principle remains the same in VSAN as well.
VSAN 6.2 uses the SHA-1 hashing algorithm and deduplicates at a 4K block granularity. It happens right before the data is written/de-staged from the cache tier to the capacity tier.
VMware terms this "Nearline Deduplication".

Using the hashing algorithm, VSAN creates a fingerprint for each data block written to the capacity tier. Each new incoming data block is first compared against the existing fingerprints for a match. If a match is found, the data is not written to disk; instead, a pointer is created to the existing block. Otherwise, the unique data is written to disk and a new fingerprint is created and published.
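The fingerprint-and-pointer logic above can be sketched in a few lines of Python. This is a simplified illustration, not VSAN's actual implementation: the `store` dict stands in for unique blocks on the capacity tier, and `pointers` stands in for the per-write references to them.

```python
import hashlib

BLOCK_SIZE = 4096  # VSAN 6.2 deduplicates at 4K-block granularity

store = {}      # fingerprint -> unique block (stands in for the capacity tier)
pointers = []   # one entry per logical write: a pointer to a fingerprint

def write_block(data: bytes) -> None:
    """Nearline dedup sketch: fingerprint the block with SHA-1,
    store it only if the fingerprint is new, otherwise just point
    to the existing copy."""
    fp = hashlib.sha1(data).hexdigest()
    if fp not in store:
        store[fp] = data   # unique data: write it and publish the fingerprint
    pointers.append(fp)    # duplicate data: pointer only, nothing rewritten

# Three logical writes, two of them identical blocks
write_block(b"A" * BLOCK_SIZE)
write_block(b"B" * BLOCK_SIZE)
write_block(b"A" * BLOCK_SIZE)

print(len(pointers))  # logical blocks written
print(len(store))     # unique blocks actually stored
```

Only two of the three logical blocks consume capacity, which is exactly where the savings come from.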

Deduplication provides significant capacity savings, as only the unique data blocks need to be stored.

As of the publish date of this blog, deduplication in VSAN is only supported for All-Flash configurations.

Image from: Virtual SAN 6.2 - Deduplication and Compression Deep Dive

But from a sizing and planning perspective, we need to work out how much of a deduplication and compression factor we can realistically assume. Let's understand that in detail.

Deduplication and Compression Factor

The dedupe and compression factor largely depends on the kind of workload running on VSAN.

In general, most workloads are suitable for VSAN if properly sized and planned, with a few exceptions. The suitability of a workload depends on how the application interacts with the VSAN cache tier: the more cache-friendly the application, the better it can take advantage of VSAN.

Now, to understand the efficiency factor in detail, let's take the example of a VDI workload hosted on VSAN.

A VDI machine is made of...
   a. OS + Application Tier 
   b. User Data Tier

In a typical VDI setup you would create the OS+App tier (boot disk) from a Golden Image, so we can expect to achieve higher dedupe ratios; 4x is a reasonable expectation.
The User Data disk, on the other hand, tends to hold unique, unstructured data, so the deduplication factor might not be as high; 2x is a reasonable expectation.

With that as the consideration, let's see how we calculate the overall raw disk capacity requirement.
Take a VDI setup with 100 users, each with OS Disk = 100 GB and User Disk = 40 GB, protected with FTT=1 and RAID-5.
** assuming Linked Clones are NOT used.

Raw Storage Requirement for OS disk
100 GB * 100 users = 10 TB - Usable Capacity
10,000 GB * 1.33 = 13.3 TB - Raw Capacity (with FTT=1, RAID-5)
13.3 TB / 4 = 3.3 TB (with 4x Deduplication and Compression Ratio)

Raw Storage Requirement for User Data disk
40 GB * 100 users = 4 TB - Usable Capacity
4,000 GB * 1.33 = 5.32 TB - Raw Capacity (with FTT=1, RAID-5)
5.32 TB / 2 = 2.7 TB (with 2x Deduplication and Compression Ratio)

So, Total Raw Disk Capacity:
3.3 + 2.7 = 6 TB for 100 VDIs
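The sizing walk-through above is easy to turn into a small calculator. This is a minimal sketch of the same arithmetic; the 1.33 RAID-5 factor and the 4x/2x dedupe ratios are the assumptions stated earlier, not fixed properties of VSAN.

```python
USERS = 100
RAID5_FACTOR = 1.33  # FTT=1 with RAID-5 erasure coding (3 data + 1 parity)

def raw_capacity_tb(disk_gb: int, dedupe_ratio: float) -> float:
    """Raw TB needed after RAID-5 overhead and dedupe/compression savings."""
    usable_tb = disk_gb * USERS / 1000          # usable capacity, in TB
    raw_tb = usable_tb * RAID5_FACTOR           # add FTT=1 RAID-5 overhead
    return raw_tb / dedupe_ratio                # apply assumed savings ratio

os_tb = raw_capacity_tb(100, 4)   # OS+App tier: 100 GB/user, 4x assumed
user_tb = raw_capacity_tb(40, 2)  # User Data tier: 40 GB/user, 2x assumed

print(round(os_tb, 1))            # OS tier raw TB
print(round(user_tb, 1))          # User Data tier raw TB
print(round(os_tb + user_tb, 1))  # total raw TB for 100 VDIs
```

Swapping in different per-user disk sizes or dedupe assumptions lets you re-run the same estimate for other workload profiles.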

Please refer to the link below for detailed sizing scenarios.
