Sorting bits into bytes...

Exploring VMware Performance Monitoring and vSAN Optimization

## Introduction
Around 10 to 15 years ago, I was tasked with finding a way to monitor the performance of a VMware infrastructure. Back then, I turned to a tool called vCOPs, short for vCenter Operations Manager. It has since been rebranded to vRealize Operations Manager and, most recently, Aria Operations, but its essence remains rooted in operations management. Initially, I measured performance through CPU and memory utilization: the busier, the better. As I discovered, though, performance monitoring extends beyond these metrics, which eventually led me to dig into vSAN optimization.

## Performance Monitoring
In those early days, my performance indicators centered on CPU, memory, and latency. Over time, I realized that effective performance monitoring involves more than just these metrics. Online resources on this topic are abundant, so in this post I turn to a different aspect: optimizing vSAN performance.

## vSAN Optimization
Fast forward to the present: I found myself grappling with how to extract the best performance from vSAN. The hardware was in place and the design approved, so I turned to storage policies as the means to achieve this optimization. After considering cluster size and overall requirements, I settled on two vSAN policies: RAID 1 and RAID 5. This post aims to keep the discussion simple, avoiding technicalities such as the number of stripes or read cache reservation. Instead, I focus on the fundamental choice between mirroring (RAID 1) and erasure coding (RAID 5). These options involve trade-offs: RAID 1 writes faster because it only writes full copies and has no parity to calculate or update, while RAID 5 consumes less raw capacity and spreads each object across more hosts, which can help reads, at the cost of extra I/O on every write.
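To put the capacity side of that trade-off into numbers, here is a minimal Python sketch. It assumes one failure to tolerate, a two-copy mirror for RAID 1, and a 3+1 data-plus-parity layout for RAID 5. The function name and the multipliers are illustrative, and other layouts would give different figures.

```python
# Rough raw-capacity estimate per vDisk, assuming one failure to tolerate.
# Assumption: RAID 1 keeps two full copies (2x);
# RAID 5 uses 3 data + 1 parity components (4/3x).
OVERHEAD = {1: 2.0, 5: 4.0 / 3.0}


def raw_capacity_gb(used_gb: float, raid_level: int) -> float:
    """Return the approximate raw capacity consumed at the given RAID level."""
    return used_gb * OVERHEAD[raid_level]


if __name__ == "__main__":
    used = 1000.0
    mirror = raw_capacity_gb(used, 1)
    erasure = raw_capacity_gb(used, 5)
    print(f"RAID 1: {mirror:.0f} GB, RAID 5: {erasure:.0f} GB, "
          f"difference: {mirror - erasure:.0f} GB")
```

Under these assumptions, a vDisk with 1000 GB used takes roughly 2000 GB of raw capacity when mirrored and roughly 1333 GB with erasure coding, which is why large vDisks are the interesting candidates for RAID 5.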

## Introducing the Super Metric
In pursuit of the optimal storage policy, I formulated a Super Metric, a mathematical formula designed to dynamically assess the best fit. It checks three conditions on each virtual machine: disk space used, write IOPS, and read IO percentage. When all three are met it returns 5 (RAID 5); otherwise it returns 1 (RAID 1). The goal is to guide the selection of an appropriate storage policy based on the needs of each virtual machine.

count({This Resource: diskspace|used, depth=1, where=($value >= 1000)}) &&
count({This Resource: virtualDisk:Aggregate of all instances|numberWriteAveraged_average, depth=1, where=($value <= 250)}) &&
count({This Resource: Super Metrics|vDisk Read Percentage, depth=1, where=($value >= 30)})
? 5 : 1

## Tailoring Storage Policies to Customer Needs
It’s worth noting that these conditions are adaptable to specific customer requirements. In this example, the customer is concerned with storage savings for vDisks using at least 1000 GB: if such a vDisk also averages no more than 250 write IOPS and its read IO share is at least 30%, the Super Metric returns 5; otherwise, it returns 1.
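For readers who prefer code to Super Metric syntax, the same decision can be sketched in Python. This is not how Aria Operations evaluates the metric; the `Thresholds` class and `recommend_raid` function are hypothetical names, and the defaults simply mirror the example values above so they can be swapped per customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the customer-specific limits."""
    min_used_gb: float = 1000.0    # disk space used must be at least this
    max_write_iops: float = 250.0  # average write IOPS must be at most this
    min_read_pct: float = 30.0     # read share of IO must be at least this


def recommend_raid(used_gb: float, write_iops: float, read_pct: float,
                   limits: Thresholds = Thresholds()) -> int:
    """Return 5 when all three conditions hold, otherwise 1."""
    fits_raid5 = (used_gb >= limits.min_used_gb
                  and write_iops <= limits.max_write_iops
                  and read_pct >= limits.min_read_pct)
    return 5 if fits_raid5 else 1


if __name__ == "__main__":
    # Large, read-heavy vDisk with few writes: candidate for RAID 5.
    print(recommend_raid(used_gb=1500, write_iops=120, read_pct=45))
    # Same vDisk but write-heavy: stays on RAID 1.
    print(recommend_raid(used_gb=1500, write_iops=400, read_pct=45))
    # A customer with a lower size limit can pass their own thresholds.
    print(recommend_raid(600, 120, 45, Thresholds(min_used_gb=500)))
```

Keeping the limits in one object makes it easy to tune them per customer without touching the decision logic itself.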

## Conclusion and Automation Teaser
Even with abundant vSAN disk space, the current focus is on potential storage savings for vDisks with substantial space usage. For VMs with low IOPS, a different storage policy may not yield significant benefits. As I manage well over 1000 VMs for my customer, manually tracking the best storage policy is time-consuming. Automation seems like the logical solution, and in my next blog post, I’ll share in detail how I automated this task. Stay tuned!
