Notes on QuickSort: when pivot choice matters

2024-08-03 · ~5 min read

QuickSort is one of those algorithms I thought I understood until I had to debug a slow implementation in production. The textbook version runs in O(n log n) on average, but the wrong pivot strategy can drop it to O(n²) on real-world data faster than you'd expect.

Here's the standard Lomuto partition in Python:

def partition(arr, low, high):
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i + 1

The first thing I learned the hard way: always-pick-first (or always-pick-last) is bad for sorted-ish data. If you pass an already-sorted array — which happens way more often than you'd think (log files, timestamps, sorted DB rows) — each partition only removes one element. You end up with n levels of recursion instead of log n.

Median-of-three

Median-of-three handles most of these cases:

def median_of_three(arr, low, high):
    mid = (low + high) // 2
    if arr[low] > arr[mid]:
        arr[low], arr[mid] = arr[mid], arr[low]
    if arr[low] > arr[high]:
        arr[low], arr[high] = arr[high], arr[low]
    if arr[mid] > arr[high]:
        arr[mid], arr[high] = arr[high], arr[mid]
    arr[mid], arr[high-1] = arr[high-1], arr[mid]
    return arr[high-1]

You take the median of the first, middle, and last element as the pivot. It's not bulletproof — there are adversarial inputs that defeat even this — but for normal data it's a huge improvement.

What I'd actually ship

For production code, I'd just use the language's built-in sort. Python's Timsort and C++'s IntroSort both fall back to heapsort or insertion sort when recursion gets too deep, sidestepping the worst case entirely.

Still — implementing QuickSort once teaches you something about how recursion and partition choice interact. Worth the afternoon, even if you'll never ship the result.