Difficulty: Medium, Asked-in: Google, Microsoft, Adobe, Goldman Sachs, Qualcomm
Quick sort is one of the fastest sorting algorithms, based on the divide and conquer approach to problem-solving. It is also an excellent algorithm for learning recursion and the analysis of recursive code.
Let's start by exploring the pattern of a sorted array: every value is greater than all values on its left and less than all values on its right.
From this observation, we can try to think in reverse: suppose we partition an unsorted array around some input value (pivot) into two parts, so that values less than the pivot end up in the left part and values greater than the pivot end up in the right part.
After this partition process, the pivot is placed at its correct position in the sorted output. Now we have a situation similar to a sorted array, with one difference: the left and right halves are still unsorted. If we observe further, both unsorted halves are smaller sub-problems of the original sorting problem, i.e., sorting problems for smaller input sizes.
If we sort both halves recursively, the whole array gets sorted: all values on the left of the pivot are less than the pivot, the pivot is at its correct position, and all values on the right of the pivot are greater than the pivot. Think!
The above approach is a divide and conquer approach to problem-solving: we divide the sorting problem into two smaller sub-problems using the partition algorithm and solve each sub-problem recursively to get the final sorted array. We repeat the same partition process for the smaller sub-arrays until we reach the base case.
Divide part: We divide the problem into two smaller sub-problems by partitioning the array X[l…r] into two smaller subarrays around the pivot. The partition process returns pivotIndex, i.e., the index of the pivot in the sorted array.
Conquer part: Now we solve both sub-problems by sorting the two smaller subarrays recursively. We need not worry about the solutions to the sub-problems, because recursion does this work for us.
Combine part: This is trivial: once both smaller subarrays are sorted, the whole array is sorted. In other words, there is no need to write code for the combine part.
Suppose we use the function quickSort(int X[], int l, int r) to sort the entire array, where l and r are the left and right ends. The initial function call will be quickSort(X, 0, n - 1).
Let's define a partition(X, l, r) function that divides the array around a pivot and returns pivotIndex. We will select the pivot value inside the partition function. A critical question to think about: why do we choose the pivot value inside the partition function?
pivotIndex = partition(X, l, r)
The base case is the smallest version of the problem, where the recursion terminates. What would be the base case? If we find l ≥ r during a recursive call, the sub-array is either empty or has just one value left, so it is already sorted.
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // Partition X[l...r] and get the sorted position of the pivot.
        int pivotIndex = partition(X, l, r);
        // Recursively sort the subarrays on both sides of the pivot.
        quickSort(X, l, pivotIndex - 1);
        quickSort(X, pivotIndex + 1, r);
    }
}
How do we divide the array around the pivot so that values in the left part are less than the pivot and values in the right part are greater than the pivot? Can we do it in place? Let's think!
Suppose we pick the rightmost value as the pivot, i.e., pivot = X[r], and scan the array from left to right. Whenever we find an element less than the pivot, we place it in the starting part of the array and move forward; otherwise, we ignore that element and move forward. How do we implement this? Let's think!
int partition(int X[], int l, int r)
{
    int pivot = X[r];
    int i = l - 1;   // boundary of the region holding values less than the pivot
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] < pivot)
        {
            // Grow the less-than region and move X[j] into it.
            i = i + 1;
            swap(X[i], X[j]);
        }
    }
    // Place the pivot just after the less-than region, at its sorted position.
    swap(X[i + 1], X[r]);
    return i + 1;
}
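To see the whole algorithm end to end, here is a minimal runnable sketch in C based on the pseudocode above. The swapValues helper is our own addition: C passes arguments by value, so the pseudocode call swap(X[i], X[j]) needs a helper that works on array indices.

#include <stdio.h>

void swapValues(int X[], int i, int j)
{
    int temp = X[i];
    X[i] = X[j];
    X[j] = temp;
}

// Lomuto-style partition with the rightmost element as the pivot.
int partition(int X[], int l, int r)
{
    int pivot = X[r];
    int i = l - 1;   // end of the region holding values less than the pivot
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] < pivot)
        {
            i = i + 1;
            swapValues(X, i, j);
        }
    }
    swapValues(X, i + 1, r);   // place the pivot at its sorted position
    return i + 1;
}

void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        int pivotIndex = partition(X, l, r);
        quickSort(X, l, pivotIndex - 1);
        quickSort(X, pivotIndex + 1, r);
    }
}

int main(void)
{
    int X[] = {9, 3, 7, 1, 8, 2, 5};
    int n = sizeof(X) / sizeof(X[0]);
    quickSort(X, 0, n - 1);
    for (int i = 0; i < n; i = i + 1)
        printf("%d ", X[i]);   // prints: 1 2 3 5 7 8 9
    printf("\n");
    return 0;
}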
We run a single loop and perform constant operations at each iteration. In both the worst and the best case, we make exactly one comparison per iteration; the swap operation depends on the outcome of the comparison if (X[j] < pivot). Time complexity of partition = O(n).
We use constant extra space, so the space complexity of partition = O(1).
Let's assume that T(n) is the time complexity of quicksort for n integers. We can analyze it by breaking down the time complexities of each part:
Divide part: This is equal to the time complexity of the partition algorithm, which is O(n).
Conquer part: We recursively solve two sub-problems, whose sizes depend on the choice of pivot in the partition process. Suppose after the partition, i elements are in the left subarray (left of the pivot) and n - i - 1 elements are in the right subarray.
Time complexity of the conquer part = time complexity of recursively sorting the left subarray + time complexity of recursively sorting the right subarray = T(i) + T(n - i - 1)
Combine part: As mentioned above, this is a trivial part of the quick sort algorithm. Time complexity of the combine part = O(1).
To calculate the overall time complexity T(n), we add the time complexities of the divide, conquer, and combine parts:
T(n)
= O(n) + T(i) + T(n - i - 1) + O(1)
= T(i) + T(n - i - 1) + O(n)
= T(i) + T(n - i - 1) + cn
Recurrence relation of quick sort:
T(n) = c, if n = 1
T(n) = T(i) + T(n - i - 1) + cn, if n > 1
The worst case of quicksort occurs when the partition process always picks the largest or smallest element as the pivot. In this scenario, the partition is highly unbalanced, i.e., one sub-problem has n - 1 elements and the other has 0 elements. This situation occurs when the array is already sorted in increasing or decreasing order. Think!
Let us assume that such unbalanced partitioning arises at each recursive call. To calculate the worst-case time complexity, we put i = n - 1 in the above formula for T(n).
T(n) = T(n - 1) + T(0) + cn
T(n) = T(n - 1) + cn
We expand the recurrence relation by substituting the intermediate values of T(i) (from i = n - 1 down to 1). By the end of this process, we end up with the sum of an arithmetic series.
T(n)
= T(n-1) + cn
= T(n-2) + c(n-1) + cn
= T(n-3) + c(n-2) + c(n-1) + cn
... and so on
= T(1) + 2c + 3c + ... + c(n-3) + c(n-2) + c(n-1) + cn
= c + 2c + 3c + ... + c(n-3) + c(n-2) + c(n-1) + cn
= c (1 + 2 + 3... + n-3 + n-2 + n-1 + n)
This is a simple sum of arithmetic progression.
T(n) = c (n(n + 1)/2) = O(n^2)
Worst case time complexity of quick sort = O(n^2)
When we sum the total partitioning cost over all levels of the recursion tree, we get: cn + c(n - 1) + c(n - 2) + ... + 2c + c = c(n + (n - 1) + (n - 2) + ... + 2 + 1) = c[n(n + 1)/2] = O(n^2).
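To observe this quadratic behavior concretely, here is a hedged sketch in C: it instruments the partition loop from the earlier sketch with a global comparison counter (comparisonCount is our own addition, purely for illustration) and runs quicksort on an already sorted array of 1000 elements.

#include <stdio.h>

long comparisonCount = 0;   // instrumentation only, not part of quicksort

void swapValues(int X[], int i, int j)
{
    int temp = X[i];
    X[i] = X[j];
    X[j] = temp;
}

int partition(int X[], int l, int r)
{
    int pivot = X[r];
    int i = l - 1;
    for (int j = l; j < r; j = j + 1)
    {
        comparisonCount = comparisonCount + 1;   // one comparison per iteration
        if (X[j] < pivot)
        {
            i = i + 1;
            swapValues(X, i, j);
        }
    }
    swapValues(X, i + 1, r);
    return i + 1;
}

void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        int pivotIndex = partition(X, l, r);
        quickSort(X, l, pivotIndex - 1);
        quickSort(X, pivotIndex + 1, r);
    }
}

int main(void)
{
    int n = 1000;
    int X[1000];
    for (int i = 0; i < n; i = i + 1)
        X[i] = i;   // already sorted: the worst case for a rightmost pivot
    quickSort(X, 0, n - 1);
    // Expect n(n - 1)/2 = 499500 comparisons, i.e., O(n^2) growth.
    printf("comparisons = %ld\n", comparisonCount);
    return 0;
}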
The best case of quicksort occurs when we are lucky and the partition process always picks the middle element (the median) as the pivot. In other words, this is the case of balanced partition, where both sub-problems have size n/2 each.
Let us assume that balanced partitioning arises at each recursive call. To calculate the best-case time complexity, we put i = n/2 in the above formula for T(n).
T(n) = T(n/2) + T(n - 1 - n/2) + cn
     = T(n/2) + T(n/2 - 1) + cn
     ~ 2T(n/2) + cn
T(n) = 2T(n/2) + cn
This recurrence is the same as that of merge sort, whose solution is O(nlogn). We can verify this using the recursion tree method or the master theorem: with a = 2, b = 2, and f(n) = cn, this falls under case 2, giving O(nlogn). The best-case time complexity of quick sort = O(nlogn).
The behavior of quicksort depends on the relative order of values in the input, so we assume that all permutations of the input are equally likely. When we run quicksort on random input, partitioning is highly unlikely to happen in the same way at every level of recursion. So we expect some of the splits to be reasonably well balanced and some to be pretty unbalanced.
In the average case, the partition process generates a mix of good (balanced) and bad (unbalanced) splits, distributed randomly throughout the recursion tree. For intuition, suppose good and bad splits appear at alternate levels of the tree.
Suppose the partition generates a good split at the root and a bad split at the next level. The cost of the partition process is O(n) at both levels, so the combined partitioning cost of the bad split followed by the good split is O(n). This is equivalent to a single level of balanced partitioning. So the average-case running time of quicksort is much closer to the best case than to the worst case.
From this perspective, the average-case time complexity looks like O(nlogn). For better visualization, let's assume the partition algorithm always produces a partially unbalanced split in the ratio of 9 to 1. The recurrence relation for this would be: T(n) = T(9n/10) + T(n/10) + cn. (Image source: CLRS book)
We can notice the following from the recursion tree of this recurrence: every level contributes a partitioning cost of at most cn, and the longest branch (the one that shrinks the input by a factor of 10/9 at each level) has depth log10/9(n) = O(logn). So even with a consistent 9-to-1 split, the total cost is O(nlogn).
Quicksort is an in-place sorting algorithm: we do not use extra space for partitioning. But as we know, every recursive program uses a call stack in the background to execute the recursion. So the space complexity of quicksort depends on the size of the recursion call stack, which is equal to the height of the recursion tree. As we know from the above analysis, the height of the recursion tree depends on the partition process. Think!
In the worst case, the partition is always unbalanced, and there is only one non-trivial recursive call at each level of recursion. The generated recursion tree is skewed, so the height of the tree = O(n), and the recursion requires a call stack of size O(n). Worst-case space complexity of quick sort = O(n).
In the best case, the partition is always balanced, and there are two recursive calls at each level of recursion. The generated recursion tree is balanced, so the height of the tree = O(logn), and the recursion requires a call stack of size O(logn). Best-case space complexity of quick sort = O(logn).
In the above algorithm, we assumed that all input values are distinct. How do we modify the quick sort algorithm when input values are repeated? Let's think!
We modify the partition algorithm to separate input values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. Values equal to the pivot are already in their sorted positions, so only the less-than and greater-than partitions need to be sorted recursively. The modified partition algorithm returns two indices, leftIndex and rightIndex, such that: all values in X[l…leftIndex - 1] are less than the pivot, all values in X[leftIndex…rightIndex] are equal to the pivot, and all values in X[rightIndex + 1…r] are greater than the pivot.
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // Three-way partition returns the range of values equal to the pivot.
        [leftIndex, rightIndex] = partition(X, l, r);
        quickSort(X, l, leftIndex - 1);
        quickSort(X, rightIndex + 1, r);
    }
}
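Since C functions cannot return two values, here is a minimal sketch of this idea, assuming a Dutch national flag style three-way partition. The names threeWayPartition and swapValues are our own, and the two boundary indices are returned through pointer parameters.

#include <stdio.h>

void swapValues(int X[], int i, int j)
{
    int temp = X[i];
    X[i] = X[j];
    X[j] = temp;
}

// Partition X[l...r] into three groups around pivot = X[r]:
// X[l...*leftIndex - 1] < pivot, X[*leftIndex...*rightIndex] == pivot,
// X[*rightIndex + 1...r] > pivot.
void threeWayPartition(int X[], int l, int r, int *leftIndex, int *rightIndex)
{
    int pivot = X[r];
    int lt = l;       // boundary of the less-than region
    int gt = r;       // boundary of the greater-than region
    int i = l;        // scanning index
    while (i <= gt)
    {
        if (X[i] < pivot)
            swapValues(X, lt++, i++);
        else if (X[i] > pivot)
            swapValues(X, i, gt--);
        else
            i = i + 1;   // equal to pivot: leave it in the middle region
    }
    *leftIndex = lt;
    *rightIndex = gt;
}

void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        int leftIndex, rightIndex;
        threeWayPartition(X, l, r, &leftIndex, &rightIndex);
        quickSort(X, l, leftIndex - 1);
        quickSort(X, rightIndex + 1, r);
    }
}

int main(void)
{
    int X[] = {4, 9, 4, 4, 1, 9, 4, 4, 9, 4, 4, 1, 4};
    int n = sizeof(X) / sizeof(X[0]);
    quickSort(X, 0, n - 1);
    for (int i = 0; i < n; i = i + 1)
        printf("%d ", X[i]);
    printf("\n");
    return 0;
}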
As we know from the analysis, the efficiency of quicksort depends on the smart selection of the pivot element. So there can be many ways to choose the pivot in quick sort.
In the above implementation, we chose the rightmost element as the pivot. This leads to the worst case when the input array is already sorted. A better idea is to choose a random pivot, which minimizes the chance of hitting the worst case at each level of recursion (see the randomized sketch after the snippet below). Selecting the median element as the pivot is also acceptable in the majority of cases. The following median-of-three snippet approximates this: it moves the median of X[l], X[mid], and X[r] to position X[r], which then serves as the pivot.
mid = l + (r - l) / 2
// After these three swaps, X[l] holds the smallest of X[l], X[mid], X[r],
// and X[r] holds their median, which we use as the pivot.
if (X[mid] < X[l])
    swap(X[l], X[mid])
if (X[r] < X[l])
    swap(X[l], X[r])
if (X[mid] < X[r])
    swap(X[mid], X[r])
pivot = X[r]
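For the random pivot idea, here is a minimal sketch in C, assuming the swapValues helper and Lomuto partition from the earlier sketch; randomizedPartition is our own helper name. It swaps a randomly chosen element into position X[r] and then reuses the standard partition.

#include <stdlib.h>   // rand
#include <time.h>     // time, for seeding the generator

// Assumed helpers from the earlier sketch.
void swapValues(int X[], int i, int j);
int partition(int X[], int l, int r);

// Pick a random index in [l, r], move that element to X[r],
// then run the usual rightmost-pivot partition.
int randomizedPartition(int X[], int l, int r)
{
    int randomIndex = l + rand() % (r - l + 1);
    swapValues(X, randomIndex, r);
    return partition(X, l, r);
}

void randomizedQuickSort(int X[], int l, int r)
{
    if (l < r)
    {
        int pivotIndex = randomizedPartition(X, l, r);
        randomizedQuickSort(X, l, pivotIndex - 1);
        randomizedQuickSort(X, pivotIndex + 1, r);
    }
}

// Usage: seed once with srand(time(NULL)) before the first call,
// then call randomizedQuickSort(X, 0, n - 1).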
Enjoy learning, Enjoy coding, Enjoy algorithms!