Difficulty: Medium, Asked-in: Google, Microsoft, Adobe, Goldman Sachs, Qualcomm
Quick Sort is one of the most popular sorting algorithms that use the divide and conquer approach of problem-solving. Here are some excellent reasons to learn this algorithm :
Let’s explore the general pattern of the sorted array and think! Suppose all the elements are arranged in increasing order.
From the above observation, let’s think in a reverse way: Suppose we partition an unsorted array into two parts around some value present in the array (pivot). What will happen after the partition? Here is an idea:
Now we have a situation similar to the sorted array, but there is one difference : both left and right halves are still unsorted. If we sort both the halves, then our whole array will get sorted. Think!
If we explore closely, both unsorted halves are smaller sub-problem of the original sorting problem, i.e., a similar sorting problem for smaller input sizes. So the idea would be simple : if we sort both the halves then the whole array will be sorted because all the values in the left half are less than X[i], the pivot value is at the correct position after the partition and all the values in the right half are greater than X[i]. Think!
The above approach is a divide and conquers approach of problem-solving : we divide the sorting problem into two smaller subproblems of sorting using the partition and solving each sub-problems recursively to get the final sorted array. At each step of the recursion, we repeat the same process of partition for the smaller sub-array.
Divide part : We divide the problem into two smaller sub-problems by partition the array X[l…r] into two smaller subarrays around some pivot in the array. The partition process returns the pivotIndex i.e. index of the pivot in the sorted array.
Conquer part : Now, we solve both the sub-problems or sort both the smaller arrays recursively. We need not worry about the solutions to the sub-problems because recursion will do the rest of the work for us.
Combine part : There is no need to combine parts because after sorting both smaller arrays, our whole array will get sorted automatically. Think!
Suppose the function quickSort(int X, int l, int r) sorts the entire array where l and r are the left and right ends of the array. The initial function call would be quickSort(X, 0, n-1).
Let’s take a partition(X, l, r) function that divides the array around a pivot and returns the pivotIndex. We have selected the pivot value inside the partition function.
pivotIndex = partition(X, l, r)
This is the most critical step of the implementation. The base case would the smallest version of the problem where recursion terminates and returns the value directly. What would be the base case? If we find l ≥ r during the recursive calls, the sub-array would be either empty or has one value left. Our recursion should not go further and return from here.
How do we divide the input array around some pivot value so that values in the left part would be less than pivot and values in the right part would be greater than pivot? Can we do it in place? Let’s think!
Suppose we pick the rightmost value as the pivot, i.e., pivot = X[r], and scan the array from left to right. Whenever we found a value less than the pivot, we place it at the starting part of the array and move forward. Otherwise, we ignore the value and move forward. The idea looks simple, but how do we implement it? Let’s think!
Pseudocode of the partition algorithm
Analysis of the partition algorithm
We are running a single loop and doing constant operations at each iteration. In the worst or best case, we are making one comparison at each iteration of the loop. Swapping operation depends on the comparison if (X[j] < pivot). So time complexity = O(n). We are using constant extra space, so Space complexity = O(1)
Let’s assume that T(n) is the worst-case time complexity of quicksort for n integers. Let's analyze it by breaking down the time complexities of each process:
Divide part: The time complexity of the divide part is equal to the time complexity of the partition algorithm, which is O(n).
Conquer part: We are recursively solving two subproblems of different sizes. The size of the subproblems depends on the choice of the pivot in the partition process! Suppose after the partition, i elements are in the left subarray (left of the pivot), and n - i - 1 elements are in the right subarray (right of the pivot).
Time complexity of the conquer part = Time complexity of the left sub-problem + Time complexity of the right sub-problem = T(i) + T(n - i - 1)
Combine Part: As mentioned above, there is no operation in this part of the quick sort. Time complexity = O(1).
For calculating the T(n), we need to add the time complexities of the divide, conquer, and combine part: T(n) = O(n) + T(i) + T(n — i — 1) + O(1) = T(i) + T(n — i — 1) + O(n) = T(i) + T(n — i — 1) + cn Recurrence relation of the quick sort T(n) = c, if n = 1 T(n) = T(i) + T(n — i — 1) + cn, if n > 1
The worst-case behavior for quicksort occurs when the partition process always picks the largest or smallest element as a pivot. In this scenario, the partition process would be highly unbalanced, i.e., one subproblem with n - 1 elements and the other with 0 elements. Such a situation will occur when the array is already sorted in increasing or decreasing order. (Think!)
Let us assume that this unbalanced partitioning arises in each recursive call. So for the time complexity in the worst case, we put i = n - 1 in the above formula of T(n).
T(n) = T(n - 1) + T(0) + cn = T(n - 1) + cn
Analysis of worst-case using the substitution method
We simply expand the recurrence relation by substituting the all intermediate value of T(i) (from i = n-1 to 1). By end of this process, we ended up in a sum of arithmetic series.
T(n) = T(n-1) + cn = T(n-2) + c(n-1) + cn = T(n-3) + c(n-2) + c(n-1) + cn ………… and so on = T(1) + 2c + 3c + ... + c(n-3) + c(n-2) + c(n-1) + cn = c + 2c + 3c + ... + c(n-3) + c(n-2) + c(n-1) + cn = c (1 + 2 + 3... + n-3 + n-2 + n-1 + n) this is a simple sum of the arithmetic progression. T(n) = c (n(n+1)/2) = O(n^2) Worst case Time complexity of the quick sort = O(n^2)
Analysis of worst-case using recursion tree method
When we sum the total partitioning times for each level of recursion tree, we get => cn + c(n−1) + c(n−2) +⋯+ 2c + c = c (n + n−1 + n−2 + ⋯+ 2 + 1) = c(n(n+1)/2) = O(n^2)
Best-case analysis of quick sort
The best-case behavior for quicksort occurs when the partition process always picks the middle element as a pivot. In other words, the partition process is balanced, i.e., both sub-problems are n/2 size each. Let us assume that this balanced partitioning arises in each recursive call. So for the time complexity in the best case, we put i = n/2 in the above formula of T(n).
T(n) = T(n/2) + T(n - 1 - n/2) + cn = T(n/2) + T(n/2 - 1) + cn ~ 2 T(n/2)+ cn T(n) = 2 T(n/2)+ cn This recurrence is similar to the merge sort for which solution is O(nlogn).We solve this using both recursion tree method and master theorem. Best case Time complexity of the quick sort = O(nlogn)
Image Source — https://www.cs.dartmouth.edu/~thc/cs5-F96/lec28.html
Average-case analysis of quicksort
The behavior of quicksort depends on the relative order of the values in the input array and not by the particular values in the array. We can assume that all permutations of the input values are equally likely. When we run quicksort on random input, the partitioning is highly unlikely to happen in the same way at every level. We expect that some of the splits will be reasonably well balanced and that some will be pretty unbalanced.
In the average case, the partition process generates a mix of good (balanced partition) and bad (unbalanced partition) splits. In a recursion tree for an average case, the good and bad splits are distributed randomly throughout the tree. Suppose, for intuition, good and bad splits appear at the alternate level of the tree.
Suppose at the root of the recursion tree, partition generates a good split, and at the next level, partition generates a bad split. The cost of the partition process will be O(n) at both levels. The combination of the bad split followed by the good split produces a combined partitioning cost of O(n). Certainly, this situation is equivalent to the single level of partitioning, which overall looks similar to the scenario of a balanced partition! So the average-case running time of quicksort is much closer to the best case than to the worst case. From the above perspective, the average case time complexity looks O(nlogn).
For better visualization of average-case analysis, let’s assume the partitioning algorithm always produces a partially unbalanced spit in the ratio of 9 to 1. The recurrence relation for this would be: T(n) = T(9n/10) + T(n/10) + cn. The following diagram shows the recursion tree for this recurrence . Image source: CLRS Book
We can notice the following things from the above diagram:
Space complexity analysis
Quicksort is an in-place sorting algorithm where we are not using additional extra space in the code. But as we know, every recursive program uses a call stack in the background to execute the recursion. So the space complexity of the quick sort depends on the size of the recursion call stack which is equal to the height of the recursion tree! (Think!) As we know from the above analysis, the height of the recursion tree depends on the partition process.
In the worst-case scenario, the partition is always unbalanced, and recursion decreases by 1 at each level. In such a scenario, the generated tree is skewed in nature, and the height of the tree = O(n). So we require a call stack of size O(n). Worst-case space complexity = O(n)
The partition is always balanced in the best-case scenario, and recursion decreases by 1/2 at each level. In such a scenario, the generated tree is balanced in nature, and the height of the tree = O(logn). So we require a call stack of size O(logn). Best case space complexity = O(logn)
In the above algorithm, we assume that all input values are distinct. But how do we modify the quick sort algorithm when input values are repeated? Let’s think!
In the above partition algorithm, quicksort exhibits poor performance for inputs with many repeated elements even though we choose good pivot values. To understand this better, suppose all the input elements are equal. At each stage of recursion, the left partition is empty, and the right partition has only decreased by one element. This is a situation just similar to the worst-case scenario of the quick sort. But what would be a good solution? Here is an idea!
We modify the partition algorithms and separates the input values into three groups: values less than the pivot, values equal to the pivot, values greater than the pivot. The values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. In the modified partition algorithm, we return two index startIndex and endIndex such that:
Modified quicksort pseudocode
There are many ways to choose pivot in the quick sort.
In the above implementation, we have chosen the rightmost element as the pivot element. But this can result in a worst-case situation when an array is sorted. The best idea would be to choose a random pivot that minimizes the chance of worst-case. Selecting the median element would also be acceptable in the majority of cases. Here is the pseudo-code snippet for the median-of-three pivot selection:
mid = l + (r - l)/ 2 if X[mid] < X[l] swap (X[l], X[mid]) if X[hi] < X[lo] swap (X[l], X[r]) if X[mid] < X[r] swap (X[mid], X[r]) pivot = X[r]
Enjoy learning, Enjoy coding, Enjoy algorithms!
Get well-designed application and interview centirc content on ds-algorithms, machine learning, system design and oops. Content will be delivered weekly.