Difficulty: Medium, Asked-in: Google, Microsoft, Adobe, Goldman Sachs, Qualcomm.
The quick sort algorithm was developed by British computer scientist Tony Hoare in 1959. It quickly became popular due to its efficiency and simplicity, and it remains one of the most widely used sorting algorithms today.
Quick sort is one of the fastest sorting algorithms and is based on the divide-and-conquer approach, which makes it an excellent algorithm for learning recursion and problem-solving.
Let’s start by exploring the pattern of a sorted array: every value is greater than all values to its left and less than all values to its right.
From this observation, we can think in reverse: suppose we partition an unsorted array around some input value (pivot) into two parts, so that all values less than the pivot come before it and all values greater than the pivot come after it.
After the partition, the pivot will be placed at its correct position in the sorted order. Now we have a situation similar to a sorted array, with one big difference: the left and right parts are still unsorted! If we observe further, both unsorted parts are smaller subproblems of the sorting problem, i.e., sorting problems for smaller input sizes.
If we sort both parts recursively using the same function, the whole array will get sorted! The idea is simple: all values on the left side are less than the pivot, the pivot is at its correct position, and all values on the right side are greater than the pivot. For example, partitioning [5, 9, 2, 8, 3] around the pivot value 5 could produce [2, 3], then 5, then [9, 8]; recursively sorting both sides gives [2, 3, 5, 8, 9].
The above idea of quick sort is a divide-and-conquer approach to problem-solving: We divide the sorting problem into two smaller subproblems using the partition algorithm and solve each subproblem recursively to get the final sorted array. We repeat the partition process for smaller subarrays until we reach the base case.
Divide part: We divide the problem into two smaller subproblems by partitioning the array X[l...r]. The partition process will return the pivotIndex, i.e. the index of the pivot in the sorted array.
Conquer part: Now, we solve both subproblems by recursively sorting smaller subarrays.
Combine part: This is a trivial case because after sorting both smaller arrays, the whole array will get sorted. In other words, there is no need to write code for the combine part.
Suppose we use the function quickSort(int X[], int l, int r) to sort the entire array, where l and r are the left and right ends. The initial function call will be quickSort(X, 0, n - 1).
We define the partition(X, l, r) function to divide the array around the pivot and return the pivotIndex. We will select the pivot value inside the partition function. The critical question is: Why are we choosing the pivot value inside the partition function? Think and explore!
pivotIndex = partition(X, l, r)
In recursive algorithms, the base case is the smallest version of the problem that will be directly solved without further recursion. In the context of quick sort, the base case will occur when the sub-array being sorted has either zero elements (empty) or just one element remaining.
In other words, l >= r is the base case of quick sort. At this point, no further partitioning is needed: the sub-array is considered sorted, and the recursion terminates.
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // Partition the array and place the pivot at its sorted position.
        int pivotIndex = partition(X, l, r);
        // Recursively sort the left and right subarrays.
        quickSort(X, l, pivotIndex - 1);
        quickSort(X, pivotIndex + 1, r);
    }
}
How can we divide an array around the pivot so that the values in the left part are less than the pivot and values in the right part are greater than the pivot? Can we do it in place? Let’s think!
Suppose we pick the rightmost value as the pivot, i.e., pivot = X[r], and scan the array from left to right. Whenever an element is less than the pivot, we will place it at the starting part of the array and move forward. Otherwise, we will ignore that element and move forward. How can we implement it? Let’s think!
int partition(int X[], int l, int r)
{
    // Choose the rightmost value as the pivot.
    int pivot = X[r];
    // i marks the end of the region holding values less than the pivot.
    int i = l - 1;
    for (int j = l; j < r; j = j + 1)
    {
        // If the current value is less than the pivot, grow the left region.
        if (X[j] < pivot)
        {
            i = i + 1;
            swap(X[i], X[j]);
        }
    }
    // Place the pivot just after the left region: its correct sorted position.
    swap(X[i + 1], X[r]);
    return i + 1;
}
We are running a single loop and doing constant work at each iteration: one comparison (in both the worst and best case) and at most one swap. So time complexity = O(n). We use constant extra space, so space complexity = O(1). Note: whether a swap happens depends on the comparison if (X[j] < pivot).
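To see both functions in action, here is a minimal driver sketch. The swap helper is assumed by the code above; one way to provide it in C is the macro below, placed before partition and quickSort in an actual file:

#include <stdio.h>

// Swap helper assumed by the code above; as a macro, calls like
// swap(X[i], X[j]) work directly on array elements.
#define swap(a, b) do { int t = (a); (a) = (b); (b) = t; } while (0)

int main(void)
{
    int X[] = {9, 3, 7, 1, 8, 2};
    int n = sizeof(X) / sizeof(X[0]);
    quickSort(X, 0, n - 1);
    for (int i = 0; i < n; i = i + 1)
        printf("%d ", X[i]); // prints: 1 2 3 7 8 9
    return 0;
}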
Suppose T(n) is the worst-case time complexity of quick sort for n integers. Let's analyze it by breaking down the time complexities of each process:
Divide part: Time complexity of the divide part is equal to the time complexity of the partition algorithm, which is O(n).
Conquer part: We are recursively solving two subproblems, where size of subproblems will depend on the choice of pivot in the partition process. Suppose after the partition, i elements are in the left subarray and n - i - 1 elements are in the right subarray. So time complexity of conquer part = Time complexity of recursively sorting the left subarray + Time complexity of recursively sorting the right subarray = T(i) + T(n - i - 1).
Combine part: As mentioned above, this is a trivial part. Time complexity of combine part = O(1).
For calculating time complexity T(n), we need to add time complexities of divide, conquer, and combine parts.
T(n)
= O(n) + T(i) + T(n - i - 1) + O(1)
= T(i) + T(n - i - 1) + O(n)
= T(i) + T(n - i - 1) + cn
Recurrence relation of quick sort:
T(n) = c, if n = 1
T(n) = T(i) + T(n - i - 1) + cn, if n > 1
The worst-case scenario occurs when the partition process always picks the largest or smallest element as the pivot. In this case, the partition is highly unbalanced: one subarray contains n - 1 elements and the other contains 0 elements.
When we always choose the rightmost element as the pivot, the worst case arises when the array is already sorted in increasing or decreasing order, because each recursive call then creates an unbalanced partition. For example, with X = [1, 2, 3, 4, 5] and pivot = X[r], every partition leaves an empty right part and a left part of size one less than before.
For calculating the time complexity in the worst case, we put i = n - 1 in the above formula of T(n).
T(n) = T(n - 1) + T(0) + cn
T(n) = T(n - 1) + cn
Worst-case analysis of quick sort using Substitution Method
We simply expand the recurrence relation by substituting the intermediate values of T(i) (from i = n - 1 down to 1). By the end of this process, we end up with the sum of an arithmetic series.
T(n)
= T(n - 1) + cn
= T(n - 2) + c(n - 1) + cn
= T(n - 3) + c(n - 2) + c(n - 1) + cn
... and so on
= T(1) + 2c + 3c + ... + c(n - 2) + c(n - 1) + cn
= c + 2c + 3c + ... + c(n - 2) + c(n - 1) + cn
= c(1 + 2 + 3 + ... + n - 2 + n - 1 + n)
This is a simple sum of arithmetic series.
T(n) = c(n(n + 1)/2) = O(n^2)
Worst case time complexity of quick sort = O(n^2)
Worst-case analysis of quick sort using Recursion Tree Method
The recursion tree method is one of the popular techniques for recursion time complexity analysis. Here we draw a tree structure of recursive calls and highlight the extra cost associated with each recursive call. To get the overall time complexity, we add the cost level by level.
This is the recursion tree diagram of quick sort in the worst case.
We sum the total partitioning cost level by level => cn + c(n - 1) + c(n - 2) + ... + 2c + c = c(n + (n - 1) + (n - 2) + ... + 2 + 1) = c[n(n + 1)/2] = O(n^2).
The best-case scenario of quick sort occurs when the partition process always picks the median element as the pivot. In other words, this is the case of balanced partitioning, where both sub-arrays are approximately n/2 in size.
Suppose balanced partitioning arises in every recursive call. To calculate the time complexity in the best case, we put i = n/2 in the above formula of T(n).
T(n) = T(n/2) + T(n - 1 - n/2) + cn
= T(n/2) + T(n/2 - 1) + cn
~ 2T(n/2) + cn
T(n) = 2T(n/2) + cn
This recurrence is similar to the recurrence for merge sort, for which the solution is O(n log n). So the best-case time complexity of quicksort = O(n log n). Note: We can solve this using the recursion tree method or the master method.
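As a quick sketch of why this recurrence solves to O(n log n) (assuming n is a power of 2 for simplicity), we can expand it level by level:
T(n)
= 2T(n/2) + cn
= 4T(n/4) + 2cn
= 8T(n/8) + 3cn
... and so on
= 2^k T(n/2^k) + kcn
The expansion stops when n/2^k = 1, i.e., k = log2(n). So T(n) = nT(1) + cn*log2(n) = O(n log n).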
Suppose all permutations of input are equally likely. Now, when we run the quicksort algorithm on random input, partitioning is highly unlikely to happen in the same way at each level of recursion. The idea is simple: The behaviour of quicksort will depend on the relative order of values in the input.
Here some of the splits will be reasonably well balanced and some of the splits will be pretty unbalanced. So the partition process will generate a mix of good (balanced partition) and bad (unbalanced partition) splits in the average case. These good and bad splits will be distributed randomly throughout the recursion tree.
For a better understanding of the analysis, suppose good and bad splits appear at alternate levels of the tree.
Suppose, at the root, the partition generates a good split, and at the next level, the partition generates a bad split. The cost of the partition process will be O(n) at both levels. Therefore, the combined partitioning cost of the bad split followed by the good split is O(n).
Overall, this is equivalent to a single level of partitioning, which resembles the scenario of a balanced partition. As a result, the height of the recursion tree will be O(log n), the combined cost of each level will be O(n), and the average-case running time of quicksort will be O(n log n). Note: Average case time complexity is the same as the best case, but the value of the constant factor in O(n log n) will be slightly higher in the average case.
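To make the alternating-split argument concrete, here is a rough sketch: a bad split of n elements into sizes 0 and n - 1 costs cn, and the good split of the remaining n - 1 elements into two nearly equal halves costs c(n - 1). Combining both levels:
T(n) = cn + c(n - 1) + 2T((n - 1)/2) = 2T((n - 1)/2) + O(n)
This has the same shape as the balanced recurrence above, just with a larger constant, so the result is still O(n log n).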
For better visualization, let’s assume that the partition algorithm always produces a partially unbalanced split in the ratio of 9 to 1. The recurrence relation for this would be: T(n) = T(9n/10) + T(n/10) + cn. Image source: CLRS Book.
We can notice the following things from the above diagram: Every level contributes a partitioning cost of at most cn. The shallowest branch of the tree ends after log10(n) levels and the deepest branch after log(n) base 10/9 levels, which is still O(log n). So even with a constant 9-to-1 split at every level, the total cost is O(n log n).
Quicksort is an in-place sorting algorithm because it does not use any extra array to sort the input. However, every recursive program uses a call stack in the background to execute recursion. Therefore, the space complexity of quicksort depends on the size of the recursion call stack, which is equal to the height of the recursion tree. As mentioned in the above analysis, the height of the recursion tree depends on the partition process.
In the worst-case scenario, the partition will always be unbalanced, resulting in only one recursive call at each level of recursion. In other words, the generated recursion tree will be skewed in nature, and the height of the recursion tree will be O(n). As a result, recursion will require a call stack of size O(n) and the worst-case space complexity of quicksort will be O(n).
In the best-case scenario, the partition will always be balanced, resulting in two recursive calls at each level of recursion. In such a scenario, the generated recursion tree will be balanced in nature. So, the height of the recursion tree will be O(log n), and recursion will require a call stack of size O(log n). Therefore, the best-case space complexity of quicksort is O(log n).
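As a side note, a common way to cap the stack size is to recurse only into the smaller subarray and handle the larger one with a loop. Here is a minimal sketch (the function name quickSortSmallStack is our choice, reusing the partition function defined above):

// Recurse on the smaller part, iterate on the larger part. Each recursive
// call handles fewer than half of the current elements, so the recursion
// depth is bounded by O(log n) even when partitions are unbalanced.
void quickSortSmallStack(int X[], int l, int r)
{
    while (l < r)
    {
        int pivotIndex = partition(X, l, r);
        if (pivotIndex - l < r - pivotIndex)
        {
            quickSortSmallStack(X, l, pivotIndex - 1); // smaller left side
            l = pivotIndex + 1;                        // continue with right side
        }
        else
        {
            quickSortSmallStack(X, pivotIndex + 1, r); // smaller right side
            r = pivotIndex - 1;                        // continue with left side
        }
    }
}

Note that this only bounds the stack: the worst-case running time is still O(n^2).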
In the above algorithm, we assume that all input values are distinct. How do we modify the quick sort algorithm when input values are repeated? Let’s think!
We can modify the partition algorithm to separate the input values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. Values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. Because of this, the modified partition algorithm will return two indices, leftIndex and rightIndex (one possible implementation is sketched after the code below).
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // The modified partition returns the boundaries of the block of
        // values equal to the pivot: X[leftIndex ... rightIndex].
        int leftIndex, rightIndex;
        partition(X, l, r, &leftIndex, &rightIndex);
        quickSort(X, l, leftIndex - 1);
        quickSort(X, rightIndex + 1, r);
    }
}
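The code above assumes such a three-way partition; here is one possible sketch based on the well-known Dutch national flag scheme (the three-pointer structure and the variable names lt, gt, and i are our assumptions, and the two boundaries are returned through pointer parameters):

// One possible three-way partition. After it finishes:
// X[l ... *leftIndex - 1] < pivot
// X[*leftIndex ... *rightIndex] == pivot
// X[*rightIndex + 1 ... r] > pivot
void partition(int X[], int l, int r, int *leftIndex, int *rightIndex)
{
    int pivot = X[r];
    int lt = l; // next position for a value less than the pivot
    int gt = r; // last unexamined position from the right
    int i = l;  // current scan position
    while (i <= gt)
    {
        if (X[i] < pivot)
        {
            swap(X[lt], X[i]);
            lt = lt + 1;
            i = i + 1;
        }
        else if (X[i] > pivot)
        {
            swap(X[i], X[gt]);
            gt = gt - 1; // the swapped-in value is unexamined, so keep i here
        }
        else
            i = i + 1; // equal to the pivot: leave it in the middle block
    }
    *leftIndex = lt;
    *rightIndex = gt;
}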
As we know from the analysis, the efficiency of quicksort depends on the smart selection of the pivot element, and there are many ways to choose it: the leftmost or rightmost element, a random element, or the median element.
In the above implementation, we have chosen the rightmost element as the pivot. This can result in a worst-case situation when the input array is already sorted. A better idea is to choose a random pivot, which minimizes the chances of hitting the worst case at every level of recursion. Selecting the median element as the pivot can also be acceptable in the majority of cases.
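For instance, randomized pivot selection needs only a tiny wrapper around the two-way partition function defined earlier. A minimal sketch (the helper name randomizedPartition and the use of rand() are our choices):

#include <stdlib.h>

// Move a randomly chosen element to the rightmost position, then reuse
// the two-way partition defined earlier.
int randomizedPartition(int X[], int l, int r)
{
    int randomIndex = l + rand() % (r - l + 1);
    swap(X[randomIndex], X[r]);
    return partition(X, l, r);
}

Choosing the median is often approximated by the median-of-three strategy: take the median of the leftmost, middle, and rightmost elements and use it as the pivot, as in the following code.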
// Median-of-three pivot selection: after these swaps, X[r] holds the
// median of the original X[l], X[mid], and X[r].
int mid = l + (r - l) / 2;
if (X[mid] < X[l])
    swap(X[l], X[mid]); // now X[l] <= X[mid]
if (X[r] < X[l])
    swap(X[l], X[r]);   // now X[l] is the smallest of the three
if (X[mid] < X[r])
    swap(X[mid], X[r]); // move the median to X[r]
int pivot = X[r];
If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy coding, Enjoy algorithms!