Difficulty: Medium, Asked-in: Google, Microsoft, Amazon, Cisco, SAP Labs, VMWare
Given an array X of n elements and a positive integer k, write a program to find the kth smallest element in the array.
Input: X = [4, 3, 13, 2, 12, 7, 23], k = 4, Output: 7
Explanation: 7 is the 4th smallest element in the array.
Input: X = [-12, -8, 16, 23], k = 2, Output: -8
Explanation: -8 is the 2nd smallest element in the array.
As the elements in the array are distinct, the basic idea would be to sort the array and directly return the kth element from the start, i.e., return X[k - 1].
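As a quick sketch, this sorting-based idea can be written in Python (here using the built-in sort rather than a hand-written heap sort; the function name is illustrative):

```python
def kth_smallest_by_sorting(X, k):
    # Sort a copy of the array in increasing order; the kth smallest
    # element then sits at index k - 1.
    return sorted(X)[k - 1]

print(kth_smallest_by_sorting([4, 3, 13, 2, 12, 7, 23], 4))  # 7
print(kth_smallest_by_sorting([-12, -8, 16, 23], 2))         # -8
```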
Time and Space complexity analysis
Suppose we use heap sort, an efficient O(nlogn) sorting algorithm. Time complexity = time complexity of heap sort + time complexity of accessing the kth smallest element = O(nlogn) + O(1) = O(nlogn).
Space Complexity: O(1) because heap sort is an in-place sorting algorithm.
Solution Idea and Steps
Min-heap is an array-based complete binary tree structure where the value in each node is smaller than or equal to the values in its children. So the minimum element is always present at the root, i.e., X[0]. We use the min-heap for the efficient implementation of a min-priority queue. Here are some critical min-heap operations: getMin() returns the root value in O(1), deleteMin() removes the root and restores the heap property in O(logn), and buildMinHeap() constructs a heap from n elements in O(n).
So how do we optimize the time complexity using a min-heap? Here is an idea: build a min-heap of all n elements in O(n), delete the minimum element k - 1 times, and then the kth smallest element will be present at the root, where we can access it in O(1).
Here we have provided the complete class-based implementation of the min-heap.
Finding the kth smallest element
Min heap implementation
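A minimal class-based Python sketch of this approach (the names MinHeap, delete_min, and kth_smallest are illustrative, not from the original text):

```python
class MinHeap:
    """Array-based min-heap: each parent is <= its children."""

    def __init__(self, X):
        # Build the heap in place in O(n) by sifting down every
        # internal node, from the last one up to the root.
        self.heap = list(X)
        for i in range(len(self.heap) // 2 - 1, -1, -1):
            self.min_heapify(i)

    def min_heapify(self, i):
        # Sift the value at index i down until the heap property holds.
        n = len(self.heap)
        while True:
            smallest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest == i:
                break
            self.heap[i], self.heap[smallest] = self.heap[smallest], self.heap[i]
            i = smallest

    def delete_min(self):
        # Move the last element to the root, shrink the array, and
        # restore the heap property: O(log n).
        root = self.heap[0]
        self.heap[0] = self.heap[-1]
        self.heap.pop()
        if self.heap:
            self.min_heapify(0)
        return root


def kth_smallest(X, k):
    heap = MinHeap(X)           # build min-heap of all n elements: O(n)
    for _ in range(k - 1):      # delete the minimum k - 1 times: O(k log n)
        heap.delete_min()
    return heap.heap[0]         # kth smallest is now at the root: O(1)
```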
Time and space complexity analysis
Time complexity = time complexity of building a min-heap of size n + time complexity of deleting the minimum element k - 1 times + time complexity of accessing the kth smallest element at the root.
So overall time complexity = O(n) + O(klogn) + O(1) = O(n + klogn).
Space complexity = O(1) because we build the min-heap in place using the same array, so we use constant extra memory. Now, a critical question would be: can we optimize the solution further?
Similar to the min-heap, the max-heap is an array-based complete binary tree structure where the value in each node is larger than or equal to the values in its children. So the maximum element is always present at the root, i.e., X[0]. We use the max-heap for the efficient implementation of a max-priority queue. Here are some critical max-heap operations: getMax() returns the root value in O(1), and replaceMax() replaces the root with a new value and restores the heap property in O(logk) for a heap of size k.
How do we solve this problem using a max-heap? A solution insight would be: if we maintain a max-heap of the k smallest elements, then the kth smallest element will be present at the root of the heap, and we can return the root value in O(1). But the critical question is: how do we generate a max-heap of the k smallest elements? Here is an idea: build a max-heap of the first k elements; then, for each remaining element, if it is smaller than the root, replace the root with it.
Finding the kth smallest element
Max heap implementation
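One possible Python sketch of this approach uses the standard heapq module (a min-heap) and stores negated values to simulate a max-heap; the function name is illustrative:

```python
import heapq

def kth_smallest_max_heap(X, k):
    # heapq is a min-heap, so we negate values: the root of max_heap
    # then corresponds to the maximum of the k smallest elements so far.
    max_heap = [-x for x in X[:k]]
    heapq.heapify(max_heap)                  # build k-size heap: O(k)
    for x in X[k:]:                          # remaining n - k elements
        if x < -max_heap[0]:                 # smaller than current max in heap
            heapq.heapreplace(max_heap, -x)  # replaceMax: O(log k)
    return -max_heap[0]                      # kth smallest at the root: O(1)
```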
Time and space complexity analysis
The time complexity of building a max-heap of size k = O(k).
Now, in the worst case, the condition if(X[i] < maxHeap.getMax()) will be true at each iteration of the loop, and we will execute the maxHeap.replaceMax(X[i]) operation every time. The time complexity of each replaceMax() on a k-size heap = O(logk). So the overall time complexity of the (n - k) replaceMax() operations = (n - k) * O(logk) = O((n - k)logk).
Time complexity in the worst case = O(k) + O((n - k)*logk) + O(1) = O(k + (n - k)*logk) = O(k + nlogk - klogk) = O(nlogk), because k and klogk are lower-order terms compared to nlogk (since k <= n).
Space complexity = O(1) because we can build the k-size max-heap in place using the first k elements of the array, so we are not using any extra space.
Now we are going to discuss an interesting quick select approach that solves the problem efficiently by using a divide and conquer idea similar to the quick sort.
The solution intuition comes from the quick-sort partition process, which divides the array into two parts around a pivot and returns the index of the pivot in the sorted array. Elements in the array will look like this after the partition:
X[l ... pos - 1] < pivot < X[pos + 1 ... r]
The pivot element is present at index pos, and pos - l elements are smaller than the pivot. So the pivot is the (pos - l + 1)th smallest element of the subarray X[l ... r].
From the above insight, we can develop an approach that uses the partition process to find the kth smallest element recursively. But unlike quick sort, which processes both subarrays recursively, we process only one subarray. We recur for either the left or the right side based on comparing k with the position of the pivot.
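The recursive idea above can be sketched in Python with a randomized Lomuto partition (the function names are illustrative):

```python
import random

def partition(X, l, r):
    # Lomuto partition around a random pivot: elements smaller than the
    # pivot end up on the left, and the pivot lands at its final
    # sorted position, which is returned.
    pivot_index = random.randint(l, r)
    X[pivot_index], X[r] = X[r], X[pivot_index]
    pivot = X[r]
    pos = l
    for i in range(l, r):
        if X[i] < pivot:
            X[i], X[pos] = X[pos], X[i]
            pos += 1
    X[pos], X[r] = X[r], X[pos]
    return pos

def quick_select(X, l, r, k):
    pos = partition(X, l, r)
    count = pos - l + 1  # pivot is the count-th smallest within X[l..r]
    if count == k:
        return X[pos]
    elif k < count:
        return quick_select(X, l, pos - 1, k)          # recur on the left part
    else:
        return quick_select(X, pos + 1, r, k - count)  # recur on the right part

def kth_smallest_quick_select(X, k):
    # Work on a copy so the caller's array is not reordered.
    return quick_select(list(X), 0, len(X) - 1, k)
```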
Time and space complexity analysis of the quick-select
This is a divide and conquer algorithm where we solve only one subproblem at each step.
Worst-case analysis: The worst-case situation occurs when the partition is bad and highly unbalanced. There would be two scenarios: either 0 elements in the left subarray and (n - 1) elements in the right subarray, or (n - 1) elements in the left subarray and 0 elements in the right subarray. So in the worst case, the algorithm always explores the larger subproblem of size n - 1, giving the recurrence T(n) = T(n - 1) + O(n), which solves to O(n^2).
The quick-select algorithm looks highly inefficient in the worst case. But the real magic lies in the average-case analysis, because the algorithm runs in O(n) time on average. How? Let's think.
Average-case analysis: We assume that the partition process chooses the pivot uniformly at random. In other words, every partition position is equally likely, each occurring with probability 1/n.
T(n) = 1/n * [ ∑(i = 1 to n) T(max(i - 1, n - i)) ] + O(n)
In the above sum, each term T(i) for i from n/2 to n - 1 appears at most twice (think!). We can also write cn in place of O(n), for some constant c. So we can simplify the above formula:
T(n) <= 2/n * [ ∑(i = n/2 to n - 1) T(i) ] + cn
We can solve this using the “guess and check” or substitution method based on our intuition. Let's assume that T(n) is a linear function, i.e., T(n) ≤ pn, where p is a constant.
T(n) <= 2/n * [ ∑(i = n/2 to n - 1) p*i ] + cn
Let's further simplify the expression on the right-hand side
2/n * [ ∑(i = n/2 to n - 1) p*i ] + cn
= 2p/n * [ ∑(i = n/2 to n - 1) i ] + cn
= 2p/n * [ ∑(i = 0 to n - 1) i - ∑(i = 0 to n/2 - 1) i ] + cn
= 2p/n * [ n(n - 1)/2 - (n/2)(n/2 - 1)/2 ] + cn
= p/n * [ n(n - 1) - (n/2)(n/2 - 1) ] + cn
= p/n * [ n^2 - n - n^2/4 + n/2 ] + cn
= p/n * [ 3n^2/4 - n/2 ] + cn
= 3pn/4 - p/2 + cn
= pn - (pn/4 - cn + p/2)
To complete the proof, we need to show that pn - (pn/4 - cn + p/2) is at most pn, or equivalently, pn/4 - cn + p/2 >= 0. If we choose the constant p such that p >= 4c, then pn/4 >= cn for every n >= 1, and the inequality holds.
So our guess is correct: the average-case time complexity of the quick-select algorithm is T(n) <= pn = O(n).
Space complexity: The worst-case situation occurs when the partition is highly unbalanced and the height of the recursion tree becomes O(n). In that scenario, the input size decreases by only 1 at each recursive call, so the algorithm allocates O(n) stack space. But in the average case, the recursion depth is O(logn), so the space complexity would be O(logn). (Think!)
Enjoy learning, Enjoy coding, Enjoy algorithms!