Find Kth Smallest Element in an Array

Difficulty: Medium, Asked-in: Google, Microsoft, Amazon, Cisco, SAP Labs, VMWare

Key takeaways

  • An excellent problem to learn problem-solving using the heap data structure.
  • The quick-select algorithm is intuitive and worth exploring. It is based on the divide and conquer approach similar to quicksort and works in O(n) time average.

Let's understand the problem

Given an array X[] and a positive integer k, write a program to find the kth smallest element in the array. 

  • It is given that all array elements are distinct.
  • We can assume that k is always valid, 1 ≤ k ≤ n

Important note: before moving on to the solutions, we recommend trying this problem on paper for atleast 15 or 30 minutes. Enjoy problem-solving!

Examples

Input: X[] = [4, 3, 13, 2, 12, 7, 23], k = 4

Output: 7 i.e. 7 is the 4th smallest element in the array.

Input: X[] = [-12, -8, 16, 23], k = 2,

Output: -8 i.e. -8 is the 2nd smallest element in the array.

Discussed solution approaches

  • A brute force approach using sorting
  • Using the min-heap data structure
  • Using the max-heap data structure
  • Quick-select: Using divide and conquer idea similar to quick-sort

Solution approach 1: A brute force approach using sorting

Solution idea

As we know that all elements are distinct in the array. So the basic idea would be to sort the array in increasing order and directly return the kth number from the start i.e. return X[k - 1].

Solution pseudocode

int findKthSmallest (int X[], int n, int k)
{
    sort(X, n)
    return X[k - 1]
}

Solution analysis

Suppose we are using heap sort which is an efficient O(nlogn) sorting algorithm. Time complexity = Time complexity of the heap sort + Time complexity of accessing kth smallest element = O(nlogn) + O(1) = O(nlogn)

Space complexity = O(1), because heap sort is an in-place sorting algorithm.

Solution approach 2: Using the min-heap data structure

The time complexity of the above solution is dominated by the sorting algorithm. Now the critical question is: can we improve the time complexity further? can we solve this problem without using sorting? Can we think to use some efficient mechanism to find min or max elements like min priority queue or heap data structure? Think!

Solution idea and steps

Min heap is an array-based complete binary tree structure where the value in each node is smaller than or equal to the values in the children of that node. So minimum element is always present at the root, i.e., X[0].

We also use the min-heap for the efficient implementation of a min-priority queue. Here are some critical min-heap operations:

  • getMin(): provide fast access to the minimum element in O(1) time.
  • deleteMin(): delete minimum element in O(logn) time.
  • Insert(): add an element in O(logn) time.
  • We can build a min-heap in O(n) time using the bottom-up approach.

So how do we optimize time complexity and find the kth smallest element using the above min-heap operations? Here is an idea: we first build a min-heap of all n array elements and remove k - 1 element by continuously performing the deleteMin() operation. After this, the kth smallest element will be present on the root of the min-heap. So, we can easily get this in O(1) time by calling the getMin() operation.

Solution pseudocode

Finding the kth smallest element using min-heap

int findKthSmallest(int X[], int n, int k)
{
    MinHeap heap(X, n)
    
    for (int i = 0; i < k - 1; i = i + 1)
        heap.deleteMin()
        
    return heap.getMin()
}

min-heap implementation

class MinHeap
{
    int heapArray[]
    int heapCapacity
    int heapSize
    
public:

    MinHeap(int X[], int size)
    {
        heapSize = size
        heapArray = X
        int i = (heapSize - 1)/2
        
        while (i >= 0)
        {
            minHeapify(i)
            i = i - 1
        }
    }
    
    int deleteMin()
    { 
        if (heapSize == 0)
            return INT_MAX
        int min = heapArray[0]
    
        if (heapSize > 1)
        {
            heapArray[0] = heapArray[heapSize - 1]
            minHeapify(0)
        }
        
        heapSize = heapSize - 1
        return min
    }
    
    void minHeapify(int i)
    {
        int l = leftChild(i)
        int r = rightChild(i)
        int smallest = i
    
        if (l < heapSize && heapArray[l] < heapArray[i])
            smallest = l
        if (r < heapSize && heapArray[r] < heapArray[smallest])
            smallest = r
        
        if(smallest != i)
        {
            swap(heapArray[i], heapArray[smallest])
            minHeapify(smallest)
        }
    }
    
    int parent(int i)
    {
        return (i - 1)/2
    }
    
    int leftChild(int i)
    {
        return (2 * i + 1)
    }
    
    int rightChild(int i)
    {
        return (2 * i + 2)
    }
    
    int getMin()
    {
        return heapArray[0]
    }
}

Solution analysis

Time complexity = Time complexity of building the min-heap size of n + Time complexity of deleting k - 1 element from the min-heap + Time complexity of accessing the kth element from the min-heap

  • The time complexity of building the min-heap of size n = O(n)
  • After each deletion, the min-heap size will reduce by 1. So time complexity of deleting min element k - 1 times = log n + log (n - 1) + log (n - 2)....+ log (n - k - 1) = log (n * (n-1)....*(n - k - 1)) < log(n^ (k-1)) = (k - 1) logn = O(klogn). Here time complexity depends on the value of k (1 <= k <= n).
  • The time complexity of accessing the kth min element = O(1)

So overall time complexity = O(n) + O(klogn) + O(1) = O( n + klogn)

  • Best case occur if k = 1, time complexity = O(n + logn) = O(n)
  • Worst case occur if k = n, time complexity = O(n + nlogn) = O(nlogn)

Space complexity = O(1), because we can build min-heap in place using the same array. So we are using constant extra memory. Now a critical question would be: can we optimize the solution further?

Solution approach 3: Using the max-heap data structure

Solution idea and steps

Similar to min-heap, the max heap is an array-based complete binary tree structure where the value in each node is larger than or equal to the values in the children of that node. So maximum element is always present at the root, i.e., X[0].

We also use max-heap for the efficient implementation of a max-priority queue. Here are some critical max-heap operations:

  • getMax(): provide fast access to the maximum element in O(1) time.
  • deleteMax(): remove maximum element in O(logn) time.
  • Insert(): add an element in O(logn) time.
  • We can build a max-heap in O(n) time using the bottom-up approach.

How do we solve this problem using max heap? A solution insight would be: If we have a max-heap of the k smallest elements of the array, then the kth smallest element will be present at the root of the max-heap and we can get the root value in O(1) time. But the critical question is: how do we generate the max-heap of k smallest element in the array? Let's think!

  • We start with building a max-heap of the first k elements of the array. There is no need to allocate extra space, and we can use the same array from index i = 0 to k - 1.
  • After this process, the max element of the first k element will be present at the root of the heap, i.e., X[0].
  • Now we need to track the k smallest elements using the k size max-heap. For this, we scan the array from index i = k to n - 1 and update the max-heap with all the values smaller than the root of the max-heap.
  • In other words, we run a loop from i = k to n -1 and check If (X[i] < root). If yes, then we replace the max heap root with X[i] and run the heapify() operation on the root to maintain the max heap property. Otherwise, we ignore the current element and move to the next element in the array. Here we access the root value by the getMax() function and replace the heap root by replaceMax(X[i]) function.
  • By the end of the above loop, the max heap contains the top k smallest element of the array i.e. kth smallest element will be present at the root. We return this value.

Solution pseudocode

Finding the kth smallest element

int findKthSmallest(int X[], int n, int k)
{
    MaxHeap heap(X, k)
    for (int i = k; i < n; i = i + 1)
    {
        if(X[i] < heap.getMax())
            heap.replaceMax(X[i])
   }       
   return heap.getMax()
}

Max heap implementation

class MaxHeap
{
    int heapArray[]
    int heapCapacity
    int heapSize
public:

    MaxHeap(int X[], int size)
    {
        heapSize = size
        heapArray = X
        int i = (heapSize - 1) / 2
    
        while (i >= 0)
        {
            maxHeapify(i)
            i = i - 1
        }
    }
    
    void maxHeapify(int i)
    {
        int l = leftChild(i)
        int r = rightChild(i)
        int largest = i
    
        if (l < heapSize && heapArray[l] > heapArray[i])
            largest = l
        if (r < heapSize && heapArray[r] > heapArray[smallest])
            largest = r
        
        if (largest!= i)
        {
            swap(heapArray[i], heapArray[largest])
            maxHeapify(largest)
        }
    }
    
    int deleteMax()
    {
        if (heapSize == 0)
            return INT_MAX
        int max = heapArray[0]
    
        if (heapSize > 1)
        {
            heapArray[0] = heapArray[heapSize - 1]
            maxHeapify(0)
        }
    
        heapSize = heapSize - 1
        return max
    }
    
    int parent(int i)
    {
        return (i - 1)/2
    }
    
    int leftChild(int i)
    {
        return (2 * i + 1)
    } 
    
    int rightChild(int i)
    {
        return (2 * i + 2)
    }
    
    int getMax()
    {
        return heapArray[0]
    }
    
    void replaceMax(int value)
    {
        heapArray[0] = value
        maxHeapify(0)
    }
}

Solution analysis

  • The time complexity of building k size max-heap = O(k).
  • Now in the worst case, at each iteration of the loop, if(X[i] < heap.getMax()) will be true and we run replaceMax(X[i]) operation every time. The time complexity of each replaceMax() on k size heap = O(log k).
  • So the overall time complexity of the (n - k) replaceMax() operation = (n - k)*O(log k) = O((n - k)log k)
  • Overall time complexity in the worst case = O(k) + O((n - k)*logk) + O(1) = O(k + (n - k)*logk) = O(k + nlogk - klogk) = O(nlogk). Here k and klogk are lower order terms.
  • Space Complexity = O(1), we are using constant extra space.

Solution approach 4: Quick-select approach (Divide and conquer idea similar to quick-sort)

Solution idea

Now we will discuss an interesting quick select approach that solves the problem efficiently by using a divide and conquer idea similar to the quick sort.

The solution intuition comes from the quick-sort partition process: dividing the array into two parts around a pivot and returning the sorted array's pivot index. Elements in the array will look like this after the partition: X[ l...pos - 1 ] < pivot < X[pos + 1...r ]. Here pivot element is present at index pos, and pos - l elements are smaller than the pivot. So pivot element is (pos - l + 1)th smallest element in the array.

  • if ((pos - l + 1) == k): pivot is the kth smallest and we return X[pos]. 
  • if ((pos - l + 1) > k): kth smallest must be present in the left subarray.
  • if ((pos - l + 1) < k): kth smallest must be present on the right subarray.

So from the above insight, we can develop an approach to use the partition process and find the kth smallest element recursively. But unlike the quick-sort, which processes both subarrays recursively, we process only one subarray. We recur for either the left or right side based on comparing k and pivot position.

Solution steps

  • We define a function KthSmallest(int X[], int l, int r, int k), where the left and right ends of the array are provided in the input. Initially l = 0 and r = n - 1. 
  • The base case is the scenario of single-element array i.e if (l == r), return X[l].
  • Now we partition the array X[l … r] into two subarrays (X[l … pos-1] and X[pos + 1 ... r] ) using the partition process. Here pos is the index of the pivot returned by the partition.
  • After the partition, we calculate the pivot order in the array, i.e., i = pos - l + 1. In other words, the pivot is the (pos - l + 1)th smallest element.
  • if (i == k), we return X[pos]. Otherwise, we determine in which of the two subarrays (X[l .. pos-1] and X[pos + 1 .. r]) the kth smallest is present.
  • If (i > k), the desired element must lie on the left subarray. We call the same function recursively for the left part, i.e., kthSmallest(X, l, pos-1, k).
  • If (i < k), the desired element must lie on the right subarray. We call the same function recursively for the right part i.e, kthSmallest(X, pos + 1, r, k - i). Note: The desired element is the (k - i)th smallest element of the right side because we already know i number of values is smaller than the kth smallest element (left subarray and a pivot).

Solution pseudocode

int getKthSmallest(int X[], int n, int k)
{
    return KthSmallest(X, 0, n - 1, k)
}

int KthSmallest(int X[], int l, int r, int k)
{
    if(l == r)
        return X[l]   
    int pos = partition(X, l, r)
    int i = pos - l + 1
    
    if(i == k)
        return X[pos]
    else if(i > k)
        return KthSmallest(X, l, pos - 1, k)
    else 
        return KthSmallest(X, pos + 1, r, k - i)
}

int partition (X[], l, r)
{
    int pivot = X[r]  
    int i = l - 1
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] <= pivot)
        {
            i = i + 1
            swap (X[i], X[j])
        }
    }
    swap (X[i + 1], X[r])
    return i + 1
}

Time and space complexity analysis of the quick-select

This is a divide and conquer algorithm, where we are solving only one subproblem.

Worst-case analysis: Worst case situation occurs when the partition is bad and highly unbalanced. There would be two scenarios: either 0 elements in the left subarray and (n - 1) elements in the right subarray or (n - 1) elements in the left subarray and 0 elements in the right subarray. So in the worst case, the algorithm always explores the larger sub-problem of size n - 1.

  • For the partition algorithm, we are running a single loop with constant operation at each iteration. So the time complexity of the partition algorithm = O(n)
  • So worst case time complexity T(n) = Time complexity of the partition algorithm + Worst case time complexity of the one subproblem + Time complexity of extra operations = O(n) + T(n-1) + O(1) = T(n - 1) + O(n) = T(n - 1) + cn
  • The solution of the above recurrence is O(n^2). We can easily solve the above recurrence using the recursion tree method. Refer to the worst-case analysis of quicksort.

The quick select algorithm looks highly inefficient in the worst case. But the real magic would be average case analysis because the algorithm works in O(n) time complexity average. How? Let's think.

Average case analysis: We assume that the partition process chooses pivot randomly. In other words, all possibility of partition is equally likely and the probability of occurring is the worst case in 1/n.

  • Input size of the left subproblem = i - 1, Input size of the right subproblem = n - i. Here the value of i can be in the range of 1 to n. The time complexity of the left subproblem = T(i - 1), Time complexity of the right subproblem = T(n - i).
  • We solve only one subproblem recursively, so let's take the upper bound: the time needed for the recursive call on the largest possible input. In other words, to obtain an upper bound, we assume that the kth smallest element is always on the side of the partition with the greater number of elements. So time complexity of the smaller subproblem = T(max (i - 1, n - i)), where 1<= i <= n.
  • For calculating th average case, we explore all n possible values of T(max (i - 1, n - i)) and divide it by n. Recurrence relation for the average case:

T(n) = 1/n * [ i = 1 to n ∑ T(max (i - 1, n - i)) ] + O(n)

  • if i > n/2, max (i - 1, n - i) = i - 1
  • if i <= n/2, max (i - 1, n - i) = n - i

So from i = 1 to n, each term T(i) appears twice in the above formula (Think!). We cab also place cn in place of O(n). So we can simplify the above formula.

T(n) <= 2/n [ i = n/2 to n-1 ∑ T(i)) ] + cn

We can solve this using the “guess and check” or substitution method based on our intuition. Let's assume that T(n) is a linear function, i.e., T(n) ≤ pn, where p is a constant.

T(n) <= 2/n ( i = n/2 to n-1 ∑ pi ) + cn

Let's further simplify the expression on the right-hand side

2/n (i = n/2 to n-1 ∑pi) + cn
= 2p/n (i = n/2 to n-1 ∑ i) + cn
= 2p/n [( i = 0 to n-1 ∑ i) - (i = 0 to n/2-1 ∑ i)] + cn
= 2p/n [n(n-1)/2 - (n/2 - 1)(n/2 - 2)/2] + cn
= p/n [n(n-1) - (n/2 - 1)(n/2 - 2)] + cn
= p/n [n^2 - n - n^2/4 + n + n/2 - 2] + cn
= p/n [3n^2/4 + n/2 - 2] + cn
= p [3n/4 + 1/2 - 2/n] + cn
= 3pn/4 + p/2 - 2p/n + cn
= pn - (pn/4 - cn - p/2)

In order to complete the proof, we need to show that for sufficiently large n, pn - (pn/4 - cn - p/2) is at most pn or, equivalently (pn/4 - cn - p/2) > 0 => n(c- p/4) > p/2 => n > (p/2)/(c- p/4)

(p/2)/(c- p/4) is a small constant, so our guess would be correct for n > (p/2)/(c- p/4). So average case time complexity of quick select algorithm T(n) = pn = O(n)

Space complexity: Worst case situation occurs when the partition is bad and the height of the recursion tree will be O(n). In such a scenario, recursion decreases by 1 and allocates O(n) stack space. But in the average case, space complexity would be O(logn). (Think!)

Important note: we recommend transforming the above pseudo-codes into a favorite programming language (C, C++, Java, Python, etc.) and verifying all the test cases. Enjoy programming!

Critical ideas to think!

  • Can we improve the time complexity of the 4th approach to O(n) in the worst-case scenario? Explore the median finding algorithm for finding the kth smallest element.
  • How do we modify the above algorithms to handle duplicates?
  • In the 1st approach, do we need to sort the entire array?
  • What would be other ways to perform the partition of an array?
  • Why average case space complexity is O(logn) in the 4th approach?
  • Can this problem be solved using a Binary Search Tree? What would be the complexity in that case?
  • Why is Heap preferred over BST for the Priority Queue implementation?
  • In the 3rd approach, why do we ignore elements larger than root in the second part of the algorithm? What would be the worst and best case input?

Comparisons of time and space complexities

  • Using sorting: Time = O(nlogn), Space = O(1)
  • Using min-heap: Time = O(n + klogn), Space = O(1)
  • Using max-heap: Time = O(nlogk), Space = O(1)
  • Using Quick-Select: Time = O(n) average, Space = O(logn) average

Similar coding questions to practice

  • Find the Kth largest element.
  • Find k smallest elements in an array.
  • Median of two sorted arrays of the same size
  • The kth missing element in an unsorted array
  • Find the kth largest element in a stream.
  • kth Smallest Element in a Sorted Matrix
  • k smallest elements in an array
  • kth element of two sorted Arrays
  • Find the top k frequent elements.

Please write in the message below if you find anything incorrect, or you want to share more insight, or you know some different approaches to solve this problem. Enjoy learning, Enjoy algorithms!

Share feedback with us

More blogs to explore

Our weekly newsletter

Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.