Find First Missing Positive in an Array

Difficulty: Hard, Asked-in: Google, Amazon, Samsung

Key takeaway: An excellent problem to learning step-by-step optimization using various approaches. In-place hashing solution is an intuitive approach that uses the same input array to process values and generate correct output.

Let’s understand the problem

Given an array that includes positive and negative numbers, write a program to find the smallest missing positive integer.

Example 1

Input: X[] = [2, -9, 5, 11, 1, -10, 7], Output: 3

Explanation: Consecutive positive integers 1 and 2 are present in the array. So the first missing positive is 3.

Example 2

Input: X[] = [2, 3, -2, 1], Output: 4

Explanation: Consecutive positive integers 1, 2, and 3 are present in the array. So the first missing positive is 4.

Example 3

Input: X[] = [4, 6, 5, 3, 8], Output: 1

Explanation: All values are positive, but the first positive integer 1 is missing in the input. So the output is 1.

Discussed solution approaches

  • Brute force approach using two nested loops
  • Using sorting and single scan
  • Using a hash table
  • Using in-place hashing
  • Using the idea of partition and in-place hashing

Brute force approach  using two nested loops

Solution idea

Positive number starts from 1 on the number scale. If the first missing positive number is k, then all the positive numbers from 1 to k - 1 will be present in the array.

The critical question is: What will be the maximum possible missing positive number in an array of size n? Think! The answer is simple: In the n size array, the maximum possible value of missing positive will be n + 1. This is the scenario when all numbers from 1 to n are present in the array. 

So one basic approach would be to search all positive integers starting from 1 to n + 1 linearly in the array. If any positive integer is missing in the sequence, we will return that value as an output.

Solution code C++

// Finds the first missing positive integer in the given array
int firstMissingPositive(int X[], int n)
{
    for (int i = 1; i <= n + 1; i = i + 1)
    {
        bool missingFlag = true;
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[j] == i)
            {
                missingFlag = false;
                break;
            }
        }
        if (missingFlag == true)
            return i;
    }
}

Solution code Python

def first_missing_positive(X, n):
    for i in range(1, n + 2):
        missing_flag = True
        for j in range(n):
            if X[j] == i:
                missing_flag = False
                break
        if missing_flag:
            return i

Solution analysis

We are running two nested loops and doing constant operations at each iteration. Here both loops will run n times in the worst-case. So worst case time complexity = O(n)*O(n)*O(1) = O(n^2). We are using constant extra space, so space complexity = O(1).

Using sorting and single scan

Solution idea and steps

If we sort the input array, then all positive numbers get lined up in increasing order. So we can search linearly to find the first missing positive integer. For sorting, we can use efficient O(nlogn) sorting algorithms like heap sort, merge sort, or quicksort.

  1. We sort the array. Suppose we are using heap sort which is an in-place O(nlogn) sorting algorithm.
  2. Now we run a loop and skip the negative numbers and zero present in the starting positions of the array. Starting from the first index, when X[i] < 1, we increment i by 1. By the end of this loop, index i will be present at the first positive number.
  3. Now we run another loop from j = i to n - 1 to find the first missing positive number. Outside the loop, we also declare a variable missingPositive to track to the first missing positive. The missingPositive will be initialized with the smallest positive number i.e. 1 at the start.
  4. We start comparing the missingPositive with X[j]. If (missingPositive == X[j]), we move to the next positive number by incrementing missingPositive by 1.
  5. We continue the above process until we find some value X[j] > missingPositive. At this stage, value stored in missingPositive will be the first missing positive. So we return missingPositive as an output.

Solution code C++

int firstMissingPositive(int X[], int n)
{
    sort(X, X + n);
    int i = 0;
    while (X[i] < 1)
        i = i + 1;
    int missingPositive = 1;
    for (int j = i; j < n; j = j + 1)
    {
        if (missingPositive == X[j])
            missingPositive = missingPositive + 1;
        else if (X[j] > missingPositive)
            return missingPositive;
    }
    return missingPositive;
}

Solution code Python

def first_missing_positive(X, n):
    X.sort()
    i = 0
    while X[i] < 1:
        i = i + 1
    missing_positive = 1
    for j in range(i, n):
        if missing_positive == X[j]:
            missing_positive = missing_positive + 1
        elif X[j] > missing_positive:
            return missing_positive
    return missing_positive

Solution analysis

Time complexity = Time complexity of sorting + Time complexity of linear scan for finding the first missing positive = O(nlogn) + O(n) = O(nlogn). Space complexity = O(1) (Think!).

Using hash table

Solution idea

Now the critical question is: Can we improve the time complexity further? As emphasized in the brute force approach, this is a searching problem where we need to look for positive integers starting from 1 to n + 1. Can we think of optimizing the brute force approach? Think!

To improve efficiency of searching, one idea will be to use hash table instead of linear search. Hash table can help us to perform searching and insertion efficiently in the O(1) average.

Solution steps

  • We create hash table H of size n.
  • Now we run loop from i = 0 to n - 1 and insert all elements in the hash table. 
  • Now we another loop from i = 1 to n + 1 to search all positive numbers from 1 to n + 1. If any positive integer i is not present in hash table, we return i. Otherwise, we move to the next iteration.

Solution pseudocode

int firstMissingPositive(int X[], int n)
{
    HashTable H
    for (int i = 0; i < n; i = i + 1)
        H[X[i]] = true
    for (int i = 1; i <= n + 1; i = i + 1)
    {
        if (H[i] == false)
            return i
    }
}

Solution code C++

int firstMissingPositive(int X[], int n)
{
    unordered_map<int, bool> H;
    for (int i = 0; i < n; i = i + 1)
        H[X[i]] = true;
    for (int i = 1; i <= n + 1; i = i + 1)
    {
        if (H[i] == false)
            return i;
    }
}

Solution code Python

def first_missing_positive(X, n):
    H = {}
    for i in range(n):
        H[X[i]] = True
    for i in range(1, n + 2):
        if H.get(i, False) == False:
            return i

Solution code Java

public class Solution
{
    public int firstMissingPositive(int[] X, int n) 
    {
        HashMap<Integer, Boolean> H = new HashMap<>();
        for (int i = 0; i < n; i = i + 1)
            H.put(X[i], true);
        for (int i = 1; i <= n + 1; ii = i + 1) 
        {
            if (!H.containsKey(i))
                return i;
        }
    }
}

Solution analysis

Time complexity = Time complexity of inserting n elements into the hash table + Time complexity of searching n elements into the hash table = O(n) + O(n) = O(n). Space complexity = O(n), for the hash table of size n.

Using in-place hashing

Solution idea and steps

Now the critical question is: Can we solve this problem in-place or without using extra space? Let’s think!

  • We need to identify the first missing number in the sequence of 1 to n + 1. For this, we can use indexes of the same array to mark presence of the numbers in the sequence and put each number in its correct place, i.e X [X[i] - 1] == X[i]. For example, we can place 1 at X[0], 2 at X[1], 3 at A[2], and so on.
  • For placing numbers in their right place, we scan the array X[] from the start. We check if number X[i] is in the range of 1 to n and already present at index X[i] - 1 or not. If the first condition is true and the second is false, we swap X[i] with the number present at the index X[i] - 1. Otherwise, we ignore the current number and move to the next index. 
  • Now we scan the modified array again to find the first index where X[i] != i + 1. For some value of i, if X[i] != i + 1, we return i + 1 as the first missing positive. If we didn't find any such index, all numbers from 1 to n will be present in the array. So we return the value n + 1 as the first missing positive. 

The best part of this idea is: We are modifying the same array and storing the results to get the correct output.

Solution code C++

int firstMissingPos(int X[], int n)
{
    int i = 0;
    while (i < n)
    {
        if (X[i] > 0 && X[i] <= n && X[X[i] - 1] != X[i])
            swap(X[i], X[X[i] - 1]);
        else
            i = i + 1;
    }

    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] != i + 1)
            return i + 1;
    }
    return n + 1;
}

Solution code Python

def firstMissingPos(X, n):
    i = 0
    while i < n:
        if X[i] > 0 and X[i] <= n and X[X[i] - 1] != X[i]:
            X[i], X[X[i] - 1] = X[X[i] - 1], X[i]
        else:
            i = i + 1

    for i in range(n):
        if X[i] != i + 1:
            return i + 1
    return n + 1

Solution analysis

We are running two separate single loops. So worst case time complexity = Time complexity of modifying the array + Time complexity of searching first missing positive = O(n) + O(n) = O(n)

Space complexity = O(1), as we use the same input array to place the elements at their correct position. That's why such approaches are always premium!

Using partition process and in-place hashing

Solution idea and steps

This idea is similar to the above approach, but there is a difference! We first divide the array into two parts: The first part consists of positive numbers, and the second part consists of non-positive (0 and negative) numbers. Suppose this process returns the total count of positive elements, i.e. k.

So the idea is to use array elements as an index and mark the presence of all elements X[i] (less than or equal to k) by changing the value at the index X[i] - 1 to negative. Note that one is subtracted because the array index starts from 0, and positive numbers start from 1.

We are assuming that array can be modified.

Step 1: We first partition the input array into a group of positive and negative numbers using the partition algorithm similar to the quick sort. After the partition, the positive number starts from index 0 and ends at index positiveEnd - 1. Here positiveEnd value is returned by the partition process.

Step 2: We traverse the array containing all positive numbers from 0 to positiveEnd - 1. During this process, we mark the presence of all elements X[i] less than or equal to positiveEnd by changing the sign of the element at the index X[i] - 1 negative.

Step 3: We traverse the array from index 0 to positiveEnd -1. If we encounter a positive element at some index i, we output i + 1 as the first missing positive. If we do not encounter any positive element, then integers from 1 to positiveEnd are present in the array, and we return positiveEnd + 1 as an output. Note: It can also be the case that all the numbers are non-positive, making positiveEnd = 0. The output positiveEnd + 1 = 1 remains correct.

We can better understand this approach via an example.

Initial Array: 1 -1 4 -3 3 -5 2 8

Step 1: Partition: 1 8 4 2 3 | -5 -3 -1, positiveEnd = 5

Step 2: Modifying the input array.

  • 1 < positiveEnd, we change the sign of value at index 1 - 1 = 0

    Modified array: -1 8 4 2 3 | -5 -3 -1

  • 8 > positiveEnd, we do nothing and move to the next element.
  • 2 < positiveEnd, we change the sign of value at index 2 - 1 = 1

    Modified array: -1 -8 4 2 3 | -5 -3 -1

  • 4 < positiveEnd, we change the sign of value at index 4 - 1 = 3

    Modified array: -1 -8 4 -2 3 | -5 -3 -1

  • 3 < positiveEnd, we change the sign of value at index 3 - 1 = 2 

    Modified array: -1 -8 -4 -2 3 | -5 -3 -1

Step 3: Traversing from index 0 to positiveEnd - 1, we find X[4] = 3 > 0. The answer is 4 + 1 = 5

Why this approach is working perfectly? What is the critical reason behind it? We are leaving this intuition for the learners to think!

Solution code C++

int partitionPosNeg(int X[], int n)
{
    int j = n - 1, i = 0;

    while (i <= j)
    {
        if (X[i] <= 0)
        {
            swap(X[i], X[j]);
            j = j - 1;
        }
        else
            i = i + 1;
    }
    return i;
}

int firstMissingPositive(int X[], int n)
{
    int positiveEnd = partitionPosNeg(X, n);
    for (int i = 0; i < positiveEnd; i = i + 1)
    {
        if (abs(X[i]) - 1 < positiveEnd && X[abs(X[i]) - 1] > 0)
            X[abs(X[i]) - 1] = - X[abs(X[i]) - 1];
    }
    
    for (int i = 0; i < positiveEnd; i = i + 1)
    {
        if (X[i] > 0)
            return i + 1;
    }       
    return positiveEnd + 1;
}

Solution code Python

def partitionPosNeg(X, n):
    i = 0
    j = n - 1

    while i <= j:
        if X[i] <= 0:
            X[i], X[j] = X[j], X[i]
            j = j - 1
        else:
            i = i + 1
    return i

def firstMissingPositive(X, n):
    positive_end = partitionPosNeg(X, n)
    for i in range(positive_end):
        if abs(X[i]) - 1 < positive_end and X[abs(X[i]) - 1] > 0:
            X[abs(X[i]) - 1] = -X[abs(X[i]) - 1]
    
    for i in range(positive_end):
        if X[i] > 0:
            return i + 1
    return positive_end + 1

Solution analysis

Time complexity = Time complexity of the partition + time complexity of modifying the input array + Linear scan to find the missing positive = O(n) + O(n) + O(n) = O(n)

Space complexity = O(1), we are solving the problem in place using the same input array!

Critical ideas to think!

  • In the 4th and 5th approaches, why are we not storing the occurrences of numbers greater than n?
  • In the 5th approach, why are we taking the absolute value of X[i]?
  • What if we need to find the largest missing negative number?
  • Is it possible to output all the numbers missing in the array in the range [1, n]?
  • In the 2nd approach, our array is sorted. Can we optimize the search by applying the binary search algorithm?
  • Can we solve the problem using some other approach?
  • Will the 4th and 5th approaches work if the input array contains large numbers?
  • In the 3rd approach, do we need to store all non-positive integers?
  • What if all the elements that are present in the array are negative? Do the above algorithms cover this condition?

Comparisons of time and space complexities

  • Using two nested loops: Time = O(n^2), Space = O(1)
  • Using sorting and single scan: Time = O(nlogn), Space = O(1)
  • Using a hash table: Time = O(n), Space = O(n)
  • Using in-place hashing: Time = O(n), Space = O(1)
  • Using partition and In-place hashing: Time = O(n), Space = O(1)

Suggested coding problems to practice

  • The kth missing element in an unsorted array.
  • Smallest prime number missing in an array.
  • Find the missing integer in an array if the mean is given.
  • Find the missing number in another array which is a shuffled copy.

Enjoy learning, Enjoy algorithms!

More from EnjoyAlgorithms

Self-paced Courses and Blogs