Longest Consecutive Sequence

Difficulty: Hard, Asked-in: Google, Amazon, LinkedIn, Walmart.

Key takeaway: This is an excellent problem to learn problem-solving and step-by-step optimization using sorting and a hash table.

Let's understand the problem!

Given an array X[] of n integers, write a program to find the length of the longest consecutive sequence. In other words, we need to find the length of the longest subsequence such that the elements in the subsequence are consecutive integers.

  • The consecutive integers can be in any order.
  • There can be more than one consecutive subsequence of the longest length; we only need to return that length.
  • Input integers can be positive, negative, or zero.
  • Input values can be repeated.

Example 1

Input: X[] = [4, 7, 12, 8, 10, 3], Output: 2

Explanation: [3, 4] and [7, 8] are the longest subsequences of consecutive elements, each of length 2.

Example 2

Input: X[] = [0, -3, 5, -1, 7, -2, -4, 1, 3], Output: 6

Explanation: The longest consecutive sequence is [-4, -3, -2, -1, 0, 1], which has length 6. The value 2 is missing from the input, so 3 cannot extend this sequence. So, we return 6 as an output.

Example 3

Input: X[] = [0, 3, 7, 2, 5, 8, 4, 6, 0, 2, 1], Output: 9

Explanation: Here 0 and 2 are repeated, but all the unique integers are part of the longest consecutive sequence, i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8.
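To see why repeated values do not change the answer in Example 3, here is a quick Python sketch (an illustration only, not one of the solution approaches discussed below) that removes the repeats and sorts what is left:

```python
# Example 3 input from above; the values 0 and 2 each appear twice.
X = [0, 3, 7, 2, 5, 8, 4, 6, 0, 2, 1]

# Remove duplicates, then sort so that consecutive values sit side by side.
unique_sorted = sorted(set(X))

print(unique_sorted)       # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(len(unique_sorted))  # 9
```

Every unique value belongs to one unbroken run from 0 to 8, so the answer is 9 regardless of how many times a value is repeated.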

Discussed solution approaches

  • Brute force approach using nested loops
  • Solution approach using sorting and single loop
  • Efficient approach using a hash table

Brute force approach using nested loops

Solution idea

The longest consecutive sequence must start from some element in the array. So, the basic idea would be to explore each possibility: pick each element in the input and conduct a linear search to count the length of the longest consecutive sequence starting from that element.

We also keep track of the longest length of consecutive sequences seen so far during this process. The critical question is: how do we implement this? Let's think!

Solution steps

Step 1: We initialize a variable longestLength to track the length of the longest consecutive sequence: longestLength = 0.

Step 2: We run an outer loop from i = 0 to n - 1 to traverse the input array. Inside the loop, we initialize two variables: currLength, which stores the length of the consecutive sequence starting from X[i], and currElement, which tracks the current element of that sequence. Initially, currElement = X[i] and currLength = 1.

Step 3: Now we run an inner loop to search for the next element in the sequence (currElement + 1) using linear search. If the next element is present in the array, we increment the value of currLength by 1 and move to the next possibility of the consecutive element by incrementing currElement by 1.

This process continues in an inner loop until we find an element missing in the sequence. If the linear search returns false, we stop the inner loop.

while(linearSearch(X, n, currElement + 1) == true)
{
    currElement = currElement + 1
    currLength = currLength + 1
}

Step 4: At the end of the inner loop, if (currLength > longestLength), we update longestLength with currLength. This happens when the consecutive sequence starting from X[i] is longer than the longest consecutive sequence found until that point.

if(longestLength < currLength)
   longestLength = currLength

Step 5: Now we move to the next iteration of the outer loop to do a similar process and calculate the longest consecutive sequence starting from the element X[i+1]. At the end of the outer loop, we return the value stored in the variable longestLength.

Solution code C++

#include <algorithm>   // std::max
using namespace std;

bool linearSearch(int X[], int n, int k)
{
    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] == k)
            return true;
    }
    return false;
}

int longestConsecutiveSequence(int X[], int n)
{
    int longestLength = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        int currElement = X[i];
        int currLength = 1;
        while(linearSearch(X, n, currElement + 1) == true)
        {
            currElement = currElement + 1;
            currLength = currLength + 1;
        }
        longestLength = max(longestLength, currLength);
    }
    return longestLength;
}

Solution code Python

def linearSearch(X, n, k):
    for i in range(n):
        if X[i] == k:
            return True
    return False

def longestConsecutiveSequence(X, n):
    longestLength = 0
    for i in range(n):
        currElement = X[i]
        currLength = 1
        while linearSearch(X, n, currElement + 1):
            currElement = currElement + 1
            currLength = currLength + 1
        longestLength = max(longestLength, currLength)
    return longestLength

Solution analysis

For each element X[i], the inner while loop finds the length of the consecutive streak starting from X[i]. So, the overall time complexity is n times the time complexity of the inner while loop.

The time complexity of the inner while loop depends on two things: 1) The length of the longest consecutive streak starting from a given element (this could be n in the worst case), and 2) The time complexity of searching for an element in the streak linearly (which is O(n) in the worst case).

The time complexity of the inner while loop in the worst case is n * O(n) = O(n²). This process is repeated for each element of the array. Therefore, the overall time complexity is n * O(n²) = O(n³).

The space complexity is O(1), as we are using a constant number of variables.
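The cubic growth can be observed directly. The following Python sketch re-implements the brute force approach with a comparison counter; the function name count_comparisons and the test input 1, 2, ..., n (a worst case, since every element extends the same streak) are assumptions made for this illustration:

```python
def count_comparisons(X):
    """Run the brute force approach; return (answer, number of X[i] == k checks)."""
    comparisons = 0

    def linear_search(k):
        nonlocal comparisons
        for value in X:
            comparisons += 1          # one equality check per visited element
            if value == k:
                return True
        return False

    longest = 0
    for start in X:
        curr_element, curr_length = start, 1
        while linear_search(curr_element + 1):
            curr_element += 1
            curr_length += 1
        longest = max(longest, curr_length)
    return longest, comparisons


# Worst-case style input: 1, 2, ..., n forms a single long streak.
for n in (10, 20, 40):
    answer, comparisons = count_comparisons(list(range(1, n + 1)))
    print(n, answer, comparisons)
```

Each time n doubles, the comparison count grows by a factor that approaches 8 as n increases, which is what an O(n³) bound predicts.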

Using sorting and single loop

Solution idea and steps

If we sort the input, consecutive numbers end up next to each other in the array, so a single pass over the sorted array is enough to find every sequence of consecutive numbers.

Step 1: We initialize two variables, currLength and longestLength, to track the current length of the consecutive sequence and the length of the longest consecutive sequence.

Step 2: We sort the input using an efficient in-place sorting algorithm such as heap sort.

Step 3: Now, we traverse the sorted array and compare each element X[i] to its previous element X[i - 1].

  • If both are equal, we simply move to the next element.
  • If they are not equal, we check whether the current element is the next element in the consecutive sequence, i.e., X[i] = X[i - 1] + 1. If it is, we increment currLength by 1 and move to the next iteration.
  • Otherwise, the sequence is broken. We update the value of the longestLength seen so far, reset the value of currLength to 1, and move to the next iteration.

By the end of the loop, the last element X[n - 1] may be part of the longest sequence. In other words, if (X[n - 1] == X[n - 2] + 1), then X[n - 1] extends the sequence ending at X[n - 2], and currLength gets incremented by 1.

After this, the loop will end due to the loop condition in the next iteration, and the updated value of currLength will not be considered for the calculation of longestLength. To handle this, we need to return the maximum of currLength and longestLength by the end of the loop, i.e., return max(currLength, longestLength).
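To see why the final max matters, here is a small Python sketch of the traversal described above. The flag use_final_max is a hypothetical switch added only for this demonstration; turning it off reproduces the mistake of returning longestLength alone.

```python
def longest_run_sorted(X, use_final_max=True):
    """Sorting-based approach; use_final_max=False omits the last comparison."""
    if not X:
        return 0
    X = sorted(X)
    longest_length = 1
    curr_length = 1
    for i in range(1, len(X)):
        if X[i] == X[i - 1]:
            continue                      # repeated value: nothing to do
        if X[i] == X[i - 1] + 1:
            curr_length += 1              # the sequence continues
        else:
            longest_length = max(longest_length, curr_length)
            curr_length = 1               # the sequence is broken
    if use_final_max:
        return max(longest_length, curr_length)
    return longest_length                 # deliberately incomplete


# The longest sequence [9, 10, 11] ends at the last sorted element.
print(longest_run_sorted([1, 9, 10, 11], use_final_max=False))  # 1 (wrong)
print(longest_run_sorted([1, 9, 10, 11]))                       # 3 (correct)
```

Without the final comparison, a sequence that runs to the end of the sorted array is never recorded, because longestLength is only updated when a sequence breaks.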

Solution code C++

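#include <algorithm>   // std::sort, std::max
using namespace std;

int longestConsecutiveSequence(int X[], int n)
{
    // An empty array contains no sequence at all.
    if (n == 0)
        return 0;

    sort(X, X + n);
    int longestLength = 1;
    int currLength = 1;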
    for (int i = 1; i < n; i = i + 1)
    {
        if (X[i] != X[i - 1])
        {
            if (X[i] == X[i - 1] + 1)
                currLength = currLength + 1;
            else
            {
                longestLength = max(longestLength, currLength);
                currLength = 1;
            }
        }
    }
    return max(longestLength, currLength);
}

Solution code Python

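def longestConsecutiveSequence(X, n):
    # An empty array contains no sequence at all.
    if n == 0:
        return 0
    X = sorted(X)  # sorted() returns a new list; the caller's array is unchanged
    longestLength = 1
    currLength = 1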
    for i in range(1, n):
        if X[i] != X[i - 1]:
            if X[i] == X[i - 1] + 1:
                currLength = currLength + 1
            else:
                longestLength = max(longestLength, currLength)
                currLength = 1
    return max(longestLength, currLength)

Solution analysis

Suppose we use an O(nlogn) sorting algorithm such as merge sort or heap sort (quicksort is O(nlogn) only on average and O(n²) in the worst case). So time complexity = Time complexity of sorting + Linear traversal of the array = O(nlogn) + O(n) = O(nlogn).

Space complexity: O(1) if we use heap sort, O(n) if we use merge sort.

Efficient solution approach using hash table

Solution idea

In the previous solution, sorting helped us calculate the longest sequence in O(n), but the sorting algorithm still dominates the overall time complexity. The critical question is: how can we optimize the time complexity further? Let’s think.

The solution idea is inspired by the brute-force approach. Instead of using a linear search to find the next element in the sequence, can we think of using a hash table? As we know, the hash table does fast searching in O(1) time complexity on average.

If we observe the problem clearly, there will be two types of elements in the array:

  • Type 1: Elements that are the starting elements of some consecutive sequence.
  • Type 2: Elements that are the intermediate values of some consecutive sequence.

If we know the starting element of any consecutive sequence (Type 1), we can easily calculate the length of the sequence by searching for all the next successive elements. So, one solution idea would be to identify all elements of Type 1, calculate the consecutive sequence length starting from each such element, and return the max among them.

If we observe the sorted array approach, we are doing a similar process. When we encounter a different starting element, we reset the sequence length and update the max sequence length seen so far. But how do we implement this idea using a hash table? Let's think!
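As a rough illustration of the two types, the Python sketch below splits the distinct values of Example 2 into sequence starts (Type 1) and the remaining values (Type 2). The helper name split_starts_and_intermediates is an assumption made for this example.

```python
def split_starts_and_intermediates(X):
    """Return (Type 1 values, Type 2 values) among the distinct values of X."""
    H = set(X)
    # Type 1: x starts a sequence when x - 1 is absent.
    starts = sorted(x for x in H if x - 1 not in H)
    # Type 2: x continues a sequence when x - 1 is present.
    intermediates = sorted(x for x in H if x - 1 in H)
    return starts, intermediates


X = [0, -3, 5, -1, 7, -2, -4, 1, 3]   # input of Example 2
starts, intermediates = split_starts_and_intermediates(X)
print(starts)          # [-4, 3, 5, 7]
print(intermediates)   # [-3, -2, -1, 0, 1]
```

Only the four starting values need a forward walk; every other value is reached by the walk that begins at its sequence's start.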

Solution steps

  1. We initialize a hash table of size n and a variable longestLength to store the length of the longest consecutive sequence.
  2. We iterate over the input array and insert all the elements into the hash table.
  3. Now, we traverse each element X[i] using a loop:

    • We search X[i] - 1 in the hash table. If it exists, it is not the first element of its corresponding sequence. So we can ignore it and move to the next element.
    • If X[i] - 1 does not exist in the hash table, then X[i] is the first element of its corresponding sequence, and we follow the same process as in the brute force approach. The only difference here is that we use a hash table instead of a linear search to find the consecutive occurrences.
    • Now we calculate the longest consecutive sequence starting from X[i] and store it in the variable currLength. We also update the length of the longest consecutive sequence seen so far i.e., longestLength = max(longestLength, currLength).
  4. By the end of the loop, the value of the longest consecutive sequence is stored in the variable longestLength. We return this value.

Solution pseudocode

int longestConsecutiveSequence(int X[], int n)
{
    HashTable H
    int longestLength = 0
    for(int i = 0; i < n; i = i + 1)
        H.insert(X[i])
   
    for(int i = 0; i < n; i = i + 1)
    {
        if (H.search(X[i] - 1) == false)
        {
            int currLength = 1
            int currElement = X[i]
            while(H.search(currElement + 1) == true)
            {
                currLength = currLength + 1
                currElement = currElement + 1
            }
            longestLength = max(longestLength, currLength)
        }
    }
    return longestLength
}

Solution code C++

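#include <algorithm>       // std::max
#include <unordered_set>   // std::unordered_set
using namespace std;

int longestConsecutiveSequence(int X[], int n)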
{
    unordered_set<int> H;
    for (int i = 0; i < n; i = i + 1)
        H.insert(X[i]);

    int longestLength = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        if (H.find(X[i] - 1) == H.end())
        {
            int currLength = 1;
            int currElement = X[i];
            while (H.find(currElement + 1) != H.end())
            {
                currLength = currLength + 1;
                currElement = currElement + 1;
            }
            longestLength = max(longestLength, currLength);
        }
    }
    return longestLength;
}

Solution code Java

// Assumes: import java.util.HashSet; and that this method is placed inside a class.
int longestConsecutiveSequence(int[] X, int n)
{
    HashSet<Integer> H = new HashSet<>();
    for (int i = 0; i < n; i = i + 1)
        H.add(X[i]);

    int longestLength = 0;
    for (int i = 0; i < n; i = i + 1) 
    {
        if (H.contains(X[i] - 1) == false) 
        {
            int currLength = 1;
            int currElement = X[i];
            while (H.contains(currElement + 1)) 
            {
                currLength = currLength + 1;
                currElement = currElement + 1;
            }
            
            longestLength = Math.max(longestLength, currLength);
        }
    }
    return longestLength;
}

Solution code Python

def longestConsecutiveSequence(X, n):
    H = set(X)
    longestLength = 0
    for i in range(n):
        if X[i] - 1 not in H:
            currLength = 1
            currElement = X[i]
            while currElement + 1 in H:
                currLength = currLength + 1
                currElement = currElement + 1
            longestLength = max(longestLength, currLength)

    return longestLength

Solution analysis

At first sight, the time complexity appears to be quadratic due to the two nested loops. However, a closer look is necessary because the while loop runs only when an element X[i] marks the beginning of a sequence. A better idea would be to calculate the count of critical operations inside the loop for a more detailed analysis.

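  • Searching in the hash table is the critical operation within the loop. The if condition performs exactly one search per array element, i.e., n searches in total.
  • The while loop runs only from the starting element of a sequence. When the values are distinct, each element is reached by at most one such walk, so the while loop performs at most about 2n searches in total (one per element reached, plus one failed search that ends each walk).
  • The time complexity of each search operation is O(1) on average.
  • Caveat for repeated values: the code above loops over X, so the walk restarts for every copy of a starting element. For an input like n/2 copies of 1 followed by 2, 3, ..., n/2, this costs O(n²) searches. Looping over the distinct keys of the hash table instead of over X avoids this and keeps the bound linear.

Overall time complexity (for distinct values, or when looping over the distinct keys) = Time complexity of inserting n elements into hash table + Time complexity of searching n elements twice = n*O(1) + 2*n*O(1) = O(n) on average.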

Space complexity = O(n), for the hash table.
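The effect of repeated values on the number of hash table searches can be measured with a short Python sketch. The function name longest_run_counting_lookups and the flag iterate_over_set are assumptions made for this illustration; the second variant is a small modification of the approach above (looping over the set rather than the array), not the code shown earlier.

```python
def longest_run_counting_lookups(X, iterate_over_set):
    """Hash-set approach; returns (answer, number of membership tests)."""
    H = set(X)
    lookups = 0
    longest = 0
    candidates = H if iterate_over_set else X
    for x in candidates:
        lookups += 1                      # the "is x - 1 present?" test
        if x - 1 not in H:
            curr_element, curr_length = x, 1
            while True:
                lookups += 1              # the "is curr_element + 1 present?" test
                if curr_element + 1 not in H:
                    break
                curr_element += 1
                curr_length += 1
            longest = max(longest, curr_length)
    return longest, lookups


n = 1000
# n/2 copies of the starting value 1, followed by 2, 3, ..., n/2 + 1.
X = [1] * (n // 2) + list(range(2, n // 2 + 2))

print(longest_run_counting_lookups(X, iterate_over_set=False))  # (501, 251500)
print(longest_run_counting_lookups(X, iterate_over_set=True))   # (501, 1002)
```

Both variants return the same answer, but looping over the array repeats the full walk for each copy of the starting value, while looping over the set performs roughly two searches per distinct value.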

Critical ideas to think!

  • What will be the time complexity of this optimized version of the brute force approach?
int longestConsecutiveSequence(int X[], int n)
{
    int longestLength = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        if (linearSearch(X, n, X[i] - 1) == false)
        { 
            int currElement = X[i];
            int currLength = 1;
            while (linearSearch(X, n, currElement + 1) == true)
            {
                currElement = currElement + 1;
                currLength = currLength + 1;
            }
            longestLength = max(longestLength, currLength);
        }
    }    
    return longestLength;
}
  • Do the above algorithms work if there are repeated numbers in the array?
  • Can we think of solving this problem using dynamic programming?
  • Can we solve this problem using other data structures like a heap, BST, etc.? What is the worst-case time complexity if we use a BST in place of a Hash Table?
  • What are the best and worst-case inputs for all the above approaches?

Comparison of time and space complexities

  • Using nested loops: Time = O(n³), Space = O(1).
  • Using sorting and single loop: Time = O(nlogn), Space = O(1) with heap sort or O(n) with merge sort.
  • Using a hash table: Time = O(n) on average, Space = O(n).
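One way to gain confidence in all three approaches is to cross-check them on random inputs. The sketch below uses compact Python versions of each approach; the function names, the value range, and the fixed seed are choices made for this illustration, and the hash-set version loops over distinct values.

```python
import random


def brute_force(X):
    longest = 0
    for start in X:
        curr_element, curr_length = start, 1
        while curr_element + 1 in X:      # membership test on a list is a linear search
            curr_element += 1
            curr_length += 1
        longest = max(longest, curr_length)
    return longest


def using_sorting(X):
    if not X:
        return 0
    X = sorted(X)
    longest, curr_length = 1, 1
    for i in range(1, len(X)):
        if X[i] == X[i - 1]:
            continue                      # skip repeated values
        if X[i] == X[i - 1] + 1:
            curr_length += 1
        else:
            longest = max(longest, curr_length)
            curr_length = 1
    return max(longest, curr_length)


def using_hash_set(X):
    H = set(X)
    longest = 0
    for x in H:                           # loop over distinct values only
        if x - 1 not in H:
            curr_element, curr_length = x, 1
            while curr_element + 1 in H:
                curr_element += 1
                curr_length += 1
            longest = max(longest, curr_length)
    return longest


random.seed(0)                            # fixed seed so the run is repeatable
for _ in range(200):
    size = random.randint(0, 25)
    X = [random.randint(-15, 15) for _ in range(size)]
    assert brute_force(X) == using_sorting(X) == using_hash_set(X)
print("all three approaches agree")
```

The small value range makes repeated values and long runs likely, and the size range includes the empty array, so the edge cases discussed above are exercised.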

Suggested coding problems to practice

  • First missing positive
  • Most frequent element in array
  • n repeated element in 2n size array
  • Count distinct elements in every window
  • Largest subarray with 0 Sum
  • Subarray sum equals K
  • Minimum size subarray sum
  • Longest palindromic substring
  • Maximum length of repeated subarray
  • Minimum window substring
  • Maximum product subarray
  • Longest increasing subsequence
  • Longest substring without repeating characters

If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!
