Find N Repeated Element in 2N Size Array

Difficulty: Medium, Asked-in: Google

Key takeaway: An excellent problem to learn time complexity optimization using mathematical approach. Sometimes mathematical insights into the problem can help us to get efficient solutions.

Let’s understand the problem

In array of size 2N, there are N + 1 unique elements, and exactly one of these elements is repeated N times. Write a program to find the element repeated N times.

Examples

Input: X[] = [1, 2, 2, 3], Output: 2

Input: X[] = [2, 1, 2, 5, 3, 2, 2, 4], Output: 2

Input: X[] = [5, 1, 5, 2, 5, 3, 5, 4, 6, 5], Output: 5

Discussed solution approaches

  • Brute force approach  using a hash table
  • Efficient approach  using the pigeon hole principle

A brute force approach using a hash table

Solution idea

A simple approach would be to traverse the array and count frequency of each element using hash table. During the traversal, if we find an element with frequency count greater than 1, we return that element. Why are we returning element with frequency count greater than 1? The idea is simple: There is only one element repeated N times and other N elements are unique.

From another perspective, we only need to find the repeated element in the array. So we take a hash table of size equal to the total number of unique elements i.e. N + 1 and run a while loop till i < n (Here n = 2N) to update the count of each element. When we find an element with a count greater than 1, we return that element as an output.

Note: In hash table, we use array elements as key and frequency count as value. Initially, all hash table entries are initialised with 0.

Solution pseudocode

int nRepeated(int X[], int n)
{
    HashTable H
    int i = 0
    while (i < n)
    {
        H[X[i]] = H[X[i]] + 1
        if(H[X[i]] > 1)
            return X[i]
        i = i + 1    
    }
}

Solution analysis

At each iteration, we are doing constant operations because insertion and searching in the hash table will take O(1) time average. In the best case, repeated element will be present at indexes 0 and 1 and above code will perform constant operations. So time complexity in the best case = O(1). What would be the worst-case scenario?

In the worst case, the first N elements are unique and the remaining N repeated elements are present at the last N positions. So the above while loop will run N times in the worst case to find the repeated element. Time complexity in the worst case = O(N)

We are using O(N) extra space to store frequency of elements in the hash table. Space complexity = O(N).

An efficient approach using the pigeon hole principle

Solution idea

Can we think to solve this problem in O(N) time without using extra space? As given in the problem, if an element x is repeated, then it must be the element that has been repeated N times because all other elements are unique. How can we use this insight to solve this problem efficiently in O(1)? Let's think!

Suppose indexes of all x’s in the array are X1, X2, ... XN, where X1 < X2 <...< XN. Let's calculate the average gap between all consecutive x’s.

  • Average gap = [(X2 - X1) + (X3 - X2) +...(XN - XN-1)] / (N - 1).
  • Here X2, X3, ...XN-1 cancel out each other. So Average gap = (XN - X1)/(N - 1)

As array size is 2N, so max possible value of XN = 2N - 1, min possible value of X1 = 0. If we put the values of XN and X1 in the above formula, we will get the max upper limit of the average gap.

  • Average gap <= (2N - 1)/(N - 1).
  • Equality holds when N = 2.
  • For all N > 2, (2N - 1)/(N - 1) is less than 3.
  • So average gap <= 3, for N > 2.

If the average gap is 3, it means: Gap between some x’s is more than 3 and the gap between some x’s is less than 3. So we can argue that: There must exist at least two x’s in the array such that their gap is at most 3. In simple words: If half array elements are repeated, it doesn’t matter how we shuffle it; for one of the repeated elements, there has to be another repeated element at least three positions away. Otherwise, array size will not be 2N. Think!

Solution steps

If we just check elements at gaps 1, 2 and 3, then we definitely find the repeated elements.

  • We run an outer loop from k = 1 to 3 to create a window of gap lengths 1, 2, and 3. In other words: For each element, we are using this loop to explore elements at gap lengths 1, 2, and 3.
  • Now for each window size k, we run an inner loop to traverse the array from i = 0 to n - k (Here n = 2N) and compare the first and last elements of the window.
  • If X[i] == X[i + k], then element we have found the repeated element and return X[i] as an output.

Solution pseudocode

int nRepeated(int X[], int n)
{
    for (int k = 1; k <= 3; k = k + 1)
    {
        for (int i = 0; i < n - k; i = i + 1)
        {
            if (X[i] == X[i + k])
                return X[i]
        }
   }
}

Solution analysis

We are using two nested loops and doing one comparison operation at each iteration of the inner loop. In the worst case, outer loop will run three times, and we need to traverse each element via inner loop. So time complexity = Time complexity of outer loop * Time complexity of inner loop = O(1)*O(n) = O(n) = O(N).

We are using constant extra space. So space complexity = O(1)

Critical ideas to think!

  • What is the pigeonhole principle? How can we use it in problem-solving?
  • Can we solve this problem by sorting the input? If yes, then what would be the time and space complexity?
  • Can we solve this problem efficiently using some other approach?
  • Can we further optimize the above approach? Is it okay to compare each element with its 2 right neighbors instead of 3?
  • Let's design a randomized algorithm: randomly pick two array elements and check whether they come from different cells and have the same value. If they do, then return true. So after how many numbers of repetition algorithm succeeds with probability 1 – 1/n? Out of all possible combinations of picking two elements, how many will result in success?
  • Based on the above observation, suppose we only check all consecutive triplets: if X[i] == X[i + 1] or X[i] == X[i + 2] or X[i] == X[i + 3]. If anyone of the condition is true, we return X[i]. The critical question is: Is this approach correct? If there is some bug, then how can we fix it? Think!
int nRepeated(int X[], int n) 
{
    for (int i = 3; i < n; i = i + 1)
    {
        if (X[i] == X[i - 1] || X[i] == X[i - 2] || X[i] == X[i - 3])
            return X[i]
    }        
}

Comparisons of time and space complexities

  • Brute force approach: Time = O(N), Space = O(N)
  • Using pigeon hole principle : Time = O(N), Space = O(1)

Suggested coding problems to practice

  • Find the two non-repeating elements in an array of repeating elements.
  • Find majority element in an array
  • Find the two numbers with odd occurrences in an unsorted array.
  • Find the first repeating element in an array of integers.
  • Find the maximum repeating number
  • Find the first non-repeating character of the given string
  • Maximum consecutive repeating character in string

Please write comments if you find an error or you want to share more insights about the topic. Enjoy learning, Enjoy coding, Enjoy algorithms!

Share feedback with us

More blogs to explore

Our weekly newsletter

Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.