Remove Duplicates from Sorted Array

Difficulty: Easy Asked in: Google, Facebook, Microsoft, Amazon, Morgan Stanley

Key takeaway: This problem is an excellent way to learn optimization using fast and slow pointers. We have discussed two in-place solutions that take O(n) time.

Let's understand the problem

We are given a sorted array and need to write a program to remove duplicate elements so that there is a single occurrence of each element in the array. Note: After removing duplicates, we need to return the length of the array containing unique elements. It doesn't matter what we leave beyond the new length.

Example 1

Input: X[] = [2, 2, 2, 2, 2, 3, 3, 3], Output: 2, X[] = [2, 3]

Explanation: Our function should return length = 2, with the first two elements of X[] being 2 and 3, respectively. It doesn't matter what we leave beyond the returned length.

Example 2

Input: X[] = [1, 2, 2, 3, 4, 4, 4, 5, 5], Output: 5, X[] = [1, 2, 3, 4, 5]

Explanation: Our function should return length = 5, with the first five elements of X[] being 1, 2, 3, 4, and 5, respectively.

Discussed solution approaches

  • Brute force approach using extra memory
  • Efficient approach using two pointers
  • Efficient approach by counting duplicates

Brute force approach using extra memory

Solution idea and steps

One idea would be to store unique elements in extra memory and copy them back to the original array. By the end of this process, we will return the new length of the input array.

  • We declare an extra array called unique[n] to store the unique elements.
  • We scan the input array and copy the unique elements of X[] to unique[]. During this process, we keep track of the unique element count using the variable count.
  • We then copy the unique elements from unique[] back to X[].
  • Finally, we return the value of count.

Solution code C++

#include <vector>

int removeDuplicatesArray(int X[], int n) 
{ 
    if (n == 0 || n == 1) 
        return n;
    // Extra memory to store the unique elements. We use std::vector because
    // variable-length arrays like int unique[n] are non-standard C++.
    std::vector<int> unique(n);
    int count = 0;
    // X[i] is the last occurrence of its value if it differs from X[i + 1].
    for (int i = 0; i < n - 1; i = i + 1) 
    {
        if (X[i] != X[i + 1]) 
        {
            unique[count] = X[i];
            count = count + 1;
        }
    } 
    // The last element is always the final occurrence of its value.
    unique[count] = X[n - 1];
    count = count + 1;
    // Copy the unique elements back to the front of X[].
    for (int i = 0; i < count; i = i + 1)
        X[i] = unique[i];
    return count;
}

Solution code Python

def removeDuplicatesArray(X, n):
    if n == 0 or n == 1:
        return n
    # Extra memory to store the unique elements.
    unique = [0] * n
    count = 0
    # X[i] is the last occurrence of its value if it differs from X[i + 1].
    for i in range(n - 1):
        if X[i] != X[i + 1]:
            unique[count] = X[i]
            count = count + 1
    # The last element is always the final occurrence of its value.
    unique[count] = X[n - 1]
    count = count + 1
    # Copy the unique elements back to the front of X.
    for i in range(count):
        X[i] = unique[i]
    return count

Time and space complexity analysis

We are linearly traversing the array twice, so the time complexity is: traversing the array to store unique elements + traversing the array to copy them back = O(n) + O(count) = O(n). The value of count is equal to n in the worst case, which occurs when all elements are already unique. Space complexity = O(n) for the extra array unique[].
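
For completeness, here is a minimal driver of our own (a sketch, not part of the original article) showing how a caller uses the returned length: only the first count positions of X[] are meaningful after the call. It assumes it is compiled together with any one of the implementations in this article, since they all share the same signature.

#include <iostream>

// Declaration only: link with one of the implementations above.
int removeDuplicatesArray(int X[], int n);

int main()
{
    int X[] = {1, 2, 2, 3, 4, 4, 4, 5, 5};   // input of Example 2
    int n = sizeof(X) / sizeof(X[0]);
    int count = removeDuplicatesArray(X, n);
    // Only the first count elements are meaningful after the call.
    for (int i = 0; i < count; i = i + 1)
        std::cout << X[i] << " ";            // prints: 1 2 3 4 5
    std::cout << std::endl;
    return 0;
}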

Efficient approach using two pointers

Solution idea

Can we use the sorted array property to solve this problem in place? Let's think.

In a sorted array, all duplicate elements will be placed adjacent to each other. So, we can easily track unique elements and place them into the front positions of the array. For this purpose, we use one fast pointer to scan the array and a slow pointer to track the position of unique elements.

Solution steps

  • We initialize the slow and fast pointers, i.e., slow = 0 and fast = 1.
  • We scan the input array while fast < n. Whenever we find a duplicate element, i.e., X[fast] == X[slow], we skip it and increment fast by 1. Whenever we encounter a unique element, i.e., X[fast] != X[slow], we increment the slow pointer by 1, copy X[fast] to X[slow], and then increment fast by 1.
  • We repeat this process until fast reaches the end of the array. Finally, we return the count of unique elements, which is slow + 1. (Think!)

Solution code C++

int removeDuplicatesArray(int X[], int n) 
{
    if (n == 0)
        return 0;
    // slow tracks the position of the last unique element placed so far;
    // fast scans ahead through the array.
    int slow = 0;
    int fast = 1;
    while (fast < n)
    {
        // X[fast] is a new unique element: place it just after slow.
        if (X[fast] != X[slow]) 
        {
            slow = slow + 1;
            X[slow] = X[fast];
        }
        fast = fast + 1;
    }
    // Indices 0..slow hold the unique elements, so the count is slow + 1.
    return slow + 1;
}

Solution code Python

def removeDuplicatesArray(X, n):
    if n == 0:
        return 0
    # slow tracks the position of the last unique element placed so far;
    # fast scans ahead through the array.
    slow = 0
    fast = 1
    while fast < n:
        # X[fast] is a new unique element: place it just after slow.
        if X[fast] != X[slow]:
            slow = slow + 1
            X[slow] = X[fast]
        fast = fast + 1
    # Indices 0..slow hold the unique elements, so the count is slow + 1.
    return slow + 1

Time and space complexity analysis

We are doing a single scan of the array, so the time complexity is O(n). We are just using two extra variables, so the space complexity is O(1).
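
As a quick sanity check, here is a small test harness of our own (a sketch, not part of the original solution) covering the edge cases raised in the questions below: no duplicates at all, all elements equal, a single element, and an empty input. It assumes it is compiled together with the two-pointer implementation above.

#include <cassert>

// Declaration only: link with the two-pointer implementation above.
int removeDuplicatesArray(int X[], int n);

int main()
{
    int a[] = {1, 2, 3};                      // no duplicates at all
    assert(removeDuplicatesArray(a, 3) == 3);

    int b[] = {7, 7, 7, 7};                   // all elements equal
    assert(removeDuplicatesArray(b, 4) == 1 && b[0] == 7);

    int c[] = {4};                            // single element
    assert(removeDuplicatesArray(c, 1) == 1);
    assert(removeDuplicatesArray(c, 0) == 0); // empty input
    return 0;
}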

Efficient approach by counting duplicates

Solution idea and steps

Here is another idea to solve the problem by counting duplicates in the array:

  • We scan the array and track the count of duplicate elements using the variable duplicateCount.
  • When we find a duplicate, i.e., X[i] == X[i - 1], we increase duplicateCount by one.
  • Otherwise, X[i] is the first occurrence of its value, so we place it towards the front, i.e., at index i - duplicateCount.
  • Finally, we return the count of unique elements, i.e., n - duplicateCount. (Think!)

Solution code C++

int removeDuplicatesArray(int X[], int n) 
{
    int duplicateCount = 0;
    for (int i = 1; i < n; i = i + 1)
    {
        // X[i] repeats the previous element: one more duplicate seen.
        if (X[i] == X[i - 1]) 
            duplicateCount = duplicateCount + 1;
        // Otherwise X[i] is a first occurrence: shift it left past
        // the duplicates seen so far.
        else 
            X[i - duplicateCount] = X[i];
    }
    return n - duplicateCount;
}

Solution code Python

def removeDuplicatesArray(X, n):
    duplicateCount = 0
    for i in range(1, n):
        # X[i] repeats the previous element: one more duplicate seen.
        if X[i] == X[i - 1]:
            duplicateCount = duplicateCount + 1
        # Otherwise X[i] is a first occurrence: shift it left past
        # the duplicates seen so far.
        else:
            X[i - duplicateCount] = X[i]
    return n - duplicateCount

Time and space complexity analysis

We are doing a single scan of the array, so the time complexity is O(n). We are just using one extra variable, so the space complexity is O(1).

Critical ideas to think!

  • What would be the best and worst-case scenarios of all three approaches?
  • Will the above solutions work when there are no duplicate elements at all?
  • Inside the while loop of the 2nd approach, if (X[fast] != X[slow]), why are we incrementing the slow pointer before copying the value? Similarly, in the 3rd approach, if (X[i] == X[i - 1]), why are we copying X[i] at X[i - duplicateCount]?
  • Can we solve this problem using BST or other data structures?
  • After the for loop in the brute force approach, why are we adding the instruction unique[count] = X[n - 1]?
  • To solve this problem, can we use an idea similar to the quicksort partition process?
  • Why does the two-pointers approach work perfectly in the case of a sorted array?
  • How do we approach this problem if we have an unsorted array in the input? (One possible idea is sketched below.)
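
On the last question: in an unsorted array, duplicates are no longer adjacent, so comparing each element with its neighbour breaks down. One common idea, sketched below as our own illustration (not one of the approaches discussed above), is to remember already-seen values in a hash set, which keeps O(n) average time at the cost of O(n) extra space:

#include <unordered_set>

// Sketch for the unsorted variant: keep the first occurrence of each
// value (preserving relative order) and return the new length.
int removeDuplicatesUnsorted(int X[], int n)
{
    std::unordered_set<int> seen;
    int slow = 0;
    for (int fast = 0; fast < n; fast = fast + 1)
    {
        if (seen.find(X[fast]) == seen.end())  // first occurrence of X[fast]
        {
            seen.insert(X[fast]);
            X[slow] = X[fast];
            slow = slow + 1;
        }
    }
    return slow;  // count of unique elements
}

Alternatively, if changing the relative order is acceptable, we can sort the array first in O(n log n) time and then reuse any of the in-place approaches above.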

Comparisons of time and space complexities

  • Brute force approach: Time = O(n), Space = O(n)
  • Using two pointers: Time = O(n), Space = O(1)
  • By counting duplicates: Time = O(n), Space = O(1)

If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!
