Remove Duplicate Elements from Sorted Array

Difficulty: Easy, Asked-in: Google, Facebook, Microsoft, Amazon, Morgan Stanley

Key takeaway: An excellent problem to learn optimization using fast and slow pointers. We discuss a brute force solution and two O(n) time in-place solutions.

Let’s understand the problem

We are given a sorted array. Write a program to remove the duplicate elements so that there is a single occurrence of each element in the array.

  • We need to return the length of the array containing unique elements. It doesn't matter what we leave beyond the new length.
  • The input array is passed by reference, which means any modification to it will be visible to the caller.

Example 1

Input: X[] = [2, 2, 2, 2, 2, 3, 3, 3], Output: 2, X[] = [2, 3]

Explanation: Our function should return length = 2, with the first two elements of X[] being 2 and 3 respectively. It doesn't matter what we leave beyond the returned length.

Example 2

Input: X[] = [1, 2, 2, 3, 4, 4, 4, 5, 5], Output: 5, X[] = [1, 2, 3, 4, 5]

Explanation: Our function should return length = 5, with the first five elements of X[] being 1, 2, 3, 4 and 5 respectively.

Discussed solution approaches

  • A brute force approach using extra memory
  • An efficient approach using two pointers
  • An efficient approach by counting duplicates

A brute force approach using extra memory

Solution Idea and Steps

One idea would be to store the unique elements in extra memory and then copy them back to the original array. By the end of this process, we return the new length of the input array.

  1. We declare an extra array unique[n] to store the unique elements.
  2. We scan the input array and copy the unique elements of X[] to unique[], tracking the unique element count using the variable count.
  3. We copy the unique elements from unique[] back to X[].
  4. Finally, we return the value of count.

Solution pseudocode

int removeDuplicates(int X[], int n) 
    if (n == 0 || n == 1) 
        return n
    int unique[n]
    int count = 0
    for (int i = 0; i < n - 1; i = i + 1) 
        if (X[i] != X[i + 1]) 
            unique[count] = X[i]
            count = count + 1
    unique[count] = X[n - 1]
    count = count + 1
    for (int i = 0; i < count; i = i + 1)
        X[i] = unique[i]
    return count

Time and space complexity analysis

We traverse the array linearly twice. So time complexity = traversing the array to store unique elements + traversing the array to copy them back = O(n) + O(count) = O(n), since count equals n in the worst case. Space complexity = O(n) for the extra array unique[n].
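The pseudocode above can be turned into a runnable C sketch; the function name removeDuplicatesBrute and the use of malloc for the temporary buffer are our assumptions, not part of the original:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Brute-force sketch: collect unique elements into a temporary buffer,
// copy them back to the front of X, and return the new length.
int removeDuplicatesBrute(int X[], int n) {
    if (n == 0 || n == 1)
        return n;
    int *unique = malloc(n * sizeof(int));   // extra O(n) memory
    int count = 0;
    for (int i = 0; i < n - 1; i = i + 1) {
        if (X[i] != X[i + 1]) {              // last occurrence of a value
            unique[count] = X[i];
            count = count + 1;
        }
    }
    unique[count] = X[n - 1];                // the last element is always kept
    count = count + 1;
    memcpy(X, unique, count * sizeof(int));  // copy unique elements back
    free(unique);
    return count;
}
```

For example, on X[] = [2, 2, 2, 2, 2, 3, 3, 3] this returns 2 and leaves X[0] = 2 and X[1] = 3.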

An efficient approach using two pointers

Solution Idea

Can we use the sorted array property to solve this problem in place? Let's think.

In a sorted array, all duplicate elements are placed adjacent to each other. So we can easily track the unique elements and move them to the front positions of the array. For this purpose, we use a fast pointer to scan the array and a slow pointer to track the position of the unique elements.

Solution Steps

  • We initialise the slow and fast pointers: slow = 0, fast = 1.
  • We scan the input array while fast < n. Whenever we find a duplicate, i.e. X[fast] == X[slow], we skip it and increment fast by 1. When we encounter a unique element, i.e. X[fast] != X[slow], we increment slow by 1, copy X[fast] to X[slow], and then increment fast by 1.
  • We repeat this process until fast reaches the end of the array. Finally, we return the count of unique elements, which is slow + 1. (Think!)

Solution Pseudocode

int removeDuplicates(int X[], int n) 
    if (n == 0)
        return 0
    int slow = 0
    int fast = 1
    while (fast < n)
        if (X[fast] != X[slow]) 
            slow = slow + 1
            X[slow] = X[fast]
        fast = fast + 1     
    return slow + 1

Time and space complexity analysis

We perform a single scan of the array, so time complexity = O(n). We use only two extra variables, so space complexity = O(1).
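The two-pointer idea as a runnable C sketch (the function name removeDuplicatesTwoPointers is our choice; the fast pointer is expressed as a for loop, which is equivalent to the while loop in the pseudocode):

```c
#include <assert.h>

// Two-pointer sketch: slow marks the position of the last unique value
// already placed; fast scans ahead for the next distinct value.
int removeDuplicatesTwoPointers(int X[], int n) {
    if (n == 0)
        return 0;
    int slow = 0;
    for (int fast = 1; fast < n; fast = fast + 1) {
        if (X[fast] != X[slow]) {
            slow = slow + 1;    // open the next unique slot
            X[slow] = X[fast];  // place the newly found unique value
        }
    }
    return slow + 1;  // slow is an index, so the count is slow + 1
}
```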

An efficient approach by counting the duplicates

Solution Idea and Steps

Here is another idea to solve the problem by counting the duplicates in the array.

  • We scan the array and track the duplicate element count using the variable duplicateCount.
  • When we find a duplicate, i.e. X[i] == X[i - 1], we increment duplicateCount by one.
  • Otherwise, X[i] is the first appearance of that element, so we place X[i] at the front, i.e. at index i - duplicateCount.
  • Finally, we return the count of unique elements, i.e. n - duplicateCount. (Think!)

Solution Pseudocode

int removeDuplicates(int X[], int n) 
    int duplicateCount = 0
    for (int i = 1; i < n; i = i + 1)
        if (X[i] == X[i - 1]) 
            duplicateCount = duplicateCount + 1
        else
            X[i - duplicateCount] = X[i]
    return n - duplicateCount

Time and space complexity analysis

We perform a single scan of the array, so time complexity = O(n). Space complexity = O(1), as we use just one extra variable.
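The counting idea as a runnable C sketch (the function name removeDuplicatesCounting is ours; note that the copy happens only when X[i] differs from X[i - 1]):

```c
#include <assert.h>

// Duplicate-counting sketch: every unique X[i] is shifted left by the
// number of duplicates seen so far, packing unique values to the front.
int removeDuplicatesCounting(int X[], int n) {
    int duplicateCount = 0;
    for (int i = 1; i < n; i = i + 1) {
        if (X[i] == X[i - 1])
            duplicateCount = duplicateCount + 1;  // skip the duplicate
        else
            X[i - duplicateCount] = X[i];         // shift left by the count
    }
    return n - duplicateCount;
}
```

Because the write target i - duplicateCount never touches index i itself, each comparison X[i] == X[i - 1] still sees the original values, which is why a single pass suffices.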

Critical ideas to think!

  • What would be the best and worst-case scenarios of all three approaches?
  • Will the above solutions work when there are no duplicate elements at all?
  • Inside the while loop of the 2nd approach, when X[fast] != X[slow], why do we increment the slow pointer before copying the value? Similarly, in the 3rd approach, when X[i] != X[i - 1], why do we copy X[i] to X[i - duplicateCount]?
  • Can we solve this problem using a BST or other data structures?
  • After the for loop in the brute force approach, why do we add the instruction unique[count] = X[n - 1]?
  • To solve this problem, can we use an idea similar to the quicksort partition process?
  • Why does the two-pointers approach work perfectly in the case of a sorted array?
  • How do we approach this problem if the input array is unsorted?

Comparisons of time and space complexities

  • Brute force approach: Time = O(n), Space = O(n)
  • Using two pointers: Time = O(n), Space = O(1)
  • By counting duplicates: Time = O(n), Space = O(1)

Similar coding questions to practice

  • Sort an array of 0s, 1s and 2s
  • Move zeroes to the end
  • Sort an array in a waveform
  • Find duplicates in an Array with values 1 to N
  • Valid Mountain in an array
  • Find the frequencies of all duplicate elements in the array
  • Remove the duplicates such that duplicates appeared at most twice

Thanks to Suaysh Namdeo for his contribution in creating the first version of the content. Please write comments if you find an error or you want to share more insights about the topic. Enjoy learning, Enjoy coding, Enjoy algorithms!
