Intersection of Two Unsorted Arrays

Difficulty: Medium, Asked-In: Google, Facebook

Key takeaway: An excellent problem to learn time complexity optimization using various approaches. The two-pointer and hash table solutions are intuitive and worth exploring.

Let’s understand the problem

Given two integer arrays X[] and Y[] of size m and n, write a program to find the intersection of these two arrays.

  • Suppose m > n and elements in both arrays are distinct.
  • The intersection is a list of common elements present in both arrays.
  • The elements in the output can be in any order.

Example 1

Input: X[] = [3, 4, 6, 2, 8, 5, 9], Y[] = [6, 3, 2, 7, 5, 1]

Output: [3, 6, 2, 5]

Explanation: Common elements to both arrays are 3, 6, 2, and 5.

Example 2

Input: X[] = [3, 4, 6, 7, 10, 12, 5], Y[] = [7, 11, 18, 15, 12]

Output: [7, 12]

Explanation: Common elements to both arrays are 7 and 12.

Discussed solution approaches

  • A brute force approach  using nested loops
  • Using sorting and binary search
  • Using sorting and two pointers
  • An efficient approach  using a hash table

A brute force approach  using nested loops

Solution idea

A basic idea would be to pick each element from X[] and search linearly in Y[]. We add an element to an output list if it is present in both X[] and Y[].

  • We run two nested loops: an outer loop from i = 0 to m - 1 to pick each element X[i] and an inner loop from j = 0 to n - 1 to search X[i] linearly in Y[].
  • Whenever we found an element common to both X[] and Y[] i.e. X[i] == Y[j], we add aby one of it to the output list intersection[].
  • By the end of nested loops, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    for (int i = 0; i < m; i = i + 1)
    {
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == Y[j])
            {
                intersection.push_back(X[i])
                break
            }    
        }
    }
    return intersection
}

Solution Analysis

We are running two nested loops and performing O(1) operations at each iteration. In the worst case, the inner loop will run n times, so the time complexity = O(m) * O(*n) * O(1) = O(mn). We are using constant extra space, so space complexity = O(1). Why are we not considering output list intersection as a part of space complexity? Think!

Using sorting and binary search

Solution idea

The critical question is: how can we improve the time complexity of the above approach? Searching is an essential operation in the above code. Can we think to improve the efficiency of searching an element X[i] in Y[]? One idea would be simple: if we sort the array Y[], we can search each element X[i] using the binary search. This could help to improve the time complexity because binary search takes O(logn) time.

Solution steps

  • We define an output list intersection[] to store the common elements.
  • Now we sort array Y[] in increasing order. We can use O(nlogn) sorting algorithms like heap sort or quick sort or merge sort.
  • We run a loop from i = 0 to m - 1 and search each element X[i] in the sorted array Y[] using binary search. Whenever we found an element common to both X[] and Y[] i.e. X[i] == Y[j], we add it to the output list intersection[].
  • By the end of the loop, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    heapSort(Y, n)
    for (int i = 0; i < m; i = i + 1)
    {
        if (binarySearch(Y, n, X[i]) != -1)
            intersection.push_back(X[i])
    }
    return intersection
}

int binarySearch(int Y[], int n, int target)
{
    int l = 0, r = n
    while (l <= r)
    {
        int mid = l + (r - l)/2
        if (Y[mid] == target)
            return mid
        else if (Y[mid] > target)
            r = mid - 1
        else
            l = mid + 1
   }
   return -1
}

Solution analysis

We are using heap sort and iterative binary search for the implementation. Time complexity = Time complexity of sorting array Y[] using heap sort + Time complexity of searching each elements X[i] in Y[] using binary search = O(nlogn) + m * O(logn) = O(nlogn + mlogn). If m > n, then mlogn > nlogn and time complexity would be O(mlogn).

Space complexity = Space complexity of heap sort + Space complexity of iterative binary search = O(1) + O(1) = O(1). What would be the space complexity if we use merge sort and recursive binary search for the implementation? Think!

Using sorting and two pointers

Solution idea

In the above approach, the overall time complexity of searching is O(mlogn). The critical question is: can we improve the time complexity of the searching further? Here is an idea: if we sort both arrays, we can apply a two-pointers approach similar to the merging procedure of merge sort to find the common elements. Think!

After sorting both arrays, the idea would be to start comparing one element of X[] with one element of Y[]. If both are equal, we add it to the output list and move to the next elements in both arrays. Otherwise, we move to the next element in the array that includes the smaller element. (Think!)

Solution steps

  1. We define an output list intersection[] to store the common elements.
  2. Now we sort both the arrays in increasing order. We can use effcient sorting algorithms like heap sort or quick sort to sort the array in O(nlogn) time.
  3. Before starting the two pointers loop, we initialize the pointers i and j i.e. i = 0 and j = 0
  4. Now we run a loop while i < m and j < n:
    • If X[i] == Y[j], we add X[i] to the output list intersection[]and increment pointers i and j by 1.
    • If X[i] < Y[j], we increment i by 1.
    • If X[i] > Y[j], we increment j by 1.
  5. By the end of the loop, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    heapSort(X, m)
    heapSort(Y, n)
    int i = 0, j = 0
    while (i < m && j < n)
    {
        if (X[i] == Y[j])
        { 
            intersection.push_back(X[i])
            i = i + 1
            j = j + 1
        }    
        else if (X[i] < Y[j])
            i = i + 1
        else
            j = j + 1
    }
    return intersection
}

Solution analysis

  • We are using heap sort for the implementation.
  • In the two-pointers loop, we increment pointer i or j or both by 1, based on the comparison with X[i] and Y[j]. In the worst case, it will access each X and Y element once. So time complexity of this loop = O(m) + O(n) = O(m + n)
  • So, overall time complexity = Time complexity of the sorting X[] using heap sort + Time complexity of sorting Y[] using heap sort + Time complexity of two pointers loop = O(mlogm) + O(nlogn) + O(n + m) = O(mlogm + nlogn). If m > n, mlogm > nlogn and time complexity will be O(mlogm).
  • Space complexity = Space complexity of heap sort + Space complexity oftwo pointers loop = O(1) + O(1) = O(1).

For m > n, if we compare the time complexity of this approach with the previous approach, then it is not better! The idea is: we have improved the time complexity of searching after sorting both the arrays, but, here, the time complexity is dominated by the sorting algorithm, which is O(mlogm). The time complexity of the above approach is O(mlogn) which is better than O(mlogm). Think!

An efficient approach  using a hash table

Solution idea

We can improve the time complexity of searching further using the hash table because it can perform searching in O(1) time average. The idea is to first insert all elements of array Y[] in the hash table and then search each element X[i] in the hash table. If element X[i] is present in the hash table, we add it to the output list.

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    HashTable H
    for (int i = 0; i < n; i = i + 1)
        H.insert(Y[i])
    for (int i = 0; i < m; i = i + 1)
    {
        if (H.search(X[i]) == true)
            intersection.push_back(X[i])
    }
    return intersection
}

Solution analysis

Time complexity = Time complexity of inserting n elements of Y[] in the hash table + Time complexity of searching m elements of X[] in the hash table = n * O(1) + m * O(1)) = O(m + n). Space complexity = O(n), for storing values in the hash table. 

Critical ideas to think!

  • Do we need to modify the above algorithms if elements are repeated?
  • What would be the time and space complexity, if the hash table is created for the larger array X[] instead of the smaller array Y[]?
  • Can we think to use some other data structures to solve this problem? Is self-balancing BST useful?
  • In the 2nd approach, we sorted the smaller array. Can we solve this problem by sorting the larger array? How does the time complexity change?
  • In the 3rd approach, why are we incrementing pointers when X[i] > Y[j] or X[i] < Y[j] ?
  • In the 3rd approach, what would be the worst and best-case scenario of finding the common elements using the two-pointers loop?

Suggested coding problems to solve

Enjoy learning, Enjoy Algorithm!

More From EnjoyAlgorithms

Our weekly newsletter

Subscribe to get free weekly content on data structure and algorithms, machine learning, system design, oops design and mathematics.

Follow Us:

LinkedinMedium

© 2020 EnjoyAlgorithms Inc.

All rights reserved.