Find whether an array is a subset of another array

Difficulty: Medium, Asked-in: Qualcomm

Key takeaway: An excellent problem to learn time complexity optimization using various approaches. The two-pointers and hash table solutions are intuitive and worth exploring.

Let’s understand the problem

Given two unsorted arrays X and Y of sizes m and n, respectively, write a program to check whether Y is a subset of X. Y is a subset of X if every element of Y is present in X. Assume that neither array contains repeated elements and that n <= m.

Example 1

Input: X[] = [2, 8, 12, 6, 10, 11], Y[] = [8, 2, 6, 11]

Output: True. Explanation: All elements of Y are present in X.

Example 2

Input: X[] = [6, 4, 8, 3, 2], Y[] = [4, 7, 3, 9]

Output: False. Explanation: Elements 7 and 9 of Y are not present in X.

Discussed Solution Approaches

  • A brute force approach using linear search
  • Using sorting and binary search
  • Using sorting and two pointers approach
  • An efficient approach using a hash table

A brute force approach using linear search

Solution Idea

A basic idea is to search for each element of Y in X using linear search. As soon as we find an element Y[i] that is not present in X, Y cannot be a subset of X, so we return false. Otherwise, if every element of Y is present in X, Y is a subset of X and we return true.

Solution Pseudocode

bool checkSubset(int X[], int Y[], int m, int n)
{
    int i = 0
    while (i < n)
    {
        int k = linearSearch(X, 0, m - 1, Y[i])
        if (k == -1)
            return false
        else 
            i = i + 1    
    }
    return true
}

int linearSearch(int X[], int l, int r, int key)
{
    for (int i = l; i <= r; i = i + 1)
    {
        if (X[i] == key)
            return i
    }
    return -1
}
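For readers who want to run this idea, here is a minimal C++ sketch under our own assumptions, not the article's reference code: it takes std::vector<int> instead of raw arrays, and it uses std::find from <algorithm> in place of the hand-written linearSearch. The function name checkSubsetLinear is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Brute force: linearly search X for every element of Y.
// Returns false at the first element of Y that is missing from X.
bool checkSubsetLinear(const std::vector<int>& X, const std::vector<int>& Y)
{
    for (int y : Y)
    {
        // std::find scans X from left to right: up to m comparisons per element.
        if (std::find(X.begin(), X.end(), y) == X.end())
            return false;
    }
    return true;
}
```

With Example 1, checkSubsetLinear({2, 8, 12, 6, 10, 11}, {8, 2, 6, 11}) returns true; with Example 2 it returns false as soon as 7 is not found.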

Solution Analysis

In the worst case, we linearly search X (of size m) for each of the n elements of Y, so the worst-case time complexity is O(m*n). If we observe closely, the actual running time depends on the order of the elements in both arrays. So what would be the best-case and worst-case inputs? Think! We use only constant extra space, so the space complexity is O(1).

Using sorting and binary search

One thing is clear: searching is the critical operation in this solution. So the question is: can we search more efficiently and improve the time complexity further? Let's think!

Solution Idea

If we sort the larger array X, we can use binary search to look up each element of Y in X. Binary search takes O(logm) time on a sorted array of size m, which makes the searching step much faster than linear search. For the implementation, suppose we use heap sort (O(mlogm) time in the worst case, O(1) extra space) and iterative binary search.

Solution Steps

  • We sort X in increasing order using heap sort.
  • Now we run a loop to search for each element Y[i] in X using binary search. If Y[i] is not present in X, we return false. Otherwise, we move to the next element of Y and repeat the same process.
  • If the loop finishes, every element of Y is present in X, so we return true.

Solution Pseudocode

bool checkSubset(int X[], int Y[], int m, int n)
{
    heapSort(X, m)
    for (int i = 0; i < n; i = i + 1)
    {
        int k = binarySearch(X, 0, m - 1, Y[i])
        if (k == -1)
            return false
    }
    return true
}

int binarySearch(int X[], int l, int r, int key)
{ 
    while (l <= r) 
    { 
        int mid = l + (r - l) / 2
        if (X[mid] == key) 
            return mid
        if (X[mid] < key) 
            l = mid + 1
        else
            r = mid - 1
    }
    return -1
}
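A minimal C++ sketch of this approach, assuming std::vector<int> inputs. It swaps in standard-library pieces for the helpers above: std::make_heap followed by std::sort_heap performs a heap sort, and std::binary_search replaces the hand-written binarySearch. The function name checkSubsetBinarySearch is hypothetical, and like the pseudocode it sorts X in place.

```cpp
#include <algorithm>
#include <vector>

// Sorting + binary search.
// std::make_heap followed by std::sort_heap is a heap sort: O(m log m) time, in place.
// Note: X is modified (sorted), just as in the pseudocode.
bool checkSubsetBinarySearch(std::vector<int>& X, const std::vector<int>& Y)
{
    std::make_heap(X.begin(), X.end());
    std::sort_heap(X.begin(), X.end()); // X is now in increasing order

    for (int y : Y)
    {
        // std::binary_search reports whether an element equal to y exists: O(log m).
        if (!std::binary_search(X.begin(), X.end(), y))
            return false;
    }
    return true;
}
```

Because X is taken by non-const reference, the caller's array ends up sorted; pass a copy if the original order matters (at the cost of O(m) extra space).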

Time and space complexity analysis

Time Complexity = Time complexity of heap sort + n * Time complexity of iterative binary search = O(mlogm) + n * O(logm) = O(mlogm + nlogm).

Space Complexity = Space complexity of heap sort + Space complexity of iterative binary search = O(1) + O(1) = O(1).

Using sorting and two pointers approach

Solution Idea and Steps

The two-pointers approach often works well on sorted arrays. But how do we apply it here? Here is an idea: sort both arrays and then scan them with two pointers, similar to the merging step of the merge sort algorithm.

Suppose we sort X and Y and initialize two pointers i and j at the first element of each array, i.e., i = 0 and j = 0. We then compare X[i] with Y[j] and move the pointers according to the following three conditions, repeating while i < m && j < n.

  • If (X[i] < Y[j]): We have not yet found Y[j] in X, so we increment i by 1 and continue searching for Y[j] in the remaining part of X.
  • If (X[i] == Y[j]): We have found an element common to both arrays. Now we increment both i and j by 1 and search for the next element of Y in the remaining part of X.
  • If (X[i] > Y[j]): Both arrays are sorted, so X[i] and all remaining elements of X are greater than Y[j], and every element of X before index i has already been ruled out. In other words, Y[j] is not present in X, and we return false.
  • Boundary condition: After the loop ends, if j < n, pointer j has not reached the end of Y, and the remaining values of Y (from index j to n - 1) are not present in X. So Y is not a subset of X, and we return false. Otherwise, if j == n, we have found every value of Y in X, and we return true. Think!
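As a quick check, here is a hand trace of these rules on Example 2 after sorting both arrays: X[] = [2, 3, 4, 6, 8] and Y[] = [3, 4, 7, 9].

  • i = 0, j = 0: X[0] = 2 < Y[0] = 3, so i = 1.
  • i = 1, j = 0: X[1] = 3 == Y[0] = 3, so i = 2 and j = 1.
  • i = 2, j = 1: X[2] = 4 == Y[1] = 4, so i = 3 and j = 2.
  • i = 3, j = 2: X[3] = 6 < Y[2] = 7, so i = 4.
  • i = 4, j = 2: X[4] = 8 > Y[2] = 7, so 7 is not present in X and we return false.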

Solution Pseudocode

bool checkSubset(int X[], int Y[], int m, int n)
{
    heapSort(X, m)
    heapSort(Y, n)
    int i = 0, j = 0
    while (i < m && j < n)
    {
        if (X[i] < Y[j])
            i = i + 1
        else if(X[i] == Y[j])
        {
            j = j + 1
            i = i + 1
        }
        else if (X[i] > Y[j])
            return false
    }
    if (j < n)
        return false
    else
        return true
}
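A minimal C++ sketch of the two-pointers loop, assuming std::vector<int> inputs. Two substitutions are ours, not the article's: std::sort is used instead of heap sort for brevity, and both vectors are taken by value, so the caller's arrays stay untouched but the copies cost O(m + n) extra space. The function name checkSubsetTwoPointers is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorting + two pointers, similar to the merge step of merge sort.
// X and Y are copies, so sorting them does not affect the caller.
bool checkSubsetTwoPointers(std::vector<int> X, std::vector<int> Y)
{
    std::sort(X.begin(), X.end());
    std::sort(Y.begin(), Y.end());

    std::size_t i = 0, j = 0;
    while (i < X.size() && j < Y.size())
    {
        if (X[i] < Y[j])
            ++i;              // Y[j] may still appear later in X
        else if (X[i] == Y[j])
        {
            ++i;              // common element found: advance both pointers
            ++j;
        }
        else
            return false;     // X[i] > Y[j]: Y[j] cannot be in X
    }
    // If j < Y.size(), X ran out before the remaining elements of Y were matched.
    return j == Y.size();
}
```

The final return j == Y.size() folds the boundary condition into one expression; it is equivalent to the if/else at the end of the pseudocode.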

Time and space complexity analysis

In each iteration of the loop, we compare X[i] with Y[j] and then either increment i, increment both i and j, or return false. Since i increases in every iteration that does not return, the loop runs at most m times, so its time complexity is O(m), which is the same as O(m + n) because n <= m. But if we observe closely, the number of comparisons depends on the order of the elements in both arrays. So what would be the best-case and worst-case inputs? Think!

Suppose we are using heap sort for the implementation. Overall time complexity = Time complexity of sorting array X + Time Complexity of sorting array Y + Time complexity of two pointers loop = O(mlogm) + O(nlogn) + O(m + n) = O(mlogm + nlogn)

Space Complexity = Space complexity of heap sort + Space complexity of the two pointers loop = O(1) + O(1) = O(1)

An efficient approach using a hash table

Now the critical questions are: can we improve the time complexity to linear time, i.e., O(m + n)? Can we solve the problem without sorting? Think!

Solution Idea

Here, searching is the essential operation because we have to search for each value of Y in X. So we can improve the time complexity by using a data structure built for efficient searching, i.e., a hash table. A hash table performs search and insertion in O(1) time on average.

Solution Steps

  • Create a hash table of size O(m).
  • Insert all the elements of array X[] into the hash table.
  • Traverse Y[] and search for each element of Y[] in the hash table. If any element is not present in the hash table, we return false.
  • If all elements of Y[] are found in the hash table, we return true.

Solution Pseudocode

bool checkSubset(int X[], int Y[], int m, int n)
{
    HashTable H
    for (int i = 0; i < m; i = i + 1)
        H.insert(X[i])
        
    for (int i = 0; i < n; i = i + 1)
    {
        if (H.search(Y[i]) == false)
            return false
    }
    return true
}
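A minimal C++ sketch of the hash table approach, assuming std::vector<int> inputs. Here std::unordered_set stands in for the generic HashTable H of the pseudocode; the function name checkSubsetHash is hypothetical.

```cpp
#include <unordered_set>
#include <vector>

// Hash table approach: build a set from X, then look up every element of Y.
// Building the set takes O(m) and each lookup O(1), both on average.
bool checkSubsetHash(const std::vector<int>& X, const std::vector<int>& Y)
{
    // Insert all m elements of X into the hash set.
    std::unordered_set<int> H(X.begin(), X.end());

    for (int y : Y)
    {
        // count returns 0 when y is not in H.
        if (H.count(y) == 0)
            return false;
    }
    return true;
}
```

Unlike the sorting-based sketches, this one leaves X untouched, trading the O(m) extra space of the set for not having to sort.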

Time and space complexity analysis

Time complexity = Time complexity of inserting m elements of X[] into the hash table + Time complexity of searching n elements of Y[] in the hash table = m * O(1) + n * O(1) = O(m) + O(n) = O(m + n) on average.

Space complexity = Extra space of Hash Table = O(m)

Critical ideas to think!

  • In the last approach, what would be the time and space complexity if we use a self-balancing BST in place of a hash table?
  • What would be the worst and best case input in the brute force approach? Count the exact number of comparison operations in the worst-case scenario.
  • In the 2nd approach, can we solve the problem by sorting the smaller array Y[]?
  • Do all the above algorithms work correctly if elements are repeated? Check and confirm this by tracing the algorithms on a few examples.
  • What would be the worst-case input in the two-pointers approach? Count the exact number of comparison operations in the worst-case scenario.
  • Can we solve this problem using some other approach?
  • What would be the time and space complexity if we use quicksort or merge sort in the 2nd and 3rd approaches?
  • Why does the two-pointers approach work for sorted arrays? Verify the correctness of the 3rd approach.

Comparisons of time and space complexities

  • Nested loops: Time = O(nm), Space = O(1)
  • Sorting and binary search: Time = O(mlogm + nlogm), Space = O(1)
  • Sorting and two pointers: Time = O(mlogm + nlogn), Space = O(1)
  • Hash Table approach: Time = O(m + n), Space = O(m)


Enjoy learning, Enjoy coding, Enjoy algorithms!


© 2020 EnjoyAlgorithms Inc.

All rights reserved.