Valid Anagram: Check if Two Strings are Anagram or not

Difficulty: Easy, Asked-in: Google, Amazon, Microsoft, Goldman Sachs

Key takeaway: An excellent problem to learn problem-solving using hashing and step-by-step optimization using several approaches.

Let’s understand the problem

Given two strings str1 and str2 of size m and n, write a program to check whether the two strings are an anagram of each other or not. 

  • A string str1 is an anagram of str2 if the characters of str1 can be rearranged to form str2.
  • Both strings may contain duplicate characters.
  • Characters in both input strings are lowercase English characters.

Examples

Input: str1= “dusty” , str2= “study”, Output: true

Input: str1= “state” , str2= “taste”, Output: true

Input: str1= “enjoyalgorithms”, str2= “enjoymathematics”, Output: false

Discussed solution approaches

  • A brute force solution using nested loops
  • Using sorting and comparing characters
  • Counting occurrences of each character using two frequency tables
  • Counting occurrences of each character using one frequency table

Solution 1: A brute force solution using nested loops

Solution Idea

One basic solution would be to pick each character from str1 and search that character in str2. If any character is missing, then both strings are not an anagram of each other, and we return false. On another side, if strings are anagrams of each other, then the frequency of repeated characters should be the same. To ensure this, we can replace the found characters in str2 with an empty flag ‘\0’ to handle this scenario.

Solution Pseudocode

bool anagramString(String str1, int m, String str2, int n)
{
    if (m != n)
        return false
    for (int i = 0; i < m; i = i + 1)
    {
        int pos = linearSearch(str2, n, str1[i])
        if (pos == -1)
            return false
        else
            str2[pos] = '\0'
    }
    return true
}

Time and space complexity analysis

We are searching each character of str1 in str2 using the linear search. So the time complexity = m * O(n) = O(mn). We are using constant extra space, so space complexity = O(1).

Solution 2: Using sorting and comparing characters

Solution Idea

Can we think of improving the time complexity further? Actually, we need to compare the frequency of each character of both strings, which can be easy to handle if we sort both the strings in increasing order.

  • If (m != n), then strings are not an anagram of each other.
  • Now we sort both strings in increasing order. What sorting algorithm should we prefer? Only lowercase characters are present in both strings, i.e. (only 26 possible values repeated several times). So instead of using the O(nlogn) sorting algorithm, we can use a linear time sorting algorithm like the bucket or counting sort. Think!
  • Now we run a loop from i = 0 to m — 1 or i = 0 to n — 1 and compare characters at index i on both strings.
  • If we find some unequal character in the sequence, we return false. Otherwise, by the end of the loop, we return true.

Solution Pseudocode

bool anagramString(String str1, int m, String str2, int n)
{
    if (m != n)
        return false
    sort(str1, m)
    sort(str2, n)
    for (int i = 0; i < m; i = i + 1)
    {
        if (str1[i] != str2[i])
            return false
    }        
    return true
}

Time and space complexity analysis

We are performing sorting and comparison of both strings when m == n. So time complexity = Time complexity of sorting str1 + Time complexity of sorting str2 + Time complexity of comparing sorted strings = O(n) + O(n) + O(n) = O(n). We are using constant extra space, so space complexity = O(1).

Solution 3: Counting characters using two frequency tables

Searching is a critical operation to solve the problem. Can we think of using hashing to compare the frequency of characters in both strings?

Solution Idea

As we know, all are lowercase characters, which have 26 possible values. So we can create two separate frequency tables of size 26 to store the frequency of each character in both strings. After this, we can traverse both tables linearly and compare the frequency count of each character. If values in both tables are equal, then strings are anagram.

Solution Pseudocode

bool anagramString(String str1, int m, String str2, int n)
{
     if (m != n)
        return false
    int freqCount1[26]
    int freqCount2[26]
    
    for(int i = 0; i< 26; i = i + 1)
    {
        freqCount1[i] = 0
        freqCount2[i] = 0
    }
    
    for(int i = 0; i < m; i = i  + 1) 
    {
        freqCount1[str1[i]-'a'] =  freqCount1[str1[i]-'a'] + 1
        freqCount2[str2[i]-'a'] =  freqCount2[str2[i]-'a'] + 1
    }
    
    for(int i = 0; i < 26; i = i + 1)
    {
        if (freqCount1[i] != freqCount2[i])
            return false
    }        
    return true
}

Time and space complexity analysis

We are running three separate loops, where the 1st and 3rd loop are running constant times and the 2nd loop is running n times. So the time complexity = Time complexity of initializing frequency tables + Time complexity of storing frequency tables + Time complexity of comparing frequency tables = = O(1) + O(n) + O(1) = O(n). The size of both frequency tables is constant, so space complexity = O(1) + O(1) = O(1).

Solution 4: Counting occurrences of each character using one frequency table

The critical question is: can we optimize the space complexity of the above approach to solve the problem using a single frequency table?

Solution Idea

Suppose we traverse the first string and store the frequency count of each character in a frequency table of size 26. Now, if the second string is an anagram of the first string, then the frequency of each character must be equal. So instead of storing the frequency count of the second string in a separate table, we traverse the second string and decrement the frequency of each character in the already stored frequency table for the first string.

Now at this stage, if both strings are an anagram of each other, then all the values stored in the frequency table must be 0. So we traverse the frequency table to check this. If we find any value not equal to 0, we return false. Otherwise, by the end of the loop, we return true.

Solution Pseudocode

bool anagramString(String str1, int m, String str2, int n)
{
    if (m != n)
        return false
    int freqCount[26]
    
    for (int i = 0; i < m; i = i + 1) 
        freqCount[str1[i] - 'a'] = freqCount[str1[i] - 'a'] + 1
        
    for (int i = 0; i < n; i = i + 1) 
        freqCount[str2[i] - 'a'] = freqCount[str2[i] - 'a'] - 1
        
    for (int i = 0; i < 26; i = i + 1)
    {
        if (freqCount[i] != 0)
            return false
    }        
    return true
}

Time and space complexity analysis

Time complexity = Time complexity of storing frequency table + Time complexity of checking 0’s in frequency table = = O(n) + O(n) = O(n). We are only using one frequency table of constant size, so space complexity = O(1).

Critical ideas to think about!

  • Is there a way to optimize the 4th approach further? Can we think of putting an extra conditional statement in the second loop to avoid going through 3rd loop every time? Or, can we think of removing the 3rd loop?
  • In the 4th solution, would it be possible to merge the 1st and 2nd loop operations into a single loop?
  • What if the inputs contain Unicode characters? How can we change the above solutions? Can we use a frequency table, or do we need to use a hash table?
  • Is it possible to solve this problem using other data structures or another approach?
  • In the 2nd solution approach, can we solve the problem by sorting just one of the strings?
  • Can we think of modifying the above approaches when uppercase, empty and lowercase characters are allowed?
  • Why are found characters assigned with a null character in the brute force solution?
  • How can we implement the O(n) sorting algorithm in the sorting-based approach?

Suggested Problems to Practice

Please write in the message below if you have additional insights or if you find an error. Enjoy learning, Enjoy algorithms!

More From EnjoyAlgorithms

Our weekly newsletter

Subscribe to get free weekly content on data structure and algorithms, machine learning, system design, oops design and mathematics.

Follow Us:

LinkedinMedium

© 2020 EnjoyAlgorithms Inc.

All rights reserved.