Difficulty: Easy, Asked-in: Google, Amazon, Microsoft, Goldman Sachs
Key takeaway: An excellent problem to learn problem-solving using hashing and step-by-step optimization using several approaches.
Let’s understand the problem
Given two strings str1 and str2 of size m and n, write a program to check whether the two strings are an anagram of each other or not.
- A string str1 is an anagram of str2 if the characters of str1 can be rearranged to form str2.
- Both strings may contain duplicate characters.
- Characters in both input strings are lowercase English characters.
Examples
Input: str1= “dusty” , str2= “study”, Output: true
Input: str1= “state” , str2= “taste”, Output: true
Input: str1= “enjoyalgorithms”, str2= “enjoymathematics”, Output: false
Discussed solution approaches
- A brute force solution using nested loops
- Using sorting and comparing characters
- Counting occurrences of each character using two frequency tables
- Counting occurrences of each character using one frequency table
Solution 1: A brute force solution using nested loops
Solution Idea
One basic solution would be to pick each character from str1 and search that character in str2. If any character is missing, then both strings are not an anagram of each other, and we return false. On another side, if strings are anagrams of each other, then the frequency of repeated characters should be the same. To ensure this, we can replace the found characters in str2 with an empty flag ‘\0’ to handle this scenario.
Solution Pseudocode
bool anagramString(String str1, int m, String str2, int n)
{
if (m != n)
return false
for (int i = 0; i < m; i = i + 1)
{
int pos = linearSearch(str2, n, str1[i])
if (pos == -1)
return false
else
str2[pos] = '\0'
}
return true
}
Time and space complexity analysis
We are searching each character of str1 in str2 using the linear search. So the time complexity = m * O(n) = O(mn). We are using constant extra space, so space complexity = O(1).
Solution 2: Using sorting and comparing characters
Solution Idea
Can we think of improving the time complexity further? Actually, we need to compare the frequency of each character of both strings, which can be easy to handle if we sort both the strings in increasing order.
- If (m != n), then strings are not an anagram of each other.
- Now we sort both strings in increasing order. What sorting algorithm should we prefer? Only lowercase characters are present in both strings, i.e. (only 26 possible values repeated several times). So instead of using the O(nlogn) sorting algorithm, we can use a linear time sorting algorithm like the bucket or counting sort. Think!
- Now we run a loop from i = 0 to m — 1 or i = 0 to n — 1 and compare characters at index i on both strings.
- If we find some unequal character in the sequence, we return false. Otherwise, by the end of the loop, we return true.
Solution Pseudocode
bool anagramString(String str1, int m, String str2, int n)
{
if (m != n)
return false
sort(str1, m)
sort(str2, n)
for (int i = 0; i < m; i = i + 1)
{
if (str1[i] != str2[i])
return false
}
return true
}
Time and space complexity analysis
We are performing sorting and comparison of both strings when m == n. So time complexity = Time complexity of sorting str1 + Time complexity of sorting str2 + Time complexity of comparing sorted strings = O(n) + O(n) + O(n) = O(n). We are using constant extra space, so space complexity = O(1).
Solution 3: Counting characters using two frequency tables
Searching is a critical operation to solve the problem. Can we think of using hashing to compare the frequency of characters in both strings?
Solution Idea
As we know, all are lowercase characters, which have 26 possible values. So we can create two separate frequency tables of size 26 to store the frequency of each character in both strings. After this, we can traverse both tables linearly and compare the frequency count of each character. If values in both tables are equal, then strings are anagram.
Solution Pseudocode
bool anagramString(String str1, int m, String str2, int n)
{
if (m != n)
return false
int freqCount1[26]
int freqCount2[26]
for(int i = 0; i< 26; i = i + 1)
{
freqCount1[i] = 0
freqCount2[i] = 0
}
for(int i = 0; i < m; i = i + 1)
{
freqCount1[str1[i]-'a'] = freqCount1[str1[i]-'a'] + 1
freqCount2[str2[i]-'a'] = freqCount2[str2[i]-'a'] + 1
}
for(int i = 0; i < 26; i = i + 1)
{
if (freqCount1[i] != freqCount2[i])
return false
}
return true
}
Time and space complexity analysis
We are running three separate loops, where the 1st and 3rd loop are running constant times and the 2nd loop is running n times. So the time complexity = Time complexity of initializing frequency tables + Time complexity of storing frequency tables + Time complexity of comparing frequency tables = = O(1) + O(n) + O(1) = O(n). The size of both frequency tables is constant, so space complexity = O(1) + O(1) = O(1).
Solution 4: Counting occurrences of each character using one frequency table
The critical question is: can we optimize the space complexity of the above approach to solve the problem using a single frequency table?
Solution Idea
Suppose we traverse the first string and store the frequency count of each character in a frequency table of size 26. Now, if the second string is an anagram of the first string, then the frequency of each character must be equal. So instead of storing the frequency count of the second string in a separate table, we traverse the second string and decrement the frequency of each character in the already stored frequency table for the first string.
Now at this stage, if both strings are an anagram of each other, then all the values stored in the frequency table must be 0. So we traverse the frequency table to check this. If we find any value not equal to 0, we return false. Otherwise, by the end of the loop, we return true.
Solution Pseudocode
bool anagramString(String str1, int m, String str2, int n)
{
if (m != n)
return false
int freqCount[26]
for (int i = 0; i < m; i = i + 1)
freqCount[str1[i] - 'a'] = freqCount[str1[i] - 'a'] + 1
for (int i = 0; i < n; i = i + 1)
freqCount[str2[i] - 'a'] = freqCount[str2[i] - 'a'] - 1
for (int i = 0; i < 26; i = i + 1)
{
if (freqCount[i] != 0)
return false
}
return true
}
Time and space complexity analysis
Time complexity = Time complexity of storing frequency table + Time complexity of checking 0’s in frequency table = = O(n) + O(n) = O(n). We are only using one frequency table of constant size, so space complexity = O(1).
Critical ideas to think about!
- Is there a way to optimize the 4th approach further? Can we think of putting an extra conditional statement in the second loop to avoid going through 3rd loop every time? Or, can we think of removing the 3rd loop?
- In the 4th solution, would it be possible to merge the 1st and 2nd loop operations into a single loop?
- What if the inputs contain Unicode characters? How can we change the above solutions? Can we use a frequency table, or do we need to use a hash table?
- Is it possible to solve this problem using other data structures or another approach?
- In the 2nd solution approach, can we solve the problem by sorting just one of the strings?
- Can we think of modifying the above approaches when uppercase, empty and lowercase characters are allowed?
- Why are found characters assigned with a null character in the brute force solution?
- How can we implement the O(n) sorting algorithm in the sorting-based approach?
Suggested Problems to Practice
Please write in the message below if you have additional insights or if you find an error. Enjoy learning, Enjoy algorithms!