Difficulty: Medium, Asked-in: Google, Microsoft, Bloomberg, Paypal
Key takeaway: An excellent problem to learn problem-solving using sorting and data structures like BSTs and hash tables. We store additional information with each character to sort the string in a stable order.
Given a string S, write a program to sort the string in decreasing order based on the frequency of its characters.
Important note: Before moving on to the solutions, we recommend trying this problem on paper for at least 15–30 minutes. Enjoy problem-solving!
Example 1
Input: S = "tree", Output: "eetr"
Explanation: 'e' appears twice while 'r' and 't' both appear once. So 'e' must appear before both 'r' and 't'.
Example 2
Input: S = "ccaaaAA", Output: "aaaccAA"
Explanation: 'a' appears three times while 'c' and 'A' each appear twice. So 'a' must appear before both 'c' and 'A'. Similarly, the frequencies of 'c' and 'A' are the same; to maintain a stable order, 'c' must appear before 'A' because it occurs first in the input.
To sort characters based on their frequency, we need two things: the frequency count of each unique character, and a tie-breaking rule that keeps equal-frequency characters in their order of first appearance (a stable order). So how do we ensure both? One basic idea would be:
We define a class or structure UniqueChar, which stores three pieces of information about each unique character: its character value, its frequency count, and the index of its first occurrence in the input string.
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
Now we run a nested loop to identify each unique character and store its frequency count and first occurrence in the freq[] list. Here, the outer loop iterates through each character of the string, and the inner loop checks whether that character is already present in the freq[] list.
for (int i = 0; i < n; i = i + 1)
{
bool found = false
for (int j = 0; j < freq.length; j = j + 1)
{
if (S[i] == freq[j].ch)
{
freq[j].freqCount = freq[j].freqCount + 1
found = true
break
}
}
if (found == false)
{
UniqueChar temp
temp.ch = S[i]
temp.freqCount = 1
temp.firstIndex = i
freq.push_back(temp)
}
}
Now we sort the freq[] list based on the frequency count of characters. If the frequency count is equal for two elements, we should first process the element with lower firstIndex. Our sorting procedure can use a comparator to ensure this.
sort(freq, freq.length, comp)
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
After this, we extract characters from the sorted freq[] list and store them in an extra output array result[n].
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for (int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
char[] frequencySort(char S[], int n)
{
vector<UniqueChar> freq
for (int i = 0; i < n; i = i + 1)
{
bool found = false
for (int j = 0; j < freq.length; j = j + 1)
{
if (S[i] == freq[j].ch)
{
freq[j].freqCount = freq[j].freqCount + 1
found = true
break
}
}
if (found == false)
{
UniqueChar temp
temp.ch = S[i]
temp.freqCount = 1
temp.firstIndex = i
freq.push_back(temp)
}
}
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for (int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
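For reference, here is a minimal runnable C++ sketch of the above pseudocode, using std::string for the input and output and std::vector for the freq[] list (the structure and loop shape follow the pseudocode directly):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One record per unique character: its value, frequency count,
// and the index of its first occurrence (for stable ordering).
struct UniqueChar {
    char ch;
    int freqCount;
    int firstIndex;
};

std::string frequencySort(const std::string& S) {
    std::vector<UniqueChar> freq;
    // Nested loops: for each character, search the freq list linearly.
    for (int i = 0; i < (int)S.size(); i = i + 1) {
        bool found = false;
        for (auto& u : freq) {
            if (S[i] == u.ch) {
                u.freqCount = u.freqCount + 1;
                found = true;
                break;
            }
        }
        if (!found)
            freq.push_back({S[i], 1, i});
    }
    // Sort by decreasing frequency; ties broken by first occurrence.
    std::sort(freq.begin(), freq.end(),
              [](const UniqueChar& x, const UniqueChar& y) {
                  if (x.freqCount != y.freqCount)
                      return x.freqCount > y.freqCount;
                  return x.firstIndex < y.firstIndex;
              });
    std::string result;
    for (const auto& u : freq)
        result.append(u.freqCount, u.ch);
    return result;
}
```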
Suppose there are m unique characters in the input string. Time complexity = Time complexity of storing values in freq[] list + Time complexity of sorting freq[] list + Time complexity of storing output in result[] = O(mn) + O(m) + O(n) = O(mn)
Why is the time complexity of sorting freq[] list O(m)? The idea would be simple: If only uppercase and lowercase English letters are part of the input, there can be only 52 possible unique characters repeated several times. So we can use the idea of bucket sort to sort the input in O(m) time! In other words, there is no need to use comparison-based sorting algorithms like merge sort or heap sort! Think!
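For illustration, a hedged sketch of that bucket-sort idea: since a character's frequency is at most n, we can drop each unique character into a bucket indexed by its frequency and read the buckets from highest to lowest; visiting characters in first-occurrence order keeps the output stable. The function name and the 256-slot tables are our assumptions, not part of the pseudocode above:

```cpp
#include <string>
#include <vector>

// Bucket sort sketch: the bucket index is the frequency count, so
// records with equal counts stay in insertion order (stable) and no
// comparison sort is needed.
std::string frequencySortBucket(const std::string& S) {
    int n = (int)S.size();
    // Count frequencies and remember first occurrences; 256 covers all
    // single-byte characters (the text assumes at most 52 letters).
    std::vector<int> count(256, 0);
    std::vector<int> first(256, -1);
    for (int i = 0; i < n; i = i + 1) {
        unsigned char c = S[i];
        if (count[c] == 0) first[c] = i;
        count[c] = count[c] + 1;
    }
    // bucket[f] holds the characters whose frequency is exactly f.
    std::vector<std::vector<unsigned char>> bucket(n + 1);
    // Visit characters in first-occurrence order so each bucket is stable.
    for (int i = 0; i < n; i = i + 1) {
        unsigned char c = S[i];
        if (first[c] == i)          // only at the first occurrence
            bucket[count[c]].push_back(c);
    }
    std::string result;
    for (int f = n; f >= 1; f = f - 1)
        for (unsigned char c : bucket[f])
            result.append(f, (char)c);
    return result;
}
```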
Space complexity = Extra space of freq[] list + Extra space used by sorting + Storing result[] array = O(m) + O(1) + O(n) = O(m + n)
In the above approach, the nested loops dominate the time complexity, which is O(nm). If we observe the loop: We are searching for the unique presence of each character in the freq[] list and inserting it with additional information. So one idea is clear: Searching is a critical operation that takes O(m) time for each character in the worst case. The critical question is: Can we use some other efficient data structure to improve the time complexity of searching?
A binary search tree can be a useful data structure for frequent search queries, where the time complexity depends on the tree's height. However, if we use a balanced binary search tree such as an AVL tree or red-black tree, the worst-case time complexity for searching, insertion, and deletion would be O(logm), where m is the number of unique characters.
We can use the following node structure to store each unique character in a BST with two additional parameters: freqCount to store the frequency count and firstIndex to store the first occurrence.
Note: We will be assuming a balanced BST for the implementation.
class BSTNode
{
char ch
int freqCount
int firstIndex
BSTNode *left
BSTNode *right
}
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
char[] frequencySort(char S[], int n)
{
BSTNode *root = NULL
BSTNode *temp = NULL
for (int i = 0; i < n; i = i + 1)
{
temp = bstSearch(root, S[i])
if (temp != NULL)
temp->freqCount = temp->freqCount + 1
else
{
BSTNode *node = newNode()
node->ch = S[i]
node->firstIndex = i
node->freqCount = 1
node->left = NULL
node->right = NULL
root = bstInsert(root, node)
}
}
vector<UniqueChar> freq
bstInorder(root, freq)
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for(int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
Suppose there are m unique characters in the input string.
Time complexity = Time complexity of inserting each character into the BST + Time complexity of extracting characters into the freq[] list using inorder traversal + Time complexity of sorting the freq[] list + Time complexity of storing the output in result[] = O(nlogm) + O(m) + O(m) + O(n) = O(nlogm)
Space complexity = Extra space for BST + Extra space for storing freq[] + Extra space for storing result[] + Extra space used by sorting = O(m) + O(m) + O(n) + O(1) = O(m + n)
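In C++, std::map is typically implemented as a red-black tree (a balanced BST), so it can stand in for the hand-written BST above. A hedged sketch of this approach, with each map entry playing the role of a BST node (the helper name frequencySortBST is ours):

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// std::map lookups and inserts are O(log m), matching the
// balanced-BST analysis above.
std::string frequencySortBST(const std::string& S) {
    // key: character; value: (freqCount, firstIndex)
    std::map<char, std::pair<int, int>> tree;
    for (int i = 0; i < (int)S.size(); i = i + 1) {
        auto it = tree.find(S[i]);       // O(log m) search
        if (it != tree.end())
            it->second.first = it->second.first + 1;
        else
            tree[S[i]] = {1, i};         // O(log m) insert
    }
    // Copying the map out is an in-order walk of the tree; then sort
    // by decreasing frequency, ties broken by first occurrence.
    std::vector<std::pair<char, std::pair<int, int>>> freq(tree.begin(),
                                                           tree.end());
    std::sort(freq.begin(), freq.end(), [](const auto& x, const auto& y) {
        if (x.second.first != y.second.first)
            return x.second.first > y.second.first;
        return x.second.second < y.second.second;
    });
    std::string result;
    for (const auto& u : freq)
        result.append(u.second.first, u.first);
    return result;
}
```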
Can we improve the time complexity further by using a hash table? Hashing is one of the most efficient ways to store data and perform the search operation in O(1) average time. The critical question is: Given a hash table of characters with their frequencies, how can we perform a stable sort on it? The idea is simple: We can use another hash table that stores the first occurrence of each unique character.
We linearly traverse the input string. For each character S[i], if it is already present in the freqCount hash table, we increment its count; otherwise, we insert it with count 1 and record index i in the firstIndex hash table. After that, we follow a procedure similar to the previous approaches:
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
char[] sortByFrequency(char S[], int n)
{
Hash Table freqCount
Hash Table firstIndex
for(int i = 0; i < n; i = i + 1)
{
if (freqCount.search(S[i]) == true)
freqCount[S[i]] = freqCount[S[i]] + 1
else
{
freqCount[S[i]] = 1
firstIndex[S[i]] = i
}
}
vector<UniqueChar> freq
for (each character x in freqCount)
{
UniqueChar temp
temp.ch = x
temp.freqCount = freqCount[x]
temp.firstIndex = firstIndex[x]
freq.push_back(temp)
}
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for(int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
Suppose there are m unique characters in the input string.
Time complexity = Time complexity of storing freqCount and firstIndex hash tables + Time complexity of storing values in freq[] + Time complexity of sorting freq[] + Time complexity of storing output in result[] array = O(n) + O(n) + O(m) + O(n) = O(m + n)
Space complexity = Extra space of freqCount and firstIndex hash tables + Extra space of freq[] list + Extra space of result[] array + Extra space of sorting = 2*O(m) + O(m) + O(n) + O(1) = O(m + n)
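A runnable C++ sketch of this hash-table approach, with std::unordered_map playing the role of the two hash tables (O(1) average search and insert); the shape follows the pseudocode above:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Two hash tables: frequency counts, plus each character's first
// occurrence so the final sort can break ties stably.
std::string sortByFrequency(const std::string& S) {
    std::unordered_map<char, int> freqCount;
    std::unordered_map<char, int> firstIndex;
    for (int i = 0; i < (int)S.size(); i = i + 1) {
        if (freqCount.count(S[i]))
            freqCount[S[i]] = freqCount[S[i]] + 1;
        else {
            freqCount[S[i]] = 1;
            firstIndex[S[i]] = i;
        }
    }
    // Move the (character, count) records into a vector and sort by
    // decreasing frequency, ties broken by first occurrence.
    std::vector<std::pair<char, int>> freq(freqCount.begin(),
                                           freqCount.end());
    std::sort(freq.begin(), freq.end(),
              [&firstIndex](const std::pair<char, int>& x,
                            const std::pair<char, int>& y) {
                  if (x.second != y.second)
                      return x.second > y.second;
                  return firstIndex.at(x.first) < firstIndex.at(y.first);
              });
    std::string result;
    for (const auto& u : freq)
        result.append(u.second, u.first);
    return result;
}
```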
Important note: We recommend transforming the above pseudocode into your favourite programming language (C, C++, Java, Python, etc.) and verifying all the test cases. Enjoy programming!
Please write in the message below if you find anything incorrect, or you want to share more insight, or you know some different approaches to solve this problem. Enjoy learning, Enjoy algorithms!