Difficulty: Medium, Asked-in: Google, Microsoft, Bloomberg, PayPal
Key takeaway: an excellent problem for learning problem-solving with sorting and data structures like BSTs and hash tables. The core trick is storing additional information (the first occurrence of each character) so we can sort the string in a stable order.
Given a string S, write a program to sort it in decreasing order based on the frequency of the characters.
Important note: before moving on to the solutions, we recommend trying this problem on paper for at least 15 to 30 minutes. Enjoy problem-solving!
Example 1
Input: s = "tree", Output: "eetr"
Explanation: 'e' appears twice while 'r' and 't' both appear once. So 'e' must appear before both 'r' and 't'.
Example 2
Input: s = "ccaaaAA", Output: "aaaccAA"
Explanation: 'a' appears three times while 'c' and 'A' appear twice. So 'a' must appear before both 'c' and 'A'. Similarly, the frequency of 'c' and 'A' are the same. So to maintain the stable order 'c' must appear before 'A'.
Solution idea
To sort characters based on their frequency, we need to ensure two things: characters with a higher frequency count must appear earlier in the output, and when two characters have the same frequency, the one that occurs first in the input string must appear first (stable order).
So how do we ensure these two things? One basic idea: for each unique character, store its frequency count together with the index of its first occurrence, then sort the characters by decreasing frequency, breaking ties by the first occurrence.
Solution steps
We define a class (or structure) UniqueChar, which stores three pieces of information for each unique character: the character value, its frequency count, and the index of its first occurrence in the input string.
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
Now we run a nested loop to identify each unique character and store its frequency count and first occurrence in the freq[] list. Here, the outer loop iterates through each character of the string, and the inner loop checks whether that character is already present in the freq[] list.
for (int i = 0; i < n; i = i + 1)
{
bool found = false
for (int j = 0; j < freq.length; j = j + 1)
{
if (S[i] == freq[j].ch)
{
freq[j].freqCount = freq[j].freqCount + 1
found = true
break
}
}
if (found == false)
{
UniqueChar temp
temp.ch = S[i]
temp.freqCount = 1
temp.firstIndex = i
freq.append(temp)
}
}
Now we sort the freq[] list based on the frequency count of characters. If the frequency count is equal for two elements, the element with the lower firstIndex should come first. Our sorting procedure can use a comparator to ensure this.
sort(freq, freq.length, comp)
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
After this, we extract characters from the sorted freq[] list and store them in an extra output array result[n].
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for (int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
Solution pseudocode
char[] frequencySort(char S[], int n)
{
vector<UniqueChar> freq
for (int i = 0; i < n; i = i + 1)
{
bool found = false
for (int j = 0; j < freq.length; j = j + 1)
{
if (S[i] == freq[j].ch)
{
freq[j].freqCount = freq[j].freqCount + 1
found = true
break
}
}
if (found == false)
{
UniqueChar temp
temp.ch = S[i]
temp.freqCount = 1
temp.firstIndex = i
freq.append(temp)
}
}
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for (int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
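The pseudocode above can be translated into working C++. Here is a minimal sketch: the struct and function names follow the pseudocode, but std::string is used in place of a raw char array, and the comparator is written as a lambda.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct UniqueChar {
    char ch;
    int freqCount;
    int firstIndex;
};

std::string frequencySort(const std::string& S) {
    std::vector<UniqueChar> freq;
    // Nested loop: for each character, search the freq list linearly.
    for (int i = 0; i < (int)S.size(); ++i) {
        bool found = false;
        for (auto& u : freq) {
            if (u.ch == S[i]) { ++u.freqCount; found = true; break; }
        }
        if (!found) freq.push_back({S[i], 1, i});
    }
    // Higher frequency first; ties broken by first occurrence (stable order).
    std::sort(freq.begin(), freq.end(),
              [](const UniqueChar& x, const UniqueChar& y) {
                  if (x.freqCount != y.freqCount)
                      return x.freqCount > y.freqCount;
                  return x.firstIndex < y.firstIndex;
              });
    std::string result;
    for (const auto& u : freq)
        result.append(u.freqCount, u.ch);
    return result;
}
```

On the examples above, frequencySort("tree") returns "eetr" and frequencySort("ccaaaAA") returns "aaaccAA".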
Solution analysis
Suppose there are m unique characters in the input string. Time complexity = Time complexity of storing values in freq[] list + Time complexity of sorting freq[] list + Time complexity of storing output in result[] = O(mn) + O(m) + O(n) = O(mn)
Why is the time complexity of sorting the freq[] list O(m)? The idea is simple: if only uppercase and lowercase English letters are part of the input, there can be at most 52 unique characters. So we can use the idea of bucket sort to sort them in O(m) time! In other words, there is no need for comparison-based sorting algorithms like merge sort or heap sort. Think!
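As a sketch of this bucket-sort idea (assuming the character set fits in a 256-entry table; the function name is illustrative): we bucket characters by their frequency count and emit buckets from the highest frequency down. Walking the string once fills each bucket in first-occurrence order, so no comparator is needed at all.

```cpp
#include <string>
#include <vector>

std::string frequencySortBuckets(const std::string& S) {
    int n = S.size();
    std::vector<int> count(256, 0);
    for (unsigned char c : S) ++count[c];
    // bucket[f] holds all characters that occur exactly f times.
    // Walking the string left to right and bucketing each character the
    // first time we see it keeps every bucket in first-occurrence order.
    std::vector<std::vector<char>> buckets(n + 1);
    std::vector<bool> placed(256, false);
    for (unsigned char c : S) {
        if (!placed[c]) {
            placed[c] = true;
            buckets[count[c]].push_back((char)c);
        }
    }
    std::string result;
    for (int f = n; f >= 1; --f)        // highest frequency first
        for (char c : buckets[f])
            result.append(f, c);
    return result;
}
```

This keeps the stable tie-breaking for free: within a bucket, characters already appear in order of their first occurrence.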
Space complexity = Extra space of freq[] list + Extra space used by sorting + Storing result[] array = O(m) + O(1) + O(n) = O(m + n)
In the above approach, the nested loops dominate the time complexity, which is O(nm). If we observe the loop: we are searching for the unique presence of each character in the freq[] list and inserting it with additional information. So one idea is clear: searching is the critical operation, taking O(m) time for each character in the worst case. The critical question is: can we use some other efficient data structure to improve the time complexity of searching?
A binary search tree can be a useful data structure for frequent search queries, where the time complexity depends on the tree's height. If we use a balanced binary search tree, such as an AVL tree or a red-black tree, the worst-case time complexity of searching, insertion, and deletion would be O(log m) for m unique characters.
We can use the following node structure to store each unique character in a BST with two additional parameters: freqCount for storing the frequency count and firstIndex for storing the first occurrence.
Note: we will be assuming a balanced BST for the implementation.
class BSTNode
{
char ch
int freqCount
int firstIndex
BSTNode *left
BSTNode *right
}
Solution steps
For each character S[i], we search for it in the BST. If a node with S[i] is found, we increment its freqCount. Otherwise, we insert a new node with freqCount = 1 and firstIndex = i.
After processing the string, we store each node's (ch, freqCount, firstIndex) values in the freq[] list using an inorder traversal.
Finally, we sort the freq[] list using the same comparator as before and build the output string from it.
Solution pseudocode
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
char[] frequencySort(char S[], int n)
{
BSTNode *root = NULL
BSTNode *temp = NULL
for(int i = 0; i < n; i = i + 1)
{
temp = search(root, S[i])
if (temp != NULL)
temp->freqCount = temp->freqCount + 1
else
{
BSTNode *node = newNode()
node->ch = S[i]
node->firstIndex = i
node->freqCount = 1
node->left = NULL
node->right = NULL
insert(root, node)
}
}
vector<UniqueChar> freq
inorderStore(root, freq)
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for(int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
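In C++, std::map is typically implemented as a red-black tree, so it can stand in for the balanced BST here. The sketch below (function and local names are illustrative, not from the pseudocode) stores a {freqCount, firstIndex} pair per character; std::map's inorder iteration replaces the explicit inorderStore step.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

std::string frequencySortBST(const std::string& S) {
    // character -> {freqCount, firstIndex}; std::map gives O(log m)
    // search and insertion on its red-black tree.
    std::map<char, std::pair<int, int>> tree;
    for (int i = 0; i < (int)S.size(); ++i) {
        auto it = tree.find(S[i]);
        if (it != tree.end())
            ++it->second.first;        // seen before: bump the count
        else
            tree[S[i]] = {1, i};       // new node with first occurrence
    }
    // Extract (ch, freqCount, firstIndex) and sort as in the first approach.
    struct UC { char ch; int freqCount; int firstIndex; };
    std::vector<UC> freq;
    for (const auto& [ch, p] : tree)
        freq.push_back({ch, p.first, p.second});
    std::sort(freq.begin(), freq.end(), [](const UC& x, const UC& y) {
        if (x.freqCount != y.freqCount) return x.freqCount > y.freqCount;
        return x.firstIndex < y.firstIndex;
    });
    std::string result;
    for (const auto& u : freq)
        result.append(u.freqCount, u.ch);
    return result;
}
```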
Solution analysis
Suppose there are m unique characters in the input string.
Time complexity = Time complexity of inserting each character into the BST + Time complexity of extracting characters into the freq[] list + Time complexity of sorting the freq[] list + Time complexity of storing the output in result[] = O(n log m) + O(m) + O(m) + O(n) = O(n log m)
Space complexity = Extra space for the BST + Extra space for storing freq[] + Extra space for storing result[] + Extra space used by sorting = O(m) + O(m) + O(n) + O(1) = O(m + n)
Solution idea
Can we improve the time complexity further by using a hash table? Hashing is one of the efficient ways to store data and perform the searching operation in O(1) average time. But a critical question is: given a hash table of characters with their frequencies, how can we perform a stable sort on it? The idea is simple: we can use another hash table that stores the first occurrence of each unique character.
Solution steps
We linearly traverse the input string. For each character S[i]: if S[i] is already present in the freqCount hash table, we increment freqCount[S[i]]. Otherwise, we set freqCount[S[i]] = 1 and record its first occurrence by setting firstIndex[S[i]] = i.
Now we follow a procedure similar to the previous approach: we copy each unique character with its frequency count and first index into the freq[] list, sort the list using the same comparator, and build the output string from it.
Solution pseudocode
class UniqueChar
{
char ch
int freqCount
int firstIndex
}
char[] sortByFrequency(char S[], int n)
{
Hash Table freqCount
Hash Table firstIndex
for(int i = 0; i < n; i = i + 1)
{
if (freqCount.search(S[i]) == true)
freqCount[S[i]] = freqCount[S[i]] + 1
else
{
freqCount[S[i]] = 1
firstIndex[S[i]] = i
}
}
vector<UniqueChar> freq
for (each character x in freqCount)
{
UniqueChar temp
temp.ch = x
temp.freqCount = freqCount[x]
temp.firstIndex = firstIndex[x]
freq.append(temp)
}
sort(freq, freq.length, comp)
char result[n]
int k = 0
for (int i = 0; i < freq.length; i = i + 1)
{
for(int j = 0; j < freq[i].freqCount; j = j + 1)
{
result[k] = freq[i].ch
k = k + 1
}
}
return result
}
bool comp(UniqueChar x, UniqueChar y)
{
if (x.freqCount != y.freqCount)
return x.freqCount > y.freqCount
else
return x.firstIndex < y.firstIndex
}
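A direct C++ translation of this approach can use std::unordered_map for both hash tables. The sketch below uses illustrative names and std::string in place of a char array:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

std::string sortByFrequency(const std::string& S) {
    std::unordered_map<char, int> freqCount;   // char -> frequency
    std::unordered_map<char, int> firstIndex;  // char -> first occurrence
    for (int i = 0; i < (int)S.size(); ++i) {
        auto it = freqCount.find(S[i]);
        if (it != freqCount.end()) {
            ++it->second;
        } else {
            freqCount[S[i]] = 1;
            firstIndex[S[i]] = i;
        }
    }
    // Collect (ch, freqCount, firstIndex) triples; the hash table's
    // iteration order is arbitrary, but the sort below fixes the order.
    struct UC { char ch; int freqCount; int firstIndex; };
    std::vector<UC> freq;
    for (const auto& [ch, f] : freqCount)
        freq.push_back({ch, f, firstIndex[ch]});
    std::sort(freq.begin(), freq.end(), [](const UC& x, const UC& y) {
        if (x.freqCount != y.freqCount) return x.freqCount > y.freqCount;
        return x.firstIndex < y.firstIndex;
    });
    std::string result;
    for (const auto& u : freq)
        result.append(u.freqCount, u.ch);
    return result;
}
```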
Solution analysis
Suppose there are m unique characters in the input string.
Time complexity = Time complexity of searching and insertion in the freqCount and firstIndex hash tables + Time complexity of storing values in freq[] + Time complexity of sorting freq[] + Time complexity of storing the output in the result[] array = O(n) + O(m) + O(m) + O(n) = O(m + n)
Space complexity = Extra space of freqCount and firstIndex hash tables + Extra space of freq[] list + Extra space of result[] array + Extra space of sorting = 2*O(m) + O(m) + O(n) + O(1) = O(m + n)
Important note: we recommend transforming the above pseudocodes into your favorite programming language (C, C++, Java, Python, etc.) and verifying them on all the test cases. Enjoy programming!
Please write in the message below if you find anything incorrect, or you want to share more insight, or you know some different approaches to solve this problem. Enjoy learning, Enjoy algorithms!