Difficulty: Medium, Asked-in: Microsoft, Google, Amazon, Yahoo, Samsung
You are given an array X[] consisting of n elements. Write a program to find the majority element in the array, i.e., return the number that appears more than n/2 times.
Input: X[] = [2, 12, 2, 2, 2, 4, 2], Output: 2
Explanation: 2 is the majority element which appears 5 times.
Input: X[] = [4, 3, 4], Output: 4
Input: X[] = [1, 1, 1, 1, 1, 1], Output: 1
A basic approach would be to check whether each element is the majority element or not. We can use two nested loops, where the outer loop iterates over the array and the inner loop counts the occurrences of each element. If we find an element with frequency greater than n/2, that is our majority element.
int getMajority(int X[], int n)
{
    for (int i = 0; i < n; i = i + 1)
    {
        int majorityCount = 0;
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == X[j])
                majorityCount = majorityCount + 1;
        }
        if (majorityCount > n/2)
            return X[i];
    }
    // A majority element is guaranteed to exist, so we never reach this point.
    return -1;
}
def getMajority(X, n):
    for i in range(n):
        majorityCount = 0
        for j in range(n):
            if X[i] == X[j]:
                majorityCount = majorityCount + 1
        if majorityCount > n/2:
            return X[i]
We are running two nested loops, where each loop runs n times. Time complexity = O(n²), space complexity = O(1). Can we improve the time complexity further? Think!
As given in the problem statement, the majority element always exists in the array. So if we sort the array in increasing order, the majority element gets grouped at adjacent indices. Can we access the majority element directly in the sorted array? Think!
If we divide the sorted array into two halves, each part contains n/2 elements. We also know that the majority element is repeated more than n/2 times in the array. Therefore, if the majority element is present, it must occupy the middle index. This is true both when n is odd and when n is even.
int getMajority(int X[], int n)
{
    sort(X, n)
    return X[n/2]
}
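For reference, here is a minimal C++ version of the same idea using the standard library sort (the function name getMajorityBySorting is our own; it assumes, as stated above, that a majority element always exists):

#include <algorithm>

int getMajorityBySorting(int X[], int n)
{
    std::sort(X, X + n);  // O(nlogn) comparison sort
    return X[n / 2];      // the middle index must hold the majority element
}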
Suppose we use an efficient O(nlogn) sorting algorithm such as heap sort. Time complexity = time complexity of heap sort + time complexity of accessing the middle element = O(nlogn) + O(1) = O(nlogn).
Space complexity = O(1), because heap sort is an in-place sorting algorithm.
Can we apply the divide and conquer approach? Here is an insight: if we divide the array into two halves and recursively find the majority elements of the left and right halves, we can determine the overall majority element in linear extra time. Think!
We define a helper function getMajority(int X[], int l, int r), which takes the left and right ends of the array as parameters. Initially, l = 0 and r = n - 1.
Base case: This is the trivial case of a single-element subarray, when the left and right ends are equal. We return that single element as the majority element, i.e., if (l == r), return X[l].
Divide step: We divide the array into two equal halves by calculating the mid index, i.e., mid = l + (r - l)/2.
Conquer step: We recursively calculate the majority elements of the left and right halves and store them in the variables leftMajority and rightMajority.
Combine step: If the majority elements of the left and right halves are equal, it must be the global majority element, and we return it. Think! Otherwise, one of them must be the global majority element. To determine which, we count the frequencies of leftMajority and rightMajority in the current range X[l...r] and return the value with the higher frequency.
if (leftMajority == rightMajority)
    return leftMajority
int leftCount = countFrequency(X, l, r, leftMajority)
int rightCount = countFrequency(X, l, r, rightMajority)
if (leftCount > rightCount)
    return leftMajority
else
    return rightMajority
Pseudocode implementation of the countFrequency method
int countFrequency(int X[], int l, int r, int value)
{
    int count = 0
    for (int i = l; i <= r; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1
    }
    return count
}
int countFrequency(int X[], int l, int r, int value)
{
    int count = 0;
    for (int i = l; i <= r; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1;
    }
    return count;
}
int getMajority(int X[], int l, int r)
{
    if (l == r)
        return X[l];
    int mid = l + (r - l) / 2;
    int leftMajority = getMajority(X, l, mid);
    int rightMajority = getMajority(X, mid + 1, r);
    if (leftMajority == rightMajority)
        return leftMajority;
    int leftCount = countFrequency(X, l, r, leftMajority);
    int rightCount = countFrequency(X, l, r, rightMajority);
    if (leftCount > rightCount)
        return leftMajority;
    else
        return rightMajority;
}
int getMajorityElement(int X[], int n)
{
    return getMajority(X, 0, n - 1);
}
def countFrequency(X, l, r, value):
    count = 0
    for i in range(l, r + 1):
        if X[i] == value:
            count = count + 1
    return count
def getMajority(X, l, r):
    if l == r:
        return X[l]
    mid = l + (r - l) // 2
    leftMajority = getMajority(X, l, mid)
    rightMajority = getMajority(X, mid + 1, r)
    if leftMajority == rightMajority:
        return leftMajority
    leftCount = countFrequency(X, l, r, leftMajority)
    rightCount = countFrequency(X, l, r, rightMajority)
    if leftCount > rightCount:
        return leftMajority
    else:
        return rightMajority
We are solving a problem of size n by recursively solving two smaller sub-problems of size n/2 and combining their solutions by counting the frequencies of leftMajority and rightMajority (two linear scans over the range).
So the time complexity can be represented by the following recurrence relation:
T(n) = 2 T(n/2) + O(n), if n > 1
T(n) = O(1), if n = 1
We can use the master theorem or the recursion tree method to solve this recurrence. Here a = 2, b = 2, and f(n) = O(n), so n^(log_b a) = n and case 2 of the master theorem applies. If we observe closely, this is the same recurrence as merge sort. Time complexity = O(nlogn).
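As a quick sketch, unrolling the recurrence level by level (the recursion tree method) gives:

T(n) = 2T(n/2) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn
     ...
     = 2^k T(n/2^k) + k*cn
     = n*T(1) + cn*logn, when k = logn
     = O(nlogn)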
To better understand this analysis, explore the blog on the analysis of recursion.
Space complexity = O(logn) for the recursion call stack. The divide and conquer method does not explicitly allocate any additional memory, but it uses additional memory in stack frames due to recursion. The total stack space depends on the maximum depth of the recursion tree, which is O(logn) here. Think!
In the brute force approach, we count the frequency of each element using the inner loop. Can we remove this inner loop and count frequencies in O(1)? Yes: the idea is to use a hash table to count and store the frequency of each distinct element. A hash table allows us to access and insert elements in O(1) time on average.
int getMajority(int X[], int n)
{
    HashTable H
    for (int i = 0; i < n; i = i + 1)
    {
        if (H.search(X[i]) == true)
            H[X[i]] = H[X[i]] + 1
        else
            H[X[i]] = 1
        if (H[X[i]] > n/2)
            return X[i]
    }
}
#include <unordered_map>
using namespace std;

int getMajority(int X[], int n)
{
    unordered_map<int, int> H;
    for (int i = 0; i < n; i = i + 1)
    {
        if (H.count(X[i]) > 0)
            H[X[i]] = H[X[i]] + 1;
        else
            H[X[i]] = 1;
        if (H[X[i]] > n/2)
            return X[i];
    }
    // A majority element is guaranteed to exist, so we never reach this point.
    return -1;
}
from collections import defaultdict

def getMajority(X, n):
    H = defaultdict(int)
    for i in range(n):
        H[X[i]] = H[X[i]] + 1
        if H[X[i]] > n/2:
            return X[i]
import java.util.HashMap;

public class Solution
{
    public int getMajority(int[] X, int n)
    {
        HashMap<Integer, Integer> H = new HashMap<>();
        for (int i = 0; i < n; i = i + 1)
        {
            if (H.containsKey(X[i]))
                H.put(X[i], H.get(X[i]) + 1);
            else
                H.put(X[i], 1);
            if (H.get(X[i]) > n / 2)
                return X[i];
        }
        // A majority element is guaranteed to exist, so we never reach this point.
        return -1;
    }
}
We are traversing the array once and performing constant-time hash table operations at each iteration. Therefore, time complexity = n * O(1) = O(n). Space complexity = O(n) for storing elements in the hash table.
We can use mathematical insights from the binary representation of the numbers to find the majority element. How? Let's think.
Assume that each element is a 32-bit integer. As given in the problem, we also know that there is a majority element in the array. Now, for all numbers, we count the frequency of 1's at each bit position in the binary representation (it's like holding a poll at each bit position).
At any bit position in the binary representation, if 1 appears more than n/2 times across all numbers, the majority element must have a 1 at that position; otherwise, it must have a 0 at that position.
For example, X[] = [6, 13, 13, 2, 13], n = 5. Let's assume the inputs are 4-bit integers.
6 -> 0 1 1 0
13 -> 1 1 0 1
13 -> 1 1 0 1
2 -> 0 0 1 0
13 -> 1 1 0 1
-----------------------
Majority = 1 1 0 1 = 13
In the above example, let's go from left to right. 1's are in the majority at the 1st, 2nd, and 4th positions, so the bits at those positions in the majority element are 1. At the 3rd position, the number of 1's across all array elements is only 2, which is not a majority, so that bit is 0 in the majority element.
So the idea is simple: by identifying the majority bit at every bit position, we can construct the majority element bit by bit. How do we implement it? Let's think.
For each bit position currBit (0 to 31), we initialize a counter countOnes = 0. Then we iterate over the entire array and increase countOnes by 1 whenever the current bit is set in an element, i.e.:
for (int i = 0; i < n; i = i + 1)
{
    if ((X[i] & (1 << currBit)) != 0)
        countOnes = countOnes + 1
}
By the end of the loop, if countOnes is greater than n/2, we set the current bit position to 1 in the variable majorityElement. Otherwise, we leave it as it is, because the bit at that position is already 0 in majorityElement.
if (countOnes > n/2)
    majorityElement = majorityElement + (1 << currBit)
int getMajority(int X[], int n)
{
    int majorityElement = 0;
    for (int currBit = 0; currBit < 32; currBit = currBit + 1)
    {
        int countOnes = 0;
        for (int i = 0; i < n; i = i + 1)
        {
            if ((X[i] & (1 << currBit)) != 0)
                countOnes = countOnes + 1;
        }
        if (countOnes > n / 2)
            majorityElement = majorityElement + (1 << currBit);
    }
    return majorityElement;
}
def getMajority(X, n):
    majorityElement = 0
    for currBit in range(32):
        countOnes = 0
        for i in range(n):
            if (X[i] & (1 << currBit)) != 0:
                countOnes = countOnes + 1
        if countOnes > n / 2:
            majorityElement = majorityElement + (1 << currBit)
    return majorityElement
We are running two nested loops, where the inner loop runs n times and the outer loop runs 32 times, which is a constant. At each iteration, we perform a constant number of operations. Time complexity = 32 * O(n) * O(1) = O(n).
Space complexity = O(1), as we are using a constant number of variables only.
Here is the intuition behind the randomized algorithm: more than n/2 elements are equal to the majority element, so a randomly picked element is the majority element with probability greater than 50%. So we select a random element of the array and check whether it is the majority element. If it is, we return it; otherwise, we repeat the same process.
There are some critical questions: What is the probability that the algorithm finds the majority element in a given iteration? What is the expected running time? Why is a randomized algorithm a good choice here? Think! We answer these questions in the analysis after the code.
#include <cstdlib>

int countFrequency(int X[], int n, int value)
{
    int count = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1;
    }
    return count;
}

int randRange(int min, int max)
{
    return rand() % (max - min + 1) + min;
}

int getMajorityElement(int X[], int n)
{
    while (true)
    {
        int randomElement = X[randRange(0, n - 1)];
        int k = countFrequency(X, n, randomElement);
        if (k > n / 2)
            return randomElement;
    }
}
import random

def countFrequency(X, n, value):
    count = 0
    for i in range(n):
        if X[i] == value:
            count = count + 1
    return count

def randRange(low, high):
    return random.randrange(low, high + 1)

def getMajorityElement(X, n):
    while True:
        randomElement = X[randRange(0, n - 1)]
        k = countFrequency(X, n, randomElement)
        if k > n / 2:
            return randomElement
import java.util.Random;

public class Solution
{
    private int countFrequency(int[] X, int n, int value)
    {
        int count = 0;
        for (int i = 0; i < n; i = i + 1)
        {
            if (X[i] == value)
                count = count + 1;
        }
        return count;
    }

    private int randRange(int min, int max)
    {
        return new Random().nextInt((max - min) + 1) + min;
    }

    public int getMajorityElement(int[] X, int n)
    {
        while (true)
        {
            int randomElement = X[randRange(0, n - 1)];
            int k = countFrequency(X, n, randomElement);
            if (k > n / 2)
                return randomElement;
        }
    }
}
In each iteration, the randomly picked element is the majority element with probability greater than 1/2, so the expected number of iterations of the while loop is less than 2, and the probability of needing more than k iterations is less than (1/2)^k. Each iteration performs a linear scan to count the frequency, so the expected time complexity is O(n) and the space complexity is O(1).
Note: The idea of "with high probability" is often used in the analysis of randomized algorithms, where the "high probability" is with respect to the input size n.
Here is the intuition behind this algorithm (the Boyer-Moore voting algorithm): since the majority element occurs more than n/2 times, its count is greater than the combined count of all other elements. Therefore, if we mark each occurrence of the majority element as +1 and each occurrence of any other element as -1, the overall sum is definitely greater than zero. Think!
Another interesting analogy to understand this algorithm: suppose n people are standing, each holding one element of the array. Whenever two people discover that they hold different array elements, both sit down. In the end, if anyone is left standing, they hold the majority element. Since the majority element occurs more than n/2 times, we can guarantee that it survives this pairing-off.
There are many ways to think about why this algorithm works. One good intuition is to think of it as breaking the input into chunks of consecutive copies of particular values. Incrementing the counter corresponds to seeing more copies of the current candidate, while decrementing corresponds to other values "canceling out" the accumulated copies of the candidate.
Note: As long as the majority element occurs more than n/2 times, we can guarantee that this approach will always find it.
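Tracing the voting on the first example, X[] = [2, 12, 2, 2, 2, 4, 2]:

element:    2   12   2   2   2   4   2
candidate:  2    2   2   2   2   2   2
count:      1    0   1   2   3   2   3

The candidate that survives is 2, which is indeed the majority element.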
int majorityElement(int X[], int n)
{
    int count = 0;
    int majorityCandidate = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        if (count == 0)
            majorityCandidate = X[i];
        if (X[i] == majorityCandidate)
            count = count + 1;
        else
            count = count - 1;
    }
    return majorityCandidate;
}
def majorityElement(X, n):
    count = 0
    majorityCandidate = 0
    for i in range(n):
        if count == 0:
            majorityCandidate = X[i]
        if X[i] == majorityCandidate:
            count = count + 1
        else:
            count = count - 1
    return majorityCandidate
We are running a single loop n times and performing constant work at each iteration. Time complexity = O(n). Space complexity = O(1), because we are using constant extra space.
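The code above returns the candidate without verifying it, which is fine here because the problem guarantees that a majority element exists. If that guarantee were dropped, we could verify the candidate with one extra pass. Here is a minimal C++ sketch of that idea (the function name majorityElementWithCheck and the -1 sentinel are our own choices, not part of the original code):

int majorityElementWithCheck(int X[], int n)
{
    // Get the candidate from the voting pass implemented above.
    int candidate = majorityElement(X, n);
    // Verify the candidate with a second linear scan.
    int count = 0;
    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] == candidate)
            count = count + 1;
    }
    return (count > n / 2) ? candidate : -1;  // -1 means no majority element
}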
Enjoy learning, Enjoy algorithms!