# Binary Search: One of the Fastest Search Algorithms

### Let's understand the problem

Given a sorted array X[] of n elements, search a given element key in X[]. If the key exists, then return its index in the sorted array. Otherwise, return -1.

• All the integers in X[] are unique.
• X[] is sorted in ascending order.

Example 1

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3

Explanation: 5 exists in X[] at index 3. So we return 3.

Example 2

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1

Explanation: 6 does not exist in X[]. So we return -1.

### How does binary search work?

Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with a minimum number of guesses. For each guess, the computer tells us whether the guessed number is equal to, greater than, or less than the actual number.

Guessing all the values 1, 2, ..., 16 linearly would be inefficient because each question eliminates only one number. In the worst case, we would need to guess 16 times! Can we find the number with fewer guesses in the worst case? Can we take advantage of the sorted order of the input? One insight is simple: if we pick any number x in the sorted sequence, all the numbers to its left are less than x, and all the numbers to its right are greater than x.

Now the critical question is: What would be the best guess at the start? The idea is simple: numbers are given in sorted order, so the best first guess would be to choose the middle number (8 here) and reduce the set of numbers by half in one go.

• If the actual number is equal to the guessed number (8), we are done.
• If the actual number is less than the guessed number (8), we ignore the range 8 to 16 in our next guess. We repeat the same process for the numbers 1 to 7 and select the middle number 4, and so on.
• If the actual number is greater than the guessed number (8), we ignore the range 1 to 8 in our next guess. We repeat the same process for the numbers 9 to 16 and select the middle number 12, and so on.

After every guess, we reject half of the remaining numbers in one go. This is a significant improvement! Since we cut the set of numbers in half each time, the minimum number of guesses required in the worst case is 4. How? Think!
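To see why four steps suffice (answering the "How? Think!" question), note that repeatedly halving a set of 16 candidates reaches a single candidate in exactly log2(16) = 4 steps. A minimal sketch of this counting argument:

```python
size = 16       # candidate numbers 1..16
halvings = 0
while size > 1:
    size //= 2  # each guess discards half of the remaining candidates
    halvings += 1
print(halvings)  # 4
```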

### Divide and conquer algorithm of binary search

We start searching the target key on the sorted array X[] of size n and calculate the mid index.

• If (key == X[mid]): We have found the target value and return the mid index.
• If (key < X[mid]): We search for the key in the left subarray.
• If (key > X[mid]): We search for the key in the right subarray.

In a similar way, we search until the key is found (successful search) or the subarray size gets reduced to 0 (unsuccessful search). The core idea is simple: At each stage of the process, we continuously reduce the search interval or input size by half.

### Binary search implementation using recursion

Suppose we use a function binarySearch(X[], l, r, key) to search for the given key in the sorted array. Here, l and r are the indices of the left and right ends of the subinterval in the array X[]. We start with l = 0 and r = n - 1.

#### Divide part

• We calculate the middle index i.e., mid = l + (r - l)/2
• If (X[mid] == key): We return the value mid.

Note: Why did we not use the equation (l + r)/2 to calculate the mid index? Here is the reason: for large values of the left and right indices, the value of (l + r) may exceed the range of the integer data type (even if l and r are individually within range). This can result in an integer overflow for very large arrays. Using the equation mid = l + (r - l)/2 avoids this overflow error.
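Python integers never overflow, so to illustrate the issue we can simulate C-style signed 32-bit arithmetic with a small helper (to_int32 is a hypothetical name introduced only for this sketch):

```python
def to_int32(x):
    # Wrap x into the signed 32-bit range, mimicking C int overflow behavior.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

l, r = 2_000_000_000, 2_100_000_000   # both indices fit in a signed 32-bit int

naive_mid = to_int32(l + r) // 2      # (l + r) wraps around and becomes negative
safe_mid = l + (r - l) // 2           # overflow-free: (r - l) stays small

print(naive_mid)  # -97483648 (garbage index)
print(safe_mid)   # 2050000000 (correct midpoint)
```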

#### Conquer part

• If (X[mid] > key): The key cannot be present in the right half of the array. So we recursively search for the key in the left half, i.e., binarySearch(X, l, mid - 1, key).
• If (X[mid] < key): The key cannot be present in the left half of the array. So we recursively search for the key in the right half, i.e., binarySearch(X, mid + 1, r, key).
• Based on the comparison with the middle value, we continue searching recursively on either the left or right half of the array until we find the key or reach the base case.

#### Combine part

This is a trivial step because, after the comparison with the middle element, one sub-problem solution (either the left or the right half) returns the index (if the key is present) or -1 (if the key is not present). There is no need to combine the solutions to the sub-problems. Think!

#### Base case

The base case is the scenario when the left index crosses the right index, i.e., the subarray size shrinks to zero: if (l > r), we return -1. This is the case of an unsuccessful search. In other words, this is the last stage of the recursion, or the smallest version of the sub-problem.

#### Recursive binary search pseudocode

```
int binarySearch(int X[], int l, int r, int key)
{
    if (l > r)
        return -1
    else
    {
        int mid = l + (r - l) / 2

        if (X[mid] == key)
            return mid

        if (X[mid] > key)
            return binarySearch(X, l, mid - 1, key)
        else
            return binarySearch(X, mid + 1, r, key)
    }
}
```

#### Recursive binary search Python implementation

```python
def binarySearch(X, l, r, key):
    if l > r:
        return -1
    else:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] > key:
            return binarySearch(X, l, mid - 1, key)
        else:
            return binarySearch(X, mid + 1, r, key)
```
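Using the examples from the top of the article, a quick check of this implementation might look like the following (the function is repeated here so the snippet runs standalone):

```python
def binarySearch(X, l, r, key):
    # Recursive implementation from above, repeated for a self-contained demo.
    if l > r:
        return -1
    mid = l + (r - l) // 2
    if X[mid] == key:
        return mid
    elif X[mid] > key:
        return binarySearch(X, l, mid - 1, key)
    else:
        return binarySearch(X, mid + 1, r, key)

X = [-4, 2, 4, 5, 9, 12]
print(binarySearch(X, 0, len(X) - 1, 5))   # 3: key 5 found at index 3
print(binarySearch(X, 0, len(X) - 1, 6))   # -1: key 6 is absent
```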

### Iterative implementation of binary search algorithm

Binary search is easy to visualize using recursion. The critical question is: can we implement it using iteration or a loop? Let's think! If we observe closely, only two parameters get updated during each recursive call: the left and right indices. So we need a way to update the left and right indices of the current subarray using a loop. Here is an idea:

• If (X[mid] > key): We need to search for the key in the left half of the array. The left index stays the same, but the right index becomes mid - 1.
• If (X[mid] < key): We need to search for the key in the right half of the array. The right index stays the same, but the left index becomes mid + 1.
• After each comparison with the middle value, we continue searching iteratively on either the left or right half of the array, again finding the middle element and proceeding as before. The loop stops when we either find the key (successful search) or the subarray size shrinks to zero, i.e., l > r (unsuccessful search). In other words, the loop continues while (l <= r).

#### Iterative binary search pseudocode

```
int binarySearch(int X[], int l, int r, int key)
{
    while (l <= r)
    {
        int mid = l + (r - l) / 2
        if (X[mid] == key)
            return mid
        if (X[mid] < key)
            l = mid + 1
        else
            r = mid - 1
    }
    return -1
}
```

#### Iterative binary search Python implementation

```python
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] < key:
            l = mid + 1
        else:
            r = mid - 1
    return -1
```

### Time complexity analysis

After each comparison, the input size decreases by half. Initially, we have n elements; after the 1st comparison, n/2 elements remain; after the 2nd comparison, n/4 elements; and so on. The worst case occurs when we reach the base case (unsuccessful search), i.e., n -> n/2 -> n/4 -> ... -> 1 -> unsuccessful search.

• Suppose we reach the base case after k steps => n/2^k = 1 => n = 2^k => k = log2n. In simple words: after log2n steps, the algorithm reaches its base case.
• At each step of the recursion, we perform O(1) operations. So the worst-case time complexity of binary search = log2n * O(1) = O(logn).

#### Another idea of analysis!

For a clearer picture, let's assume for the moment that the size of the array is a power of 2, i.e., n = 2^k => k = log2n. Each time we compare with the middle element, we cut the size of the subarray in half.

Initially, the subarray size is 2^k. After the 1st step, the subarray size is 2^(k - 1); after the 2nd step, 2^(k - 2); and so on. After k, or log2n, steps, we reach a single-element array. In the next step, we reach the base case.

So altogether we have at most k + 1, or log2n + 1, steps in the worst case. In each step, we perform a constant amount of work: calculating the midpoint and one comparison. Overall, for an array of size n, we perform c(log2n + 1) operations in the worst case. So the worst-case time complexity of binary search = O(logn).
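We can check this bound empirically with a counting version of the iterative search (binary_search_steps is a hypothetical helper written for this sketch): for several array sizes, the worst-case number of loop iterations never exceeds floor(log2n) + 1.

```python
import math

def binary_search_steps(n, key):
    # Iterative binary search over the sorted array [0, 1, ..., n - 1],
    # returning the number of loop iterations (comparisons with the mid).
    l, r, steps = 0, n - 1, 0
    while l <= r:
        steps += 1
        mid = l + (r - l) // 2
        if mid == key:
            return steps
        elif mid < key:
            l = mid + 1
        else:
            r = mid - 1
    return steps

for n in [1, 16, 1000, 1 << 20]:
    # The worst case is an unsuccessful search, e.g. a key outside the array.
    worst = max(binary_search_steps(n, key) for key in (-1, n))
    assert worst <= math.floor(math.log2(n)) + 1
    print(n, worst)
```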

#### Analysis using master theorem

Let’s assume that T(n) is the worst-case time complexity of the binary search for n elements. When we have n > 0, we can break down the time complexities as follows :

• Divide part : The time complexity of this part is O(1) because we only calculate the middle index of the array, which takes constant time.
• Conquer part : We recursively solve one sub-problem of size n/2. So the overall time complexity of the conquer part is T(n/2).
• Combine part : As mentioned above, this part is trivial. So time complexity of the combine part is O(1).

For calculating the T(n), we need to add the time complexities of the divide, conquer, and combine part => T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation of the worst-case time complexity:

• T(n) = c, if n = 1
• T(n) = T(n/2) + c, if n > 1

This recurrence relation is in the form T(n) = aT(n/b) + O(n^k), where a ≥ 1 and b > 1, so we can apply the master theorem! There are three cases of the solution via the master theorem:

1. If f(n) = O(n^k) where k < logb(a), then T(n) = O(n^logb(a))
2. If f(n) = O(n^k) where k = logb(a), then T(n) = O(n^k * logn)
3. If f(n) = O(n^k) where k > logb(a), then T(n) = O(f(n))

If we compare the recurrence relation of binary search and master theorem:

• a = 1, b = 2, where a ≥ 1 and b > 1
• f(n) = c = c * n^0 = O(n^0) => k = 0
• logb(a) = log2(1) = 0. Hence, logb(a) = k = 0

The above recurrence satisfies the 2nd case of the master theorem. So the time complexity T(n) = O(n^k * logn) = O(n^0 * logn) = O(logn).

### Space complexity analysis

The space complexity of binary search depends on its implementation. In the iterative approach, we use constant extra space, so the space complexity is O(1). In the recursive approach, the space complexity depends on the size of the recursion call stack.

• The recursion call stack size depends on the total number of levels in the recursion tree, i.e., the height of the recursion tree.
• The height of the recursion tree = log2n + 1, and from top to bottom, we make only one recursive call at each level! So the space complexity of the recursive binary search algorithm = O(logn).
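The height of the recursion tree can be measured directly with an instrumented recursive search (search_depth is a hypothetical helper written for this illustration): for n = 1024 = 2^10, the deepest successful search makes log2n + 1 = 11 nested calls.

```python
import math

def search_depth(X, l, r, key, depth=1):
    # Recursive binary search that also returns the recursion depth reached.
    if l > r:
        return -1, depth
    mid = l + (r - l) // 2
    if X[mid] == key:
        return mid, depth
    if X[mid] > key:
        return search_depth(X, l, mid - 1, key, depth + 1)
    return search_depth(X, mid + 1, r, key, depth + 1)

n = 1024  # a power of 2, so log2(n) is exact
X = list(range(n))

# Deepest successful search over all keys present in the array.
deepest = max(search_depth(X, 0, n - 1, key)[1] for key in X)
print(deepest, int(math.log2(n)) + 1)  # both are 11
```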

### An interesting story from history!

Here is a note by Jon Bentley from the book Programming Pearls:

I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right.

### Critical ideas to think about!

• Most programmers make an error in defining the base condition of recursive code or exit condition of iterative code. Think!
• We can use the binary search idea to solve several coding problems where the array has some order property similar to the sorted array. For example, for a given value, how can we modify the binary search to find the count of smaller elements, the value of the next-smallest element, all nearest neighbors, etc.?
• Even if duplicate elements are in the array, binary search will return the index of some element equal to the target value, but not necessarily the first or last occurrence of that element. Can we modify the binary search algorithm to find the first or last occurrence? Think!
• The binary search may not work efficiently on sorted arrays when insertion and deletion frequently occur with searching. The insertion and deletion operations will take O(n) time in the case of a sorted array. Think!
• Exponential search is a variation of binary search algorithm for the unbounded array, i.e., an array for which the rightmost boundary is unknown. It can also work perfectly on the bounded array, but efficiency is only better than binary search if the target value lies near the start of the array. Think!
• In a BST, elements are arranged so that the tree structure maintains sorted order, and each element can be searched using an idea similar to binary search. Insertion and deletion also work well, requiring O(logn) time on average, which is faster than insertion and deletion in a sorted array.

### Binary search-based problems to practice

Please share your thoughts in the comments below if you find anything incorrect or want to add more insight. Enjoy learning, Enjoy algorithms!