Difficulty: Easy, Asked-in: Google, Amazon, Adobe, Oracle, Qualcomm, SAP Labs
Let's understand the binary search problem
Given a sorted array X of n elements, search a given element key in X. If the key exists, then return its index in the sorted array. Otherwise, return -1.
- All the integers in X are unique.
- X is sorted in ascending order.
Input: X = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3
Explanation: 5 exists in X at index 3. So we return 3.
Input: X = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1
Explanation: 6 does not exist in X. So we return -1.
How does the binary search algorithm work?
Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with a minimum number of guesses. The computer tells us whether the guessed number is equal to, greater, or less than the actual number for each guess.
The linear guess of all the continuous values 1, 2, . . . 16 will be inefficient because, with each question, we eliminate one number only. In the worst-case scenario, we need to guess 16 times! Can we think to find a better solution and find the number with a minimum number of guesses in the worst case? Can we take advantage of the sorted order input? One insight is simple: if we pick any number x in the sorted sequence, all the numbers on the left will be less than x, and all the numbers on the right will be greater than x.
Now the critical question is: What would be the best guess at the start? The idea is simple: numbers are given in sorted order, so the best first guess would be to choose the middle number (8 here) and reduce the set of numbers by half in one go.
- If the actual number is equal to the guessed number(8), we are done.
- If the actual number is less than the guessed number(8), then we ignore the range of numbers from 9 to 16 in our next guess. We repeat the same process for the numbers 1 to 8 and select the middle number 4, and so on.
- If the actual number is greater than the guessed number(8), then we ignore the range of numbers from 1 to 7 in our next guess. We repeat the same process for the numbers 9 to 16 and select the middle number 12, and so on.
After every guess, we reject half of the given numbers in one go. This is a significant improvement! We are decreasing the set of numbers by a factor of 1/2, so the minimum number of guesses required in the worst case is 4. How? Think!
Divide and conquer idea of binary search
We start searching the target key on the sorted array X of size n and calculate the mid index.
- If (key == X[mid]): We have found the target value and return the mid index.
- If (key < X[mid]): We search for the key in the left subarray.
- If (key > X[mid]): We search for the key in the right subarray.
In a similar way, we search until the key is found (successful search) or the subarray size gets reduced to 0 (unsuccessful search). The core idea is simple: At each stage of the process, we continuously reduce the search interval or input size by half.
Binary search implementation
Suppose we use a function binarySearch(X, l, r, key) to search the given key in the sorted array. Here l and r as the index of the left and right end of the subinterval in the array X. We start with l = 0 and r = n - 1.
- We calculate the middle index i.e., mid = l + (r - l)/2
- If (X[mid] == key): We return the value mid.
Note: why did we not use the equation (l + r)/2 to calculate the mid index? Here is the reason: practically, for the large value of the left and right index, the value of (l + r) may exceed the range of integers of the data type (even if l and r are within the range). This can result in an integer overflow for very large arrays. To solve this problem, we can use the equation mid = l + (r - l)/2 to fix the integer overflow error. Follow this wonderful reference for better understanding: https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html
- If (X[mid] > key): The key must not be present in the right half of the array. So we recursively search for the key in the left half of the array i.e. binarySearch(X, l, mid - 1, key).
- If (X[mid] < key): The key must not be present in the left half of the array. So we recursively search for the key in the right half of the array i.e. binarySearch(X, mid + 1, r, key).
- Similarly, on the basis of comparison with the middle value, we continue searching recursively on either the left or right half of the array until we find the key or reach the base case.
This is a trivial step because, after comparing the middle element, one sub-problem solution (either left or right half) will return the index (if the key is present) or return -1 (if the key is not present). No need to combine the solution to the sub-problems. Think!
The base case would be the scenario when the left index crosses the right index or the subarray size shrinks to zero i.e. if (l > r), we return -1. This is the case of an unsuccessful search. In other words: this would be the last stage of the recursion or the smallest version of the sub-problem.
Recursive binary search pseudocode
int binarySearch(int X, int l, int r, int key)
if (l > r)
int mid = l + (r - l) / 2
if (X[mid] == key)
if (X[mid] > key)
return binarySearch(X, l, mid - 1, key)
return binarySearch(X, mid + 1, r, key)
Iterative implementation of binary search
Binary search can be easy to visualize using recursion. But the critical question is: can we implement the binary search using iteration or loop? Let’s think! If we observe closely, only two parameters get updated during every recursive call: the left and right index. So we need to find a way to update the left and right index of the current sub-array using a loop. Here is an idea:
- If (X[mid] > key): We need to search the key in the left half of the array. For the left array, the left index would be the same, but the right index would be mid - 1.
- If (X[mid] < key): We need to search the key in the right half of the array. For the right array, the right index would be the same, but the left index would be mid + 1.
- Similarly, after comparison with the mid-value, we continue searching iteratively on either the left or right half of the array, again finding the middle element and proceeding as before. Loop will stop if we either find the key (successful search) or if the size of the subarray shrinks to zero (l > r), which is a case of unsuccessful search). In other words: the loop will continue till (l <= r).
Pseudocode of iterative binary search
int binarySearch(int X, int l, int r, int key)
while (l <= r)
int mid = l + (r - l) / 2
if (X[mid] == key)
if (X[mid] < key)
l = mid + 1
r = mid - 1
Time complexity analysis of binary search
After each comparison, the input size decreases by half. Initially, we have n elements, then we have n/2 elements after the 1st comparison, n/4 elements after the 2nd comparison, and so on. The worst-case situation will occur when we reach the base case (unsuccessful search) i.e. n -> n/2 -> n/4 -> …… 1 -> unsuccessful search.
- Suppose we reach the base case after the k number of steps => n/2^k = 1 => n = 2^k => k = log2n. In simple words: after log2n number of steps, binary search algorithms reches its base case.
- At each step of the recursion, we perform O(1) operations. So worst-case time complexity of binary search = log2n * O(1) = O(logn).
Another idea of analysis !
For the clear picture, let’s assume for the moment that the size of the array is a power of 2 i.e. n = 2^k => k = log2n. Now when we compare the middle element each time, we cut the size of the subarrays by half.
Initially, the subarray size is 2^k. After the 1st step, the subarray size will be 2^(k - 1) and after the second step, the subarray size will be 2^(k - 2), and so on. After the k or log2n number of steps, we will reach the scene of a single element array. Now in the next step, we will reach the base case.
So all together we can have at most k + 1 or log2n + 1 number of steps in the worst case. Within each step, we perform a constant amount of work: calculating the mid-point and a comparison operation. Overall, when given the array size n we perform c (log2 n + 1) operations in the worst case. So binary search worst-case time complexity = O(logn)
Binary search analysis using master theorem
Let’s assume that T(n) is the worst-case time complexity of the binary search for n elements. When we have n > 0, we can break down the time complexities as follows :
- Divide part : The time complexity of this part is O(1) because we only calculate the middle index of the array, which takes constant time.
- Conquer part : We are recursively solving one sub-problems of size n/2. So the overall time complexity of the conquer part is T(n/2).
- Combine part : As mentioned above, this part is trivial. So time complexity of the combine part is O(1).
For calculating the T(n), we need to add the time complexities of the divide, conquer, and combine part => T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation of the worst-case time complexity:
- T(n) = c, if n = 1
- T(n) = T(n/2) + c, if n > 1
This recurrence relation is in the form of T(n) = aT(n/b) + O(n^k) where a ≥ 1 and b > 1. We can think to apply the master theorem! There are the three cases of the solution via master theorem:
- If f(n) = O(n^k) where k < logb(a) then T(n) = O(n*logb(a))
- If f(n) = O(n^k) where k = logb(a) then T(n) = O(n^k * logn)
- If f(n) = O(n^k) where k > logb(a) then T(n) = O(f(n))
If we compare the recurrence relation of binary search and master theorem:
- a = 1, b = 2 where a≥1 and b>1
- f(n) = c = cn^0 = O(n^0) => k = 0
- Similarly, logb(a) = log 2(1) = 0. Hence, logb(a) = k = 0
Binary search recurrence satisfy the 2nd case of the master theorem. So time complexity T(n) = O(n^k * logn) = O(n^0 * logn) = O(logn)
Space complexity of binary search algorithm
The binary search space complexity depends on its implementation. In the iterative approach, we use constant extra space so that the space complexity would be O(1). In the recursive method, space complexity depends on the size of the recursion call stack.
- Recursion call stack size depends on the total number of levels in the recursion tree or the height of the recursion tree.
- The height of the recursion tree = logn + 1. From top to down till base case, we have only one recursive call at each level! So the space complexity of the recursive binary search = O(log n)
An interesting story from history!
Here is a note by Jon Bentley about binary search from the book Programming Pearls:
I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right. But they aren’t the only ones to find this task difficult: in section 6.2.1 of his Sorting and Searching history, Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962.
Critical ideas to think about!
- Most programmers make an error in defining the base condition of recursive code or exit condition of iterative code. Think!
- For small array sizes, linear search is faster than binary search!
- We can use the binary search idea to solve several coding problems where the array has some order property similar to the sorted array. For example, for a given value, how can we modify the binary search to find the count of smaller elements, the value of the next-smallest element, all nearest neighbors, etc.?
- Even if duplicate elements are in the array, a binary search may return an index that equals the target value. Still, it does not always return the first or last occurrence of the element. Can we modify the binary search to find the first or last occurrence of the element? Think!
- The binary search algorithm may not work efficiently on sorted arrays when insertion and deletion frequently occur with searching. The insertion and deletion operations will take O(n) time in the case of a sorted array. Think!
- Exponential search is a variation of binary search for the unbounded array, i.e., an array for which the rightmost boundary is unknown. It can also work perfectly on the bounded array, but efficiency is only better than binary search if the target value lies near the start of the array. Think!
- In a binary search tree, elements are arranged in sorted order of the tree structure, and each data can be searched using an idea similar to a binary search. Insertion and deletion work perfectly, requiring an average O(logn) time faster than the insertion and deletion of sorted arrays.
Similar concepts to explore further
Binary search-based problems to practice
Please write in the message below if you find anything incorrect, or you want to share more insight. Enjoy learning, Enjoy algorithms!