**Difficulty:** Easy, **Asked-in:** Google, Amazon, Adobe, Oracle, Qualcomm, SAP Labs

Given an array X of n integers **sorted** in ascending order and an integer **key**, write a program to search for the key in X. If the key exists, then return its index. Otherwise, return -1.

**Example 1**

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3

Explanation: 5 exists in X[] at index 3. So we return 3.

**Example 2**

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1

Explanation: 6 does not exist in X[]. So we return -1.

Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with a minimum number of guesses. For each guess, the computer will tell us whether the guessed number is equal to, greater than, or less than the actual number.

Guessing the values one after another (1, 2, ..., 16) would be inefficient because each guess eliminates only one number. In the worst case, we need 16 guesses! Can we find a better strategy that discovers the number with fewer guesses?

Now, the critical question is: What would be the best guess at the start? Since the computer provides us with comparative insights about the guessed number and the actual number, the best first guess would be to choose the middle number (8).

- If the actual number is equal to 8, the computer tells us so, and we are done.
- If the actual number is less than 8, the computer tells us that the actual number is less than 8. So we can ignore the numbers from 8 to 16 in our next guess. We repeat the same process for the numbers 1 to 7, select the middle number 4, and so on.
- If the actual number is greater than 8, the computer tells us that the actual number is greater than 8. So we can ignore the numbers from 1 to 8 in our next guess. We repeat the same process for the numbers 9 to 16, select the middle number 12, and so on.

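After every guess, we reject about half of the remaining numbers in one go. In the worst case, we need only five guesses to identify the actual number (for example, 8, 12, 14, 15, 16 when the number is 16), compared with 16 guesses for linear guessing. This is a significant improvement!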
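A quick way to check the worst case is to simulate the game. The Python sketch below (`count_guesses` is an illustrative name, not from the original) always guesses the middle of the remaining range, rounding down as the text does with 8 and 12, and counts the guesses needed for each possible secret number.

```python
def count_guesses(secret, low=1, high=16):
    """Return how many middle-of-range guesses it takes to find `secret` in [low, high]."""
    guesses = 0
    while low <= high:
        guess = low + (high - low) // 2   # middle of the remaining range, rounded down
        guesses += 1
        if guess == secret:
            return guesses
        if guess > secret:                # actual number is smaller: drop guess..high
            high = guess - 1
        else:                             # actual number is larger: drop low..guess
            low = guess + 1
    raise ValueError("secret is outside [low, high]")


worst = max(count_guesses(s) for s in range(1, 17))
print(worst)  # 5
```

A secret of 16 takes the longest path: 8, 12, 14, 15, 16.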

Now, based on the above idea, can we take advantage of the sorted input order to find the target value efficiently? One insight is simple: if we pick any number "x" in the sorted sequence, all the numbers to its left are less than or equal to "x", and all the numbers to its right are greater than or equal to "x".

So based on this idea, we can design a simple divide and conquer strategy where we compare the target value **key** with the value at the **mid** index. If the middle element is equal to the key, then we are done. Otherwise, based on the comparison, we search for the key in the left half or right half.

- **If key is equal to X[mid]**, we have found the target value and return the mid index.
- **If key is less than X[mid]**, we search for the key in the left subarray.
- **If key is greater than X[mid]**, we search for the key in the right subarray.

We keep searching in this way until the key is found (successful search) or the subarray size is reduced to 0 (unsuccessful search). Here, we solve the searching problem of input size n using the searching problem of input size n/2 (either the left half or the right half). The core idea is simple: at each stage of the process, we cut the search interval in half. Think!
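To see the interval shrinking on Example 1, here is a small Python sketch that applies the three comparison rules above in a loop and records the (l, r, mid) triple at each step. The name `trace_binary_search` is illustrative, not part of the original.

```python
def trace_binary_search(X, key):
    """Search sorted list X for key and record (l, r, mid) at every step."""
    steps = []
    l, r = 0, len(X) - 1
    while l <= r:
        mid = l + (r - l) // 2
        steps.append((l, r, mid))
        if X[mid] == key:
            return mid, steps
        if X[mid] > key:
            r = mid - 1      # key can only be in the left half
        else:
            l = mid + 1      # key can only be in the right half
    return -1, steps         # interval became empty: unsuccessful search


X = [-4, 2, 4, 5, 9, 12]
print(trace_binary_search(X, 5))  # (3, [(0, 5, 2), (3, 5, 4), (3, 3, 3)])
print(trace_binary_search(X, 6))  # (-1, [(0, 5, 2), (3, 5, 4), (3, 3, 3)])
```

Both searches inspect only three of the six elements; for key = 6 the interval becomes empty after the third comparison.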

Suppose we use a function **binarySearch(X[], l, r, key)** to search for the given key in the sorted array. Here, l and r are the indices of the left and right ends of the subarray. We start with l = 0 and r = n - 1.

- We calculate the middle index as mid = l + (r - l)/2.
- If (X[mid] == key), we return the value of mid.

**Note:** Why did we not use the formula (l + r)/2 to calculate the middle index? For large values of l and r, the sum (l + r) may exceed the maximum value of a fixed-width integer type (such as a 32-bit int), even when l and r themselves are within range. This causes an integer overflow for very large arrays. The formula mid = l + (r - l)/2 avoids this because (r - l) never exceeds r, so no intermediate value goes out of range.
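Python integers never overflow, so the problem does not show up directly in the Python version. The sketch below simulates 32-bit two's-complement wraparound to show how the naive formula goes wrong while the safe one does not. This is an assumption about how a 32-bit int behaves: in C and C++ signed overflow is actually undefined behavior, while Java defines it to wrap. `wrap_int32` is a hypothetical helper.

```python
def wrap_int32(value):
    """Simulate signed 32-bit two's-complement wraparound (an assumed overflow model)."""
    return (value + 2**31) % 2**32 - 2**31


l, r = 2_000_000_000, 2_100_000_000   # both fit in a signed 32-bit int (max 2_147_483_647)

naive_mid = wrap_int32(l + r) // 2                   # l + r wraps around before the division
safe_mid = wrap_int32(l + wrap_int32(r - l) // 2)    # r - l stays small, nothing wraps

print(naive_mid)  # -97483648
print(safe_mid)   # 2050000000
```

The naive result is a negative "index", which in C would read memory outside the array.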

- **If (X[mid] > key):** The key cannot be present in the right half. So we recursively search for the key in the left half, i.e. **binarySearch(X, l, mid - 1, key)**.
- **If (X[mid] < key):** The key cannot be present in the left half. So we recursively search for the key in the right half, i.e. **binarySearch(X, mid + 1, r, key)**.
- In this way, based on the comparison with the middle value, we keep searching recursively in either the left or right half until we find the key or reach the base case.

Combining the solutions is a trivial step: after comparing with the middle element, exactly one sub-problem (either the left or the right half) returns the index or -1. There is no need to combine the solutions to the sub-problems.

The base case would be the scenario when the left index crosses the right index or the subarray size shrinks to zero, i.e. if (l > r), we return -1. This is the case of an unsuccessful search. In other words, this would be the last stage of the recursion or the smallest version of the sub-problem.

```
int binarySearch(int X[], int l, int r, int key)
{
    if (l > r)
        return -1;
    else
    {
        int mid = l + (r - l) / 2;
        if (X[mid] == key)
            return mid;
        if (X[mid] > key)
            return binarySearch(X, l, mid - 1, key);
        else
            return binarySearch(X, mid + 1, r, key);
    }
}
```

```
def binarySearch(X, l, r, key):
    if l > r:
        return -1
    else:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] > key:
            return binarySearch(X, l, mid - 1, key)
        else:
            return binarySearch(X, mid + 1, r, key)
```
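Both versions above expect the caller to pass l = 0 and r = n - 1. A small convenience wrapper can hide that bookkeeping. Here is a minimal Python sketch with the same recursive logic in a nested helper; the name `search` is ours, not from the original.

```python
def search(X, key):
    """Return an index of key in sorted list X, or -1, without exposing l and r."""
    def helper(l, r):
        if l > r:                        # empty subarray: unsuccessful search
            return -1
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        if X[mid] > key:
            return helper(l, mid - 1)    # search the left half
        return helper(mid + 1, r)        # search the right half

    return helper(0, len(X) - 1)


X = [-4, 2, 4, 5, 9, 12]
print(search(X, 5))   # 3
print(search(X, 6))   # -1
print(search([], 1))  # -1
```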

Binary search is easy to visualize using recursion. The critical question is: can we implement it using iteration (a loop)? Let’s think! If we observe closely, only two parameters change during every recursive call: the left and right ends of the search subarray.

So we need to find a way to update the left and right ends of the current subarray using a loop. Here's an idea:

- **If (X[mid] > key):** We need to search for the **key** in the left half of the array. For the left half, the left end stays the same, but the right end becomes **mid - 1**.
- **If (X[mid] < key):** We need to search for the **key** in the right half of the array. For the right half, the right end stays the same, but the left end becomes **mid + 1**.
- After each comparison with the middle value, we continue searching iteratively in either the left or right half of the array, again finding the middle element and proceeding as before.
- The loop stops when we either find the **key** (successful search) or the size of the subarray shrinks to zero (**l > r**), which is an unsuccessful search. In other words, the loop continues while **(l <= r)**.

```
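int binarySearch(int X[], int l, int r, int key)
{
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        if (X[mid] > key)
            r = mid - 1;      // key can only be in the left half
        else if (X[mid] < key)
            l = mid + 1;      // key can only be in the right half
        else
            return mid;       // X[mid] == key
    }
    return -1;                // l > r: unsuccessful search
}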
```

```
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] > key:
            r = mid - 1
        elif X[mid] < key:
            l = mid + 1
        else:
            return mid
    return -1
```
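Off-by-one slips are easy to make here (writing `if` where `else if`/`elif` is needed is a classic one), so a cheap safeguard is to compare binary search against a plain membership test on many small random sorted arrays. The sketch below is one way to do that. `binary_search` and `check` are illustrative names, and because arrays may contain duplicates, any index holding the key is accepted.

```python
import random


def binary_search(X, key):
    """Iterative binary search over the whole sorted list X (same logic as above)."""
    l, r = 0, len(X) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] > key:
            r = mid - 1
        elif X[mid] < key:
            l = mid + 1
        else:
            return mid
    return -1


def check(trials=2000, seed=0):
    """Compare binary_search with a linear membership test on random sorted lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 20)
        X = sorted(rng.randint(-10, 10) for _ in range(n))
        key = rng.randint(-12, 12)
        idx = binary_search(X, key)
        if key in X:
            assert 0 <= idx < n and X[idx] == key, (X, key, idx)
        else:
            assert idx == -1, (X, key, idx)
    return True


print(check())  # True
```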

After each comparison, the input size decreases by half. Initially, we have **n** elements, then we have **n/2** elements after the 1st comparison, **n/4** elements after the 2nd comparison, and so on. The worst-case situation will occur when we reach the base case (unsuccessful search), i.e., **n -> n/2 -> n/4 -> ... 1 -> unsuccessful search**.

Suppose the subarray size drops to 1 after k steps: **n/2^k = 1 => n = 2^k => k = log2(n)**. In simple words, after about **log2(n)** steps, the algorithm reaches its base case.

At each step, we perform O(1) operations. So the worst-case time complexity of binary search is **log2(n) * O(1) = O(log n).**

For a clear picture, let's assume for the moment that the size of the array is a power of 2, i.e., n = 2^k => k = log2(n). Each time we compare with the middle element, we cut the size of the subarray in half.

Initially, the subarray size is 2^k. After the 1st step, the subarray size is at most 2^(k-1); after the 2nd step, it is at most 2^(k-2), and so on. After k = log2(n) steps, we reach a subarray of size 1. In the next step, we reach the base case.

So altogether, we have at most k + 1 = log2(n) + 1 steps in the worst case. Within each step, we perform a constant amount of work: calculating the middle index and a comparison operation.

Overall, for an array of size n, we perform c(log2(n) + 1) operations in the worst case, where c is the constant work per step. So the worst-case time complexity of binary search is O(log n).
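The bound can also be checked empirically. This Python sketch (the helper `iterations` is hypothetical) counts how many middle-element comparisons the iterative search makes for every successful and unsuccessful key, and compares the worst case with floor(log2(n)) + 1, which Python's `int.bit_length()` computes exactly for positive n.

```python
def iterations(X, key):
    """Count loop iterations (one middle-element comparison each) of iterative binary search."""
    l, r, count = 0, len(X) - 1, 0
    while l <= r:
        count += 1
        mid = l + (r - l) // 2
        if X[mid] == key:
            break
        if X[mid] > key:
            r = mid - 1
        else:
            l = mid + 1
    return count


results = {}
for n in [1, 2, 7, 8, 16, 1000, 1024]:
    X = list(range(0, 2 * n, 2))        # n distinct even numbers: 0, 2, ..., 2n - 2
    keys = range(-1, 2 * n + 1)         # every element (hits) and every gap (misses)
    results[n] = max(iterations(X, key) for key in keys)
    # n.bit_length() == floor(log2(n)) + 1 for n >= 1
    print(n, results[n], n.bit_length())
```

For n = 16, the worst case is 5 = log2(16) + 1 comparisons, matching the k + 1 steps counted above.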

Let's assume that T(n) is the worst-case time complexity of binary search for **n** elements. When n > 0, we can break down the time complexities as follows:

- **Divide part**: The time complexity of this part is O(1) because we only calculate the middle index of the array, which takes constant time.
- **Conquer part**: We recursively solve one sub-problem of size n/2. So the time complexity of the conquer part is T(n/2).
- **Combine part**: As mentioned above, this part is trivial. So the time complexity of the combine part is O(1).

To calculate T(n), we need to add the time complexities of the divide, conquer, and combine parts: T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation for the worst-case time complexity:

- T(n) = c, if n = 1
- T(n) = T(n/2) + c, if n > 1
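Assuming n is a power of 2 (n = 2^k), unrolling the recurrence step by step gives the same count of log2(n) + 1 constant-cost steps:

```latex
\begin{aligned}
T(n) &= T(n/2) + c \\
     &= T(n/4) + 2c \\
     &= T(n/8) + 3c \\
     &\;\;\vdots \\
     &= T(n/2^k) + kc \\
     &= T(1) + c\log_2 n \qquad (n = 2^k,\ k = \log_2 n) \\
     &= c(\log_2 n + 1) = O(\log n)
\end{aligned}
```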

This recurrence relation has the form T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1, and f(n) = O(n^k). So we can apply the master theorem! There are three cases for the solution via the master theorem:

- If f(n) = O(n^k) where k < logb(a), then T(n) = O(n^logb(a)).
- If f(n) = O(n^k) where k = logb(a), then T(n) = O(n^k * logn).
- If f(n) = O(n^k) where k > logb(a), then T(n) = O(f(n)).

If we compare the recurrence relation of binary search and the master theorem:

- a = 1, b = 2 where a ≥ 1 and b > 1.
- f(n) = c = cn^0 = O(n^0) => k = 0.
- Similarly, logb(a) = log2(1) = 0. Hence, logb(a) = k = 0.

The above recurrence satisfies the second case of the master theorem. So the time complexity is T(n) = O(n^k * log n) = O(n^0 * log n) = O(log n). Note: You can explore the analysis of recursion blog to learn more about analysis using the master theorem.
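The case selection above is mechanical, so it can be written as a tiny helper. This is a sketch of the simplified master theorem exactly as stated in this section (the function name `master_theorem_case` is hypothetical); it compares k with logb(a) using a tolerance because logarithms are computed in floating point.

```python
import math


def master_theorem_case(a, b, k):
    """Return the case (1, 2 or 3) of the simplified master theorem for
    T(n) = a*T(n/b) + f(n) with f(n) = O(n^k)."""
    if a < 1 or b <= 1:
        raise ValueError("the master theorem needs a >= 1 and b > 1")
    critical = math.log(a, b)                   # log_b(a)
    if math.isclose(k, critical, abs_tol=1e-9):
        return 2                                # T(n) = O(n^k * log n)
    if k < critical:
        return 1                                # T(n) = O(n^log_b(a))
    return 3                                    # T(n) = O(f(n))


# Binary search: a = 1, b = 2, f(n) = c = O(n^0) -> case 2 -> O(n^0 * log n) = O(log n)
print(master_theorem_case(1, 2, 0))  # 2
# A hypothetical T(n) = 2T(n/2) + c: log2(2) = 1 > 0 -> case 1 -> O(n)
print(master_theorem_case(2, 2, 0))  # 1
```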

The space complexity of binary search depends on its implementation. The iterative approach uses constant extra space, so its space complexity is **O(1)**. In the recursive approach, however, the space complexity depends on the size of the recursion call stack, which is determined by the height of the recursion tree.

The height of the recursion tree is about **log2(n) + 1** because the input size halves with every recursive call. So the space complexity of the recursive binary search algorithm is **O(log n)**.
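To see the logarithmic stack growth, this sketch instruments the recursive search with a depth counter (`max_call_depth` is an illustrative name). For a key larger than every element, the search always moves into the right half, which is never smaller than the left half, so this is a worst-case path. The reported depth also counts the final call on the empty subarray, so it comes out as floor(log2(n)) + 2, which is still O(log n).

```python
def max_call_depth(X, key):
    """Return (result, deepest recursion level) for recursive binary search on X."""
    deepest = 0

    def helper(l, r, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if l > r:                                  # base case: empty subarray
            return -1
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        if X[mid] > key:
            return helper(l, mid - 1, depth + 1)
        return helper(mid + 1, r, depth + 1)

    result = helper(0, len(X) - 1, 1)              # the first call is level 1
    return result, deepest


depths = {n: max_call_depth(list(range(n)), n)[1] for n in [1, 8, 1024, 10**5]}
print(depths)  # {1: 2, 8: 5, 1024: 12, 100000: 18}
```

Even for 100,000 elements, the stack never holds more than 18 frames.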

Here is a note by Jon Bentley from the book Programming Pearls:

*I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right.*

- We can use the binary search idea to solve many coding problems where the array has an order property similar to that of a sorted array. For example, for a given value, how can we modify binary search to find the count of smaller elements, the next-smallest element, all nearest neighbours, etc.?
- If the array contains duplicate elements, binary search still returns an index whose value equals the key, but not necessarily the first or last occurrence of that element. Can we modify the binary search algorithm to find the first or last occurrence of the element? Think!
- Binary search on a sorted array may not be efficient when insertions and deletions happen frequently between searches, because each insertion or deletion takes O(n) time to keep the array sorted. Think!
- Exponential search is a variation of binary search for unbounded arrays, i.e., arrays whose rightmost boundary is unknown. It also works on bounded arrays, but it is faster than binary search only when the target value lies near the start of the array. Think!
- In a BST, the elements are arranged in sorted order by the tree structure, so each value can be searched using an idea similar to binary search. Insertion and deletion also work well, taking O(log n) time on average, which is faster than the O(n) insertion and deletion in a sorted array.
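For the first/last occurrence question above, one common approach (a sketch, not the only answer) is to keep searching after a match: record the index, then continue in the left half for the first occurrence or in the right half for the last. The function names below are illustrative.

```python
def first_occurrence(X, key):
    """Index of the first element equal to key in sorted list X, or -1."""
    l, r, ans = 0, len(X) - 1, -1
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            ans = mid          # record the match, then keep looking further left
            r = mid - 1
        elif X[mid] > key:
            r = mid - 1
        else:
            l = mid + 1
    return ans


def last_occurrence(X, key):
    """Index of the last element equal to key in sorted list X, or -1."""
    l, r, ans = 0, len(X) - 1, -1
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            ans = mid          # record the match, then keep looking further right
            l = mid + 1
        elif X[mid] > key:
            r = mid - 1
        else:
            l = mid + 1
    return ans


X = [1, 2, 2, 2, 5, 7]
print(first_occurrence(X, 2), last_occurrence(X, 2))  # 1 3
print(first_occurrence(X, 3), last_occurrence(X, 3))  # -1 -1
```

Both functions still run in O(log n) time. For comparison, Python's standard `bisect.bisect_left` returns the leftmost insertion point, which equals the first occurrence whenever the key is present.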

- Max element in an array which is first increasing and then decreasing
- First and last positions of an element in a sorted array
- Find the row with the maximum number of 1s
- Median of two sorted arrays of equal size
- Search in a row-wise sorted 2D matrix
- Find the square root of an integer
- Find the element that appears once in a sorted array
- Find the missing number in arithmetic progression

If you have any queries, doubts, or feedback, please write to us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!