**Difficulty:** Easy, **Asked-in:** Google, Amazon, Adobe, Oracle, Qualcomm, SAP Labs

Given a sorted array X[] of n elements, search for a given element **key** in X[]. If the key exists, return its index in the sorted array. Otherwise, return -1.

- All the integers in X[] are unique.
- X[] is sorted in ascending order.

**Example 1**

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3

Explanation: 5 exists in X[] at index 3. So we return 3.

**Example 2**

Input: X[] = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1

Explanation: 6 does not exist in X[]. So we return -1.

Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with the minimum number of guesses. For each guess, the computer tells us whether the guessed number is equal to, greater than, or less than the actual number.

Guessing all the values 1, 2, . . ., 16 linearly would be inefficient because each question eliminates only one number. In the worst case, we would need 16 guesses! Can we find a better solution that minimises the number of guesses in the worst case? Can we take advantage of the sorted order of the input? One insight is simple: if we pick any number x in the sorted sequence, all the numbers to its left are less than x, and all the numbers to its right are greater than x.

Now the critical question is: What would be the best guess at the start? The idea is simple: numbers are given in sorted order, so the best first guess would be to choose the middle number (8 here) and reduce the set of numbers by half in one go.

- If the actual number is equal to the guessed number (8), we are done.
- If the actual number is less than the guessed number (8), we ignore the range of numbers from 8 to 16 in our next guess. We repeat the same process for the numbers 1 to 7, select the middle number 4, and so on.
- If the actual number is greater than the guessed number (8), we ignore the range of numbers from 1 to 8 in our next guess. We repeat the same process for the numbers 9 to 16, select the middle number 12, and so on.

After every guess, we reject half of the remaining numbers in one go. This is a significant improvement! Since the set of candidates shrinks by a factor of 1/2 each time (16 -> 8 -> 4 -> 2 -> 1), we need at most log2(16) + 1 = 5 guesses in the worst case: 4 halving steps plus a final guess on the last remaining number.
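The halving behaviour can be checked with a short simulation (a sketch we added; `guesses_needed` is our own helper name). Note that 4 halvings shrink 16 candidates down to a single number, so a 5th guess may still be needed to confirm it:

```
def guesses_needed(target, lo, hi):
    # Play the game: always guess the middle of the remaining range.
    count = 0
    while lo <= hi:
        count += 1
        guess = lo + (hi - lo) // 2
        if guess == target:
            return count
        elif guess < target:
            lo = guess + 1      # target is in the upper half
        else:
            hi = guess - 1      # target is in the lower half

counts = [guesses_needed(t, 1, 16) for t in range(1, 17)]
print(max(counts))   # worst case: 5 guesses (4 halvings + a final confirmation)
```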

We start searching for the target **key** in the sorted array X[] of size n by calculating the **mid** index.

- **If (key == X[mid]):** We have found the target value and return the mid index.
- **If (key < X[mid]):** We search for the key in the left subarray.
- **If (key > X[mid]):** We search for the key in the right subarray.

In a similar way, we search until the key is found (successful search) or the subarray size gets reduced to 0 (unsuccessful search). The core idea is simple: At each stage of the process, we continuously reduce the search interval or input size by half.

Suppose we use a function **binarySearch(X[], l, r, key)** to search for the given key in the sorted array, where l and r are the indices of the left and right ends of the current subinterval in X[]. We start with l = 0 and r = n - 1.

- We calculate the middle index i.e., mid = l + (r - l)/2
- If (X[mid] == key): We return the value mid.

**Note:** Why did we not use the equation (l + r)/2 to calculate the mid index? Here is the reason: for large values of the left and right indices, the sum (l + r) may exceed the range of the integer data type, even if l and r are individually within range. This can cause an integer overflow for very large arrays. Using the equation mid = l + (r - l)/2 avoids the overflow, because r - l never exceeds the array size.
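Python integers never overflow, so a small sketch is needed to mimic C-style 32-bit arithmetic (the `to_int32` helper is our own addition, used only for this demonstration):

```
def to_int32(x):
    # Wrap x into the signed 32-bit range, the way C int arithmetic behaves.
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

l, r = 2_000_000_000, 2_100_000_000   # both fit in a signed 32-bit int
bad_mid = to_int32(l + r) // 2        # (l + r) overflows and goes negative
good_mid = l + (r - l) // 2           # stays within range

print(bad_mid)    # a negative, meaningless index
print(good_mid)   # 2050000000, the correct mid
```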

- **If (X[mid] > key):** The key cannot be present in the right half of the array, so we recursively search for it in the left half, i.e. **binarySearch(X, l, mid - 1, key)**.
- **If (X[mid] < key):** The key cannot be present in the left half of the array, so we recursively search for it in the right half, i.e. **binarySearch(X, mid + 1, r, key)**.
- Based on the comparison with the middle value, we continue searching recursively on either the left or the right half of the array until we find the key or reach the base case.

The combine step is trivial: after comparing the middle element, only one sub-problem (either the left or the right half) is solved, and its result is returned directly, the index if the **key** is present, or -1 if it is not. There is no need to combine solutions to sub-problems. Think!

The base case is the scenario where the left index crosses the right index and the subarray size shrinks to zero, i.e. if (l > r), we return -1. This is the case of an unsuccessful search. In other words, this is the last stage of the recursion, the smallest version of the sub-problem.

```
int binarySearch(int X[], int l, int r, int key)
{
    if (l > r)
        return -1;   // base case: subarray is empty, key not found
    else
    {
        int mid = l + (r - l) / 2;
        if (X[mid] == key)
            return mid;
        if (X[mid] > key)
            return binarySearch(X, l, mid - 1, key);   // search left half
        else
            return binarySearch(X, mid + 1, r, key);   // search right half
    }
}
```

```
def binarySearch(X, l, r, key):
    if l > r:
        return -1   # base case: subarray is empty, key not found
    else:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] > key:
            return binarySearch(X, l, mid - 1, key)   # search left half
        else:
            return binarySearch(X, mid + 1, r, key)   # search right half
```

Binary search is easy to visualize using recursion. The critical question is: Can we implement it using a loop? Let’s think! If we observe closely, only two parameters get updated during every recursive call: the left and right indices. So we need a way to update the left and right indices of the current subarray inside a loop. Here is an idea:

- **If (X[mid] > key):** We need to search for the **key** in the left half of the array. The left index stays the same, but the right index becomes **mid - 1**.
- **If (X[mid] < key):** We need to search for the **key** in the right half of the array. The right index stays the same, but the left index becomes **mid + 1**.
- After each comparison with the middle value, we continue searching iteratively on either the left or the right half of the array, again finding the middle element and proceeding as before. The loop stops if we either find the **key** (successful search) or the subarray size shrinks to zero, i.e. **l > r** (unsuccessful search). In other words, the loop continues while **(l <= r)**.

```
int binarySearch(int X[], int l, int r, int key)
{
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        if (X[mid] == key)
            return mid;
        if (X[mid] < key)
            l = mid + 1;   // continue in the right half
        else
            r = mid - 1;   // continue in the left half
    }
    return -1;   // subarray shrank to zero: key not found
}
```

```
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] < key:
            l = mid + 1   # continue in the right half
        else:
            r = mid - 1   # continue in the left half
    return -1   # subarray shrank to zero: key not found
```
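As a quick sanity check, we can run the iterative version on the two examples from the problem statement:

```
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] < key:
            l = mid + 1
        else:
            r = mid - 1
    return -1

X = [-4, 2, 4, 5, 9, 12]
print(binarySearch(X, 0, len(X) - 1, 5))   # Example 1: prints 3
print(binarySearch(X, 0, len(X) - 1, 6))   # Example 2: prints -1
```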

After each comparison, the input size decreases by half. Initially, we have **n** elements, then **n/2** elements after the 1st comparison, **n/4** elements after the 2nd comparison, and so on. The worst case occurs when we reach the base case (unsuccessful search), i.e. n -> n/2 -> n/4 -> ... -> 1 -> unsuccessful search.

- Suppose we reach the base case after k steps => n/2^k = 1 => n = 2^k => k = log2n. In simple words, the algorithm reaches its base case after log2n steps.
- At each step of the recursion, we perform O(1) operations. So the worst-case time complexity of binary search = log2n * O(1) = O(logn).

For a clearer picture, let’s assume for the moment that the size of the array is a power of 2, i.e. n = 2^k => k = log2n. Each time we compare the middle element, we cut the size of the subarray in half.

Initially, the subarray size is 2^k. After the 1st step, the subarray size is 2^(k - 1), after the 2nd step 2^(k - 2), and so on. After k or log2n steps, we are left with a single-element array, and in the next step, we reach the base case.

Altogether, we can have at most **k + 1** or **log2n + 1** steps in the worst case. Within each step, we perform a constant amount of work: calculating the midpoint and one comparison. Overall, for an array of size n, we perform **c(log2n + 1)** operations in the worst case. So the binary search worst-case time complexity = **O(logn)**.
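The step-count bound can be verified empirically with an instrumented version of the search (a sketch we added; `binarySearch_count` is our own name):

```
def binarySearch_count(X, key):
    # Iterative binary search that also counts comparison steps.
    l, r, steps = 0, len(X) - 1, 0
    while l <= r:
        steps += 1
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid, steps
        elif X[mid] < key:
            l = mid + 1
        else:
            r = mid - 1
    return -1, steps

# An unsuccessful search (key larger than every element) is the worst case.
for k in range(1, 11):
    n = 2 ** k
    _, steps = binarySearch_count(list(range(n)), n)
    assert steps <= k + 1   # at most log2(n) + 1 steps
```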

Let’s assume that T(n) is the worst-case time complexity of binary search for **n** elements. For n > 0, we can break down the time complexity as follows:

- **Divide part:** The time complexity of this part is O(1) because we only calculate the middle index of the array, which takes constant time.
- **Conquer part:** We recursively solve one sub-problem of size n/2, so the overall time complexity of the conquer part is T(n/2).
- **Combine part:** As mentioned above, this part is trivial, so its time complexity is O(1).

To calculate T(n), we add the time complexities of the divide, conquer, and combine parts => T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation for the worst-case time complexity:

- T(n) = c, if n = 1
- T(n) = T(n/2) + c, if n > 1
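The recurrence can also be checked numerically by unrolling it with c = 1 (a small sketch we added):

```
def T(n):
    # Worst-case cost recurrence of binary search, taking c = 1.
    if n == 1:
        return 1
    return T(n // 2) + 1

for k in range(11):
    assert T(2 ** k) == k + 1   # T(2^k) = log2(n) + 1, i.e. O(logn)
```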

This recurrence relation is in the form T(n) = aT(n/b) + O(n^k), where a ≥ 1 and b > 1, so we can apply the master theorem. There are three cases in the master theorem:

- If f(n) = O(n^k) where k < logb(a), then T(n) = O(n^logb(a))
- If f(n) = O(n^k) where k = logb(a), then T(n) = O(n^k * logn)
- If f(n) = O(n^k) where k > logb(a), then T(n) = O(f(n))

If we compare the recurrence relation of binary search and master theorem:

- a = 1, b = 2, where a ≥ 1 and b > 1
- f(n) = c = c * n^0 = O(n^0) => k = 0
- Similarly, logb(a) = log2(1) = 0. Hence, logb(a) = k = 0

The above recurrence satisfies the 2nd case of the master theorem. So the time complexity T(n) = O(n^k * logn) = O(n^0 * logn) = O(logn).

The space complexity of binary search depends on its implementation. In the iterative approach, we use constant extra space, so the space complexity is O(1). In the recursive approach, space complexity depends on the size of the recursion call stack.

- Recursion call stack size depends on the total number of levels in the recursion tree or the height of the recursion tree.
- The height of the recursion tree is logn + 1. From top to bottom, until the base case, there is only one recursive call at each level! So the space complexity of the recursive binary search algorithm = O(logn).
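The stack depth can be measured with an instrumented recursive version (the `depth` parameter is our own addition for this sketch, not part of the algorithm):

```
def binarySearch(X, l, r, key, depth=1):
    # depth counts how many frames are currently on the call stack.
    if l > r:
        return -1, depth
    mid = l + (r - l) // 2
    if X[mid] == key:
        return mid, depth
    if X[mid] > key:
        return binarySearch(X, l, mid - 1, key, depth + 1)
    return binarySearch(X, mid + 1, r, key, depth + 1)

n = 1024
X = list(range(n))
_, depth = binarySearch(X, 0, n - 1, n)   # unsuccessful search goes deepest
print(depth)   # 12 frames: log2(1024) + 1 halving steps plus the empty base case
```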

Here is a note by Jon Bentley from the book Programming Pearls:

I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right.

- Most programmers make an error in defining the base condition of recursive code or exit condition of iterative code. Think!
- We can use the binary search idea to solve several coding problems where the array has some order property similar to the sorted array. For example, for a given value, how can we modify the binary search to find the count of smaller elements, the value of the next-smallest element, all nearest neighbors, etc.?
- If duplicate elements are present in the array, binary search will return the index of some element equal to the target, but not necessarily the first or last occurrence. Can we modify the binary search algorithm to find the first or last occurrence of the element? Think!
- Binary search may not work efficiently on sorted arrays when insertions and deletions occur frequently along with searching, because insertion and deletion in a sorted array take O(n) time. Think!
- Exponential search is a variation of binary search for an unbounded array, i.e., an array whose rightmost boundary is unknown. It also works on bounded arrays, but it is more efficient than binary search only when the target value lies near the start of the array. Think!
- In a BST, elements are arranged in sorted order by the tree structure, and each element can be searched using an idea similar to binary search. Insertion and deletion also take O(logn) time on average, faster than the O(n) insertion and deletion in a sorted array.
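For instance, the first-occurrence variant mentioned above can be sketched as follows (`firstOccurrence` is our own name; the idea is to keep searching the left half even after a match):

```
def firstOccurrence(X, key):
    l, r, result = 0, len(X) - 1, -1
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] == key:
            result = mid    # remember this match, then look further left
            r = mid - 1
        elif X[mid] < key:
            l = mid + 1
        else:
            r = mid - 1
    return result

X = [2, 4, 4, 4, 9, 12]
print(firstOccurrence(X, 4))   # prints 1: the first index of 4, not just any
```

The symmetric change (searching the right half after a match) gives the last occurrence.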

- Exponential Search, Interpolation Search
- Divide and conquer idea of merge sort
- Divide and conquer idea of quicksort
- Analysis of recursion
- Concept of recursion in programming
- Time complexity analysis in DSA

- max element in an array which is first increasing and then decreasing
- First and last positions of an element in a sorted array
- Find the row with the maximum number of 1s
- Median of two sorted arrays of the equal size
- Search in a row-wise sorted 2D matrix
- Find the square root of an integer
- Find the element that appears once in a sorted array
- Find the missing number in arithmetic progression

Please write in the message below if you find anything incorrect or want to share more insight. Enjoy learning, Enjoy algorithms!
