A binary search tree (BST) is a linked representation of a binary tree, where each node stores a key and an associated value. The key stored in each node must satisfy the following **binary search tree property**.

- Keys in the left subtree of a node ≤ the node's key
- Keys in the right subtree of a node ≥ the node's key
- The above property holds for every node. In other words, both the left and right subtrees must themselves be binary search trees.

**Note:** In the linked representation of a binary tree, each node is an object containing a key, a value, a pointer to the left child, and a pointer to the right child. Depending on requirements, we can also store additional attributes like a pointer to the parent node, a count of nodes, etc.

The binary search tree data structure supports many basic operations like search, insert, delete, maximum, minimum, predecessor, successor, kth smallest, etc. Each of these basic operations takes time proportional to the tree's height, i.e. O(h). **Note:** We will discuss each of these operations in a separate blog.

- For a balanced binary search tree with n nodes, the above operations run in O(logn) time. If a binary search tree is a linear chain of n nodes, the above operations will take O(n) time.
- The average height of a randomly built binary search tree is O(logn), so basic operations on such a tree take O(logn) time on average.

```
class BinarySearchTree
{
    private BSTNode root;

    private class BSTNode
    {
        private int key;
        private int value;
        private BSTNode left;
        private BSTNode right;

        public BSTNode(int x)
        {
            key = x;
            left = right = null;
        }
    }

    public BinarySearchTree()
    {
        root = null;
    }

    public void insert(int key)
    {
        root = insertKey(root, key);
    }

    private BSTNode insertKey(BSTNode root, int key)
    {
        // Implementation logic
    }

    public BSTNode search(int key)
    {
        return searchKey(root, key);
    }

    private BSTNode searchKey(BSTNode root, int key)
    {
        // Implementation logic
    }

    public void delete(int key)
    {
        root = deleteKey(root, key);
    }

    private BSTNode deleteKey(BSTNode root, int key)
    {
        // Implementation logic
    }

    public void inorder()
    {
        inorderWalk(root);
    }

    private void inorderWalk(BSTNode root)
    {
        // Implementation logic
    }

    // ... other methods ...
}
```
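The helper methods above are left as stubs because the blog covers them separately. As a rough sketch of one standard recursive approach, insertKey() and searchKey() might look like the following (the class and helper names here are our own, for illustration only):

```java
// A minimal sketch of recursive BST insert and search, assuming unique keys.
// Class and method names are illustrative, not from the skeleton above.
class SketchBST
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int x) { key = x; }
    }

    Node root;

    // Insert: walk down using the BST property and attach a new node at a null link.
    private Node insertKey(Node node, int key)
    {
        if (node == null)
            return new Node(key);
        if (key < node.key)
            node.left = insertKey(node.left, key);
        else if (key > node.key)
            node.right = insertKey(node.right, key);
        return node; // duplicate keys are ignored in this sketch
    }

    // Search: at each node, go left or right based on a key comparison.
    private Node searchKey(Node node, int key)
    {
        if (node == null || node.key == key)
            return node;
        return (key < node.key) ? searchKey(node.left, key) : searchKey(node.right, key);
    }

    public void insert(int key) { root = insertKey(root, key); }
    public boolean contains(int key) { return searchKey(root, key) != null; }
}
```

Both helpers follow a single root-to-leaf path, which is why their running time is O(h).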

One idea for constructing a binary search tree is to insert each key one by one, starting from an empty tree. But the height of such a binary search tree depends on the order in which keys are inserted. For example:

- In the best case, a BST with n keys could be perfectly balanced, with height O(logn), so every path from the root to a leaf contains O(logn) nodes.
- In the worst case, if n keys are inserted in increasing or decreasing order, the tree will be a chain of nodes with height n - 1.
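A small sketch can make the two cases concrete (the helper names below are our own): inserting the same seven keys in sorted order produces a chain of height 6, while a balanced insertion order produces a tree of height 2.

```java
// Illustrative sketch: the same keys give very different tree heights
// depending on insertion order. Names here are our own, for illustration.
class HeightDemo
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node n, int key)
    {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Height = number of edges on the longest root-to-leaf path.
    static int height(Node n)
    {
        if (n == null) return -1; // empty tree
        return 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(int[] keys)
    {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }
}
```

Inserting 1..7 in sorted order yields height 6 (a chain, n - 1), while inserting 4, 2, 6, 1, 3, 5, 7 yields height 2 (perfectly balanced).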

As we know, the time complexity of each basic operation on a binary search tree depends on the height of the tree. If there are frequent insertions and deletions, the height of the binary search tree will also change: it will be O(n) in the worst case and O(logn) in the best case. So the critical question is: How can we maintain a balanced BST to perform operations in O(logn) time?

If we create a BST by inserting n keys in random order, the average height will be closer to the best case, i.e. O(logn). Here we assume that each of the n! permutations of the input keys is equally likely. **Note:** We will discuss the proof and properties of randomized BST in a separate blog.

This behaviour is similar to randomized quicksort, where the average case [O(nlogn)] is much closer to the best case [O(nlogn)] than to the worst case [O(n²)]. In other words, the construction of a BST and the quicksort algorithm look similar: the root node of the BST corresponds to the pivot in quicksort (no keys to its left are larger, and no keys to its right are smaller), and the two subtrees correspond to the left and right partitions.

One idea is clear: Basic operations are fast if the binary search tree is balanced, i.e. h = O(logn). So the critical question is: How can we maintain a height-balanced BST when frequent insertions and deletions happen? Here comes the idea of self-balancing BSTs like AVL trees and red-black trees that guarantee O(logn) height in the worst case.

**AVL Tree** is a self-balancing binary search tree such that, for every node, the difference between the heights of its left and right subtrees is at most 1. The main idea is to perform rotation operations (to rebalance the tree) in constant time during insertion and deletion, so that the tree's height is always O(logn).
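As a sketch of the constant-time rotation mentioned above (names here are our own): a left rotation promotes a node's right child while preserving the BST ordering of all three affected subtrees.

```java
// Sketch of a left rotation, the constant-time rebalancing step used by
// AVL trees (and red-black trees). Names are illustrative.
class RotationDemo
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    // Left rotation around x: x's right child y becomes the subtree root,
    // and y's old left subtree (keys between x and y) is reattached under x.
    // Only a constant number of pointers change, so this is O(1).
    static Node rotateLeft(Node x)
    {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y; // new subtree root
    }
}
```

A right rotation is the mirror image; AVL insertion and deletion apply at most a constant number of rotations per node on the search path.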

**Red-black tree** is a self-balancing binary search tree where each node stores an extra bit of information: its colour (red or black). These colours ensure that the tree remains balanced during insertions and deletions. Although the balance of the tree is not perfect, it is good enough to keep the searching time around O(logn), where n is the total number of nodes in the tree.

**Property 1:** The binary search tree property allows us to visit all keys in sorted order if we traverse the tree using an inorder traversal. The idea is simple: in an inorder traversal, we first explore all keys in the left subtree (which are less than the root key), then the root key, and finally all keys in the right subtree (which are greater than the root key).
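A minimal sketch of this property (helper names are our own): building a BST from unsorted keys and walking it inorder yields the keys in sorted order.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Property 1: an inorder walk of a BST emits keys in sorted order.
class InorderDemo
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node n, int key)
    {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    static void inorder(Node n, List<Integer> out)
    {
        if (n == null) return;
        inorder(n.left, out);  // 1. keys smaller than n.key
        out.add(n.key);        // 2. the node's own key
        inorder(n.right, out); // 3. keys larger than n.key
    }

    static List<Integer> sortedKeys(int[] keys)
    {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }
}
```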

**Property 2:** A binary search tree is a recursive object consisting of the root node and two smaller binary search trees: left and right subtrees. So we can solve a lot of BST questions using recursion.

**Property 3:** A binary search tree combines the fast searching of a sorted array with the flexible insertion and deletion of a linked list.

**Property 4:** A BST represents a set of keys and associated values, and many different BSTs can represent the same set. The total number of possible binary search trees with n different keys = Catalan number Cn = (2n)! / ((n + 1)! * n!).
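This count can be computed with the standard Catalan recurrence: choosing which key goes at the root splits the remaining keys into a left and a right subtree, giving C(n) = Σ C(i) · C(n - 1 - i). A sketch (names our own):

```java
// Sketch: counting distinct BST shapes over n keys via the Catalan recurrence.
// Each choice of root key leaves i keys for the left subtree and
// n - 1 - i keys for the right subtree.
class CatalanDemo
{
    static long countBSTs(int n)
    {
        long[] c = new long[n + 1];
        c[0] = 1; // one empty tree
        for (int nodes = 1; nodes <= n; nodes++)
            for (int leftSize = 0; leftSize < nodes; leftSize++)
                c[nodes] += c[leftSize] * c[nodes - 1 - leftSize];
        return c[n];
    }
}
```

For example, 3 distinct keys admit C3 = 5 different BST shapes, and 4 keys admit C4 = 14.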

**Property 5:** A binary search tree labels each node in a binary tree with a single key such that for any node labelled x, all nodes in the left subtree of x have keys < x while all nodes in the right subtree of x have keys > x. So for any binary tree with n keys, there is exactly one labelling that makes it a binary search tree.

**Property 6:** In a binary search tree, new nodes are inserted into null links at the bottom of the tree, which preserves the tree structure. For example, the root holds the first key inserted, one of the children of the root holds the second key inserted, and so forth. Because each node has two links, the tree grows out rather than just down. So we only need to explore keys on the path from the root to a leaf, and the number of keys explored becomes a smaller fraction of the total number of keys as the tree grows.

The time complexity of dictionary operations (insert, delete and search) is inefficient on some data structures. For example, a doubly linked list can perform insertion and deletion in O(1) time (given a reference to the node), but searching will take O(n) time in the worst case. Similarly, a search on a sorted array will take O(logn) time, but insert and delete operations will take O(n) time in the worst case.

A self-balancing binary search tree takes O(logn) time to perform insert, delete and search operations. But one of the best solutions is to use a hash table, which performs dictionary operations in O(1) average time. The critical question is: A hash table seems to beat the BST on these three basic operations, so when should we prefer a BST over a hash table? Let's think!

Binary search trees are memory-efficient because they do not require more memory than needed. On the other hand, a hash table can waste space if we don't know the exact number of elements we want to store. For example, if a hash function has range 0 ≤ h(k) < 100, we need to allocate an array of 100 slots even if we are storing only 20 elements. To store the same information, a binary search tree allocates only as much space as it needs.

A binary search tree stores keys in sorted order, which helps us perform many other operations efficiently in O(logn) time. Such flexibility is not there with hash tables, which may require extra effort. For example, a BST performs these operations in O(logn) time: finding min/max, deleting min/max, finding successor/predecessor, finding the kth largest/smallest, etc. That's why a self-balancing binary search tree can be used as both a dictionary and a double-ended priority queue. With a heap data structure, we can implement a priority queue that supports either extractMin() or extractMax(); if we wish to support both operations, we can use a self-balancing binary search tree as the priority queue.
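A quick sketch of the min/max operations mentioned above (names our own): the minimum is the leftmost node and the maximum is the rightmost, so each is a single O(h) walk.

```java
// Sketch of findMin/findMax in a BST: follow left links for the minimum,
// right links for the maximum. Names are illustrative.
class MinMaxDemo
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node n, int key)
    {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Both helpers assume a non-empty tree.
    static int findMin(Node root)
    {
        while (root.left != null) root = root.left;
        return root.key;
    }

    static int findMax(Node root)
    {
        while (root.right != null) root = root.right;
        return root.key;
    }
}
```

Pairing these with the corresponding deletions is what makes a balanced BST usable as a double-ended priority queue.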

BSTs are easy to implement and customise in comparison to hash tables. To implement hashing, we mostly rely on libraries provided by programming languages. Moreover, a poor choice of hash function can cause a lot of collisions and hamper the hash table's performance. When the number of elements grows too much in a hash table, we need to extend and reallocate it, which may be an expensive operation. A BST is simple in terms of implementation and does not require sudden allocation of a lot of memory or rehashing operations.

A binary search tree allows us to do range searches efficiently: it can skip any subtree that cannot possibly contain keys in the query range. But in a hash table, such operations are complex and inefficient, i.e. we need to iterate over every slot, which is O(n).
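A sketch of this pruning idea (names our own): to report all keys in [lo, hi], we descend into a left subtree only when it might hold keys ≥ lo, and into a right subtree only when it might hold keys ≤ hi.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of BST range search with subtree pruning. Names are illustrative.
class RangeDemo
{
    static class Node
    {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node n, int key)
    {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Collect all keys in [lo, hi] in sorted order, skipping subtrees
    // that cannot contain any key in the range.
    static void rangeSearch(Node n, int lo, int hi, List<Integer> out)
    {
        if (n == null) return;
        if (lo < n.key) rangeSearch(n.left, lo, hi, out);  // left may hold keys >= lo
        if (lo <= n.key && n.key <= hi) out.add(n.key);
        if (n.key < hi) rangeSearch(n.right, lo, hi, out); // right may hold keys <= hi
    }
}
```

The cost is proportional to the number of reported keys plus the height of the tree, rather than O(n).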

- We can sort a given set of n numbers by first building a binary search tree containing these numbers and then printing the numbers by an inorder tree walk. What is the worst-case and best-case time complexity for this sorting algorithm?
- Design an array implementation of BST with three arrays: one to store keys, one to store array indices corresponding to left pointers, and one to store array indices corresponding to right pointers. Also, compare the efficiency of operations with standard linked implementation.
- How can we modify BST implementation to keep the most recently accessed key so that it can be accessed in O(1) time if the next insert() or search() uses the same key?
- Design a method that takes a node and two keys, **min** and **max**, as arguments and returns true if all the keys in the tree are between min and max.
- What is the difference between the binary-search-tree property and the min-heap property? Can we use the min-heap property to print keys in sorted order in O(n) time?
- How many binary search tree shapes of n nodes are there with height n? How many ways are there to insert n distinct keys into an initially empty BST that results in a tree of height n?
- Suppose we construct a BST by repeatedly inserting distinct keys. Prove that the number of nodes explored when searching for a key is one plus the number of nodes examined when that key was first inserted into the tree.
- Does deleting key x and then key y from a binary search tree leave the same tree as deleting key y and then key x?
- Prove that if a node in a BST has two children, its successor has no left child, and its predecessor has no right child.
- Suppose we have an estimate ahead of time of how often search keys are to be accessed in a BST and the freedom to insert them in any order we desire. Should the keys be inserted into the tree in increasing order, decreasing order of likely frequency of access, or some other order?
- Comparison-based sorting takes at least O(nlogn) time to sort n elements. Similarly, prove that no comparison-based algorithm can build a BST using fewer than O(nlogn) comparisons.

We hope you enjoyed learning the concepts of the binary search tree. We will add more insights to this blog in the coming future. Enjoy learning, enjoy algorithms! Looking forward to your feedback and insights.

Subscribe to get weekly content on data structures and algorithms, machine learning, system design and OOPS.