A Segment Tree is a data structure used to answer multiple range queries on an array efficiently. Also, it allows us to modify the array by replacing an element or an entire range of elements in logarithmic time. This blog will focus on the build and query operations on segment trees.
Consider this problem: We have an array with some integers. We get multiple range queries [l, r], and the answer to each query is the minimum value from the array between the positions l and r.
Example
Suppose input array is arr[] = [3, 6, 2, 1, 5, 10, 2, 11, 4, 8], then following are the queries and their corresponding answer.
Range: [1, 2], Output: 2, Explanation: We understand from the query that we need to return the minimum value from the array between indices l = 1 and r = 2. Thus, if we look at the subarray, arr[1 : 2] = [6, 2], the minimum value in this range is 2. So the answer to this query is 2.
Range: [4, 5], Output: 1, Explanation: Similarly, for this query, we return the minimum value from the subarray arr[2 : 5] = [2, 1, 5, 10], which is 1.
The brute force method to solve this problem is straightforward. We iterate from index l to r of the array for each query, finding the minimum in this range. Thus, one query takes O(r - l) time or O(n) time in the worst case. If there are q queries, this method will take O(q*n) time in the worst case. One insight is clear: as the number of queries becomes large, our code will take much time to process all queries. So our goal is to design a data structure that can answer each query in logarithmic time O(logn).
A segment tree is a binary tree such that each node stores information about a range, which depends on the problem. In our case, each node will store a minimum value in an interval. We can also use a segment tree to store different types of information such as range maximum, range sum, range XOR, etc.
Consider an array arr[] with size n and the segment tree st.
We can store segment trees in an array. As we know that in a 0 based indexing of array: if a parent is present at index i, its children will be present on indices 2*i + 1, 2*i + 2. Thus the root of the tree represents the entire range of the array from 0 to n - 1.
Now we break down this interval into two half intervals. The children of the root node represent the intervals arr[0, n /2] and arr[n/2 + 1 : n - 1]. Now each interval is further divided into two halves, and the two children of each node represent these two halves. As the segment tree is divided into half in each step, the height of the segment tree will be O(logn) and the tree has n leaves corresponding to the n indices in the array.
The following image shows the segment tree for the given array. The red text represents the current node's range; the blue text is the node's indexes, and the black text is the minimum value from the array in this range.
We use a divide and conquer strategy to build the segment tree. In the divide step, we come top-down and recursively divide the range into two halves till there is only one element in the range. We know that if there is only one element in the range, that element itself is the answer.
Now segment tree formation takes place during conquer and combine step i.e. it will start in a bottom-up fashion from the leaf nodes to the root node. In other words, it will update the nodes that come in the path from a leaf to root in this process, where data of the child nodes are used to form a parent node in each step. So, recursion will end up at the root node representing the whole array. Note: The combine step may be different for different questions like range maximum, range sum, etc.
We define a function buildSegTree(int st[], int arr[], int nodeIndex, int leftRange, int rightRange) that takes these values as an input parameters:
Initially, nodeIndex = 0(root node), leftRange = 0, and rightRange = n - 1.
Divide Step: By calculating the mid-index, we divide the array into two halves: leftRange to mid and mid + 1 to rightRange, where mid = leftRange + (rightRange - leftRange)/2.
Conquer Step: We recursively call the same function with modified input parameters to build the segment tree for the left and right half of the array i.e. buildSegTree( 2 * nodeIndex + 1, leftRange, mid) and buildSegTree( 2 * nodeIndex + 2, mid + 1, rightRange).
Combine Step: This is the most crucial step. We use the data of the child nodes to decide what information the parent node stores. In our case, as we need to find the range minimum, we take the minimum of the values stored in the child nodes and keep them in the parent node i.e. st[nodeIndex] = min (st[2 * nodeIndex + 1], st[2 * nodeIndex + 2]). Note: If we need to find the range sum, we will take the sum of the values stored in the child nodes and store it in the parent node.
Base Case: This is the trivial case when leftRange == rightRange, i.e. this represents a single element from the array which is the situation of a leaf node. The segment tree will store this element i.e. if(leftRange == rightRange), st[nodeIndex] = arr[leftRange).
As discussed above, we also represent the segment tree as an array similar to the heap. The difference here is that heap is a complete binary tree structure, and segment tree is a full binary tree structure. The idea is simple: we always divide segments into two halves at every level in a segment tree.
Properties of a full binary tree:
If the size of the input array n is a power of 2, then the size of the segment tree is 2n -1. The tree has n leaf nodes and n - 1 internal nodes. Now, if we increase the input array size by 1, a new level gets added to the tree. So, if n is not a power of 2, we need to go to the next power of 2. In other words, when the array size n is not a power of 2, approximately 2k - 1 nodes are required for the segment tree, where k is the smallest power of 2 greater than n.
Example array and corresponding segment tree for finding the minimum in a range as the size of the array varies from 1 to 5. Initially, arr = [1], st = [1], size of tree = 1.
arr = [1, 2], st = [1, 1, 2]
The size of tree = 3 (2n -1, where n = 2).
Now n = 3, arr = [1, 2, 3], st =[1, 1, 3, 1, 2, null, null]
arr = [1, 2, 3, 4], st = [1, 1, 3, 1, 2, 3, 4]
The size of tree = 7 (2n -1, where n = 4).
arr = [1, 2, 3, 4, 5], st =[1, 1, 4, 1, 3, 4, 5, 1, 2, null, null, null, null, null, null]
void buildSegTree(int st[], int arr[], int nodeIndex,
int leftRange, int rightRange)
{
if (leftRange == rightRange)
{
//leaf node
st[nodeIndex] = arr[leftRange]
}
else
{
int mid = leftRange + (rightRange - leftRange)/2
//left child
buildSegTree(st, arr, 2 * nodeIndex + 1, leftRange, mid)
//right child
buildSegTree(st, arr, 2 * nodeIndex + 2, mid + 1, rightRange)
//union of left and right, which is minimum in our case
st[nodeIndex] = min(st[2 * nodeIndex + 1], st[2 * nodeIndex + 2])
}
}
We recursively solve the problem of size n by solving the two smaller sub-problems of size n/2 and using the solutions of these two smaller problems to answer the parent node.
We can solve this recurrence using both the recursion tree and the master method. To better understand analysis, you can refer to this blog: Analysis of recursion. If we observe closely, this recurrence relation is similar to the recurrence of the divide and conquer solution of finding max and min in an array. So the time complexity is O(n). In other words, we can build a segment tree in linear time.
Space complexity = Space used st[] array + Space used by recursion call stack.
So overall space complexity = O(n) + O(logn) = O(n)
In this operation, we can query on a range and return the answer to the problem (minimum of the range in our case). We start from the root node of the segment tree. At each node, we check for three conditions:
int rangeMinQuery(int st[], int nodeIndex, int leftRange,
int rightRange, int left, int right)
{
if(right < leftRange || rightRange < left)
{
//case 2: Range represented by node is completely outside the given range
return INT_MAX
}
if(left <= leftRange && rightRange <= right)
{
//case 1: Range represented by node is completely inside the given range
return st[nodeIndex]
}
//case 3: Range represented by a node is partially inside and partially outside the given range
int mid = leftRange + (rightRange - leftRange)/2
int leftRes = rangeMinQuery(2 * nodeIndex + 1,
leftRange, mid, left, right)
int rightRes = rangeMinQuery(2 * nodeIndex + 2,
mid + 1, rightRange, left, right)
return min(leftRes, rightRes)
}
To query a range minimum, we claim that we process two nodes at every level at most. We will try to prove this by contradiction. For example, consider the following tree. Suppose three nodes are explored on a level in the given segment tree.
The range covered by these nodes is from the leftmost colored node to the rightmost colored node. However, in this case, the range of the middle node is entirely covered. Thus, this node will immediately return the value and not be explored. Therefore, we proved that we expand at most two nodes at each level, and since there are logn levels, the number of explored nodes is 2logn = O(logn). Thus time complexity of the query operation is O(logn).
In this blog, we studied how to solve multiple queries on an array in logarithmic time. We looked at the build and query operations of the segment tree. In the next blog, we will study the update operation of the segment tree.
Enjoy learning, Enjoy coding, Enjoy algorithms!
Subscribe to get well designed content on data structure and algorithms, machine learning, system design, object orientd programming and math.
©2023 Code Algorithms Pvt. Ltd.
All rights reserved.