Why do we study design, analysis and comparison of sorting algorithms in data structures and algorithms? Here are some critical reasons:
- We can learn several problem-solving approaches using sorting algorithms: Incremental approach (selection and insertion sort), Divide and conquer approach (merge and quick sort), Two pointers approach (merging and partition), Problem-solving using data structures (heap and tree sort), etc.
- Sorting is one of the best ideas to learn complexity analysis of recursive and iterative code.
- We also use sorting as a problem-solving approach, i.e., we can solve several problems efficiently by organizing data into sorted order.
So based on importance, we need to understand the critical comparison between various sorting algorithms based on different parameters like time complexity, space complexity, in-place, stability, online vs. offline, etc.
Comparison based sorting algorithms
In comparison-based sorting algorithms, we establish the order of elements in the sorted output by comparing elements in the input. It's important to note that all comparison-based sorting algorithms have a lower bound complexity of O(nlogn) i.e. any comparison-based sorting algorithm will take at least O(nlogn) time to sort an array of n elements.
Here are the sorting algorithms and their comparison methods:
- Bubble sort: Compares elements to find the maximum element in the unsorted part and places that element in the sorted part.
- Selection sort: Compares elements to find the minimum element in the unsorted part and places that element in the sorted part.
- Insertion sort: Compares elements to determine the position of an element in the partially sorted array.
- Merge sort: Compares elements of two sorted halves to merge them into the final sorted array.
- Quick sort: Compares elements to partition the unsorted array into two different halves around the pivot.
- Heapsort: Compares elements during the heapify process to place the minimum elements at the front of the array (if we are using the min-heap).
The worst-case time complexities of the above sorting algorithms can be categorized into two parts: O(n^2) sorting and O(nlogn) sorting.
In addition to the comparison operations, we also perform other types of operations in these sorting algorithms. However, the count of these operations will always be less than or equal to the count of comparison operations. That's why the comparison operation is the deciding factor of time complexity in the above sorting algorithms.
- Bubble sort: Swapping
- Selection sort: Swapping
- Insertion sort: Sifting
- Merge sort: Allocation of extra memory and data copy.
- Quicksort: Swapping
- Heapsort: Swapping
Linear time sorting algorithm
There are sorting algorithms that have a faster time complexity than O(nlogn). However, they require special assumptions about the input sequence to determine the sorted order of elements. These sorting algorithms use operations other than comparisons to determine the sorted order, and they work in O(n) time complexity. So the lower bound of O(nlogn) does not apply to these sorting algorithms.
Examples of sorting algorithms that run in linear time are counting sort, radix sort, and bucket sort. Counting sort and radix sort assume that the input consists of integers in a small range. Meanwhile, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval.
- Counting sort: Each input element must be an integer in the range from 0 to k.
- Radix sort: Given n integers, each integer can have up to k possible values.
- Bucket sort: The input must be generated by a random process that distributes elements uniformly and independently over the interval [0, 1).
Here is the comparison of time and space complexities of some popular sorting algorithms:
In-place sorting algorithms
An "in-place" sorting algorithm is one that sorts the elements of an input array without using extra memory i.e. a constant number of elements can be temporarily stored outside the array while sorting.
Examples of in-place sorting algorithms include bubble sort, selection sort, insertion sort, quicksort, and heapsort. In contrast, merge sort and counting sort are not in-place algorithms, as they require additional memory space.
Stable sorting algorithms
A sorting algorithm is considered "stable" if it preserves the relative order of elements with the same value. For instance, if A[i] is equal to A[j] and i < j, then A[i] must appear before A[j] in the sorted output. Stability is important when sorting the same data set multiple times because it maintains the relative order of equal elements. If all elements have unique values, stability is not a concern.
- Bubble sort, insertion sort, merge sort, counting sort and radix sort are stable sorting algorithms.
- Selection sort, quicksort and heapsort are non-stable sorting algorithms.
Non-stable sorting algorithms can be made stable by extending the comparison operation to include tie-breaking rules based on the order of the original input. However, this process may require additional time and space to remember the original order.
Online and offline sorting algorithms
An algorithm that sorts data as it becomes available is called an "online" sorting algorithm. Such algorithms sequentially process input data and allow sorting to occur before the entire input set is available.
For example, insertion sort considers one input element at a time and places that element in a partially sorted array. So this idea helps us to maintain a sorted output by placing a new input element in its proper place as it becomes available. So insertion sort is an online sorting algorithm.
In contrast, an "offline" algorithm requires all input data to be present in memory before sorting can start. For example, the selection sort algorithm sorts an array by finding the minimum element in the unsorted part and placing it in the partially sorted array. So it requires access to the entire input. That's why selection sort is an offline sorting algorithm.
Comparison based on problem-solving approaches
Learning sorting algorithms is an effective method for mastering problem-solving and time complexity analysis of algorithms. Sorting is also frequently used as a key technique to solve various coding problems.
- Divide and conquer approach: merge sort and quick sort.
- Incremental approach using nested loops: bucket sort, selection sort, insertion sort.
- Problem-solving using data structures: heap sort, tree sort.
- Problem-solving using hashing: counting sort.
How to choose best sorting algorithm?
The choice of the best sorting algorithm depends on several factors, including size of the input data, order of the data, memory usage, stability, performance, etc.
- If input data is small, a simple algorithm like insertion sort can work best. But for larger data sets, we can prefer more efficient algorithms like quicksort or merge sort or heap sort.
- If data is almost sorted, insertion sort work best with O(n) time complexity. If data is random, quicksort, merge sort, or heap sort can be better options.
- When memory usage is an important consideration, algorithms like heap sort [O(1) extra space] or quick sort [O(logn) extra space] can be preferred over merge sort [O(n) extra space].
- For sorting linked lists, merge sort is the optimal choice. It is relatively simple to implement, requires O(nlogn) time, and O(1) extra space. However, linked list has slow random-access performance, which results in poor performance for algorithms such as quicksort and makes others, like heap sort, infeasible.
- In a parallel computing environment, merge sort is often a preferred choice due to divide-and-conquer approach. With this method, input is divided equally at each stage and each smaller sub-problem is independent of the others, making it easy to distribute and process data parallel across multiple clusters.
- Quick sort and merge sort can be relatively simple to implement, but heap sort may require a deeper understanding of binary heaps.
- If stability is a concern, merge sort is the best choice because quick sort and heap sort are not stable sorting algorithms.
- Quick sort can be cache-friendly due to its in-place sorting property and fewer memory accesses compared to merge sort. Heap sort can also have good cache performance due to its use of binary heaps.
- Regardless of the order of the data, when guaranteed O(nlogn) performance is required, merge sort and heap sort are the best choices for sorting. Quick sort performs exceptionally well on random data with an O(nlogn) average time complexity, but its performance can degrade on sorted or nearly sorted data, resulting in O(n^2) in the worst case.
Critical ideas to think!
- How can we improve the running time of quicksort by taking advantage of the fast running time of insertion sort when its input is almost sorted?
- Can we modify the quick sort or heap sort algorithm to make it stable?
- How can we implement quicksort and merge sort iteratively?
- Some additional sorting algorithms to explore: Shell sort, Tree sort, and Tournament sort.
Enjoy learning, Enjoy sorting!