Comparison of Sorting Algorithms

Why do we study the design, analysis, and comparison of sorting algorithms in data structures? Here are some critical reasons:

We can learn several problem-solving approaches using sorting: Incremental approach (selection and insertion sort), Divide and conquer approach (merge and quicksort), Two pointers approach (merging and partition), Problem-solving using data structures (heapsort and tree sort), etc.
Sorting is one of the best ways to learn complexity analysis of recursive and iterative code.
We also use sorting as a problem-solving approach i.e. we can efficiently solve several problems by organizing data into sorted order.

Based on their importance, we need to understand the critical comparisons between various sorting algorithms based on different parameters such as time complexity, space complexity, in-place sorting, stability, online vs. offline sorting, etc.

Comparison based sorting algorithms

In comparison-based sorting algorithms, we get the sorted order output by comparing elements in the input. It's important to note that all comparison-based sorting algorithms have a lower bound of O(nlogn). In other words, any comparison-based sorting algorithm will take at least O(nlogn) time to sort an array of n elements.

Here are some examples of sorting algorithms which use comparison as a critical operation to complete the sorting process:

Bubble sort: Compares elements to bubble up the maximum to the end.
Selection sort: Compares elements to find the minimum element in the unsorted part and places that element in the sorted part.
Insertion sort: Compares elements to determine the position of an element in the partially sorted array.
Merge sort: Compares elements of two sorted halves to merge them into the final sorted array.
Quick sort: Compares elements to partition the unsorted array into two different halves around the pivot.
Heapsort: Compares elements during the heapify process to place the elements at the correct position in the sorted array.

The worst-case time complexities of the above sorting algorithms will be categorized into two parts: O(n^2) sorting and O(nlogn) sorting.

O(n^2) sorting: Bubble sort, Selection sort, and Insertion sort
O(nlogn) sorting: Merge sort, Quicksort, Heapsort. Note: The worst-case performance of Quicksort is O(n^2), but it works very fast in O(nlogn) time on average.

In addition to the comparison operations, these sorting algorithms also use other types of operations.

Bubble sort: Swapping
Selection sort: Swapping
Insertion sort: Shifting
Merge sort: Allocation of extra memory and data copy.
Quicksort: Swapping
Heapsort: Swapping

However, the count of these operations will always be less than or equal to the count of comparison operations. That's why the comparison operation is the deciding factor for the time complexity in the above sorting algorithms.

Linear time sorting algorithm

There are sorting algorithms that have a faster time complexity than O(nlogn), but they require special assumptions about the input sequence to determine the sorted order. These sorting algorithms use operations other than comparisons to determine the sorted order, and they work in O(n) time complexity. So, the lower bound of O(nlogn) does not apply to these sorting algorithms.

Examples of sorting algorithms that run in linear time are counting sort, radix sort, and bucket sort. Counting sort and radix sort assume that the input consists of integers in a small range. Meanwhile, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval [0, 1).

Here is a comparison of the time and space complexities of some popular sorting algorithms:

Time complexity comparison of sorting algorithms

In-place sorting algorithms

An "in-place" sorting algorithm sorts the elements without using extra memory. Here we store a constant number of elements temporarily outside the array while sorting.

Examples of in-place sorting algorithms are bubble sort, selection sort, insertion sort, quicksort, and heapsort. In contrast, merge sort and counting sort are not in-place algorithms as they require additional memory.

Stable sorting algorithms

A sorting algorithm is "stable" if it preserves the relative order of elements with the same value. For example, if A[i] is equal to A[j] and i < j, then A[i] must appear before A[j] in the sorted output.

Stability is important when we sort the same data set multiple times because it maintains the relative order of equal elements. If all elements have unique values, stability is not a concern.
Bubble sort, insertion sort, merge sort, counting sort, and radix sort are stable sorting algorithms. Selection sort, quicksort, and heapsort are non-stable sorting algorithms.

We can make non-stable sorting algorithms stable by extending the comparison operation to include tie-breaking rules based on the order of the original input. But this process may require additional time and space to remember the original order.

Online and offline sorting algorithms

An algorithm that sorts data as it becomes available is called an "online" sorting algorithm. Such algorithms sequentially process input data and allow sorting to occur before the entire input set is available.

For example, insertion sort considers one input element at a time and places that element in a partially sorted array. This will maintain a sorted output by inserting each new input element in its proper place as it becomes available. So, insertion sort is an online sorting algorithm.

In contrast, an "offline" algorithm requires all input data to be present in memory before sorting can start. For example, the selection sort algorithm sorts an array by finding the minimum element in the unsorted part and placing it in the partially sorted array. This process requires access to the entire input. So, selection sort is an offline sorting algorithm.

Comparison based on problem-solving approaches

Learning sorting algorithms is an effective idea for mastering problem-solving and time complexity analysis of algorithms. Sorting is also frequently used as a key technique to solve various coding problems.

Divide and conquer approach: merge sort and quick sort.
An incremental approach using nested loops: bucket sort, selection sort, and insertion sort.
Problem-solving using data structures: heap sort, tree sort.
Problem-solving using hashing: counting sort.

How to choose the best sorting algorithm?

The choice of the best sorting algorithm depends on several factors, including the size of the input data, the order of the data, memory usage, stability, performance, etc.

For small input data, a simple algorithm like insertion sort can work best. However, for larger data sets, more efficient algorithms like quicksort, merge sort, or heapsort are the best choices.
If the data is almost sorted, insertion sort works best with O(n) time complexity. If the data is random, quicksort, merge sort, or heapsort can be better options.
When memory usage is an important consideration, algorithms like heapsort [O(1) extra space] or quicksort [O(logn) extra space] are preferred over merge sort [O(n) extra space].
For sorting linked lists, merge sort is the optimal choice. It is relatively simple to implement and requires O(nlogn) time, and O(1) extra space. However, linked lists have slow random-access performance, resulting in poor performance for algorithms such as quicksort and making others like heapsort infeasible.
In a parallel computing environment, merge sort is often the preferred choice due to its divide-and-conquer approach. This method divides the input equally at each stage, and each smaller sub-problem is independent of the others. This makes it easy to distribute and process data in parallel across multiple clusters.
Quick sort and merge sort can be relatively simple to implement, but heapsort may require a deeper understanding of binary heaps.
If stability is a concern, merge sort is the best choice because quicksort and heapsort are not stable sorting algorithms.
Quick sort can be cache-friendly due to its in-place sorting property and fewer memory accesses compared to merge sort. Heapsort can also have good cache performance due to its use of binary heaps.
Regardless of the order of the data, when guaranteed O(nlogn) performance is required, merge sort and heapsort are the best choices for sorting. Quick sort performs exceptionally well on random data with O(nlogn) average time complexity, but its performance can degrade on sorted or nearly sorted data [O(n^2) time].

Critical ideas to think!

How can we improve the running time of quicksort by taking advantage of the fast running time of insertion sort when its input is almost sorted?
Can we modify the quick sort or heap sort algorithm to make it stable?
How can we implement quicksort and merge sort iteratively?
Some additional sorting algorithms to explore: Shell sort, Tree sort, and Tournament sort.

Enjoy learning, Enjoy sorting!