sorting algorithm, in computer science, a procedure for ordering the elements of a list by repeating a sequence of steps. Sorting algorithms make a list of items more usable, typically by placing the items in numerical order (from the least value to the greatest or vice versa) or in lexicographical order (also called dictionary order, a generalization of alphabetical order). Sorting algorithms are a vital building block of many other applications, including search tools, data analysis, and e-commerce. There are many sorting algorithms, but most applications use sorts with relatively low computational complexity, such as Quicksort or merge sort.
In computer science, an algorithm is a procedure for solving a computational problem. For a problem to be solved with an algorithm, it must be clearly defined so that the correctness of the algorithm can be evaluated. In the case of sorting, this is generally straightforward, as the elements are typically sorted in one of two well-defined ways. One way is numerical order, which is based on some numerical value the elements possess. Sorting dates from earliest to latest, sorting physicians by patient ratings from highest to lowest, or sorting items for sale in a store from least to most expensive are all examples of numerical sorting. The other common way of sorting is lexicographical order, or dictionary order, which is how alphanumeric string values are typically sorted. In most cases, this is equivalent to alphabetical order, though alternative ordering rules can be applied.
One of the key ways sorting algorithms are evaluated is by their computational complexity, a measure of how much time and memory a particular algorithm requires to function. Because the actual time and space requirements vary depending on the specifics of the problem being solved (e.g., sorting a hundred items or a million), computer scientists use big-O notation to describe how quickly resource costs grow as the size of the problem grows. The O of big-O notation refers to the order, or kind, of growth the function experiences. O(1), for example, indicates that the complexity of the algorithm is constant, while O(n) indicates that the complexity grows in a linear fashion as n increases, where n is a variable related to the size of the problem (for example, the length of the list to be sorted). Doubling the length of a list thus doubles the work for an O(n) algorithm but quadruples it for an O(n²) algorithm. The O value of a particular algorithm may also depend upon the specifics of the problem, so it is sometimes analyzed for best-case, worst-case, and average scenarios.
Sorting algorithms can be described in several ways. Some algorithms are stable, meaning that the order of equivalent elements will be preserved, while others are unstable, meaning that the order may not be preserved. For example, if a database of personal contacts is sorted first by name and then by address, a stable sort algorithm will ensure that people living at the same address remain alphabetized by name within each address.
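Stability can be demonstrated with a minimal Python sketch, using hypothetical contact data; Python's built-in sorted() function is guaranteed to be stable, so after sorting by name and then by address, contacts sharing an address remain alphabetized by name.

# A minimal sketch of sort stability, using hypothetical contact data.
contacts = [
    ("Ruiz", "12 Oak St."),
    ("Adams", "3 Elm St."),
    ("Chen", "12 Oak St."),
]

# Python's built-in sorted() is guaranteed to be a stable sort.
by_name = sorted(contacts, key=lambda c: c[0])    # alphabetize by name
by_address = sorted(by_name, key=lambda c: c[1])  # then sort by address

# Because the second sort is stable, contacts sharing an address keep
# their alphabetical order: Chen precedes Ruiz at 12 Oak St.
print(by_address)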
Sorting algorithms can also be divided into comparison algorithms, which sort by comparing one element in a list to another, and non-comparison algorithms, which find other ways to sort. Non-comparison algorithms are sometimes less computationally complex than even the best comparison algorithms, but they are often limited in other ways. For example, a counting sort has lower time requirements than any comparison sort, but it can be applied only to keys that fall within a limited range of integers, as in the sketch below. The remaining examples in this article are comparison algorithms.
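The following minimal Python sketch illustrates a counting sort under the assumption that the keys are small nonnegative integers. It never compares two elements; instead it tallies how many times each key occurs and rebuilds the list in key order, running in O(n + k) time, where k is the size of the key range.

def counting_sort(values, max_value):
    # Tally the occurrences of each key from 0 to max_value.
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    # Rebuild the list in key order; no comparisons are performed.
    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)
    return result

print(counting_sort([8, 5, 1, 3, 7], max_value=8))  # [1, 3, 5, 7, 8]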
One simple sorting algorithm is known as selection sort. In this algorithm, the sort function scans an unsorted list to find its smallest element (assuming a least-to-greatest sort) and swaps it with the first element in the list. It then scans the remaining unsorted part of the list to find the new minimum element and swaps it with the first element in the unsorted part of the list. It repeats this process until every element in the list has been sorted. For example, consider the list {8,5,1,3,7}. The algorithm sorts the list into {1,5,8,3,7} by switching the 8 and the 1 and continues until the list is completely sorted.
Selection sort is relatively intuitive, resembling the way many people sort items in their daily lives. However, it is computationally expensive. A pass through the list must occur for every element in the list, and, during each pass, every remaining unsorted element must be examined to find the minimum. Consequently, selection sort uses O(n²), or quadratic, time, where n is the length of the list. This notation means that the time it takes to sort a list grows in proportion to the square of the length of the list.
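The procedure can be sketched in Python as follows; this is a minimal illustration of the description above rather than an optimized implementation.

def selection_sort(values):
    # Sort a list in place by repeatedly selecting the minimum: O(n²) time.
    n = len(values)
    for i in range(n):
        # Scan the unsorted portion values[i:] for its smallest element.
        min_index = i
        for j in range(i + 1, n):
            if values[j] < values[min_index]:
                min_index = j
        # Swap it with the first element of the unsorted portion.
        values[i], values[min_index] = values[min_index], values[i]
    return values

print(selection_sort([8, 5, 1, 3, 7]))  # [1, 3, 5, 7, 8]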
A similar sorting method, insertion sort, builds a sorted sublist at the beginning of the list. Each element from the unsorted portion is inserted into its correct place in the sorted sublist. In the list {8,5,1,3,7}, the first step is to place the 5 before the 8 to make {5,8,1,3,7}.
Like selection sort, insertion sort uses O(n²) time on average, but, in its best case, it is much quicker than selection sort. For an already sorted list, insertion sort uses O(n) time, because it must compare each item only to the previous one to find the item’s place.
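A minimal Python sketch of insertion sort follows; the inner loop shifts larger sorted elements to the right until the correct slot for the next item is found, which is why an already sorted list needs only one comparison per element.

def insertion_sort(values):
    # Sort in place by growing a sorted sublist at the front of the list.
    for i in range(1, len(values)):
        item = values[i]
        j = i - 1
        # Shift larger sorted elements right to open a slot for item.
        while j >= 0 and values[j] > item:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = item
    return values

print(insertion_sort([8, 5, 1, 3, 7]))  # [1, 3, 5, 7, 8]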
A more efficient algorithm than insertion sort is merge sort. Merge sort uses a divide-and-conquer approach, splitting an unsorted list into two sublists and continuing to split the sublists in that fashion until each has one or fewer elements. Then the algorithm iteratively merges the sublists in sorted order until the entire list is sorted. The list {8,5,1,3,7} is thus split into five sublists of one element each that are then merged back into one sorted list.
Merge sort uses O(n log n) time, which is significantly more efficient than O(n²). The amount of time required grows with the length of the list multiplied by the log, or logarithm, of the length, because the number of iterations required depends on how many times the list must be divided in half.
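Merge sort can be sketched in Python as follows; this minimal version returns a sorted copy rather than sorting in place.

def merge_sort(values):
    # A list of one or fewer elements is already sorted.
    if len(values) <= 1:
        return values
    # Divide: split the list in half and sort each half recursively.
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Conquer: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps the merge stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([8, 5, 1, 3, 7]))  # [1, 3, 5, 7, 8]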
Another popular divide-and-conquer sorting algorithm is Quicksort. In Quicksort, one of the values in a list is chosen as a “pivot.” Every other value in the list is then compared with the pivot and placed before it, if it is smaller, or after it, if it is greater. A new pivot is then chosen for each of the unsorted sublists, which are partitioned in the same fashion, and the process repeats until the entire list is sorted. In the list {8,5,1,3,7}, 1 may be chosen as the pivot, resulting in {1,8,5,3,7}. The number 5 may be chosen as the next pivot, leading to {1,3,5,8,7}, and, with 8 as the next pivot, the list is fully sorted.
Unlike merge sort, the computational complexity of Quicksort depends heavily on the existing order of the list to be sorted. In most cases, Quicksort averages out to O(n log n) time, like merge sort. However, with a simple pivot-selection rule (such as always choosing the first element), a list that happens to be ordered backward (e.g., from greatest to least when one intends to sort from least to greatest) forces Quicksort into O(n²) time.
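A minimal Python sketch of Quicksort follows. For simplicity it always takes the first element as the pivot (the example above chose its pivots differently), which is exactly the rule that degrades to O(n²) on a list sorted in reverse.

def quicksort(values):
    # A list of one or fewer elements is already sorted.
    if len(values) <= 1:
        return values
    # Partition around the first element as the pivot.
    pivot = values[0]
    smaller = [v for v in values[1:] if v < pivot]  # values placed before the pivot
    larger = [v for v in values[1:] if v >= pivot]  # values placed after the pivot
    # Recursively sort each sublist and combine around the pivot.
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([8, 5, 1, 3, 7]))  # [1, 3, 5, 7, 8]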