# 2.2 Sorting

## SORTING ALGORITHMS

Sorting is another very important area of algorithms. Computers often have to sort large amounts of data into order based on some attribute of that data, such as sorting a list of files by their name or size, or emails by the date they were received, or a customer list according to people’s names. Most of the time this is done to make searching easier. For example you might have a large amount of data and each piece of data could be someone’s name and their phone number. If you want to search for someone by name it would help to first have the data sorted alphabetically according to everyones names, but if you then wanted to search for a phone number it would be more useful to have the data sorted according to people’s phone numbers.

Like searching there are many different sorting algorithms, but some take much longer than others. In this section you will be introduced to two slower algorithms and one much better one.

### 2.3.1. SCALES INTERACTIVE

Throughout this section you can use the sorting interactive to test out the algorithms we talk about. When you’re using it make sure you take note of the comparisons at the bottom of the screen, each time you compare two boxes the algorithm is making ‘one comparison’ so the total number of comparisons you have to make with each algorithm is the cost of that algorithm for the 8 boxes.

Use the scales to compare the boxes (you can only compare two boxes at a time) and then arrange them along the bottom of the screen. Arrange them so that the lightest box is on the far left and the heaviest is on the far right. Once you think they are in order click ‘Test order’.

If the interactive does not run properly on your computer you can use a set of physical balance scales instead — just make sure you can only tell if one box is heavier than the other, not their exact weight (so not digital scales that show the exact weight).

## SELECTION SORT

One of the most intuitive ways to sort a group of boxes into order, from lightest to heaviest, is to start by first finding the lightest (or the heaviest) box and placing that to the side. Try this with the scales interactive.

After finding the lightest box simply repeat the process again with the remaining boxes until you find the second lightest, now place that to the side alongside the lightest box. If you keep repeating this process you will eventually find you have placed each box into order. Try sorting the whole group of boxes in the scales interactive into order using this method and count how many comparisons you have to make.

Tip: Start by moving all the boxes to the right of the screen and then once you have found the lightest box place it to the far right (if you want to find the heaviest first instead then move them all to the left).

If you record how many comparisons you had to make each time to find the next lightest box you might notice a pattern (hint: finding the lightest should take 7 comparisons, and then finding the second lightest should take 6 comparisons…). If you can see the pattern then how many comparisons do you think it would take to then sort 9 boxes into order? What about 20? If you knew how many comparisons it would take to sort 1000 boxes, then how many more comparisons would it take to sort 1001 instead?

This algorithm is called Selection sort, because each time you look through the list you are ‘selecting’ the next lightest box and putting it into the correct position. If you go back to the algorithms racing interactive at the top of the page you might now be able to watch the selection sort list and understand what it is doing at each step.

The selection sort algorithm can be described as follows.

• Find the smallest item in the list and place it to one side. This will be your sorted list.
• Next find the smallest item in the remaining list, remove it and place it into your sorted list beside the item you previously put to the side.
• Repeat this process until all items have been selected and moved into their correct position in the sorted list.

You can swap the word ‘smallest’ for ‘largest’ and the algorithm will still work, as long as you are consistent it doesn’t matter if you are looking for the smallest or the largest item each time.

## QUICKSORT

Selection Sort may seem like logical ways to sort things into order, but they both take far too many comparisons when they are used for large amounts of data. Remember computers often have to search through HUGE amounts of data, so even if they use a good searching algorithm like Binary Search to look through their data, if they use a bad sorting algorithm to first sort that data into order then finding anything will take far too long!

A much better sorting algorithm is Quicksort! (the name is a bit of a giveaway)

This algorithm is a little more complicated, but is very powerful. To do this algorithm with the sorting interactive, start by randomly choosing a box and placing it on the scales. Now compare every other box to the one you selected; heavier boxes should be put on the right of the second row and lighter boxes are put on the left. When you are done, place the box you were comparing everything else to between these two groups, but to help you keep track of things, put it in the row below. The following example shows how it might look after this step. Note that the selected block is in the right place for the final sorted order, and everything on either side will remain on the side that it is on.

Now apply this process to each of the two groups of boxes (the lighter ones, then the heavier ones). Keep on doing this until they are all sorted. The boxes should then be in sorted order!

It might be worth trying this algorithm out a few times and counting the number of comparisons you perform each time. This is because sometimes you might be unlucky and happen to pick the heaviest, or the lightest box first. On the other hand you might be very lucky and choose the middle box to compare everything to first. Depending on this the number of comparisons you perform will change.

Quicksort can be described in the following way:

• Choose an item from the list and compare every other item in the list to this (this item is often called the pivot).
• Place all the items that are greater than it into one subgroup and all the items that are smaller into another subgroup. Place the pivot item in between these two subgroups.
• Choose a subgroup and repeat this process. Eventually each subgroup will contain only one item and at this stage the items will be in sorted order.

The following files will run quicksort in various languages:

## 2.4. OTHER TOPICS IN ALGORITHMS

• There is another searching algorithm which performs even better than Binary Search. It is called Hashing and can be investigated with the CS Unplugged Battleships Game.
• There are some problems for which no good algorithms have been found (and many people believe they will never be found). For more on these kinds of algorithms see the Complexity and Tractability chapter in the Field Guide.

## 2.5. THE WHOLE STORY!

We’ve only really scraped the surface of algorithms in this chapter, as there are millions of different algorithms for millions of different problems! Algorithms are used in maths, route planning, network planning and operation, problem solving, artificial intelligence, genetic programming, computer vision, the list goes on and on! But by going through this chapter you should have gained an understanding of the key concepts of algorithms and will be well prepared to tackle more complicated ones in the future.

In this chapter we have only talked about the number of comparisons an algorithm makes, and the amount of time a program takes to complete as ‘costs’ of algorithms. There are actually many other ways of measuring the cost of an algorithm. These include the amount of memory the algorithm uses and its computational complexity. Computer Scientists use ‘Big O notation’ to more accurately describe the performance or complexity of an algorithm, and you are likely to come across this notation very quickly when investigating the performance of algorithms. It characterises the resources needed by an algorithm and is usually applied to the execution time required, or sometimes the space used by the algorithm.

Here are some Big O examples:

• O(1) - An algorithm with O(1) complexity will always execute in the same amount of time regardless of how much data you give it to process
• O(n) - The amount of time an algorithm with O(n) complexity will take to execute will increase linearly and in direct proportion to the amount of data you give it to process. Remember that Big O describes the worst case scenario so the algorithm might sometimes take less time, but the greatest amount of time it can take will increase in direct proportion to the amount of data it is given.
• O(n2) - The performance of an algorithm with this complexity is directly proportional to the square of the size of the input data set.
• O(2n) - The amount of time an algorithm with this complexity will take to complete will double with each additional element added to the data set! Does this remind you of any of the algorithms you have looked at in this chapter?

Big O Notation however requires some advanced mathematics to explore thoroughly so has been intentionally left out of this main chapter, but if you want to learn more check out the Useful Links section. These topics are looked at in more depth in the Complexity and Tractability chapter.

To make things even more complicated, in practice algorithms are running on computers that have cached memory and virtual memory, where the time to access a particular value can be particularly short, or particularly long. There is a whole range of algorithms that are used for this situation to make sure that the algorithm still runs efficiently in such environments. Such algorithms are still based on the ideas we’ve looked at in this chapter, but require some clever adjustments to ensure that they work well.