Searching

2.1. WHAT’S THE BIG PICTURE?

Every computer device you have ever used, from your school computers to your calculator, has been using algorithms to tell it how to do whatever it was doing. Algorithms are a very important topic in Computer Science because they help software developers create efficient and error free programs. The most important thing to remember about algorithms is that there can be many different algorithms for the same problem, but some are much better than others!

Computers are incredibly fast at manipulating, moving and looking through data. However the amount of data computers use is often so large that it doesn’t matter how fast the computer is, it will take it far too long to examine every single piece of data (companies like Google, Facebook and Twitter process about 1 billion things per day). This is where algorithms come in. If a computer is given a better algorithm to process the data then it doesn’t matter how much information it has to look through, it will still be able to do it in a reasonable amount of time.

If you have read through the Introduction chapter you may remember that the speed of an application on a computer makes a big difference to a human using it. If an application you create is too slow, people will get frustrated with it and won’t use it. It doesn’t matter if your program can solve all their life problems, if it takes too long they will simply get bored and close it!

2.1.1. ALGORITHMS, PROGRAMS AND INFORMAL INSTRUCTIONS

At this stage you might be thinking that algorithms and computer programs kind of sound like the same thing, but they are actually two very distinct concepts. Descriptions of these and another important concept, Informal Instructions, are below. They are each different ways of describing how to do something:

  • Informal Instruction: An instruction using natural language. They are un-precise so computers cannot understand them, but humans are able to use their own intelligence to interpret them. This is the least precise of our three descriptions for doing things, and is typically used to give a very simple description of the general idea of an algorithm.
  • Algorithm: step by step process that describes how to solve a problem and/or complete a task, which will always give a result. They are more precise than Informal Instructions and do not require any knowledge or intelligence to follow, however they are still not precise enough for a computer to follow. These are often expressed using pseudo-code, which matches a programming language fairly closely, but leaves out details that could easily be added later by a programmer, and doesn’t specify the kinds of commands that can be used.
  • Program: a specific implementation of an algorithm, which is written in a specific programming language and will give a certain result. This is the most precise of these three descriptions and computers are able to follow and understand these.

For example…

  • Informal Instruction: “Get me a glass of water”. A human can understand what this means and can figure out how to accomplish this task by thinking, but a computer would have no idea how to do this!
  • Algorithm: 1) Go to the kitchen. 2) Pick up a glass. 3) Turn on the tap. 4) Put the glass under the running water and remove it once it is almost full. 5) Turn off the tap. 6) Take the glass back to the person who gave the instruction. A human could follow these instructions easily, but a computer could not figure out exactly what to do.
  • Program: A computer program, written in a programming language, which would tell a robot exactly how to retrieve a glass of water and bring it back to the person who asked for the water.

ALGORITHM COST

When Computer Scientists are comparing algorithms they often talk about the ‘cost’ of an algorithm. The cost of an algorithm can be interpreted in several different ways, but it is always related to how well an algorithm performs based on the size of its input, n. In this chapter we will talk about the cost of an algorithm as either the time it takes a program (which performs the algorithm) to complete, and the number of comparisons the algorithm makes before it finishes.

The amount of time a program which performs the algorithm takes to complete may seem like the simplest cost we could look at, but this can actually be affected by a lot of different things, like the speed of the computer being used, or the programming language the program has been written in. This means that if the time the program takes to complete is used to measure the cost of an algorithm it is important to use the same program and the same computer (or another computer with the same speed) for testing the algorithm with different numbers of inputs.

Extra For Experts

The number of comparisons an algorithm makes however will not change depending on the speed of a computer, or the programming language the program using the algorithm is written in. Some algorithms will always make the same number of comparisons for a certain input size, while others might vary.

If you want to find out more about how the cost of an algorithm is described in industry, with ‘Big-O Notation’, then check out “The Whole Story!” section at the end of this chapter.

Algorithm Correctness

If we develop or are given an algorithm to solve a problem, how do we know that it works? Sometimes we create test cases to verify the algorithm produces correct output for specific input values. While this is a useful practice and can help verify that we are on the right track, it is not enough to show that our algorithm is correct. The old adage "even a broken watch is correct twice a day" is a good analogy. Even an algorithm that is correct for two test cases might be incorrect for every other input. A computer scientist must reason formally or mathematically about an algorithm to show its correctness. Typically this is done by classifying ranges of input values and showing that algorithm produces expected results for boundary values of the range and all values in between.

Correctness is particularly important when comparing two algorithms that solve the same problem. If one algorithm is very fast to complete but produces incorrect results some of the time it may be far less useful than a correct algorithm that is slower. Correctness is also important when using an algorithm as the building block for another algorithm. Here is an algorithm for assigning animals as pets to people on a waitlist:

  1. Search for the person who is earliest on the the waitlist
  2. Assign the person who is earliest on the waitlist with their preferred animal as a pet
  3. Repeat 1-2 until no people remain on the waitlist

This algorithm relies on a correct search algorithm in the first step. If the search algorithm incorrectly chose a random person, the algorithm for assigning animals as pets would also be incorrect.

As you will see in this chapter with searching and sorting there exist multiple correct algorithms for the same problem. Often there are good reasons to know multiple correct algorithms because there are tradeoffs in simplicity, algorithm cost, and assumptions about inputs.

SEARCHING 

In this chapter we will look at two of the most common and important types of algorithms, Searching and Sorting. You probably come across these kinds of algorithms every time you use a computer without even realising!

2.2. SEARCHING

Searching through collections of data is something computers have to do all the time. It happens every time you type in a search on Google, or when you type in a file name to search for on your computer. Computers deal with such huge amounts of data that we need fast algorithms to help us find information quickly. Lets investigate searching with a game…

CLICK HERE FOR ACTIVITY

You may have noticed that the numbers on the monsters and pets in the game were in a random order, which meant that finding the pet was basically luck! You might have found it on your first try, or if you were less lucky you might have had to look inside almost all the presents before you found it. This might not seem like such a bad thing since you had enough lives to look under all the boxes, but imagine if there had been 1,000 boxes, or worse 1,000,000! It would have taken far too long to look through all the boxes and the pet might have never been found.

Now this next game is slightly different. You have less lives, which makes things a bit more challenging, but this time the numbers inside the boxes will be in order. The monsters, or maybe the pet, with the smallest number is in the present on the far left, and the one with the largest number is in the present on the far right. Let’s see if you can collect all the pets without running out of lives…

CLICK HERE FOR ACTIVITY

Now that you have played through the whole game (and hopefully found all of the lost pets!) you may have noticed that even though you had less lives in the second part of the game, and lots of presents to search through, you were still able to find the pet. Why was this possible?

LINEAR SEARCH 

Since the boxes in the first game were in a random order there really wasn’t any strategy you could have used to find the pet, except simply keep opening presents one by one until you found the pet. This is very similar to the Linear Search Algorithm (sometimes called a sequential search). In plain english, this algorithm is as follows:

  • Check if the first item in a list is the item you are searching for, if it is the one you are looking for, you are done.
  • If it isn’t the item you are searching for move on and check the next item.
  • Continue checking items until you find the one you are searching for.

If you used this algorithm you might get lucky and find what you are looking for on your first go, but if you were really unlucky you might have to look through everything in your list before you found the right object! For a list of 10 items this means on average you would only have to look at 5 items to find what you were looking for, but for a list of 10000 you would have to look through on average 5000.

Curiosity: Bozo Search

BINARY SEARCH

A much better algorithm to use is called Binary Search. In the second part of the present searching game the boxes were in order, which meant you were able to be more clever when you were searching for the pet, and you might have been using a Binary Search without realising...

If you used a Binary Search on each of the levels then you would have always had enough lives to find the pet! Informally, the Binary Search algorithm is as follows.

  • Look at the item in the centre of the list and compare it to what you are searching for
  • If it is what you are looking for then you are done.
  • If it is larger than the item you are looking for then you can ignore all the items in the list which are larger than that item (if the list is from smallest to largest this means you can ignore all the items to the right of the centre item).
  • If it is smaller then you can ignore all the items in the list which are smaller than that centre item.
  • Now repeat the algorithm on the remaining half of the list, checking the middle of the list and choosing one of the halves, until you find the item you are searching for.

Binary Search is a very powerful algorithm. If you had 1000 presents to Search through it would take you at most 10 checks for Binary search to find something and Linear search would take at most 1000 checks, but if you doubled the number of presents to search through how would this change the number of checks made by Binary Search and Linear search?

Hopefully you’ve noticed that the answer for each of these algorithms would be different.

It is important to remember that you can only perform a Binary Search if the items you are searching through are sorted into order. This makes the sorting algorithms we will look at next even more important because without sorting algorithms we wouldn’t be able to use Binary Search to quickly look through data!

Spoiler: How does doubling the number of boxes affect the number of checks required?