https://sites.fas.harvard.edu/~cscie119/lectures/coursepack.pdf

## Trees

Topic 5 - Abstract data structures- Student Booklet

Topic 5—Abstract data structures

By M Brooke

# 5.1 Abstract data structures (23 hours)

## Thinking recursively

• Recursion
• Base Case

### 5.1.1 Identify a situation that requires the use of recursive thinking.

#### Factorial (http://www.wolframalpha.com/input/?i=factorial)

Let us consider factorials as a basis for investigating recursion. Factorials are represented by an exclamation mark "!". These can be seen below.

3! = 3 * 2 * 1 = 6

4! = 4 * 3 * 2 * 1 = 24

##### Factorial - iterative solution

In order to solve this we could use iteration to do this. Here is an iterative approach to calculating 4!. In this solution we simply loop four times multiplying by one less each time.

loop J from 1 to 4
J = J + 1
end loop

##### Factorial - Recursive solution

In order to solve this with a recursive method there are two elements that must be in place for a recursive algorithm to work. There has to be a base case at which the recursion will stop and there must be a recursive call that moves towards the base case. In the case of a factorial. The two elements are as follows.

Step 1 - Identify the recursive call

In this case it can be noted that 4! = 4 * 3! and 3! = 3* 2! etc. The general case can be expressed as:

n! = n * (n –1)!

Step 2 - Identify the base case

In this case the base case at which we want the iteration to stop is when n = = 1. This can be expressed as follows:

FACTORIAL (N) {
if (N=1) then
output 1
endif
else
output N * FACTORIAL (N-1)
end else
}

#### Towers of Hanoi (http://www.superkids.com/aweb/tools/logic/towers/)

The towers of Hanoi is a mathematical puzzle. It consists of three rods and a number of different sized disks that can be slid on and off the poles. The idea of the game is to move the entire stack of disks from one rod to another following these simple rules:

1. Only one disk can be moved at a time

2. Each move involves moving one disk from one stack to the top of another stack (or empty pole).

3. You can only move disks on top of a smaller disk.

There are several example of this game online for you to play out the scenario. [1] The more disks you have the more turns it will take and in fact there is legend that surround the game in which some monks in a monastery play the game with 64 golden disks. They can only move one disk a day and when the puzzle is complete that time will end.[2]

In order to write the recursive method we will ne to find the base case and the recursive call. It is a little more complex than the factorial example above but let us summarise the code. [3]

##### Towers of Hanoi - Recursive solution

Step 1 - Identify the recursive call

1. To move all but one disk to the spare peg. This can be done recursively.

2. Then move the bottom disk to the destination peg.

3. Move all the other disks onto the destination peg. (This can be done recursively)

Step 2 - Identify the base case

The base case is when the base disk has nothing.

So in pseudocode this would look like this:

HANOI ( N , BEGINNINGPEG, SPAREPEG, ENDPEG) {
if N=1 then
move BEGINNINGPEG to ENDPEG
end if
else
HANOI (N-1, BEGINNINGPEG, ENDPEG, SPAREPEG)
HANOI (1, BEGINNINGPEG, SPAREPEG, ENDPEG)
HANOI (N-1, SPAREPEG, BEGINNINGPEG, ENDPEG)
end else
}

### Fibonnaci number

#### Fibonnaci number (http://en.wikipedia.org/wiki/Fibonacci_number)

In mathematics, the Fibonacci numbers or Fibonacci sequence are the numbers in the following integer sequence[4]:

,1,2,3,5,8,13,21,34,55,89

#### Fibonnaci - Iterative solution

FibonacciIterative (N)

if (n == 0)

Output 0

end if

if (n == 1)

output 1

else

PREVPREV = 0;

PREV = 1;

RESULT = 0;

LOOP I from 2 to <=N

RESULT = PREV + PREVPREV

PREVPREV = PREV

PREV = RESULT

I = I+1

End loop

output RESULT

end else

#### Fibonnaci - Recursive solution.

Step 1 - Identify the recursive call

fibonacci(n) = fibonacci(n - 1) + fibonacci(n -2)

Step 2 - Identify the base case

fibonacci(0) = 0

fibonacci(1) = 1

So in pseudocode this would look like this:

fib(N) {
if (n = 0) then
output 0
endif
else if(n = 1)
output 1
end elseif
else
output fib(n – 1) + fib(n – 2)
end else
}

### 5.1.2 Identify recursive thinking in a specified problem solution.

Towers of Hanoi (http://www.cs.cmu.edu/~cburch/survey/recurse/hanoi.html)

### 5.1.3 Trace a recursive algorithm to express a solution to a problem.

Exercise 40 in Blue Pelican.

#### Key Terms

• Abstract Data Structure
• Array
• 2D array
• Stack
• Queue
• FIFO
• LIFO

## Abstract data structures

### 5.1.4 Describe the characteristics of a two-dimensional array.

Consider the following array/matrix of numbers

31 42 33

21 35 79

22 86 69

75 30 15

In mathematics and in computer science the convention is to declare this array/matrix in terms of rows x columns. ie the above is a 4x3 array. This can be declared in JAVA as

int array [ ] [ ] = new int [4] [3]

### 5.1.5 Construct algorithms using two-dimensional arrays.

Many methods that operate on two dimensional arrays tend rely on nested “for” loops as you need to methodically go through all rows and columns. For example in order to print out each item above example array the following steps would need to be followed.

Note that two dimensional arrays are built from an “array of arrays” approach. In practice this means that each row is an object and in fact each row can be given a different length (though this is rarely used).

int array [ ] [ ] = new int [4] [3];

for (int r = 0, i<array.length, r++)

{

for (int c = 0, i<array[r].length, c++)

{

System.out.print(" " + array[r][c]);

{

System.out.println("");

}

### 5.1.6 Describe the characteristics and applications of a stack.

Think of a pile of plates or coins – you start off on the empty table top and add (push) the first coin onto the pile (stack). The next coin goes on top of the first and so on. When you remove (pop) a coin off the stack, the top one comes off first, then the next and so on until the stack is empty. In a computer program, data (rather than coins) are pushed onto and popped off stacks as needed. Each item of data is known as an element. Therefore a stack is a LIFO (Last In First Out) structure, because the last element pushed onto the stack is always the first element popped off. See http://www.ece.cmu.edu/~koopman/stack_computers/sec1_2.html#121 for more details of the overall concept of stacks.

#### Parameter storage and subprogram return addresses

When a program calls another program or function, the called function can also call another function. When the first program is called, the current program needs to be temporarily suspended and its return address and the values of the parameters it is using kept so they can be used when the called function returns.

To do this the return address and the contents of the parameters are placed in a stack in memory. As subsequent calls are made, the next return address and set of parameters are PUSHed onto the stack. When a function stops executing, the return address is POPped off the stack, and the program continues.

#### Interrupt handling

Interrupts work in the same way as above. When the CPU detects interrupt operations, the current state of the computer must be saved. The contents of variables and return addresses are placed on the stack. When the interrupt has been handled, the computer returns to where it was by POPping the data from the stack.

#### Evaluation of arithmetic expressions

To see why stacks are well suited to expression evaluation, consider how the following arithmetic expression would be computed:

X = (A + B) * (C + D)

First, A and B would be added together. Then, this intermediate result is pushed onto the expression evaluation stack. Next, C and D are added and the result is also pushed onto the expression evaluation stack. Finally, the top two stack elements (A+B and C+D) are multiplied and the result is stored in the variable X. The expression evaluation stack provides automatic management of intermediate results of expressions, and allows as many levels of precedence in the expression as there are available stack elements.

See 5.1.10

### 5.1.8 Describe the characteristics and applications of a queue.

The primary use of a queue is to ensure the service that is being queued for is given first to those that joined the queue first. This is known as a First In First Out structure (FIFO). Some examples of how queues are used in computer science are as follows:

#### Keyboard queue

The keyboard buffer into which characters are stored as they are pressed operates as a queue. The first letter typed is the first letter sent. Subsequent letters are added at the tail of the buffer.

#### Printer queues

Requests for printing are sent and the fist sent is the first printed. A printer queue stores the requests as they arrive.

#### Customer queue simulations

Check-outs and traffic lights are good examples of queues. The aim is to minimize the wait time on queues and to minimize the number of queues to keep the costs down. For more complex systems computer programs are written to simulate these queues and to predict the behavior of queues under certain conditions in order to maximize efficiency and cost.

See 5.1.10

### 5.1.10 Explain the use of arrays as static stacks and queues.

#### Arrays as stacks

Linear arrays provide a good way of implementing a stack. You already know that a linear array consists of a collection of elements, each one of which has an index number (starting at zero)

 0 1 2 3 4

To make it easier to visualise a stack we will represent the array vertically:

 4 3 2 1 0

To get the array to act as a stack we need to have some way of keeping track of which element contains the topmost item. This can be easily achieved using an int variable (“top” in the diagrams). An empty stack could be represented by -1 (zero would be a stack with one item). Attempting to pop an item off an empty stack would result in a stack underflow. Similarly, attempting to push an item onto a full stack results in stack overflow.

Empty stack top ==-1
 4 3 2 1 0
Threeitems pushed top ==2
 4 3 2 D 1 E 0 A
Full stack top ==4
 4 T 3 F 2 D 1 E 0 A
One itempopped top ==3
 4 T 3 F 2 D 1 E 0 A

Notice that when an item is popped off the stack, it is not actually deleted from the array element, it will simple be over-written by a new item being pushed onto the stack.

Thus to push an item onto the stack, the logical steps would be:

· Test for a full stack (don’t allow elements to be pushed onto a full stack)

If stack is not full..

· Get the value to be stored.

· Increment top

· store value in stack[top].

To pop an item off stack

· Test for an empty stack (don’t allow elements to be popped off an empty stack)

If stack is not empty..

· Retrieve value at stack[top]

· Decrement top

#### Arrays as linear queues

There are two ways of implementing a queue using an array – a linear queue and a circular queue.

Linear queues are slightly easier to visualize and code, but circular queues are more efficient as they avoid the need to move data long the array as items are dequeued from the head.

There are a number of possible strategies for implementing an array as a linear queue; the example shown is merely one such possibility.

The head of the queue is always element zero (which may be effectively empty if there are no items in the queue). The tail of the queue grows from zero as the queue is filled – thus a full queue is signaled by tail being the last element in the queue. In this example, that is element 4 – more generically the index of the last element in a Java array is always the size of the array -1. In a similar way to the stack top variable, we can store the position of the tail of the queue in an int variable. One way of flagging an empty queue would be to set the tail marker to -1 (zero would show that there was one item in the queue):

 tailMarker == -1 0 1 2 3 4

As items are enqueued, they are inserted into the array at tailMarker +1 and tailMarker is then updated to reflect the new position of the tail (or you could increment tailMarker first and then enqueue the data).

 tailMarker == 2 A D F 0 1 2 3 4

Once tailMarker reaches the index of the last element in the array, the queue is full. Any further enqueuing operation would result in an overflow error.

 tailMarker == 4 A D F G T 0 1 2 3 4

Items are always dequeued from array element zero (the head of the queue). Once an item has been dequeued, two things must happen. Firstly, all the other items in the queue must be shuffled “forward” by one element (i.e. moved into the element with the next lower element number). Secondly, tailMarker must be decremented to reflect the new position of the tail of the queue.

If decrementing tailMarker results in it having a value of -1, the queue is empty. Any further dequeueing operation will result in an underflow error.

#### Arrays as circular queues

Circular queues are more efficient as they avoid the need to shuffle the data along the array as items are dequeued from the head (the exception to this might be when promoting an item already in the queue to the head). The coding is a little fiddly, but not difficult once you get the logical steps clear – which after all is the secret of all programming…

To visualise an array as a circular queue it might be helpful to imagine an array bent round into a circle:

To operate an array as a circular queue (queue[] might be a good identifier) we will need to have two int variables as markers, in this example we will name them head and tail.

##### Operations on a circular queue

Below is a summary of the methods needed to operate a circular queue. It is assumed that a char queue of order size has been instantiated (i.e. char queue = new char[size];). The terms inputChar and outputChar will be used to refer to items queued and dequeued.

The methods are presented as pseudocode to clarify the logical steps needed.

###### New and empty queues:

Void method to initialise a queue as “empty” (applied to new queues or queues where the last item has just been dequeued). A good strategy is to flag an empty queue by setting head and tail to -1.

initialiseQueue: head gets -1 ; tail gets -1;

Boolean method that returns true if the queue is empty

isEmptyQueue: If head == -1 and tail == -1 then return true

else return false;

###### Enqueueing and dequeueing items:

A new queue should be initialized by setting head and tail to -1.

Testing that head and tail are equal to -1 is also a good way of verifying that an existing queue is empty.

The first item enqueued should go into element 0. This item becomes both the head and the tail of the queue.

 0 A headtail 1 2 3 4 5 6 7 8 9

We can then carry on enqueueing items until the queue is full.

If the queue is full, we can’t enqueue any more items.

Therefore we need a method to test for a full queue:

If head == 0 and tail == size-1 then return true;

 0 A head 1 B 2 C 3 D 4 E 5 F 6 G 7 H 8 I 9 J tail

As items are dequeued from the head, the head marker is incremented to reflect this.

The diagram shows the queue with three items dequeued (A, B , C).

We now have 3 elements available for use...

 0 A 1 B 2 C 3 D head 4 E 5 F 6 G 7 H 8 I 9 J tail

...so tail needs to “wrap around” to store the next item at queue[0]

(remember that this is a circular queue)

If tail == size-1 and queue is not full then tail gets 0;

queue[tail] gets inputChar;

 0 K tail 1 B 2 C 3 D head 4 E 5 F 6 G 7 H 8 I 9 J

Tail then increments as before, until we get to a full queue again.

When this happens, tail == head-1 so we need to add that condition to our test for a full queue.

 0 K 1 L 2 M tail 3 D head 4 E 5 F 6 G 7 H 8 I 9 J

One other particular condition that needs to be catered for is what happens to the queue when the last remaining item is dequeued.

If the queue contains one item only, both the head and the tail of the queue will be referencing the same element of the array (it could be any element in the array).

Once the last item has been dequeued, the queue is empty and should be initialized by setting the head and tail reference variables to -1

 0 K 1 B 2 C 3 D tail and head 4 E 5 F 6 G 7 H 8 I 9 J

###### Summary:

To operate a circular queue, in addition to methods to enqueue and dequeue items you will also need to have Boolean methods to test for a full queue, an empty queue and to detect the last item in the queue.

Linked lists will be examined at the level of diagrams and descriptions. Students are not expected to construct linked list algorithms using pseudocode.

### 5.1.11 Describe the features and characteristics of a dynamic data structure.

Dynamic data structures are able to increase and decrease in size as the need arises. This is done by allocating and deallocating memory space. The table below illustrates the features of a dynamic data structure comparing it with a static data structure such as an array.

 Dynamic data structure Static data structure Memory is allocated to the data structure dynamically as the program runs. Memory size is fixed and set at the time of compiling. Disadvantage. There may be overflow or underflow during runtime if allocations are exceeded. Advantage.Memory size is fixed and so there will be no problems during run time with memory allocations Advantage. Makes the most efficient use of memory as it is using only as much as it needs Disadvantage. As memory size is fixed there may be a lot of wasted memory space during the running of the program. Disadvantage. Harder to program as the program needs to keep track of data size and locations during the running of the program Advantage. Easier to program as there is no need to check on memory size or make use of algorithms that require links to new data.

### 5.1.12 Describe how linked lists operate logically.

A list is a data structure that allows data items to be stored. They can be of a fixed size such as arrays that are static or can be of a dynamic size. Linked lists are an example of a dynamic list data structure. In a linked list each item of data points to the next item of data. This is best illustrated diagrammatically. lol

A number of pointers are required in the singly linked list. The start or head pointer points to the first element in the list. The last node has a null pointer which is an empty pointer. Each other node has a pointer pointing to the next node in the list. In order to access the list we need to always start at the head. To move through the list we follow the pointers to the end until we encounter the null pointer. In a singly linked list we can only move one way through the list.

Typical operations on a list are as follows:

Delete/remove node

Identify next

Locate/search node

Test for empty

Imagine the example where we want to add the following of names into an alphabetically ordered list.

Rogério

William

Tomás

Miguel

António

The steps below highlight where the node needs to be inserted and gives a decription of the rearrangements of pointers that need to take place in the programming.

In order to find the correct position for each node the list is accessed at the head and the data compared to the node if necessary move to the next node and compare again. Repeat this until the correct position for insertion is found or until the end of the list is reached.

1. List is empty Rogério is inserted at the head. Head points to Rogério and Rogério points to null.

2. William is alphabetically after Rogério so is appended to the end of the list. .Rogério points to William and William points to null

3. Tomás is inserted between Rogério and William. Rogério points to Tomas and Tomas points to William .

4. António is inserted at the head of the list. Head points to António and António points to Rogério.

These steps can be represented by a flow diagram:

#### Deleting a node

The methods for deleting a node will vary depending on where that node is in a list.

Case 2: The node to be removed is an intermediate one

Move to node before node to be deleted.

Adjust Current node pointer to point to next.next node

Case 3: It is the last node to be removed

Move to node before the last one.

Adjust Current node to point to null

## Trees

Binary trees will be examined at the level of diagrams and descriptions. Students are not expected to construct tree algorithms using pseudocode. Tracing and constructing algorithms are not expected.

### 5.1.14 Describe how trees operate logically (both binary and non-binary).

Tree structures are very common in computing, the most familiar perhaps being the organisation of files and folders (or directories if you prefer) on a storage device.

The diagram shows a simplified folder structure – each folder contains one sub-folder and three files, except the lowest level folder, which only contains files. If we wanted to carry out an operation on all the files in all the folders (e.g. setting them all to read-only), the most efficient approach would be to use a recursive technique.

The following is a Java-ish pseudocode of the recursive algorithm that might be used. It assumes the existence of a Boolean method hasSubFolder and a method called setFiles which will set the attributes of all the files in a given folder to ro (read only) mode:

Root folder

public void setAttrib (Folder thisFolder)

{

if(hasSubFolder(thisFolder))

{

setAttrib(thisFolder.subFolder);

}

setFiles(thisFolder,ro);

}

• The method would initially be called with the root (top level) folder passed as a parameter. Since hasSubFolder would evaluate to true, the recursive call on line 5 to setAttrib will execute, passing the subfolder as a parameter. The first call will be pushed onto a stack.
• Since the second level folder also has a sub-folder, line 5 will execute again and make another call to setAttrib, passing the third level folder as a parameter. The second call will be pushed onto a stack.
• The third level folder has no subfolder, so this call will jump to line 7, set the attributes of all files in the third level folder to read-only and then complete.
• The second call will be popped of the stack and the algorithm continued from where it was interrupted by the recursive call (at the end of line 5). Line 7 will execute and this call will complete.
• Finally, the original call will be popped off the stack and resumed, executing line 7 and completing.
• So we now have all the files in all the folders processed, all calls completed and an empty stack.

In practice, folder tree structures are inevitably more complex than the very formalised example given here. A folder may have one or more subfolders or none; folders may have files or be empty. However, the recursive principles remain the same.

For the purposes of the Computer Science course, you need to be able to trace and construct recursive algorithms related to one particular tree structure, the Binary Tree.

Binary trees are a particular type of tree structure where every node in the tree has exactly two branches (hence binary). A representation of a single binary tree node is shown below. However large they are, binary trees are constructed simply by repeating this node:

Just like a doubly linked list node, a binary tree node consists of some data (or a reference to an object that contains data – more on that later) and two references to other nodes. These references are to the “left sub-tree” and “right sub-tree” respectively:

The best way to understand the anatomy of a binary tree is to build one, so the following section will briefly run through the logical process of constructing a binary tree of nodes that simply contain an integer as their data member.

Like all dynamic date structures, the first thing we need is an externally accessible reference. In trees, this is a reference to the tree’s root. Intially, treeRoot will reference NULL (no tree).

Adding the first node to the tree (known as the root node):

Binary trees use a simple rule – “less than” goes to the left, “more than” goes to the right. So if we add another node with the data value of 2, it will become the left sub-tree of 4.

Now add a node with the value of 3. This is less than 4, so we traverse left, but more than 2, so we traverse right...

Repeating that simple rule, < goes left, > goes right, if we then add 6, 1, 5, 7 in that order, we end up with a complete tree:

### 5.1.15 Define the terms: parent, left-child, right-child, subtree, root and leaf.

The vocabulary of binary trees borrows terminology from both natural trees (those big things with leaves and branches etc.) and family trees (the organisation of parents, children etc.). In the sample tree below, the node containing the value 4 is the Root of the whole tree.

If we take the node highlighted in red (containing the value 2), we can explore the rest of the vocabulary associated with trees.

• Node 2 is the child node of 4, but the parent node of 1 and 3.
• 2 is in 4’s left subtree.
• It is the root node of two subtrees, a left subtree containing 1 and a right subtree containing 3.
• It is a sibling node to 6 (sibling nodes are at the same level in the tree).
• 2’s two child nodes (1 and 3) are both leaf nodes as they have no subtrees.

Removing nodes from a tree can be a little more complex and is covered well here:

http://www.algolist.net/Data_structures/Binary_search_tree/Removal

### 5.1.16 State the result of inorder, postorder and preorder tree traversal.

Consider the following expression tree:

The expression can be seen as follows:

In order traversal would result in infix. Rule: A node is visited in-between its left subtree and right subtree (Left visited first).

a/b+c*d

Preorder traversal would result in prefix. Rule: A node is visited before its descendants.

+/ab*cd

Postorder traversal would result in postfix. Rule: A node is visited after its descendants.

ab/cd*+

### 5.1.17 Sketch binary trees. structure.

#### Binary tree as a decision tree

Here is a binary decision tree for fault diagnosis for a washing machine.

The non-leaf nodes are the questions and the leaf nodes are the advice - even if this is "no further advice is available".

Here is a binary decision tree for interpretation of an aerial photograph.

#### Tree as a directory structure

Trees can also be used to store folders and subfolders. This is common to all operating systems. Here is an example of the UNIX directory structure

All the files are grouped together in the directory structure. The file-system is arranged in a hierarchical structure, like an inverted tree. The top of the hierarchy is traditionally called root (written as a slash / )

## Applications

### 5.1.18 Define the term dynamic data

Dynamic data is data that changes when further updates to the data become available.

### 5.1.19 Compare the use of static and dynamic data structures.

 Dynamic data structure Static data structure Memory is allocated to the data structure dynamically as the program runs. Memory size is fixed and set at the time of compiling. Disadvantage. There may be overflow or underflow during runtime if allocations are exceeded. Advantage.Memory size is fixed and so there will be no problems during run time with memory allocations Advantage. Makes the most efficient use of memory as it is using only as much as it needs Disadvantage. As memory size is fixed there may be a lot of wasted memory space during the running of the program. Disadvantage. Harder to program as the program needs to keep track of data size and locations during the running of the program Advantage. Easier to program as there is no need to check on memory size or make use of algorithms that require links to new data.

### 5.1.20 Suggest a suitable structure for a given situation.

There are many situations where the size of the structure is not known beforehand. These cases lend themselves to dynamic data structures. An example of this is the programming of a printer spooler where the number of print jobs will vary and is not known beforehand.

Other situations may occur where the size of the data structure can be determined beforehand in this case a static data structure would be used.

## Now test yourself

https://quizlet.com/122628585/test/embed

Michael Brooke emjbe.net

Report Abuse

–Updated automatically every 5 minutes