
A2 — Treemaps: CSC148H1S 20231 (All Sections): Introduction to Computer Science

A2 — Treemaps

Due date: Tuesday, April 4th, at 1:00 pm sharp.

You may complete this assignment individually or with a partner, who can be from any section of the course.

FAQ

Please make sure to check the A2 FAQ post on Piazza (https://piazza.com/class/lci0mq5wy287iw/post/1192). Clarifications and answers to common questions will be posted there.

Grade Breakdown

Category        Weight
python TA       10
self tests      20
hidden tests    70

Learning goals

After completing this assignment, you will be able to:

model different kinds of real-world hierarchical data using trees

implement recursive operations on trees (both non-mutating and mutating)

implement an algorithm to generate a geometric tree visualisation called a treemap

use the os library to write a program that interacts with your computer’s file system

use inheritance to design classes according to a common interface

Introduction

As we’ve discussed in class, trees are a fundamental data structure used to model hierarchical data. Often, a tree represents a hierarchical categorization of a base set of data, where the leaves represent the data values themselves, and internal nodes represent groupings of this data. Files on a computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source code) are grouped into directories, which are then grouped into larger directories, etc.

Sometimes the data in a tree has some useful notion of size. For example, in a tree representing the departments of a company, the size of a node could be the dollar amount of all sales for that department, or the number of employees in that department. Or in a tree representing a computer’s file system, the size of a node could be the size of the file.

A treemap (http://en.wikipedia.org/wiki/Treemapping) is a visualisation technique to show a tree’s structure according to the weights (or sizes) of its data values. It uses rectangles to show subtrees, scaled to reflect the proportional sizes of each piece of data.

Treemaps are used in many contexts. Among the more popular uses are news headlines, various kinds of financial information, and computer disk usage. Some free programs use treemaps to visualize the size of files on your computer; for example:

WinDirStat (http://portableapps.com/apps/utilities/windirstat_portable) for Windows

Disk Inventory X (http://www.derlien.com/) for MacOS

KDirStat (http://kdirstat.sourceforge.net/) for Linux

For this assignment, you will write an interactive treemap visualisation tool that you can use to visualize hierarchical data.

It will have a general API (implemented with inheritance, naturally!) and you will define specific subclasses that will allow you to visualize different kinds of data.

And with that, let’s get started on your final CSC148 assignment!

Task 0: Getting started

1. Download and unzip the starter code (a2_starter_files.zip, https://q.utoronto.ca/courses/292974/files/25399282/download?download_frd=1) into your directory for Assignment 2.

2. With  tm_trees.py open, take the time to read through the tasks below to get a sense of what code you’ll be writing throughout this assignment.

3. Familiarize yourself with the following coding guidelines:

You may NOT define any new public or private attributes.

You are always free to define private helpers (functions or methods).

Complete docstrings are not required for any helpers that you define or for methods that you override or extend, but you are encouraged to take the time to write them.

Task 1: TMTree  basics

In this task, you will implement:

TMTree.__init__

TMTree.is_displayed_tree_leaf

TMTree.get_path_string

In order to use a treemap to visualize data, we first need to define our tree structure and be able to    initialize it. For this program, you will develop a  TMTree class to define the basic functionality required by the provided treemap visualizer. To visualize any specific hierarchical dataset, we simply need to   define a subclass of  TMTree , which we will do later.

The  TMTree class is similar to the basic  Tree class that you have already worked with, but with some differences. A few of the most conceptually important ones are discussed here.

First, the class has a data_size instance attribute that is used by the treemap algorithm to determine the size of the rectangle that represents this tree. data_size is defined recursively as follows:

If the tree is a single leaf, its data_size is some measure of the size of the data being modelled. For example, it could be the size of a file.

If the tree has one or more subtrees, its data_size is equal to the sum of the data_size values of its subtrees, plus some additional size of its own.

Second, the class not only stores its subtrees, but it also stores its parent as an attribute. This allows us to traverse both up and down the tree, which you will see is necessary when we want to mutate the tree.

Third, we’ll also track some information inside our tree related to how to display it. More on this shortly.

Fourth, we don’t bother with a representation for an empty  TMTree , as we’ll always be visualizing non-empty trees in this assignment.
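The recursive definition of data_size and the parent link can be illustrated with a small sketch. This is not the starter code: the class name SketchNode and the names own_size, subtrees and parent are hypothetical, chosen only for this illustration, and the real TMTree initializer should follow its own docstring.

```python
from __future__ import annotations

from typing import Optional


class SketchNode:
    """A hypothetical stand-in for TMTree, used only to illustrate data_size."""

    def __init__(self, name: str, subtrees: list[SketchNode],
                 own_size: int = 0) -> None:
        self.name = name
        self.subtrees = subtrees
        # Each node also remembers its parent, so we can walk upwards later.
        self.parent: Optional[SketchNode] = None
        for subtree in subtrees:
            subtree.parent = self
        if not subtrees:
            # Leaf: the size is some measure of the data itself.
            self.data_size = own_size
        else:
            # Internal node: sum of the subtrees' sizes plus its own extra size.
            self.data_size = sum(s.data_size for s in subtrees) + own_size


leaf_a = SketchNode("a", [], 10)
leaf_b = SketchNode("b", [], 25)
root = SketchNode("root", [leaf_a, leaf_b], 1)
print(root.data_size)  # 36
```

Because the subtrees are built before their parent, each internal node can compute its size from values that already exist.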

To get started, you should do the following:

1. Complete the initializer of  TMTree according to its docstring. This will get you familiar with the attributes in this class.

2. Implement the  is_displayed_tree_leaf method according to its docstring and the definition of the displayed-tree (see below).

3. Implement the  get_path_string method according to its docstring.

In the subsequent parts of this assignment, your program will allow the user to interact with the      visualizer to change the way the data is displayed to them. At any one time, parts of the tree will be displayed and others not, depending on what nodes are expanded.

We’ll use the terminology displayed-tree to refer to the part of the data tree that is displayed in the visualizer. It is defined as follows:

The root of the entire tree is in the displayed-tree.

For each expanded tree in the displayed-tree, its children are in the displayed-tree.

The trees whose rectangles are displayed in the visualization will be the leaves of the displayed-tree. A tree is a leaf of the displayed-tree if it is not expanded, but its parent is expanded. Note that the root node is a leaf of the displayed-tree if it is not expanded.

We will require that (1) if a tree is expanded, then its parent is expanded, and (2) if a tree is not expanded, then its children are not expanded. Note that this requirement is naturally encoded as a representation invariant in the  TMTree class.

Note: the displayed-tree is not a separate tree! It is just the part of the data tree that is displayed.
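One way to read the definition of a displayed-tree leaf is sketched below. The class DisplayNode and the attribute names expanded and parent are assumptions made for this illustration; the actual attribute names are given in the TMTree docstring. Unlike the real program, this sketch starts every node collapsed so that the effect of expanding is visible.

```python
class DisplayNode:
    """Hypothetical stand-in for TMTree; `expanded` is an assumed name."""

    def __init__(self, name, subtrees=None):
        self.name = name
        self.subtrees = subtrees or []
        self.parent = None
        self.expanded = False  # this sketch starts every node collapsed
        for subtree in self.subtrees:
            subtree.parent = self


def is_displayed_leaf(node):
    """A node is a displayed-tree leaf if it is not expanded and either
    it is the root or its parent is expanded."""
    if node.expanded:
        return False
    return node.parent is None or node.parent.expanded


c, d = DisplayNode("c"), DisplayNode("d")
a, b = DisplayNode("a", [c, d]), DisplayNode("b")
root = DisplayNode("root", [a, b])

print(is_displayed_leaf(root))  # True
root.expanded = True
print([n.name for n in (root, a, b, c) if is_displayed_leaf(n)])  # ['a', 'b']
```

Note that c is not a displayed-tree leaf in the second call: it is not expanded, but neither is its parent a.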

Progress check!

After this step is done, you should be able to instantiate and print a  TMTree , as is done in the provided main block of  tm_trees.py .

In the next task, you will implement the code required for the treemap algorithm.

Task 2: The Treemap algorithm

In this task, you will implement:

TMTree.update_rectangles

TMTree.get_rectangles

The next component of our program is the treemap algorithm itself. It operates on the data tree and a 2-D window to fill with the visualization, and generates a list of rectangles to display, based on the     tree structure and  data_size attribute for each node.

For all rectangles in this program, we’ll use the pygame representation of a rectangle, which is a    tuple of four integers  (x, y, width, height) , where  (x, y) represents the upper-left corner of the    rectangle. Note that in pygame, the origin is in the upper-left corner and the y-axis points down. So,   the lower-right corner of a rectangle has coordinates (x + width, y + height). Each unit on both axes is a pixel on the screen. Below, we show a grid representing the rectangle  (0, 0, 8, 4) :
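In code, the rectangle convention can be checked with a tiny helper. The function name corners is hypothetical and is not part of the starter code; it only restates the arithmetic described above.

```python
from __future__ import annotations


def corners(rect: tuple[int, int, int, int]) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the upper-left and lower-right corners of a pygame-style rectangle."""
    x, y, width, height = rect
    # The y-axis points down, so the lower-right corner has the larger y value.
    return (x, y), (x + width, y + height)


print(corners((0, 0, 8, 4)))  # ((0, 0), (8, 4))
```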

We are now ready to see the treemap algorithm.

Note: We’ll use “size” to refer to the data_size attribute in the algorithm below.

Input: An instance of  TMTree , which is part of the displayed-tree, and a pygame rectangle (the display area to fill).

Output: A list of pygame rectangles and each rectangle’s colour, where each rectangle corresponds to a leaf in the displayed-tree for this data tree, and has area proportional to the leaf’s data_size.

Treemap Algorithm:

Note I: Unless explicitly written as “displayed-tree”, all occurrences of the word “tree” below refer to a data tree.

Note II: (Very Important) The overall algorithm, as described, actually corresponds to the combination of first calling update_rectangles to set the rectangle attribute of all nodes in the tree and then later calling get_rectangles to actually obtain the rectangles and colours corresponding to only the displayed-tree’s leaves.

1. If the tree is a leaf in the displayed-tree, return a list containing just a single rectangle that covers the whole display area, as well as the  colour of that leaf.

2. Otherwise, if the display area’s width is greater than its height: divide the display area into vertical rectangles (by this, we mean the x-coordinates vary but the y-coordinates are fixed), one smaller rectangle per subtree of the displayed-tree, in proportion to the sizes of the subtrees (don’t use    this tree’s  data_size attribute, as it may actually be larger than the sizes of the subtrees because of how we defined it!)

Example: suppose the input rectangle is (0, 0, 200, 100), and the displayed-tree for the input tree has three subtrees, with sizes 10, 25, and 15.

The first subtree has 20% of the total size, so its smaller rectangle has width 40 (20% of 200): (0, 0, 40, 100).

The second subtree should have width 100 (50% of 200), and starts immediately after the first one: (40, 0, 100, 100).

The third subtree has width 60, and takes up the remaining space: (140, 0, 60, 100).

Note that the three rectangles’ edges overlap, which is expected.

3. If the height is greater than or equal to the width, then make horizontal rectangles (by this, we     mean the y-coordinates vary, but the x-coordinates are fixed) instead of vertical ones, and do the analogous operations as in step 2.

4. Recurse on each of the subtrees in the displayed-tree, passing in the corresponding smaller rectangle from step 2 or 3.

Important: To ensure that you understand the treemap algorithm, complete the A2 release activity (https://q.utoronto.ca/courses/292974/assignments/963601), double-check your answers (https://q.utoronto.ca/courses/292974/files/25397463?wrap=1), and ask clarifying questions as needed.

Note about rounding: because you’re calculating proportions here, the exact values will often be      floats instead of integers. For all but the last rectangle, always truncate the value (round down to the nearest integer). In other words, if you calculate that a rectangle should be (0, 0, 10.9, 100), “round” (or truncate) the width down to (0, 0, 10, 100). Then a rectangle below it would start at (0, 100), and a rectangle beside it would start at (10, 0).

However, the final (rightmost or bottommost) edge of the last smaller rectangle must always be equal to the outer edge of the input rectangle. This means that the last subtree might end up a bit bigger     than its true proportion of the total size, but that’s okay for this assignment.
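The splitting and rounding rules in steps 2 and 3 can be sketched for a single level of the tree as follows. This is a minimal illustration, not the required update_rectangles method: the function name split_rect is hypothetical, it takes a plain list of sizes instead of subtrees, and it assumes the sizes are positive integers. It uses integer floor division, which gives the same result as truncating a non-negative quotient while avoiding floating-point error.

```python
def split_rect(rect, sizes):
    """Split rect among sizes, truncating all but the last piece."""
    x, y, width, height = rect
    total = sum(sizes)
    pieces = []
    if width > height:
        # Vertical rectangles: x varies, y and height stay fixed.
        offset = x
        for i, size in enumerate(sizes):
            if i == len(sizes) - 1:
                w = x + width - offset      # last piece takes what is left
            else:
                w = width * size // total   # truncate (round down)
            pieces.append((offset, y, w, height))
            offset += w
    else:
        # Horizontal rectangles: y varies, x and width stay fixed.
        offset = y
        for i, size in enumerate(sizes):
            if i == len(sizes) - 1:
                h = y + height - offset
            else:
                h = height * size // total
            pieces.append((x, offset, width, h))
            offset += h
    return pieces


print(split_rect((0, 0, 200, 100), [10, 25, 15]))
# [(0, 0, 40, 100), (40, 0, 100, 100), (140, 0, 60, 100)]
```

With three equal sizes and a width of 10, the first two pieces get width 3 and the last one absorbs the remainder with width 4, so the outer edge always lines up with the input rectangle.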

Important followup to Note II above: You will implement this algorithm in the update_rectangles method of TMTree. Note that rather than returning the rectangles for only the leaves of the displayed-tree, the code will instead set the rectangle attribute of each node in the entire tree to correspond to what its rectangle would be if it were a leaf in the displayed-tree. Later, we can retrieve the rectangles for only the leaf nodes of the displayed-tree by using the get_rectangles method.

Note: Sometimes a  TMTree will be sufficiently small that its rectangle has very little area (it has a      width or height very close to zero). That is to be expected, and you don’t need to do anything special to deal with such cases. Just be aware that sometimes you won’t be able to actually see all the         displayed rectangles when running the visualizer, especially if the window is small.

Tips: For the recursive step, a good stepping stone is to think about when the tree has height 2, and   each leaf has the same  data_size value. In this case, the original rectangle should be partitioned into equal-sized rectangles.

Warning: make sure to follow the exact rounding procedure in the algorithm description above. If you deviate from this, you might get slightly incorrect rectangle bounds.

Important: The docstring of the get_rectangles method refers to the displayed-tree. For now, the whole tree is the displayed-tree, as all trees start expanded. Make sure that you come back later and further test this method on trees where the displayed-tree is a subset of the full data tree to ensure that your code is fully correct.

Progress check!

Note that the basic treemap algorithm does not require pygame or the visualizer at all! You can check your work simply by instantiating a TMTree object, calling update_rectangles on it, then calling get_rectangles to see the list of rectangles returned.

This is primarily how we’ll be testing your implementation of the treemap algorithm!

At this point, you can now run  treemap_visualiser.py and see your treemap. By default, it will display the example from the worksheet, which should look something like below (with random colours of      course). Note: the bar across the bottom is where the path string of the selected rectangle will appear once you complete the next task.

Next, you will implement the method necessary to enable the selection of rectangles in the visualizer.

Task 3: Selecting a tree

In this task, you will implement:

TMTree.get_tree_at_position

The first step to making our treemap interactive is to allow the user to select a node in the tree. To do this, complete the body of the  get_tree_at_position method. This method takes a position on the      display and returns the displayed-tree leaf corresponding to that position in the treemap.

IMPORTANT: Ties should be broken by choosing the rectangle on the left for a vertical boundary, or the rectangle above for a horizontal boundary. That is, you should return the first leaf of the             displayed-tree that contains the position, when traversing the tree in the natural order.
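The tie-breaking rule can be illustrated on a flat list of rectangles. This sketch assumes that a rectangle contains every point on its edges (consistent with the overlapping edges noted in the treemap example) and that the list is already in natural traversal order; the helper names contains and first_hit are hypothetical and do not appear in the starter code.

```python
def contains(rect, pos):
    """Return True if pos lies inside rect or on any of its edges."""
    x, y, width, height = rect
    px, py = pos
    return x <= px <= x + width and y <= py <= y + height


def first_hit(rects, pos):
    """Return the index of the first rectangle containing pos, or None."""
    for index, rect in enumerate(rects):
        if contains(rect, pos):
            return index
    return None


side_by_side = [(0, 0, 40, 100), (40, 0, 100, 100), (140, 0, 60, 100)]
print(first_hit(side_by_side, (40, 50)))  # 0
print(first_hit(side_by_side, (41, 50)))  # 1
```

The point (40, 50) lies on the shared vertical boundary, so the first match is the rectangle on the left. The same first-match rule picks the rectangle above for a shared horizontal boundary.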

In the visualizer, the user can click on a rectangle to select it. The text display updates to show two things:

the selected leaf’s path string returned by the get_path_string method.

the selected leaf’s data_size

Clicking on the same rectangle again unselects it. Clicking on a different rectangle selects that one instead.

Reminder: Each rectangle corresponds to a leaf of the displayed-tree, so we can only select leaves of the displayed-tree.

Progress check!

Try running the program and then click on a rectangle on the display. You should see the information for that displayed-tree leaf shown at the bottom of the screen, and a box appear around the selected rectangle. Click again to unselect that rectangle, and try selecting another one. Note that you will also see a thinner box appear around any rectangle that you hover over in the display.

For the worksheet example, it should look something like the below, if you have selected rectangle “d” and are not hovering over any rectangle:

Note: As you run the visualizer, you will see output in the Python console in PyCharm. This output is to help you see what events are being executed. You may find this information useful when debugging your code. Feel free to modify anything that is being printed in treemap_visualiser.py if you find it helpful to do so.

Task 4: Making the display interactive

In this task, you will implement:

TMTree.expand

TMTree.expand_all

TMTree.collapse

TMTree.collapse_all

TMTree.change_size

Note: Unless explicitly written as “displayed-tree”, all occurrences of the word “tree” below refer to a data tree.

Now that we can select nodes, we can move on to making the treemap more interactive.

In addition to displaying the treemap, the pygame graphical display responds to a few different events (by event, we mean the user presses a key, hovers, or clicks on the window). The first such event was actually clicking on a rectangle to select it, which we implemented in the previous task. We will now implement the rest of the actions associated with events, by implementing methods in the      TMTree class which the visualizer calls:

a. The user can close the window and quit the program by clicking the X icon (like any other window).

No additional code required.

b. The user can resize the window by clicking and dragging a corner or side of the window (like any other resizable window).

No additional code required.

The next four events change which rectangles are included in the displayed-tree.

c. If the user selects a rectangle, and then presses  e , the tree corresponding to that rectangle is expanded in the displayed-tree. If the tree is a leaf in the data tree, nothing happens, since we  can’t expand a leaf. When a tree is expanded, the visualization will make the first subtree of the newly expanded tree the new selected tree.

implement  TMTree.expand

d. If the user selects a rectangle, and then presses  c , the parent of that tree is unexpanded (or       “collapsed”) in the displayed-tree. (Note that since rectangles correspond to leaves in the              displayed-tree, it is the parent that needs to be unexpanded.) If the parent is None because this is the root of the whole tree, nothing happens. When a tree is collapsed, the visualization will make  its parent tree the new selected tree, as it is now a leaf of the displayed-tree.

implement  TMTree.collapse

e. If the user selects a rectangle, and then presses  a , the tree corresponding to that rectangle, as  well as all of its subtrees, are expanded in the displayed-tree. If the tree is a leaf, nothing              happens. When a tree is expanded in this way, the visualization will make the “last” subtree of the newly expanded trees the new selected tree. Note: The docstring for the relevant method clarifies what we mean by “last” .

implement  TMTree.expand_all

f. If the user selects any rectangle, and then presses  x , the entire displayed-tree is collapsed down to just a single tree node. If the displayed-tree is already a single node, nothing happens. When a tree is collapsed in this way, the visualization will make this single tree node the new selected       tree, as it is now the only leaf of the displayed-tree.

implement  TMTree.collapse_all
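The expand and collapse events have to preserve the two requirements stated in Task 1: an expanded tree has an expanded parent, and an unexpanded tree has no expanded children. The sketch below shows one way to keep both true. The class ToggleNode, the attribute expanded and the function names are hypothetical, and the sketch assumes that expand is only ever called on a displayed-tree leaf, as in the visualizer.

```python
class ToggleNode:
    """Hypothetical stand-in for TMTree with an assumed `expanded` flag."""

    def __init__(self, name, subtrees=None):
        self.name = name
        self.subtrees = subtrees or []
        self.parent = None
        self.expanded = False
        for subtree in self.subtrees:
            subtree.parent = self


def expand_sketch(node):
    """Expand node, unless it is a leaf of the data tree."""
    if node.subtrees:
        node.expanded = True


def _collapse_below(node):
    """Mark node and everything beneath it as not expanded."""
    node.expanded = False
    for subtree in node.subtrees:
        _collapse_below(subtree)


def collapse_sketch(node):
    """Collapse the parent of node; do nothing if node is the root."""
    if node.parent is not None:
        _collapse_below(node.parent)


c, d = ToggleNode("c"), ToggleNode("d")
a, b = ToggleNode("a", [c, d]), ToggleNode("b")
root = ToggleNode("root", [a, b])

expand_sketch(root)
expand_sketch(a)
expand_sketch(c)      # c is a data-tree leaf, so nothing happens
collapse_sketch(b)    # collapses root, and with it the expanded subtree a
print(root.expanded, a.expanded, c.expanded)  # False False False
```

Collapsing recursively is what keeps the second requirement true: without it, a would stay expanded underneath a collapsed root.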

The following two events allow the user to actually mutate the data tree, and see the changes reflected in the display.

g. If the user presses the Up Arrow or Down Arrow key when a rectangle is selected, the selected node’s  data_size increases or decreases by 1% of its current value, respectively. Note: This   affects the  data_size of its ancestors as well, given our definition of  data_size !

implement  TMTree.change_size

Details:

A node’s data_size cannot decrease below 1. There is no upper limit on the value of data_size. Just keep in mind that if a node is too large, then other nodes will appear relatively small in the visualization!

If a node has children, its data_size cannot decrease below the sum of its child data sizes.

The 1% amount is always “rounded up” before applying the change. For example, if a leaf’s data_size value is 140, then 1% of this is 1.4, which is “rounded up” to 2. So its value could increase up to 142, or decrease down to 138. (Note: the docstring for the relevant method provides similar clarification.)

h. If the user selects a rectangle, then hovers the cursor over another rectangle and presses  m , the selected node should be moved to be the last subtree of the node being hovered over.                 Remember: we can only select and hover over leaves of the displayed-tree.

implement  TMTree.move

Very Important: Whenever you modify a tree’s  data_size , the  data_size attributes of all its ancestors need to be updated too! You must make use of the  parent_tree attribute to do this: it’s a widely-used technique for traversing up a tree (this should feel a lot like traversing a linked list!).
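The 1% rule and the walk up through the ancestors can be sketched together. This is an illustration under stated assumptions, not the required change_size method: the class SizeNode and the function names are hypothetical, the real method's parameters are defined by its docstring, and the sketch assumes that a decrease which would cross a lower bound is clamped to that bound. The 1% step is computed with integer arithmetic so that, for example, 700 gives exactly 7 rather than a value affected by floating-point error.

```python
class SizeNode:
    """Hypothetical stand-in for TMTree with data_size and parent_tree."""

    def __init__(self, name, subtrees=None, own_size=0):
        self.name = name
        self.subtrees = subtrees or []
        self.parent_tree = None
        for subtree in self.subtrees:
            subtree.parent_tree = self
        self.data_size = own_size + sum(s.data_size for s in self.subtrees)


def one_percent_rounded_up(size):
    """Return 1% of size, rounded up to the next integer."""
    return (size + 99) // 100


def change_size_sketch(node, increase):
    """Change node's data_size by 1% and update every ancestor."""
    step = one_percent_rounded_up(node.data_size)
    # Never go below 1, nor below the sum of the children's sizes.
    lower_bound = max(1, sum(s.data_size for s in node.subtrees))
    target = node.data_size + step if increase else node.data_size - step
    delta = max(lower_bound, target) - node.data_size
    # Walk up the parent links, much like traversing a linked list.
    current = node
    while current is not None:
        current.data_size += delta
        current = current.parent_tree


leaf = SizeNode("leaf", own_size=140)
other = SizeNode("other", own_size=60)
root = SizeNode("root", [leaf, other])
change_size_sketch(leaf, increase=True)
print(leaf.data_size, root.data_size)  # 142 202
```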

Progress check!

At this point, your visualizer should be fully functional!

One of the nice things about code with an interactive display is that it’s usually pretty straightforward to test basic correctness.

Try running the treemap visualizer, and see if you can resize, move rectangles, and manipulate the displayed-tree as outlined above.

Of course, you can also write pytests to check the correctness of the methods you have written too!

Next, we’ll write code to allow us to visualize the files on your computer!

Task 5: File System Data

In this task, you will implement:

path_to_nested_tuple

dir_tree_from_nested_tuple

You will also:

override any methods necessary for your subclasses of TMTree to behave as specified below.

decide on an appropriate inheritance hierarchy for the FileTree and DirectoryTree classes.

Consider a directory on your computer (e.g., the  assignments directory you have in your  csc148      directory). It contains a mixture of other directories (“subdirectories”) and some files; these                  subdirectories themselves contain other directories, and even more files! This is a natural hierarchical structure, and can be represented by a tree.

The  _name attribute will always store the name of the directory or file (e.g.  preps or  prep8.py ), not its path; e.g.,  /Users/courses/csc148/preps/prep8/prep8.py .

The  data_size attribute of a file is simply how much space (in bytes) the file takes up on the            computer. Note: to prevent empty files from being represented by a rectangle with zero area, we will always add a +1 to file sizes. The  data_size of a directory corresponds to the size of all files            contained in the directory, including its subdirectories. As with files, we’ll also add a +1 to directory   sizes for this assignment.
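The size rule above can be checked directly against a real directory with a short recursive function built on os.path.isdir, os.listdir, os.path.join and os.path.getsize. This is only a sketch of the arithmetic, not the required path_to_nested_tuple function: the name sketch_data_size is hypothetical, and it reads the rule as "a directory's size is the sum of its children's data_size values (each already including its own +1), plus 1 for the directory itself".

```python
import os
import tempfile


def sketch_data_size(path):
    """Return the data_size described above for the file or directory at path."""
    if os.path.isdir(path):
        total = 1  # the directory's own +1
        for name in os.listdir(path):
            total += sketch_data_size(os.path.join(path, name))
        return total
    return os.path.getsize(path) + 1  # file size in bytes, plus 1


# Build a small throwaway directory to try the function on.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.txt"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(root, "empty.txt"), "wb") as f:
        pass
    with open(os.path.join(root, "sub", "b.txt"), "wb") as f:
        f.write(b"hello")
    demo_total = sketch_data_size(root)

print(demo_total)  # 13
```

Here a.txt contributes 4, empty.txt contributes 1, and sub contributes 7 (6 for b.txt plus 1 for the directory), so the root directory totals 4 + 1 + 7 + 1 = 13.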

In our code, we will represent this filesystem data using the  FileTree and  DirectoryTree classes.

1. Complete the  path_to_nested_tuple function according to its docstring.

2. Complete the  dir_tree_from_nested_tuple function. This function is the public interface for how we will create  DirectoryTree objects to test your code for correctness.

3. Override or extend methods from  TMTree , as appropriate, in the  FileTree and  DirectoryTree   classes. You will need to decide if you can just use the inherited methods as they are or not (see additional requirements below).

To complete step 1, you will need to read the Python os module documentation (https://docs.python.org/3/library/os.html) to learn how to get data about your computer’s files in a Python program. You may also find the provided print_dirs.py to be a useful example to base your implementation on.

We recommend using the following functions:

os.path.isdir

os.path.join

os.path.getsize

os.path.basename

os.listdir (see note below)