
Data Visualizer

TYS2 edited this page Apr 7, 2026 · 22 revisions

The Data Visualizer renders data structures in a convenient, understandable manner as box-and-pointer diagrams.

draw_data is the Source function that invokes the Data Visualizer.

How to use

draw_data is a varargs function: multiple data structures can be passed as arguments in a single draw_data call.

For example, the following call of draw_data generates a single drawing that contains both structures.

draw_data(list(1, 2, 3), list(4, 5, 6));

Each individual call of draw_data maps to its own drawing, and the drawings can be stepped through using the "Previous" and "Next" buttons.

For example, the following calls of draw_data generate two separate drawings, one per call.

draw_data(list(1, 2, 3));
draw_data(list(4, 5, 6));

Code structure

The data visualizer code resides in src/features/dataVisualizer.

src/features/dataVisualizer/
- drawable/ # low-level Konva/React components for visual primitives
  - Drawable.ts # re-exports all drawable components
  - ArrayDrawable.tsx # draws the box structure for a pair/array node, and renders inline text for primitive children when possible
  - ArrowDrawable.tsx # draws a forward arrow from a parent node to a child node
  - BackwardArrowDrawable.tsx # draws a special routed arrow back to an already-drawn node, used for shared/cyclic references
  - FunctionDrawable.tsx # draws the visual representation of a function object as two circles
  - NullDrawable.tsx # draws the diagonal slash used for an empty tail/null box

- tree/ # parses Source values into tree nodes, then renders them in different view modes
  - AlreadyParsedTreeNode.ts # placeholder node used when the same drawable node has already been parsed earlier
  - ArrayTreeNode.tsx # tree node for pairs/arrays; creates the cached drawable for an array node and its outgoing arrows
  - BaseTreeNode.ts # base TreeNode class with shared fields such as children and node position
  - BinaryTreeDrawer.tsx # renderer for binary-tree view; extends the original drawer with binary-tree-specific layout logic
  - DataTreeNode.tsx # leaf node for primitive data values (anything that is neither a function nor a pair/array)
  - DrawableTreeNode.tsx # abstract node class for nodes whose drawable should be cached and reused
  - FunctionTreeNode.tsx # tree node for function values; creates the cached function drawable and connecting arrow
  - GeneralTreeDrawer.tsx # renderer for general-tree view; extends the original drawer with general-tree layout logic
  - OriginalTreeDrawer.tsx # base renderer for the original box-and-pointer style view
  - Tree.tsx # converts Source data into a tree of nodes, memoizes shared structures/functions, and returns the appropriate drawer for the active mode
  - TreeNode.ts # re-exports the tree node classes

- Config.ts # drawing constants such as dimensions, spacing, stroke, fill, and arrow offsets
- dataVisualizer.tsx # public entry point and state manager; stores steps, tracks mode flags, detects tree shape, and triggers redraws
- dataVisualizerTypes.ts # lightweight type aliases for Source data and drawing steps
- dataVisualizerUtils.ts # helper predicates and formatting utilities, including text conversion and pair/list checks
- list.js # Source-style pair/list helper library used by the visualizer for pair/list operations

Summary

The main public-facing code is in dataVisualizer.tsx, which interacts with the code in Tree.tsx.

dataVisualizer.tsx exposes init, drawData, clear, clearWithData, mode toggles, and redraw. Internally, it stores a list of drawing steps, where each step corresponds to one draw_data call, and each element inside that step corresponds to one argument passed into that call. It also tracks the current rendering mode, remembers previously drawn inputs for redraw, and performs shape checks such as whether the incoming structure can be treated as a binary tree or a general tree.

Tree.tsx is responsible for turning an input Source structure into an internal tree of nodes. It distinguishes among arrays or pairs, functions, and primitive data; memoizes drawable nodes for repeated structures and functions; and emits AlreadyParsedTreeNode when a previously seen structure is encountered again. Once parsing is done, its draw() method chooses the correct drawer class based on the currently selected mode: OriginalTreeDrawer, BinaryTreeDrawer, or GeneralTreeDrawer.
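The parsing step described above can be sketched as follows. This is a minimal illustration, not the code in Tree.tsx: the ParsedNode type, the parseStructure function and the numeric ids are hypothetical stand-ins for the real node classes, and only the memoization idea is taken from the description.

```typescript
// Hypothetical, simplified node shapes; the real classes also carry drawables and positions.
type SourceData = unknown;

type ParsedNode =
  | { kind: 'array'; id: number; children: ParsedNode[] }
  | { kind: 'function'; id: number }
  | { kind: 'data'; value: SourceData }
  | { kind: 'alreadyParsed'; refId: number };

function parseStructure(root: SourceData): ParsedNode {
  // Maps each array or function to the id of the node created when it was first seen.
  const seen = new Map<object, number>();
  let nextId = 0;

  function visit(value: SourceData): ParsedNode {
    if (Array.isArray(value) || typeof value === 'function') {
      const known = seen.get(value);
      if (known !== undefined) {
        // Shared or cyclic reference: emit a placeholder instead of parsing again.
        return { kind: 'alreadyParsed', refId: known };
      }
      const id = nextId++;
      seen.set(value, id); // register before recursing so that cycles terminate
      if (typeof value === 'function') {
        return { kind: 'function', id };
      }
      return { kind: 'array', id, children: value.map(visit) };
    }
    // Anything that is neither a function nor a pair/array is primitive data.
    return { kind: 'data', value };
  }

  return visit(root);
}

// The same pair used twice is parsed once; the second use becomes a placeholder.
const shared = [2, null];
const parsed = parseStructure([shared, shared]);

// A pair whose tail points back to itself does not cause infinite recursion.
const cyclic: unknown[] = [1, null];
cyclic[1] = cyclic;
const parsedCycle = parseStructure(cyclic);
```

Registering a structure before visiting its children is what allows a backward arrow to be drawn for cyclic data instead of recursing forever.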

The tree/ folder has two responsibilities.

The node classes (BaseTreeNode, DataTreeNode, DrawableTreeNode, ArrayTreeNode, FunctionTreeNode and AlreadyParsedTreeNode) represent parsed values and cached drawables. The drawer classes (OriginalTreeDrawer, BinaryTreeDrawer and GeneralTreeDrawer) are responsible for layout and rendering in each view mode. BinaryTreeDrawer and GeneralTreeDrawer both extend OriginalTreeDrawer, rather than replacing the entire drawing pipeline from scratch.

The drawable/ folder contains only the small reusable visual pieces used by the tree nodes and drawers.

ArrayDrawable renders the box itself. ArrowDrawable handles normal parent-to-child arrows, while BackwardArrowDrawable handles arrows that point back to an already drawn node. FunctionDrawable renders the two-circle function symbol, and NullDrawable renders the diagonal slash for a null tail. Drawable.ts simply re-exports these components.

Config.ts, dataVisualizerTypes.ts, dataVisualizerUtils.ts, and list.js are support files. Config.ts standardises drawing dimensions and styling constants. dataVisualizerTypes.ts defines the aliases used across the module, such as Data, Pair, List, Drawing, and Step. dataVisualizerUtils.ts contains text formatting and type guards such as isArray, isFunction, isPair, isList, and isEmptyList, while also re-exporting head and tail. list.js is the underlying Source-style pair or list utility library used for those checks and operations.

A single "run" of the code in Source Academy proceeds as follows:

  1. WorkspaceSaga cleans up the environment and calls DataVisualizer.clearWithData to fully reset the data visualizer before a fresh run. This clears both the currently displayed drawings and any previously saved input history from earlier runs.
  2. Each call to draw_data is handled by DataVisualizer.drawData. That call produces one Step, which is an array of drawings, one for each data structure in the varargs input. The new Step is appended to the internal steps list, and DataVisualizer updates the side content React component by calling setSteps with the updated list.
  3. As the program runs, the side content React component receives the growing array of Steps. By the end of all the draw_data calls, it holds the full array, where each Step corresponds to one draw_data call made during the run.
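The relationship between draw_data calls, Steps and drawings in the run above can be sketched as follows. This is a simplified model, not the actual DataVisualizer class: StepStore is a hypothetical name, drawings are represented as strings rather than rendered elements, and the setSteps callback stands in for the update of the side content component.

```typescript
// Hypothetical, simplified stand-ins for the real types.
type Data = unknown;
type Drawing = string; // the real Drawing is a rendered element, not a string
type Step = Drawing[];

class StepStore {
  private steps: Step[] = [];
  private setSteps: (steps: Step[]) => void;

  // setSteps mimics the callback that hands the updated list to the side content.
  constructor(setSteps: (steps: Step[]) => void) {
    this.setSteps = setSteps;
  }

  // One call corresponds to one draw_data call: one Step, one drawing per argument.
  drawData(...structures: Data[]): void {
    const step: Step = structures.map(s => JSON.stringify(s));
    this.steps = [...this.steps, step];
    this.setSteps(this.steps);
  }

  // Mirrors the full reset performed before a fresh run.
  clear(): void {
    this.steps = [];
    this.setSteps(this.steps);
  }
}

let latest: Step[] = [];
const store = new StepStore(steps => {
  latest = steps;
});
store.drawData([1, [2, null]], [3, null]); // like draw_data(a, b): one Step with two drawings
store.drawData([4, null]); // a second call appends a second Step
```

After these two calls, the observer holds two Steps: the first with two drawings and the second with one.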

View modes

There are 3 view modes available: the Original mode, the Binary Tree mode and the General Tree mode. The code for the checkboxes for each mode is written in SideContentDataVisualizer.tsx. The React checkbox component only displays the status of each mode; it is deliberately not set to disabled, to reduce confusion. The mode is changed using onMouseUp to ensure that any change in mode updates the rendered image instantly.

Original mode

This is the default view mode which shows only the box and pointer diagrams without any additional spacing, formatting or colour.

draw_data(list(1, list(2, null, null), list(3, null, null)));

Binary Tree mode

This is the binary tree view mode which shows the binary tree representation of a valid binary tree input, as per the following definition of a binary tree, and using the structure of a 3-tuple input as written in Source Academy's binary_tree module.

  • A binary tree of a certain data type is either null, or it is a list with 3 elements: the first being an element of that data type, and the remaining being trees of that data type.
  • Structure of a 3-tuple input: (value: any, left: BinaryTree, right: BinaryTree)

Each node in the tree comprises a group of 3 boxes:

  • A box containing the node's value
  • A box from which the left subtree originates
  • A box from which the right subtree originates

These 3 boxes are closely arranged in a triangular node group. The box containing the value is at the top of the node group, with the boxes pointing to the left and right subtrees at its bottom left and right, respectively.

For example, consider the following data visualisation.

draw_data(list(1, list(2, null, null), list(3, null, null)));

The tree has a root node with a value of 1, and it also has a left subtree and a right subtree. The left subtree has a parent node with value 2, while the right subtree has a parent node with value 3.

General Tree mode

This is the general tree view mode which shows the tree representation (left aligned) of a valid tree input, as per the following definition of a tree:

  • A tree of a certain data type is either null, or it is a list whose elements are of that data type, or trees of that data type.

For example, consider the following data visualisation.

draw_data(list(1, list(2), list(3), list(4)));

This is equivalent to a ternary tree, whose root node has a value of 1, with 3 child nodes with values 2, 3 and 4.

Spacing

There are 3 steps to generating space in the visualizer.

  1. Creating the entire visual canvas (the dark blue backdrop)
  2. Setting the offset from the top left of the visual canvas, from which the data will begin to be drawn
  3. Drawing the data

The first two steps are done through draw() in the respective view modes. The visual canvas is created through the Stage, while the offset is set through the Layer.

Example in GeneralTreeDrawer.tsx:

return (
  <Stage
    key = {key}
    width = {(Config.NWidth + Config.BoxWidth) * (DataVisualizer.longestNodePos + 1) - Config.BoxWidth + x * 2}
    height = {this.downCOUNTER * Config.BoxHeight * 4 + Config.BoxHeight + y * 2}>
    <Layer 
      key = {x + ', ' + y}
      offsetX = {0} 
      offsetY = {this.minY}>
      {this.drawables}
    </Layer>
  </Stage>
);
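To make the Stage dimensions above concrete, the same arithmetic can be written as a pure function. This is only a sketch: the constant values in SketchConfig are assumed for illustration (the real ones live in Config.ts), and generalTreeStageSize is a hypothetical helper that does not exist in the codebase.

```typescript
// Assumed values for illustration only; the real constants are defined in Config.ts.
const SketchConfig = { NWidth: 90, BoxWidth: 45, BoxHeight: 30 };

// Same formulas as the width and height props of the Stage in the example above.
function generalTreeStageSize(
  longestNodePos: number,
  downCounter: number,
  x: number,
  y: number
): { width: number; height: number } {
  const columnWidth = SketchConfig.NWidth + SketchConfig.BoxWidth;
  return {
    // One column per node position, minus the trailing box width, plus margins on both sides.
    width: columnWidth * (longestNodePos + 1) - SketchConfig.BoxWidth + x * 2,
    // Four box heights per level below the root, one box for the root, plus margins.
    height: downCounter * SketchConfig.BoxHeight * 4 + SketchConfig.BoxHeight + y * 2
  };
}

// A tree with 3 nodes on its widest level (positions 0..2) and 1 level below the root.
const stageSize = generalTreeStageSize(2, 1, 10, 10);
// With the assumed constants: width = 135 * 3 - 45 + 20 = 380, height = 120 + 30 + 20 = 170
```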

For the tree view modes, drawing the data involves calculations that determine the specifications of the tree structure, such as its depth, how far it stretches left or right, and the index of any node at any point in the tree, among others.

The following sections explain specific fields that appear in various files, specially created for the purpose of tree generation in the tree view modes.

structures

structures is an Array of data that is passed into the dataVisualizer as a parameter. If structures represents a pair, it has length 2, with structures[0] representing the head and structures[1] representing the tail.

dataRecords (in dataVisualizer.tsx)

dataRecords keeps a copy of all inputs so that, when another view mode is chosen, every instance of draw_data is redrawn.

TreeDepth, nodeCount, nodePos, longestNodePos (in dataVisualizer.tsx)

The input data is initially iterated through once to get the maximum depth of the tree. This is done through traversing the input array. Whenever the first element of the array or any subarrays is another nested array, the recursion increases the depth by 1. When dataVisualizer.initializeTreeMetaData() reaches the end of its recursion, the final depth found is compared to the maximum depth of the tree so far and the maximum depth, saved as TreeDepth, is updated accordingly.
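The depth calculation described above can be sketched as a small recursive function. This is an interpretation of the description rather than the code in initializeTreeMetaData(): maxTreeDepth and the list helper are hypothetical names, and the sketch returns the depth instead of updating a TreeDepth field.

```typescript
// Builds a Source-style list out of nested pairs, e.g. list(1, 2) is [1, [2, null]].
const list = (...xs: unknown[]): unknown =>
  xs.reduceRight<unknown>((tail, x) => [x, tail], null);

// Returns the maximum depth reached while traversing the nested pairs.
function maxTreeDepth(structures: unknown, depth = 0): number {
  if (!Array.isArray(structures)) {
    return depth; // end of this branch of the recursion
  }
  const [head, tail] = structures;
  // A nested array in the head position starts a subtree one level deeper.
  const headDepth = Array.isArray(head) ? maxTreeDepth(head, depth + 1) : depth;
  // The tail continues the current list, so it stays on the same level.
  const tailDepth = maxTreeDepth(tail, depth);
  return Math.max(headDepth, tailDepth);
}

const flatDepth = maxTreeDepth(list(1, 2, 3)); // no nested lists
const ternaryDepth = maxTreeDepth(list(1, list(2), list(3), list(4))); // children one level down
const nestedDepth = maxTreeDepth(list(1, list(2, list(3)))); // a grandchild two levels down
```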

For the General Tree mode, as the graph is left-aligned, nodeCount keeps track of a new node's position from the left, for each level of the tree. This is done through dataVisualizer.initializeTreeMetaData(). During each iteration of analysing the input data, the current count of nodes at that level is pushed into the end of the array representing the current node (structures). When this input data is converted from Data[] into nodes (Tree.constructTree), the 2nd last value of the array is popped and inserted into the nodePos field of the node, which is then accessed subsequently to space out the nodes. (The last value of the array is colorIndex as explained later.)

Meanwhile, in dataVisualizer.initializeTreeMetaData(), the longestNodePos is also updated if necessary to keep track of the maximum number of nodes in any level for the tree, so as to appropriately generate sufficient width of the visual canvas.
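A rough sketch of the per-level bookkeeping is shown below. It assumes that a node's position is simply the number of nodes already seen on its level, and it returns the counts instead of pushing them onto the input arrays as the real code does. levelStats and listOf are hypothetical names.

```typescript
// Builds a Source-style list out of nested pairs.
const listOf = (...xs: unknown[]): unknown =>
  xs.reduceRight<unknown>((tail, x) => [x, tail], null);

function levelStats(root: unknown): { nodeCount: number[]; longestNodePos: number } {
  const nodeCount: number[] = [1]; // the root is the only node on level 0
  let longestNodePos = 0;

  function visit(structures: unknown, depth: number): void {
    if (!Array.isArray(structures)) {
      return;
    }
    const [head, tail] = structures;
    if (Array.isArray(head)) {
      // The subtree in the head sits one level down, at the next free position from the left.
      const pos = nodeCount[depth + 1] ?? 0;
      nodeCount[depth + 1] = pos + 1;
      longestNodePos = Math.max(longestNodePos, pos);
      visit(head, depth + 1);
    }
    visit(tail, depth); // the tail continues the same level
  }

  visit(root, 0);
  return { nodeCount, longestNodePos };
}

// The root has three children, which take positions 0, 1 and 2 on level 1.
const stats = levelStats(listOf(1, listOf(2), listOf(3), listOf(4)));
```

Under this reading, longestNodePos is the largest position index on any level, which matches its use as (longestNodePos + 1) columns in the Stage width.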

leftCOUNTER, rightCOUNTER, downCOUNTER (in OriginalTreeDrawer.tsx)

For the Binary Tree mode, it is necessary to identify how far the tree stretches left / right away from the centre (the root node), in order to generate sufficient space to show the tree in the visualizer itself.

As the tree is being rendered box by box, the field leftCOUNTER is incremented whenever a new node group is created towards the left of the root node, and is further left than any previous node. Similarly, the field rightCOUNTER is incremented whenever a new node group is created towards the right of the root node, and is further right than any previous node. Lastly, the field downCOUNTER is incremented whenever a new node group is created below the root node, and is further down than any previous node.

These 3 fields are used in the subsequent calculations of the variables EY1 and EY2, used in the generation of space in the visualizer for the Binary Tree mode. downCOUNTER is also used in the generation of space in the visualizer for the General Tree mode.

scalerV (in BinaryTreeDrawer.tsx)

For the Binary Tree mode, in order to make the tree appear compact, the horizontal spacing between distinct node groups should decrease as the level of these node groups increases, i.e. the deeper the level in the tree, the closer the node groups.

This is done through a scaling factor, scalerV, applied to the boxes when they are being rendered.

Example in BinaryTreeDrawer:

if (index === 0 && y === parentY + Config.DistanceY) {
  myY = y + Config.DistanceY * 2;
  myX = x - Config.NWidth * scalerV;
  OriginalTreeDrawer.colorCounter++;
  colorIndex = OriginalTreeDrawer.colorCounter;
}

Since scalerV should decrease as the level of the node groups increases, the calculation for scalerV is equivalent to:

  • 2^(depth of tree) divided by 2^(current level)

This way, as the current level increases (going down the tree), the resultant scalerV halves with each level. The current level can be determined by dividing the y value of the box to be rendered by 6 * Config.BoxHeight, which is the height used by each node group plus the vertical spacing between levels.
Powers of 2 are used to appropriately space the binary tree, given that each node group can have 2 subtrees.

Equation for scalerV:

let scalerV = Math.round(
  Math.pow(2, DataVisualizer.TreeDepth) /
  Math.pow(2, Math.round(y / (6 * Config.BoxHeight))));
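As a worked example of this equation, the sketch below evaluates scalerV for each level of a tree of depth 3. BOX_HEIGHT is an assumed value standing in for Config.BoxHeight, and scalerVFor is a hypothetical wrapper around the expression above.

```typescript
// Assumed value for illustration; the real constant is Config.BoxHeight.
const BOX_HEIGHT = 30;

// Same expression as above, with the tree depth passed in explicitly.
function scalerVFor(y: number, treeDepth: number): number {
  const level = Math.round(y / (6 * BOX_HEIGHT));
  return Math.round(Math.pow(2, treeDepth) / Math.pow(2, level));
}

// y advances by 6 * BOX_HEIGHT per level, so level k sits at y = k * 6 * BOX_HEIGHT.
const perLevel = [0, 1, 2, 3].map(level => scalerVFor(level * 6 * BOX_HEIGHT, 3));
// perLevel is [8, 4, 2, 1]: the horizontal spacing halves with every level
```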

EY1, EY2 (in BinaryTreeDrawer.tsx)

Purpose of the EY Variables:

  • EY1: The maximum of the fields leftCOUNTER and rightCOUNTER.
  • EY2: Used to set the horizontal width for Binary Tree mode.
    Due to scalerV, as one goes lower down the tree, the horizontal spacing between the distinct node groups decreases, allowing the tree to appear compact. This decreasing space is equivalent to decreasing powers of 2, multiplied by Config.NWidth, as explained in the section for scalerV.
    Thus, the offset required before generating the tree is equivalent to: 2^1 + 2^2 + 2^3 + ... + 2^(EY1-1). This is the sum of a finite geometric progression with first term 2, common ratio 2, and (EY1-1) terms, which equals 2 * (2^(EY1-1) - 1). Hence, using the formula for the sum of a finite geometric progression, we get the following equation for EY2:
EY2 = 2 * (Math.pow(2, EY1 - 1) - 1) + 1;
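The closed form can be checked against the explicit sum with a short sketch. Both function names are hypothetical; the only formula taken from the source is the equation for EY2 above.

```typescript
// Closed form, exactly as in the equation above.
function ey2ClosedForm(ey1: number): number {
  return 2 * (Math.pow(2, ey1 - 1) - 1) + 1;
}

// Term-by-term sum 2^1 + 2^2 + ... + 2^(EY1-1), plus the trailing 1 from the equation.
function ey2BySummation(ey1: number): number {
  let sum = 0;
  for (let k = 1; k <= ey1 - 1; k++) {
    sum += Math.pow(2, k);
  }
  return sum + 1;
}

// For EY1 = 4: 2 + 4 + 8 = 14, so EY2 = 15 by either route.
const ey2ForFour = ey2ClosedForm(4);
```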

node.children, originIndex (in GeneralTreeDrawer.tsx)

After the input data has been converted into nodes, they are accessed through recursive calls of draw_Node(), which make use of the fields packaged within each node to determine its position to be rendered.

For example, the following images show a particular node with value "4", and its corresponding ArrayTreeNode. 9 10

Each node only has 2 children, with index 0 and 1, which refer to the head and tail of each node. By making use of this fact, we are able to determine whether the current node either has (1) a pointer to another node in its head, or (2) a pointer to another node in its tail. Depending on these cases, the subsequent node will be rendered on the next level or on the same level as the current node, respectively. The left margin created (since the General Tree mode is left-aligned) makes use of the appropriately calculated originIndex as reference, which itself uses the nodePos saved within each node.

Example usage:

if (node.children![1] instanceof ArrayTreeNode) {
  if (node.children![1].children![0] instanceof ArrayTreeNode) {
    originIndex = node.children![1].children![0].nodePos;
    originX = 0 + this.leftMargin + (Config.NWidth + Config.BoxWidth) * originIndex;
  }
}

Coloring

In ArrayTreeNode.tsx, there is a class field that holds an array of colors. The color that is rendered is determined by the colorIndex parameter that is passed into createDrawable, which looks up the appropriate color using this.Colors[colorIndex % this.Colors.length]. A colorIndex of -1 is reserved to represent black.
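The lookup can be sketched as below. The palette values and the names SKETCH_COLORS, SKETCH_BLACK and colorFor are hypothetical; only the modulo lookup and the special case for -1 come from the description.

```typescript
// Hypothetical palette; the real array of colors is a class field in ArrayTreeNode.tsx.
const SKETCH_COLORS = ['#e6194b', '#3cb44b', '#4363d8'];
const SKETCH_BLACK = '#000000';

function colorFor(colorIndex: number): string {
  // -1 is reserved for black; every other index wraps around the palette.
  return colorIndex === -1
    ? SKETCH_BLACK
    : SKETCH_COLORS[colorIndex % SKETCH_COLORS.length];
}

const rootColor = colorFor(-1); // black
const wrappedColor = colorFor(4); // 4 % 3 === 1, so the second palette entry
```

Wrapping with the modulo means a tree with more levels or siblings than palette entries simply reuses colors in order.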

Original mode

All the boxes are rendered in black, hence every colorIndex passed into createDrawable is -1.

Binary Tree and General Tree mode

Only the root node is black. For other nodes, the colorIndex of the leftmost node increases by 1 with each level. Within a level, the colorIndex increases by 1 for each next node to the right. This creates an effect where the colors for each level are offset by 1, reducing the chance of color collisions.

Example of tree coloring for a simple B-Tree: 8

nodeColor (in dataVisualizer.tsx)

This coloring logic is achieved through the first traversal of the tree in DataVisualizer.initializeTreeMetaData(). nodeColor is used to keep track of the color previously assigned to a node at each level. When the traversal reaches a level that has not been explored yet (i.e. this.nodeColor[depth] == undefined), it sets the starting color index to the depth it is on. If it is on a new node, it also adds 1 to nodeColor at that depth. This value is then pushed to the end of the array representing that box (structures), and later recorded in an instance field of the node, similar to nodePos above.

To reserve black for the root node only, nodeColor[0] is set to -1 to ensure that all nodes at level 0 are black.
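One possible reading of this bookkeeping is sketched below. It assumes that the first node on a new level receives the depth itself as its colorIndex and that each further node on that level receives the previous value plus 1; the real traversal may differ in such details. nextColorIndex is a hypothetical helper, not a function in dataVisualizer.tsx.

```typescript
// Returns the colorIndex for a newly visited node at the given depth,
// updating the per-level record in nodeColor as it goes.
function nextColorIndex(nodeColor: number[], depth: number): number {
  if (depth === 0) {
    nodeColor[0] = -1; // level 0 is reserved for black
    return -1;
  }
  if (nodeColor[depth] === undefined) {
    nodeColor[depth] = depth; // unexplored level: start from the depth itself
  } else {
    nodeColor[depth] += 1; // next node to the right on an explored level
  }
  return nodeColor[depth];
}

// Visit the root, then three nodes on level 1, then two nodes on level 2.
const nodeColorRecord: number[] = [];
const assignedColors = [0, 1, 1, 1, 2, 2].map(d => nextColorIndex(nodeColorRecord, d));
// assignedColors is [-1, 1, 2, 3, 2, 3]: each level starts one index later than the level above
```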

Tree verification

Binary Tree mode

The input data is checked using isBinaryTree() to ensure that it is a binary tree. This is done by recursively checking that every node is made up of 3 boxes. If the given input is not a binary tree and the Binary Tree mode is selected, an error is shown.
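A minimal sketch of such a check, following the definition of a binary tree given in the Binary Tree mode section, is shown below. It is not the actual isBinaryTree() implementation: the function names are hypothetical, and the sketch assumes that subtrees do not refer back to their ancestors.

```typescript
// Builds a Source-style list out of nested pairs.
const makeList = (...xs: unknown[]): unknown =>
  xs.reduceRight<unknown>((tail, x) => [x, tail], null);

// Collects the elements of a proper list, or returns undefined if the value is not
// a list. Gives up after 3 elements, which also guards against circular tails.
function listElements(value: unknown): unknown[] | undefined {
  const items: unknown[] = [];
  let rest = value;
  while (rest !== null) {
    if (!Array.isArray(rest) || rest.length !== 2) {
      return undefined;
    }
    items.push(rest[0]);
    if (items.length > 3) {
      return undefined;
    }
    rest = rest[1];
  }
  return items;
}

// A binary tree is either null or a list of exactly 3 elements: value, left, right.
function isBinaryTreeSketch(value: unknown): boolean {
  if (value === null) {
    return true;
  }
  const items = listElements(value);
  return (
    items !== undefined &&
    items.length === 3 &&
    isBinaryTreeSketch(items[1]) &&
    isBinaryTreeSketch(items[2])
  );
}

const validTree = makeList(1, makeList(2, null, null), makeList(3, null, null));
const ternaryTree = makeList(1, makeList(2), makeList(3), makeList(4));
```

Here validTree passes the check, while ternaryTree fails it because its root list has 4 elements.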

General Tree mode

The input array and its nested arrays are iterated through to check that none of their lengths exceeds 2. This is because trees are lists, and lists are stored as pairs, hence the size of the input array and of every nested array should be at most 2.
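The length check can be sketched as a short recursive function. This is an illustration under the assumption that the input is not circular; hasOnlyPairs is a hypothetical name rather than a function in the codebase.

```typescript
// Returns true when the value and every array nested inside it has at most 2 elements.
function hasOnlyPairs(value: unknown): boolean {
  if (!Array.isArray(value)) {
    return true; // primitives and null impose no constraint
  }
  if (value.length > 2) {
    return false; // a raw array longer than a pair cannot be part of a list
  }
  return value.every(element => hasOnlyPairs(element));
}

const pairBased = hasOnlyPairs([1, [[2, null], null]]); // true: only pairs
const rawArray = hasOnlyPairs([1, [[1, 2, 3], null]]); // false: contains a 3-element array
```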

Further improvements

  • Converting this Data Visualizer into a Source Academy module could be considered; after all, the tree modes rely on valid inputs that follow the existing binary_tree module.
  • How trees are rendered in the General Tree mode could be studied further, potentially to implement a variable spacing algorithm that centres the root node and generates the appropriate amount of space between adjacent nodes, such that nodes in lower levels of the tree still do not overlap. (The tree is currently left-aligned to allow for a fixed, compact layout that is easy to follow.)
  • A Stepper for the tree modes could be implemented, allowing a learner to "step through" the nodes of the tree, highlighting them in order based on a specific DFS tree traversal: in-order, pre-order or post-order.
