"""Assignment 2: Modelling CS Education research paper data

=== CSC148 Summer 2022 ===
This code is provided solely for the personal and private use of
students taking the CSC148 course at the University of Toronto.
Copying for purposes other than this use is expressly prohibited.
All forms of distribution of this code, whether as given or with
any changes, are expressly prohibited.

All of the files in this directory and all subdirectories are:
Copyright (c) 2022 Bogdan Simion, David Liu, Diane Horton,
                   Haocheng Hu, Jacqueline Smith

=== Module Description ===
This module contains a new class, PaperTree, which is used to model data on
publications in a particular area of Computer Science Education research.
This data is adapted from a dataset presented at SIGCSE 2019.
You can find the full dataset here: https://www.brettbecker.com/sigcse2019/

Although this data is very different from filesystem data, it is still
hierarchical. This means we are able to model it using a TMTree subclass,
and we can then run it through our treemap visualisation tool to get a nice
interactive graphical representation of this data.

TODO: (Task 6) Complete the steps below
Recommended steps:
1. Start by reviewing the provided dataset in cs1_papers.csv. You can assume
   that any data used to generate this tree has this format,
   i.e., a csv file with the same columns (same column names, same order).
   The categories are all in one column, separated by colons (':').
   However, you should not make assumptions about what the categories are, how
   many categories there are, the maximum number of categories a paper can have,
   or the number of lines in the file.

2. Read through all the docstrings in this file once. There is a lot to take in,
   so don't feel like you need to understand it all the first time.
   Draw some pictures!
   We have provided the headers of the initializer as well as of some helper
   functions we suggest you implement. Note that we will not test any
   private top-level functions, so you can choose not to implement these
   functions, and you can add others if you want to for your solution.
   For this task, we will be testing that you are building the correct tree,
   not that you are doing it in a particular way. We will access your class
   in the same way as in the client code in the visualizer.

3. Plan out what you'll need to do to implement the PaperTree initializer.
   In particular, think about how to use the boolean parameters to do different
   things in setting up the tree. You may also find it helpful to review the
   Python documentation about the csv module, which you are permitted and
   encouraged to use. You should have a good plan, including what your subtasks
   are, before you begin writing any code.

4. Write the code for the PaperTree initializer and any helper functions you
   want to use in your design. You should not make any changes to the public
   interface of this module, or of the PaperTree class, but you can add private
   attributes and helpers as needed.

5. Tidy and test your code, and try it with the visualizer client code. Make
   sure you have documented any new private attributes, and that PyTA passes
   on your code.
"""
import csv
from typing import List, Dict, Tuple
from tm_trees import TMTree

# Filename for the dataset
DATA_FILE = 'cs1_papers.csv'


class PaperTree(TMTree):
    """A tree representation of Computer Science Education research paper data.

    === Public Attributes ===
    by_year:
        Whether the first level of subtrees is grouped by year.
    all_papers:
        Whether this tree is the root of the whole paper tree, built from
        DATA_FILE.

    === Private Attributes ===
    TODO: Add any of your new private attributes here.
    These should store information about this paper's <authors> and <doi>.
    _authors:
        The authors of this paper
    _doi:
        The URL of the paper

    === Inherited Attributes ===
    rect:
        The pygame rectangle representing this node in the treemap
        visualization.
    data_size:
        The size of the data represented by this tree.
    _colour:
        The RGB colour value of the root of this tree.
    _name:
        The root value of this tree, or None if this tree is empty.
    _subtrees:
        The subtrees of this tree.
    _parent_tree:
        The parent tree of this tree; i.e., the tree that contains this tree
        as a subtree, or None if this tree is not part of a larger tree.
    _expanded:
        Whether or not this tree is considered expanded for visualization.

    === Representation Invariants ===
    - All TMTree RIs are inherited.
    """
    _authors: str
    _doi: str
    by_year: bool
    all_papers: bool

    def __init__(self, name: str, subtrees: List[TMTree], authors: str = '',
                 doi: str = '', citations: int = 0, by_year: bool = True,
                 all_papers: bool = False) -> None:
        """Initialize a new PaperTree with the given <name> and <subtrees>,
        <authors> and <doi>, and with <citations> as the size of the data.

        If <all_papers> is True, then this tree is to be the root of the paper
        tree. In that case, load data about papers from DATA_FILE to build the
        tree, ignoring the <subtrees> argument.

        If <all_papers> is False, do NOT load new data.

        <by_year> indicates whether or not the first level of subtrees should be
        the years, followed by each category, subcategory, and so on. If
        <by_year> is False, then the year in the dataset is simply ignored.
        """
        self._authors = authors
        self._doi = doi
        self.by_year = by_year
        self.all_papers = all_papers
        if all_papers:
            # Replace the given subtrees with the ones built from the dataset.
            nested_dict = _load_papers_to_dict(by_year)
            subtrees = _build_tree_from_dict(nested_dict)
        super().__init__(name, subtrees, citations)

    def get_separator(self) -> str:
        """Return the file separator for this paper.
        """
        return '/'

    def get_suffix(self) -> str:
        """Return the final descriptor of this tree.
        """
        return ' (paper)' if len(self._subtrees) == 0 else ' (category)'
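

def _load_papers_to_dict(by_year: bool = True) -> Dict:
    """Return a nested dictionary of the data read from the papers dataset file.

    If <by_year>, then use years as the roots of the subtrees of the root of
    the whole tree. Otherwise, ignore years and use categories only.

    Each level maps a category name to a nested dictionary. The key '' maps
    to a list of (authors, title, doi, citations) tuples for the papers that
    belong at that level.
    """
    dic = {}
    # newline='' is what the csv module documentation recommends for files.
    with open(DATA_FILE, newline='') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader, None)  # Skip the header row, if there is one.
        for line in reader:
            authors = line[0]
            name = line[1]
            year = line[2]
            # Categories are colon-separated; strip the padding around each
            # one so that both 'A: B' and 'A:B' give ['A', 'B'].
            categories = [cat.strip() for cat in line[3].split(':')]
            doi = line[4]
            citations = int(line[5])
            info = (authors, name, doi, citations)
            if by_year:
                _process_info(dic.setdefault(year, {}), categories, info)
            else:
                _process_info(dic, categories, info)
    return dic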
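

def _process_info(old_d: Dict, categories: List[str],
                  paper: Tuple[str, str, str, int]) -> None:
    """Insert <paper> into the nested dictionary <old_d>, mutating it.

    Walk down <old_d> one level per entry in <categories>, creating any
    missing sub-dictionaries on the way. Then add <paper>, an
    (authors, title, doi, citations) tuple, to the list stored under the
    key '' at the final level.

    >>> d = {}
    >>> _process_info(d, ['A', 'B'], ('Ann', 'Title', 'doi', 3))
    >>> d
    {'A': {'B': {'': [('Ann', 'Title', 'doi', 3)]}}}
    """
    curr = old_d
    for cat in categories:
        curr = curr.setdefault(cat, {})
    curr.setdefault('', []).append(paper)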
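

def _build_tree_from_dict(nested_dict: Dict) -> List[PaperTree]:
    """Return a list of trees from the nested dictionary <nested_dict>.

    The key '' maps to a list of (authors, title, doi, citations) tuples,
    each of which becomes a leaf. Every other key becomes an internal node
    whose subtrees are built recursively from its nested dictionary.
    """
    trees = []
    for key in nested_dict:
        if key == '':
            # Papers stored directly at this level become leaves.
            papers = nested_dict['']
            trees.extend(PaperTree(paper[1], [], paper[0], paper[2], paper[3])
                         for paper in papers)
        else:
            # A year or category: recurse to build its subtrees first.
            tree = _build_tree_from_dict(nested_dict[key])
            trees.append(PaperTree(key, tree))
    return trees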
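

def find_al(outer: List[int], val: int) -> None:
    """Take the outer loop and insert it into the main loop.

    TODO: This function is unfinished. Its loop body breaks off at a bare
    'if', so the intended behaviour is unknown. The List[int] annotation on
    <outer> is an assumption, based only on the original calling len(outer).
    """
    raise NotImplementedError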


if __name__ == '__main__':
    import python_ta
    python_ta.check_all(config={
        'allowed-import-modules': ['python_ta', 'typing', 'csv', 'tm_trees'],
        'allowed-io': ['_load_papers_to_dict'],
        'max-args': 8
    })