slides/_freeze/chapters/10_data_ops/execute-results
22 "cells" : [
33 {
44 "cell_type" : " markdown" ,
5- "id" : " c8b7476a " ,
5+ "id" : " e049e39d " ,
66 "metadata" : {},
77 "source" : [
88 " # Exercise: exploring a new table\n " ,
1515 {
1616 "cell_type" : " code" ,
1717 "execution_count" : null ,
18- "id" : " 10dc3dd9 " ,
18+ "id" : " 5a34e0a7 " ,
1919 "metadata" : {},
2020 "outputs" : [],
2121 "source" : [
2626 },
2727 {
2828 "cell_type" : " markdown" ,
29- "id" : " b95a5029 " ,
29+ "id" : " f01d2a92 " ,
3030 "metadata" : {},
3131 "source" : [
3232 " Now use the skrub `TableReport` and answer the following questions:"
3535 {
3636 "cell_type" : " code" ,
3737 "execution_count" : null ,
38- "id" : " a205456e " ,
38+ "id" : " 279facf2 " ,
3939 "metadata" : {},
4040 "outputs" : [],
4141 "source" : [
4747 },
4848 {
4949 "cell_type" : " markdown" ,
50- "id" : " 23c471c3 " ,
50+ "id" : " b41135ea " ,
5151 "metadata" : {},
5252 "source" : [
5353 " ## Questions\n " ,
6060 " - Which columns have an imbalanced distribution?\n " ,
6161 " - Which columns are strongly correlated with each other?\n " ,
6262 " \n " ,
63- " ```{.python}\n " ,
6463 " # PLACEHOLDER\n " ,
6564 " #\n " ,
6665 " #\n " ,
7170 " #\n " ,
7271 " #\n " ,
7372 " #\n " ,
74- " ```\n " ,
7573 " \n " ,
74+ " # %% [markdown]\n " ,
7675 " ## Answers\n " ,
7776 " - What's the size of the dataframe? (columns and rows)\n " ,
7877 " - 9228 rows × 8 columns\n " ,
9897 },
9998 {
10099 "cell_type" : " markdown" ,
101- "id" : " f20bde70 " ,
100+ "id" : " cd729648 " ,
102101 "metadata" : {},
103102 "source" : [
104103 " # Exercise: clean a dataframe using the `Cleaner`\n " ,
108107 {
109108 "cell_type" : " code" ,
110109 "execution_count" : null ,
111- "id" : " 1a512d31 " ,
110+ "id" : " 5bf185b2 " ,
112111 "metadata" : {},
113112 "outputs" : [],
114113 "source" : [
119118 },
120119 {
121120 "cell_type" : " markdown" ,
122- "id" : " 2d8454f4 " ,
121+ "id" : " 29c88f0c " ,
123122 "metadata" : {},
124123 "source" : [
125124 " Use the `TableReport` to answer the following questions:\n " ,
132131 {
133132 "cell_type" : " code" ,
134133 "execution_count" : null ,
135- "id" : " 50244f15 " ,
134+ "id" : " c3a0807e " ,
136135 "metadata" : {},
137136 "outputs" : [],
138137 "source" : [
143142 },
144143 {
145144 "cell_type" : " markdown" ,
146- "id" : " 03dcbdcb " ,
145+ "id" : " 15005ad9 " ,
147146 "metadata" : {},
148147 "source" : [
149148 " Then, use the `Cleaner` to sanitize the data so that:\n " ,
156155 {
157156 "cell_type" : " code" ,
158157 "execution_count" : null ,
159- "id" : " e78ad1a3 " ,
158+ "id" : " 2add235b " ,
160159 "metadata" : {},
161160 "outputs" : [],
162161 "source" : [
176175 {
177176 "cell_type" : " code" ,
178177 "execution_count" : null ,
179- "id" : " f7370994 " ,
178+ "id" : " f1f97ec9 " ,
180179 "metadata" : {},
181180 "outputs" : [],
182181 "source" : [
199198 },
200199 {
201200 "cell_type" : " markdown" ,
202- "id" : " 627265cd " ,
201+ "id" : " 72a1d694 " ,
203202 "metadata" : {},
204203 "source" : [
205204 " We can inspect which columns were dropped and what transformations were applied:"
208207 {
209208 "cell_type" : " code" ,
210209 "execution_count" : null ,
211- "id" : " eb157043 " ,
210+ "id" : " a94cbf09 " ,
212211 "metadata" : {},
213212 "outputs" : [],
214213 "source" : [
3030# - Which columns have an imbalanced distribution?
3131# - Which columns are strongly correlated with each other?
3232#
33- # ```{.python}
3433# # PLACEHOLDER
3534# #
3635# #
4140# #
4241# #
4342# #
44- # ```
45- #
43+ #
44+ # # %% [markdown]
4645# ## Answers
4746# - What's the size of the dataframe? (columns and rows)
4847# - 9228 rows × 8 columns
22 "cells" : [
33 {
44 "cell_type" : " markdown" ,
5- "id" : " d986e59e " ,
5+ "id" : " 64cbdacf " ,
66 "metadata" : {},
77 "source" : [
88 " # Exercise: using selectors together with `ApplyToCols`\n " ,
1212 {
1313 "cell_type" : " code" ,
1414 "execution_count" : null ,
15- "id" : " 5053fabc " ,
15+ "id" : " 97d637b0 " ,
1616 "metadata" : {
1717 "lines_to_next_cell" : 0
1818 },
2424 {
2525 "cell_type" : " code" ,
2626 "execution_count" : null ,
27- "id" : " ba2624c3 " ,
27+ "id" : " f2ea076c " ,
2828 "metadata" : {},
2929 "outputs" : [],
3030 "source" : [
4646 },
4747 {
4848 "cell_type" : " markdown" ,
49- "id" : " 31811f22 " ,
49+ "id" : " 66804095 " ,
5050 "metadata" : {},
5151 "source" : [
5252 " Using the skrub selectors and `ApplyToCols`:\n " ,
5959 {
6060 "cell_type" : " code" ,
6161 "execution_count" : null ,
62- "id" : " fadcb2e8 " ,
62+ "id" : " 1fb49ecd " ,
6363 "metadata" : {},
6464 "outputs" : [],
6565 "source" : [
8484 {
8585 "cell_type" : " code" ,
8686 "execution_count" : null ,
87- "id" : " 6be168d7 " ,
87+ "id" : " dc056e23 " ,
8888 "metadata" : {},
8989 "outputs" : [],
9090 "source" : [
103103 },
104104 {
105105 "cell_type" : " markdown" ,
106- "id" : " df81b2a5 " ,
106+ "id" : " 6ee9b01e " ,
107107 "metadata" : {},
108108 "source" : [
109109 " Given the same dataframe and using selectors, drop only string columns that contain\n " ,
113113 {
114114 "cell_type" : " code" ,
115115 "execution_count" : null ,
116- "id" : " c72984a5 " ,
116+ "id" : " 671dbeba " ,
117117 "metadata" : {},
118118 "outputs" : [],
119119 "source" : [
132132 {
133133 "cell_type" : " code" ,
134134 "execution_count" : null ,
135- "id" : " 22bf8031 " ,
135+ "id" : " d1075488 " ,
136136 "metadata" : {},
137137 "outputs" : [],
138138 "source" : [
143143 },
144144 {
145145 "cell_type" : " markdown" ,
146- "id" : " 3a7f1af8 " ,
146+ "id" : " 86430e36 " ,
147147 "metadata" : {},
148148 "source" : [
149149 " Now write a custom function that selects columns where all values are lower than\n " ,
153153 {
154154 "cell_type" : " code" ,
155155 "execution_count" : null ,
156- "id" : " ed9e22db " ,
156+ "id" : " 5e3efdf7 " ,
157157 "metadata" : {},
158158 "outputs" : [],
159159 "source" : [
172172 {
173173 "cell_type" : " code" ,
174174 "execution_count" : null ,
175- "id" : " 84bdfd4b " ,
175+ "id" : " 0f96a4e3 " ,
176176 "metadata" : {},
177177 "outputs" : [],
178178 "source" : [
189189 {
190190 "cell_type" : " code" ,
191191 "execution_count" : null ,
192- "id" : " e9219bd9 " ,
192+ "id" : " 7674715a " ,
193193 "metadata" : {},
194194 "outputs" : [],
195195 "source" : []
22 "cells" : [
33 {
44 "cell_type" : " markdown" ,
5- "id" : " f76fb6d0 " ,
5+ "id" : " f55abcb2 " ,
66 "metadata" : {},
77 "source" : [
88 " # Exercise\n " ,
1919 {
2020 "cell_type" : " code" ,
2121 "execution_count" : null ,
22- "id" : " 7fb3b7ef " ,
22+ "id" : " a326c03a " ,
2323 "metadata" : {},
2424 "outputs" : [],
2525 "source" : [
2929 {
3030 "cell_type" : " code" ,
3131 "execution_count" : null ,
32- "id" : " 021e2515 " ,
32+ "id" : " 08c44fa2 " ,
3333 "metadata" : {},
3434 "outputs" : [],
3535 "source" : [
5353 {
5454 "cell_type" : " code" ,
5555 "execution_count" : null ,
56- "id" : " 29f082fe " ,
56+ "id" : " 839700b0 " ,
5757 "metadata" : {},
5858 "outputs" : [],
5959 "source" : [
7777 {
7878 "cell_type" : " code" ,
7979 "execution_count" : null ,
80- "id" : " b9ede08c " ,
80+ "id" : " 65918722 " ,
8181 "metadata" : {},
8282 "outputs" : [],
8383 "source" : [
100100 {
101101 "cell_type" : " code" ,
102102 "execution_count" : null ,
103- "id" : " fb275e7c " ,
103+ "id" : " 4c8b4794 " ,
104104 "metadata" : {},
105105 "outputs" : [],
106106 "source" : [
120120 },
121121 {
122122 "cell_type" : " markdown" ,
123- "id" : " 290081f0 " ,
123+ "id" : " bf0c61a4 " ,
124124 "metadata" : {},
125125 "source" : [
126126 " Modify the script so that the `DatetimeEncoder` adds periodic encoding with sine\n " ,
130130 {
131131 "cell_type" : " code" ,
132132 "execution_count" : null ,
133- "id" : " 0c4b22e4 " ,
133+ "id" : " dbb9934b " ,
134134 "metadata" : {},
135135 "outputs" : [],
136136 "source" : [
153153 },
154154 {
155155 "cell_type" : " markdown" ,
156- "id" : " 28e1bb62 " ,
156+ "id" : " 94e6e6be " ,
157157 "metadata" : {},
158158 "source" : [
159159 " Now modify the script above to add spline features (`periodic_encoding=\"spline\"`).\n "
162162 {
163163 "cell_type" : " code" ,
164164 "execution_count" : null ,
165- "id" : " df44d5e7 " ,
165+ "id" : " bbed03e7 " ,
166166 "metadata" : {},
167167 "outputs" : [],
168168 "source" : [
188188 {
189189 "cell_type" : " code" ,
190190 "execution_count" : null ,
191- "id" : " 3d6d4cae " ,
191+ "id" : " 4925a0b9 " ,
192192 "metadata" : {},
193193 "outputs" : [],
194194 "source" : []