<!DOCTYPE html>
<html style="font-size: 16px;" lang="en">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta charset="utf-8">
<meta name="keywords" content="">
<meta name="description" content="">
<title>Data Exploration</title>
<link rel="stylesheet" href="nicepage.css" media="screen">
<link rel="stylesheet" href="Time-Series-Analysis.css" media="screen">
<script class="u-script" type="text/javascript" src="jquery.js" defer=""></script>
<script class="u-script" type="text/javascript" src="nicepage.js" defer=""></script>
<meta name="generator" content="Nicepage 5.10.8, nicepage.com">
<link rel="icon" href="images/favicon1.png">
<link id="u-theme-google-font" rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Roboto:100,100i,300,300i,400,400i,500,500i,700,700i,900,900i|Open+Sans:300,300i,400,400i,500,500i,600,600i,700,700i,800,800i">
<script type="application/ld+json">{
"@context": "http://schema.org",
"@type": "Organization",
"name": "VaccineVerity",
"logo": "images/favicon1.png?rand=72be"
}</script>
<meta name="theme-color" content="#00acee">
<meta property="og:title" content="Data Exploration">
<meta property="og:description" content="">
<meta property="og:type" content="website">
<meta data-intl-tel-input-cdn-path="intlTelInput/">
</head>
<body class="u-body u-xl-mode" data-lang="en">
<header class="u-clearfix u-header u-sticky u-sticky-5f9f u-white u-header" id="sec-ec7c">
<div class="u-clearfix u-sheet u-sheet-1">
<a class="u-image u-logo u-image-1" data-image-width="640" data-image-height="640">
<img src="images/favicon1.png?rand=72be" class="u-logo-image u-logo-image-1">
</a>
<nav class="u-menu u-menu-dropdown u-offcanvas u-menu-1">
<div class="menu-collapse"
style="font-size: 1rem; letter-spacing: 0px; text-transform: uppercase; font-weight: 700;">
<a class="u-button-style u-custom-active-border-color u-custom-border u-custom-border-color u-custom-borders u-custom-color u-custom-hover-border-color u-custom-left-right-menu-spacing u-custom-padding-bottom u-custom-text-active-color u-custom-text-color u-custom-text-decoration u-custom-text-hover-color u-custom-text-shadow u-custom-top-bottom-menu-spacing u-nav-link u-text-active-palette-1-base u-text-hover-palette-2-base"
href="#">
<svg class="u-svg-link" viewBox="0 0 24 24">
<use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#menu-hamburger"></use>
</svg>
<svg class="u-svg-content" version="1.1" id="menu-hamburger" viewBox="0 0 16 16" x="0px" y="0px"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg">
<g>
<rect y="1" width="16" height="2"></rect>
<rect y="7" width="16" height="2"></rect>
<rect y="13" width="16" height="2"></rect>
</g>
</svg>
</a>
</div>
<div class="u-custom-menu u-nav-container">
<ul class="u-nav u-spacing-20 u-unstyled u-nav-1">
<li class="u-nav-item"><a
class="u-border-active-custom-color-3 u-border-hover-custom-color-3 u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4"
href="index.html" style="padding: 10px;">Home</a>
</li>
<li class="u-nav-item"><a
class="u-border-active-custom-color-3 u-border-hover-custom-color-3 u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4"
href="Action-Plan.html" style="padding: 10px;">Action Plan</a>
</li>
<li class="u-nav-item"><a
class="u-border-active-custom-color-3 u-border-hover-custom-color-3 u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4"
rel="nofollow" style="padding: 10px;">Data</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Collection.html">Data Collection</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Collection.html#carousel_d27b">Tweets</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Collection.html#carousel_2ab3">Dataset</a>
</li>
</ul>
</div>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html">Data Exploration</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Google-Colab-Code.html">Google Colab Code</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#sec-7a91">Data Preprocessing</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_267c">Handling Missing Values</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_35ed">Ensuring Formatting Consistency</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_22b9">Categorical Data Encoding</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_0803">Handling Outliers</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_fff1">Normalization/Standardization/Scaling</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Exploration.html#carousel_13f0">Natural Language Processing</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Time-Series-Analysis.html#carousel_ae2f">Time Series Analysis</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Time-Series-Analysis.html#carousel_c843">Interpolation</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Time-Series-Analysis.html#carousel_5971">Binning</a>
</li>
</ul>
</div>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization.html">Data Visualization</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization.html#carousel_05e7">Types of Plots</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization.html#carousel_c843">Scatterplots/Histograms</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization.html#carousel_b060">Heat Maps</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization-Bar.html">Bar/Swarm/Violin Plots</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Visualization-Bar.html#carousel_3262">Line Graphs</a>
</li>
</ul>
</div>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Modelling.html">Data Modelling</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Modelling.html#sec-1ee7">Data Binning</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Modelling.html#carousel_80ac">Topic Clustering Using LDA and t-SNE</a>
</li>
</ul>
</div>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html">Data Communication</a>
<div class="u-nav-popup">
<ul class="u-border-1 u-border-grey-30 u-h-spacing-21 u-nav u-unstyled u-v-spacing-17">
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html#sec-0e94">Results</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html#carousel_a6ec">Conclusion</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html#carousel_7d61">Acknowledgments</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html#carousel_ed46">References</a>
</li>
<li class="u-nav-item"><a
class="u-active-white u-button-style u-nav-link u-text-active-custom-color-4 u-text-custom-color-8 u-text-hover-custom-color-4 u-white"
href="Data-Communication.html#carousel_8aad">The Vaxplorers Team</a>
</li>
</ul>
</div>
</li>
</ul>
</div>
</li>
</ul>
</div>
</nav>
<img class="u-image u-image-contain u-image-default u-image-2" src="images/VACCINEVERITYLOGO.jpg" alt=""
data-image-width="766" data-image-height="115">
</div>
<style class="u-sticky-style" data-style-id="5f9f">
.u-sticky-fixed.u-sticky-5f9f,
.u-body.u-sticky-fixed .u-sticky-5f9f {
box-shadow: 0px 2px 8px 0px rgba(128, 128, 128, 1) !important
}
</style>
</header>
<section class="u-clearfix u-gradient u-section-1" id="carousel_ae2f">
<div class="u-clearfix u-sheet u-sheet-1">
<div class="u-container-style u-expanded-width u-group u-radius-50 u-shape-round u-white u-group-1"
data-animation-name="customAnimationIn" data-animation-duration="1500">
<div class="u-container-layout u-container-layout-1">
<p class="u-align-center u-text u-text-1">
<span style="font-weight: 700;"></span>As the final step of the pre-processing stage of our data exploration,
we sought further insight into our dataset’s time-dependent behavior and patterns. At the
beginning of this segment, we separated the two columns that pertain to temporal data to
streamline this step:<br>
<br>'Joined (MM/YYYY)', 'Date Posted (DD/MM/YY H:M:S)' <br>
</p>
</div>
</div>
<div class="u-container-style u-group u-radius-50 u-shape-round u-white u-group-2"
data-animation-name="customAnimationIn" data-animation-duration="1500" data-animation-direction="X"
data-animation-delay="0">
<div class="u-container-layout u-container-layout-2">
<h4 class="gradient u-text u-text-2"><b>Time Series Analysis</b>
</h4>
</div>
</div>
</div>
</section>
<section class="u-clearfix u-gradient u-section-2" id="carousel_c843">
<div class="u-clearfix u-sheet u-sheet-1">
<div
class="u-align-justify u-container-style u-expanded-width u-group u-radius-50 u-shape-round u-white u-group-1"
data-animation-name="customAnimationIn" data-animation-duration="1500">
<div class="u-container-layout u-container-layout-1">
<div class="u-container-style u-group u-radius-50 u-shape-round u-white u-group-2"
data-animation-name="customAnimationIn" data-animation-duration="1500" data-animation-direction="X"
data-animation-delay="0">
<div class="u-container-layout u-valign-middle u-container-layout-2">
<h4 class="gradient u-text u-text-1"><b>Interpolation</b>
</h4>
</div>
</div>
<p class="u-text u-text-2"> Although we already handled missing values during pre-processing, applying
interpolation methods will allow us to produce smoother, continuous plots in later segments. <br>
<br>The function <span style="font-weight: 700;">linear_interpolator()</span> takes the features that have
missing quantitative data or handled zero values and applies linear interpolation to them:<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-1" src="images/ZG.png" alt=""
data-image-width="1118" data-image-height="287">
<p class="u-text u-text-3">
<span style="font-weight: 700;"></span>To integrate time-dependency, the index of the dataframe copy that
will store the interpolated result is set to the 'Date Posted (DD/MM/YY H:M:S)' feature. <br>
<br>Each numeric column with handled zeroes is then passed to the pandas <span
style="font-weight: 700;">interpolate() </span>method, which fills in the gaps using neighboring
data points. <br>
</p>
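<p class="u-text"> The linear step described above can be sketched as follows. This is a minimal illustration rather than the code in the screenshot: the sample rows, the 'Likes' column, and the choice of <code>method="time"</code> are assumptions made for this example. In pandas, <code>method="linear"</code> ignores the index and treats rows as equally spaced, while <code>method="time"</code> weights each gap by the timestamps in the index. </p>
<pre>
```python
import numpy as np
import pandas as pd

TIME_COL = "Date Posted (DD/MM/YY H:M:S)"


def linear_interpolator(df, value_cols, time_col=TIME_COL):
    """Fill NaN gaps in value_cols, weighting by the posting timestamps."""
    out = df.copy()  # keep the original dataframe untouched
    out[time_col] = pd.to_datetime(out[time_col], format="%d/%m/%y %H:%M:%S")
    out = out.set_index(time_col).sort_index()
    # method="time" uses the spacing of the DatetimeIndex; method="linear"
    # would ignore the index and treat the rows as equally spaced.
    out[value_cols] = out[value_cols].interpolate(method="time")
    return out


# Hypothetical sample: the gap sits one day after the first tweet
# and two days before the last one.
sample = pd.DataFrame({
    TIME_COL: ["01/01/22 00:00:00", "02/01/22 00:00:00", "04/01/22 00:00:00"],
    "Likes": [10.0, np.nan, 40.0],
})
filled = linear_interpolator(sample, ["Likes"])
print(filled)
```
</pre>
<p class="u-text"> Because the missing row is one third of the way between its neighbors in time, the filled value lands one third of the way between 10 and 40 instead of at their midpoint. </p>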
<p class="u-text u-text-4"> Another method of interpolation uses cubic splines through the function
<span style="font-weight: 700;">cubic_spline_interpolator()</span>, which also sets the index of the
dataframe copy to the posting timestamps to emphasize time-dependency: <br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-2" src="images/ZH.png" alt=""
data-image-width="1118" data-image-height="268">
<p class="u-text u-text-5">
<span style="font-weight: 700;"></span>Other than the method parameter, spline interpolation takes
additional arguments. <br>
<br>The 'order' argument sets the degree of the polynomial pieces. With an order of 3, the data points are
divided into smaller intervals, a cubic polynomial is fitted on each interval, and the resulting piecewise
polynomials are joined to form the curve. <br>
<br>A smoothing parameter 's' controls the tension or flexibility of the
curve itself. The higher it is, the stronger the emphasis on the overall trend. <br>
<br>The trade-off is that a higher 's' value may produce a curve that does not pass exactly through the
data points, so some of the variability in our dataset is smoothed out. <br>
</p>
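<p class="u-text"> The trade-off controlled by 's' can be illustrated with SciPy's <code>UnivariateSpline</code>, the routine that the pandas documentation names for <code>method="spline"</code>. Whether the screenshot relies on that pandas option is an assumption, and the noisy series and the two 's' values below are hypothetical. </p>
<pre>
```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy series; x stands in for time elapsed since the first tweet.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 0.5 * x + rng.normal(scale=2.0, size=x.size)

# k=3 requests cubic pieces (the 'order'); s is the smoothing factor.
exact = UnivariateSpline(x, y, k=3, s=0)       # s=0: passes through every point
smooth = UnivariateSpline(x, y, k=3, s=200.0)  # large s: follows the overall trend

# Sum of squared residuals between each curve and the data points.
exact_error = float(np.sum((exact(x) - y) ** 2))
smooth_error = float(np.sum((smooth(x) - y) ** 2))
print(exact_error, smooth_error)
```
</pre>
<p class="u-text"> The curve with s=0 reproduces the data points, while the curve with the larger 's' leaves a visible residual because it trades point-by-point accuracy for a smoother trend. </p>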
<p class="u-text u-text-6"> The third interpolation method, <span style="font-weight: 700;">polynomial_interpolator()</span>, is a
function that fits a single polynomial as closely as possible to the data points of each
feature. <br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-3" src="images/ZI.png" alt=""
data-image-width="1118" data-image-height="365">
<p class="u-text u-text-7">
<span style="font-weight: 700;"></span>This approach uses two separate arrays: one for the
coefficients and one for the values of the fitted polynomial. <br>
<br>As in the previous methods, the posting timestamps of the tweets were used
as the independent variable. <br>
<br>The data points in each column then serve as the y-values of the polynomial
function. <br>
<br>NumPy’s <span style="font-weight: 700;">polyfit() </span>function returns the array of
coefficients that best fits the data points in the least-squares sense, and <span style="font-weight: 700;">poly1d()</span>
creates a callable polynomial object from those coefficients. <br>
<br>A polynomial of degree 2 was chosen to balance closeness to the
individual data points against capturing some of the underlying trends. <br>
</p>
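<p class="u-text"> A minimal sketch of the <code>polyfit()</code> and <code>poly1d()</code> pairing is given here. The timestamps and the 'likes' values are hypothetical, and converting the timestamps to days elapsed since the first tweet is an assumption made to keep the x-values small. </p>
<pre>
```python
import numpy as np
import pandas as pd

# Hypothetical posting timestamps and one numeric feature.
posted = pd.to_datetime(
    ["01/01/22 00:00:00", "02/01/22 00:00:00",
     "03/01/22 00:00:00", "04/01/22 00:00:00"],
    format="%d/%m/%y %H:%M:%S",
)
likes = np.array([1.0, 6.0, 15.0, 28.0])

# Independent variable: days elapsed since the first tweet.
x = np.asarray((posted - posted[0]).total_seconds()) / 86400.0

coefficients = np.polyfit(x, likes, deg=2)  # least-squares fit, highest power first
polynomial = np.poly1d(coefficients)        # callable polynomial object
fitted = polynomial(x)                      # estimated value at each timestamp
print(coefficients)
print(fitted)
```
</pre>
<p class="u-text"> The sample values were chosen to lie exactly on a quadratic, so the fitted values match them. On real data the fitted values generally differ from the originals, because a degree-2 polynomial cannot pass through every point. </p>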
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-4" src="images/ZJ.png" alt=""
data-image-width="1118" data-image-height="115">
<p class="u-text u-text-8"> Similar to the scaling step, copies of the original dataframe were used to save
the interpolated values for later analysis and plotting. To check the result, the <span
style="font-weight: 700;">print()</span> function was applied:<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-5" src="images/ZK.png" alt=""
data-image-width="1118" data-image-height="478">
<p class="u-text u-text-9">
<span style="font-weight: 700;"></span>Notice that there are trailing zeroes in the results of both the linear
and spline interpolation methods. This is because there are valid zero values in our original
dataset. <br>
<br>Since linear interpolation relies on neighboring values to fill in the missing data points, runs
of zeroes in the original data carry over into the interpolated features. <br>
<br>As for cubic spline interpolation, the trailing zeroes represent a flat region rather than a curve that
fits the data points. This simply means that certain features follow a constant trend, not that
interpolation failed. <br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-6" src="images/ZL.png" alt=""
data-image-width="1118" data-image-height="241">
<p class="u-text u-text-10">
<span style="font-weight: 700;"></span>Notice that in the result of polynomial interpolation, the values of
the data points differ significantly from the original dataset. <br>
<br>This is due to the nature of the curve-fitting process, which makes the polynomial function pass
as close as possible to the data points rather than exactly through them. <br>
<br>As a result, the coefficients are adjusted to fit the data as a whole, and the interpolated result provides
estimated values that capture overall trends. <br>
</p>
</div>
</div>
</div>
</section>
<section class="u-clearfix u-gradient u-section-3" id="carousel_5971">
<div class="u-clearfix u-sheet u-sheet-1">
<div
class="u-align-justify u-container-style u-expanded-width u-group u-radius-50 u-shape-round u-white u-group-1"
data-animation-name="customAnimationIn" data-animation-duration="1500">
<div class="u-container-layout u-container-layout-1">
<div class="u-container-style u-group u-radius-50 u-shape-round u-white u-group-2"
data-animation-name="customAnimationIn" data-animation-duration="1500" data-animation-direction="X"
data-animation-delay="0">
<div class="u-container-layout u-container-layout-2">
<h4 class="gradient u-text u-text-1"> Binning</h4>
</div>
</div>
<p class="u-text u-text-2"> The second portion of time series analysis summarizes our dataset
using discrete bins that group the data points for a more inclusive feature analysis and easier
visualization. <br>
<br>In previous sections, we emphasized that the features of our dataset were divided into groups
according to the nature or level of the data points. <br>
<br>We took this grouping into account when devising the interval bins for this segment of our data
exploration, such that each group of features has its own set of bins. <br>
<br>The pandas <span style="font-weight: 700;">cut() </span>function was the primary tool used to
create the fixed bins that follow, which discretize our dataset’s numerical and temporal
values.<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-1" src="images/ZM.png" alt=""
data-image-width="1118" data-image-height="151">
<p class="u-text u-text-3"> For each group of features, a designated copy of the original dataframe is
instantiated for the binning process to avoid transformation errors.<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-2" src="images/ZN.png" alt=""
data-image-width="1118" data-image-height="265">
<p class="u-text u-text-4"> The temporal data columns are two features that describe the account that
posted a tweet and when the tweet was posted. <br>
<br>We used two different sets of fixed bins for the temporal data since each column pertains to a different
aspect of the data, i.e., the account and the tweet itself. <br>
<br>For the 'Joined (MM/YYYY)' feature, we divided the time series into intervals of four
years starting from when Twitter was first launched. This helps us categorize an account according to
how old or new it is. <br>
<br>The 'Date Posted (DD/MM/YY H:M:S)' feature is then divided into intervals of six months. The first
interval covers the first six months of the pandemic, followed by succeeding six-month intervals, ending with
the last six months of 2022. <br>
<br>The built-in pandas utilities <span style="font-weight: 700;">pd.date_range</span> and <span
style="font-weight: 700;">pd.to_datetime</span> were used to ensure that the intervals are
compared with values of the correct datetime type before each value is assigned its respective IntervalIndex dtype
value.<br>
</p>
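<p class="u-text"> The temporal binning described above could look like the sketch below. The sample values are hypothetical, and the exact start dates of the bin edges (1 January 2006 for 'Joined' and 1 January 2020 for 'Date Posted') are assumptions for illustration; the original edges are only visible in the screenshot. </p>
<pre>
```python
import pandas as pd

# Hypothetical sample values for the two temporal columns.
joined = pd.to_datetime(pd.Series(["03/2009", "11/2015", "06/2021"]), format="%m/%Y")
posted = pd.to_datetime(
    pd.Series(["15/03/20 08:30:00", "02/11/20 21:05:10", "20/09/22 12:00:00"]),
    format="%d/%m/%y %H:%M:%S",
)

# Four-year edges starting in 2006, the year Twitter launched (assumed start date).
joined_edges = pd.date_range(start="2006-01-01", periods=6, freq=pd.DateOffset(years=4))
# Six-month edges; the 2020-01-01 start is an assumption for illustration.
posted_edges = pd.date_range(start="2020-01-01", periods=7, freq=pd.DateOffset(months=6))

# pd.cut assigns each timestamp to the (left, right] interval that contains it.
joined_binned = pd.cut(joined, bins=joined_edges)
posted_binned = pd.cut(posted, bins=posted_edges)
print(joined_binned)
print(posted_binned)
```
</pre>
<p class="u-text"> Converting the columns with <code>pd.to_datetime</code> before calling <code>pd.cut</code> keeps the values and the bin edges in the same datetime type, which is what allows the comparison to work. </p>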
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-3" src="images/ZO.png" alt=""
data-image-width="1118" data-image-height="92">
<p class="u-text u-text-5"> Columns containing interval data are features that pertain to the number of days
in connection with significant COVID-19-related events that took place over the course of the
pandemic. <br>
<br>To allow a more uniform comparison and analysis, their fixed bins span intervals of
three months, or ninety days. <br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-4" src="images/ZP.png" alt=""
data-image-width="1118" data-image-height="170">
<p class="u-text u-text-6"> The last group of columns describes characteristics of the gathered tweets
themselves and consists of counts that relate to their influence on the social media platform. <br>
<br>Both the 'Following' and 'Followers' columns are attributes that identify the account owners, so we created
separate fixed bins for these, since their values are of a higher magnitude than the other features in this group. <br>
<br>As for the rest of the features, their fixed bins span intervals of ten counts. To include
zero in the first bin, the starting value is an open bound at negative one (-1). <br>
</p>
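<p class="u-text"> The reason for starting the count bins at negative one can be seen in a short sketch. The 'Replies' series and the upper edge of 39 are hypothetical; only the ten-count width and the -1 starting value come from the description above. </p>
<pre>
```python
import pandas as pd

# Hypothetical engagement counts for one feature.
replies = pd.Series([0, 9, 10, 35], name="Replies")

# Edges -1, 9, 19, 29, 39 give bins of ten counts each. pd.cut builds
# right-closed intervals, so starting at -1 places zero inside (-1, 9].
edges = list(range(-1, 40, 10))
replies_binned = pd.cut(replies, bins=edges)
print(replies_binned)
```
</pre>
<p class="u-text"> Had the first edge been 0, the default right-closed intervals would exclude a count of exactly zero and leave it unbinned. </p>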
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-5" src="images/ZQ.png" alt=""
data-image-width="1118" data-image-height="151">
<p class="u-text u-text-7"> After the main binning steps are executed, all of the copies of binned columns are
concatenated into one dataframe for later use with the pandas <span
style="font-weight: 700;">pd.concat()</span> function.<br>
</p>
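<p class="u-text"> As a rough sketch of this concatenation step, two small binned copies can be joined column-wise. The column names and values are hypothetical, and <code>axis=1</code> is assumed here because the copies share the same rows. </p>
<pre>
```python
import pandas as pd

# Hypothetical binned copies, one per feature group, sharing the same row index.
days_binned = pd.DataFrame(
    {"Days (binned)": pd.cut(pd.Series([12, 200]), bins=[0, 90, 180, 270])}
)
counts_binned = pd.DataFrame(
    {"Replies (binned)": pd.cut(pd.Series([0, 35]), bins=[-1, 9, 19, 29, 39])}
)

# axis=1 places the binned columns side by side, aligned on the row index.
binned_df = pd.concat([days_binned, counts_binned], axis=1)
print(binned_df)
```
</pre>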
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-6" src="images/ZR.png" alt=""
data-image-width="1118" data-image-height="241">
<p class="u-text u-text-8"> As the
<span style="font-weight: 700;">print()</span> output shows, the columns containing temporal data now hold
interval values whose bounds are the dates and timestamps defined in their respective fixed bins.<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-7" src="images/ZS.png" alt=""
data-image-width="1118" data-image-height="444">
<p class="u-text u-text-9"> Columns containing the number of days in connection with particular COVID-19-related
events also display their matching interval bins. Notice that the columns containing <span
style="font-weight: 700;">IntervalIndex dtype</span> values are also renamed accordingly.<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-8" src="images/ZT.png" alt=""
data-image-width="1118" data-image-height="237">
<p class="u-text u-text-10"> Lastly, the ratio-level data points and their attributes also show
suitable fixed bin values that properly categorize the tweet’s influence based on the magnitude of each feature
count.<br>
</p>
<img class="u-image u-image-round u-preserve-proportions u-radius-20 u-image-9" src="images/ZU.png" alt=""
data-image-width="1118" data-image-height="241">
<p class="u-text u-text-11"> Combining all the binned copies of the main dataframe, as seen above, will
prepare us for a more uniform and systematic analysis in later segments of our project. <br>
<br>This will serve as one separate dataframe that we can use further to arrive at more comprehensive
plots. <br>
</p>
</div>
</div>
</div>
</section>
<footer class="u-align-center u-clearfix u-footer u-grey-80 u-footer" id="sec-d432">
<div class="u-clearfix u-sheet u-sheet-1">
<p class="u-align-left u-small-text u-text u-text-variant u-text-1">&copy; VaccineVerity 2023. All Rights Reserved.</p>
</div>
</footer>
</body>
</html>