Commit 62c604e

Merge pull request #10 from Alokzh/fix-minor-naming-issues
Updated DataFrame vs Buffer blogpost & added new alias methods
2 parents 24038e8 + 0b71af2 commit 62c604e

3 files changed

Lines changed: 85 additions & 50 deletions

docs/DataFrame vs Buffer.md

Lines changed: 41 additions & 30 deletions
@@ -20,7 +20,7 @@ Transcript show: 'Total Average Price: ', averagePrice asString; cr.
 This works perfectly for small files. But here's what happens when your CSV file grows from 1MB to 100MB to 1GB:

 **The Problems:**
-- **Memory Overhead**: DataFrames load the entire file into memory, which can be 10x larger than the file size itself. For a 1GB file, you might end up using 10GB of RAM.
+- **Memory Overhead**: DataFrames load the entire file into memory, which can be **100x larger** than the file size itself. For a 16MB file, you might end up using over 12GB of RAM.
 - **Garbage Collection**: As the DataFrame grows, garbage collection kicks in frequently, slowing down your program. This is because DataFrames create many temporary objects that need to be cleaned up.
 - **Performance**: Calculating the average involves iterating through potentially millions of rows, which can take a long time. The more data you have, the slower it gets.
 - **Crashes**: If the file is larger than your available RAM, you get "Out of Memory" errors, causing your program to crash.
@@ -76,6 +76,9 @@ Transcript show: 'Generated file: ', (stockDataFile asFileReference size / 1024
 ```
 This code generates a CSV file with 500,000 rows of realistic stock market data, including serial numbers, prices, daily lows, and highs. Each price is a random fluctuation around the previous day's price.
 ## Performance Testing: The Numbers Tell the Story
+
+> **Important Note for Benchmarking**: Before running these benchmarks, please disable background processes & ensure your system is not under heavy load. This will help ensure accurate and consistent benchmark results.
+
 Now let's compare the performance of the DataFrame approach with the circular buffer approach for calculating moving averages.

 ### Test Setup
@@ -84,12 +87,13 @@ Now let's compare the performance of the DataFrame approach with the circular bu
 Transcript show: 'Testing DataFrame approach...'; cr.
 3 timesRepeat: [ Smalltalk garbageCollect ].
 [
-| startTime endTime memoryBefore memoryAfter gcBefore gcAfter gcTimeBefore gcTimeAfter
-stockData priceColumn movingAverages |
+| startTime endTime allocatedMemory numberOfScavenges numberOfFullGCs totalGCTime
+stockData priceColumn movingAverages |

-memoryBefore := Smalltalk vm parameterAt: 3.
-gcBefore := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-gcTimeBefore := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+allocatedMemory := Smalltalk vm parameterAt: 34.
+numberOfScavenges := Smalltalk vm parameterAt: 9.
+numberOfFullGCs := Smalltalk vm parameterAt: 7.
+totalGCTime := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
 startTime := Time millisecondClockValue.

 "Load entire dataset"
@@ -110,14 +114,17 @@ Transcript show: 'Testing DataFrame approach...'; cr.
 ].

 endTime := Time millisecondClockValue.
-memoryAfter := Smalltalk vm parameterAt: 3.
-gcAfter := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-gcTimeAfter := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+allocatedMemory := (Smalltalk vm parameterAt: 34) - allocatedMemory.
+numberOfScavenges := (Smalltalk vm parameterAt: 9) - numberOfScavenges.
+numberOfFullGCs := (Smalltalk vm parameterAt: 7) - numberOfFullGCs.
+totalGCTime := ((Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10)) - totalGCTime.
+
 Transcript show: 'DataFrame Test Results:'; cr.
 Transcript show: ' Time: ', (endTime - startTime) asString, ' ms'; cr.
-Transcript show: ' Memory: ', ((memoryAfter - memoryBefore) / 1024 / 1024) rounded asString, ' MB'; cr.
-Transcript show: ' GC Events: ', (gcAfter - gcBefore) asString; cr.
-Transcript show: ' GC Time: ', (gcTimeAfter - gcTimeBefore) asString, ' ms'; cr.
+Transcript show: ' Memory Allocated: ', (allocatedMemory / 1024 / 1024) rounded asString, ' MB'; cr.
+Transcript show: ' Scavenges: ', numberOfScavenges asString; cr.
+Transcript show: ' Full GCs: ', numberOfFullGCs asString; cr.
+Transcript show: ' Total GC Time: ', totalGCTime asString, ' ms'; cr.
 Transcript show: ' Moving Averages: ', movingAverages size asString; cr.
 Transcript show: ' Final MA: $', (movingAverages last roundTo: 0.01) asString; cr; cr.
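Review note: the revised benchmark snapshots VM counters before the workload and subtracts them afterwards. The same pattern can be factored into a reusable block. The sketch below is hypothetical: the names `gcCounters` and `measure` and the sample workload are not part of this commit. It assumes a Spur-based Pharo VM, where parameter 34 counts bytes allocated since start-up, parameters 7 and 9 count full GCs and scavenges, and parameters 8 and 10 hold the milliseconds spent in each.

```smalltalk
"Hypothetical helper: snapshot the counters used by the benchmark,
 run a block, and answer the per-counter deltas."
| gcCounters measure deltas |
gcCounters := #( 34 9 7 8 10 ).
measure := [ :workBlock |
	| before after |
	before := gcCounters collect: [ :index | Smalltalk vm parameterAt: index ].
	workBlock value.
	after := gcCounters collect: [ :index | Smalltalk vm parameterAt: index ].
	after with: before collect: [ :later :earlier | later - earlier ] ].

"Sample workload (an assumption, chosen only so that something gets allocated)."
deltas := measure value: [ (1 to: 100000) collect: [ :each | each printString ] ].

Transcript
	show: 'Allocated bytes: ', (deltas at: 1) asString; cr;
	show: 'Scavenges: ', (deltas at: 2) asString; cr;
	show: 'Full GCs: ', (deltas at: 3) asString; cr;
	show: 'GC time: ', ((deltas at: 4) + (deltas at: 5)) asString, ' ms'; cr.
```

Because parameter 34 is a cumulative counter, its delta reflects the total bytes allocated during the run, not the peak amount of memory held at any one moment.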
@@ -132,12 +139,13 @@ Transcript show: 'Testing Buffer approach...'; cr.

 3 timesRepeat: [ Smalltalk garbageCollect ].
 [
-| startTime endTime memoryBefore memoryAfter gcBefore gcAfter gcTimeBefore gcTimeAfter
+| startTime endTime allocatedMemory numberOfScavenges numberOfFullGCs totalGCTime
 priceBuffer movingAverages processedCount |

-memoryBefore := Smalltalk vm parameterAt: 3.
-gcBefore := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-gcTimeBefore := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+allocatedMemory := Smalltalk vm parameterAt: 34.
+numberOfScavenges := Smalltalk vm parameterAt: 9.
+numberOfFullGCs := Smalltalk vm parameterAt: 7.
+totalGCTime := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
 startTime := Time millisecondClockValue.

 priceBuffer := CTFIFOBuffer new: windowSize.
@@ -172,15 +180,17 @@ Transcript show: 'Testing Buffer approach...'; cr.
 ].

 endTime := Time millisecondClockValue.
-memoryAfter := Smalltalk vm parameterAt: 3.
-gcAfter := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-gcTimeAfter := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+allocatedMemory := (Smalltalk vm parameterAt: 34) - allocatedMemory.
+numberOfScavenges := (Smalltalk vm parameterAt: 9) - numberOfScavenges.
+numberOfFullGCs := (Smalltalk vm parameterAt: 7) - numberOfFullGCs.
+totalGCTime := ((Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10)) - totalGCTime.

 Transcript show: 'Buffer Results:'; cr.
 Transcript show: ' Time: ', (endTime - startTime) asString, ' ms'; cr.
-Transcript show: ' Memory: ', ((memoryAfter - memoryBefore) / 1024 / 1024) rounded asString, ' MB'; cr.
-Transcript show: ' GC Events: ', (gcAfter - gcBefore) asString; cr.
-Transcript show: ' GC Time: ', (gcTimeAfter - gcTimeBefore) asString, ' ms'; cr.
+Transcript show: ' Memory Allocated: ', (allocatedMemory / 1024 / 1024) rounded asString, ' MB'; cr.
+Transcript show: ' Scavenges: ', numberOfScavenges asString; cr.
+Transcript show: ' Full GCs: ', numberOfFullGCs asString; cr.
+Transcript show: ' Total GC Time: ', totalGCTime asString, ' ms'; cr.
 Transcript show: ' Moving Averages: ', movingAverages size asString; cr.
 Transcript show: ' Final MA: $', (movingAverages last roundTo: 0.01) asString; cr; cr.
 ] value.
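Review note: the body of the buffer loop falls between the hunks and is not shown in this diff. As an illustration only, here is a minimal, self-contained sketch of a fixed-window moving average over a ring of values. It uses a plain `Array` rather than `CTFIFOBuffer`, and the window size of 3, the sample prices, and all variable names are assumptions made for the example.

```smalltalk
"Moving average over a fixed window, keeping only the last windowSize values."
| windowSize ring writeIndex count runningSum prices averages |
windowSize := 3.
ring := Array new: windowSize withAll: 0.
writeIndex := 1.
count := 0.
runningSum := 0.
prices := #( 10 20 30 40 50 ).
averages := OrderedCollection new.

prices do: [ :price |
	"Drop the value about to be overwritten from the sum, then store the new one."
	runningSum := runningSum - (ring at: writeIndex) + price.
	ring at: writeIndex put: price.
	"Advance the write position, wrapping back to 1 after the last slot."
	writeIndex := writeIndex \\ windowSize + 1.
	count := count + 1 min: windowSize.
	"Emit an average only once the window is full."
	count = windowSize ifTrue: [ averages add: runningSum / windowSize ] ].

averages asArray "→ #(20 30 40)"
```

A window of size w over n inputs yields n - w + 1 averages (here 5 - 3 + 1 = 3). The 499,901 results reported for 500,000 rows are consistent with a window of 100, although the window size itself is not visible in this diff.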
@@ -195,19 +205,20 @@ Transcript show: 'Tests Done!'; cr.
 Here are the results from running this benchmark on a 500,000-row dataset (approximately 15MB file):
 | Metric | DataFrame | Circular Buffer | Improvement |
 |--------|-----------|-----------------|-------------|
-| **Execution Time** | ~15,100 ms | ~2,100 ms | **7.2x faster** |
-| **Memory Usage** | ~128 MB | ~16 MB | **8x less memory** |
-| **GC Events** | ~870 | ~52 | **94% fewer** |
-| **GC Time** | ~3,500 ms | ~3 ms | **1200x less GC overhead** |
+| **Execution Time** | 14,986 ms | 2,136 ms | **7.0x faster** |
+| **Memory Allocated** | 12,914 MB | 785 MB | **16.4x less memory** |
+| **Scavenges** | 867 | 52 | **94% fewer** |
+| **Full GCs** | 4 | 0 | **100% fewer** |
+| **Total GC Time** | 3,769 ms | 3 ms | **1,256x less GC overhead** |
 | **Results Generated** | 499,901 | 499,901 | Identical accuracy |

 *Note: Results may vary based on your hardware, Pharo version, and system load*

 **Key Insights:**
-- Circular buffers processed the same data **7.2x faster**
-- Used **8x less memory** despite processing the same amount of data
-- Had **94% fewer garbage collection events**, leading to smoother performance
-- Spent virtually no time on memory cleanup (3ms vs 3.5+ seconds)
+- Circular buffers processed the same data **7.0x faster**
+- Used **16.4x less memory** despite processing the same amount of data
+- Had **94% fewer scavenges and eliminated all full GCs**, leading to smoother performance
+- Spent virtually no time on garbage collection (3ms vs 3.8+ seconds)
 - Produced identical results, proving accuracy isn't compromised

 ## When to Use Each Approach
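Review note: the Improvement column in the updated table can be rechecked from the raw figures it quotes. The snippet below is a small sanity check, not part of the commit; the variable names are illustrative.

```smalltalk
"Recompute the Improvement column from the figures in the updated table."
| timeRatio memoryRatio scavengeReduction gcTimeRatio |
timeRatio := 14986 / 2136.
memoryRatio := 12914 / 785.
scavengeReduction := (867 - 52) / 867 * 100.
gcTimeRatio := 3769 / 3.

Transcript
	show: 'Time ratio: ', (timeRatio printShowingDecimalPlaces: 2); cr;
	show: 'Memory ratio: ', (memoryRatio printShowingDecimalPlaces: 2); cr;
	show: 'Scavenge reduction (%): ', (scavengeReduction printShowingDecimalPlaces: 1); cr;
	show: 'GC time ratio: ', (gcTimeRatio printShowingDecimalPlaces: 0); cr.
```

The ratios work out to roughly 7.02, 16.45, 94.0% and 1,256, which agrees with the table. The memory ratio sits on the boundary between 16.4x and 16.5x, and since the megabyte figures are themselves rounded, either reading is plausible.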

src/Containers-Buffer-Tests/CTFIFOBufferTest.class.st

Lines changed: 20 additions & 0 deletions
@@ -16,6 +16,26 @@ CTFIFOBufferTest >> setUp [
 buffer := CTFIFOBuffer new: 3
 ]

+{ #category : 'tests' }
+CTFIFOBufferTest >> testAddAlias [
+
+buffer add: 'first'.
+self assert: buffer size equals: 1.
+self assert: buffer peek equals: 'first'.
+
+buffer add: 'second'.
+self assert: buffer size equals: 2
+]
+
+{ #category : 'tests' }
+CTFIFOBufferTest >> testAddAllAlias [
+
+| result |
+result := buffer addAll: #( 'a' 'b' 'c' ).
+self assert: buffer size equals: 3.
+self assert: buffer isFull
+]
+
 { #category : 'tests' }
 CTFIFOBufferTest >> testChatMessageQueue [

src/Containers-Buffer/CTAbstractBuffer.class.st

Lines changed: 24 additions & 20 deletions
@@ -25,13 +25,24 @@ CTAbstractBuffer class >> new [

 { #category : 'instance creation' }
 CTAbstractBuffer class >> new: anInteger [
-
 "Create a new buffer with the specified capacity"

 anInteger < 1 ifTrue: [ self error: 'Capacity must be positive' ].
 ^ self basicNew
-initializeWithCapacity: anInteger;
-yourself
+setupWithCapacity: anInteger;
+yourself
+]
+
+{ #category : 'adding' }
+CTAbstractBuffer >> add: anObject [
+
+^ self push: anObject
+]
+
+{ #category : 'adding' }
+CTAbstractBuffer >> addAll: aCollection [
+
+^ self pushAll: aCollection
 ]

 { #category : 'accessing' }
@@ -68,23 +79,6 @@ CTAbstractBuffer >> elements [
 ^ elements
 ]

-{ #category : 'initialization' }
-CTAbstractBuffer >> initialize [
-
-super initialize.
-self initializeWithCapacity: 10
-]
-
-{ #category : 'private' }
-CTAbstractBuffer >> initializeWithCapacity: anInteger [
-
-capacity := anInteger.
-elements := Array new: capacity.
-readIndex := 1.
-writeIndex := 1.
-currentSize := 0
-]
-
 { #category : 'testing' }
 CTAbstractBuffer >> isEmpty [
@@ -182,6 +176,16 @@ CTAbstractBuffer >> removeAll [
 currentSize := 0
 ]

+{ #category : 'private' }
+CTAbstractBuffer >> setupWithCapacity: anInteger [
+
+capacity := anInteger.
+elements := Array new: capacity.
+readIndex := 1.
+writeIndex := 1.
+currentSize := 0
+]
+
 { #category : 'accessing' }
 CTAbstractBuffer >> size [
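Review note: `add:` and `addAll:` simply forward to `push:` and `pushAll:`, so a buffer filled through either name should end up in the same state. The playground sketch below illustrates this using only messages that appear in this commit. It assumes the Containers-Buffer package from this repository is loaded, and the variable names are illustrative.

```smalltalk
"Assumes the Containers-Buffer package from this repository is loaded."
| viaPush viaAdd filled |
viaPush := CTFIFOBuffer new: 3.
viaPush push: 'first'.

viaAdd := CTFIFOBuffer new: 3.
viaAdd add: 'first'.

"Both buffers should report the same size and the same head element."
Transcript
	show: 'Same size: ', (viaPush size = viaAdd size) asString; cr;
	show: 'Same head: ', (viaPush peek = viaAdd peek) asString; cr.

"addAll: fills a capacity-3 buffer with three elements, as in testAddAllAlias."
filled := CTFIFOBuffer new: 3.
filled addAll: #( 'a' 'b' 'c' ).
Transcript show: 'Full: ', filled isFull asString; cr.
```

The aliases let the buffer be used where `Collection`-style `add:` and `addAll:` are expected. What either message answers depends on `push:` and `pushAll:`, which are not shown in this diff.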
