@@ -20,7 +20,7 @@ Transcript show: 'Total Average Price: ', averagePrice asString; cr.
 This works perfectly for small files. But here's what happens when your CSV file grows from 1MB to 100MB to 1GB:
 
 **The Problems:**
-- **Memory Overhead**: DataFrames load the entire file into memory, which can be 10x larger than the file size itself. For a 1GB file, you might end up using 10GB of RAM.
+- **Memory Overhead**: DataFrames load the entire file into memory and create many intermediate objects along the way, so the total memory allocated can be **hundreds of times** larger than the file itself. In the benchmark below, a ~15MB file triggered over 12GB of allocations.
 - **Garbage Collection**: As the DataFrame grows, garbage collection kicks in frequently, slowing down your program. This is because DataFrames create many temporary objects that need to be cleaned up.
 - **Performance**: Calculating the average involves iterating through potentially millions of rows, which can take a long time. The more data you have, the slower it gets.
 - **Crashes**: If the file is larger than your available RAM, you get "Out of Memory" errors, causing your program to crash.
@@ -76,6 +76,9 @@ Transcript show: 'Generated file: ', (stockDataFile asFileReference size / 1024
 ```
 This code generates a CSV file with 500,000 rows of realistic stock market data, including serial numbers, prices, daily lows, and highs. Each price is a random fluctuation around the previous day's price.
 ## Performance Testing: The Numbers Tell the Story
+
+> **Important Note for Benchmarking**: Before running these benchmarks, close unnecessary background processes and make sure your system is not under heavy load. This helps keep the results accurate and consistent.
+
 Now let's compare the performance of the DataFrame approach with the circular buffer approach for calculating moving averages.
 
 ### Test Setup
@@ -84,12 +87,13 @@ Now let's compare the performance of the DataFrame approach with the circular bu
 Transcript show: 'Testing DataFrame approach...'; cr.
 3 timesRepeat: [ Smalltalk garbageCollect ].
 [
-	| startTime endTime memoryBefore memoryAfter gcBefore gcAfter gcTimeBefore gcTimeAfter
-	stockData priceColumn movingAverages |
+	| startTime endTime allocatedMemory numberOfScavenges numberOfFullGCs totalGCTime
+	stockData priceColumn movingAverages |
 
-	memoryBefore := Smalltalk vm parameterAt: 3.
-	gcBefore := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-	gcTimeBefore := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
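+	allocatedMemory := Smalltalk vm parameterAt: 34. "bytes allocated since VM start-up"
+	numberOfScavenges := Smalltalk vm parameterAt: 9. "scavenges (young-space GCs) since start-up"
+	numberOfFullGCs := Smalltalk vm parameterAt: 7. "full GCs since start-up"
+	totalGCTime := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10). "ms in full GCs + ms in scavenges"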
 	startTime := Time millisecondClockValue.
 
 	"Load entire dataset"
@@ -110,14 +114,17 @@ Transcript show: 'Testing DataFrame approach...'; cr.
 	].
 
 	endTime := Time millisecondClockValue.
-	memoryAfter := Smalltalk vm parameterAt: 3.
-	gcAfter := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-	gcTimeAfter := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+	allocatedMemory := (Smalltalk vm parameterAt: 34) - allocatedMemory.
+	numberOfScavenges := (Smalltalk vm parameterAt: 9) - numberOfScavenges.
+	numberOfFullGCs := (Smalltalk vm parameterAt: 7) - numberOfFullGCs.
+	totalGCTime := ((Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10)) - totalGCTime.
+
 	Transcript show: 'DataFrame Test Results:'; cr.
 	Transcript show: ' Time: ', (endTime - startTime) asString, ' ms'; cr.
-	Transcript show: ' Memory: ', ((memoryAfter - memoryBefore) / 1024 / 1024) rounded asString, ' MB'; cr.
-	Transcript show: ' GC Events: ', (gcAfter - gcBefore) asString; cr.
-	Transcript show: ' GC Time: ', (gcTimeAfter - gcTimeBefore) asString, ' ms'; cr.
+	Transcript show: ' Memory Allocated: ', (allocatedMemory / 1024 / 1024) rounded asString, ' MB'; cr.
+	Transcript show: ' Scavenges: ', numberOfScavenges asString; cr.
+	Transcript show: ' Full GCs: ', numberOfFullGCs asString; cr.
+	Transcript show: ' Total GC Time: ', totalGCTime asString, ' ms'; cr.
 	Transcript show: ' Moving Averages: ', movingAverages size asString; cr.
 	Transcript show: ' Final MA: $', (movingAverages last printShowingDecimalPlaces: 2); cr; cr.
 
@@ -132,12 +139,13 @@ Transcript show: 'Testing Buffer approach...'; cr.
 
 3 timesRepeat: [ Smalltalk garbageCollect ].
 [
-	| startTime endTime memoryBefore memoryAfter gcBefore gcAfter gcTimeBefore gcTimeAfter
+	| startTime endTime allocatedMemory numberOfScavenges numberOfFullGCs totalGCTime
 	priceBuffer movingAverages processedCount |
 
-	memoryBefore := Smalltalk vm parameterAt: 3.
-	gcBefore := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-	gcTimeBefore := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
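+	allocatedMemory := Smalltalk vm parameterAt: 34. "bytes allocated since VM start-up"
+	numberOfScavenges := Smalltalk vm parameterAt: 9. "scavenges (young-space GCs) since start-up"
+	numberOfFullGCs := Smalltalk vm parameterAt: 7. "full GCs since start-up"
+	totalGCTime := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10). "ms in full GCs + ms in scavenges"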
 	startTime := Time millisecondClockValue.
 
 	priceBuffer := CTFIFOBuffer new: windowSize.
@@ -172,15 +180,17 @@ Transcript show: 'Testing Buffer approach...'; cr.
 	].
 
 	endTime := Time millisecondClockValue.
-	memoryAfter := Smalltalk vm parameterAt: 3.
-	gcAfter := (Smalltalk vm parameterAt: 7) + (Smalltalk vm parameterAt: 9).
-	gcTimeAfter := (Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10).
+	allocatedMemory := (Smalltalk vm parameterAt: 34) - allocatedMemory.
+	numberOfScavenges := (Smalltalk vm parameterAt: 9) - numberOfScavenges.
+	numberOfFullGCs := (Smalltalk vm parameterAt: 7) - numberOfFullGCs.
+	totalGCTime := ((Smalltalk vm parameterAt: 8) + (Smalltalk vm parameterAt: 10)) - totalGCTime.
 
 	Transcript show: 'Buffer Results:'; cr.
 	Transcript show: ' Time: ', (endTime - startTime) asString, ' ms'; cr.
-	Transcript show: ' Memory: ', ((memoryAfter - memoryBefore) / 1024 / 1024) rounded asString, ' MB'; cr.
-	Transcript show: ' GC Events: ', (gcAfter - gcBefore) asString; cr.
-	Transcript show: ' GC Time: ', (gcTimeAfter - gcTimeBefore) asString, ' ms'; cr.
+	Transcript show: ' Memory Allocated: ', (allocatedMemory / 1024 / 1024) rounded asString, ' MB'; cr.
+	Transcript show: ' Scavenges: ', numberOfScavenges asString; cr.
+	Transcript show: ' Full GCs: ', numberOfFullGCs asString; cr.
+	Transcript show: ' Total GC Time: ', totalGCTime asString, ' ms'; cr.
 	Transcript show: ' Moving Averages: ', movingAverages size asString; cr.
 	Transcript show: ' Final MA: $', (movingAverages last printShowingDecimalPlaces: 2); cr; cr.
 ] value.
@@ -195,19 +205,20 @@ Transcript show: 'Tests Done!'; cr.
 Here are the results from running this benchmark on a 500,000-row dataset (approximately 15MB file):
 | Metric | DataFrame | Circular Buffer | Improvement |
 | ------ | --------- | --------------- | ----------- |
-| **Execution Time** | ~15,100 ms | ~2,100 ms | **7.2x faster** |
-| **Memory Usage** | ~128 MB | ~16 MB | **8x less memory** |
-| **GC Events** | ~870 | ~52 | **94% fewer** |
-| **GC Time** | ~3,500 ms | ~3 ms | **1200x less GC overhead** |
+| **Execution Time** | 14,986 ms | 2,136 ms | **7.0x faster** |
+| **Memory Allocated** | 12,914 MB | 785 MB | **16.5x less memory allocated** |
+| **Scavenges** | 867 | 52 | **94% fewer** |
+| **Full GCs** | 4 | 0 | **100% fewer** |
+| **Total GC Time** | 3,769 ms | 3 ms | **~1,256x less GC overhead** |
 | **Results Generated** | 499,901 | 499,901 | Identical results |
 
 *Note: Results may vary based on your hardware, Pharo version, and system load.*
 
 **Key Insights:**
-- Circular buffers processed the same data **7.2x faster**
-- Used **8x less memory** despite processing the same amount of data
-- Had **94% fewer garbage collection events**, leading to smoother performance
-- Spent virtually no time on memory cleanup (3ms vs 3.5+ seconds)
+- Circular buffers processed the same data **7.0x faster**
+- Allocated **16.5x less memory** while processing the same amount of data
+- Had **94% fewer scavenges and no full GCs at all**, leading to smoother performance
+- Spent virtually no time on garbage collection (3 ms vs. ~3.8 seconds)
 - Produced identical results, proving accuracy isn't compromised
 
 ## When to Use Each Approach
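Both tests in this commit repeat the same before/after bookkeeping around `Smalltalk vm parameterAt:`. A minimal sketch of how that could be factored into one reusable block, assuming a Pharo image on the Spur VM, where parameter 34 is the total bytes allocated since start-up, 7 and 8 are the full-GC count and milliseconds, and 9 and 10 are the scavenge count and milliseconds. `measureBlock` and the result keys (`#timeMs`, `#allocatedBytes`, and so on) are hypothetical names, not part of Pharo or of the original benchmark.

```smalltalk
"Hypothetical helper (not part of Pharo): runs aBlock once and answers a
 Dictionary of deltas for the same VM counters the benchmark reads.
 Assumes Spur VM parameter numbering: 34 = bytes allocated,
 7/8 = full GC count/ms, 9/10 = scavenge count/ms."
| measureBlock results |
measureBlock := [ :aBlock |
	| vm indices before after startTime elapsed |
	vm := Smalltalk vm.
	indices := #(34 9 7 8 10).
	"Settle the heap first, as the original tests do."
	3 timesRepeat: [ Smalltalk garbageCollect ].
	before := indices collect: [ :each | vm parameterAt: each ].
	startTime := Time millisecondClockValue.
	aBlock value.
	elapsed := Time millisecondClockValue - startTime.
	after := indices collect: [ :each | vm parameterAt: each ].
	Dictionary new
		at: #timeMs put: elapsed;
		at: #allocatedBytes put: (after at: 1) - (before at: 1);
		at: #scavenges put: (after at: 2) - (before at: 2);
		at: #fullGCs put: (after at: 3) - (before at: 3);
		at: #gcTimeMs put: ((after at: 4) + (after at: 5)) - ((before at: 4) + (before at: 5));
		yourself ].

"Example: measure a deliberately allocation-heavy block."
results := measureBlock value: [ (1 to: 1000000) asOrderedCollection ].
Transcript
	show: 'Memory Allocated: ', ((results at: #allocatedBytes) / 1024 / 1024) rounded asString, ' MB'; cr;
	show: 'Scavenges: ', (results at: #scavenges) asString; cr;
	show: 'Full GCs: ', (results at: #fullGCs) asString; cr;
	show: 'Total GC Time: ', (results at: #gcTimeMs) asString, ' ms'; cr.
```

With a helper like this, each approach would reduce to a single `measureBlock value: [ ... ]` call, which keeps the two measurements symmetric by construction.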
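The insights above attribute the low allocation and GC numbers to the circular buffer's fixed footprint. A minimal sketch of that idea in plain Pharo with no extra packages: a fixed-size `Array` used as a ring, plus a running sum, so each new price costs one subtraction, one addition, and one slot overwrite instead of a new window object. This is an illustration under those assumptions, not the article's `CTFIFOBuffer`-based code; the variable names and the sample prices are hypothetical.

```smalltalk
"Hypothetical sketch: moving average over a fixed-size ring with a running sum.
 Not the CTFIFOBuffer code from the article."
| windowSize ring writeIndex count runningSum prices movingAverages |
windowSize := 3.
ring := Array new: windowSize withAll: 0.
writeIndex := 1.
count := 0.
runningSum := 0.
prices := #(10 11 12 13 14).
movingAverages := OrderedCollection new.
prices do: [ :price |
	"Once the window is full, drop the value about to be overwritten."
	count = windowSize
		ifTrue: [ runningSum := runningSum - (ring at: writeIndex) ]
		ifFalse: [ count := count + 1 ].
	ring at: writeIndex put: price.
	runningSum := runningSum + price.
	"Advance the write position, wrapping from windowSize back to 1."
	writeIndex := writeIndex \\ windowSize + 1.
	"Emit an average only when a full window is available."
	count = windowSize
		ifTrue: [ movingAverages add: runningSum / windowSize ] ].
movingAverages. "an OrderedCollection(11 12 13)"
```

The sketch emits rows minus window plus one averages, so 500,000 rows would yield the table's 499,901 results if the window were 100 elements. With real `Float` prices a running sum can drift slightly over hundreds of thousands of updates; periodically recomputing the sum from the ring is one common way to bound that.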