docs: update project for archival as a Go study milestone
- Add Core Learning Objectives and Final Thoughts to README
- Update architecture documentation to focus on learning outcomes
- Mark CONTRIBUTING as archived reference
- Maintain original branding and layout
Thank you for your interest! This project is currently **archived** and no longer accepting new features or active maintenance.
## Project Purpose
The **Go File Processor** was created as a second learning project to explore Go's concurrency and streaming I/O. It remains available as a historical reference for idiomatic worker pools, streaming pipelines, and channel-based design in Go.
> **Note: Archival Project**
>
> This was my second major project in Go, built as a deep dive into the language's idiomatic concurrency patterns and high-performance I/O. It is now archived, but it serves as a solid reference for ETL (Extract, Transform, Load) implementations in Go.
**Go File Processor** is a high-performance command-line tool and library designed to efficiently convert massive CSV files (millions of records) into structured JSON. It demonstrates the power of Go's concurrency primitives to achieve maximum throughput with minimal memory overhead.
## 🚀 Core Learning Objectives
This project was a hands-on laboratory to master several Go concepts:
29
+
30
+
* **Concurrency via Worker Pool:** Leveraging `goroutines` and `channels` to process data in parallel without overwhelming the system.
* **Memory Efficiency (Streaming):** Using `io.Reader` and `io.Writer` to process gigabytes of data with a constant, tiny memory footprint (a sketch follows this list).
* **The Middleware Pattern:** Implementing a "Chain of Responsibility" for data transformation that is both flexible and type-safe.
* **Atomic Operations:** Using `sync/atomic` for high-speed metrics tracking, avoiding the overhead of mutexes.
* **Idiomatic Project Layout:** Following the standard Go folder structure (`cmd/`, `internal/`) and build automation with a `Makefile`.
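To make the streaming objective concrete, here is a minimal sketch of the idea, not the project's actual code; the two-column CSV layout and the JSON field names are assumed for illustration:

```go
// Minimal streaming sketch: records flow from an io.Reader to an
// io.Writer one at a time, so memory stays constant regardless of
// input size. Assumes an illustrative two-column CSV (name,email).
package main

import (
	"encoding/csv"
	"encoding/json"
	"io"
	"log"
	"os"
)

func stream(r io.Reader, w io.Writer) error {
	cr := csv.NewReader(r)
	enc := json.NewEncoder(w)
	for {
		record, err := cr.Read() // one row at a time; never the whole file
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		// Illustrative mapping; real column names are assumptions.
		row := map[string]string{"name": record[0], "email": record[1]}
		if err := enc.Encode(row); err != nil {
			return err
		}
	}
}

func main() {
	if err := stream(os.Stdin, os.Stdout); err != nil {
		log.Fatal(err)
	}
}
```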
## Demonstration
### As a Library
Add transformers and configure the execution pool fluently:
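The original snippet is not reproduced here, so the following is a hypothetical sketch of what fluent usage might look like; `NewProcessor`, `WithWorkers`, `AddTransformer`, `Run`, and the `User` fields are invented for illustration, not the package's confirmed API:

```go
// Hypothetical usage sketch. NewProcessor, WithWorkers, AddTransformer,
// Run, and the User fields are assumed names, not the confirmed API.
package main

import (
	"log"
	"os"

	fileproc "github.com/ESousa97/go-file-processor"
)

func main() {
	in, err := os.Open("users.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("users.json")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Configure the pool and transformation chain fluently.
	p := fileproc.NewProcessor().
		WithWorkers(4).
		AddTransformer(func(u *fileproc.User) bool {
			return u.Email != "" // drop records without an email
		})

	if err := p.Run(in, out); err != nil {
		log.Fatal(err)
	}
}
```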
Detailed technical documentation is available at [pkg.go.dev/github.com/ESousa97/go-file-processor](https://pkg.go.dev/github.com/ESousa97/go-file-processor).
The `Makefile` automates the most common tasks:

| Command | Description |
| --- | --- |
| `make build` | Compiles the `fileproc` binary. |
| `make test` | Runs the full unit test suite. |
| `make bench` | Runs benchmarks comparing parallel vs. sequential throughput. |
| `make generate-data` | Generates a 100k-row test file for performance testing. |

Key CLI flags:

| Flag | Description | Type | Default |
| --- | --- | --- | --- |
| `-workers` | Number of concurrent workers | `int` | `4` |
## Final Thoughts

Building this project taught me that Go isn't just about syntax; it's about a philosophy of simplicity and performance. The transition from sequential processing to a parallel worker pool showed me how Go empowers developers to build tools that scale effortlessly.
## License

This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.
---
<div align="center">
Made with ❤️ by [Enoque Sousa](https://github.com/ESousa97)

</div>

---
This document serves as a reference for the design decisions made during the development of the **Go File Processor**. The project was built as a study of system architecture in Go.
## The Streaming Pipeline
The primary goal was to achieve high throughput with **constant memory usage**. We implemented a pipeline that processes data record by record, never loading the entire file into RAM.
```mermaid
graph TD
    %% Simplified sketch of the pipeline described above
    A[CSV Input / io.Reader] -->|records| B(Jobs Channel)
    subgraph Worker Pool
        C1[Worker 1]
        C2[Worker N]
    end
    B --> C1
    B --> C2
    C1 -->|transformed records| D(Results Channel)
    C2 -->|transformed records| D
    D --> E[JSON Output / io.Writer]
```
## Core Architectural Lessons
### 1. The Worker Pool Pattern

**Learning Goal**: Understand how to scale processing by decoupling the producer from the consumers using channels.

**Implementation**: A fixed number of goroutines (workers) listen on a shared channel.

**Outcome**: High CPU utilization across all cores without manual thread management.
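A condensed, self-contained sketch of the pattern under these assumptions (the `Record` type and the work done per record are placeholders):

```go
// Worker-pool sketch: a fixed number of goroutines consume jobs
// from a shared channel, decoupling the producer from the consumers.
package main

import (
	"fmt"
	"sync"
)

type Record struct{ Line string } // placeholder record type

func worker(jobs <-chan Record, results chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for rec := range jobs { // each worker pulls from the same channel
		results <- fmt.Sprintf("processed: %s", rec.Line)
	}
}

func main() {
	jobs := make(chan Record)
	results := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < 4; i++ { // fixed pool size
		wg.Add(1)
		go worker(jobs, results, &wg)
	}

	go func() { // producer is decoupled from the consumers
		for _, l := range []string{"a", "b", "c"} {
			jobs <- Record{Line: l}
		}
		close(jobs)
	}()

	go func() { wg.Wait(); close(results) }()

	for r := range results {
		fmt.Println(r)
	}
}
```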
### 2. Backpressure Management

**Learning Goal**: Understand how to prevent the producer from overwhelming the consumer.

**Implementation**: Using buffered channels as a "shock absorber" for data bursts.
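A small sketch of the idea, with the buffer size and timings chosen only for illustration:

```go
// Buffered-channel backpressure: the buffer absorbs short bursts,
// and once it is full the send blocks, throttling the producer
// down to the consumer's pace.
package main

import (
	"fmt"
	"time"
)

func main() {
	jobs := make(chan int, 64) // the "shock absorber"

	go func() {
		for i := 0; i < 200; i++ {
			jobs <- i // blocks only when the buffer is full
		}
		close(jobs)
	}()

	for j := range jobs {
		time.Sleep(time.Millisecond) // deliberately slow consumer
		_ = j
	}
	fmt.Println("done")
}
```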
### 3. Decoupled Middleware

**Learning Goal**: Implementing clean, pluggable logic using Go's function types.

**Implementation**: Using `Transformer func(*User) bool` as a chain of responsibility.
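A sketch of how such a chain can be composed; only the `Transformer` signature comes from these docs, while the `User` fields and the `chain` helper are assumptions:

```go
// Chain-of-responsibility sketch built on the Transformer type.
// A false return drops the record; transformers may also mutate it.
package main

import "fmt"

type User struct { // assumed fields, for illustration only
	Name  string
	Email string
}

type Transformer func(*User) bool

// chain applies each transformer in order, stopping at the first rejection.
func chain(ts ...Transformer) Transformer {
	return func(u *User) bool {
		for _, t := range ts {
			if !t(u) {
				return false
			}
		}
		return true
	}
}

func main() {
	keep := chain(
		func(u *User) bool { return u.Email != "" },                  // filter
		func(u *User) bool { u.Name = "Mr. " + u.Name; return true }, // mutate
	)
	u := &User{Name: "Enoque", Email: "e@example.com"}
	fmt.Println(keep(u), u.Name) // true Mr. Enoque
}
```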
### 4. Lock-Free Metrics

**Learning Goal**: Avoiding mutex contention in high-concurrency environments.

**Implementation**: Using the `sync/atomic` package for thread-safe global counters.
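A sketch of the technique using the standard `sync/atomic` types (Go 1.19+); the counter names are illustrative:

```go
// Lock-free counters: many goroutines bump shared metrics
// concurrently without a mutex.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var processed, failed atomic.Int64
	var wg sync.WaitGroup

	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				if j%100 == 0 {
					failed.Add(1) // atomic increment, no lock needed
				} else {
					processed.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println("processed:", processed.Load(), "failed:", failed.Load())
}
```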