This repository was archived by the owner on Mar 24, 2026. It is now read-only.

Commit 125e312

docs: update project for archival as a Go study milestone
- Add Core Learning Objectives and Final Thoughts to README
- Update architecture documentation to focus on learning outcomes
- Mark CONTRIBUTING as archived reference
- Maintain original branding and layout
1 parent 76045eb commit 125e312

3 files changed: 66 additions & 148 deletions

CONTRIBUTING.md

Lines changed: 14 additions & 25 deletions
````diff
@@ -1,29 +1,18 @@
-# Contributing to Go File Processor
+# Contributing (Archived Project)
 
-Thank you for your interest in contributing! This project follows rigorous standards for quality and concurrency in Go.
+Thank you for your interest! This project is currently **archived** and no longer accepting new features or active maintenance.
 
-## Development Setup
+## Project Purpose
+The **Go File Processor** was created as a second learning project to explore Go's concurrency and streaming I/O. It remains available as a historical reference for:
+- Worker Pool implementations.
+- Channel-based pipelines.
+- Middleware design patterns in Go.
 
-1. **Requirements**: Go 1.22+ and `make`.
-2. **Clone**: `git clone https://github.com/ESousa97/go-file-processor.git`.
-3. **Tests**: Use `make test` to ensure everything is OK.
+## Exploring the Code
+You are welcome to fork this project to use as a template or to experiment with its features. Key areas of interest:
+- `internal/processor/csv_json.go`: The core engine using Worker Pools.
+- `internal/processor/transformer.go`: The implementation of the Middleware pattern.
+- `internal/processor/csv_json_bench_test.go`: Benchmarking logic to compare performance.
 
-## Code Conventions
-
-- Follow [Effective Go](https://golang.org/doc/effective_go.html).
-- Run `go fmt` before each commit.
-- All exported items must have professional Godoc comments in English.
-- Maintain extreme modularization: each file with a single responsibility.
-
-## Pull Request Process
-
-1. Create a descriptive branch (`feature/`, `fix/`, `perf/`).
-2. Ensure benchmarks haven't regressed via `make bench`.
-3. Update `CHANGELOG.md` in the `[Unreleased]` section.
-4. Request a code review.
-
-## Areas for Contribution
-
-- Support for new formats (XML, Avro).
-- Consumer optimization to further reduce serialization overhead.
-- CLI improvements (e.g., more detailed progress bar).
+## License
+The project remains under the **MIT License**, allowing you to use and modify it for your own purposes.
````

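For readers pointed at `internal/processor/csv_json_bench_test.go` above: a Go benchmark comparing sequential and worker-pool processing typically takes the shape sketched below. The `simulateRow` helper and the benchmark names are illustrative stand-ins, not the repository's actual code.

```go
package processor

import (
	"sync"
	"testing"
)

// simulateRow stands in for parsing and transforming one CSV record.
func simulateRow(n int) int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += (n + i) % 7
	}
	return sum
}

// BenchmarkSequential processes every record on a single goroutine.
func BenchmarkSequential(b *testing.B) {
	for i := 0; i < b.N; i++ {
		simulateRow(i)
	}
}

// BenchmarkWorkerPool fans the same work out to a fixed pool of goroutines.
func BenchmarkWorkerPool(b *testing.B) {
	jobs := make(chan int, 64)
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				simulateRow(n)
			}
		}()
	}
	b.ResetTimer() // exclude pool setup from the measurement
	for i := 0; i < b.N; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
}
```

Running `go test -bench .` (or the repo's `make bench`) reports ns/op for each, making the sequential-vs-parallel comparison direct.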
README.md

Lines changed: 34 additions & 102 deletions
````diff
@@ -12,25 +12,36 @@
 [![Go Reference](https://pkg.go.dev/badge/github.com/ESousa97/go-file-processor.svg)](https://pkg.go.dev/github.com/ESousa97/go-file-processor)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Go Version](https://img.shields.io/github/go-mod/go-version/ESousa97/go-file-processor)](https://github.com/ESousa97/go-file-processor)
-[![Last Commit](https://img.shields.io/github/last-commit/ESousa97/go-file-processor)](https://github.com/ESousa97/go-file-processor/commits/main)
+[![Last Commit](https://img.shields.io/github/last-commit/ESousa97/go-file-processor)](https://github.com/ESousa97/go-file-processor)
 
 </div>
 
 ---
 
-**Go File Processor** is a high-performance command-line tool and library designed to efficiently convert massive CSV files (millions of records) into structured JSON. Using the Worker Pool pattern and channel-based processing, it ensures optimized CPU usage and constant memory consumption, regardless of the input file size.
+> **Note: Archival Project**
+> This was my second major project in Go, built as a deep dive into the language's idiomatic concurrency patterns and high-performance I/O. It is now archived but serves as a solid reference for ETL (Extract, Transform, Load) implementations in Golang.
+
+**Go File Processor** is a high-performance command-line tool and library designed to efficiently convert massive CSV files (millions of records) into structured JSON. It demonstrates the power of Go's concurrency primitives to achieve maximum throughput with minimal memory overhead.
+
+## 🚀 Core Learning Objectives
+
+This project was a hands-on laboratory to master several Go concepts:
+
+* **Concurrency via Worker Pool:** Leveraging `goroutines` and `channels` to process data in parallel without overwhelming the system.
+* **Memory Efficiency (Streaming):** Using `io.Reader` and `io.Writer` to process gigabytes of data with a constant, tiny memory footprint.
+* **The Middleware Pattern:** Implementing a "Chain of Responsibility" for data transformation that is both flexible and type-safe.
+* **Atomic Operations:** Using `sync/atomic` for high-speed metrics tracking, avoiding the overhead of mutexes.
+* **Idiomatic Project Layout:** Following standard Go folder structures (`cmd/`, `internal/`) and build automation with `Makefile`.
 
 ## Demonstration
 
 ### As a Library
 
-Add transformers and configure the execution pool fluently:
-
 ```go
 proc := processor.NewCSVToJSONProcessor()
 config := processor.Config{WorkerCount: 8}
 
-// Add transformers (Chain of Responsibility)
+// Fluent transformation chain
 config.AddTransformer(processor.EmailFilter(`@company.com$`))
 config.AddTransformer(processor.FieldMasker("email"))
 
@@ -39,118 +50,39 @@ metrics, err := proc.Process("input.csv", "output.json", config)
 
 ### As a CLI
 
-Run massive processing with real-time metrics:
-
 ```bash
 ./fileproc -input data.csv -output data.json -workers 4
 ```
 
-Output:
-
-```text
-[INFO] Starting processing...
-[INFO] Progress: 100000 rows processed
-[SUMMARY] EXECUTION COMPLETED IN 1.2s
-- Total lines read: 100000
-- Successfully processed: 98500
-- Errors/Ignored: 1500
-```
+## Tech Stack & Architecture
 
-## Tech Stack
-
-| Technology | Role |
+| Technology | What I Learned |
 | ------------------- | ------------------------------------------------------------------- |
-| **Go 1.22+** | Core language with high-performance native concurrency |
-| **Worker Pool** | Parallelism management and load control |
-| **slog** | Structured logging for observability and traceability |
-| **Atomic Counters** | High-performance metrics collection without contention (lock-free) |
-| **Channels** | Secure and decoupled communication between Producer, Workers, and Consumer |
-
-## Prerequisites
-
-- **Go >= 1.22**
-- **Make** (for build automation and benchmarks)
-
-## Installation and Usage
+| **Worker Pool** | How to orchestrate multiple goroutines for parallel work. |
+| **Channels** | Managing safe communication and backpressure between stages. |
+| **Streaming I/O** | Processing files record-by-record instead of loading to RAM. |
+| **Atomic Counters** | Implementing thread-safe counters with maximum performance. |
+| **Structured Logs** | Using `slog` for modern, machine-readable observability. |
 
-### From Source
-
-```bash
-git clone https://github.com/ESousa97/go-file-processor.git
-cd go-file-processor
-make build
-```
+### Pipeline Flow
 
-### Data Generation and Benchmark
-
-To validate performance with 100k+ row files:
-
-```bash
-make generate-data
-make bench
-```
+The system uses a streaming model to maintain low memory usage:
+`Input CSV -> Producer -> Job Channel -> [Workers + Transformers] -> Result Channel -> Consumer -> Output JSON`
 
 ## Makefile Targets
 
 | Target | Description |
 | -------------------- | --------------------------------------------------------- |
-| `make build` | Compiles the `fileproc` binary at the project root |
-| `make test` | Runs the unit test suite |
-| `make bench` | Runs performance comparisons (Sequential vs Parallel) |
-| `make generate-data` | Generates a massive test file (100,000 records) |
-| `make clean` | Removes binaries and temporary files |
-
-## Architecture
-
-The project uses a channel-based streaming model to process data without loading the entire file into memory.
-
-```mermaid
-graph LR
-    Input[CSV Input] --> Producer[Producer]
-    Producer --> Jobs{Job Channel}
-    Jobs --> W1[Worker 1]
-    Jobs --> W2[Worker 2]
-    Jobs --> WN[Worker N]
-    W1 & W2 & WN --> Transformers[Transformation Layer]
-    Transformers --> Results{Result Channel}
-    Results --> Consumer[Consumer]
-    Consumer --> Output[JSON Output]
-
-    subgraph "Worker Pool"
-        W1
-        W2
-        WN
-    end
-```
-
-## API Reference
-
-Detailed technical documentation available at [pkg.go.dev/github.com/ESousa97/go-file-processor](https://pkg.go.dev/github.com/ESousa97/go-file-processor).
+| `make build` | Compiles the `fileproc` binary. |
+| `make test` | Runs the full unit test suite. |
+| `make bench` | Runs benchmarks to see the speed of Parallel vs Sequential. |
+| `make generate-data` | Generates a 100k row test file for performance testing. |
 
-## Configuration (CLI Flags)
+## 📚 Final Thoughts
 
-| Flag | Description | Type | Default |
-| ---------- | --------------------------------- | -------- | ------------- |
-| `-input` | Input CSV file path | `string` | `input.csv` |
-| `-output` | Output JSON file path | `string` | `output.json` |
-| `-workers` | Number of concurrent workers | `int` | `4` |
+Building this project taught me that Go isn't just about syntax; it's about a philosophy of simplicity and performance. The transition from sequential processing to a parallel worker pool showed me how Go empowers developers to build tools that scale effortlessly.
 
-## Roadmap
-
-Follow the project's evolution stages:
-
-- [x] **Phase 1: Foundation** — Worker Pool and streaming core implementation.
-- [x] **Phase 2: Transformation** — Middleware layer (Chain of Responsibility).
-- [x] **Phase 3: Observability** — Atomic metrics and structured logs (`slog`).
-- [x] **Phase 4: Governance** — CI/CD, Professional documentation, and Badges.
-
-## Contributing
-
-Interested in collaborating? Check our [CONTRIBUTING.md](CONTRIBUTING.md) for code standards and PR process.
-
-## License
-
-This project is licensed under the **MIT License** — see the [LICENSE](LICENSE) file for details.
+---
 
 <div align="center">
 
@@ -166,6 +98,6 @@ This project is licensed under the **MIT License** — see the [LICENSE](LICENSE
 
 Made with ❤️ by [Enoque Sousa](https://github.com/ESousa97)
 
-**Project Status:** Active · Constantly updated
+**Project Status:** Archived · Educational Milestone
 
 </div>
````
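The `Producer -> Job Channel -> [Workers] -> Result Channel -> Consumer` flow named in the README maps onto a small amount of idiomatic Go. Below is a minimal, self-contained sketch of that shape, assuming an in-memory CSV string and a two-field `record` type rather than the project's actual API:

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"io"
	"os"
	"strings"
	"sync"
)

// record is an illustrative stand-in for the project's row type.
type record struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

func main() {
	// A tiny in-memory "file"; the real tool streams from disk via io.Reader.
	input := "alice,alice@company.com\nbob,bob@example.org\n"

	jobs := make(chan []string, 8)  // buffered: absorbs bursts from the producer
	results := make(chan record, 8) // buffered: decouples workers from the consumer

	// Producer: reads the CSV record-by-record and feeds the job channel.
	go func() {
		defer close(jobs)
		r := csv.NewReader(strings.NewReader(input))
		for {
			row, err := r.Read()
			if err == io.EOF {
				return
			}
			if err != nil {
				continue // skip malformed rows
			}
			jobs <- row
		}
	}()

	// Worker pool: a fixed number of goroutines drain the shared channel.
	// Transformers would run here, between decode and send.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for row := range jobs {
				results <- record{Name: row[0], Email: row[1]}
			}
		}()
	}

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Consumer: a single writer serializes the output (order is not guaranteed).
	enc := json.NewEncoder(os.Stdout)
	for rec := range results {
		_ = enc.Encode(rec)
	}
}
```

The buffered channels double as the "shock absorber" described in the architecture notes: when the consumer lags, senders block instead of accumulating unbounded data in memory.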

docs/architecture.md

Lines changed: 18 additions & 21 deletions
````diff
@@ -1,10 +1,10 @@
-# System Architecture
+# Historical Architecture Design
 
-This document details the architectural decisions and data flow of the **Go File Processor**.
+This document serves as a reference for the design decisions made during the development of the **Go File Processor**. This project was a study of Go's system architecture capabilities.
 
-## Data Flow (Pipeline)
+## The Streaming Pipeline
 
-The system uses a parallel streaming pipeline to ensure efficiency with massive files.
+The primary goal was to achieve high throughput with **constant memory usage**. We implemented a pipeline where data is processed in individual records, never loading the entire file into RAM.
 
 ```mermaid
 graph TD
@@ -24,24 +24,21 @@ graph TD
     end
 ```
 
-## Architectural Decisions (ADRs)
+## Core Architectural Lessons
 
-### 1. Worker Pool Pattern
-**Context**: Processing millions of records via a single main loop would cause I/O blocking and CPU underutilization.
-**Decision**: Implement a pool of goroutines (Workers) that process records in parallel.
-**Consequence**: Significant throughput increase on multi-core systems.
+### 1. The Worker Pool Pattern
+**Learning Goal**: Understand how to scale processing by decoupling the producer from the consumers using channels.
+**Implementation**: A fixed number of goroutines (Workers) listen on a shared channel.
+**Outcome**: High CPU utilization across all cores without manual thread management.
 
-### 2. Streaming vs Batching
-**Context**: Loading the entire file into memory (Full Read) can cause OOM (Out Of Memory) on files dozens of GBs in size.
-**Decision**: Process via `io.Reader` and `io.Writer`, keeping only the stream buffer in memory.
-**Consequence**: Constant RAM consumption (~20-50MB) regardless of file size.
+### 2. Backpressure Management
+**Learning Goal**: How to prevent the producer from overwhelming the consumer.
+**Implementation**: Using buffered channels as a "shock absorber" for data bursts.
 
-### 3. Middleware for Transformations
-**Context**: Transformation/filter logic should be flexible and decoupled from the core Worker code.
-**Decision**: Use the "Chain of Responsibility" pattern via the `Transformer func(*User) bool` type.
-**Consequence**: Ease of adding new filters without changing the main worker loop.
+### 3. Decoupled Middleware
+**Learning Goal**: Implementing clean, pluggable logic using Go's function types.
+**Implementation**: Using `Transformer func(*User) bool` as a chain of responsibility.
 
-### 4. Atomic Metrics
-**Context**: Multiple workers need to update success/error counters simultaneously. Mutexes could cause contention.
-**Decision**: Use `sync/atomic` for lock-free counting.
-**Consequence**: Maximum performance in high-concurrency scenarios.
+### 4. Lock-Free Metrics
+**Learning Goal**: Avoiding mutex contention in high-concurrency environments.
+**Implementation**: Using the `sync/atomic` package for thread-safe global counters.
````
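To make lesson 3 concrete, here is a minimal sketch of how a `Transformer func(*User) bool` chain can be wired. The `User` fields and the two helper constructors are assumptions for illustration; the repository's real versions live in `internal/processor/transformer.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// User is an illustrative record type; the real project defines its own.
type User struct {
	Name  string
	Email string
}

// Transformer matches the signature cited above: it may mutate the record
// in place, and returning false drops the record from the output.
type Transformer func(*User) bool

// emailFilter keeps only users whose email ends with the given suffix.
func emailFilter(suffix string) Transformer {
	return func(u *User) bool { return strings.HasSuffix(u.Email, suffix) }
}

// fieldMasker redacts the email field but keeps the record.
func fieldMasker() Transformer {
	return func(u *User) bool {
		u.Email = "***"
		return true
	}
}

// applyChain runs each transformer in order and short-circuits on the first
// rejection: the chain-of-responsibility step inside each worker.
func applyChain(u *User, chain []Transformer) bool {
	for _, t := range chain {
		if !t(u) {
			return false
		}
	}
	return true
}

func main() {
	chain := []Transformer{emailFilter("@company.com"), fieldMasker()}
	users := []User{
		{Name: "alice", Email: "alice@company.com"},
		{Name: "bob", Email: "bob@example.org"},
	}
	for i := range users {
		if applyChain(&users[i], chain) {
			fmt.Printf("%+v\n", users[i]) // only alice survives, with email masked
		}
	}
}
```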

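Lesson 4 similarly reduces to a few `sync/atomic` primitives. A minimal illustration with hypothetical counter names, assuming Go 1.19+ for the typed atomics:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	// Typed atomics (Go 1.19+) make lock-free counters hard to misuse.
	var processed, failed atomic.Int64

	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				if i%10 == 0 {
					failed.Add(1) // a single atomic instruction, no mutex needed
				} else {
					processed.Add(1)
				}
			}
		}()
	}
	wg.Wait()

	fmt.Printf("processed=%d failed=%d\n", processed.Load(), failed.Load())
}
```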