Commit deb8d73

Add Normalize Command for Re-normalizing Scrobble Files
feat: Add normalize command with Azure Blob Storage support
2 parents 7920542 + 6a35a53 commit deb8d73

18 files changed

Lines changed: 3482 additions & 2 deletions

.github/copilot-instructions.md

Lines changed: 2 additions & 2 deletions
@@ -7,6 +7,7 @@ Auto-generated from all feature plans. Last updated: 2026-01-06
 - N/A (UI component only) (005-console-progress-bar)
 - Go 1.24.0+ (alpine-based Docker build) (002-containerization-documentation)
 - Go 1.24.0+ + `crypto/sha256`, `bufio.Scanner`, `encoding/json` (006-scrobble-dedup-merge)
+- Local filesystem and Azure Blob Storage (via existing `writer` abstraction) (007-normalize-command)

 ## Project Structure

@@ -50,10 +51,9 @@ Go 1.24.0+ (006-scrobble-dedup-merge): Follow standard Go conventions
 - Table-driven tests for strategy variations

 ## Recent Changes
+- 007-normalize-command: Added Go 1.24.0+
 - 006-scrobble-dedup-merge: Added merge command for deduplicating and merging multiple NDJSON scrobble files. Uses in-memory hash map with SHA256 keys. Supports 4 deduplication strategies (default/strict/relaxed/mbid) and 3 conflict resolution modes (completeness/first/last). Includes checkpointing for resume capability. Performance targets: ≥10K scrobbles/sec, <500MB for 1M records. Reuses existing internal/writer, internal/progress, internal/models packages.
 - 005-console-progress-bar: Added Go 1.24.0+ + `github.com/schollz/progressbar/v3`, `golang.org/x/term`
-- 004-normalized-title-field: Adding `normalized_title` field to remove annotations (Live, Remastered, featuring, etc.) from track titles for better matching and grouping. Uses internal/normalize package with gopkg.in/yaml.v3 for configuration. DEBUG logging when titles modified.
-- 002-containerization-documentation: Added [if applicable, e.g., PostgreSQL, CoreData, files or N/A]

 <!-- MANUAL ADDITIONS START -->
 If you notice any systemic issues please add the needed requirements to this file or to the constitution if that is more appropriate.
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Feature Specification: Normalize Command

**Feature Branch**: `001-normalize-command`
**Created**: 2026-01-08
**Status**: Draft
**Input**: User description: "Add a new normalize command to process all JSON files for a specified user and update the normalized_title field by reapplying normalization logic to the track field"

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Re-normalize User's Scrobble Files (Priority: P1)

A data administrator needs to update the normalized_title field for all existing scrobble files after improvements or fixes have been made to the normalization logic. They want to ensure all historical data uses the latest normalization rules without having to re-fetch data from Last.fm.

**Why this priority**: Core functionality - enables retroactive application of normalization improvements to existing data, which is the primary purpose of the command.

**Independent Test**: Can be fully tested by running the normalize command on a user's existing files and verifying that normalized_title fields are updated correctly according to current normalization rules, delivering the immediate value of consistently normalized data.

**Acceptance Scenarios**:

1. **Given** a user has 100 JSON files with scrobbles in local storage, **When** the administrator runs `./app normalize --user john_doe`, **Then** all 100 files are processed and normalized_title fields are updated based on current normalization rules
2. **Given** a user has scrobble files in Azure Blob Storage, **When** the administrator runs `./app normalize --user jane_doe --azure-account myaccount --azure-container scrobbles`, **Then** all files in Azure storage are processed and updated with new normalized_title values
3. **Given** some files already have correct normalized_title values, **When** the normalize command runs, **Then** only files with changed normalized_title values are updated; unchanged files are left as-is
4. **Given** a file contains scrobbles where track field is "Track #1 - Some Title", **When** normalization is applied, **Then** normalized_title is updated to "track 1 some title" (lowercased, special characters removed)

---

### User Story 2 - Preview Changes Before Applying (Priority: P2)

A data administrator wants to see what changes would be made to normalized_title fields before actually modifying the files, to verify the normalization logic is working as expected and to estimate impact.

**Why this priority**: Important safety feature - allows verification before making bulk changes to data files.

**Independent Test**: Can be fully tested by running the normalize command with the --dry-run flag and confirming that preview output is shown but no files are modified, providing the immediate value of safe verification.

**Acceptance Scenarios**:

1. **Given** a user has 50 files needing normalization updates, **When** the administrator runs `./app normalize --user john_doe --dry-run`, **Then** the system displays which files would be updated, showing current and new normalized_title values, but does not write any changes
2. **Given** files are in Azure storage, **When** the administrator runs normalize with --dry-run and Azure flags, **Then** preview is shown without modifying Azure storage
3. **Given** dry-run mode is active, **When** processing completes, **Then** the summary clearly indicates "Dry-run mode: No changes written to storage"

---

### User Story 3 - Monitor Progress and Review Results (Priority: P3)

A data administrator processing hundreds of files wants to see real-time progress during processing and a comprehensive summary afterward to understand what was changed and identify any issues.

**Why this priority**: Enhances user experience - provides visibility and confidence during long-running operations.

**Independent Test**: Can be fully tested by running normalize on a large dataset and verifying that progress indicators appear during execution and a comprehensive summary is shown at completion.

**Acceptance Scenarios**:

1. **Given** processing 200 files, **When** the normalize command runs, **Then** progress is displayed showing which file is currently being processed
2. **Given** processing completes successfully, **When** the command finishes, **Then** a summary shows total files processed, number updated, number unchanged, and any errors encountered
3. **Given** 5 files fail to parse during processing, **When** the command completes, **Then** the error count is 5 and processing continues for remaining files
4. **Given** processing a mix of files with and without changes, **When** the summary is displayed, **Then** it accurately categorizes files as "updated" or "unchanged"

---

### Edge Cases

- What happens when a file cannot be parsed (malformed JSON)?
- What happens when a file is missing the track field?
- What happens when no files exist for the specified user?
- What happens when normalized_title already matches the newly calculated value?
- What happens when storage permissions prevent reading or writing files?
- What happens when the user specifies both local and Azure flags (conflicting storage targets)?
- What happens when Azure credentials are invalid or the container doesn't exist?

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: System MUST provide a `normalize` command that accepts `--user <username>` as a required argument
- **FR-002**: System MUST support local storage mode when no Azure arguments are provided
- **FR-003**: System MUST support Azure Blob Storage mode when Azure account and container arguments are provided
- **FR-004**: System MUST locate all JSON/NDJSON files for the specified user in the determined storage location
- **FR-005**: System MUST read each file, extract the `track` field, and apply existing normalization logic to generate a new `normalized_title` value
- **FR-006**: System MUST update only the `normalized_title` field in each scrobble record, preserving all other fields unchanged
- **FR-007**: System MUST write updated files back to the same storage location (local or Azure) unless dry-run mode is active
- **FR-008**: System MUST support a `--dry-run` flag that shows what would change without modifying any files
- **FR-009**: System MUST display real-time progress showing which file is currently being processed
- **FR-010**: System MUST generate a summary report showing total files processed, number updated, number unchanged, and error count
- **FR-011**: System MUST continue processing remaining files when individual files fail to parse or process
- **FR-012**: System MUST report all errors encountered during processing in the summary
- **FR-013**: System MUST clearly indicate in output when dry-run mode is active and no changes are written
- **FR-014**: System MUST use the same Azure configuration pattern and argument names as existing fetch and merge commands
- **FR-015**: System MUST use the same storage abstraction layer as existing commands for consistency
- **FR-016**: System MUST handle files that already have correct normalized_title values by skipping updates for those files
- **FR-017**: System MUST display both current and new normalized_title values during dry-run mode for files that would change
- **FR-018**: System MUST validate that the required `--user` argument is provided and report an error if it is missing
- **FR-019**: System MUST validate Azure configuration when Azure mode is used and report an error if it is incomplete or invalid

### Key Entities

- **Scrobble File**: Represents a JSON/NDJSON file containing scrobble records for a user, stored in either local filesystem or Azure Blob Storage
- **Scrobble Record**: Individual listening event containing fields including track (original title) and normalized_title (processed title)
- **Storage Location**: Either local filesystem or Azure Blob Storage container, determined by command-line arguments provided

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Administrator can process all files for a user in under 5 seconds per 1000 files
- **SC-002**: System correctly identifies and updates 100% of files where normalized_title differs from newly calculated value
- **SC-003**: Zero data loss - all fields except normalized_title remain unchanged after processing
- **SC-004**: Dry-run mode produces accurate preview - 100% match between preview and actual changes when run without --dry-run
- **SC-005**: System continues processing and completes successfully even when up to 10% of files encounter parsing errors
- **SC-006**: Summary report provides complete accounting - sum of updated, unchanged, and error counts equals total files processed
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Specification Quality Checklist: Normalize Command

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-01-08
**Feature**: [../spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Validation Results

### Content Quality Assessment
**PASS**: Specification contains no implementation details about Go, command structure, or storage implementations. Focuses entirely on what the feature does from user perspective.

**PASS**: All content emphasizes user value (data consistency, verification, visibility) and business needs (retroactive normalization, safe bulk operations).

**PASS**: Language is accessible to non-technical stakeholders - describes scenarios in terms of data files, users, and business outcomes.

**PASS**: All mandatory sections (User Scenarios & Testing, Requirements, Success Criteria) are complete with detailed content.

### Requirement Completeness Assessment
**PASS**: No [NEEDS CLARIFICATION] markers present - all requirements are fully specified with reasonable defaults assumed.

**PASS**: All 19 functional requirements are testable and unambiguous. Each FR specifies a concrete capability or behavior that can be verified.

**PASS**: All 6 success criteria include specific metrics (5 seconds per 1000 files, 100% accuracy, zero data loss, etc.).

**PASS**: Success criteria avoid implementation details - metrics focus on user-observable outcomes like processing speed and accuracy, not internal mechanisms.

**PASS**: Each user story includes 3-4 detailed acceptance scenarios in Given/When/Then format covering main flows and variations.

**PASS**: Edge cases section identifies 7 specific boundary conditions and error scenarios.

**PASS**: Scope is clearly bounded - command operates on existing files, updates only normalized_title field, supports local and Azure storage.

**PASS**: Assumptions are implicit but reasonable (reuse existing normalization logic, same storage patterns as fetch/merge, existing user files).

### Feature Readiness Assessment
**PASS**: All 19 functional requirements map to acceptance scenarios in user stories (FR-001 to FR-019 covered).

**PASS**: Three prioritized user stories cover core functionality (P1), safety features (P2), and user experience (P3).

**PASS**: Success criteria align with feature goals - processing speed, accuracy, data integrity, error handling.

**PASS**: No implementation leakage - specification describes behavior and outcomes without prescribing technical solutions.

## Summary

**Status**: ✅ READY FOR PLANNING

All checklist items passed validation. The specification is complete, unambiguous, and focused on user value. No implementation details are present. The feature is ready to proceed to `/speckit.plan` or `/speckit.clarify` phases.

## Notes

- Specification assumes reuse of existing normalization logic from internal/normalize package (reasonable assumption based on project context)
- Azure storage integration assumes same patterns as fetch/merge commands (explicitly stated in requirements)
- No clarifications needed - all requirements have reasonable defaults based on existing application patterns
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# CLI Contract: Normalize Command

**Feature**: 007-normalize-command
**Date**: 2026-01-08
**Type**: Command-Line Interface

## Command Signature

```bash
lastfm-sync normalize --user <username> [options]
```

## Required Arguments

| Argument | Type | Description | Validation |
|----------|------|-------------|------------|
| `--user` | string | Username whose files to process | MUST be non-empty (FR-018) |

## Optional Arguments - Storage Selection

### Local Storage (default)
No additional arguments required. Uses current working directory or configured base path.

### Azure Blob Storage
When ANY Azure argument is provided, Azure mode is activated (FR-003).

| Argument | Type | Description | Validation |
|----------|------|-------------|------------|
| `--azure-account` | string | Azure storage account name | Required when Azure mode active (FR-019) |
| `--azure-container` | string | Azure container name | Required when Azure mode active (FR-019) |
| `--azure-prefix` | string | Blob prefix for file discovery | Optional, default: "" |
| `--azure-auth` | string | Authentication method: "key", "sas", "default" | Optional, default: "default" |
| `--azure-account-key` | string | Storage account key (if auth=key) | Required when auth=key |
| `--azure-sas-token` | string | SAS token (if auth=sas) | Required when auth=sas |

**Validation**: Azure mode requires `--azure-account` AND `--azure-container` at minimum (FR-019).
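
The validation rules in the two tables above could be checked roughly like this. The sketch is illustrative only: the `Options` struct, its field names, and the `validate` function are hypothetical and not taken from the project. It also assumes that an empty string means "flag not provided", so an empty `AzureAuth` is treated as `"default"`.

```go
package main

import (
	"errors"
	"fmt"
)

// Options mirrors the documented flags; the struct itself is hypothetical.
type Options struct {
	User            string
	AzureAccount    string
	AzureContainer  string
	AzurePrefix     string
	AzureAuth       string // "" is treated as "default" (assumption)
	AzureAccountKey string
	AzureSASToken   string
}

// azureMode reports whether any Azure argument was supplied.
func (o Options) azureMode() bool {
	return o.AzureAccount != "" || o.AzureContainer != "" || o.AzurePrefix != "" ||
		o.AzureAuth != "" || o.AzureAccountKey != "" || o.AzureSASToken != ""
}

// validate applies the FR-018 and FR-019 checks described in the contract.
func validate(o Options) error {
	if o.User == "" {
		return errors.New("--user is required")
	}
	if !o.azureMode() {
		return nil // local storage mode needs nothing else
	}
	if o.AzureAccount == "" || o.AzureContainer == "" {
		return errors.New("azure mode requires --azure-account and --azure-container")
	}
	switch o.AzureAuth {
	case "", "default":
		// no extra credentials required
	case "key":
		if o.AzureAccountKey == "" {
			return errors.New("--azure-account-key is required when --azure-auth=key")
		}
	case "sas":
		if o.AzureSASToken == "" {
			return errors.New("--azure-sas-token is required when --azure-auth=sas")
		}
	default:
		return fmt.Errorf("unknown --azure-auth value %q", o.AzureAuth)
	}
	return nil
}

func main() {
	fmt.Println(validate(Options{User: "john_doe"}))
	fmt.Println(validate(Options{User: "jane_doe", AzureAccount: "myaccount"}))
}
```

Because any single Azure flag switches the command into Azure mode, a lone `--azure-prefix` fails validation here instead of silently falling back to local storage.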

## Optional Arguments - Behavior

| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `--dry-run` | boolean | Preview changes without writing | false |
| `--log-level` | string | Logging level: debug, info, warn, error | info |

## Output Format

### Progress Display (stdout)

Per-file progress (FR-009):
```
Processing files for user: {username}
Storage: {Local: /path/to/data | Azure: account/container}

Processing: username_001.ndjson [1/150]
Current: "Track #1 - Some Title"
New: "track 1 some title"
Status: {Updated | No change needed}

Processing: username_002.ndjson [2/150]
Status: No change needed

...
```

### Summary Report (stdout)

```
Summary:
Total files: 150
Updated: 45
Unchanged: 105
Errors: 0
Duration: 2.3s

{if --dry-run}
Dry-run mode: No changes written to storage
{/if}

{if errors}
Errors encountered:
- username_042.ndjson: parse error
- username_099.ndjson: permission denied
{/if}
```

### Error Output (stderr)

Individual file errors logged at DEBUG level, summary errors at INFO level.

## Exit Codes

| Code | Meaning | Scenario |
|------|---------|----------|
| 0 | Success | All files processed (some may have errors but processing completed) |
| 1 | General error | Unexpected error during command execution |
| 2 | Validation error | Missing required arguments or invalid configuration (FR-018, FR-019) |

**Note**: File-level errors (parse errors, missing fields) do NOT cause a non-zero exit code, per FR-011 (continue processing). Only configuration/validation errors exit early.

## Examples

### Local Storage

```bash
# Basic usage - normalize all files for user
./lastfm-sync normalize --user john_doe

# Dry-run preview
./lastfm-sync normalize --user john_doe --dry-run

# With debug logging
./lastfm-sync normalize --user john_doe --log-level debug
```

### Azure Storage

```bash
# Azure with default authentication (managed identity/Azure CLI)
./lastfm-sync normalize --user jane_doe \
  --azure-account myaccount \
  --azure-container scrobbles

# Azure with account key
./lastfm-sync normalize --user jane_doe \
  --azure-account myaccount \
  --azure-container scrobbles \
  --azure-auth key \
  --azure-account-key "abc123..."

# Azure dry-run
./lastfm-sync normalize --user jane_doe \
  --azure-account myaccount \
  --azure-container scrobbles \
  --dry-run
```

## File Discovery Pattern

Matching pattern: `{username}_*.ndjson` (FR-004)

### Local Storage
- Search in configured base directory
- Example: If base is `/data`, search `/data/john_doe_*.ndjson`
- Recursive search NOT performed (flat directory expected)

### Azure Storage
- List blobs with prefix `{azure-prefix}{username}_`
- Example: If prefix is `lastfm/`, search blobs like `lastfm/john_doe_*.ndjson`
- Blob name extraction: Use blob name as displayed filename

## Alignment with Existing Commands

**Follows patterns from**:
- `fetch` command: Azure argument structure, authentication modes
- `merge` command: Progress display, NDJSON processing, summary format

**Consistency**:
- Same Azure flag names and behavior (FR-014)
- Same logging configuration
- Same progress library usage
- Same error handling patterns

## Non-Functional Contracts

- **Performance**: Process files in under 5 seconds per 1000 files (SC-001)
- **Memory**: Streaming processing, O(1) memory per file
- **Reliability**: Continue on individual file errors (FR-011, SC-005)
- **Safety**: Dry-run produces accurate preview (SC-004)
- **Idempotency**: Running multiple times produces same result (normalized_title stabilizes)
