
Commit 51dec3e

Update changelog, README, and documentation to include console progress bar and title normalization features
1 parent cc449e8 commit 51dec3e

4 files changed

Lines changed: 200 additions & 8 deletions

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -7,6 +7,47 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
### Added - Feature 005: Console Progress Bar
11+
12+
- **Progress Bar Implementation** (`internal/progress/`)
13+
- Real-time progress tracking with visual feedback using `github.com/schollz/progressbar/v3`
14+
- ETA calculation based on actual throughput
15+
- Speed metrics (scrobbles per second)
16+
- Current/total scrobble counts with percentage
17+
- Automatic terminal width detection
18+
- Thread-safe progress updates for concurrent operations
19+
- Configurable via `--no-progress` flag or `LASTFM_NO_PROGRESS` environment variable
20+
- Graceful degradation for non-interactive terminals (CI/CD pipelines)
21+
22+
- **Progress Reporting**
23+
- Factory pattern for creating progress bars vs no-op implementations
24+
- `ProgressReporter` interface for flexible progress tracking
25+
- Progress state tracking (completed, in-progress, pending)
26+
- Terminal capability detection using `golang.org/x/term`
27+
- Comprehensive test coverage including terminal emulation tests
28+
29+
### Added - Feature 004: Normalized Title Field
30+
31+
- **Title Normalization** (`internal/normalize/`)
32+
- Automatic removal of track title annotations for better data quality
33+
- Pattern-based removal of:
34+
- Remaster/Remastered annotations (e.g., "2009 Remaster", "Remastered 2015")
35+
- Live performance markers (e.g., "Live at Wembley", "Live 1969")
36+
- Version/edit labels (e.g., "Radio Edit", "Extended Version")
37+
- Date/year markers in parentheses or brackets
38+
- Remix labels (e.g., "Dave's Remix")
39+
- Featuring/collaboration markers (e.g., "feat. Artist", "with Orchestra")
40+
- Thread-safe global feature flag for enabling/disabling normalization
41+
- Preserves original title in `track` field while adding clean `normalized_title` field
42+
  - Configurable minimum-length guard that prevents over-aggressive cleaning
43+
- DEBUG-level logging for title changes (when logger provided)
44+
- YAML-based configuration for customizable patterns (`internal/normalize/patterns.go`)
45+
46+
- **Scrobble Model Enhancement** (`internal/models/scrobble.go`)
47+
- Added `normalized_title` field to all NDJSON output
48+
- Automatic normalization in `NewScrobble` constructor
49+
- Original title preserved for audit trail and debugging
50+
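As a rough illustration of the pattern-based cleaning and minimum-length guard listed above, the sketch below strips a few annotation styles with regular expressions. The patterns, the `NormalizeTitle` name, and its `minLen` parameter are assumptions made for this example; the project's real patterns live in `internal/normalize/` and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// annotationPatterns is an illustrative subset, not the project's actual list.
var annotationPatterns = []*regexp.Regexp{
	// " - 2009 Remaster", " - Remastered 2015", " - Remastered"
	regexp.MustCompile(`(?i)\s+-\s+(\d{4}\s+)?remaster(ed)?(\s+\d{4})?\s*$`),
	// " - Live at Wembley", " - Live 1969"
	regexp.MustCompile(`(?i)\s+-\s+live\b.*$`),
	// " - Radio Edit", " - Extended Version"
	regexp.MustCompile(`(?i)\s+-\s+(radio edit|extended version)\s*$`),
	// "(feat. Artist)", "[with Orchestra]", "(Live 1969)", "(2009 Remaster)", "(1969)"
	regexp.MustCompile(`(?i)\s*[\(\[](feat\.|featuring\b|with\b|live\b|\d{4}\b|remaster)[^\)\]]*[\)\]]`),
}

// NormalizeTitle removes known annotations. If the cleaned title is shorter
// than minLen, the original is returned unchanged (minimum-length guard).
func NormalizeTitle(title string, minLen int) string {
	cleaned := title
	for _, p := range annotationPatterns {
		cleaned = p.ReplaceAllString(cleaned, "")
	}
	cleaned = strings.TrimSpace(cleaned)
	if len(cleaned) < minLen {
		return title
	}
	return cleaned
}

func main() {
	for _, t := range []string{
		"Come Together - Remastered",
		"Hey Jude (feat. Artist)",
		"(1969)",
	} {
		fmt.Printf("%q -> %q\n", t, NormalizeTitle(t, 1))
	}
}
```

The third input shows the guard: stripping `(1969)` would leave an empty title, so the original string is kept.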
1051
### Added - Feature 002: Containerization & Documentation
1152

1253
- **Configuration Documentation** (`docs/configuration.md`)
@@ -86,6 +127,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
86127
- DefaultAzureCredential (recommended for Azure VMs/AKS)
87128
- Managed Identity (workload identity)
88129
- Connection string
130+
- Account key (SharedKeyCredential)
89131
- SAS token
90132
- Time-partitioned blob paths for efficient data organization
91133
- Separate watermark blob storage for state management

README.md

Lines changed: 18 additions & 3 deletions
@@ -12,9 +12,11 @@
1212
**Rate Limit Compliant**: 3 QPS throttling with Retry-After header support
1313
**Production Ready**: Exponential backoff, timeout handling, structured logging
1414
**Dry-Run Mode**: Test configurations without consuming API quota
15-
**Azure Integration**: DefaultAzureCredential, managed identity, SAS tokens, connection strings
15+
**Azure Integration**: DefaultAzureCredential, managed identity, SAS tokens, connection strings, account keys
1616
**Secret Redaction**: Automatic credential masking in logs
17-
**NDJSON Output**: Newline-delimited JSON for easy streaming and processing
17+
**NDJSON Output**: Newline-delimited JSON for easy streaming and processing
18+
**Console Progress Bar**: Real-time progress tracking with ETA, speed, and scrobble count
19+
**Title Normalization**: Automatic removal of annotations (Live, Remastered, Featuring) for better data quality
1820

1921
## Quick Start
2022

@@ -148,8 +150,9 @@ Flags:
148150
Azure Options:
149151
--azure-container string Azure container name
150152
--azure-prefix string Azure blob prefix (default: "lastfm/")
151-
--azure-auth string Auth method: default, mi, connstr, sas (default: "default")
153+
--azure-auth string Auth method: default, mi, connstr, key, sas (default: "default")
152154
--azure-account string Azure storage account name
155+
--azure-account-key string Azure storage account key (for key auth)
153156
--azure-container-url string Full container URL (for SAS)
154157

155158
Watermark Options:
@@ -162,6 +165,7 @@ Flags:
162165

163166
Other Options:
164167
--dry-run Preview only, no API calls or writes
168+
--no-progress Disable progress bar (useful for CI/CD)
165169
--log-level string Log level: info or debug (default: "info")
166170
-h, --help Help for fetch
167171
```
@@ -231,6 +235,17 @@ lastfm-sync fetch \
231235
--azure-auth mi
232236
```
233237
238+
**Using account key:**
239+
```bash
240+
lastfm-sync fetch \
241+
--user alice \
242+
--output azure \
243+
--azure-container scrobbles \
244+
--azure-account mystorageaccount \
245+
--azure-account-key "your-account-key" \
246+
--azure-auth key
247+
```
248+
234249
### Advanced Usage
235250
236251
**Debug logging:**

docs/azure-deployment.md

Lines changed: 79 additions & 0 deletions
@@ -459,6 +459,85 @@ az network nsg rule create \
459459

460460
---
461461

462+
## Alternative Authentication Methods
463+
464+
While Managed Identity is recommended for production, the tool also supports several credential-based authentication methods:
465+
466+
### Connection String Authentication
467+
468+
Use a connection string for development or when Managed Identity is not available:
469+
470+
```bash
471+
# Get connection string
472+
CONN_STR=$(az storage account show-connection-string \
473+
--name lastfmstorage \
474+
--resource-group lastfm-rg \
475+
--query connectionString -o tsv)
476+
477+
# Deploy with connection string
478+
az container create \
479+
--resource-group lastfm-rg \
480+
--name lastfm-sync \
481+
--image lastfm-sync:latest \
482+
--environment-variables \
483+
LASTFM_API_KEY="$LASTFM_API_KEY" \
484+
AZURE_STORAGE_CONNECTION_STRING="$CONN_STR" \
485+
--command-line "/app/lastfm-sync fetch --user alice --output azure --azure-container scrobbles --azure-auth connstr"
486+
```
487+
488+
### Account Key Authentication
489+
490+
Use a storage account key directly:
491+
492+
```bash
493+
# Get storage account key
494+
STORAGE_KEY=$(az storage account keys list \
495+
--resource-group lastfm-rg \
496+
--account-name lastfmstorage \
497+
--query "[0].value" -o tsv)
498+
499+
# Deploy with account key
500+
az container create \
501+
--resource-group lastfm-rg \
502+
--name lastfm-sync \
503+
--image lastfm-sync:latest \
504+
--environment-variables \
505+
LASTFM_API_KEY="$LASTFM_API_KEY" \
506+
AZURE_STORAGE_ACCOUNT=lastfmstorage \
507+
AZURE_STORAGE_ACCOUNT_KEY="$STORAGE_KEY" \
508+
--command-line "/app/lastfm-sync fetch --user alice --output azure --azure-container scrobbles --azure-auth key"
509+
```
510+
511+
### SAS Token Authentication
512+
513+
Use a SAS token for time-limited access:
514+
515+
```bash
516+
# Generate SAS token (valid for 24 hours)
517+
SAS_TOKEN=$(az storage container generate-sas \
518+
--account-name lastfmstorage \
519+
--name scrobbles \
520+
--permissions rwdl \
521+
--expiry $(date -u -d "24 hours" '+%Y-%m-%dT%H:%MZ') \
522+
--output tsv)
523+
524+
# Construct container URL with SAS token
525+
CONTAINER_URL="https://lastfmstorage.blob.core.windows.net/scrobbles?$SAS_TOKEN"
526+
527+
# Deploy with SAS token
528+
az container create \
529+
--resource-group lastfm-rg \
530+
--name lastfm-sync \
531+
--image lastfm-sync:latest \
532+
--environment-variables \
533+
LASTFM_API_KEY="$LASTFM_API_KEY" \
534+
--command-line "/app/lastfm-sync fetch --user alice --output azure --azure-container-url \"$CONTAINER_URL\" --azure-auth sas"
535+
```
536+
537+
> **Security Note**: For production deployments, prefer Managed Identity (`--azure-auth mi` or `--azure-auth default`) over connection strings, account keys, or SAS tokens. These credential-based methods should only be used for development or when Managed Identity is not available.
538+
539+
---
540+
462541
## Persistent Storage
463542

464543
### Azure File Share (for local output mode)

docs/configuration.md

Lines changed: 61 additions & 5 deletions
@@ -46,17 +46,19 @@ For more complex setups, see the [Environment Variables](#environment-variables)
4646
| `LASTFM_LOG_LEVEL` | string | `info` | Logging level. Options: `info`, `debug`. |
4747
| `LASTFM_STATE` | string | `~/.lastfm` | Directory for storing state (watermarks, local output files). |
4848
| `LASTFM_CONFIG` | string | none | Path to config file. If not set, searches `./`, `~/.lastfm/`, `/etc/lastfm/` for `config.yaml`. |
49+
| `LASTFM_NO_PROGRESS` | boolean | `false` | Disable console progress bar. Useful for CI/CD pipelines or when redirecting output. |
4950

5051
### Azure Storage Variables
5152

5253
Required only when using `--output azure`:
5354

5455
| Variable | Type | Default | Description |
5556
|----------|------|---------|-------------|
56-
| `AZURE_STORAGE_ACCOUNT` | string | none | Azure Storage account name. Used with `--azure-auth default` or `--azure-auth mi`. |
57+
| `AZURE_STORAGE_ACCOUNT` | string | none | Azure Storage account name. Used with `--azure-auth default`, `--azure-auth mi`, or `--azure-auth key`. |
5758
| `AZURE_STORAGE_CONNECTION_STRING` | string | none | **Sensitive**. Full Azure Storage connection string. Used with `--azure-auth connstr`. |
59+
| `AZURE_STORAGE_ACCOUNT_KEY` | string | none | **Sensitive**. Azure Storage account key. Used with `--azure-auth key`. |
5860

59-
> **Security Note**: Never commit `AZURE_STORAGE_CONNECTION_STRING` to version control. Use environment variables or Azure Key Vault.
61+
> **Security Note**: Never commit `AZURE_STORAGE_CONNECTION_STRING` or `AZURE_STORAGE_ACCOUNT_KEY` to version control. Use environment variables or Azure Key Vault.
6062
6163
---
6264

@@ -95,16 +97,18 @@ Required when `--output azure`:
9597
| Flag | Type | Default | Description |
9698
|------|------|---------|-------------|
9799
| `--azure-container` | string | none | **Required**. Azure Blob Storage container name. |
98-
| `--azure-account` | string | `$AZURE_STORAGE_ACCOUNT` | Storage account name (alternative to connection string). |
99-
| `--azure-prefix` | string | `lastfm/` | Blob prefix (folder path). Blobs written to `{prefix}{user}/{year}/{month}/{timestamp}.ndjson`. |
100-
| `--azure-auth` | string | `default` | Authentication method. Options: `default` (DefaultAzureCredential), `mi` (Managed Identity), `connstr` (Connection String), `sas` (SAS Token). |
100+
| `--azure-account` | string | `$AZURE_STORAGE_ACCOUNT` | Storage account name (required for `default`, `mi`, and `key` auth). |
101+
| `--azure-prefix` | string | `lastfm/` | Blob prefix (folder path). Blobs written to `{prefix}dt=YYYY-MM-DD/{user}-YYYYMMDD-HHMMSS.ndjson`. |
102+
| `--azure-auth` | string | `default` | Authentication method. Options: `default`, `mi`, `connstr`, `key`, `sas`. |
103+
| `--azure-account-key` | string | none | **Sensitive**. Storage account key (required for `--azure-auth key`). |
101104
| `--azure-container-url` | string | none | Full container URL with SAS token (for `--azure-auth sas`). |
102105

103106
**Azure Authentication Methods**:
104107

105108
- **`default`** (recommended for Azure VMs/AKS): Uses Azure SDK's DefaultAzureCredential chain (tries Managed Identity, Azure CLI, Environment, etc.).
106109
- **`mi`**: Explicitly uses Managed Identity (Azure VMs, AKS, Container Instances with identity).
107110
- **`connstr`**: Uses connection string from `AZURE_STORAGE_CONNECTION_STRING` environment variable.
111+
- **`key`**: Uses storage account key from `--azure-account-key` flag or `AZURE_STORAGE_ACCOUNT_KEY` environment variable. Requires `--azure-account`.
108112
- **`sas`**: Uses SAS token embedded in `--azure-container-url`.
109113
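The requirements for each authentication method listed above can be summarized as a validation step. The sketch below is an assumption-laden illustration, not the tool's implementation: the `AzureOptions` struct, the `ResolveAzureOptions` function, and the rule that flag values take precedence over environment variables are all hypothetical. Only the flag names, environment variable names, and method names come from the documentation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// AzureOptions mirrors the documented flags; the struct itself is hypothetical.
type AzureOptions struct {
	Auth             string // --azure-auth
	Account          string // --azure-account
	AccountKey       string // --azure-account-key
	ConnectionString string // AZURE_STORAGE_CONNECTION_STRING
	ContainerURL     string // --azure-container-url
}

func firstNonEmpty(a, b string) string {
	if a != "" {
		return a
	}
	return b
}

// ResolveAzureOptions fills gaps from the environment (assumed precedence:
// flag first, then environment) and checks what each method requires.
func ResolveAzureOptions(o AzureOptions, getenv func(string) string) (AzureOptions, error) {
	if o.Auth == "" {
		o.Auth = "default"
	}
	o.Account = firstNonEmpty(o.Account, getenv("AZURE_STORAGE_ACCOUNT"))
	o.AccountKey = firstNonEmpty(o.AccountKey, getenv("AZURE_STORAGE_ACCOUNT_KEY"))
	o.ConnectionString = firstNonEmpty(o.ConnectionString, getenv("AZURE_STORAGE_CONNECTION_STRING"))

	switch o.Auth {
	case "default", "mi":
		if o.Account == "" {
			return o, errors.New("--azure-account is required for " + o.Auth + " auth")
		}
	case "key":
		if o.Account == "" {
			return o, errors.New("--azure-account is required for key auth")
		}
		if o.AccountKey == "" {
			return o, errors.New("--azure-account-key or AZURE_STORAGE_ACCOUNT_KEY is required for key auth")
		}
	case "connstr":
		if o.ConnectionString == "" {
			return o, errors.New("AZURE_STORAGE_CONNECTION_STRING is required for connstr auth")
		}
	case "sas":
		if o.ContainerURL == "" {
			return o, errors.New("--azure-container-url is required for sas auth")
		}
	default:
		return o, fmt.Errorf("unknown --azure-auth value %q", o.Auth)
	}
	return o, nil
}

func main() {
	opts, err := ResolveAzureOptions(AzureOptions{Auth: "key", Account: "mystorageaccount"}, os.Getenv)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	// Never print the key itself; only report which method was selected.
	fmt.Println("using auth method:", opts.Auth)
}
```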

110114
#### Watermark Storage
@@ -126,6 +130,7 @@ Required when `--output azure`:
126130
| Flag | Type | Default | Description |
127131
|------|------|---------|-------------|
128132
| `--log-level` | string | `info` | Logging verbosity. Options: `info`, `debug`. |
133+
| `--no-progress` | boolean | `false` | Disable console progress bar. Useful for CI/CD pipelines or when redirecting output. |
129134
| `--dry-run` | boolean | `false` | Preview mode. Fetches data but doesn't write to storage or update watermarks. |
130135

131136
---
@@ -292,6 +297,20 @@ lastfm-sync fetch --user alice \
292297
--azure-auth mi
293298
```
294299

300+
### Azure with Account Key
301+
302+
```bash
303+
export LASTFM_API_KEY="your-api-key"
304+
export AZURE_STORAGE_ACCOUNT_KEY="your-account-key"
305+
lastfm-sync fetch --user alice \
306+
--output azure \
307+
--azure-container scrobbles \
308+
--azure-account mystorageaccount \
309+
--azure-auth key
310+
# Or pass key directly via flag:
311+
# --azure-account-key "your-account-key"
312+
```
313+
295314
### Time Range Fetch
296315

297316
```bash
@@ -320,6 +339,43 @@ lastfm-sync fetch --user alice --log-level debug
320339

321340
---
322341

342+
## Output Format
343+
344+
### NDJSON Structure
345+
346+
Each scrobble is written as a single line of JSON with the following fields:
347+
348+
| Field | Type | Description |
349+
|-------|------|-------------|
350+
| `username` | string | Last.fm username |
351+
| `artist` | string | Artist name |
352+
| `track` | string | Original track name from Last.fm (may include annotations) |
353+
| `normalized_title` | string | Clean track title with annotations removed (Live, Remastered, featuring, etc.) |
354+
| `album` | string | Album name |
355+
| `uts` | int64 | Unix timestamp (seconds since epoch) |
356+
| `local_time` | string | Human-readable UTC timestamp (RFC3339 format) |
357+
| `mbid` | string | MusicBrainz ID (omitted if null) |
358+
| `source` | string | Data source (always "lastfm") |
359+
| `ingested_at` | string | UTC timestamp when record was created (RFC3339) |
360+
| `raw` | object | Original Last.fm API response for debugging |
361+
362+
**Example:**
363+
```json
364+
{"username":"alice","artist":"The Beatles","track":"Come Together - Remastered","normalized_title":"Come Together","album":"Abbey Road","uts":1704067200,"local_time":"2024-01-01T00:00:00Z","source":"lastfm","ingested_at":"2024-01-06T14:30:22Z","raw":{}}
365+
```
366+
367+
**Note on `normalized_title`**: This field is generated automatically by removing common annotations such as:
368+
- Remaster/Remastered annotations (e.g., "2009 Remaster", "Remastered 2015")
369+
- Live performance markers (e.g., "Live at Wembley", "Live 1969")
370+
- Version/edit labels (e.g., "Radio Edit", "Extended Version")
371+
- Date/year markers in parentheses or brackets
372+
- Remix labels (e.g., "Dave's Remix")
373+
- Featuring/collaboration markers (e.g., "feat. Artist", "with Orchestra")
374+
375+
This normalization improves data quality for aggregation and matching while preserving the original title in the `track` field.
376+
377+
---
378+
323379
## Security Best Practices
324380

325381
1. **Never commit secrets to version control**:
