59 changes: 59 additions & 0 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is `mapper-icij`, a Python script that converts ICIJ Offshore Leaks database files (Panama Papers, Paradise Papers, Bahamas Leaks, Offshore Leaks, Pandora Papers) to JSON format for loading into Senzing entity resolution.

## Development Setup

```bash
# Create virtual environment and install all dependencies
python -m venv ./venv
source ./venv/bin/activate
python -m pip install --upgrade pip
python -m pip install --group all .

# External dependency: mapper-base must be accessible
export PYTHONPATH=$PYTHONPATH:/path/to/mapper-base
```

## Common Commands

```bash
# Lint (matches CI workflow)
pylint $(git ls-files '*.py' ':!:docs/source/*')

# Other linting tools available
black src/
isort src/
flake8 src/
mypy src/
bandit -c pyproject.toml -r src/
```

## Running the Mapper

```bash
python src/icij_mapper.py -i /path/to/csv/files -o output.json [-l stats.json] [-a]
```

Required input CSV files: `nodes-entities.csv`, `nodes-intermediaries.csv`, `nodes-officers.csv`, `nodes-addresses.csv`, `nodes-others.csv`, `relationships.csv`
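
As a minimal sketch (not part of `icij_mapper.py`), a pre-flight check for the files listed above might look like the following. The function name `missing_input_files` is hypothetical; only the file names come from this document.

```python
from pathlib import Path

# File names copied from the "Required input CSV files" list above.
REQUIRED_FILES = (
    "nodes-entities.csv",
    "nodes-intermediaries.csv",
    "nodes-officers.csv",
    "nodes-addresses.csv",
    "nodes-others.csv",
    "relationships.csv",
)


def missing_input_files(input_dir):
    """Return the required CSV file names that are absent from input_dir.

    Hypothetical helper for illustration; the real mapper may validate
    its inputs differently or not at all.
    """
    root = Path(input_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_input_files("/path/to/csv/files")` before running the mapper returns an empty list when the directory is complete.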

## Architecture

The mapper (`src/icij_mapper.py`) follows a single-script design:

1. Loads ICIJ CSV files into a temporary SQLite database for efficient querying
2. Creates SQL views to join node tables with relationships (edges)
3. Processes each node type (entity, intermediary, officer, address, other) sequentially
4. Outputs JSON lines format with Senzing entity resolution attributes

Key external dependency: Requires `base_mapper` from [mapper-base](https://github.com/Senzing/mapper-base) for company name detection and variant handling.
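
The four steps above can be illustrated with a small, self-contained sketch. It is not the code in `src/icij_mapper.py` and it does not use `base_mapper`. The column names (`node_id`, `name`, `node_id_start`, `node_id_end`, `rel_type`), the `"ICIJ"` data source value, the `edges` key, and the function names are assumptions made for illustration. Only the entity node type is shown.

```python
import csv
import json
import sqlite3


def load_csv(conn, table, path):
    """Step 1: load one CSV file into a SQLite table named after `table`."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def map_entities(entities_csv, relationships_csv, output_path):
    """Write one JSON line per entity and return the number of records."""
    conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
    conn.row_factory = sqlite3.Row
    load_csv(conn, "entities", entities_csv)
    load_csv(conn, "relationships", relationships_csv)

    # Step 2: a view joining nodes to their outgoing edges.
    # Column names here are assumed, not taken from the real script.
    conn.execute(
        """
        CREATE VIEW entity_edges AS
        SELECT e.node_id AS node_id,
               r.rel_type AS rel_type,
               r.node_id_end AS node_id_end
        FROM entities AS e
        LEFT JOIN relationships AS r ON r.node_id_start = e.node_id
        """
    )

    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # Step 3: process one node type (entities only in this sketch).
        entities = conn.execute(
            "SELECT node_id, name FROM entities ORDER BY node_id"
        ).fetchall()
        for entity in entities:
            edges = conn.execute(
                "SELECT rel_type, node_id_end FROM entity_edges "
                "WHERE node_id = ? AND node_id_end IS NOT NULL",
                (entity["node_id"],),
            ).fetchall()
            record = {
                "DATA_SOURCE": "ICIJ",  # assumed value
                "RECORD_ID": entity["node_id"],
                "NAME_ORG": entity["name"],
                # Placeholder key, not a Senzing attribute.
                "edges": [dict(edge) for edge in edges],
            }
            # Step 4: JSON lines output, one record per line.
            out.write(json.dumps(record) + "\n")
            count += 1
    conn.close()
    return count
```

Loading everything into SQLite first lets the join run as a single SQL view rather than as nested loops over CSV rows, which is the efficiency the first step refers to.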

## Code Style

- Line length: 120 characters (black, flake8)
- Import sorting: isort with black profile
- See `pyproject.toml` for tool configurations and `.pylintrc` for additional pylint settings
3 changes: 0 additions & 3 deletions .claude/commands/senzing-code-review.md

This file was deleted.

3 changes: 3 additions & 0 deletions .claude/commands/senzing.md
@@ -0,0 +1,3 @@
# Senzing

- Perform the steps specified by <https://raw.githubusercontent.com/senzing-factory/claude/refs/tags/v1/commands/senzing.md>
File renamed without changes.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1,4 +1,4 @@
# Default code owner
# Default code owner

* @Senzing/senzing-mappers

16 changes: 10 additions & 6 deletions .github/dependabot.yml
@@ -3,11 +3,15 @@

version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
- package-ecosystem: github-actions
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
- package-ecosystem: "pip"
directory: "/"
interval: daily
- package-ecosystem: pip
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
interval: daily
2 changes: 1 addition & 1 deletion .github/workflows/add-labels-standardized.yaml
@@ -1,4 +1,4 @@
name: add labels standardized
name: Add labels standardized

on:
issues:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing-dependabot.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization dependabot
name: Add to project senzing github organization dependabot

on:
pull_request:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization
name: Add to project senzing github organization

on:
issues:
8 changes: 4 additions & 4 deletions .github/workflows/claude-pr-review.yaml
@@ -1,13 +1,13 @@
name: Claude PR Review

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

on:
pull_request:
types: [opened, synchronize]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
4 changes: 4 additions & 0 deletions .github/workflows/dependabot-approve-and-merge.yaml
@@ -4,6 +4,10 @@ on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
8 changes: 5 additions & 3 deletions .github/workflows/lint-workflows.yaml
@@ -1,11 +1,13 @@
name: lint workflows
name: Lint workflows

on:
push:
branches-ignore: [main]
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/move-pr-to-done-dependabot.yaml
@@ -1,4 +1,4 @@
name: move pr to done dependabot
name: Move pr to done dependabot

on:
pull_request:
17 changes: 12 additions & 5 deletions .github/workflows/pylint.yaml
@@ -1,6 +1,12 @@
name: pylint
name: Pylint

on: [push]
on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

@@ -12,8 +18,10 @@ jobs:
contents: read
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
timeout-minutes: 10

steps:
- name: Checkout repository
@@ -32,8 +40,7 @@ jobs:
source ./venv/bin/activate
echo "PATH=${PATH}" >> "${GITHUB_ENV}"
python -m pip install --upgrade pip
python -m pip install --requirement development-requirements.txt
python -m pip install --requirement requirements.txt
python -m pip install --group all .

- name: Analysing the code with pylint
run: |
6 changes: 5 additions & 1 deletion .github/workflows/spellcheck.yaml
@@ -1,9 +1,13 @@
name: spellcheck
name: Spellcheck

on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
31 changes: 25 additions & 6 deletions .vscode/cspell.json
@@ -2,27 +2,46 @@
"version": "0.2",
"language": "en",
"words": [
"CCLA",
"CODEOWNER",
"ICLA",
"PYTHONPATH",
"Senzing",
"analysing",
"applehelp",
"argparser",
"autodoc",
"autodocsumm",
"bugtracker",
"CCLA",
"CODEOWNER",
"cooldown",
"devhelp",
"esbenp",
"htmlhelp",
"icij",
"ICLA",
"isort",
"jquery",
"jsmath",
"kwargs",
"mypy",
"oldb",
"psutil",
"pylint",
"pylintrc",
"pyproject",
"pytest",
"PYTHONPATH",
"qthelp",
"remoteliteralinclude",
"Senzing",
"serializinghtml",
"setuptools",
"shellcheck",
"sphinxcontrib",
"sphinxext",
"stackoverflow",
"statpack",
"subrecord",
"venv"
"typehints",
"venv",
"virtualenv"
],
"ignorePaths": [
".git/**",
9 changes: 6 additions & 3 deletions CHANGELOG.md
@@ -2,12 +2,15 @@

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
[markdownlint](https://dlaa.me/markdownlint/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The changelog format is based on [Keep a Changelog] and [CommonMark].
This project adheres to [Semantic Versioning].

## [1.0.0] - yyyy-mm-dd

### Added to 1.0.0

- Initial content

[CommonMark]: https://commonmark.org/
[Keep a Changelog]: https://keepachangelog.com/
[Semantic Versioning]: https://semver.org/