feat(t2): Phase 3 legal_representatives extraction for LV #134

Open
petterlindstrom79 wants to merge 4 commits into feat/phase-1-class-a-relabel-fr-sk-uk from feat/phase-3-extraction-lv

Conversation

petterlindstrom79 commented May 18, 2026

Summary

Phase 3 of the legal_representatives extraction sweep — Latvia.

Extracts directors/officers from data.gov.lv amatpersonas open dataset (CKAN resource e665114a-73c2-4375-9470-55874b4cfa6b) and surfaces them as the canonical legal_representatives[] array on latvian-company-data output. Flips tier_2_available: true when the upstream returns at least one active officer.

Free, no auth, no infra. A second CKAN datastore_search call alongside the existing entities-master lookup. The officer FK column at_legal_entity_registration_number is numeric, so the JSON filter value is sent unquoted (see comment in fetchOfficers).
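A minimal sketch of how the unquoted numeric filter could be built for the `datastore_search` call. This is not the code in `fetchOfficers`: the helper names `officerFilters` and `buildOfficersUrl`, the `limit` default, and the base URL value are assumptions for illustration. Only the resource ID, the column name, and the `LV_DATASTORE_API` constant name come from this PR.

```typescript
// Assumed value: the PR names the constant but does not show the URL.
const LV_DATASTORE_API =
  "https://data.gov.lv/dati/lv/api/3/action/datastore_search";

// Resource ID of the amatpersonas dataset, as quoted in this PR.
const OFFICERS_RESOURCE_ID = "e665114a-73c2-4375-9470-55874b4cfa6b";

// Hypothetical helper: serialize the CKAN `filters` object so the
// registration number is a JSON number, not a JSON string.
function officerFilters(regcode: string): string {
  if (!/^\d+$/.test(regcode)) {
    throw new Error(`regcode must contain digits only: ${regcode}`);
  }
  // Number(...) makes JSON.stringify emit 40003245752 rather than
  // "40003245752". An 11-digit code is well inside the safe integer range.
  return JSON.stringify({
    at_legal_entity_registration_number: Number(regcode),
  });
}

// Hypothetical helper: assemble the full query URL.
function buildOfficersUrl(regcode: string, limit = 100): string {
  const params: Array<[string, string]> = [
    ["resource_id", OFFICERS_RESOURCE_ID],
    ["filters", officerFilters(regcode)],
    ["limit", String(limit)],
  ];
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `${LV_DATASTORE_API}?${query}`;
}
```

Converting through `Number` would drop leading zeros, so this sketch assumes registration numbers never start with `0`.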

Role normalization: stable English enum (board_chair, board_member, council_member, council_chair, procurist, liquidator). Unknown LV codes pass through verbatim. Each entry carries name, role, start_date, rights_of_representation, representation_with_at_least, entity_type.
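The pass-through behavior can be sketched roughly as follows. The upstream codes on the left-hand side of the map are hypothetical placeholders, since the actual `amatpersonas` values are not listed in this PR; only the six canonical enum values are taken from the description above.

```typescript
type CanonicalRole =
  | "board_chair"
  | "board_member"
  | "council_member"
  | "council_chair"
  | "procurist"
  | "liquidator";

// HYPOTHETICAL upstream codes, for illustration only. The real dataset
// may use different spellings.
const ROLE_MAP = new Map<string, CanonicalRole>([
  ["CHAIR_OF_BOARD", "board_chair"],
  ["BOARD_MEMBER", "board_member"],
  ["COUNCIL_MEMBER", "council_member"],
  ["CHAIR_OF_COUNCIL", "council_chair"],
  ["PROCURATOR", "procurist"],
  ["LIQUIDATOR", "liquidator"],
]);

// Known codes map to the stable enum; anything else is returned verbatim
// so callers still see the raw registry value.
function normalizeRole(raw: string): string {
  const key = raw.trim().toUpperCase();
  return ROLE_MAP.get(key) ?? raw;
}
```

Returning the raw string for unknown codes keeps the handler from dropping officers when the registry introduces a role the map does not yet cover.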

Honest coverage caveat: the amatpersonas dataset is a current-active-officers snapshot only — resignations and historical entries are not exposed via this resource. Disclosed in tier_2_available_reason.
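As a rough illustration of the flag and its disclosed caveat, the derivation could look like the sketch below. The function name `deriveTier2Status` and the wording of the reason strings are placeholders, not the strings shipped in this PR; the rule itself (true when at least one active officer is returned, false with an explanatory reason otherwise) is the one described here.

```typescript
interface Tier2Status {
  tier_2_available: boolean;
  tier_2_available_reason: string;
}

// Placeholder wording; the handler's actual reason strings are not
// reproduced in this description.
const SNAPSHOT_CAVEAT =
  "Current active officers only; resignations and historical entries " +
  "are not exposed by the amatpersonas resource.";

// Hypothetical helper: derive the flag and reason from the officer count.
function deriveTier2Status(officerCount: number): Tier2Status {
  if (officerCount > 0) {
    return {
      tier_2_available: true,
      tier_2_available_reason:
        `${officerCount} active officer(s) returned. ${SNAPSHOT_CAVEAT}`,
    };
  }
  return {
    tier_2_available: false,
    tier_2_available_reason:
      `No active officers returned. ${SNAPSHOT_CAVEAT}`,
  };
}
```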

Scope and dependencies

This PR is stacked on #133 (Phase 1 Class A relabel for FR/SK/UK/SE). PR base is feat/phase-1-class-a-relabel-fr-sk-uk. When #133 merges and its branch is deleted, GitHub retargets this PR to main. The diff shown here is the LV delta only (2 files).

BE deferred from this phase

Source research confirmed no free path exists for BE officer data:

  • cbeapi.be: no officer endpoint
  • KBO Open Data CSV bulk: 8 files, function table explicitly excluded
  • KBO Public Search Web Service SOAP: paid (€50 per 2k requests, ~7-day onboarding with Belgian bank transfer)

Recommend deferring BE to a "Paid Tier-2 vendor onboarding" phase that handles DEC-20260428-A vetting + budget approval.

Verification

  • tsc --noEmit clean
  • validate-capability --slug latvian-company-data — 19/20 (single Gate 5 failure is pre-existing on main: task / company_name entry points lack fixture coverage; not introduced by this change)
  • smoke-test --slug latvian-company-data — 11/11 steps green, live execution returns 24 fields in 1272 ms
  • Live spot-checks on airBaltic / Latvenergo / 40003020121 fallback return officers with normalized roles, signing authority, and dates
  • /go six-lens review: 0 HIGH, 4 MEDIUM (see below), 1 LOW (pre-existing)

Reviewer findings

Applied this session (1):

  • Pass B.2: diacritics inconsistency in the populated-officers reason string fixed in commit f623d73 — registry name now spelled Uzņēmumu reģistrs consistently with the provenance attribution and file docstring.

Flagged for follow-up (3, none ship-blocking):

  • Pass A.1 — Raw fetch vs safeFetch policy gap (MEDIUM). callDatastore uses raw fetch against a hardcoded LV_DATASTORE_API constant. No exploitable SSRF path today since user input only flows into a query-string param. The smell is that if callDatastore is later extended to accept a caller-supplied base URL, the missing safeFetch wrapper becomes live. Flagging so the next person sees the conscious choice.

  • Pass A.2 — Sequential awaits on name-lookup path (MEDIUM). For name queries: entity record fetched first (15s timeout), then officers (15s timeout) — worst case 30s before handler returns. The 10s DEC-22 sync threshold means name-lookup queries flip to async more often than regcode queries. Genuine dependency (officers fetched by regcode derived from the entity record), so the serial order is correct — but the behavior is worth documenting so the async flip is expected, not treated as a degradation signal.

  • Pass B.1 — tier_2_available semantic ambiguity (MEDIUM). The field is set true when officers count > 0 and false (with explanatory reason) when officers count = 0. An empty officers list is a valid registry state (newly registered entity, all officers resigned). The current boolean conflates data-availability with entity-state. Note: this is consistent with the canonical contract already shipped on UK/SK/FR in Phases 1+2, so changing the shape here would create cross-handler inconsistency. Worth a platform-wide follow-up (legal_representatives_available or officers_known_present: boolean | null) — not appropriate to fork the shape in a single-country PR.
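To make the Pass B.1 follow-up concrete, here is one possible reading of a tri-state `officers_known_present: boolean | null`. The `OfficerLookup` type and the rule that a failed lookup maps to `null` are assumptions for discussion, not a decided contract.

```typescript
// Hypothetical result shape for the officers lookup.
type OfficerLookup =
  | { ok: true; officers: readonly unknown[] }
  | { ok: false; error: string };

// true  -> registry returned at least one active officer
// false -> registry answered, and the entity has no active officers
// null  -> lookup failed, so presence is unknown
function officersKnownPresent(lookup: OfficerLookup): boolean | null {
  if (!lookup.ok) return null;
  return lookup.officers.length > 0;
}
```

Under this reading, an empty list from a successful lookup is reported as `false` (a valid entity state), while `null` is reserved for cases where the data could not be obtained.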

LOW (pre-existing, not introduced here):

  • Indentation inconsistency in the Tier 1 alias-injection block (lines 241–245). Cosmetic artifact from the parent Phase 1 commit c2e4974; lives in this file but originated upstream. Out of scope.

Cross-repo

No frontend changes required. legal_representatives shape matches what UK/FR/SK already emit; the additional LV-specific keys (rights_of_representation, representation_with_at_least, entity_type) are additive.

Refs DEC-20260518-A, DEC-20260518-D.

Test plan

  • TypeScript check clean
  • Live smoke test against three Latvian entities returns expected officer counts and shapes
  • tier_2_available correctly toggles based on officer count
  • No scraping; pure CKAN datastore_search; license CC0 1.0 preserved in provenance
  • /go six-lens review applied

🤖 Generated with Claude Code

Extract directors/officers from data.gov.lv `amatpersonas` open dataset
(CKAN resource e665114a-73c2-4375-9470-55874b4cfa6b) and surface them as
the canonical legal_representatives[] array on latvian-company-data
output. Flips tier_2_available true when the upstream returns at least
one active officer.

Free, real-time CKAN datastore_search call alongside the existing
entities-master lookup — no infra, no auth, no scraping. Officer FK
column at_legal_entity_registration_number is numeric, so the JSON
filter value is sent unquoted.

Roles normalized to a stable English enum (board_chair, board_member,
council_member, council_chair, procurist, liquidator); unknown LV codes
pass through verbatim. Each entry carries rights_of_representation,
representation_with_at_least, start_date, and entity_type.

Coverage limit honestly disclosed in tier_2_available_reason: the
amatpersonas dataset is a current-active-officers snapshot — no
resignations or historical entries are exposed via this resource.

Smoke-verified against airBaltic (40003245752, 2 officers), Latvenergo
(40003032949, 5 officers), and a 40003020121 fallback (2 officers).

BE deferred from this phase: cbeapi.be doesn't expose officers, KBO
Open Data CSV omits the function table, and KBO Public Search Web
Service SOAP is paid (€50 per 2k requests with bank-transfer
onboarding). No free path exists.

Refs DEC-20260518-A, DEC-20260518-D.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

petterlindstrom79 changed the base branch from main to feat/phase-1-class-a-relabel-fr-sk-uk May 18, 2026 10:31

petterlindstrom79 and others added 3 commits May 18, 2026 12:34

Pass B reviewer flagged that the populated-officers reason string spelled
the registry name without diacritics ("Uznemumu registrs") while the
provenance attribution + file comment use the correct "Uzņēmumu reģistrs"
form. Aligns the two so an AI agent caller does not surface inconsistent
transliterations in the same response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two inline fixes from the /go six-lens review on PR #133:

- Pass A correctness: removed 404 swallow in fetchOfficers. Previously
  the officers endpoint 404 silently produced legal_representatives=[]
  with tier_2_available=true, which made the flag misleading in the
  rare case where /company/{n} succeeds but /company/{n}/officers
  returns 404. Now the 404 propagates as a structured error consistent
  with fetchCompany.

- Quality consistency: guarded the legal_representatives assignment
  with the alias-block invariant pattern (if undefined). Matches
  surrounding canonical-alias resolution and FR sibling.

Deferred (PR-body MEDIUMs): cross-country shape mismatch, manifest
output_field_reliability gap, items_per_page=100 truncation guard,
SE YAML roadmap-state phrasing, reason-string voice, auth-header
duplication. See PR description for details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>