feat(search_people): add network and current_company filters (#248) #384

Merged
stickerdaniel merged 4 commits into stickerdaniel:main from mvanhorn:feat/248-search-people-filters
May 7, 2026

@mvanhorn
Contributor

Summary

  • Add two optional params to search_people: network (1st / 2nd / 3rd degree filter, tokens "F" / "S" / "O") and current_company (current-employer filter, accepts a company name or URN id).
  • Threaded through tools/person.py and scraping/extractor.py onto the existing LinkedIn search URL. Encodes the two facets as network=%5B%22F%22%5D and currentCompany=%5B%22Weber+Inc%22%5D via a small list-facet helper.
  • Invalid network tokens raise ValueError at the extractor boundary.
  • 6 new extractor tests, 1 new tool test, README and manifest.json updated.
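
As a rough illustration of the validation bullet above, the token check could look like the sketch below. The names `_VALID_NETWORK_TOKENS` and `validate_network` are hypothetical and not taken from the diff; only the "F" / "S" / "O" tokens and the ValueError behavior come from this description.

```python
from typing import List, Optional

# Hypothetical allow-list; the mapping to degrees follows the PR description.
_VALID_NETWORK_TOKENS = {"F": "1st", "S": "2nd", "O": "3rd+"}


def validate_network(network: Optional[List[str]]) -> Optional[List[str]]:
    """Return the tokens unchanged, or raise ValueError on any unknown token."""
    if network is None:
        return None  # filter not requested
    invalid = [token for token in network if token not in _VALID_NETWORK_TOKENS]
    if invalid:
        raise ValueError(
            f"Invalid network token(s) {invalid}; "
            f"expected any of {sorted(_VALID_NETWORK_TOKENS)}"
        )
    return list(network)
```

Failing before the URL is built means a typo such as `network=["X"]` surfaces as an error rather than as a silently unfiltered results page.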

Closes #248 (company-filter half; the industry filter from the second half is a natural follow-up PR once this shape is accepted).

Motivation

Concrete user story that motivated this: a user has Jennifer Bonuso (President Americas at Weber Inc) as a 1st-degree connection, but search_people("Weber Inc") returns only 2nd-degree results and omits her. After this PR, search_people(keywords="Weber Inc", network=["F"]) constrains the results page to 1st-degree.

Before:

https://www.linkedin.com/search/results/people/?keywords=Weber+Inc

After (with the new params):

https://www.linkedin.com/search/results/people/?keywords=Weber+Inc&network=%5B%22F%22%5D&currentCompany=%5B%22Weber+Inc%22%5D
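
A minimal sketch of how the "after" URL can be assembled with the standard library. The function names `encode_list_facet` and `build_people_search_url` are illustrative assumptions, not the identifiers used in the diff; the sketch only reproduces the URL shape shown above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.linkedin.com/search/results/people/"


def encode_list_facet(values):
    """Serialize values as a compact JSON list; urlencode percent-encodes it later."""
    return json.dumps(list(values), separators=(",", ":"))


def build_people_search_url(keywords, network=None, current_company=None):
    """Build the people-search URL, adding each facet only when it is set."""
    params = {"keywords": keywords}
    if network:
        params["network"] = encode_list_facet(network)
    if current_company:
        params["currentCompany"] = encode_list_facet([current_company])
    return f"{BASE_URL}?{urlencode(params)}"


print(build_people_search_url("Weber Inc", network=["F"], current_company="Weber Inc"))
```

With no filters set, the function returns the "before" URL unchanged, so existing callers see the same request.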

Prior Art

Inspired by:

This PR delivers the core of #152 and part of #320 as a minimal, current-main-aligned diff that follows the search_jobs filter pattern already in extractor.py.

Testing

  • uv run pytest - 395 passed locally, including 6 new extractor tests and 1 new tool test covering single-degree, multi-degree, current_company, invalid token, and combined-filter URL construction.
  • uv run ruff check . - clean
  • uv run ruff format . - clean
  • uv run ty check - clean
  • uv run pre-commit run --all-files - clean

Tests exercise URL construction only. Live verification against LinkedIn's rendered page is not included in this PR since it requires an authenticated browser session; happy to add notes from a manual live run if you want that before merge.

Out of scope

Synthetic prompt

Add two optional params to search_people in linkedin_mcp_server/tools/person.py and linkedin_mcp_server/scraping/extractor.py: network: list[str] | None (tokens "F" / "S" / "O" for 1st / 2nd / 3rd degree) and current_company: str | None. Validate network tokens at the extractor; invalid tokens raise ValueError. Encode both as URL-encoded JSON list facets (network=%5B%22F%22%5D, currentCompany=%5B%22Weber+Inc%22%5D) via a small helper. Update manifest.json tool description and the README tool table. Add extractor tests for single-degree, multi-degree, current_company, invalid token, and combined-filter URL construction, plus a tool test that forwards kwargs. Keep the diff under ~150 core-source lines and follow the search_jobs filter pattern.

Generated with Claude Opus 4.7 (1M context)

Extends the existing search_people tool with two optional params that
LinkedIn's people-search URL already accepts:

- network: list of "F" / "S" / "O" tokens for 1st / 2nd / 3rd+ degree
  connection filter. Invalid tokens raise ValueError.
- current_company: company name (or URN id) passed to LinkedIn's
  currentCompany facet.

Encoded via a small list-facet helper that produces URLs of the form
network=%5B%22F%22%5D and currentCompany=%5B%22Weber+Inc%22%5D.

Addresses the "who in my 1st-degree network works at <company>" half
of stickerdaniel#248. Industry filter tracked as a follow-up.
@stickerdaniel
Owner

Thanks! Let me know when it's ready for review

@mvanhorn mvanhorn marked this pull request as ready for review April 22, 2026 15:31
@mvanhorn
Contributor Author

Ready for review @stickerdaniel - all CI green (lint-and-check, test, Socket Security). Thanks!

@greptile-apps
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR adds two optional filter parameters — network (connection-degree, tokens "F" / "S" / "O") and current_company (numeric URN only) — to search_people, following the existing search_jobs filter pattern throughout the stack.

  • extractor.py introduces _encode_list_facet (using json.dumps), a FilterValidationError subclass of ValueError, ASCII-only URN validation via re.fullmatch, and threads both params onto the LinkedIn search URL.
  • person.py forwards the new params, catches FilterValidationError and re-raises it as ToolError so the actionable message reaches the MCP client instead of being masked, and adds except ToolError: raise to prevent the generic handler from swallowing already-formatted errors.
  • Eight extractor tests and two tool tests cover single/multi-degree network filters, URN validation (including Unicode-digit rejection), empty-string noop, combined filters, and the FilterValidationError → ToolError promotion path.
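
The ASCII-only URN check mentioned above can be sketched as follows. `validate_current_company` is a hypothetical name and the error text is invented for illustration; the `FilterValidationError` subclass, `re.fullmatch`, the empty-string noop, and the Unicode-digit rejection are the parts described in this summary.

```python
import re


class FilterValidationError(ValueError):
    """Raised when a search filter value cannot be used."""


def validate_current_company(value):
    """Accept only ASCII digit strings; treat None and "" as "no filter"."""
    if value is None or value == "":
        return None
    # [0-9] is ASCII-only. On str patterns, \d would also match Unicode
    # digits such as fullwidth "１" unless re.ASCII is passed.
    if not re.fullmatch(r"[0-9]+", value):
        raise FilterValidationError(
            f"current_company must be a numeric company URN id, got {value!r}"
        )
    return value
```

The character class matters here: `re.fullmatch(r"\d+", "\uff11\uff11\uff11\uff15")` succeeds in Python 3, so a `\d`-based check would let fullwidth digits through to the URL.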

Confidence Score: 5/5

Safe to merge — the change is additive, all new params are optional with None defaults, validation is strict and tested, and the error-promotion path from extractor to MCP client is correctly guarded.

All new parameters default to None, leaving existing callers unaffected. Validation (network token allow-list, ASCII-only URN regex) is exercised by 8 targeted tests including Unicode-digit edge cases. The FilterValidationError → ToolError promotion and the bare except ToolError: raise guard are both covered by dedicated tool tests. No regressions are introduced to the existing search_people call path.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| linkedin_mcp_server/scraping/extractor.py | Adds network and current_company params to search_people, with a FilterValidationError subclass, ASCII-only URN validation via re.fullmatch, and _encode_list_facet helper using json.dumps. Logic is tight and consistent with the existing search_jobs pattern. |
| linkedin_mcp_server/tools/person.py | Threads new params through the MCP tool, catches FilterValidationError and re-raises as ToolError, and adds a bare except ToolError: raise to prevent the generic error handler from swallowing already-formatted client errors. |
| tests/test_scraping.py | Adds 8 new extractor tests covering single-degree, multi-degree, current_company URN, invalid token, Unicode-digit rejection, empty-string noop, and combined-filter URL construction. Good edge-case coverage. |
| tests/test_tools.py | Adds 2 new tool tests: one verifying kwarg forwarding and one confirming FilterValidationError surfaces as ToolError with the original message intact. |
| manifest.json | Tool description updated to mention connection degree and current company filters; straightforward doc update. |
| README.md | README tool table updated to reflect new filter capabilities; accurate and consistent with the implementation. |

Sequence Diagram

sequenceDiagram
    participant Client as MCP Client
    participant Tool as person.py (search_people)
    participant Extractor as extractor.py (search_people)
    participant LinkedIn as LinkedIn Search URL

    Client->>Tool: search_people(keywords, network, current_company)
    Tool->>Extractor: "extractor.search_people(keywords, location, network=, current_company=)"

    alt network tokens invalid
        Extractor-->>Tool: raise FilterValidationError
        Tool-->>Client: raise ToolError (message preserved)
    else current_company not numeric URN
        Extractor-->>Tool: raise FilterValidationError
        Tool-->>Client: raise ToolError (message preserved)
    else params valid
        Extractor->>Extractor: build params + network + currentCompany
        Extractor->>LinkedIn: extract_page(url)
        LinkedIn-->>Extractor: page content
        Extractor-->>Tool: "{url, sections}"
        Tool-->>Client: "{url, sections}"
    end

Reviews (4): Last reviewed commit: "fix(search_people): reject non-numeric c..."

Comment thread linkedin_mcp_server/scraping/extractor.py Outdated
@stickerdaniel
Owner

Live-tested all three network tokens against my account. F/S/O filter
cleanly, URL encoding matches what LinkedIn's UI generates, combinations
work. Network half is good.

current_company doesn't actually filter on plain text though. Compared
"SAP" and "Anthropic" against the same keyword set, LinkedIn returns
the same unfiltered result set for both. Only the numeric URN id ("1115"
for SAP) hits the facet. With that, 8/10 results were SAP employees as
expected. The Weber Inc example in the PR body wouldn't filter in practice.

Two things before merge:

  • Description + README: current_company takes a numeric LinkedIn
    company URN id, not a name. Plain text is silently dropped by LinkedIn.
  • Validation: non-numeric current_company should raise ValueError
    pointing at get_company_profile for the URN.

Validation depends on get_company_profile exposing the URN, which it
doesn't yet. Opened #430 for that, I'll take it.

stickerdaniel added a commit that referenced this pull request May 6, 2026
- Soften get_company_profile docstring: "may include" company_urn (anchor
  isn't on every company page, e.g. very small / brand-new ones)
- Reword company_urn description in tools/company.py docstring,
  manifest.json, README, and docs/docker-hub.md to point at LinkedIn's
  currentCompany URL facet rather than the search_people.current_company
  parameter (which doesn't exist on main today; arrives via #384)
- Defensive _FIRST_URN_RE: accept both quoted-string and bare-integer
  JSON list elements; new test covers the unquoted form
- Comment why normalize_reference re-extracts the urn from the canonical
  url instead of threading it through classify_link
@stickerdaniel
Owner

@mvanhorn #430 merged. The URN is now exposed in get_company_profile.references["about"].

stickerdaniel and others added 2 commits May 6, 2026 16:53
Two follow-ups from PR stickerdaniel#384 review:

- _encode_list_facet now uses json.dumps with compact separators
  instead of manual string interpolation. Same URL output for
  well-behaved tokens, but values containing quotes/backslashes
  (e.g. Weber "Big" Inc) now produce valid JSON instead of malformed
  output. Greptile P2 from the inline review.

- current_company docstrings (extractor + tool) now state that the
  currentCompany facet only filters on numeric URN ids; plain
  company names are accepted by the URL but ignored by LinkedIn.
  Points users to get_company_profile.references["about"] for URN
  lookup, per @stickerdaniel's findings and merged stickerdaniel#430.
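
To make the first bullet concrete, the sketch below contrasts a naive interpolation helper with `json.dumps`. `encode_manual` is an assumed reconstruction of what "manual string interpolation" might have looked like, not the code that was replaced.

```python
import json


def encode_manual(values):
    """Naive interpolation (assumed shape): quotes in a value are not escaped."""
    return "[" + ",".join(f'"{v}"' for v in values) + "]"


def encode_json(values):
    """Compact JSON list; quotes and backslashes are escaped by json.dumps."""
    return json.dumps(list(values), separators=(",", ":"))


name = 'Weber "Big" Inc'
manual = encode_manual([name])  # inner quotes end the string early
safe = encode_json([name])      # inner quotes are backslash-escaped

assert json.loads(safe) == [name]
try:
    json.loads(manual)
except json.JSONDecodeError:
    print("manual encoding is not valid JSON")
```

For plain tokens such as "F" the two helpers return the same string, which is why the existing URL tests keep passing after the switch.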
@mvanhorn
Contributor Author

mvanhorn commented May 6, 2026

Pushed a327845:

  • _encode_list_facet now uses json.dumps with compact separators, so quotes/backslashes in values escape cleanly (greptile P2 from the inline review). Existing URL-encoding tests still pass.
  • current_company docstrings (extractor + tool) flipped to make the URN-only requirement clear. Plain names are accepted by the URL but won't filter; URN can be pulled from get_company_profile.references["about"] now that #430 ([FEATURE] Surface company URN id in get_company_profile references) has shipped. Thanks for the live-test confirmation.

A plain-text company name is silently ignored by LinkedIn's currentCompany
facet, returning unfiltered results without warning. Mirror the existing
network-token validation: raise up front, with a hint pointing at
get_company_profile -> references["about"] for URN lookup (per stickerdaniel#430).

Both validations now raise FilterValidationError (a ValueError subclass —
backward compatible). The tool wrapper maps this to ToolError so the
helpful message reaches the MCP client instead of being collapsed to
"Error calling tool" by mask_error_details. Same fix retroactively
surfaces the existing network=["X"] error for the same reason.
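
The error-promotion path described in this commit message can be sketched as below. `ToolError` here is a local stand-in for the MCP framework's exception so the example runs on its own, and both function names are hypothetical; the real wrapper lives in tools/person.py and may differ in detail.

```python
class FilterValidationError(ValueError):
    """Extractor-side validation failure (ValueError subclass)."""


class ToolError(Exception):
    """Stand-in for the MCP framework's ToolError (assumption for this sketch)."""


def extractor_search_people(keywords, network=None):
    """Toy extractor: validates tokens, then returns a placeholder result."""
    if network and any(token not in {"F", "S", "O"} for token in network):
        raise FilterValidationError(
            f"Invalid network filter {network!r}; use 'F', 'S', or 'O'"
        )
    return {"url": f"people-search?keywords={keywords}", "sections": {}}


def search_people_tool(keywords, network=None):
    """Tool wrapper: promote validation errors so the client sees the message."""
    try:
        return extractor_search_people(keywords, network=network)
    except FilterValidationError as exc:
        raise ToolError(str(exc)) from exc  # message preserved for the client
    except ToolError:
        raise  # already client-facing; keep it out of the generic handler
    except Exception as exc:
        raise RuntimeError("Error calling tool") from exc  # generic masking path
```

Because `FilterValidationError` subclasses `ValueError`, callers that already catch `ValueError` at the extractor boundary keep working.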
@stickerdaniel stickerdaniel merged commit 0c90e17 into stickerdaniel:main May 7, 2026
6 checks passed
nfsarch33 pushed a commit to nfsarch33/linkedin-mcp-server that referenced this pull request May 8, 2026
Development

Successfully merging this pull request may close these issues.

[FEATURE] What connections do I have at <company>?
