
feat(connections): add connections export, contact enrichment, company search, notifications (#315)

Closed
Gabrcodes wants to merge 766 commits into stickerdaniel:main from Gabrcodes:feat/connections-notifications

Conversation

@Gabrcodes

Summary

Four new tools for connections and notifications, plus a network degree filter for search_people.

| Tool | Description |
|------|-------------|
| `get_my_connections` | Scrapes the connections list via infinite scroll. Uses innerText parsing only — no CSS class-name selectors, compliant with project scraping rules. |
| `extract_contact_details` | Batch-enriches profiles with email, phone, website, and birthday from the contact info overlay. Chunks with configurable delays. `include_raw=false` by default to avoid large responses. |
| `get_connections_at_company` | Resolves the company name to a numeric entity ID first (LinkedIn ignores name strings in the `currentCompany` filter), then searches 1st-degree connections. Falls back to keyword search if the ID can't be resolved. Closes #248. |
| `get_notifications` | Scrapes `/notifications/` for all notification types. Closes #211. |

search_people gains a network parameter ("F" = 1st degree, "S" = 2nd, "O" = 3rd+) with input validation.

Implementation notes

  • _resolve_company_id navigates to the company page and extracts the numeric entity URN from page metadata — needed because currentCompany=%5B%22Google%22%5D is silently ignored by LinkedIn
  • scrape_contact_batch catches RateLimitError specifically to abort the batch; other LinkedInScraperException failures are per-profile only
  • _parse_contact_record extracts company from the headline ("X at Company") rather than guessing from innerText line positions

Changes

  • scraping/extractor.py — _parse_contact_record, search_people network filter, scrape_connections_list, scrape_contact_batch, get_notifications, _resolve_company_id, search_connections_at_company
  • tools/connections.py — new file with 4 tool registrations
  • tools/person.py — network parameter on search_people
  • server.py — register connections tools
  • tests/test_tools.py — updated search_people assertion for network param

Test plan

  • 357 passed, 5 skipped, 0 failures
  • ruff check and ruff format pass
  • Live test: get_my_connections returns connection list
  • Live test: get_connections_at_company with a real company
  • Live test: get_notifications

stickerdaniel and others added 30 commits March 4, 2026 19:38
…place_flag_enums_with_config_dicts

refactor(scraping): replace Flag enums with config dicts
docs: sync manifest.json tools and features with current capabilities
Lock file already has 3.1.0 since #166; align pyproject.toml
floor to prevent accidental downgrades to v2.

Resolves: #190


<h3>Greptile Summary</h3>

This PR tightens the `fastmcp` minimum version constraint from `>=2.14.0` to `>=3.0.0` in `pyproject.toml` (and the corresponding `uv.lock` metadata), preventing any future resolver from backtracking to the incompatible v2 series. The lock file has already been pinning `fastmcp==3.1.0` since PR #166, so there is no runtime impact — this is purely a spec/metadata alignment.

- `pyproject.toml`: `fastmcp` floor raised to `>=3.0.0`
- `uv.lock`: `package.metadata.requires-dist` updated to match; the resolved package entry (`3.1.0`) is unchanged
- No upper-bound cap (`<4.0.0`) is set, which is consistent with the project's existing open-ended constraints for all other dependencies

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge — it is a pure metadata alignment with no functional or runtime impact.
- The locked version was already `3.1.0` before this PR; the only change is raising the declared floor to match. Both modified lines are trivially correct, consistent with each other, and have no side-effects on the installed environment.
- No files require special attention.

<h3>Important Files Changed</h3>




| Filename | Overview |
|----------|----------|
| pyproject.toml | Single-line change updating the `fastmcp` floor constraint from `>=2.14.0` to `>=3.0.0`, aligning with the already-resolved version in the lock file. |
| uv.lock | Auto-generated lock file metadata updated to reflect the new `>=3.0.0` specifier; the resolved `fastmcp` version (3.1.0) was already correct and unchanged. |




<h3>Flowchart</h3>

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["pyproject.toml\nfastmcp >=3.0.0"] -->|uv resolves| B["uv.lock\nfastmcp 3.1.0 (pinned)"]
    B --> C["Installed environment\nfastmcp 3.1.0"]
    D["Old constraint\nfastmcp >=2.14.0"] -. "could resolve to" .-> E["fastmcp 2.x\n(incompatible)"]
    style D fill:#f9d0d0,stroke:#c00
    style E fill:#f9d0d0,stroke:#c00
    style A fill:#d0f0d0,stroke:#060
    style B fill:#d0f0d0,stroke:#060
    style C fill:#d0f0d0,stroke:#060
```

<sub>Last reviewed commit: 7d2363e</sub>

Replace dict-returning handle_tool_error() with raise_tool_error()
that raises FastMCP ToolError for known exceptions. Unknown exceptions
re-raise as-is for mask_error_details=True to handle.

Resolves: #185
Add logger.error with exc_info for unknown exceptions before re-raising,
and add test coverage for AuthenticationError and ElementNotFoundError.
Re-add optional context parameter to raise_tool_error() for log
correlation, and add test for base LinkedInScraperException branch.
Add catch-all comment on base exception branch and NoReturn
inline comments on all raise_tool_error() call sites.
…mcp_constraint_to_3.0.0

refactor(error-handler): replace handle_tool_error with ToolError
Replace repeated ensure_authenticated/get_or_create_browser/
LinkedInExtractor boilerplate in all 6 tool functions with
FastMCP Depends()-based dependency injection via a single
get_extractor() factory in dependencies.py.

Resolves: #186
Updated the get_extractor function to route errors through raise_tool_error, ensuring that MCP clients receive structured ToolError responses for authentication failures. Added a test to verify that authentication errors are correctly handled and produce the expected ToolError response.
…epends_to_inject_extractor

refactor(tools): Use Depends() to inject extractor
Replace ToolAnnotations(...) with plain dicts, move title to
top-level @mcp.tool() param, and add category tags to all tools.

Resolves: #189


<h3>Greptile Summary</h3>

This PR is a clean, well-scoped refactoring that modernises tool metadata across all four changed files to align with the FastMCP 3.x API. It introduces no functional or behavioural changes.

Key changes:
- Removes the `ToolAnnotations(...)` Pydantic wrapper in `company.py`, `job.py`, and `person.py`, replacing it with plain `dict` syntax for the `annotations` parameter — the simpler form supported by FastMCP 3.x.
- Moves `title` from inside `ToolAnnotations` to a top-level keyword argument on `@mcp.tool()`, matching the updated FastMCP 3.x decorator signature.
- Drops the now-redundant `destructiveHint=False` from all read-only tools. Per the MCP spec, `destructiveHint` is only meaningful when `readOnlyHint` is `false`, so omitting it from tools that already declare `readOnlyHint=True` is semantically equivalent.
- Adds `tags` (as Python `set` literals) to every tool for categorisation (`"company"`, `"job"`, `"person"`, `"scraping"`, `"search"`, `"session"`).
- Enriches the previously unannotated `close_session` tool in `server.py` with a title, `destructiveHint=True`, and the `"session"` tag — accurately describing its destructive nature.

The existing test suite in `tests/test_tools.py` covers all tool functions but does not assert on annotation metadata, so no test changes are required. The refactoring is consistent across all tool files and fits naturally within the project's layered registration pattern.

<h3>Confidence Score: 5/5</h3>

- This PR is safe to merge — it is a pure metadata/annotation refactoring with no changes to tool logic, inputs, outputs, or error handling.
- All changes are limited to decorator parameters (`title`, `annotations`, `tags`). The `annotations` dict values are semantically equivalent to the removed `ToolAnnotations` objects, `destructiveHint=False` is correctly dropped only for `readOnlyHint=True` tools, and the new `close_session` annotations accurately reflect its destructive nature. No business logic, scraping behaviour, or error paths were altered.
- No files require special attention.

<h3>Flowchart</h3>

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["@mcp.tool() decorator"] --> B{Annotation style}
    B -->|Before| C["ToolAnnotations(title=..., readOnlyHint=..., destructiveHint=False, openWorldHint=...)"]
    B -->|After| D["title='...' (top-level param)\nannotations={'readOnlyHint': True, 'openWorldHint': True}\ntags={'category', 'type'}"]
    D --> E["person tools\n(get_person_profile, search_people)"]
    D --> F["company tools\n(get_company_profile, get_company_posts)"]
    D --> G["job tools\n(get_job_details, search_jobs)"]
    D --> H["session tool\n(close_session)\nannotations={'destructiveHint': True}"]
```

<sub>Last reviewed commit: c5bf554</sub>

Use lowercase dict instead of Dict, add auth validation log line
…t_lifespan_into_composable_browser_auth_lifespans

refactor(server): Split lifespan into composable browser + auth lifespans
# Conflicts:
#	linkedin_mcp_server/server.py
#	linkedin_mcp_server/tools/company.py
#	linkedin_mcp_server/tools/job.py
#	linkedin_mcp_server/tools/person.py
aspectrr and others added 23 commits March 29, 2026 18:25
feat: add get_sidebar_profiles tool and profile_urn in get_person_profile
- Replace custom _secure_profile_dirs/_set_private_mode with thin
  _harden_linkedin_tree that uses secure_mkdir from common_utils
- Fix export_storage_state: chmod 0o600 after Playwright writes
- Add test for export_storage_state permission hardening
- Add test for no-op outside .linkedin-mcp tree
- Revert unrelated loaders.py change
Harden .linkedin-mcp profile/cookie permissions
- Remove unused selector constants (_MESSAGING_THREAD_LINK_SELECTOR, _MESSAGING_RESULT_ITEM_SELECTOR, _MESSAGING_SEND_SELECTOR)
- Remove dead _conversation_thread_cache (new extractor per tool call)
- Add AuthenticationError handling to get_sidebar_profiles and all messaging tools
- Pass CSS selector as evaluate() arg instead of f-string interpolation
- Replace deprecated execCommand with press_sequentially
- Guard sidebar container walk against depth-limit exhaustion
- Update scrape_person docstring to document profile_urn return key
- Add messaging tools to README tool-status table
LinkedIn redirects /messaging/ to the most recent thread; capture
baseline_thread_id after the SPA settles so search-selected threads
can be distinguished from the auto-opened one.
feat: linkedin messaging, get sidebar profiles
…IDs (#300)

* fix(scraping): Respect --timeout for messaging, recognize thread URLs

Remove all hardcoded timeout=5000 from the send_message flow and
messaging helpers so they fall through to the page-level default
set from BrowserConfig.default_timeout (configurable via --timeout).

Also add /messaging/thread/ URL recognition to classify_link so
conversation thread references are captured when they appear in
search results or conversation detail views. Raise inbox reference
cap to 30 and add proper section context labels.

Resolves: #296
See also: #297

* fix(scraping): Extract conversation thread IDs from inbox via click-and-capture

LinkedIn's conversation sidebar uses JS click handlers instead of <a>
tags, so anchor extraction cannot capture thread IDs. Click each
conversation item and read the resulting SPA URL change to build
conversation references with thread_id and participant name.

Before: get_inbox returned 2 references (active conversation only)
After: get_inbox returns all conversation thread IDs (10+ refs)

Resolves: #297

* fix(scraping): Respect --timeout across all remaining scraping methods

Remove the remaining 10 hardcoded timeout=5000 from profile scraping,
connection flow, modal detection, sidebar profiles, conversation
resolution, and job search. All Playwright calls now use the page-level
default from BrowserConfig.default_timeout.

Resolves: #299
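The pattern behind these timeout commits is to configure the timeout once at page setup and let every bare Playwright call inherit it. A minimal sketch, where `BrowserConfig` is a stand-in dataclass rather than the project's actual class:

```python
from dataclasses import dataclass

@dataclass
class BrowserConfig:
    default_timeout: float = 30_000  # milliseconds, set from --timeout

def apply_default_timeout(page, config: BrowserConfig) -> None:
    """Set the page-level default so bare Playwright calls respect --timeout."""
    page.set_default_timeout(config.default_timeout)
    # After this, `await page.click(sel)` with no timeout kwarg waits up to
    # config.default_timeout instead of a per-call hardcoded value.
```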

* fix: Address PR review feedback

- Use saved inbox URL instead of self._page.url (P1: wrong URL after clicks)
- Fix docstring to clarify 2s recipient-picker probe is intentional
- Replace class-name selectors with aria-label discovery + minimal class fallback
- Dedupe references after merging conversation and anchor refs
First-time uvx runs download ~77 Python packages including the 39MB
patchright wheel. On slow connections, uv's default 30s HTTP timeout
can cause silent failures before the server process starts.

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
Move UV_HTTP_TIMEOUT=300 into the main uvx config example so it's the
default, not an optional troubleshooting step. Fix grammar in the
troubleshooting note.

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
* docs: use @latest tag in uvx config for auto-updates

Without @latest, uvx caches the first downloaded version forever.
Adding @latest ensures uvx checks PyPI on each client launch and
pulls new versions automatically.

* docs: apply @latest consistently to all uvx invocations

Update --login examples in README.md and docs/docker-hub.md to use
linkedin-scraper-mcp@latest for consistency with the MCP config.

---------

Co-authored-by: Daniel Sticker <sticker@ngenn.net>
…y search, notifications

Four new tools covering connections and notifications:

get_my_connections:
- Scrapes the connections list via infinite scroll
- Extracts username, name, headline using innerText parsing only
  (no CSS class-name selectors — compliant with project scraping rules)
- Configurable limit and max_scrolls

extract_contact_details:
- Batch-enriches profiles with structured contact data
- Extracts email, phone, website, birthday from contact info overlay
- Processes in configurable chunks with rate-limit protection
- RateLimitError specifically catches rate limits; other scraper
  failures are per-profile and don't abort the batch
- include_raw=True opt-in for raw innerText (off by default to avoid
  oversized responses)
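The chunking and error policy described above can be sketched like this. `RateLimitError`, `ScraperError`, and `scrape_one` are illustrative stand-ins for the project's exceptions and per-profile scraper:

```python
import asyncio

class RateLimitError(Exception):
    """LinkedIn is throttling us; the whole batch should stop."""

class ScraperError(Exception):
    """Per-profile failure; skip the profile, keep the batch going."""

async def scrape_contact_batch(usernames, scrape_one, chunk_size=5, chunk_delay=10.0):
    contacts, failed = [], []
    for start in range(0, len(usernames), chunk_size):
        for username in usernames[start:start + chunk_size]:
            try:
                contacts.append(await scrape_one(username))
            except RateLimitError:
                # Abort the batch: continuing would only worsen the throttle.
                return {"contacts": contacts, "failed": failed, "rate_limited": True}
            except ScraperError:
                failed.append(username)  # per-profile failure only
        if start + chunk_size < len(usernames):
            await asyncio.sleep(chunk_delay)  # inter-chunk pause
    return {"contacts": contacts, "failed": failed, "rate_limited": False}
```

The key asymmetry: a rate limit aborts and returns partial results, while a per-profile failure is recorded and the batch continues.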

get_connections_at_company:
- Resolves company name to numeric entity ID first (_resolve_company_id)
  by navigating to the company page and reading the URN from metadata
- Falls back to keyword search if ID can't be resolved
- Filters to 1st-degree connections (network=F)

get_notifications:
- Scrapes /notifications/ page for all notification types
  (connection requests, reactions, mentions, job alerts, etc.)

Also adds network degree filter to search_people:
- "F" = 1st degree, "S" = 2nd degree, "O" = 3rd+
- Validates the value before building the URL

Closes #248
Closes #211

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Apr 2, 2026

Greptile Summary

This PR adds four new tools (get_my_connections, extract_contact_details, get_connections_at_company, get_notifications) and a network degree filter for search_people. The implementation is well-structured, follows the project's innerText-first scraping rules, handles RateLimitError correctly in the batch flow, and the company-ID resolution approach is sound. All previously flagged concerns (quote_plus vs quote path encoding, double-navigation timeout) appear to have been addressed.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor P2 style suggestions with no correctness impact.

No P0 or P1 issues found. The two open comments are best-practice suggestions (chunk_delay guard and notifications limit semantics), neither of which affects correctness or reliability in normal use. Prior concerns about quote_plus and timeout have been resolved.

extractor.py — chunk_delay guard and notification limit approximation are minor but worth a follow-up.

Important Files Changed

| Filename | Overview |
|----------|----------|
| linkedin_mcp_server/scraping/extractor.py | Adds scrape_connections_list, scrape_contact_batch, get_notifications, _resolve_company_id, and search_connections_at_company; network filter on search_people. Minor: chunk_delay has no lower bound; notification limit controls scroll depth but doesn't clip the returned text. |
| linkedin_mcp_server/tools/connections.py | New file registering four MCP tools; error handling is consistent with existing tools; timeout values look appropriate with the * 2 multiplier for get_connections_at_company and * 5 for extract_contact_details. |
| linkedin_mcp_server/tools/person.py | Adds network parameter to the search_people tool; validation is delegated to the extractor layer, which raises ValueError for invalid values. |
| linkedin_mcp_server/server.py | Minimal change: imports and registers the new connections tools module. |
| tests/test_tools.py | Updates the search_people assertion to account for the new network parameter; no regressions visible. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_my_connections] --> B[scrape_connections_list]
    B --> C[Navigate connections page]
    C --> D[scroll_to_bottom]
    D --> E[evaluate: extract /in/ links]
    E --> F[Return connections list]

    G[extract_contact_details] --> H[scrape_contact_batch]
    H --> I{For each chunk}
    I --> J[extract_page profile]
    J --> K[_extract_overlay contact-info]
    K --> L[_parse_contact_record]
    L --> M{More chunks?}
    M -->|Yes, sleep chunk_delay| I
    M -->|No| N[Return contacts + failed]

    O[get_connections_at_company] --> P[search_connections_at_company]
    P --> Q[_resolve_company_id]
    Q --> R{ID found?}
    R -->|Yes| S[Search with network + currentCompany filter]
    R -->|No| T[Fallback keyword search]

    U[get_notifications] --> V[Navigate notifications page]
    V --> W[scroll_to_bottom]
    W --> X[_extract_root_content main]
    X --> Y[Return notifications text]

    Z[search_people] --> AA{network param?}
    AA -->|Valid F S O| AB[Add network filter to URL]
    AA -->|Invalid| AC[Raise ValueError]
    AB --> AD[extract_page search results]
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/extractor.py
Line: 288-295

Comment:
**`chunk_delay` has no lower-bound guard**

`asyncio.sleep(chunk_delay)` is called unconditionally with a caller-supplied float. There's no validation in either `scrape_contact_batch` or the tool layer, so a caller (or an LLM) can pass `chunk_delay=0` and silently remove the inter-chunk pause that protects against rate limiting.

```suggestion
            if chunk_idx + chunk_size < total and chunk_delay > 0:
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/extractor.py
Line: 320-322

Comment:
**`limit` controls scroll depth, not result count**

`scrolls = max(1, limit // 10)` is an approximation — the final raw text returned is not clipped to `limit` notifications. With `limit=1`, `scrolls=1` but the page may render far more than 1 notification; with `limit=50`, up to 50 may be loaded but the caller gets back the full `<main>` text with no numeric cutoff. The parameter name and description ("Maximum notifications to load") imply a stricter bound than what's implemented. Consider documenting this in the docstring, or renaming the parameter to `max_scrolls` to match `scrape_connections_list`.

How can I resolve this? If you propose a fix, please make it concise.
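One concise resolution, sketched under the assumption that notification entries appear as blank-line-separated blocks in the extracted innerText (an assumption about the page's text shape, not the actual extractor code): clip the scraped text to `limit` entries after extraction so the parameter keeps its documented meaning.

```python
def clip_notifications(raw_text: str, limit: int) -> str:
    """Keep at most `limit` notification entries.

    Assumes one entry per blank-line-separated block; adjust the split
    to match the real innerText structure of /notifications/.
    """
    blocks = [b for b in raw_text.split("\n\n") if b.strip()]
    return "\n\n".join(blocks[:limit])
```

Alternatively, renaming the parameter to `max_scrolls` (matching `scrape_connections_list`) would make the current behaviour honest without changing it.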

Reviews (3): Last reviewed commit: "fix: propagate section_errors and omit e..."

Gabrcodes and others added 2 commits April 3, 2026 02:09
- fix(extractor): use quote(company, safe='-') instead of quote_plus
  in _resolve_company_id — quote_plus encodes spaces as + which is
  valid only in query strings, not URL paths; 'Goldman Sachs' now
  produces /company/Goldman%20Sachs/ which LinkedIn recognises
- fix(tools): double timeout for get_connections_at_company to
  TOOL_TIMEOUT_SECONDS * 2 — method does two navigations (company
  page + search results) and needs more headroom than 90s
- fix(extractor): move Website into the labeled-field loop in
  _parse_contact_record — was duplicated outside, creating two
  slightly different code paths that could drift out of sync
- test: add TestConnectionsTool with 4 tests covering all new
  connection tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
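The encoding difference behind the first fix is worth a concrete look: `quote_plus` is for query strings, while spaces in a URL path must be `%20`.

```python
from urllib.parse import quote, quote_plus

company = "Goldman Sachs"
print(quote_plus(company))       # Goldman+Sachs (valid only in query strings)
print(quote(company, safe="-"))  # Goldman%20Sachs (what a /company/<name>/ path needs)
```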
…nections_at_company

Aligns with search_people pattern: propagate extraction failures via
section_errors and only emit references key when non-empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

  • feat: Add get_notifications tool to scrape LinkedIn notifications
  • [FEATURE] What connections do I have at <company>?

7 participants