Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions services/apps/git_integration/src/crowdgit/database/crud.py
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,23 @@ async def find_github_identity(github_username: str):
return result


async def find_maintainer_identity_by_email(email: str):
sql_query = """
SELECT id
FROM "memberIdentities"
WHERE
platform IN ('github', 'git', 'gitlab')
AND "verified" = TRUE
AND value = $1
Comment on lines +231 to +233
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joanagmaia FYI, I didn't limit the type to email only, because for git, we don't have usernames only email with type username, that should be fine, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's ok. Makes sense. And for git yeah we actually have to use type username since type emails are not verified. So all good

LIMIT 1
"""
result = await fetchval(
sql_query,
(email,),
)
return result


async def upsert_maintainer(
repo_id: str,
identity_id: str,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ class MaintainerInfoItem(BaseModel):
name: str | None = None
title: str | None = None
normalized_title: Literal["maintainer", "contributor"] | None = None
email: str | None = None


class MaintainerInfo(BaseModel):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@

from crowdgit.database.crud import (
find_github_identity,
find_maintainer_identity_by_email,
get_maintainers_for_repo,
save_service_execution,
set_maintainer_end_date,
Expand Down Expand Up @@ -76,10 +77,16 @@ async def process_maintainer(maintainer: MaintainerInfoItem):
original_role = self.make_role(maintainer.title)
# Find the identity in the database
github_username = maintainer.github_username
if github_username == "unknown":
self.logger.warning("github username with value 'unknown' aborting")
email = maintainer.email

if github_username == "unknown" and email == "unkown":
Comment thread
cursor[bot] marked this conversation as resolved.
Outdated
self.logger.warning("username & email with value 'unknown' aborting")
return
identity_id = await find_github_identity(github_username)
identity_id = (
await find_github_identity(github_username)
if github_username != "unknown"
else await find_maintainer_identity_by_email(email)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing Null Checks Break Identity Logic

The logic doesn't handle None values for github_username and email. When github_username is None, the condition github_username != "unknown" evaluates to True, causing find_github_identity(None) to be called instead of falling back to email-based lookup. Similarly, when both are None, the early return check fails since None != "unknown", allowing invalid lookups to proceed. The code only checks for the string "unknown" but the model fields are nullable.

Fix in Cursor Fix in Web

Comment on lines +80 to +89
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me a bit more context on this unknown logic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM return "unknown" if it didn't manage to extract the expected values. That was the logic from the v1 I didn't change it as it was working fine

self.logger.debug(
f"Found identity_id for {github_username}: {identity_id} (type: {type(identity_id)})"
)
Expand Down Expand Up @@ -198,7 +205,7 @@ def get_extraction_prompt(self, filename: str, content_to_analyze: str) -> str:
- If maintainers are found, the JSON format must be: `{{"info": [list_of_maintainer_objects]}}`
- If no individual maintainers are found, or only teams/groups are mentioned, the JSON format must be: `{{"error": "not_found"}}`

Each object in the "info" list must contain these four fields:
Each object in the "info" list must contain these five fields:
1. `github_username`:
- Find using common patterns like `@username`, `github.com/username`, `Name (@username)`, or from emails (`123+user@users.noreply.github.com`).
- This is a best-effort search. If no username can be confidently found, use the string "unknown".
Expand All @@ -210,6 +217,10 @@ def get_extraction_prompt(self, filename: str, content_to_analyze: str) -> str:
- Do not include filler words like "repository", "project", or "active".
4. `normalized_title`:
- Must be exactly "maintainer" or "contributor". If the role is ambiguous, use the `<filename>` as the primary hint. For example, a file named `MAINTAINERS` or `CODEOWNERS` implies "maintainer", while `CONTRIBUTORS` implies "contributor".
5. `email`:
- Extract the person's email address from the content. Look for patterns like `FullName <email@domain>`, `email@domain`, or email addresses in various formats.
- The email must be a valid email address format (containing @ and a domain).
- If no valid email can be found for the individual, use the string "unknown".

---
Filename: {filename}
Expand Down
Loading