feat: map maintainers by email if username not found (CM-773) #3598
```diff
@@ -11,6 +11,7 @@
 
 from crowdgit.database.crud import (
     find_github_identity,
+    find_maintainer_identity_by_email,
     get_maintainers_for_repo,
     save_service_execution,
     set_maintainer_end_date,
@@ -76,10 +77,16 @@ async def process_maintainer(maintainer: MaintainerInfoItem):
     original_role = self.make_role(maintainer.title)
     # Find the identity in the database
     github_username = maintainer.github_username
-    if github_username == "unknown":
-        self.logger.warning("github username with value 'unknown' aborting")
+    email = maintainer.email
+
+    if github_username == "unknown" and email == "unknown":
+        self.logger.warning("username & email with value 'unknown' aborting")
         return
-    identity_id = await find_github_identity(github_username)
+    identity_id = (
+        await find_github_identity(github_username)
+        if github_username != "unknown"
+        else await find_maintainer_identity_by_email(email)
+    )
```
Comment on lines +80 to +89

Bug: Missing Null Checks Break Identity Logic
The logic doesn't handle

Contributor: Can you give me a bit more context on this "unknown" logic?

Contributor (Author): The LLM returns "unknown" if it didn't manage to extract the expected values. That was the logic from v1; I didn't change it because it was working fine.
```diff
     self.logger.debug(
         f"Found identity_id for {github_username}: {identity_id} (type: {type(identity_id)})"
     )
```
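The fallback in `process_maintainer` can be exercised in isolation. Below is a minimal sketch, not the crowdgit code: `MaintainerInfo`, `resolve_identity`, `is_missing`, and the two in-memory lookup stubs are hypothetical stand-ins for `MaintainerInfoItem`, `find_github_identity`, and `find_maintainer_identity_by_email`. It also assumes, beyond what the diff checks, that a field could arrive as `None` or a blank string, and treats those like the `"unknown"` sentinel.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "unknown"  # sentinel the extraction prompt tells the LLM to emit


@dataclass
class MaintainerInfo:
    """Hypothetical stand-in for MaintainerInfoItem (only the two fields used here)."""
    github_username: Optional[str]
    email: Optional[str]


# Hypothetical in-memory identity tables standing in for the database lookups
GITHUB_IDENTITIES = {"octocat": "id-1"}
EMAIL_IDENTITIES = {"jane@example.com": "id-2"}


async def find_github_identity_stub(username: str) -> Optional[str]:
    return GITHUB_IDENTITIES.get(username)


async def find_identity_by_email_stub(email: str) -> Optional[str]:
    # Case-insensitive match on the address (an assumption, not from the PR)
    return EMAIL_IDENTITIES.get(email.strip().lower())


def is_missing(value: Optional[str]) -> bool:
    """Treat None, blank strings, and the "unknown" sentinel as 'not extracted'."""
    return value is None or value.strip().lower() in ("", UNKNOWN)


async def resolve_identity(m: MaintainerInfo) -> Optional[str]:
    """Username lookup first; email lookup only when the username is missing."""
    username_missing = is_missing(m.github_username)
    if username_missing and is_missing(m.email):
        return None  # the diff logs a warning and returns at this point
    if not username_missing:
        return await find_github_identity_stub(m.github_username)
    return await find_identity_by_email_stub(m.email)


if __name__ == "__main__":
    print(asyncio.run(resolve_identity(MaintainerInfo("octocat", UNKNOWN))))  # id-1
    print(asyncio.run(resolve_identity(MaintainerInfo(UNKNOWN, "Jane@Example.com"))))  # id-2
    print(asyncio.run(resolve_identity(MaintainerInfo(None, UNKNOWN))))  # None
```

As in the diff, the email is consulted only when the username is missing; a username that is present but matches no identity resolves to `None` without trying the email.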
```diff
@@ -198,7 +205,7 @@ def get_extraction_prompt(self, filename: str, content_to_analyze: str) -> str:
 - If maintainers are found, the JSON format must be: `{{"info": [list_of_maintainer_objects]}}`
 - If no individual maintainers are found, or only teams/groups are mentioned, the JSON format must be: `{{"error": "not_found"}}`
 
-Each object in the "info" list must contain these four fields:
+Each object in the "info" list must contain these five fields:
 1. `github_username`:
    - Find using common patterns like `@username`, `github.com/username`, `Name (@username)`, or from emails (`123+user@users.noreply.github.com`).
    - This is a best-effort search. If no username can be confidently found, use the string "unknown".
@@ -210,6 +217,10 @@ def get_extraction_prompt(self, filename: str, content_to_analyze: str) -> str:
    - Do not include filler words like "repository", "project", or "active".
 4. `normalized_title`:
    - Must be exactly "maintainer" or "contributor". If the role is ambiguous, use the `<filename>` as the primary hint. For example, a file named `MAINTAINERS` or `CODEOWNERS` implies "maintainer", while `CONTRIBUTORS` implies "contributor".
+5. `email`:
+   - Extract the person's email address from the content. Look for patterns like `FullName <email@domain>`, `email@domain`, or email addresses in various formats.
+   - The email must be a valid email address format (containing @ and a domain).
+   - If no valid email can be found for the individual, use the string "unknown".
 
 ---
 Filename: {filename}
```
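On the consuming side, the prompt's contract (either `{"info": [...]}` or `{"error": "not_found"}`, with `"unknown"` for anything the model cannot find; the doubled braces in the prompt are format-string escapes) can be checked before any lookup runs. The sketch below is illustrative only: `parse_maintainers`, `ExtractedMaintainer`, and `clean_email` are hypothetical, only the fields visible in this diff are validated (the other two are passed through in `extra`), and the email pattern is an assumed reading of "containing @ and a domain" that also requires a dot in the domain.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

UNKNOWN = "unknown"
# Loose shape check (assumed): local part, "@", and a domain containing a dot
EMAIL_RE = re.compile(r"^[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+$")
ALLOWED_TITLES = {"maintainer", "contributor"}
CHECKED_FIELDS = {"github_username", "normalized_title", "email"}


@dataclass
class ExtractedMaintainer:
    github_username: str
    normalized_title: str
    email: str
    extra: Dict[str, Any] = field(default_factory=dict)  # fields not checked here


def clean_email(value: Any) -> str:
    """Return a plausible address, or the "unknown" sentinel."""
    if not isinstance(value, str):
        return UNKNOWN
    candidate = value.strip()
    # Accept the `FullName <email@domain>` shape by keeping the bracketed part
    bracketed = re.search(r"<([^<>]+)>", candidate)
    if bracketed:
        candidate = bracketed.group(1).strip()
    return candidate if EMAIL_RE.match(candidate) else UNKNOWN


def parse_maintainers(raw: str) -> Optional[List[ExtractedMaintainer]]:
    """Return the maintainers, or None if the model answered {"error": "not_found"}."""
    data = json.loads(raw)
    if data.get("error") == "not_found":
        return None
    result: List[ExtractedMaintainer] = []
    for item in data.get("info", []):
        title = str(item.get("normalized_title", "")).strip().lower()
        if title not in ALLOWED_TITLES:
            continue  # drop entries that break the prompt's contract
        username = str(item.get("github_username") or "").strip() or UNKNOWN
        result.append(
            ExtractedMaintainer(
                github_username=username,
                normalized_title=title,
                email=clean_email(item.get("email")),
                extra={k: v for k, v in item.items() if k not in CHECKED_FIELDS},
            )
        )
    return result
```

Dropping entries whose `normalized_title` is neither allowed value is a design choice of this sketch; logging and keeping them would be equally reasonable.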
@joanagmaia FYI, I didn't limit the type to email only, because for git we don't have usernames, only emails stored with type username. That should be fine, right?

Yeah, that's OK. Makes sense. And for git we actually have to use type username, since identities of type email are not verified. So all good.
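The thread above settles that the email lookup should not filter on identity type, because git identities store the address under type `username`. The following sketch shows that query shape against an in-memory SQLite table; the `identities` table, its columns, the preference for verified rows, and `find_identity_by_email` are all hypothetical, since the PR's `find_maintainer_identity_by_email` implementation is not part of this view.

```python
import sqlite3
from typing import Optional


def create_demo_db() -> sqlite3.Connection:
    """Hypothetical identities table; not the real crowd.dev schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE identities ("
        " member_id TEXT, platform TEXT, type TEXT, value TEXT, verified INTEGER)"
    )
    conn.executemany(
        "INSERT INTO identities VALUES (?, ?, ?, ?, ?)",
        [
            # git identities carry the email address under type 'username'
            ("m1", "git", "username", "jane@example.com", 1),
            ("m2", "github", "email", "bob@example.com", 0),
            ("m3", "github", "username", "octocat", 1),
        ],
    )
    return conn


def find_identity_by_email(conn: sqlite3.Connection, email: str) -> Optional[str]:
    """Match on the value only; the type is deliberately not restricted to 'email'."""
    row = conn.execute(
        "SELECT member_id FROM identities"
        " WHERE lower(value) = lower(?)"
        " ORDER BY verified DESC"  # prefer verified rows (an assumption)
        " LIMIT 1",
        (email.strip(),),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = create_demo_db()
    print(find_identity_by_email(db, "Jane@Example.com"))  # m1
    print(find_identity_by_email(db, "nobody@example.com"))  # None
```

One consequence of leaving the type unrestricted is that any identity whose value equals the searched string matches, so the input should already look like an email address.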