fix(git-integration): deprecate multiprocessing and improve performance/resources usage in commits processing #3513

Merged: mbani01 merged 2 commits into main from fix/deprecate_multiprocessing on Oct 15, 2025

mbani01 commented Oct 14, 2025

This pull request refactors the CommitService class in the git_integration app to improve maintainability, efficiency, and consistency. The main changes include converting many static methods to instance methods, removing process pool logic, switching JSON serialization to use orjson, and updating the commit processing pipeline to be fully asynchronous and write directly to the database and Kafka.

Refactoring and code modernization:

  • Converted most static methods in CommitService to instance methods for improved encapsulation and easier testing. This includes methods like is_valid_commit_hash, is_valid_datetime, should_skip_commit, clean_up_username, create_activity, extract_activities, prepare_activity_for_db_and_queue, and create_activities_from_commit. All internal references now use self instead of the class name. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19]

  • Removed all process pool management logic (ProcessPoolExecutor and related methods), simplifying the class so that commit processing relies solely on async execution. [1] [2]
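A minimal sketch of the instance-method style described above, assuming a trimmed-down, hypothetical CommitService: the 40-character hex hash rule, the ISO 8601 date check, the commit dict shape, and the method bodies are illustrative assumptions, not the PR's code. It shows helpers calling each other through self and sharing an instance logger, with plain synchronous calls instead of a process pool.

```python
import logging
import re
from datetime import datetime
from typing import Optional


class CommitService:
    """Hypothetical, trimmed-down stand-in for the real CommitService."""

    # Assumption: only a full 40-character lowercase hex SHA-1 counts as valid.
    _COMMIT_HASH_RE = re.compile(r"[0-9a-f]{40}")

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        # One logger per instance, used by every method via self.logger.
        self.logger = logger or logging.getLogger(__name__)

    def is_valid_commit_hash(self, commit_hash: str) -> bool:
        # fullmatch rejects trailing characters, including a trailing newline.
        return self._COMMIT_HASH_RE.fullmatch(commit_hash) is not None

    def is_valid_datetime(self, value: str) -> bool:
        # Assumption: commit dates arrive as ISO 8601 strings.
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return False
        return True

    def should_skip_commit(self, commit: dict) -> bool:
        # Helpers are reached through self, not through CommitService.<name>.
        if not self.is_valid_commit_hash(commit.get("hash", "")):
            self.logger.warning("Skipping commit with invalid hash: %r", commit.get("hash"))
            return True
        if not self.is_valid_datetime(commit.get("date", "")):
            self.logger.warning("Skipping commit with invalid date: %r", commit.get("date"))
            return True
        return False
```

Because the checks are bound to an instance, a test can build a CommitService with its own logger, or override one method on a single instance, without touching class-level state.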

Performance and serialization improvements:

  • Replaced the standard json library with orjson for faster serialization/deserialization, and updated all relevant code paths to use orjson.dumps(...).decode(), since orjson.dumps returns bytes rather than str. [1] [2] [3]
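The bytes-to-str pattern can be sketched as follows. The helper names to_json_str and from_json are hypothetical, and the stdlib json fallback exists only so the snippet runs where orjson is not installed; it is not part of the PR. The fallback uses compact separators and ensure_ascii=False to approximate orjson's default output for simple values.

```python
from typing import Any, Union

try:
    import orjson

    def to_json_str(obj: Any) -> str:
        # orjson.dumps returns UTF-8 bytes, so decode to get a str.
        return orjson.dumps(obj).decode()

    def from_json(data: Union[str, bytes]) -> Any:
        # orjson.loads accepts both str and bytes.
        return orjson.loads(data)

except ImportError:  # Fallback only so this sketch runs without orjson.
    import json

    def to_json_str(obj: Any) -> str:
        # Compact separators and unescaped non-ASCII mimic orjson's default output.
        return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

    def from_json(data: Union[str, bytes]) -> Any:
        # json.loads also accepts both str and bytes.
        return json.loads(data)
```

For types such as datetime the two still differ: orjson serializes them natively, while json.dumps raises TypeError.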

Commit processing pipeline updates:

  • Refactored process_commits_chunk to be an async method that processes commits and writes results directly to the database and Kafka, instead of returning them for later processing. This change enables better resource usage and aligns with the async design of the service. [1] [2] [3]

  • Improved logging and error handling by using the instance logger throughout the class. [1] [2]
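A minimal asyncio sketch of a chunk processor that writes its own output, assuming hypothetical in-memory stand-ins (InMemoryDb, InMemoryKafka) for the real database client and Kafka producer, an illustrative activity shape, and a returned count that exists only to make the result checkable. The real clients, topic names, and serialization (orjson in the PR) are not reproduced; stdlib json keeps the snippet dependency-free.

```python
import asyncio
import json
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InMemoryDb:
    """Hypothetical stand-in for the real database client."""
    rows: List[Dict[str, Any]] = field(default_factory=list)

    async def insert_many(self, rows: List[Dict[str, Any]]) -> None:
        await asyncio.sleep(0)  # Yield to the event loop, as real I/O would.
        self.rows.extend(rows)


@dataclass
class InMemoryKafka:
    """Hypothetical stand-in for the real Kafka producer."""
    messages: List[str] = field(default_factory=list)

    async def send(self, topic: str, value: str) -> None:
        await asyncio.sleep(0)
        self.messages.append(f"{topic}:{value}")


class CommitService:
    """Hypothetical, trimmed-down stand-in for the real CommitService."""

    def __init__(self, db: InMemoryDb, kafka: InMemoryKafka, topic: str = "activities") -> None:
        self.db = db
        self.kafka = kafka
        self.topic = topic
        self.logger = logging.getLogger(__name__)

    def create_activities_from_commit(self, commit: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Illustrative only: one activity per commit; raises KeyError on missing fields.
        return [{"type": "authored-commit", "sha": commit["hash"], "member": commit["author"]}]

    async def process_commits_chunk(self, commits: List[Dict[str, Any]]) -> int:
        activities: List[Dict[str, Any]] = []
        for commit in commits:
            try:
                activities.extend(self.create_activities_from_commit(commit))
            except KeyError as exc:
                self.logger.error("Skipping malformed commit, missing field %s", exc)
        if not activities:
            return 0
        # Persist and publish inside the chunk instead of returning activities to a caller.
        await self.db.insert_many(activities)
        await asyncio.gather(*(self.kafka.send(self.topic, json.dumps(a)) for a in activities))
        self.logger.info("Processed %d activities", len(activities))
        return len(activities)
```

Because each chunk flushes its own results, the caller only needs to schedule chunks (for example with asyncio.gather) rather than collect and forward their output.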

Dependency updates:

  • Added orjson to the pyproject.toml dependencies to support the new serialization approach.

mbani01 self-assigned this Oct 14, 2025
mbani01 marked this pull request as ready for review October 15, 2025 12:02
mbani01 merged commit 827af2a into main Oct 15, 2025 (12 checks passed)
mbani01 deleted the fix/deprecate_multiprocessing branch October 15, 2025 12:04