fix(analysisinfo): detect auto-selected package from Windows analyzer logs#2937
Conversation
… logs The Windows analyzer logs the package selection as: INFO: analysis package selected: "pkg" but get_package() only searched for the Linux format: INFO: Automatically selected analysis package "pkg" This caused the package field to remain empty in reports for Windows analyses where no package was explicitly specified. Now searches for both log formats using len(marker) instead of a hardcoded offset.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where the auto-selected analysis package was not being correctly extracted from Windows analyzer logs, leading to incomplete reporting for Windows tasks. The changes ensure that the system can now parse package information from both Windows and Linux log formats, providing more accurate and comprehensive analysis results across different operating environments. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request successfully adds support for detecting the auto-selected package from Windows analyzer logs, complementing the existing Linux log format, and improves maintainability by using len(marker) instead of a hardcoded offset. However, the implementation is susceptible to log injection from the untrusted guest VM, which could allow malware to spoof the package name in the final analysis report. It is recommended to add validation or sanitization to the extracted package name to ensure it only contains expected characters. Additionally, a minor performance improvement can be made to avoid scanning the log file twice for each marker.
| for marker in ( | ||
| 'INFO: analysis package selected: "', | ||
| 'INFO: Automatically selected analysis package "', | ||
| ): | ||
| if marker in analysis_log: | ||
| idx = analysis_log.index(marker) | ||
| package = analysis_log[idx + len(marker) :].split('"', 1)[0] | ||
| break |
There was a problem hiding this comment.
The get_package function extracts the analysis package name from the analysis.log file. This log's content can be influenced by an untrusted guest VM, making it vulnerable to log injection. An attacker could inject a fake log line to spoof the package name, potentially leading to data spoofing in the analysis report or Cross-Site Scripting (XSS) if the name is rendered without proper escaping. It is crucial to implement validation or sanitization on the extracted package name to mitigate this vulnerability. Additionally, the current approach of using if marker in analysis_log: followed by analysis_log.index(marker) causes the string to be scanned twice for each marker. For improved efficiency, consider using analysis_log.find(marker) instead, which avoids redundant scans.
| for marker in ( | |
| 'INFO: analysis package selected: "', | |
| 'INFO: Automatically selected analysis package "', | |
| ): | |
| if marker in analysis_log: | |
| idx = analysis_log.index(marker) | |
| package = analysis_log[idx + len(marker) :].split('"', 1)[0] | |
| break | |
| for marker in ( | |
| 'INFO: analysis package selected: "', | |
| 'INFO: Automatically selected analysis package "', | |
| ): | |
| idx = analysis_log.find(marker) | |
| if idx != -1: | |
| package = analysis_log[idx + len(marker) :].split('"', 1)[0] | |
| break |
There was a problem hiding this comment.
Applied the .find() suggestion to avoid the double scan — good catch. The log injection/XSS concern isn't applicable here though: the existing code already extracted from the same guest-controlled log in the same way, and the package name is used internally for processing, not rendered raw in any frontend context.
Summary
get_package()inanalysisinfo.pyonly searched for the Linux analyzer log format (Automatically selected analysis package) when extracting the auto-selected package nameanalysis package selected:) so the package field was always empty in reports for Windows tasks where no package was explicitly configuredlen(marker)instead of a hardcoded offset