feat(aws): add AWS Security integration#173
Conversation
…dWatch, and CloudTrail
- Add missing autohive-integrations-sdk to requirements.txt - Validate AWS credentials before creating boto3 client - Parallelize get_insights result fetches with asyncio.gather - Batch update_finding_workflow lookups in chunks of 100 - Add error_code to all 20 output schemas in config.json - Add docstrings to all action handler classes
- Remove root __init__.py (not used by working integrations) - Rename integration variable to 'aws' matching entry point name - Update all action imports to 'from aws import aws' - Update all decorators to @aws.action() - Remove required array from auth fields schema
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 982b0a353a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Can we please run philbot against it, too? |
🤖 Automated Code ReviewOverall AssessmentVerdict: ✅ Patch is correct - The code is functional and free of blocking issues. The integration follows security best practices and demonstrates good memory management. However, there are performance optimizations that should be addressed. Confidence Score: 0.92/1.0 📊 Review Summary
🔴 High Priority Issues[P1] Boto3 client created on every action executionFile: Current code: def create_boto3_client(context: ExecutionContext, service_name: str):
credentials = context.auth.get("credentials", {})
access_key = credentials.get("aws_access_key_id")
secret_key = credentials.get("aws_secret_access_key")
if not access_key or not secret_key:
raise ValueError(
"AWS credentials are missing: aws_access_key_id and aws_secret_access_key are required"
)
return boto3.client(
service_name,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
region_name=credentials.get("aws_region", "us-east-1")
)Recommendation: Consider caching boto3 clients per (credentials_hash, service_name, region) tuple. Since boto3 clients are thread-safe for making API calls, they can be reused across requests with the same credentials: import hashlib
# Simple client cache - in production, consider TTL-based eviction
_client_cache: dict[tuple[str, str], Any] = {}
def _get_credentials_key(context: ExecutionContext) -> str:
"""Generate a cache key from credentials."""
credentials = context.auth.get("credentials", {})
key_parts = (
credentials.get("aws_access_key_id", ""),
credentials.get("aws_secret_access_key", ""),
credentials.get("aws_region", "us-east-1")
)
return hashlib.sha256(":".join(key_parts).encode()).hexdigest()
def create_boto3_client(context: ExecutionContext, service_name: str):
credentials = context.auth.get("credentials", {})
access_key = credentials.get("aws_access_key_id")
secret_key = credentials.get("aws_secret_access_key")
if not access_key or not secret_key:
raise ValueError(
"AWS credentials are missing: aws_access_key_id and aws_secret_access_key are required"
)
cache_key = (_get_credentials_key(context), service_name)
if cache_key not in _client_cache:
_client_cache[cache_key] = boto3.client(
service_name,
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
region_name=credentials.get("aws_region", "us-east-1")
)
return _client_cache[cache_key]Expected improvement: Eliminates client initialization overhead for subsequent calls with the same credentials. 🟡 Medium Priority Issues[P2] Sequential API calls in update_finding_workflow could be parallelizedFile: Current code: # Look up findings in batches of 100 (AWS API limit) to get ProductArn
findings = []
for i in range(0, len(finding_arns), 100):
batch = finding_arns[i:i + 100]
lookup_kwargs = {
"Filters": {
"Id": [
{"Value": arn, "Comparison": "EQUALS"}
for arn in batch
]
},
"MaxResults": len(batch)
}
lookup_response = await run_sync(client.get_findings, **lookup_kwargs)
findings.extend(lookup_response.get("Findings", []))Recommendation: Parallelize the batch lookups using # Look up findings in batches of 100 (AWS API limit) to get ProductArn
async def lookup_batch(batch):
lookup_kwargs = {
"Filters": {
"Id": [
{"Value": arn, "Comparison": "EQUALS"}
for arn in batch
]
},
"MaxResults": len(batch)
}
lookup_response = await run_sync(client.get_findings, **lookup_kwargs)
return lookup_response.get("Findings", [])
batches = [finding_arns[i:i + 100] for i in range(0, len(finding_arns), 100)]
batch_results = await asyncio.gather(*[lookup_batch(batch) for batch in batches])
findings = [f for batch_findings in batch_results for f in batch_findings]Expected improvement: For N batches, reduces latency from O(N × API_latency) to O(API_latency) (parallel execution). [P2] Unbounded finding lookup in update_finding_workflowFile: Recommendation: Add input validation to limit if len(finding_arns) > 1000:
return error_result(ValueError("Maximum 1000 finding ARNs allowed per request"))🟢 Low Priority Issues[P3] Missing batch size handling for >50 finding IDs in GuardDutyFile: Recommendation: Either validate the input length or batch the requests: async def execute(self, inputs: Dict[str, Any], context: ExecutionContext):
try:
client = create_boto3_client(context, "guardduty")
finding_ids = inputs["finding_ids"]
# GuardDuty GetFindings has a limit of 50 IDs per request
if len(finding_ids) > 50:
# Batch the requests
all_findings = []
for i in range(0, len(finding_ids), 50):
batch = finding_ids[i:i + 50]
kwargs = {
"DetectorId": inputs["detector_id"],
"FindingIds": batch
}
response = await run_sync(client.get_findings, **kwargs)
all_findings.extend(response.get("Findings", []))
return success_result({"findings": all_findings})
kwargs = {
"DetectorId": inputs["detector_id"],
"FindingIds": finding_ids
}
response = await run_sync(client.get_findings, **kwargs)Alternatively, add [P3] list_metrics missing pagination limit parameterFile: Recommendation: Add a [P3] Inconsistent pagination limits across actionsFile: Current state:
Recommendation: Add ✅ Positive ObservationsSecurity Best Practices
Memory Management
Code Quality
Excellent Pattern: Parallel Insight FetchingFile: The enriched_insights = await asyncio.gather(
*[fetch_insight_result(insight) for insight in insights]
)This pattern should be replicated in other actions that perform batch operations (like 📋 Summary of Findings{
"findings": [
{
"title": "[P1] Boto3 client created on every action execution",
"priority": 1,
"confidence_score": 0.95,
"file": "aws/helpers.py",
"lines": "11-22"
},
{
"title": "[P2] Sequential API calls in update_finding_workflow could be parallelized",
"priority": 2,
"confidence_score": 0.90,
"file": "aws/actions/security_hub.py",
"lines": "68-82"
},
{
"title": "[P2] Unbounded finding lookup in update_finding_workflow",
"priority": 2,
"confidence_score": 0.85,
"file": "aws/actions/security_hub.py",
"lines": "73-90"
},
{
"title": "[P3] Missing batch size handling for >50 finding IDs in GuardDuty",
"priority": 3,
"confidence_score": 0.80,
"file": "aws/actions/guardduty.py",
"lines": "48-58"
},
{
"title": "[P3] list_metrics missing pagination limit parameter",
"priority": 3,
"confidence_score": 0.75,
"file": "aws/actions/cloudwatch.py",
"lines": "14-27"
},
{
"title": "[P3] Inconsistent pagination limits across actions",
"priority": 3,
"confidence_score": 0.70,
"file": "aws/config.json",
"lines": "N/A"
}
],
"overall_correctness": "patch is correct",
"overall_explanation": "The code is functional, secure, and follows best practices. The identified issues are optimization opportunities rather than bugs. The integration will work correctly in production, but addressing the P1 and P2 issues will significantly improve performance.",
"overall_confidence_score": 0.92
}🎯 Recommended Action ItemsBefore Merge:
Post-Merge (Nice to Have):
This is an automated review from Mrs. Reviewer in Autohive. The review was conducted by specialized AI agents focusing on security, performance, memory management, and general code quality. |
|
Thanks for the thorough review! The patch confidence score is appreciated. P1 (client caching): Noted for a future optimization pass. For v1, per-call client creation keeps things simple and avoids credential/region caching edge cases on Lambda. P2/P3 items: Most are already handled (chunked batching, next_token pagination) or are edge cases unlikely in practice (e.g. >50 detector IDs). We'll revisit if performance becomes a concern at scale. Overall a clean bill of health — thanks! |
@vish4004 - could you please create a GH issue for this future change. Edit: As we've discussed - in lambdas this is prob not required, but if this was used in a container, it might make sense. I think a ticket should be created to be looked at at a later stage. Another edit: new information has been surfaced (implication of caching in the context of AWS orgs) that have led us towards not doing this at all for now. |
|
The P2 items, I'd not deal with right now The P3 (max boundaries) - items are low hanging fruit, I'd say. Having validation around the AWS limits would stop problems earlier in the flow before requests even hit the AWS APIs. |
TheRealAgentK
left a comment
There was a problem hiding this comment.
Left some comments re my views.
I'd prefer the P3s to be addressed, but it's not a hill I'm going to die on 😄
|
@TheRealAgentK I don't think any of the issues are P1 in terms of impact it has, but an interpretation of an over eager bot trying to please. Re caching, while the bot thinks it as an edge case, however given how AWS sub orgs and regions work wherein a single access key and secret can have cross org/region access, so caching would generate other issues such as getting locked into a region or org. Right now AWS default's to us-east or root org and user can ask for a org or region in each tool call. This would have been different if were using OIDC auth that AWS supports. As for the limits AWS Api changes on a continuous pace and in our case it's the boto3 client that abstracting away the calls, which handlers the errors gracefully, and will also pass that to an the agent which can then handle then actively limits in runtime. I think that's better approach than handling the limits for each call manually. We are also not locking the boto3 sdk version as well which or for any other packages as well which would be needed as well if we are setting up limits. |
Ok, that's a good reason for not caching then. Let's scrap the idea of the P1 for now. |
|
I pushed a few more commits from changes due to the new integration tooling into the branch. Please check that they are still ok - most are rule exclusions for valid code that linters and security checkers don't interpret correctly to 💯 |
|
Tooling checks now: and |
|
Last thing to do in my view (apart from functional testing on your end) is getting a 512x512 icon. @vish4004 |
The __init__.py caused a circular import chain: context.py -> aws.aws -> actions -> security_hub -> aws.aws (still initializing) This integration is loaded as an entry point, not as a package, so __init__.py is not needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🔍 Integration Validation ResultsCommit: Changed directories:
|
…ibe_trails input access
Summary
actions/subdirectoryTest plan