Wire up end to end call path (kafka -> monolith endpoints)#142
Conversation
This stack of pull requests is managed by Graphite.
Force-pushed from bc83d63 to 986ef83
Force-pushed from 704b165 to 608a506
Force-pushed from 986ef83 to 5623c49
Force-pushed from 5623c49 to 3827165
src/launchpad/sentry_client.py
Outdated
    return b"".join(parts)

def categorize_http_error(error_result: Dict[str, Any]) -> tuple[str, str]:
Debatably helpful; can remove if people feel strongly.
Force-pushed from 3827165 to 2b7ccae
if self._statsd:
    self._statsd.increment("launchpad.artifact.processing.failed")
# Re-raise to let Arroyo handle the error (can be configured for DLQ)
raise
Don't re-raise - let the consumer continue processing other messages
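A minimal sketch of that shape (the handler name and callback signature are hypothetical; only the statsd metric name comes from the diff):

```python
import logging

logger = logging.getLogger(__name__)

def handle_message(message, process, statsd=None):
    """Process one artifact message; log and count failures instead
    of re-raising, so the consumer keeps going."""
    try:
        process(message)
        return True
    except Exception:
        logger.exception("Artifact processing failed; continuing")
        if statsd:
            statsd.increment("launchpad.artifact.processing.failed")
        return False
```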
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   71.63%   73.81%   +2.17%
==========================================
  Files          93      106      +13
  Lines        7320     8399    +1079
  Branches      860      958      +98
==========================================
+ Hits         5244     6200     +956
- Misses       1724     1828     +104
- Partials      352      371      +19

View full report in Codecov by Sentry.
if not self.app_info:
    self.app_info = self.preprocess(artifact)

app_info = self.app_info
Now matches analyzers/apple.py and lets us run preprocessing separately from the full size analysis.
from launchpad.size.models.apple import AppleAppInfo
from launchpad.size.models.common import BaseAnalysisResults
Allows us to treat preprocessing and the full size analysis separately.
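A rough sketch of the split being described, with the cheap preprocess result cached so the full analysis can reuse it (class and field names are hypothetical):

```python
class SizeAnalyzer:
    """Caches the cheap preprocess step so it can run on its own,
    then is reused when the full size analysis runs later."""

    def __init__(self):
        self.app_info = None

    def preprocess(self, artifact):
        # Hypothetical: extract lightweight metadata only.
        return {"name": artifact["name"], "size": len(artifact["data"])}

    def analyze(self, artifact):
        # Full analysis reuses the cached preprocess result if present.
        if self.app_info is None:
            self.app_info = self.preprocess(artifact)
        return {"app_info": self.app_info, "total_bytes": len(artifact["data"])}
```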
Force-pushed from 2b7ccae to dd3bea7
src/launchpad/service.py
Outdated
async def start(self) -> None:
    """Start all service components."""
    if not self.server or not self.kafka_processor:

def process_artifact_analysis(self, artifact_id: str, project_id: str, organization_id: str) -> None:
Once we have more than size analysis, we can move much of this code into product-specific files (e.g. size.py, snapshots.py), but for now I figure it's fine to keep it all in this file.
Force-pushed from fb4e622 to fbf1baf
src/launchpad/service.py
Outdated
# Set up signal handlers for graceful shutdown
self._setup_signal_handlers()
try:
    file_content, _ = self._download_artifact(sentry_client, artifact_id, project_id, organization_id)
I wonder if it's a problem that we keep this large file in memory, then write it out and read it all again (in ArtifactFactory). Ideally the download would stream to the file without holding the entire thing in memory. At a minimum, maybe we should make sure this file_content reference isn't held when we read it the second time?
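If the streaming rewrite is out of scope for this PR, the smaller fix might look like this sketch (helper names hypothetical), dropping the in-memory copy before the second read:

```python
def process_artifact(download, save_to_temp, analyze):
    """Ensure the in-memory bytes and the on-disk copy are never
    both held while the file is re-read for analysis."""
    file_content = download()            # large bytes object
    path = save_to_temp(file_content)    # written to disk
    del file_content                     # release the bytes before analyze() re-reads the file
    return analyze(path)
```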
src/launchpad/service.py
Outdated
)
raise

def _save_to_temp_file(self, file_content: bytes, artifact_id: str) -> str:
The handling of tempfile is a little odd. Ideally we would use the built-in cleanup behaviour of tempfile rather than manually cleaning up ourselves. See comment at use.
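A sketch of leaning on tempfile's own cleanup instead of a manual save/delete pair (function name and the analyze callback are hypothetical):

```python
import tempfile

def analyze_artifact(file_content: bytes, analyze):
    # NamedTemporaryFile deletes the file when the context exits,
    # so no manual cleanup path is needed, even on error.
    with tempfile.NamedTemporaryFile(suffix=".zip") as tf:
        tf.write(file_content)
        tf.flush()
        return analyze(tf.name)
```

(Re-opening the file by name works on POSIX; on Windows a file object would need to be passed instead.)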
src/launchpad/sentry_client.py
Outdated
"file_content": content,
"file_size_bytes": len(content),
"headers": dict(response.headers),
}
The DX of dicts (both typing and access) is quite bad in Python vs. TypeScript, so what is very idiomatic in TypeScript (making the return type an anonymous object with some fields) is not the most common pattern in Python.
If there are only ~2 fields it's more common to use a tuple:
def foo():
    ...
    success = True
    content = "bar"
    return success, content
However, if there are many fields, the easiest thing is to use a NamedTuple:

class DownloadResult(NamedTuple):
    success: bool
    file_content: str
    file_size_bytes: int

which is a concise way to define an object with named fields (and it also implements the tuple operators). See:
https://typing.python.org/en/latest/spec/namedtuples.html#named-tuples
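For illustration, usage might look like this sketch (the body of download_artifact is hypothetical; file_content is typed as bytes here since the artifact is binary):

```python
from typing import NamedTuple

class DownloadResult(NamedTuple):
    success: bool
    file_content: bytes
    file_size_bytes: int

def download_artifact() -> DownloadResult:
    # Hypothetical body; a real client would stream from HTTP.
    content = b"artifact-bytes"
    return DownloadResult(success=True, file_content=content,
                          file_size_bytes=len(content))

result = download_artifact()
# Fields are attribute-accessed, and the result still unpacks as a tuple.
success, content, size = result
```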
src/launchpad/sentry_client.py
Outdated
content += chunk
if len(content) > 5 * 1024 * 1024 * 1024:  # 5GB limit
    logger.warning("Download truncated at 5GB")
    break
I think we should avoid reading the whole file into memory if possible, since artifacts can be quite large.
Python has a nice abstraction for file-like objects which is what I would use here.
We could either take the file-like object as an out parameter:
def download_artifact(..., out):
    for chunk in response.iter_content(chunk_size=8192):
        out.write(chunk)
Or you can have download_artifact create a tempfile or an in-memory file (io.BytesIO, since the chunks are bytes):

def download_artifact(...):
    f = tempfile.TemporaryFile()
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
    return f

def download_artifact(...):
    f = io.BytesIO()
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
    return f
src/launchpad/service.py
Outdated
return file_content, file_size

except Exception as e:
    if not isinstance(e, RuntimeError):
If you want to catch only some kinds of errors you can do:
try:
    # ...
except RuntimeError as e:
    # handle RuntimeError here
src/launchpad/service.py
Outdated
try:
    # Wait for shutdown signal
    await self._shutdown_event.wait()

download_result = sentry_client.download_artifact(
Here it might be better to do:
with tempfile.TemporaryFile() as tf:
    sentry_client.download_artifact(org=organization_id, project=project_id, artifact_id=artifact_id, out=tf)
    artifact = ArtifactFactory.fromFile(tf)
    ...
src/launchpad/service.py
Outdated
)

if not download_result.get("success"):
    self._handle_download_error(
The control flow here seems odd: _handle_download_error ends up raising, if I understand correctly. Can download_artifact just raise an error directly?
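That suggestion might look like this sketch (the error type and fetch callback are hypothetical):

```python
class ArtifactDownloadError(RuntimeError):
    """Hypothetical error raised by download_artifact on failure."""

def download_artifact(fetch):
    # Raise directly on failure instead of returning
    # {"success": False, ...} and routing the caller through a
    # separate _handle_download_error step.
    status, body = fetch()
    if status != 200:
        raise ArtifactDownloadError(f"download failed with status {status}")
    return body
```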

A Kafka message will now trigger the following flow:
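Based on the review comments above, the end-to-end path can be sketched roughly as follows (all names hypothetical; the download streams to a self-cleaning temp file that the analysis then reads):

```python
import tempfile

def handle_kafka_message(message, download, analyze):
    """Sketch: a Kafka message identifies an artifact, which is
    downloaded from the monolith endpoint, spooled to a temp file,
    and handed to the size analysis."""
    with tempfile.NamedTemporaryFile() as tf:
        download(message["artifact_id"], out=tf)  # stream, don't buffer in memory
        tf.flush()
        return analyze(tf.name)
```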