DM-52886: Update bps report function to aggregate panda task slices into task labels#97
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #97 +/- ##
==========================================
- Coverage 36.26% 35.26% -1.01%
==========================================
Files 13 13
Lines 1249 1296 +47
Branches 204 212 +8
==========================================
+ Hits 453 457 +4
- Misses 776 819 +43
Partials 20 20 ☔ View full report in Codecov by Sentry. |
664c5a3 to
7a6a3fb
Compare
7a6a3fb to
0df3cf8
Compare
| @@ -0,0 +1 @@ | |||
| Update bps report function to aggregate panda task slices into task labels | |||
There was a problem hiding this comment.
A nitpick, could you please enclose bps report in double-back quotes (``bps report``).
| aggregated_run: `str` | ||
| A semicolon-separated string with aggregated job counts | ||
| by base label. | ||
| """ |
There was a problem hiding this comment.
I'd rewrite the Returns section as follow:
Returns
-------
aggregated_jobs : `dict` [`str`, `dict` [`str`, `int`]]
A dictionary mapping each base label to the summed job state counts
across all matching labels.
aggregated_exits : `dict` [`str`, `list` [`int`]]
A dictionary mapping each base label to a combined list of exit codes
from all matching labels.
aggregated_run : `str`
A semicolon-separated string with aggregated job counts by base label.
(See an example in the LSST Dev Guide.)
| Returns | ||
| ------- | ||
| `str`: | ||
| The extracted task name as per the rules above. |
There was a problem hiding this comment.
Please rewrite the Returns section as something like:
taskname: `str`
The extracted task name as per the rules above.
According to the LSST Dev Guide, the short-hand syntax that avoids naming a returned value will not render properly (see the warning in this section).
|
|
||
| Parameters | ||
| ---------- | ||
| s: `str` |
There was a problem hiding this comment.
A single space before and after the colon (:) is required in the parameter declaration (src). Please fix and elsewhere if needed.
| def extract_taskname(s: str) -> str: | ||
| """ | ||
| Extract the task name from a string that follows a pattern | ||
| CampaignName_timestamp_TaskNumber_TaskLabel_ChunkNumber. |
There was a problem hiding this comment.
The short summary should immediately follow the opening triple quotes, i.e.:
"""Extract the task name from a string that follows a pattern
CampaignName_timestamp_TaskNumber_TaskLabel_ChunkNumber.
Please fix here and in aggregate_by_basename().
| raise RuntimeError( | ||
| f"Error to get workflow status: {new_ret} for id: {wms_workflow_id}" | ||
| f"Error getting workflow status: {new_ret} for id: {wms_workflow_id}" | ||
| ) |
There was a problem hiding this comment.
To avoid code repetition, you may consider encapsulating the logic of querying the iDDS (lines 176-182 and 282-293) in a separate function.
| if nfailed > 0 and len(exit_codes_all[tasklabel]) == 0: | ||
| _LOG.warning( | ||
| f"No exit codes in iDDS task info for workload {transform_workload_id}" | ||
| ) |
There was a problem hiding this comment.
Since DM-51261 was merged, ctrl_bps performs a similar check and prints out a warning if such a situation occurs. For this reason, we may consider using DEBUG level here to avoid printing two separate warnings concerning the same thing .
| totaljobs = task.get("output_total_files", 0) | ||
| wms_report.total_number_jobs += totaljobs | ||
|
|
||
| taskstatus = {} |
There was a problem hiding this comment.
Unless I missed something, it doesn't look like taskstatus is used in this for loop at all. If that's true, can we move this line closer to the place where taskstatus is actually used. For example, can we put it after the comment in line 313?
| # Sort tasks by workload_id or fallback | ||
| try: | ||
| tasks.sort(key=lambda x: x["transform_workload_id"]) | ||
| except Exception: |
There was a problem hiding this comment.
Can we narrow the exception clause to KeyError? Exception seems too broad in this context.
…abels
Checklist
doc/changes