airdos04c parser time and memory optimization #66
Conversation
Pull request overview
Optimizes the AIRDOS04C log parser for improved runtime/memory by avoiding per-block event lists and constructing the output via NumPy, and adds basic parsing-duration instrumentation in the spectral record processing task.
Changes:
- Refactor AIRDOS04C parsing to compute high-energy histograms in a single pass and build the DataFrame from a preallocated NumPy array.
- Add optional UNIX time alignment via a `$TIME`-derived offset when computing `time_ms`.
- Add a timing printout around `parse_log_to_unified(...)` in the async spectral record task.
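As a rough illustration of the refactor described above, the per-block event lists can be replaced by a single `np.histogram` pass per block that fills a preallocated array, which then becomes the DataFrame payload directly. This is a hedged sketch, not the parser's actual code: the names `N_HIGH_BINS`, `ADC_MAX`, and `blocks` are illustrative stand-ins.

```python
import numpy as np
import pandas as pd

# Illustrative constants; the real parser defines these on the class.
N_HIGH_BINS = 8
ADC_MAX = 1024
bin_edges = np.linspace(0, ADC_MAX, N_HIGH_BINS + 1)

# Fake per-block event energies standing in for parsed log blocks.
blocks = [np.array([10, 500, 900]), np.array([100, 200])]

# Preallocate the output once instead of growing Python lists per block.
channel_arr = np.zeros((len(blocks), N_HIGH_BINS), dtype=np.int32)
for i, events in enumerate(blocks):
    # One histogram pass per block; no intermediate event lists are retained.
    channel_arr[i], _ = np.histogram(events, bins=bin_edges)

# The array backs the DataFrame directly, avoiding per-row dict construction.
df = pd.DataFrame(channel_arr, columns=[f"channel_{i}" for i in range(N_HIGH_BINS)])
```

The preallocation matters because appending to Python lists and converting at the end both copies the data and holds two representations in memory at once.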
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `backend/DOSPORTAL/tasks/spectral_records.py` | Adds parsing duration measurement/log output around unified parsing. |
| `backend/DOSPORTAL/services/parsing/parsers/airdos_04c.py` | Refactors the AIRDOS04C parser to reduce allocations and speed up DataFrame construction. |
Comments suppressed due to low confidence (1)
backend/DOSPORTAL/services/parsing/parsers/airdos_04c.py:160
`metadata` no longer includes `channel_columns`. Other parsers consistently provide this (e.g., `parsers/airdos_04a.py:102-115`, `parsers/geodos_1024_v1.py:88-101`), and some callers look for it before falling back to column-name scanning. Please add `"channel_columns": channel_names` back to keep the unified parsing metadata consistent across parsers.
```python
metadata: dict[str, object] = {
    "header": raw_header,
    "format": self.format_name,
    "version": self.version,
    "records_count": int(len(df)),
    "channels_count": int(len(channel_names)),
    "skipped_rows": int(skipped_rows),
    "time_range_ms": [
        float(df["time_ms"].min()),
        float(df["time_ms"].max()),
    ],
    "high_energy_bin_edges": bin_edges,
    "high_energy_adc_max": self.ADC_MAX,
    "start_unix_s": int(start_unix_s) if start_unix_s is not None else None,
}
```
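A minimal sketch of the fix this comment asks for, assuming `channel_names` is still in scope where the dict is built (the surrounding keys are abbreviated here):

```python
# Stand-in for the channel_names list built earlier in the parser.
channel_names = [f"channel_{i}" for i in range(4)]

metadata: dict[str, object] = {
    "channels_count": int(len(channel_names)),
    # ... other keys as in the snippet above ...
    "channel_columns": channel_names,  # restored key other parsers also emit
}
```

Keeping the key means downstream callers can look up column names directly instead of pattern-matching on `channel_*` column names.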
```python
total_channels = self.LOW_CHANNELS + self.N_HIGH_BINS
channel_names = [f"channel_{i}" for i in range(total_channels)]
```
`total_channels` is recomputed at line 141 but the value isn't used afterward (the earlier `total_channels` is what drives `channel_names`/`channel_arr`). This looks like a leftover from the previous implementation; removing the redundant assignment would reduce confusion.
```python
import time
_t0 = time.perf_counter()
parsed = parse_log_to_unified(combined)
print(f"Parsing done in {time.perf_counter() - _t0:.3f}s (record {record.id}, {len(raw_files)} files)")
```
The new timing instrumentation uses `print(...)` inside this background task. For production/async task runs this can be noisy and bypasses structured logging/levels; prefer using the module logger (e.g., `debug`/`info`) for this timing metric so it can be filtered/collected consistently.
Suggested change:

```diff
- print(f"Parsing done in {time.perf_counter() - _t0:.3f}s (record {record.id}, {len(raw_files)} files)")
+ logger.debug(
+     "Parsing done in %.3fs (record %s, %s files)",
+     time.perf_counter() - _t0,
+     record.id,
+     len(raw_files),
+ )
```
```python
df = pd.DataFrame(channel_arr, columns=channel_names)
df.insert(0, "particle_count", np.array(particle_count_list, dtype=np.int32))
df.insert(0, "time_ms", np.array(time_list, dtype=np.float64))
```
`time_ms` is no longer normalized to start at 0.0 (other parsers do `time_ms -= min(time_ms)` and tests expect `df["time_ms"].min() == 0.0`). With the AIRDOS04C fixture values (~7.5e9 ms), this will break consumers that assume relative time. Suggest subtracting the minimum from the `time_ms` array/column after it's built, and ensure `metadata["time_range_ms"]` reflects the normalized range.
Suggested change:

```diff
- df = pd.DataFrame(channel_arr, columns=channel_names)
- df.insert(0, "particle_count", np.array(particle_count_list, dtype=np.int32))
- df.insert(0, "time_ms", np.array(time_list, dtype=np.float64))
+ time_ms_arr = np.array(time_list, dtype=np.float64)
+ time_ms_arr -= time_ms_arr.min()
+ df = pd.DataFrame(channel_arr, columns=channel_names)
+ df.insert(0, "particle_count", np.array(particle_count_list, dtype=np.int32))
+ df.insert(0, "time_ms", time_ms_arr)
```
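To illustrate the second half of the comment, keeping `metadata["time_range_ms"]` consistent with the normalized column could look like this sketch (the `time_list` values are made up to mimic the ~7.5e9 ms fixture scale mentioned above):

```python
import numpy as np

# Illustrative raw timestamps on the absolute-ms scale seen in the fixture.
time_list = [7.5e9, 7.5e9 + 1000.0, 7.5e9 + 2500.0]

time_ms_arr = np.array(time_list, dtype=np.float64)
time_ms_arr -= time_ms_arr.min()  # normalize so relative time starts at 0.0

# Derive the metadata range from the normalized array, not the raw values,
# so it matches what consumers see in df["time_ms"].
time_range_ms = [float(time_ms_arr.min()), float(time_ms_arr.max())]
```

Computing the range after the subtraction guarantees the metadata and the column can never drift apart.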
No description provided.