Skip to content

Freed heavy fit objects and large dataProcess intermediates early#207

Open
tonywu1999 wants to merge 4 commits into
develfrom
MSstats/work/20260514_free-heavy-objects
Open

Freed heavy fit objects and large dataProcess intermediates early#207
tonywu1999 wants to merge 4 commits into
develfrom
MSstats/work/20260514_free-heavy-objects

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented May 14, 2026

User description

  • Stripped survreg() $y / $linear.predictors and lm() $model after fitting in .fitSurvival and .fitLinearModel; downstream predict / summary / vcov paths are unaffected.
  • Rewrote .updateUnequalVariances to mutate a single data.frame in place instead of rebuilding it ~5x per iteration via data.frame(...).
  • Added rm(raw), rm(peptides_dict), and gc(verbose = FALSE) inside dataProcess() so the ~250-400 MB held by function arguments is freed before MSstatsSummarizationOutput runs.
  • Added rm(survival_fit) and rm(fit) inside the per-protein linear / TMP summarizers so local fit objects drop as soon as survival / result is materialised.
  • Tests: inst/tinytest/test_memory_optimization.R Issue 1 (tests 1a, 1b) — 14 assertions, all green.

PR Type

Bug fix, Enhancement, Tests


Description

  • Free large pipeline inputs earlier

  • Strip heavy fields from fit objects

  • Avoid repeated copies in variance updates

  • Add fit memory regression tests


Diagram Walkthrough

flowchart LR
  raw["raw and peptides_dict inputs"]
  proc["dataProcess() processing pipeline"]
  fits["Stripped survreg and lm objects"]
  var["In-place unequal variance updates"]
  tests["Memory optimization regression tests"]
  raw -- "freed earlier" --> proc
  fits -- "reduce object size" --> proc
  var -- "avoids repeated copies" --> proc
  proc -- "validated by" --> tests
Loading

File Walkthrough

Relevant files
Bug fix
dataProcess.R
Free pipeline intermediates and fit objects early               

R/dataProcess.R

  • Remove raw after initial preparation
  • Remove peptides_dict after normalization
  • Run gc(verbose = FALSE) before output assembly
  • Drop local survival_fit and fit sooner
+11/-3   
Enhancement
utils_imputation.R
Strip unused `survreg` fields after fitting                           

R/utils_imputation.R

  • Clear fit$y after survreg() fitting
  • Clear fit$linear.predictors before returning
  • Keep downstream prediction behavior intact
+3/-1     
utils_summarization.R
Reduce `lm` footprint and variance-copy churn                       

R/utils_summarization.R

  • Clear linear_model$model before returning
  • Convert input to data.frame once upfront
  • Add and remove helper columns in place
  • Avoid repeated full-frame copies per iteration
+17/-15 
Tests
test_memory_optimization_fits.R
Add regression tests for memory-stripped fits                       

inst/tinytest/test_memory_optimization_fits.R

  • Add tests for stripped survreg fields
  • Verify stripped survival fits still predict
  • Add tests for stripped lm$model
  • Verify summary() and vcov() still work
+111/-0 

Motivation and Context

This PR reduces peak memory use in MSstats' data-processing and per-protein summarization by freeing large intermediate objects and by stripping heavy fields from fitted model objects that are not required for downstream use. Two main memory sources are targeted: (1) large intermediate data structures created during dataProcess() (raw, peptides_dict, etc.), and (2) large components inside fitted model objects (survreg and lm) that are retained unnecessarily across loop iterations. The PR also reduces copy churn in the unequal-variance weighting loop. Together these changes free on the order of ~250–400 MB during dataProcess in observed benchmarks.

Short summary of the solution

  • Remove large intermediates (rm(raw), rm(peptides_dict)) as soon as they are no longer needed and call gc(verbose = FALSE) before the final summarization output.
  • Strip large, unneeded fields from fitted models: survreg objects have $y and $linear.predictors nulled; lm objects have $model nulled.
  • Refactor .updateUnequalVariances to perform in-place data.table updates (add/remove intermediate columns via assignment/NULL) to avoid repeated full data.frame reconstruction.
  • Remove local fit objects (rm(survival_fit), rm(fit)) immediately after per-protein results are materialized in summarizers.
  • Add tests and a benchmark script to verify stripping behavior and to measure peak memory.

Detailed changes

  • R/dataProcess.R

    • After MSstatsPrepareForDataProcess: rm(raw).
    • After MSstatsNormalize: rm(peptides_dict).
    • Call gc(verbose = FALSE) after summarization returns and before MSstatsSummarizationOutput to free memory held by function arguments.
    • Per-protein summarization paths (linear/TMP) remove local fit objects immediately after results are produced (explicit rm(survival_fit) / rm(fit) in summarizers called from MSstatsSummarizeWithMultipleCores).
  • R/utils_imputation.R (.fitSurvival)

    • Ensures correct branch closure for unlabeled/multi-feature case.
    • After survreg fitting, sets fit$y = NULL and fit$linear.predictors = NULL before returning the fit (preserves coefficients/scale/etc.).
  • R/utils_summarization.R (.fitLinearModel, .updateUnequalVariances)

    • .fitLinearModel: strips linear_model$model = NULL before returning.
    • .updateUnequalVariances: rewritten to:
      • Convert input to data.table (copy if already data.table, otherwise as.data.table).
      • Iteratively compute abs.resids, fitted, loess.fitted, and weight using in-place column assignments.
      • Remove intermediate columns by assigning NULL rather than rebuilding the entire frame.
      • Re-fit weighted lm using lm(formula(fit), data = input, weights = weight) and return wls.fit.
    • Overall behavior and final weighted re-fit logic preserved.
  • Per-protein summarization orchestration

    • MSstatsSummarizeWithMultipleCores / summarizer internals export and cluster handling unchanged; added local removals of fitted objects in single-core / multi-core summarization code paths to avoid retention across iterations.
  • New benchmark and helpers

    • benchmark/profile_dataprocess_peak.R: CLI script to replay dataProcess internals, record gc() Vcells used/max used and elapsed time at each stage, and print peak memory, baseline, and peak-above-baseline delta.

Unit tests added or modified

  • inst/tinytest/test_memory_optimization_fits.R (new)

    • Tests .fitSurvival(): asserts fit$y and fit$linear.predictors are NULL, coefficients and scale remain non-NULL, predict() works on the stripped fit, and object.size() for stripped fit is smaller than unstripped survreg().
    • Tests .fitLinearModel(): asserts lm$model is NULL, coefficients/qr/residuals remain non-NULL, summary() and vcov() work on the stripped lm, and object.size() for stripped fit is smaller than unstripped lm().
    • The file contains the assertions described in the PR (14 assertions reported in the PR description).
  • benchmark/profile_dataprocess_peak.R (new)

    • Not a unit test but included to measure per-stage peak memory and elapsed time for dataProcess.

No existing tests were modified; new tests follow the project's tinytest pattern and validate both correctness (downstream APIs) and memory reduction.

Coding guidelines / potential violations

  • No API-signature or exported/public function declarations were changed.
  • Changes preserve downstream behavior: predict(), summary(), and vcov() continue to work on stripped fits per added tests.
  • The .updateUnequalVariances refactor uses data.table idioms (in-place assignments and NULL removal); this is consistent with improved memory/copy behavior and adheres to common R/data.table practices.
  • No coding guideline violations were observed in the diff reviewed: no exported signatures changed, no behavioral changes to public functions, and tests were added to validate correctness.

Review Change Stack

* Stripped survreg() $y / $linear.predictors and lm() $model after
  fitting in .fitSurvival and .fitLinearModel; downstream predict /
  summary / vcov paths are unaffected.
* Rewrote .updateUnequalVariances to mutate a single data.frame in
  place instead of rebuilding it ~5x per iteration via data.frame(...).
* Added rm(raw), rm(peptides_dict), and gc(verbose = FALSE) inside
  dataProcess() so the ~250-400 MB held by function arguments is freed
  before MSstatsSummarizationOutput runs.
* Added rm(survival_fit) and rm(fit) inside the per-protein linear /
  TMP summarizers so local fit objects drop as soon as survival /
  result is materialised.
* Tests: inst/tinytest/test_memory_optimization.R Issue 1 (tests 1a,
  1b) — 14 assertions, all green.

See MSstats-ai/todos/active/TODO-MS-20260514_fix-memory-bugs.md

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

📝 Walkthrough

Walkthrough

Frees large intermediates and fitted-model internals during data processing and summarization, fixes a conditional in .fitSurvival(), refactors unequal-variance updates to avoid repeated copies, and adds tests and a benchmark to measure peak memory.

Changes

Memory Optimization in Data Processing and Fitting

Layer / File(s) Summary
Pipeline intermediate and fitted-model cleanup
R/dataProcess.R
Removes large intermediates (raw, peptides_dict) immediately after use, calls gc(verbose = FALSE) before summarization, and explicitly removes per-iteration fitted objects (survival_fit, fit) after their results are extracted in both linear and TMP summarization paths.
Survival fit return cleanup and conditional fix
R/utils_imputation.R
Adds missing brace to terminate a nested branch in .fitSurvival() and nulls fit$y and fit$linear.predictors on the returned survreg-like object.
Unequal-variance adjustment refactor
R/utils_summarization.R
Refactors .updateUnequalVariances() to convert input to a data.table once and perform iterative loess/weight/WLS steps using in-place column assignments and NULL-based cleanup instead of repeated full data.frame reconstruction.
Tests validating memory-optimized fits
inst/tinytest/test_memory_optimization_fits.R
Adds tests asserting that .fitSurvival() strips $y and $linear.predictors but preserves coefficients/scale and predictability; and that .fitLinearModel() strips $model while preserving coefficients/qr/residuals and summary/vcov behavior; compares object.size() to unstripped fits.
Benchmark for peak memory profiling
benchmark/profile_dataprocess_peak.R
Adds a script that replays internal dataProcess stages with checkpointed gc() sampling to report per-stage timings and peak GC memory (Vcells) and a summary peak-above-baseline metric.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Vitek-Lab/MSstats#193: Modifies the same summarization pipeline (R/dataProcess.R) and may overlap with model-cleanup locations.

Suggested labels

Review effort 3/5

Suggested reviewers

  • mstaniak
  • devonjkohler
  • anshuman-raina

Poem

🐰 I hopped through the code with tidy delight,

Cleared bulky fits and made memory light,
Raw crumbs swept up, predictions still sing,
Tests give a wink and the benchmark will sing,
A small tidy patch — a rabbit's neat spring.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main changes: freeing heavy fit objects and large intermediate objects early in dataProcess, which is the primary goal of the changeset.
Description check ✅ Passed The PR description covers all required template sections: motivation (memory optimization), detailed changes (5 bullet points addressing fit object stripping and data.frame optimization), and testing (test file with 14 assertions). Checklist items are not marked but the description is substantive and complete.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch MSstats/work/20260514_free-heavy-objects

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Unguarded cleanup

In MSstatsSummarizeSingleTMP, survival_fit is created inside a tryCatch and the code already handles the failure case by setting converged = FALSE. The new unconditional rm(survival_fit) runs even when fitting failed, so this path now emits object 'survival_fit' not found warnings and can become a hard failure in environments that promote warnings to errors.

rm(survival_fit)

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/utils_summarization.R`:
- Around line 208-223: The loess iterations reuse stale fitted values because
input[["fitted"]] is only set from the initial fit; after each weighted refit
(wls.fit) update the stored fitted values so subsequent loess calls use
consistent residual/fitted pairs — e.g. inside the loop after computing wls.fit
(the variable wls.fit) assign input[["fitted"]] <- fitted(wls.fit) or
input[["fitted"]] <- wls.fit$fitted.values before recalculating
input[["abs.resids"]] and proceeding to the next iteration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7d6992f3-0ec8-4947-80c1-53b71f9e9caa

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa0bd1 and 5693fc9.

📒 Files selected for processing (4)
  • R/dataProcess.R
  • R/utils_imputation.R
  • R/utils_summarization.R
  • inst/tinytest/test_memory_optimization_fits.R

Comment thread R/utils_summarization.R
Comment on lines 208 to 223
for (i in seq_len(num_iter)) {
if (i == 1) {
abs.resids = data.frame(abs.resids = abs(fit$residuals))
fitted = data.frame(fitted = fit$fitted.values)
input = data.frame(input,
"abs.resids" = abs.resids,
"fitted" = fitted)
input[["abs.resids"]] = abs(fit$residuals)
input[["fitted"]] = fit$fitted.values
}
fit.loess = loess(abs.resids ~ fitted, data = input)
loess.fitted = data.frame(loess.fitted = fitted(fit.loess))
input = data.frame(input, "loess.fitted" = loess.fitted)
## loess fitted valuaes are predicted sd
input$weight = 1 / (input$loess.fitted ^ 2)
input = input[, !(colnames(input) %in% "abs.resids")]
input[["loess.fitted"]] = fitted(fit.loess)
## loess fitted values are predicted sd
input[["weight"]] = 1 / (input[["loess.fitted"]] ^ 2)
input[["abs.resids"]] = NULL
## re-fit using weight
wls.fit = lm(formula(fit), data = input, weights = weight)
abs.resids = data.frame(abs.resids = abs(wls.fit$residuals))
input = data.frame(input, "abs.resids" = abs.resids)
input = input[, -which(colnames(input) %in% c("loess.fitted", "weight"))]
input[["abs.resids"]] = abs(wls.fit$residuals)
input[["loess.fitted"]] = NULL
input[["weight"]] = NULL
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh fitted after each weighted refit.

At Line 220 you refresh abs.resids from wls.fit, but fitted is never refreshed after Line 211. For num_iter > 1, later loess iterations use stale fitted values and inconsistent residual/fitted pairs.

Suggested fix
     for (i in seq_len(num_iter)) {
         if (i == 1) {
             input[["abs.resids"]] = abs(fit$residuals)
             input[["fitted"]] = fit$fitted.values
         }
         fit.loess = loess(abs.resids ~ fitted, data = input)
         input[["loess.fitted"]] = fitted(fit.loess)
         ## loess fitted values are predicted sd
         input[["weight"]] = 1 / (input[["loess.fitted"]] ^ 2)
         input[["abs.resids"]] = NULL
         ## re-fit using weight
         wls.fit = lm(formula(fit), data = input, weights = weight)
         input[["abs.resids"]] = abs(wls.fit$residuals)
+        input[["fitted"]] = wls.fit$fitted.values
         input[["loess.fitted"]] = NULL
         input[["weight"]] = NULL
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_summarization.R` around lines 208 - 223, The loess iterations reuse
stale fitted values because input[["fitted"]] is only set from the initial fit;
after each weighted refit (wls.fit) update the stored fitted values so
subsequent loess calls use consistent residual/fitted pairs — e.g. inside the
loop after computing wls.fit (the variable wls.fit) assign input[["fitted"]] <-
fitted(wls.fit) or input[["fitted"]] <- wls.fit$fitted.values before
recalculating input[["abs.resids"]] and proceeding to the next iteration.

@mstaniak
Copy link
Copy Markdown
Contributor

Thanks, fantastic update.
I wonder if people ever use the unequal variances option, btw.

Copy link
Copy Markdown
Contributor

@mstaniak mstaniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for his fantastic update, just one question

Comment thread R/utils_summarization.R Outdated
Comment thread inst/tinytest/test_memory_optimization_fits.R
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/profile_dataprocess_peak.R`:
- Around line 58-60: The required-column validation in
profile_dataprocess_peak.R currently defines required_cols as c("ProteinName",
"PeptideSequence", "PrecursorCharge", "FragmentIon", "ProductCharge",
"IsotopeLabelType", "Run", "BioReplicate", "Condition", "Intensity") which omits
PeptideModifiedSequence expected by the package validity path and risks later
failures; update the required_cols vector (and the analogous check around lines
85-89) to include "PeptideModifiedSequence", and ensure any check/handling for
"AnomalyScores" remains separate (i.e., validate presence of
PeptideModifiedSequence as required while treating AnomalyScores as an
optional/special-case column).
- Around line 143-148: The benchmark is using non-default parameters which makes
results non-comparable to dataProcess(); update the calls to
MSstats:::MSstatsSelectFeatures and MSstats:::MSstatsPrepareForSummarization to
use the same default knobs as dataProcess() rather than the explicit overrides
(featureSubset = "all", n_top_feature = 3, remove_uninformative_feature_outlier
= TRUE), i.e. remove the custom arguments and call the functions with their
defaults (or explicitly pass the same parameter values used by dataProcess()) so
the benchmark reflects the standard dataProcess() baseline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b06a9a32-3f43-4c52-841e-6e968edfde2d

📥 Commits

Reviewing files that changed from the base of the PR and between 9edb718 and d07f4d1.

📒 Files selected for processing (1)
  • benchmark/profile_dataprocess_peak.R

Comment on lines +58 to +60
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
"FragmentIon", "ProductCharge", "IsotopeLabelType",
"Run", "BioReplicate", "Condition", "Intensity")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten required-column validation to match the pipeline contract.

The script currently accepts a 10-column schema, but the package validity path expects PeptideModifiedSequence too (with AnomalyScores handled separately). This can admit unsupported inputs and fail later in less obvious places.

Suggested patch
-required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
+required_cols <- c("ProteinName", "PeptideSequence", "PeptideModifiedSequence", "PrecursorCharge",
                    "FragmentIon", "ProductCharge", "IsotopeLabelType",
                    "Run", "BioReplicate", "Condition", "Intensity")

Also applies to: 85-89

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/profile_dataprocess_peak.R` around lines 58 - 60, The
required-column validation in profile_dataprocess_peak.R currently defines
required_cols as c("ProteinName", "PeptideSequence", "PrecursorCharge",
"FragmentIon", "ProductCharge", "IsotopeLabelType", "Run", "BioReplicate",
"Condition", "Intensity") which omits PeptideModifiedSequence expected by the
package validity path and risks later failures; update the required_cols vector
(and the analogous check around lines 85-89) to include
"PeptideModifiedSequence", and ensure any check/handling for "AnomalyScores"
remains separate (i.e., validate presence of PeptideModifiedSequence as required
while treating AnomalyScores as an optional/special-case column).

Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
Comment thread benchmark/profile_dataprocess_peak.R Outdated
@tonywu1999
Copy link
Copy Markdown
Contributor Author

@Rudhik1904 If you could report your benchmarking results with a large dataset in this PR (i.e. amount of memory freed up, etc.), that would be great for documentation purposes

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
R/utils_summarization.R (1)

219-221: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh fitted after each weighted refit.

At Line 219, abs.resids is updated from wls.fit, but fitted is not. For num_iter > 1, subsequent loess steps use stale fitted values.

Suggested fix
         wls.fit = lm(formula(fit), data = input, weights = weight)
         input[, abs.resids := abs(wls.fit$residuals)]
+        input[, fitted := wls.fit$fitted.values]
         input[, loess.fitted := NULL]
         input[, weight := NULL]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_summarization.R` around lines 219 - 221, The code updates input[,
abs.resids := abs(wls.fit$residuals)] but does not refresh the fitted values, so
subsequent iterations (when num_iter > 1) use stale predictions; after computing
wls.fit add a line to update the stored fitted values (e.g., set input[, fitted
:= wls.fit$fitted] or equivalent) so each loess reweighting uses the most recent
fitted vector before clearing loess.fitted and weight.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/profile_dataprocess_peak.R`:
- Line 8: Replace the garbled header comment on line containing "mark and
elapsed time..." with a clear, readable description; e.g., rewrite the comment
used by the benchmark (the commented header string) to: "Mark elapsed time at
every stage boundary; useful for tracking the running max of `used` after
gc(reset = TRUE) to locate peak memory usage." Ensure you update the same
comment block that contains the malformed text so the benchmark header is
readable and informative.

---

Duplicate comments:
In `@R/utils_summarization.R`:
- Around line 219-221: The code updates input[, abs.resids :=
abs(wls.fit$residuals)] but does not refresh the fitted values, so subsequent
iterations (when num_iter > 1) use stale predictions; after computing wls.fit
add a line to update the stored fitted values (e.g., set input[, fitted :=
wls.fit$fitted] or equivalent) so each loess reweighting uses the most recent
fitted vector before clearing loess.fitted and weight.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dad8d3f4-a2a6-430c-be4c-8fd22547e10c

📥 Commits

Reviewing files that changed from the base of the PR and between d07f4d1 and 4d2d41a.

📒 Files selected for processing (2)
  • R/utils_summarization.R
  • benchmark/profile_dataprocess_peak.R

# Peak-memory + wall-clock instrumentation for MSstats::dataProcess.
#
# Replays dataProcess's internals manually, printing the gc() high-water
# mark and elapsed time at every stage boundary. Useful forunning max of `used` since gc(reset = TRUE)r locating
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the garbled benchmark description comment.

Line 8 contains corrupted text (forunning ... r locating), which makes the header guidance harder to follow.

Suggested patch
-# mark and elapsed time at every stage boundary. Useful forunning max of `used` since gc(reset = TRUE)r locating
+# mark and elapsed time at every stage boundary. Useful for locating
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/profile_dataprocess_peak.R` at line 8, Replace the garbled header
comment on line containing "mark and elapsed time..." with a clear, readable
description; e.g., rewrite the comment used by the benchmark (the commented
header string) to: "Mark elapsed time at every stage boundary; useful for
tracking the running max of `used` after gc(reset = TRUE) to locate peak memory
usage." Ensure you update the same comment block that contains the malformed
text so the benchmark header is readable and informative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants