fix(maxquant): Filter out decoys with decoy column by tonywu1999 · Pull Request #133 · Vitek-Lab/MSstatsConvert

tonywu1999 · 2026-05-22T15:35:37Z

Motivation and Context

https://groups.google.com/g/msstats/c/NwsByfS2Y5M

I was using MaxQuant output with MSstats for the first time. I ran MaxQtoMSstatsFormat and then dataProcess, and realized there were "REV_" (reverse) sequences in the processed_data$ProteinLevelData slot. An older MaxQuant output shows that the decoy column used to be named "Reverse" in the proteinGroups.txt and evidence.txt output files. With MaxQuant 2.8.0.0, the "Reverse" column is missing from these files and there is a new column called "Decoy" that seems to represent the same information. Although I couldn't find any documentation of this change, I believe this is why I saw reverse sequences in my processed data.
I reran MSstats after renaming the "Decoy" columns to "Reverse" in the proteinGroups and evidence data. MSstats then removed the decoy sequences, and they were no longer present in the processed data.
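
The workaround described in the report can be sketched roughly as follows. This is a minimal illustration on toy data frames, not the reporter's actual script: the helper `rename_decoy_to_reverse()` and the example protein IDs are hypothetical, and real MaxQuant tables contain many more columns.

```r
# Toy stand-ins for evidence.txt and proteinGroups.txt as read into R.
# Only the decoy flag column matters for this illustration.
evidence <- data.frame(
  Proteins = c("P12345", "REV__P67890"),
  Decoy = c("", "+"),
  stringsAsFactors = FALSE
)
protein_groups <- data.frame(
  Protein.IDs = c("P12345", "REV__P67890"),
  Decoy = c("", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename "Decoy" to "Reverse" only when the legacy
# column is absent, so older MaxQuant exports pass through unchanged.
rename_decoy_to_reverse <- function(df) {
  if ("Decoy" %in% colnames(df) && !("Reverse" %in% colnames(df))) {
    colnames(df)[colnames(df) == "Decoy"] <- "Reverse"
  }
  df
}

evidence <- rename_decoy_to_reverse(evidence)
protein_groups <- rename_decoy_to_reverse(protein_groups)

# The renamed tables would then be passed to the converter, e.g.
# MSstats::MaxQtoMSstatsFormat(evidence, annotation, protein_groups)
# (annotation is not defined in this sketch).
```

Guarding on the absence of "Reverse" keeps the helper safe to apply to both old and new export formats.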

Motivation and Context

MaxQuant proteomics software has introduced the "Potential.contaminant" column in its output format as an additional means of identifying potentially problematic proteins. The .cleanRawMaxQuant() function previously filtered proteins only based on the Contaminant, Reverse, and Decoy columns. This PR updates the function to also filter out proteins marked in the new Potential.contaminant column, ensuring that the MSstatsConvert package properly handles recent changes to MaxQuant's output format and prevents potentially problematic proteins from being included in downstream analysis.

Changes

R/clean_MaxQuant.R:
- Added "Potentialcontaminant" to the filter_cols vector (line 20) to filter rows where this column contains marked values
- Updated the informational message (lines 21-22) to include "Potential.contaminant" in the list of filtered protein categories
- Updated the informational message for remove_by_site = TRUE case (lines 25-26) to also mention "Potential.contaminant" alongside existing filters ("Contaminant", "Reverse", "Decoy", and "Only.identified.by.site")
- Total changes: +3/-3 lines
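
As a rough illustration of the kind of filtering the bullets above describe, the sketch below drops rows flagged in any of a set of columns. It is not the body of `.cleanRawMaxQuant()`: the helper `drop_flagged_rows()`, the order of `filter_cols`, the use of "+" as the flag marker, and the toy column names (written without dots, mirroring the "Potentialcontaminant" spelling above) are all assumptions made for the example.

```r
# Hypothetical helper, not the package's implementation:
# drop rows flagged with the marker in any listed column that is present.
drop_flagged_rows <- function(df, filter_cols, marker = "+") {
  present <- intersect(filter_cols, colnames(df))
  keep <- rep(TRUE, nrow(df))
  for (col in present) {
    # Treat NA as "not flagged" so missing values do not remove rows.
    flagged <- !is.na(df[[col]]) & df[[col]] == marker
    keep <- keep & !flagged
  }
  df[keep, , drop = FALSE]
}

# Assumed filter configuration for the example.
filter_cols <- c("Contaminant", "Potentialcontaminant", "Reverse", "Decoy")

# Toy protein-groups table with one contaminant and one decoy row.
pg <- data.frame(
  ProteinIDs = c("P1", "CON__P2", "REV__P3", "P4"),
  Potentialcontaminant = c("", "+", "", NA),
  Decoy = c("", "", "+", ""),
  stringsAsFactors = FALSE
)

cleaned <- drop_flagged_rows(pg, filter_cols)
print(cleaned$ProteinIDs)
```

In this toy table only the rows for P1 and P4 remain. Columns listed in `filter_cols` but absent from the input ("Contaminant" and "Reverse" here) are simply skipped, which is why a renamed header can let flagged rows through unnoticed.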

Unit Tests

No unit tests were added or modified in this PR. The existing test suite in inst/tinytest/test_cleanRaw.R does test MaxQuant cleaning functionality (lines 40-55), but the test data already contains the Potential.contaminant column in the mq_pg.csv file, so the filtering behavior is implicitly covered by existing tests. No explicit new test cases were created to specifically validate the Potentialcontaminant filtering behavior.

Coding Guidelines

No violations of coding guidelines identified. The changes follow the existing code patterns and maintain consistency with the R coding style used throughout the package.

coderabbitai · 2026-05-22T15:35:54Z

Walkthrough

.cleanRawMaxQuant() now includes Potentialcontaminant as an additional column for filtering contaminants. The function's status message is updated to report Potential.contaminant alongside the existing Contaminant, Reverse, and Decoy filters when removing flagged rows.

Changes

Contaminant Filtering Enhancement

| Layer / File(s) | Summary |
|---|---|
| Expand contaminant filter columns and status message `R/clean_MaxQuant.R` | The contaminant filter configuration is expanded to include `Potentialcontaminant` as an additional filter column alongside `Contaminant`, `Reverse`, and `Decoy`. The informational status message is updated to mention `Potential.contaminant` in the filtering output. |

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Poem

🐰 A contaminant lurks in the data stream,
Potential and actual—now caught by the scheme!
With dots in the names and filters aligned,
MaxQuant data shines, with contaminants consigned!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title mentions filtering decoys with a decoy column, but the actual change adds 'Potential.contaminant' filtering, not decoy-related changes. | Update the title to accurately reflect that the change filters Potential.contaminant column alongside existing contaminant-like filters. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description is comprehensive and well-structured, addressing motivation, changes, testing, and coding guidelines. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/clean_MaxQuant.R`:
- Around line 20-27: The current filter_cols in clean_MaxQuant.R uses incorrect
MaxQuant header names (e.g., "Potentialcontaminant" and "Decoy") so filtering
silently skips expected columns; change filter_cols to use the literal MaxQuant
column names present in our inputs (e.g., "Contaminant",
"Potential.contaminant", "Reverse") and remove "Decoy"; if remove_by_site is
true append "Only.identified.by.site" to filter_cols and update the msg text to
match these literal names so the log reflects the actual columns being filtered;
locate the filter_cols and msg variables in clean_MaxQuant.R to make this edit.
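
The "silently skips expected columns" concern in the comment above can be made visible with a small check like the one below. This is a sketch only: `check_filter_cols()` is a hypothetical helper, the toy table is invented, and whether a given spelling matches in the real package depends on how headers are normalised before filtering, which this example does not settle.

```r
# Hypothetical helper: split the configured filter columns into those
# present in the input and those that would be silently skipped.
check_filter_cols <- function(df, filter_cols) {
  list(
    present = intersect(filter_cols, colnames(df)),
    missing = setdiff(filter_cols, colnames(df))
  )
}

# Toy evidence table using dotted header names as an assumed input format.
evidence <- data.frame(
  Proteins = c("P1", "REV__P2"),
  Reverse = c("", "+"),
  Potential.contaminant = c("", ""),
  stringsAsFactors = FALSE
)

status <- check_filter_cols(
  evidence,
  c("Contaminant", "Potentialcontaminant", "Reverse", "Decoy")
)

# Report unmatched names instead of skipping them without notice.
if (length(status$missing) > 0) {
  message(
    "Filter columns not found in input: ",
    paste(status$missing, collapse = ", ")
  )
}
```

Logging the unmatched names makes a header rename in a new MaxQuant release show up in the output rather than in the downstream results.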

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between b9564f2 and 40b0139.

📒 Files selected for processing (1)

R/clean_MaxQuant.R

fix(maxquant): Fix MaxQuant converter w.r.t. recent MaxQ changes

40b0139

tonywu1999 changed the title ~~fix(maxquant): Fix MaxQuant converter w.r.t. recent MaxQ changes~~ fix(maxquant): Filter out decoys with decoy column May 22, 2026

coderabbitai Bot reviewed May 22, 2026

tonywu1999 merged commit 3145427 into devel May 27, 2026
2 checks passed

tonywu1999 deleted the fix-maxquant branch May 27, 2026 19:08
