All notable changes to this project will be documented in this file.
- Added `docs/Stage5_Output_Reference.md` documenting all 19 columns in `Stage5_curated_results.txt` with definitions, data types, column origins, internal-to-output name mapping, and interpretation guides for Confidence and score. (2026-03-14)
- Audited `docs/xMSannotator_Workflow.md` against codebase (19/20 references correct). Fixed `almost_equal()` line reference in Stage 1 mass tolerance section, added missing `feature_id_column` to Key Parameters Reference table. (2026-03-14)
- Updated `readme.md` with Documentation & Examples section containing a table of contents for all guides, example scripts, and developer docs. Added inline references to detailed docs from Pipeline Overview, Confidence Levels, and Installation sections. Updated `docs/README.md` to fix stale `data_transformation.md` link and list all current documents and scripts. (2026-03-14)
- Updated `docs/xMSannotator_Input_Formats.md` to reflect current codebase: added missing `level1_primary_adducts` parameter to Section 10, updated confidence level 4 label from "Boosted" to "Confirmed" in Section 6. (2026-03-14)
- Renamed `docs/data_transformation.md` to `docs/advanced_annotation_input_formatting.md`. Rewrote for current pipeline inputs: replaced obsolete recetox-aplcms conversion and HMDB RDA-to-Parquet content with documentation of XCMS feature table format, sample mapfile format, xlsx compound database format, and pre-processing steps (blank removal, fold-change filtering, peak table construction) as implemented in the Example Runscript. Feature Table section distinguishes required columns (`mz`, `time`, sample intensities) from workflow-specific QC metadata columns (XCMS stats, QA scores, blank/detection stats, CV columns) that are only needed for post-processing metadata joins and can be modified as needed. (2026-03-14)
- Rewrote pipeline workflow reference document (`docs/xMSannotator_Workflow.md`) for clarity and usability. Added Quick Start section with minimal example, replaced ASCII data flow diagram with numbered stage list, added per-stage "Key questions" callout boxes, added worked scoring example for Stage 3, replaced Stage 4 prose with pseudocode decision flowchart, added Parameter Decision Guide (instrument/study/tuning), and added Common Issues troubleshooting section. All existing algorithmic content, tables, formulas, and function references preserved. (2026-03-13)
- Added comprehensive pipeline workflow reference document (`docs/xMSannotator_Workflow.md`). Covers all stages (1 through 5) with algorithmic details, scoring formulas, parameter defaults, theory explanations, data flow diagram, output file manifest, and full parameter reference table. (2026-03-13)
- Fixed Stage 5 redundancy filtering reading stale pre-upgrade data. `init_chemscoremat()` in `multilevelannotationstep5.R` used `any(is.na())` to detect whether a data frame was passed, but annotation data frames contain legitimate NA values in `isotopologue`/`isotopologue_quality` columns, causing it to always fall back to reading the old `Stage4_confidence_levels.txt` file. This discarded all confidence upgrades, caps, labels, and coherence filtering, reverting 136 Conf 2 rows to Conf 0/1 and reintroducing 83 incoherent rows. Fix: changed sentinel check to `is_scalar_na()` (detects the default `NA` parameter vs. a real data frame), updated fallback to read `Stage4b_confidence_levels.txt` (post-coherence), and removed duplicate Stage5 file write from inside `multilevelannotationstep5()`. (2026-03-10)
- Fixed single M+H/M-H annotations stuck at Confidence 0: `cap_confidence_with_evidence()` now processes Level 0 compounds and rescues primary adduct matches to Level 1. Previously, cap only processed compounds with Confidence > 0, so single-row primary adduct matches that Stage 4 assigned Level 0 (due to `score > 10` strict inequality and `filter_by = NULL`) were never evaluated. (2026-03-10)
- Fixed orphan isotope rows (no base adduct row present) incorrectly upgraded to Level 2. Both `upgrade_confidence_with_evidence()` and `cap_confidence_with_evidence()` now count actual base rows (`n_base_rows`) separately from unique base adduct types (`n_base_adducts`). Isotope evidence tiers require `n_base_rows >= 1`, preventing lone isotope rows (e.g., `M+ACN+H_[+1]` with no `M+ACN+H` row) from being treated as isotope + base adduct evidence. (2026-03-10)
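
The sentinel problem in the Stage 5 fix above can be reproduced in a few lines of base R. This is a minimal sketch, not the package's code: the body of `is_scalar_na()` below is an assumed implementation, and the toy `annotations` table is hypothetical.

```r
# Assumed implementation: TRUE only for a single bare NA (the default argument),
# never for a data frame, a longer vector, or NULL.
is_scalar_na <- function(x) {
  is.atomic(x) && length(x) == 1L && is.na(x)
}

# Hypothetical annotation table with a legitimate NA in an isotopologue column.
annotations <- data.frame(
  compound_id = c("A", "B"),
  isotopologue = c("13C:1", NA),
  stringsAsFactors = FALSE
)

# The old-style check fires on any NA cell, so a real data frame looks "not passed".
old_check <- any(is.na(annotations))

# The scalar check distinguishes the default NA from a real data frame.
new_check <- is_scalar_na(annotations)
default_check <- is_scalar_na(NA)
```

Here `old_check` is `TRUE` even though a data frame was supplied, while `new_check` is `FALSE` and `default_check` is `TRUE`.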
- Added `cap_confidence_with_evidence()` post-hoc confidence cap function (multilevelannotationstep4.R). Runs after Stage 4 and the upgrade step to enforce hard evidence requirements on confidence values. Unlike upgrade (which only raises), the cap can lower confidence when evidence is insufficient. Cap tiers: Conf 3 requires isotope rows + adduct + module/RT coherence, Conf 2 requires 2+ base adducts + coherence, Conf 1 requires a single primary adduct match, everything else caps to 0. Skips Conf 0 and Conf 4 (user-confirmed). (2026-03-10)
- Added `add_confidence_labels()` function (multilevelannotationstep4.R). Maps numeric Confidence column to human-readable `Confidence_Level` text labels (None/Low/Medium/High/Confirmed) in all output files. (2026-03-10)
- Added `level1_primary_adducts` parameter to `advanced_annotation()` (default: `c("M+H", "M-H")`). Controls which adducts qualify for Confidence 1 as a single match in the evidence cap. Independent of `filter_by`, allowing `filter_by = NULL` for equal Stage 4 scoring while still requiring primary ion evidence for Level 1. (2026-03-10)
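
The cap tiers listed for `cap_confidence_with_evidence()` can be read as a small decision function. The sketch below is illustrative only: the function names and the four evidence arguments are hypothetical, and it mirrors the tiers exactly as worded in that entry (including skipping Conf 0 and Conf 4), not later refinements.

```r
# Hypothetical helper: highest confidence the evidence can support.
max_confidence_for_evidence <- function(n_isotope_rows, n_base_adducts,
                                        coherent, has_primary_adduct) {
  # Conf 3: isotope rows + adduct + module/RT coherence
  if (n_isotope_rows >= 1 && n_base_adducts >= 1 && coherent) return(3L)
  # Conf 2: two or more base adducts + coherence
  if (n_base_adducts >= 2 && coherent) return(2L)
  # Conf 1: a single primary adduct match
  if (has_primary_adduct) return(1L)
  # Everything else caps to 0
  0L
}

# Hypothetical cap: lowers confidence when evidence is insufficient, never raises it.
cap_confidence <- function(confidence, ...) {
  if (confidence %in% c(0L, 4L)) return(confidence)  # skipped levels
  min(confidence, max_confidence_for_evidence(...))
}

# Conf 3 claimed, but no isotope rows: capped to the two-adduct tier.
capped <- cap_confidence(3L, n_isotope_rows = 0, n_base_adducts = 2,
                         coherent = TRUE, has_primary_adduct = TRUE)
```

Because the cap takes the minimum of the current value and the supported tier, it can lower a confidence level but cannot upgrade one.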
- Sorted Stage4a and Stage4b output files by Confidence (descending), then compound_id, score, and Adduct. All rows for a compound are now grouped together, ordered from highest to lowest confidence. Matches the existing Stage5 sort order from `multilevelannotationstep5()`. (2026-03-10)
- Renamed confidence level 4 label from "Boosted" to "Confirmed" in documentation. (2026-03-10)
- Removed false Confidence 2 assignments for compounds without corroborating evidence. Two changes: (1) Removed boost block in `compute_confidence_for_compound()` that unconditionally upgraded any weighted adduct with score > 10 from Conf 0/1 → Conf 2, bypassing filter checks. This was redundant for filter compounds (Stage 4 already assigns Conf 2 for filter matches) and too aggressive for non-filter compounds (filter_by=NULL path assigns Conf 0 because `has_filter_match()` returns FALSE). (2) Removed `has_isotope_boost` score proxy (score >= 100) from `upgrade_confidence_with_evidence()` evidence tiers. Scores can reach ≥ 100 from pathway matching or base scoring without any isotopes, creating false-positive upgrades. Evidence tiers now use only actual isotope row detection and multiple adduct counts. Single mass matches with no isotopes and no multiple adducts now correctly remain at Conf 0. (2026-03-10)
- Refined coherence enforcement: now filters to the largest module group per compound instead of blanket downgrading to Conf 0. Added `enforce_compound_coherence()` helper that keeps only rows from the most-represented module for multi-module compounds. Applied as a pre-filter in `compute_confidence_for_compound()` (before Stage 4 evaluation) and `upgrade_confidence_with_evidence()` (before evidence gathering), replacing the post-hoc gate. Stage 4 now only sees coherent rows, allowing its internal RT clustering and module rules to work correctly on the filtered subset. Compounds like PEST0681 (2 rows in module 110 + 1 stray row in module 62) now retain their strong evidence instead of being killed. (2026-03-09)
- Split Stage 4 output into Stage4a (all rows with confidence) and Stage4b (coherent rows only). Stage 5 redundancy filtering now receives Stage4b (coherent subset). Previously wrote single `Stage4_confidence_levels.txt`. (2026-03-09)
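
The "keep the most-represented module" rule from the coherence entry above can be sketched in base R. This is not the package's `enforce_compound_coherence()`. The function name, the tie-breaking behaviour (first module in `table()` order wins), and the adduct labels in the toy data are all assumptions.

```r
# Hypothetical sketch: keep only the rows from the module with the most rows.
keep_largest_module <- function(rows) {
  counts <- table(rows$module)
  top_module <- names(counts)[which.max(counts)]  # ties: first in table() order
  rows[as.character(rows$module) == top_module, , drop = FALSE]
}

# Toy version of the PEST0681 case: two rows in module 110, one stray row in module 62.
pest <- data.frame(
  compound_id = "PEST0681",
  module = c(110, 110, 62),
  Adduct = c("M+H", "M+Na", "M+K"),  # illustrative adducts only
  stringsAsFactors = FALSE
)

coherent_rows <- keep_largest_module(pest)
```

Only the stray module 62 row is dropped, so the two coherent rows remain available as evidence.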
- Enforced module + RT coherence on all confidence levels > 0. Added `check_compound_coherence()` helper that verifies all rows for a compound are in the same peak module and within `max.rt.diff`. Applied at two enforcement points: (1) post-hoc gate in `compute_confidence_for_compound()` catches all Stage 4 paths at their single exit point, (2) replaced RT-only check in `upgrade_confidence_with_evidence()` with full module + RT coherence check. Previously, 6 Stage 4 code paths could assign Conf 1 or 2 without checking module coherence, resulting in ~92% of multi-row Conf 2 compounds being split across multiple peak modules. (2026-03-09)
- Fixed dead code in `compute_confidence_for_compound()` (multilevelannotationstep4.R). The `else` block at line 580 re-checked `filter.by` inside a branch that already established no filter match, so the inner condition was always FALSE. Replaced with `Confidence <- CONFIDENCE_LOW` so non-filter compounds with weighted adducts and high scores now receive Confidence 1 instead of being stuck at 0. (2026-03-09)
- Fixed score zeroing in `get_confidence_stage4()` (multilevelannotationstep4.R). Removed `else { final_res$score <- 0 }` which zeroed the internal score for non-filter compounds, preventing the Confidence 1→2 boost check at line 572 from working correctly. Output scores (from Stage 3 merge) were not affected. (2026-03-09)
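
For the module + RT coherence check described above, a minimal sketch follows. It assumes "within `max.rt.diff`" means the spread between the earliest and latest retention time. The function name and the `module` and `time` column names are also assumptions, not confirmed package internals.

```r
# Hypothetical sketch: a compound is coherent when all of its rows share one
# module and their retention times span no more than max_rt_diff.
is_compound_coherent <- function(rows, max_rt_diff) {
  same_module <- length(unique(rows$module)) == 1
  rt_spread <- diff(range(rows$time))
  same_module && rt_spread <= max_rt_diff
}

tight <- data.frame(module = c(7, 7), time = c(100, 104))
split_modules <- data.frame(module = c(7, 9), time = c(100, 104))
wide_rt <- data.frame(module = c(7, 7), time = c(100, 140))

tight_ok <- is_compound_coherent(tight, max_rt_diff = 10)
split_ok <- is_compound_coherent(split_modules, max_rt_diff = 10)
wide_ok <- is_compound_coherent(wide_rt, max_rt_diff = 10)
```

Under these assumptions only `tight` passes: the other two fail on the module test and the RT test respectively.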
- Added `upgrade_confidence_with_evidence()` post-hoc confidence upgrade function (multilevelannotationstep4.R). Evaluates compounds below Confidence 3 using evidence already available: isotope rows, multiple base adducts, isotope score boost (≥100), and RT coherence. Can only upgrade, never downgrade. When `filter_by` is set, non-filter compounds are assigned one tier lower than equivalent filter-matched compounds (e.g., isotope rows + 2 adducts → Conf 2 instead of 3). When `filter_by` is NULL/NA, same tier as filter-matched. Integrated into `advanced_annotation()` as Tool 10c, running after `identify_isotopologues()` and before Stage 4 output write. (2026-03-09)
- Replaced `rcdk` (Java/rJava dependency) with `enviPat` for isotope pattern calculations. `compute_isotopic_pattern()` and `precompute_isotope_patterns()` now use `enviPat::isopattern()` instead of `rcdk::get.formula()`/`rcdk::get.isotopes.pattern()`. This eliminates the rJava dependency entirely: no JDK, `JAVA_HOME`, or `R CMD javareconf` required. `enviPat` moved from Suggests to Imports; `rcdk` removed from Imports. Isotope pattern output format (mass, abund, mass_number_difference, exact_mass_diff) is preserved. Test data regenerated to match enviPat output values. (2026-03-04)
- Removed experimental permutation-based p-value testing. Deleted `R/compute_permutation.R` (~650 lines, 6 functions: `compute_permutation_pvalues()`, `precompute_isotope_patterns()`, `detect_isotopic_peaks_cached()`, `compute_isotopes_with_cache()`, `compute_full_pvalues()`, `compute_streaming_pvalues()`). Removed `enable_permutation`, `n_permutations`, `permutation_method`, `permutation_seed` parameters from `advanced_annotation()`. Deleted 6 man pages and 2 planning documents. The feature was disabled (`if (FALSE)`) and never production-ready. (2026-03-05)
- Fixed crash on charged molecular formulas (e.g., `C12H14N2+2` for Paraquat) after rcdk → enviPat migration. enviPat cannot parse charge notation in formulas unlike rcdk/CDK. Added `strip_formula_charge()` to remove charge suffixes before calling `enviPat::isopattern()` (charge doesn't affect isotope patterns). Also added `tryCatch` safety net in `detect_isotopic_peaks()` to gracefully skip any other unparseable formulas instead of crashing the entire annotation run. (2026-03-04)
- Fixed duplicate `feature_id` columns (`feature_id.x`, `feature_id.y`, `feature_id`) in Stage 4/5 output. The `feature_id` join was performed inside `skip_pathway_step()`, `multilevelannotationstep3()`, and `multilevelannotationstep4()`, then again by `safe_join_feature_id()` in `advanced_annotation()`, causing `dplyr::left_join()` to create `.x`/`.y` suffixes. Removed redundant joins from the three internal functions; `safe_join_feature_id()` now handles all feature_id joining at output stages. (2026-03-04)
- Suppressed enviPat "NOTE: You are sure that is the mass of an electrone?" message in `get_isotopologue_labels()` by explicitly passing `emass = 0.00054857990924` to `isopattern()`. (2026-03-04)
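
The charge-stripping fix above can be illustrated with a one-line regular expression plus a `tryCatch` wrapper. This is a sketch under assumptions: the package's actual `strip_formula_charge()` may use a different pattern, the trailing `+n`/`-n` notation is inferred from the `C12H14N2+2` example, and `pattern_fn` is a hypothetical stand-in for the isotope pattern call.

```r
# Assumed behaviour: drop a trailing charge suffix such as "+2", "+" or "-".
strip_charge <- function(formula) {
  sub("[+-][0-9]*$", "", formula)
}

# Hypothetical safety net: return NULL for formulas the parser rejects
# instead of aborting the whole run.
safe_pattern <- function(formula, pattern_fn) {
  tryCatch(pattern_fn(strip_charge(formula)), error = function(e) NULL)
}

paraquat <- strip_charge("C12H14N2+2")
neutral <- strip_charge("C6H12O6")

# Stand-in parser that always fails, to show the graceful skip.
always_fails <- function(f) stop("cannot parse: ", f)
skipped <- safe_pattern("C12H14N2+2", always_fails)
```

Neutral formulas pass through unchanged because the pattern only matches a sign at the end of the string.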
- Fixed `.gitignore` which contained `*.Rd` pattern blocking all man pages from being committed. Removed stale `xmsannotator/` paths and added `.DS_Store` exclusion. (2026-03-03)
- Added roxygen2 documentation to 13 exported functions that lacked man pages: `simple_annotation`, `get_chemscore`, `compute_chemical_score`, `add_isotopic_peaks`, `remove_water_adducts`, `create_adduct_weights`, `group_by_rt`, `load_peak_table_parquet`, `load_adduct_table_parquet`, `load_compound_table_parquet`, `load_expected_adducts_csv`, `load_boost_compounds_csv`, `save_parquet`. (2026-03-03)
- Updated `.Rbuildignore` to exclude non-package files (`.github`, `docs/`, `Dockerfile`, `.DS_Store`, `.gitignore`, `readme.md`) from the package tarball. (2026-03-03)
- Deleted 4 orphaned `.Rd` files for removed/internal functions: `custom_pathway_step.Rd`, `compute_boosted_confidences.Rd`, `get_confidence_stage2.Rd`, `group_by_rt_histv2.Rd`. (2026-03-03)
- Regenerated all `man/*.Rd` files via `roxygen2::roxygenise()`. (2026-03-03)
- Added isotopologue identification step (Tool 10b) to `advanced_annotation()`. After confidence level assignment, uses `enviPat::isopattern()` to identify which specific isotope substitution each isotope peak corresponds to (e.g., 13C:1 vs 15N:1 for M+1 peaks). Adds two columns to output: `isotopologue` (identity label) and `isotopologue_quality` ("confirmed" if both m/z and abundance match, "mz_only" if only m/z matches). Uses `isotope_mass_tolerance` for ppm cutoff and `intensity_deviation_tolerance` for abundance validation. Requires `enviPat` package (Suggests dependency); gracefully skips if not installed. New `identify_isotopologues_flag` parameter (default TRUE) to enable/disable. (2026-03-04)
- Added `get_monoisotopic_names()` internal helper to `compute_isotopes.R` for identifying monoisotopic element names from enviPat column headers. (2026-03-04)
- Added `identify_isotopologues.R` with `identify_isotopologues()` and `get_isotopologue_labels()` functions. (2026-03-04)
- Added `multimer_abundance_check` parameter to `advanced_annotation()` (default TRUE). When enabled, checks that multimer adducts (2M, 3M) have lower intensity than the monomer during confidence level assignment. If a multimer is more abundant than the monomer, the confidence level is downgraded. Set to FALSE to disable this validation. Parameter is passed through `multilevelannotationstep4()` to `get_confidence_stage4()`. (2026-01-27)
- Added `MplusH_abundance_ratio_check` parameter to `advanced_annotation()` (default TRUE). When enabled, requires secondary adducts to have lower intensity than the primary M+H or M-H adduct during chemical scoring. Set to FALSE to disable this abundance ratio validation. Parameter is passed through to `get_chemscore()`. (2026-01-27)
- Added permutation-based significance testing to `advanced_annotation()`. New parameters: `enable_permutation` (default FALSE), `n_permutations` (default 1000), `permutation_method` (default "full"), `permutation_seed` (42). When enabled, computes p-values by permuting m/z values across peaks to generate null distributions, then outputs `Stage4_permutation_pvalues_multi.txt` with a `perm_pvalue` column. Uses parallel processing via `n_workers` parameter. Two methods available: "full" (all permutations in parallel, faster) and "streaming" (chunked processing, lower memory). Note: permutation testing is currently disabled/in development and not ready for production use. (2026-01-26)
- Added `compute_permutation.R` with `compute_permutation_pvalues()`, `compute_full_pvalues()`, and `compute_streaming_pvalues()` functions for permutation testing. (2026-01-26)
- Added `pathway_mode` parameter to `advanced_annotation()` with options: "HMDB" (default), "custom", or "skip". This allows users with custom compound databases to skip pathway matching or provide their own pathway data. (2026-01-24)
- Added `pathway_data` parameter to `advanced_annotation()` for providing custom pathway-compound mappings when `pathway_mode = "custom"`. Format: data frame with `compound` and `pathway` columns. (2026-01-24)
- Added `excluded_pathways` and `excluded_pathway_compounds` parameters to `advanced_annotation()` for filtering pathway analysis. (2026-01-24)
- Added `as_pathway_table()` validation function to utils.R for validating custom pathway data format. (2026-01-24)
- Added `skip_pathway_step()` and `custom_pathway_step()` helper functions for non-HMDB pathway handling. (2026-01-24)
- Added adduct detection summary output to console after `simple_annotation()` in `advanced_annotation()`. Shows total annotations, unique peaks matched, unique compounds matched, and adduct breakdown. (2026-01-24)
- Added isotope detection summary output to console after `compute_isotopes()` in `advanced_annotation()`. Shows count of monoisotopic peaks, isotopes detected (with percentage), and breakdown by adduct type and mass number difference. (2026-01-24)
- Added `feature_id_column` parameter to `advanced_annotation()` allowing users to preserve their custom feature identifiers (e.g., "C0001", "C0005") through the pipeline. The specified column is now included in all stage outputs (Stage 1-5). Feature ID is joined by `peak` for Stage 1 and by `mz`+`time` for Stages 2-5 for robust matching. (2026-01-24)
- Added `mz_rt_feature_id_map` parameter to `multilevelannotationstep3()` and `multilevelannotationstep4()` to pass feature ID mapping for output files. (2026-01-24)
- Added `Stage3_pathway_matched.txt` output after HMDB pathway matching in `advanced_annotation()`. This captures annotation scores after pathway enrichment (only written when `pathway_mode = "HMDB"`). (2026-01-25)
- Added `Stage4_confidence_levels.txt` output after `multilevelannotationstep4()` in `advanced_annotation()`. This captures confidence level assignments before redundancy filtering. (2026-01-25)
- Added `isotope_mass_tolerance` parameter to `advanced_annotation()` for ppm-based filtering of isotope matches. Defaults to same value as `mass_tolerance`. This filters out isotope candidates where the observed m/z differs from the theoretical isotope m/z by more than the specified ppm tolerance. Improves isotope detection accuracy by rejecting false positive matches with poor mass accuracy. (2026-01-25)
- Added automatic creation of `outloc` directory if it doesn't exist. Previously, specifying a non-existent output directory would cause `write.table()` to fail. (2026-01-25)
- Added support for string compound IDs via `compound_id` column in compound_table. Users can now provide meaningful identifiers (e.g., "HMDB0000001", "C00001") that flow through the entire pipeline and appear in all output files as `compound_id`. The internal integer `compound` column is auto-generated if `compound_id` is provided. Legacy mode (numeric `compound` only) remains supported for backward compatibility. (2026-01-25)
- Added `boosted_compounds` parameter to `advanced_annotation()` for boosting confidence of confirmed annotations to level 4. Requires `compound_id` column matching `compound_id` in compound_table. Optional `mz` and `rt` columns enable proximity-based matching. (2026-01-25)
- Added `boost_match_by` parameter to `advanced_annotation()` to control matching mode: `c("mz")` for mz-only, `c("rt")` for RT-only, or `c("mz", "rt")` for both (default). (2026-01-25)
- Added `boost_mass_tolerance` and `boost_time_tolerance` parameters to `advanced_annotation()` for separate control of boost matching tolerances. Uses same format as main tolerance parameters (fractional for mass, seconds for time). Defaults to main tolerances if not specified. (2026-01-25)
- Bumped version from 0.10.0 to 1.0.0 in DESCRIPTION, conda/meta.yaml, and docs/xMSannotator_Input_Formats.md. Major version bump signals the new CLUES-Emory release with package rename, permutation testing, boosted compounds, pathway modes, abundance checks, and many bug fixes. (2026-02-08)
- Updated `xMSannotator_Input_Formats.md`: renamed all `recetox.xmsannotator`/`RECETOX xMSannotator` references to `CLUES.xMSannotator`, added missing `MplusH_abundance_ratio_check` and `multimer_abundance_check` parameters to the Optional Parameters table, added `load_boost_compounds_csv()` to Validation Functions table and noted `as_boosted_compounds_table()` is internal, updated footer date. Moved file from repo root to `CLUES.xMSannotator/docs/`. Added link in `docs/README.md`. (2026-02-08)
- Restored RECETOX GitHub URLs in `docs/possible_issues.md`, `docs/refactoring.md`, `docs/modifications.md`, and `docs/README.md`. These docs describe original RECETOX refactoring work with commit-specific URLs that should point to `RECETOX/recetox-xMSannotator`, not the CLUES fork. Also restored title in `docs/modifications.md`. (2026-02-08)
- Rewrote `readme.md` to credit upstream lineage (kuppal2 -> RECETOX -> CLUES-Emory), link to the RECETOX fork, and include a summary of CLUES-Emory changes (new features, bug fixes, code cleanup) derived from CHANGELOG.md. (2026-02-08)
- Renamed package from `recetox.xmsannotator` to `CLUES.xMSannotator` to reflect new ownership under CLUES-Emory organization. GitHub repo: `CLUES-Emory/CLUES-xMSannotator`. Updated DESCRIPTION (package name, author/maintainer), NAMESPACE, R/RcppExports.R, R/simple_annotation.R, src/RcppExports.cpp, tests/testthat.R, readme.md, conda files, GitHub workflow, docs, and git remote URL. Renamed directory from `recetox-xMSannotator` to `CLUES.xMSannotator`. (2026-02-08)
- Unified pathway scoring: `multilevelannotationstep3()` now accepts both HMDB and custom pathway databases via `db_name` parameter ("HMDB" or "custom"). Custom mode uses the same module-aware Fisher's test scoring logic (compute_score_pathways) as HMDB mode, replacing the simple co-member counting approach of the old `custom_pathway_step()`. New parameters: `pathway_data`, `excluded_pathways`, `excluded_pathway_compounds`. `chemCompMZ` parameter is now optional (default NULL), only needed for HMDB mode. (2026-02-08)
- Updated `advanced_annotation()` pathway dispatch to route `pathway_mode = "custom"` through `multilevelannotationstep3()` instead of `custom_pathway_step()`. (2026-02-08)
- Renamed HMDB pathway output file from `Stage3_correlation_scores.txt` to `Stage3_HMDB_pathways.txt` for consistency with `Stage3_custom_pathways.txt` and `Stage3_pathway_skipped.txt`. (2026-02-08)
- Made `skip_pathway_step()` join feature ID to `chemscoremat` directly (consistent with `multilevelannotationstep3()`), instead of joining to a temporary copy for writing only. Both functions now return data with feature ID included. (2026-02-08)
- Renamed `multilevelannotationstep4_v2.R` to `multilevelannotationstep4.R` and stripped all `_v2` suffixes from function names: `filter_clusters`, `create_cluster_table`, `compute_delta_rt`, `assign_conf`, `get_confidence_stage4`, `compute_delta_ppm`, `boost_confidence_of_IDs`, `multilevelannotationstep4`. The refactored version is now the canonical implementation. (2026-02-07)
- Updated `advanced_annotation()` to call `multilevelannotationstep4()` (was `multilevelannotationstep4_v2()`). (2026-02-07)
- Updated comment in `compute_permutation.R` to remove stale reference to original file's `data(adduct_table)` global scope pollution. (2026-02-07)
- Switched `advanced_annotation()` to use `multilevelannotationstep4_v2()` for Stage 4 confidence level assignment. The v2 version includes bug fixes (forward-iterating row deletion, NULL adduct weights, unreachable guards), performance improvements (pre-split by compound_id, vectorized delta ppm), and better code structure (11 smaller functions with named constants). (2026-02-07)
- Extracted `is_filter_empty()` helper function in multilevelannotationstep4_v2.R to consolidate the repeated `is.null(filter.by) || (length(filter.by) == 1 && is.na(filter.by[1]))` pattern used in `has_filter_match()`, `count_filter_matches()`, and `apply_rt_clustering_rules()`. (2026-02-07)
- Optimized permutation testing with precomputed isotope pattern cache. Isotope patterns (computed via rcdk) are now calculated once for all unique molecular formulas before the permutation loop, then reused across all permutations. This eliminates redundant rcdk::get.isotopes.pattern() calls - e.g., for 100 permutations x 1000 annotations, this reduces pattern computations from ~100,000 to ~2,000 (number of unique formulas). Added `precompute_isotope_patterns()`, `detect_isotopic_peaks_cached()`, and `compute_isotopes_with_cache()` functions to compute_permutation.R. (2026-01-26)
- Improved permutation progress messages to print every 10 permutations instead of every 100, providing better feedback during long-running permutation tests. (2026-01-26)
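
The `is_filter_empty()` extraction mentioned above wraps the quoted condition in a helper. The sketch below uses that condition verbatim. The surrounding function body is an assumption about how the helper is written, not a copy of the package source.

```r
# Sketch of the helper: TRUE when no filter adducts were supplied,
# either as NULL or as a single NA.
is_filter_empty <- function(filter.by) {
  is.null(filter.by) || (length(filter.by) == 1 && is.na(filter.by[1]))
}

empty_null <- is_filter_empty(NULL)
empty_na <- is_filter_empty(NA)
has_filter <- is_filter_empty(c("M+H", "M-H"))
```

The `is.null()` test has to come first: `is.na(NULL)` returns `logical(0)`, which is not usable as an `if` condition on its own.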
- Renamed `chemical_ID` column to `compound_id` throughout the package for consistency. This affects all stage output files (Stage1 through Stage5), internal functions, and test files. The column now matches the input `compound_id` column name from the compound_table, eliminating the confusing name change from input to output. (2026-01-25)
- Updated documentation in xMSannotator_Input_Formats.md to clarify that mass tolerance parameters (`mass_tolerance`, `isotope_mass_tolerance`) use fractional (relative) tolerance format (e.g., `5e-6` for 5 ppm, not direct ppm values). Added formula explanation and usage examples. (2026-01-25)
- Refactored `get_chemscore()` to preserve isotopes through chemical scoring. Isotopes are now separated before `compute_chemical_score()`, which prevents them from being filtered out during module selection. Isotopes are re-added after scoring with their parent adduct's score. This ensures isotopes appear in Stage 3+ outputs. (2026-01-24)
- Added 100x isotope boost to chemical scores in `get_chemscore()` when isotope evidence is present, matching master branch behavior in `calc_base_score()`. Chemicals with detected isotopes now receive significantly higher scores, improving confidence level assignments. (2026-01-24)
- Added `Stage3_chemical_scores.txt` output after `get_chemscore()` in `advanced_annotation()`. This captures the chemical scoring results before pathway matching. (2026-01-24)
- Refactored all stage outputs to use tab-delimited text files (.txt) with descriptive filenames, saved directly to output directory (no subfolders). New output files: `Stage1_mass_matched.txt`, `Stage1_peak_clusters.txt`, `Stage2_isotope_detection.txt`, `Stage3_chemical_scores.txt`, `Stage4_confidence_levels.txt`, `Stage5_curated_results.txt`. (2026-01-24)
- Added `outloc` parameter to `multilevelannotationstep3()` function to specify output directory (previously wrote to current directory). (2026-01-24)
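
The fractional tolerance convention noted above (`5e-6` for 5 ppm) can be made concrete with a short calculation. Both helper names are hypothetical, and the comparison rule (absolute difference relative to the theoretical m/z) is an assumption about how the tolerance is applied, not a quotation of the package's formula.

```r
# Convert a ppm value to the fractional form the parameters expect.
ppm_to_fraction <- function(ppm) ppm * 1e-6

# Assumed matching rule: |observed - theoretical| <= tolerance * theoretical.
within_tolerance <- function(observed_mz, theoretical_mz, tolerance) {
  abs(observed_mz - theoretical_mz) <= tolerance * theoretical_mz
}

tol <- ppm_to_fraction(5)  # 5 ppm as a fraction

# At m/z 200, 5 ppm corresponds to a window of about 0.001.
hit <- within_tolerance(200.0008, 200.0000, tol)
miss <- within_tolerance(200.0015, 200.0000, tol)
```

Passing `5` instead of `5e-6` would, under this rule, accept any observed mass within 500% of the theoretical value, which is why the distinction matters.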
- Fixed forward-iterating row deletion bug in `apply_multimer_rules()` (multilevelannotationstep4_v2.R). Removing rows inside a forward loop corrupted indices, causing wrong rows to be removed or skipped. Fix: collect indices to remove first, then remove all at once. (2026-02-07)
- Fixed `create_adduct_weights(NULL)` returning NULL instead of default weights (utils.R). When `adduct_weights` was converted from `NA` to `NULL`, `is.na(NULL)` returns `logical(0)` which bypassed the default creation. Added explicit `is.null()` check. (2026-02-07)
- Fixed unreachable early-return guard in `apply_multimer_rules()` multimer check (multilevelannotationstep4_v2.R). `gregexpr()` always returns a list matching input length, so `length(check_abundance) == 0` was never true. Replaced with proper check for actual multimer pattern matches. (2026-02-07)
- Fixed fragile column deletion by position after `merge()` in `get_confidence_stage4_v2()` (multilevelannotationstep4_v2.R). Replaced `curdata[, -1]` with `curdata[["cur_adducts"]] <- NULL` for explicit column removal by name. (2026-02-07)
- Fixed hardcoded column index 8 in `compute_delta_ppm_v2()` (multilevelannotationstep4_v2.R). Now dynamically finds the `theoretical.mz` column position for inserting `delta_ppm`. (2026-02-07)
- Fixed position-based column access `curdata[,1]` in `compute_confidence_for_compound()` (multilevelannotationstep4_v2.R). Replaced with explicit `curdata$score_level` access. (2026-02-07)
- Fixed `cbind()` type coercion risk when building result data frames in `get_confidence_stage4_v2()` and `check_minimum_score()` (multilevelannotationstep4_v2.R). Replaced `cbind(score_level = ..., curdata)` with `data.frame(score_level = ..., curdata, check.names = FALSE)` to prevent silent coercion of all columns to character. (2026-02-07)
- Fixed inconsistent confidence type in `apply_unique_adduct_rules()` early return (multilevelannotationstep4_v2.R). Was using bare `CONFIDENCE_MEDIUM` numeric without `score_level` column name. Now uses `data.frame(score_level = CONFIDENCE_MEDIUM, ...)` consistent with other return paths. (2026-02-07)
- Fixed `data(adduct_table)` polluting global environment and causing `simple_annotation()` to fail in permutation testing. Root cause: `data(adduct_table)` loads the package's dataset (with uppercase `Adduct` column) into the global environment, which then gets captured in closures passed to `compute_permutation_pvalues()`. When `simple_annotation()` runs inside permuted contexts, it references the global `adduct_table` instead of its parameter `adduct_table` (lowercase `adduct` column), causing column name mismatches. Fix: (1) Removed dead `data(adduct_table)` code from `multilevelannotationstep4.R` - the loaded table was sorted but never used, (2) Changed `data(adduct_table)` to `data("adduct_table", envir = environment())` in `get_confidence_stage2()` to load into local scope instead of global. (2026-01-26)
- Fixed permutation p-values generating 0 null scores because `run_permutation()` was missing preprocessing steps. After `simple_annotation()`, the annotation was missing required columns (`mass_defect`, `module`, `rt_cluster`, `mean_intensity`) that downstream functions (`compute_isotopes()`, `reformat_annotation_table()`) require. Fix: Added `compute_mass_defect()` call and `dplyr::inner_join()` to copy `module`, `rt_cluster`, `mean_intensity` from `peak_table` to the null annotation. (2026-01-26)
- Fixed permutation p-values generating 0 null scores because `simple_annotation()` doesn't return a `score` column. Root cause: The previous permutation test only ran mass matching, but scores are computed downstream by the full pipeline (isotope detection + chemical scoring). Fix: Updated `compute_permutation_pvalues()` to run the full scoring pipeline for each permutation: simple_annotation -> compute_isotopes -> reformat_annotation_table -> get_chemscore. The original peak correlation matrix (computed from peak intensity patterns) is preserved and re-indexed for each permutation's shuffled mz values. This preserves co-elution evidence while breaking the mz-to-compound relationship, providing a meaningful null hypothesis test. New parameters added: `adduct_weights`, `time_tolerance`, `intensity_deviation_tolerance`, `mass_defect_tolerance`, `isotope_mass_tolerance_ppm`, `correlation_threshold`, `filter_by`, `peak_correlation_matrix`. (2026-01-26)
- Fixed permutation p-values all being identical (100% p < 0.05) due to flawed null score matching logic. Root cause: After permuting mz values, the code tried to match null_annotation back to original annotations by mz value, but the permuted mz values almost never matched original mz values within 1e-6 tolerance, resulting in null_scores being mostly 0. Fix: Changed to global null distribution approach - each permutation returns ALL null scores, and p-values are calculated as the proportion of all null scores >= each observed score. This is a valid permutation test that sidesteps the matching problem. (2026-01-26)
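
The global null distribution approach in the last entry above reduces to a single proportion per observed score. The following sketch uses a hypothetical function name and toy numbers. It follows the entry's wording literally (a plain proportion, with no +1 correction), which may differ from the removed implementation.

```r
# p-value for each observed score: share of pooled null scores that are
# greater than or equal to it.
global_null_pvalues <- function(observed, null_scores) {
  vapply(observed, function(s) mean(null_scores >= s), numeric(1))
}

observed <- c(5, 50, 500)       # toy observed annotation scores
null_scores <- c(1, 2, 10, 60)  # toy null scores pooled across all permutations

p_values <- global_null_pvalues(observed, null_scores)
```

Pooling every null score avoids having to match permuted m/z values back to the original annotations, which is the step that failed in the earlier logic.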
- Fixed parallel processing function lookup failure in
compute_permutation_pvalues()where forked processes couldn't findsimple_annotationby name. Root cause: whenmclapplyforks processes on Unix, the closurerun_permutationreferencessimple_annotationby name but the package namespace isn't properly accessible in child processes. Fix: Explicitly capturesimple_annotationas a local variable (simple_annotation_fn) before defining the closure, ensuring the function reference is stored in the closure's environment. (2026-01-26) - Fixed parallel processing errors in
compute_permutation_pvalues()where all workers would fail with "all scheduled cores encountered errors in user code". Root cause: forked processes couldn't findsimple_annotationand NULL results weren't detected. Fix: (1) Wrappedrun_permutationintryCatchto handle errors gracefully, (2) Addedmc.preschedule = FALSEtomclapplycalls for better error handling, (3) Added detection of failed permutations (NULL or try-error results), (4) Added warning messages reporting failed permutation counts, (5) Added error if all permutations fail, (6) P-values now calculated using actual successful permutation count. (2026-01-26) - Fixed undefined
adduct_weightsvariable incompute_pathways()function (compute_pathways.R line 126). Addedadduct_weightsas a required parameter. (2026-01-24) - Fixed hardcoded RT tolerance in
`get_chemscore()`. The function previously used `<= 10` regardless of the `max_diff_rt` parameter value. Now correctly uses the `max_diff_rt` parameter for RT filtering. (2026-01-24)
- Fixed NA rows appearing in annotation output from
`get_chemscore()`. When `compute_chemical_score()` returned empty data (no valid adduct evidence), the function returned undefined values which `pmap_dfr` converted to NA rows. Fix: Return `NULL` explicitly for empty results, which `pmap_dfr` skips automatically. Added early check for empty `filtdata` and post-filter check after `complete.cases`. (2026-01-24)
- Fixed Stage 2 output (`Stage2_isotope_detection.txt`) missing column headers. The file was previously written with `col.names = FALSE` for all rows. Now writes header on first row only. (2026-01-24)
- Fixed duplicate rows in annotation output caused by
`pmap_dfr` in `advanced_annotation()` calling `get_chemscore()` once per input row instead of once per unique compound_id. This caused N*M row multiplication where N is the number of input rows per chemical and M is the number of output rows per chemical. Fix: Use `distinct(compound_id)` before the pmap_dfr call. (2026-01-24)
- Fixed isotope rows being removed by
`na.omit()` in `get_chemscore()`. Isotopes detected by `compute_isotopes()` have NA values in non-critical columns (`theoretical.mz`, `Name`, `MonoisotopicMass`) which caused them to be incorrectly filtered out. This resulted in loss of isotope evidence and lower confidence level assignments. Fix: Replace `na.omit()` with selective filtering on critical columns only. (2026-01-24)
- Removed all
`setwd()` calls from annotation pipeline functions. Using `setwd()` changes global state and can leave the working directory in an unexpected state if a function fails. Replaced with absolute paths using `file.path()`. Affected files: `get_chemscore_october.R`, `multilevelannotationstep4.R`, `multilevelannotationstep5.R`, `get_chemscorev1.6.71.R`. (2026-01-24)
- Fixed
`feature_id_column` causing validation error in `as_peak_table()`. When peak_table contained a non-numeric feature ID column (e.g., "C0001"), the `as_peak_table()` function's validation that all columns must be numeric would fail. Fix: Remove the feature_id_column from peak_table before validation while preserving it in peak_table_orig for mapping. (2026-01-24)
- Fixed Stage 2 output (`Stage2_isotope_detection.txt`) not being created. The `outlocorig` parameter was passed to `get_chemscore()` but never used (dead code). Fix: Write Stage 2 output directly in `advanced_annotation()` after `reformat_annotation_table()` completes. Removed dead `outlocorig` parameter from `get_chemscore()` call. (2026-01-24)
- Fixed
`rm()` warnings in `get_confidence_stage4()` function. The function attempted to remove variables (`temp_curdata`, `groupB`, `good_mod`, `module_clust`) that only exist in certain code paths, causing "object not found" warnings. Removed unnecessary `rm()` call - R's garbage collection handles cleanup automatically when the function returns. (2026-01-25)
- Fixed
`Stage5_curated_results.txt` not being created when `redundancy_filtering = TRUE` but `feature_id_column` is not provided. Stage 5 output is now always written when redundancy filtering is enabled. (2026-01-25)
- Removed redundant
`time.y` column from `reformat_annotation_table()` output in integration_utils.R. The column was identical to `time` (both contained `annotation$rt`) and was legacy code. (2026-01-25)
- Fixed duplicate
`feature_id.x` and `feature_id.y` columns appearing in stage outputs. The feature_id was being joined multiple times (inside `multilevelannotationstep3()` and `multilevelannotationstep4()`, then again in stage output sections). Added `safe_join_feature_id()` helper function that skips the join if the column already exists. (2026-01-25)
- Removed unused
`conda/` directory (`meta.yaml`, `environment-build.yaml`, `environment-dev.yaml`) and `.github/workflows/r-conda.yml` CI workflow. These were inherited from upstream RECETOX and are not needed since CLUES.xMSannotator is distributed as a standard R package only. Removed the R Conda CI badge from `readme.md` and updated setup instructions in `docs/developer_documentation.md`. (2026-02-17)
- Removed redundant
`Stage3_pathway_matched.txt` write.table from `advanced_annotation()` (HMDB mode). `multilevelannotationstep3()` already writes `Stage3_HMDB_pathways.txt` with the same data. (2026-02-08)
- Removed
`custom_pathway_step()` from `advanced_annotation.R`. Its functionality is replaced by `multilevelannotationstep3()` with `db_name = "custom"`. (2026-02-08)
- Deleted original
`multilevelannotationstep4.R` and `get_confidence_stage4.R` (now dead code, replaced by refactored v2 implementation). (2026-02-07)
- Removed unused
`compute_confidence_levels.R` file (~67 lines). This file contained alternative implementations (`compute_expected_confidences()`, `compute_boosted_confidences()`, `compute_confidence_levels()`) that were never called in the pipeline. The active implementation is in `multilevelannotationstep4.R` + `get_confidence_stage4.R`. The file also created a naming collision with the local `compute_confidence_levels()` function in `multilevelannotationstep4.R`. (2026-01-27)
- Removed unused
`ISgroup` dummy column from annotation output. The column was always set to "-" and provided no information. (2026-01-25)
- Removed redundant
`forms_valid_adduct_pair` filter in `advanced_annotation()` - this filter is already applied in `simple_annotation()`. (2026-01-24)
- Removed
`remove_tmp_files()` function and automatic cleanup of stage output files. All output files are now preserved for user inspection. (2026-01-24)
- Removed dead code files not used by
`advanced_annotation()` workflow (~1500 lines): `get_confidence_stage2.R` (legacy v1 confidence), `multilevelannotationstep2.R` (unused Step 2 wrapper), `get_chemscorev1.6.71.R` (replaced by get_chemscore_october.R), `group_by_rt_histv2.R` (only called by removed get_chemscorev1.6.71). See Code_Cleanup.md for full analysis. (2026-01-25)