Conversation
…[columnIds, columnCounts] in readClassDt
…ists by sample names
There was a problem hiding this comment.
Pull request overview
This PR refactors how intermediate quantification inputs are represented and passed through the bambu pipeline by introducing a new quantData S4 class and by embedding per-column observed counts into readClassDt (via list-columns) instead of storing full count matrices in quantData.
Changes:
- Introduces an exported S4
quantDataclass (with a$accessor) and changesassignReadClasstoTranscripts()to return this object. - Refactors quantification to compute per-sample/cluster
nobsfromreadClassDt’scolumnIds/columnCountslist-columns rather than from stored count matrices. - Renames/propagates per-column identifiers in read-class construction (
sampleID→columnID) and makes returned lists named rather than numerically indexed.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| R/bambu.R | Adapts quantification loop to new quantData structure; adds naming and attaches optional read/dist metadata to output SE. |
| R/bambu-quantify.R | Updates bambu.quantify() signature and computes nobs from embedded columnIds/columnCounts. |
| R/bambu-quantify_utilityFunctions.R | Adjusts equivalence-class utility calculations (e.g., removes nobs derivation in one helper). |
| R/bambu-processReads.R | Renames sample identifiers, adds column-level count metadata, and updates incompatible-count handling. |
| R/bambu-processReads_utilityConstructReadClasses.R | Propagates columnID/columnIds into read-class construction and stored metadata columns. |
| R/bambu-classes.R | Adds exported S4 quantData class and $ method. |
| R/bambu-assignDist.R | Refactors quantData generation to create the new S4 object and adds quantData-based count helper functions. |
| NAMESPACE | Exports the new quantData class and $ method. |
Comments suppressed due to low confidence (1)
R/bambu-processReads.R:414
- The
sparseMatrix(call that constructscountsis missing its closing)before the incompatible-counts section begins, which makes this function a syntax error and prevents the package from loading.
counts <- sparseMatrix(
i = rep(seq_along(counts.table), lengths(counts.table)),
j = as.numeric(names(unlist(counts.table))),
x = unlist(counts.table),
dims = c(nrow(eqClasses), nrow(metadata(readClassFile)$sampleData)))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code reviewFound 2 issues:
Lines 280 to 288 in ed08825 Class definition for reference: Lines 6 to 15 in ed08825
Lines 62 to 157 in ed08825 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Code review (additional lower-confidence issues)The following issues did not pass the default confidence threshold (80/100) and may be false positives, but are noted here for completeness.
Lines 91 to 95 in ed08825 Downstream code expecting bambu/R/bambu-processReads_utilityConstructReadClasses.R Lines 19 to 20 in ed08825
Lines 11 to 22 in ed08825
Lines 8 to 9 in ed08825 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
jonathangoeke
left a comment
There was a problem hiding this comment.
- I suggest to simplify how gene and transcript expression is calculated from quantData to be consistent with other bambu output and organise code accordingly
- I have not tested the other code
|
Tested on sample data, incompatibleCounts results changed When running on multiple samples, same issue. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ment Replace generateUniqueCountsFromQuantData (returning a sparse matrix) with generateUniqueCountsSEFromQuantData, which returns a transcript-level SE with assays$uniqueCounts, metadata$incompatibleCounts, and metadata$nonuniqueCounts. This makes the output directly compatible with transcriptToGeneExpression. Move both functions into a new summarizeExpression.R (renamed from transcriptToGeneExpression.R) to reflect the broader scope of the file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ounts + incompatibleCounts + nonuniqueCounts Previously transcriptToGeneExpression aggregated EM counts by gene then added incompatibleCounts and nonuniqueCounts, which double-counted nonuniqueCounts since EM counts already incorporate them at the gene level. Now use assays$uniqueCounts as the base, aggregate by gene, then add incompatibleCounts and nonuniqueCounts explicitly. This gives the correct total: reads uniquely assigned to a transcript + reads multi-aligning within a gene + reads incompatible with any transcript. Also fix generateIncompatibleCounts and generateNonUniqueCountMatrix to use unique(mcols(annotations)$GENEID) (appearance order) instead of levels(factor(unique(...))), consistent with combineCountSes, preventing gene ID mislabelling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace catch-all $ method with typed accessors (getSampleData,
getReadClassDt, getIncompatibleCounts, getNonuniqueCounts, getDistTable,
getReadToTranscriptMap), add prototype, validity checks, and exported
constructor constructQuantData. Type incompatibleCounts and nonuniqueCounts
slots as sparseMatrix. Update all call sites in bambu.R,
bambu-assignDist.R, and summarizeExpression.R. Also fixes pre-existing
sampleNames typo bug in bambu.R. Drop exportClass(quantData) and
exportMethods("$") from NAMESPACE.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nonuniqueCountMatrix is now computed on demand in bambu.R and generateUniqueCountsSEFromQuantData, and stored only in the uniqueCountsSe and final SE metadata. Moved generateNonUniqueCountMatrix to bambu_utilityFunctions.R as it is shared across both call sites. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
finished testing on different test datasets, and also different parameters, surprisingly, this also fixs a bug that readTranscriptMap not reported when readTranscriptMap is set to TRUE in devel_pre_v4. |
- revert transcriptToGeneExpression to use assays$counts + incompatibleCounts (+ nonuniqueCounts if present in metadata, for uniqueCounts SE path) - rename summarizeExpression.R to transcriptToGeneExpression.R - move generateNonUniqueCountMatrix to transcriptToGeneExpression.R - remove nonuniqueCounts from bambu.quantify and combineCountSes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tag SE objects with metadata$seType to distinguish their content: "uniqueCounts", "EMCounts", and "geneCounts" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use levels(factor(unique(...))) to ensure genes vector is alphabetically sorted, matching the GENEID.i index scheme assigned in bambu_utilityFunctions.R. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After decoding numeric indices to gene names, reorder the output matrix to match annotation order so values align with the rownames assigned in combineCountSes. Use drop=FALSE to preserve sparse matrix class for single-column inputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace rowSums+names with direct rownames/as.vector to avoid collapsing multi-column subsets and to keep rownames attached as gene IDs for the GENEID.i join in bambu.quantify. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lapply on a named quantData list preserves names, causing do.call(cbind/rbind) to prepend the list element name (sampleName) to each column/row name with a period separator — producing e.g. demultiplexed2.demultiplexed2_TCAGATGTCACCAGGC instead of demultiplexed2_TCAGATGTCACCAGGC. unname() the lapply results before cbind/rbind in generateUniqueCountsSEFromQuantData. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tyFunctions.R Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Related Issue
Addresses #566
Type of Change
- to be updated
Description
This PR aims to tidy the data structure for readClassList & quantData. In particular, we minimises the information required to store in quantData so that it is sufficient for generating unique count matrix, gene count matrix and can be used to perform EM quantification. We also aim to document all the datas stored in readClassList & quantData for clarify (i.e. what every column in the dataframe does)
Implementation Details
generateUniqueCountFromQuantData(quantData, extendedAnno)andgenerateGeneCountFromQuantData(quantData, extendedAnno)was created to get the gene and unique count matrix for all samples & cells from the quantDataImpact of Changes
test report on single-cell data:
se_identical.txt
transcript_comparison_report.pdf
bambu_benchmark.pdf
test report on bulk data:
se_identical.txt
transcript_comparison_report.pdf
bambu_benchmark.pdf
Checklist