Add --NumSVDPCs and log variance explained per principal component #78

Merged: Griffan merged 3 commits into Griffan:master from tfenne:add-num-svd-pcs-parameter on Apr 11, 2026

tfenne commented Apr 10, 2026

WriteSVD() hardcoded writing exactly 10 principal components to the .UD and .V output files, but it is sometimes useful to explore more than 10 of the PCs produced by the SVD.

Add --NumSVDPCs (default: 10, matching prior behavior; set to 0 for all available components). WriteSVD now respects this parameter and caps at the actual number of components available.
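The capping rule described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `resolvePcCount` is a hypothetical helper name, and the use of `std::size_t` for both arguments is an assumption about how the flag value is represented.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not taken from the PR): decide how many principal
// components to write, given the value passed to --NumSVDPCs and the number
// of components the SVD actually produced.
//   requested == 0  -> write every available component
//   requested  > 0  -> write at most `requested`, never more than `available`
std::size_t resolvePcCount(std::size_t requested, std::size_t available) {
    if (requested == 0) {
        return available;
    }
    return std::min(requested, available);
}
```

With this rule, a request for 10 PCs against a panel that yields only 4 components resolves to 4, so a write loop bounded by the result never indexes past the end of the vectors.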

After SVD computation, log the singular value, proportion of variance explained, and cumulative variance for the first 20 components. This helps users choose an appropriate --NumPC for estimation and verify that their reference panel has meaningful population structure.

tfenne and others added 2 commits April 9, 2026 15:40
WriteSVD() hardcoded writing exactly 10 principal components to the
.UD and .V output files. With fewer than 10 individuals (or fewer
than 10 components from the SVD), this read past the end of the
vectors — undefined behavior that could crash or produce garbage.

Add --NumSVDPCs (default: 10, matching prior behavior; set to 0 for
all available components). WriteSVD now respects this parameter and
caps at the actual number of components available.

After SVD computation, log the singular value, proportion of variance
explained, and cumulative variance for the first 20 components. This
helps users choose an appropriate --NumPC for estimation and verify
that their reference panel has meaningful population structure.

Copilot AI left a comment

Pull request overview

This PR makes SVD output more configurable by allowing users to control how many principal components are written to .UD/.V, and adds post-SVD logging of variance explained per component to help pick an appropriate --NumPC for downstream estimation.

Changes:

  • Add --NumSVDPCs (default 10; 0 = all) and pass it through to SVD generation.
  • Update WriteSVD() to write min(--NumSVDPCs, available_components) PCs instead of a hardcoded 10.
  • Log singular values and (cumulative) variance explained for the first 20 PCs after SVD.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| SVDcalculator.h | Extends ProcessRefVCF / WriteSVD signatures to accept a PC-count parameter. |
| SVDcalculator.cpp | Implements PC-count capping in WriteSVD and logs variance explained after SVD. |
| main.cpp | Adds CLI flag --NumSVDPCs and wires it into ProcessRefVCF. |

Comment thread SVDcalculator.cpp Outdated
Comment thread SVDcalculator.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Griffan Griffan self-requested a review April 11, 2026 23:05
@Griffan Griffan merged commit e344d0c into Griffan:master Apr 11, 2026
6 checks passed

3 participants