Skip to content

Add public API skeletons and export ml module#1081

Open
lyraluoyu wants to merge 3 commits into
neurostuff:001-ma-feature-datasetfrom
lyraluoyu:001-ma-feature-dataset
Open

Add public API skeletons and export ml module#1081
lyraluoyu wants to merge 3 commits into
neurostuff:001-ma-feature-datasetfrom
lyraluoyu:001-ma-feature-dataset

Conversation

@lyraluoyu
Copy link
Copy Markdown
Contributor

@lyraluoyu lyraluoyu commented May 20, 2026

Closes #1016 #1017 (T006 T007 T008 T009).

Changes proposed in this pull request:

  • Add public API skeletons
  • Export nimare/ml module
  • Add shared assertions and internel helper placeholders

Summary by Sourcery

Introduce initial public machine learning API scaffolding and export the ml module at the package level.

New Features:

  • Add MAFeatureDataset container class for aligned map features, metadata, targets, and provenance.
  • Add MAFeatureExtractor interface for converting NiMARE study collections into MAFeatureDataset objects.
  • Add make_map_reducer factory function for constructing map-feature reduction workflows.

Enhancements:

  • Export the ml submodule from the top-level nimare package namespace.

Tests:

  • Add unit tests covering MAFeatureDataset initialization, validation, and placeholder methods.
  • Add unit tests covering MAFeatureExtractor initialization and placeholder methods.
  • Update tests for make_map_reducer to use a named reduction method argument.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 20, 2026

Reviewer's Guide

Introduce initial public ML API scaffolding by implementing a concrete MAFeatureDataset initializer with validation, adding placeholder methods for ML workflows, expanding tests to cover the new behavior and error cases, and exposing the ml module at the package level.

File-Level Changes

Change Details Files
Implemented MAFeatureDataset initialization logic with size inference and validation while keeping downstream ML methods scaffolded as NotImplemented, and updated tests accordingly.
  • Replace MAFeatureDataset placeholder with a real init that infers n_samples and n_features from map_features via shape or len, with a fallback error if undeterminable.
  • Add length consistency checks for sample_ids, study_ids, sample_metadata, descriptor_features, target, and feature_names relative to map_features and raise ValueError with clear messages when mismatched.
  • Store all documented attributes (map_features, sample_ids, study_ids, sample_metadata, masker, descriptor_features, target, feature_names, exclusion_report, provenance) on the instance.
  • Define MAFeatureDataset methods (to_sklearn, split, apply_map_reducer, get_feature_names, copy) as public APIs that currently raise NotImplementedError with specific messages.
  • Extend tests to verify successful MAFeatureDataset construction, each validation branch for length mismatches, the UnknownShape failure path, and that all instance methods raise NotImplementedError.
nimare/ml.py
nimare/tests/test_ml.py
Scaffold a configurable MAFeatureExtractor and map reducer factory as public ML APIs and test their current NotImplemented behavior.
  • Replace MAFeatureExtractor placeholder with an init that accepts kernel_transformer and optional descriptor/target configuration, storing parameters on the instance.
  • Add MAFeatureExtractor methods (fit, transform, fit_transform) that define the intended API but currently raise NotImplementedError with explicit messages.
  • Update tests to assert stored constructor parameters and to confirm that fit, transform, and fit_transform all raise NotImplementedError when called.
  • Expand make_map_reducer into a documented factory signature that accepts a method string and kwargs and currently raises NotImplementedError, and adjust the corresponding test to call it with a sample method name.
nimare/ml.py
nimare/tests/test_ml.py
Expose the ml module as part of the top-level NiMARE package API.
  • Import the ml submodule into nimare.init alongside existing submodules so it is available under nimare.ml.
  • Add 'ml' to the all list to make it an officially exported public module.
nimare/__init__.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • MAFeatureDataset currently keeps direct references to inputs like map_features and sample_metadata; if callers might mutate those structures later, consider copying or documenting the expectation of immutability to avoid hard-to-debug side effects.
  • The type annotations for many parameters (e.g., map_features, sample_metadata, descriptor_features, descriptor_fields, descriptor_transformers) are very broad (Any); tightening these to concrete interfaces (e.g., numpy.ndarray | scipy.sparse.spmatrix | Sequence[Sequence[float]], Mapping, etc.) would make the new public API easier to understand and use correctly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- MAFeatureDataset currently keeps direct references to inputs like map_features and sample_metadata; if callers might mutate those structures later, consider copying or documenting the expectation of immutability to avoid hard-to-debug side effects.
- The type annotations for many parameters (e.g., map_features, sample_metadata, descriptor_features, descriptor_fields, descriptor_transformers) are very broad (Any); tightening these to concrete interfaces (e.g., numpy.ndarray | scipy.sparse.spmatrix | Sequence[Sequence[float]], Mapping, etc.) would make the new public API easier to understand and use correctly.

## Individual Comments

### Comment 1
<location path="nimare/ml.py" line_range="93-108" />
<code_context>
+            if len(target) != n_samples:
+                raise ValueError("target length must match number of rows in map_features")
+
+        if feature_names is not None:
+            if len(feature_names) != n_features:
+                raise ValueError(
+                    "feature_names length must match number of columns in map_features"
+                )
+
+        self.map_features = map_features
+        self.sample_ids = list(sample_ids)
+        self.study_ids = list(study_ids)
+        self.sample_metadata = sample_metadata
+        self.masker = masker
+        self.descriptor_features = descriptor_features
+        self.target = target
+        self.feature_names = feature_names
+        self.exclusion_report = exclusion_report
+        self.provenance = provenance
</code_context>
<issue_to_address>
**suggestion:** Normalize `feature_names` to a list to match the documented type and avoid surprises with Index/array-like inputs.

The docstring declares `feature_names` as `list[str]`, but the value is stored unchanged. If callers pass an Index/ndarray/other sequence, code that assumes a plain list (e.g., for JSON serialization or mutation) may misbehave.

You can enforce the documented type with a small normalization step:

```python
if feature_names is not None:
    feature_names = list(feature_names)
    if len(feature_names) != n_features:
        raise ValueError(
            "feature_names length must match number of columns in map_features"
        )
self.feature_names = feature_names
```

This keeps the public contract consistent with the implementation.

```suggestion
        if feature_names is not None:
            feature_names = list(feature_names)
            if len(feature_names) != n_features:
                raise ValueError(
                    "feature_names length must match number of columns in map_features"
                )

        self.map_features = map_features
        self.sample_ids = list(sample_ids)
        self.study_ids = list(study_ids)
        self.sample_metadata = sample_metadata
        self.masker = masker
        self.descriptor_features = descriptor_features
        self.target = target
        self.feature_names = feature_names
        self.exclusion_report = exclusion_report
        self.provenance = provenance
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread nimare/ml.py
Comment on lines +93 to +108
if feature_names is not None:
if len(feature_names) != n_features:
raise ValueError(
"feature_names length must match number of columns in map_features"
)

self.map_features = map_features
self.sample_ids = list(sample_ids)
self.study_ids = list(study_ids)
self.sample_metadata = sample_metadata
self.masker = masker
self.descriptor_features = descriptor_features
self.target = target
self.feature_names = feature_names
self.exclusion_report = exclusion_report
self.provenance = provenance
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Normalize feature_names to a list to match the documented type and avoid surprises with Index/array-like inputs.

The docstring declares feature_names as list[str], but the value is stored unchanged. If callers pass an Index/ndarray/other sequence, code that assumes a plain list (e.g., for JSON serialization or mutation) may misbehave.

You can enforce the documented type with a small normalization step:

if feature_names is not None:
    feature_names = list(feature_names)
    if len(feature_names) != n_features:
        raise ValueError(
            "feature_names length must match number of columns in map_features"
        )
self.feature_names = feature_names

This keeps the public contract consistent with the implementation.

Suggested change
if feature_names is not None:
if len(feature_names) != n_features:
raise ValueError(
"feature_names length must match number of columns in map_features"
)
self.map_features = map_features
self.sample_ids = list(sample_ids)
self.study_ids = list(study_ids)
self.sample_metadata = sample_metadata
self.masker = masker
self.descriptor_features = descriptor_features
self.target = target
self.feature_names = feature_names
self.exclusion_report = exclusion_report
self.provenance = provenance
if feature_names is not None:
feature_names = list(feature_names)
if len(feature_names) != n_features:
raise ValueError(
"feature_names length must match number of columns in map_features"
)
self.map_features = map_features
self.sample_ids = list(sample_ids)
self.study_ids = list(study_ids)
self.sample_metadata = sample_metadata
self.masker = masker
self.descriptor_features = descriptor_features
self.target = target
self.feature_names = feature_names
self.exclusion_report = exclusion_report
self.provenance = provenance

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 98.30508% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (001-ma-feature-dataset@347f141). Learn more about missing BASE report.

Files with missing lines Patch % Lines
nimare/ml.py 98.30% 1 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                    @@
##             001-ma-feature-dataset    #1081   +/-   ##
=========================================================
  Coverage                          ?   85.58%           
=========================================================
  Files                             ?       57           
  Lines                             ?    11316           
  Branches                          ?        0           
=========================================================
  Hits                              ?     9685           
  Misses                            ?     1631           
  Partials                          ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jdkent jdkent force-pushed the 001-ma-feature-dataset branch from 9976aeb to 6d66ec2 Compare May 20, 2026 18:11
@lyraluoyu lyraluoyu force-pushed the 001-ma-feature-dataset branch from f8849e2 to 110cabb Compare May 27, 2026 02:31
@lyraluoyu
Copy link
Copy Markdown
Contributor Author

hi @jdkent, I've committed once again to meet the requirements of the new specs. I have a question on task T009 and I've asked you on slack. Please let me now if there's anything to change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant