Commit d0f2f07

Moving bionemo-core loads to bionemo-scdl, so that it's fully standalone. (#1133)
The load functions in bionemo-core are moved into bionemo-scdl. This duplicates some code, but it lets SCDL be a fully standalone package. The move also assumes that downloads are always resources, never models. One awkward consequence is that some SCDL resources are now listed in both bionemo-core (where Geneformer uses them) and bionemo-scdl.

## Summary by CodeRabbit

- **New Features**
  - Added SCDL data-loading utilities with caching and support for NGC and S3 (PBSS), plus a public API to discover and load named resources.
- **Data**
  - Added several SCDL test datasets to the package resources and removed duplicate entries from the core resources.
- **Tests**
  - Added tests for local resource discovery and loader behavior; simplified fixtures and removed startup delays.
- **Dependencies**
  - Added runtime dependencies (pooch, tqdm) and updated the test extras (removed bionemo-core, added psutil).
- **Chores**
  - Cleaned up test setup and moved test data handling into the SCDL package.

---------

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Signed-off-by: Polina Binder <pbinder@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
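The summary says the new loader caches downloads and can fetch from NGC or S3 (PBSS), and the dependency changes add pooch. As a rough, standard-library illustration of that caching idea only, the sketch below keeps a checksum-verified local copy and downloads only on a cache miss. `ResourceEntry`, `cached_load`, and the `fetch` callback are hypothetical names, not the bionemo-scdl API, and naming cache files by checksum is an assumption.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class ResourceEntry:
    """Hypothetical stand-in for one entry of a resource YAML file."""

    tag: str
    sha256: str
    pbss: Optional[str] = None  # S3 URL, e.g. "s3://bucket/archive.tar.gz"
    ngc: Optional[str] = None  # NGC path, e.g. "org/team/name:version"


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_load(
    entry: ResourceEntry,
    cache_dir: Path,
    fetch: Callable[[str, Path], None],
    source: str = "pbss",
) -> Path:
    """Return a verified local copy of `entry`, calling `fetch` only on a cache miss.

    `fetch(url, dest)` is a hypothetical downloader that writes the artifact to `dest`.
    """
    if source not in ("pbss", "ngc"):
        raise ValueError(f"unknown source {source!r}")
    url = getattr(entry, source)
    if url is None:
        raise ValueError(f"{entry.tag} has no {source!r} location")
    # Keying the cache file on the checksum means a changed artifact never reuses a stale copy.
    dest = cache_dir / f"{entry.sha256}-{entry.tag}"
    if dest.exists() and sha256_of(dest) == entry.sha256:
        return dest  # cache hit: no download
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetch(url, dest)
    if sha256_of(dest) != entry.sha256:
        dest.unlink()  # never leave a corrupt file in the cache
        raise ValueError(f"sha256 mismatch for {entry.tag}")
    return dest
```

Verifying the hash on every cache hit costs a full read of the file; a real loader might trust an existing file instead, which trades safety for speed.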
1 parent 4bf3878 commit d0f2f07

8 files changed

Lines changed: 564 additions & 39 deletions

sub-packages/bionemo-core/src/bionemo/core/data/resources/scdl.yaml

Lines changed: 0 additions & 8 deletions
@@ -13,11 +13,3 @@
   sha256: 9020ba336dbfe33bddadba26ca0cde49958cbd73c5ad44f0960a5a4837c9db26 # pragma: allowlist secret
   owner: Savitha Srinivasan <savithas@nvidia.com>
   description: Sample test data for SCDL with feature IDs appended.
-
-- tag: sample_scdl_neighbor
-  ngc: nvidia/clara/scdl_neighbor_testdata:1.0
-  ngc_registry: resource
-  pbss: "s3://bionemo-ci/test-data/scdl_neighbor_test_20250616.tar.gz"
-  sha256: f64a723e5a1d3223d7ad636c2b7601fe5927be47fb1a418a60687ef80eab83d0 # pragma: allowlist secret
-  owner: Camir Ricketts <camirr@nvidia.com>
-  description: Sample test data for SCDL with neighbors.
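Removing `sample_scdl_neighbor` from the core YAML avoids registering the same artifact in two places. The sketch below shows one way such duplication could be detected: it indexes already-parsed entries under an assumed `<yaml stem>/<tag>` key and reports entries in two registries that share a sha256. `build_registry` and `shared_by_checksum` are hypothetical helpers, and YAML parsing is left out so the example needs only the standard library.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def build_registry(parsed: Dict[str, List[Entry]]) -> Dict[str, Entry]:
    """Index parsed entries under an assumed "<yaml stem>/<tag>" key.

    `parsed` maps a YAML file stem (e.g. "scdl") to the list of entries read from it.
    """
    registry: Dict[str, Entry] = {}
    for stem, entries in parsed.items():
        for entry in entries:
            key = f"{stem}/{entry['tag']}"
            if key in registry:
                raise ValueError(f"duplicate resource key: {key}")
            registry[key] = entry
    return registry


def shared_by_checksum(a: Dict[str, Entry], b: Dict[str, Entry]) -> List[Tuple[str, str]]:
    """Return sorted (key_in_a, key_in_b) pairs whose entries have the same sha256."""
    key_by_hash = {entry["sha256"]: key for key, entry in b.items()}
    return sorted(
        (key, key_by_hash[entry["sha256"]])
        for key, entry in a.items()
        if entry["sha256"] in key_by_hash
    )
```

Running a check like `shared_by_checksum` in a test is one way to surface the cross-package duplication that the commit message calls awkward.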

sub-packages/bionemo-scdl/pyproject.toml

Lines changed: 9 additions & 2 deletions
@@ -16,15 +16,17 @@ dependencies = [
 'numpy>=1.24.4',
 'pandas>=2.2.1',
 'pyarrow>=16.0.0',
+'pooch>=1.6.0',
 'scipy>=1.11.1',
 'torch>=2.2.1',
 'pydantic[email]>=2.2.0',
+'tqdm>=4.67.1'
 ]
 
 [project.optional-dependencies]
 test = [
-"bionemo-core>=2.4.5",
-'pytest>=8.4.1'
+'pytest>=8.4.1',
+'psutil>=7.0.0'
 ]
 
 [project.scripts]
@@ -36,6 +38,11 @@ include = ["bionemo.*"]
 namespaces = true
 exclude = ["test*."]
 
+# Make sure that the resource yaml files are being packaged alongside the python files.
+[tool.setuptools.package-data]
+"bionemo.scdl" = ["**/*.yaml"]
+
+
 [tool.setuptools.dynamic]
 version = { file = "VERSION" }
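The new `[tool.setuptools.package-data]` entry ships every `*.yaml` file under `bionemo.scdl`, at any depth, alongside the Python modules. A minimal sketch of finding those files at runtime, assuming they are read through `importlib.resources` (the package's actual lookup code is not part of this diff); `iter_yaml_resources` and `packaged_yaml` are hypothetical helpers.

```python
from importlib.resources import files
from typing import Any, Iterator, List


def iter_yaml_resources(root: Any) -> Iterator[str]:
    """Recursively yield relative names of *.yaml files under a package root.

    `root` may be a pathlib.Path or the Traversable returned by
    importlib.resources.files(); both expose iterdir(), is_dir() and name.
    """

    def walk(node: Any, prefix: str) -> Iterator[str]:
        # Sort children by name so the output order is deterministic.
        for child in sorted(node.iterdir(), key=lambda c: c.name):
            rel = f"{prefix}{child.name}"
            if child.is_dir():
                yield from walk(child, rel + "/")
            elif child.name.endswith(".yaml"):
                yield rel

    yield from walk(root, "")


def packaged_yaml(package: str) -> List[str]:
    """List the YAML files shipped inside an importable package."""
    return list(iter_yaml_resources(files(package)))
```

In an environment where the package is installed, `packaged_yaml("bionemo.scdl")` would be expected to list the bundled resource files; without the package-data entry, a wheel build could omit them, which is what the comment in the diff guards against.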

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-Apache2
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""SCDL data loading utilities."""
+
+from .load import Resource, get_all_resources, load
+
+
+__all__ = ["Resource", "get_all_resources", "load"]
