[Spheron] Add Spheron cloud provider support#9206

Open
rekpero wants to merge 5 commits into skypilot-org:master from spheron-core:master


rekpero commented Mar 27, 2026

This PR adds Spheron as a new cloud provider in SkyPilot. Spheron is a unified GPU cloud platform that aggregates enterprise-grade GPU compute from certified global data centers into a single platform. It provides on-demand access to NVIDIA GPUs (H100, A100, B200, B300, and more) across multiple providers, with no lock-in, pay-per-hour pricing, and costs 40–60% below major hyperscalers.

We currently have an open PR for our catalog here.

YAML Config Example

Use cheapest GPU across all providers in Spheron's marketplace

resources:
    infra: spheron

Look across all providers that have a specific accelerator and use the cheapest one

resources:
    infra: spheron
    accelerators: A100:1

Use a specific instance type (offer ID) from Spheron

resources:
    infra: spheron
    accelerators: A100:1
    instance_type: <spheron_offer_id>

NOTE: accelerators is optional when instance_type is specified.
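The three configurations above differ only in how tightly they constrain the search. The following is a minimal sketch of that selection rule, not the catalog code from this PR: the `Offer` class, the `pick_offer` function, the offer IDs, and the prices are all hypothetical placeholders, and exact-count matching on accelerators is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    """One row of a hypothetical catalog; every field is illustrative."""
    offer_id: str        # plays the role of `instance_type`
    accelerator: str     # accelerator name, e.g. 'A100'
    count: int           # accelerators per instance
    hourly_price: float  # placeholder number, not a real Spheron price


def pick_offer(catalog: List[Offer],
               accelerator: Optional[str] = None,
               count: int = 1,
               instance_type: Optional[str] = None) -> Offer:
    """Return the cheapest offer that satisfies the given constraints."""
    candidates = catalog
    if instance_type is not None:
        # A specific offer ID narrows the search to that offer.
        candidates = [o for o in candidates if o.offer_id == instance_type]
    if accelerator is not None:
        # Assumption: the accelerator name and count must match exactly.
        candidates = [
            o for o in candidates
            if o.accelerator == accelerator and o.count == count
        ]
    if not candidates:
        raise ValueError('No offer matches the requested resources.')
    return min(candidates, key=lambda o: o.hourly_price)


# Placeholder catalog used only to exercise the three cases above.
CATALOG = [
    Offer('offer-a', 'A100', 1, 1.50),
    Offer('offer-b', 'A100', 1, 1.20),
    Offer('offer-c', 'H100', 1, 2.40),
    Offer('offer-d', 'T4', 1, 0.30),
]

cheapest_overall = pick_offer(CATALOG)
cheapest_a100 = pick_offer(CATALOG, accelerator='A100')
pinned = pick_offer(CATALOG, instance_type='offer-a')
```

With this placeholder data, the unconstrained call selects `offer-d`, the A100 request selects `offer-b`, and the pinned request returns `offer-a` without needing an accelerator, which mirrors the note that `accelerators` is optional once `instance_type` is given.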

Setup

  1. Get Spheron API Key: Sign up at https://app.spheron.ai and obtain an API key.
  2. Configure credentials:
    mkdir ~/.spheron
    echo "your-api-key-here" > ~/.spheron/api_key
  3. Fetch the catalog:
    python sky/catalog/data_fetchers/fetch_spheron.py
  4. Verify setup:
    sky check
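Step 2 can also be scripted. The following is a rough Python equivalent of the two shell commands, assuming the same `~/.spheron/api_key` location; `write_api_key` is a hypothetical helper and is not part of this PR, and restricting the file to owner-only permissions is an extra precaution that the shell commands above do not take.

```python
import os
import tempfile
from pathlib import Path


def write_api_key(api_key: str, home: Path) -> Path:
    """Store the key at <home>/.spheron/api_key and return the file path."""
    key = api_key.strip()
    if not key:
        raise ValueError('API key must not be empty.')
    config_dir = home / '.spheron'
    # Unlike a plain `mkdir`, this does not fail if the directory exists.
    config_dir.mkdir(parents=True, exist_ok=True)
    key_path = config_dir / 'api_key'
    key_path.write_text(key + '\n')
    # Assumption: owner-only read/write is desirable for a credential file.
    os.chmod(key_path, 0o600)
    return key_path


# Demonstration against a temporary directory rather than the real home.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_api_key('your-api-key-here', Path(tmp))
    stored = demo_path.read_text()
    relative = demo_path.relative_to(tmp).as_posix()
```

For a real setup, the call would be `write_api_key('<key>', Path.home())`.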

I have run many local tests across different Spheron GPU instance types. Note that code formatting has been fixed only for Spheron-related changes; pre-existing formatting issues in other parts of the codebase have not been touched.

Tested

  • Code formatting: bash format.sh (Spheron-related files only; pre-existing issues in other files left untouched)
  • Unit tests: pytest tests/unit_tests/test_spheron.py
  • Manual tests: launched clusters across different Spheron GPU instance types locally
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

rekpero and others added 3 commits March 27, 2026 16:56
- Add Spheron as a new cloud provider with full integration: cloud abstraction, provisioning, authentication, catalog, and Ray cluster template
- Register Spheron in clouds, provision, and constants modules; add to README and install extras

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix mypy type errors in fetch_spheron.py, spheron_catalog.py, and spheron.py
- Add missing `name = 'spheron'` class attribute
- Assert instance_type is not None before passing to get_instance_info
- Add unit tests for credentials, catalog, unsupported features, and cloud properties

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add retry_if_missing parameter to query_instances to match provisioner interface
- Wrap get_hourly_cost with _call_or_default to gracefully handle stale instance types
- Fix _is_not_found_error to catch "No instance type X found." error messages
- Update Spheron class docstring with accurate product description
- Register spheron in conftest and add no_spheron markers across all smoke test files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
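The stale-instance-type handling described in this commit can be pictured as a small wrapper that swallows only "not found" errors. This is a sketch under assumptions, not the code in the PR: the real `_call_or_default` and `_is_not_found_error` signatures are not shown on this page, so the versions below (without the leading underscore), the `ValueError` exception type, and the placeholder price table are all hypothetical.

```python
import re
from typing import Callable, TypeVar

T = TypeVar('T')


def is_not_found_error(exc: Exception) -> bool:
    """Detect messages shaped like 'No instance type X found.'"""
    return re.search(r'no instance type .+ found', str(exc),
                     re.IGNORECASE) is not None


def call_or_default(fn: Callable[[], T], default: T) -> T:
    """Run fn(); return `default` only when the lookup target is missing."""
    try:
        return fn()
    except ValueError as e:  # assumption: lookups raise ValueError
        if is_not_found_error(e):
            return default
        raise  # any other failure still propagates


def get_hourly_cost(instance_type: str) -> float:
    """Stand-in for a catalog lookup; the price table is a placeholder."""
    prices = {'offer-a': 1.5}
    if instance_type not in prices:
        raise ValueError(f'No instance type {instance_type} found.')
    return prices[instance_type]


known_cost = call_or_default(lambda: get_hourly_cost('offer-a'), 0.0)
stale_cost = call_or_default(lambda: get_hourly_cost('offer-gone'), 0.0)
```

Re-raising unrelated errors keeps the fallback narrow, so a genuine bug in the lookup is not silently reported as a zero cost.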

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for Spheron Cloud as a new provider in SkyPilot. The implementation includes a new cloud adaptor, authentication setup for SSH keys, catalog fetching scripts, and provisioning logic for instance management. It also integrates Spheron with the Ray backend using a dedicated template and updates the test suite to include Spheron-specific configurations. Feedback was provided to improve the robustness of the catalog fetching script by replacing a debug assertion with an explicit exception for missing API keys.

Comment on lines +199 to +201
assert api_key is not None, (
    f'API key not found. Please provide via --api-key or place in '
    f'{DEFAULT_SPHERON_API_KEY_PATH}')

medium

An assert statement is intended for debugging and is stripped when Python runs with optimizations enabled (-O). For robust error handling, it is better to explicitly raise a ValueError or FileNotFoundError when the API key is missing or empty. This matches the error handling in sky/provision/spheron/spheron_utils.py.

Suggested change
- assert api_key is not None, (
-     f'API key not found. Please provide via --api-key or place in '
-     f'{DEFAULT_SPHERON_API_KEY_PATH}')
+ if api_key is None:
+     raise ValueError(
+         f'API key not found. Please provide via --api-key or place in '
+         f'{DEFAULT_SPHERON_API_KEY_PATH}')
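To see why the reviewer prefers an explicit exception, the sketch below compiles the same assert with and without optimization, using the `optimize` argument of the built-in `compile`, which has the same effect on asserts as running Python with `-O`. The `runs_silently` and `require_api_key` helpers are hypothetical names for illustration; the empty-string check goes slightly beyond the suggested change.

```python
from typing import Optional

CHECK_WITH_ASSERT = "assert api_key is not None, 'API key not found.'"


def runs_silently(source: str, optimize: int) -> bool:
    """Return True if `source` executes without raising for api_key=None."""
    code = compile(source, '<demo>', 'exec', optimize=optimize)
    try:
        exec(code, {'api_key': None})
    except AssertionError:
        return False
    return True


def require_api_key(api_key: Optional[str]) -> str:
    """Explicit check that survives optimized runs."""
    if api_key is None or not api_key.strip():
        raise ValueError('API key not found.')
    return api_key.strip()


assert_fires_normally = not runs_silently(CHECK_WITH_ASSERT, optimize=0)
assert_skipped_when_optimized = runs_silently(CHECK_WITH_ASSERT, optimize=1)
```

At optimization level 0 the assert raises as expected; at level 1 it is removed at compile time and the missing key passes unnoticed, whereas `require_api_key(None)` raises `ValueError` in both modes.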

rekpero and others added 2 commits March 27, 2026 23:27
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix line-too-long (C0301) in instance.py and spheron_catalog.py
- Fix logging-not-lazy (W1201) by using % formatting in logger.info
- Apply YAPF-required reformatting to test_spheron.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
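The logging-not-lazy (W1201) fix refers to passing format arguments to the logger instead of formatting the string up front. The sketch below illustrates the difference with a hypothetical `Expensive` object and logger name; it is not taken from the PR's files.

```python
import logging


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return 'instance-details'


logger = logging.getLogger('spheron_demo')  # hypothetical logger name
logger.setLevel(logging.WARNING)  # INFO records are dropped

eager = Expensive()
# Flagged by W1201: the string is built before logger.info is even called.
logger.info('Instance: %s' % eager)

lazy = Expensive()
# Preferred: the logger formats the message only if the record is emitted.
logger.info('Instance: %s', lazy)
```

Because INFO is disabled here, the eager call still pays for one string conversion while the lazy call performs none.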

rekpero commented Apr 18, 2026

@Michaelvll @aylei could you please review this PR?
