[Spheron] Add Spheron cloud provider support#9206
- Add Spheron as a new cloud provider with full integration: cloud abstraction, provisioning, authentication, catalog, and Ray cluster template
- Register Spheron in clouds, provision, and constants modules; add to README and install extras

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix mypy type errors in fetch_spheron.py, spheron_catalog.py, and spheron.py
- Add missing `name = 'spheron'` class attribute
- Assert instance_type is not None before passing to get_instance_info
- Add unit tests for credentials, catalog, unsupported features, and cloud properties

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
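The `instance_type is not None` assertion mentioned in this commit is a common way to narrow an `Optional[str]` for mypy. The sketch below illustrates the pattern only; the catalog contents, the `get_vcpus` helper, and the body of `get_instance_info` are hypothetical stand-ins, not the code in this PR.

```python
from typing import Dict, Optional

# Hypothetical stand-in for catalog data; not real Spheron offers.
_EXAMPLE_CATALOG: Dict[str, Dict[str, float]] = {
    'example-offer-id': {'vcpus': 8.0, 'memory_gb': 64.0},
}


def get_instance_info(instance_type: str) -> Dict[str, float]:
    """Look up an instance type; the parameter is a plain str, not Optional."""
    return _EXAMPLE_CATALOG[instance_type]


def get_vcpus(instance_type: Optional[str]) -> float:
    """Return the vCPU count for an instance type that may be unset."""
    # Without this assert, mypy rejects the call below because
    # Optional[str] is not compatible with str.
    assert instance_type is not None, 'instance_type must be set'
    return get_instance_info(instance_type)['vcpus']
```

Here the assert documents an internal invariant for the type checker. It is not a substitute for validating user input, since asserts are skipped under `python -O`.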
- Add retry_if_missing parameter to query_instances to match provisioner interface
- Wrap get_hourly_cost with _call_or_default to gracefully handle stale instance types
- Fix _is_not_found_error to catch "No instance type X found." error messages
- Update Spheron class docstring with accurate product description
- Register spheron in conftest and add no_spheron markers across all smoke test files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review
This pull request introduces support for Spheron Cloud as a new provider in SkyPilot. The implementation includes a new cloud adaptor, authentication setup for SSH keys, catalog fetching scripts, and provisioning logic for instance management. It also integrates Spheron with the Ray backend using a dedicated template and updates the test suite to include Spheron-specific configurations. Feedback was provided to improve the robustness of the catalog fetching script by replacing a debug assertion with an explicit exception for missing API keys.
    assert api_key is not None, (
        f'API key not found. Please provide via --api-key or place in '
        f'{DEFAULT_SPHERON_API_KEY_PATH}')
The `assert` statement here is intended for debugging and is stripped entirely when Python runs with the `-O` flag, so the check can silently disappear. For robust error handling, it is better to explicitly raise a `ValueError` or `FileNotFoundError` if the API key is missing or empty. This also aligns with the error handling in sky/provision/spheron/spheron_utils.py.
Suggested change:

    - assert api_key is not None, (
    -     f'API key not found. Please provide via --api-key or place in '
    -     f'{DEFAULT_SPHERON_API_KEY_PATH}')
    + if api_key is None:
    +     raise ValueError(
    +         f'API key not found. Please provide via --api-key or place in '
    +         f'{DEFAULT_SPHERON_API_KEY_PATH}')
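To make the reviewer's point concrete, the following sketch shows an explicit check that also covers the empty-key case mentioned in the comment. The `resolve_api_key` helper and the default path value are hypothetical; only the constant name `DEFAULT_SPHERON_API_KEY_PATH` and the error message appear in the PR.

```python
import os
from typing import Optional

# Hypothetical value; the real default path is defined in the PR.
DEFAULT_SPHERON_API_KEY_PATH = os.path.expanduser('~/.spheron/api_key')


def resolve_api_key(cli_api_key: Optional[str],
                    key_path: str = DEFAULT_SPHERON_API_KEY_PATH) -> str:
    """Return the API key from the CLI flag or the key file, else raise."""
    # An explicitly passed, non-empty key takes precedence.
    if cli_api_key:
        return cli_api_key
    # Otherwise fall back to the key file, if it exists and is non-empty.
    if os.path.isfile(key_path):
        with open(key_path, encoding='utf-8') as f:
            file_key = f.read().strip()
        if file_key:
            return file_key
    # Unlike an assert, this check still runs under `python -O`.
    raise ValueError(
        'API key not found. Please provide via --api-key or place in '
        f'{key_path}')
```

Raising an exception keeps the failure visible regardless of interpreter flags, and callers can catch `ValueError` if they want to print a friendlier message.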
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix line-too-long (C0301) in instance.py and spheron_catalog.py
- Fix logging-not-lazy (W1201) by using % formatting in logger.info
- Apply YAPF-required reformatting to test_spheron.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
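As background on the W1201 fix: pylint flags calls that format the message eagerly (for example `logger.info('...' % value)`) and prefers passing the arguments to the logging call so formatting is deferred. The sketch below uses a hypothetical logger name, message, and helper class to show the difference; it is not the code changed in this PR.

```python
import logging

logger = logging.getLogger('spheron_example')  # Hypothetical logger name.


class CountingName:
    """Hypothetical object that counts how often it is converted to str."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return self.name


def log_launch_eager(instance: CountingName) -> None:
    # Flagged by pylint W1201: the string is built even if INFO is disabled.
    logger.info('Launched instance %s' % instance)  # pylint: disable=logging-not-lazy


def log_launch_lazy(instance: CountingName) -> None:
    # Preferred: logging formats the message only if a handler emits it.
    logger.info('Launched instance %s', instance)
```

When the logger's level is above INFO, the lazy variant never calls `__str__` on its argument, while the eager variant always does.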
@Michaelvll @aylei could you please review this PR?
This PR adds support for Spheron as a new cloud provider in SkyPilot. Spheron is a unified GPU cloud platform that aggregates enterprise-grade GPU compute from certified global data centers into a single platform — providing on-demand access to NVIDIA GPUs (H100, A100, B200, B300, and more) across multiple providers with no lock-in, pay-per-hour pricing, and costs 40–60% below major hyperscalers.
We currently have an open PR for our catalog here.
YAML Config Example
Use cheapest GPU across all providers in Spheron's marketplace
Look across all providers that have a specific accelerator and use the cheapest one
Use a specific instance type (offer ID) from Spheron
Setup
I have run many local tests across different Spheron GPU instance types. Note that code formatting has only been fixed for Spheron-related changes; pre-existing formatting issues in other parts of the codebase have not been touched.
Tested
- `bash format.sh` (Spheron-related files only; pre-existing issues in other files left untouched)
- `pytest tests/unit_tests/test_spheron.py`
- `/smoke-test` (CI) or `pytest tests/test_smoke.py` (local)
- `/smoke-test -k test_name` (CI) or `pytest tests/test_smoke.py::test_name` (local)
- `/quicktest-core` (CI) or `pytest tests/smoke_tests/test_backward_compat.py` (local)