Add devices parameter for GPU device selection #99

Draft
LeoGrin wants to merge 3 commits into main from add-gpu-device-selection

@LeoGrin LeoGrin commented Apr 7, 2026

Summary

  • Add devices: list[int] | None parameter to GPUParallelWorker, TimeSeriesPredictor.from_tabpfn_family, and TabPFNTSPipeline
  • Allows callers to specify which GPU devices to use (e.g. devices=[2, 3]) instead of always using the first num_gpus devices (indices 0 to num_gpus - 1)
  • Validates that devices is only passed with TabPFNMode.LOCAL
  • Backward compatible — existing callers using num_gpus are unaffected

Motivation

Distributed eval workloads need to pin different pipeline instances to specific GPUs. Previously the only control was num_gpus, which always selected devices starting from 0.
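
One way a distributed eval launcher might carve up the GPUs on a node is sketched below. The helper `devices_for_worker` and its parameters (`worker_rank`, `gpus_per_worker`, `total_gpus`) are hypothetical and not part of this PR; the function only computes the index list that each process would then pass as `devices`.

```python
def devices_for_worker(
    worker_rank: int, gpus_per_worker: int, total_gpus: int
) -> list[int]:
    """Return a contiguous block of GPU indices for one worker.

    Hypothetical helper for illustration only: worker 0 gets the first
    block, worker 1 the next block, and so on.
    """
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")
    start = worker_rank * gpus_per_worker
    stop = start + gpus_per_worker
    if worker_rank < 0 or stop > total_gpus:
        raise ValueError(
            f"worker {worker_rank} needs GPUs {start}..{stop - 1}, "
            f"but only {total_gpus} are available"
        )
    return list(range(start, stop))
```

With four GPUs and two per worker, `devices_for_worker(1, 2, 4)` yields `[2, 3]`, which is the list used in the Usage example. With only `num_gpus` available, every worker would have resolved to devices starting at 0.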

Usage

pipeline = TabPFNTSPipeline(
    tabpfn_mode=TabPFNMode.LOCAL,
    devices=[2, 3],
)

Test plan

  • Verify single-GPU fallback uses self.devices[0] instead of hardcoded 0
  • Verify multi-GPU parallel path indexes into self.devices
  • Verify devices with CLIENT mode raises ValueError
  • Verify backward compat: num_gpus=2 still works as before
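
The checks in this plan could be sketched as plain assertion tests. The code below does not import the real classes: `Mode` stands in for TabPFNMode, and `resolve_devices` is a hypothetical function that mirrors the selection logic in the diff, with the `available` argument replacing the call to torch.cuda.device_count() so the tests run without a GPU.

```python
from enum import Enum


class Mode(Enum):
    """Stand-in for TabPFNMode (assumption: only these two members matter here)."""

    LOCAL = "local"
    CLIENT = "client"


def resolve_devices(mode, devices=None, num_gpus=None, available=0):
    """Hypothetical mirror of the PR's device selection logic."""
    if devices is not None:
        if mode is not Mode.LOCAL:
            raise ValueError("devices is only supported with LOCAL mode")
        return list(devices)
    # Backward-compatible path: first num_gpus devices, or all available.
    return list(range(num_gpus or available))


def test_single_gpu_uses_first_listed_device():
    assert resolve_devices(Mode.LOCAL, devices=[3])[0] == 3


def test_multi_gpu_uses_listed_devices():
    assert resolve_devices(Mode.LOCAL, devices=[2, 3]) == [2, 3]


def test_devices_with_client_mode_raises():
    try:
        resolve_devices(Mode.CLIENT, devices=[0])
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_num_gpus_backward_compat():
    assert resolve_devices(Mode.LOCAL, num_gpus=2, available=8) == [0, 1]
```

Real tests would target GPUParallelWorker and TabPFNTSPipeline directly; this stand-in only pins down the expected behavior of each bullet.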

🤖 Generated with Claude Code

Allow callers to specify which GPU devices to use (e.g. devices=[2, 3])
instead of always using devices 0..N. Useful for distributed eval where
different processes need different GPUs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the ability to specify explicit GPU device indices for local inference by adding a devices parameter to TabPFNTSPipeline. The GPUParallelWorker has been updated to use these indices, overriding the default behavior of using all available GPUs. Feedback suggests adding validation for the devices list to ensure it is not empty and contains valid indices, preventing potential runtime errors like division by zero or out-of-range device access.

Comment on lines +43 to +47
        if devices is not None:
            self.devices = list(devices)
        else:
            num_gpus = num_gpus or torch.cuda.device_count()
            self.devices = list(range(num_gpus))

medium

The devices parameter should be validated to ensure it is not empty and contains valid GPU indices. Providing an empty list will lead to a ValueError during data splitting (division by zero) or an IndexError when accessing the first device. Additionally, validating that indices are within the range of available GPUs prevents a late-stage RuntimeError when calling torch.cuda.set_device.

        if devices is not None:
            if not devices:
                raise ValueError("The 'devices' list cannot be empty.")
            self.devices = list(devices)
            num_available = torch.cuda.device_count()
            if any(d < 0 or d >= num_available for d in self.devices):
                raise ValueError(
                    f"Invalid device index in {devices}. "
                    f"Available device indices: 0 to {num_available - 1}."
                )
        else:
            num_gpus = num_gpus or torch.cuda.device_count()
            self.devices = list(range(num_gpus))
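
To exercise the suggested checks on a machine without GPUs, the validation could be factored into a function that receives the device count as an argument. This is a sketch under that assumption: `validate_devices` and `num_available` are illustrative names, and the real constructor would pass torch.cuda.device_count() for `num_available`.

```python
from collections.abc import Sequence


def validate_devices(devices: Sequence[int], num_available: int) -> list[int]:
    """Return devices as a list, or raise ValueError if the selection is unusable.

    Illustrative helper: the checks follow the review suggestion, but the
    device count is injected instead of being read from torch.
    """
    if not devices:
        raise ValueError("The 'devices' list cannot be empty.")
    selected = list(devices)
    # Reject any index outside 0..num_available - 1.
    if any(d < 0 or d >= num_available for d in selected):
        raise ValueError(
            f"Invalid device index in {selected}. "
            f"Available device indices: 0 to {num_available - 1}."
        )
    return selected
```

Injecting the count keeps the error paths testable in CI, where torch.cuda.device_count() typically returns 0 and every explicit index would otherwise be rejected.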

LeoGrin and others added 2 commits April 7, 2026 18:14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>