zfsonwindows with remote vdev #578

Open
tiehexue wants to merge 5 commits into openzfsonwindows:zfs-Windows-2.4.1-release from tiehexue:windows-zfs-with-remote-vdev
@tiehexue

As we know, ZFS is a local filesystem, while Lustre, a parallel filesystem that can be built on ZFS, can scale to thousands of nodes. I was wondering whether we could modify ZFS to be a multi-node filesystem.

It turns out that a vdev can be remote; however, I am not sure whether it is a good idea. Suppose a ZFS node, acting as a filesystem client, connects to thousands of remote vdevs: is this useful?

Motivation and Context

We can create a pool with remote vdevs, e.g. a raw image file on a remote machine or raw disks over a TCP connection. This is very convenient for testing ZFS clusters.
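
To make "raw disks over a TCP connection" more concrete, here is a minimal sketch of what a block request header for such a link could look like. The wire format used by this PR is not shown in this thread, so every name here (`rv_req_t`, `rv_encode`, `rv_decode`, the opcodes, the 13-byte layout) is an assumption made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request header for a remote-vdev link. This is NOT the
 * PR's actual protocol; it only illustrates the idea. */
enum { RV_OP_READ = 1, RV_OP_WRITE = 2 };
enum { RV_HDR_LEN = 1 + 8 + 4 }; /* op + offset + length */

typedef struct rv_req {
    uint8_t  op;      /* RV_OP_READ or RV_OP_WRITE */
    uint64_t offset;  /* byte offset on the remote device */
    uint32_t length;  /* number of bytes to transfer */
} rv_req_t;

/* Serialize in network (big-endian) byte order, one byte at a time, so
 * host endianness and struct padding never leak onto the wire. */
size_t rv_encode(const rv_req_t *rq, uint8_t out[RV_HDR_LEN])
{
    size_t n = 0;
    out[n++] = rq->op;
    for (int s = 56; s >= 0; s -= 8)
        out[n++] = (uint8_t)(rq->offset >> s);
    for (int s = 24; s >= 0; s -= 8)
        out[n++] = (uint8_t)(rq->length >> s);
    return n;
}

/* Parse a header; returns 0 on success, -1 on an unknown opcode. */
int rv_decode(const uint8_t in[RV_HDR_LEN], rv_req_t *rq)
{
    size_t n = 0;
    rq->op = in[n++];
    if (rq->op != RV_OP_READ && rq->op != RV_OP_WRITE)
        return -1;
    rq->offset = 0;
    for (int i = 0; i < 8; i++)
        rq->offset = (rq->offset << 8) | in[n++];
    rq->length = 0;
    for (int i = 0; i < 4; i++)
        rq->length = (rq->length << 8) | in[n++];
    return 0;
}
```

A fixed, byte-order-independent header like this is one way to keep such a protocol usable between Windows and other platforms; a real design would also need request IDs, status replies, and flush semantics.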

Description

Added a new vdev type and modified zpool to support creating, importing, and exporting pools that use it.

How Has This Been Tested?

Tested locally only.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

@lundman

lundman commented May 15, 2026

Nice looking PR and it looks complete. Give me some time to read through it..

@lundman

lundman commented May 17, 2026

This is a neat idea, and the PR is fully formed, which is nice to see. I can see it being the first thought when working with Windows. Having discussed it with the upstream OpenZFS team, it seems the standard way to handle this already exists in Network Block Device (NBD): there are both clients and servers for Windows, and they can communicate across platforms.

I feel it would have little chance of being accepted upstream, but I can roll out a code-signed version for you if you want to try it out?

@tiehexue
Author

Thanks for your time. I have already tested this on my local computer with a self-signed installation package.

I just find it amazing that the vdev_ops_t abstraction is so good that I could add a new vdev type very quickly while all the ZFS functionality above it keeps working.
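
To illustrate the kind of abstraction meant here, below is a minimal sketch of an ops table built from function pointers. It is a toy and not the real OpenZFS `vdev_ops_t`: all names (`toy_vdev_ops_t`, `toy_vdev_t`, `toy_roundtrip`, and so on) are hypothetical, and the "remote" backend only counts simulated round trips instead of using a network.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ops table; NOT the real OpenZFS vdev_ops_t. */
typedef struct toy_vdev toy_vdev_t;

typedef struct toy_vdev_ops {
    const char *type_name;
    int  (*op_open)(toy_vdev_t *vd);
    int  (*op_io)(toy_vdev_t *vd, int is_write, uint64_t off,
                  void *buf, size_t len);
    void (*op_close)(toy_vdev_t *vd);
} toy_vdev_ops_t;

struct toy_vdev {
    const toy_vdev_ops_t *ops;
    uint8_t  store[4096];   /* stands in for the backing disk */
    int      is_open;
    unsigned round_trips;   /* only the "remote" backend counts these */
};

static int  toy_open(toy_vdev_t *vd)  { vd->is_open = 1; return 0; }
static void toy_close(toy_vdev_t *vd) { vd->is_open = 0; }

/* "Local" backend: bounds-checked copy to or from the in-memory store. */
static int toy_local_io(toy_vdev_t *vd, int is_write, uint64_t off,
                        void *buf, size_t len)
{
    if (!vd->is_open || off > sizeof(vd->store) ||
        len > sizeof(vd->store) - off)
        return -1;
    if (is_write)
        memcpy(vd->store + off, buf, len);
    else
        memcpy(buf, vd->store + off, len);
    return 0;
}

/* "Remote" backend: a real one would send a request over the network
 * here; this toy only counts the round trip and reuses the local path. */
static int toy_remote_io(toy_vdev_t *vd, int is_write, uint64_t off,
                         void *buf, size_t len)
{
    vd->round_trips++;
    return toy_local_io(vd, is_write, off, buf, len);
}

const toy_vdev_ops_t toy_local_ops  = { "local",  toy_open, toy_local_io,  toy_close };
const toy_vdev_ops_t toy_remote_ops = { "remote", toy_open, toy_remote_io, toy_close };

/* Upper layer: writes then reads back, touching the device only through
 * the ops table, so it does not care which backend is plugged in. */
int toy_roundtrip(toy_vdev_t *vd, const char *msg, char *out, size_t len)
{
    if (vd->ops->op_open(vd) != 0)
        return -1;
    int err = vd->ops->op_io(vd, 1, 512, (void *)msg, len);
    if (err == 0)
        err = vd->ops->op_io(vd, 0, 512, out, len);
    vd->ops->op_close(vd);
    return err;
}
```

Because `toy_roundtrip` never calls a backend directly, swapping `toy_local_ops` for `toy_remote_ops` changes where the bytes go without changing the caller, which is the property described above.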

The problem with vdev_remoted is that I do not see a use case or industry scenario for it. Normally, in a big ZFS system, we export a filesystem or zvol via NFS, NBD, or iSCSI. Only in rare cases do we aggregate disks via iSCSI, NBD, or this vdev_remoted to create a ZFS pool; JBOD is much better suited to aggregating disks.

However, especially in many AI compute centers, the network (RDMA, InfiniBand, optical interconnects) has become very fast. Should we use this network to build a large ZFS pool? If so, vdev_remoted would make sense: the zfs_remoted daemon should live in the kernel and use all modern network capabilities for best performance, the protocol should be cross-platform, and disk I/O could even go directly to the GPU.

Lustre is widely deployed in AI compute centers, but I see duplicated POSIX layers in both Lustre and ZFS/ldiskfs, as well as duplicated CoW, namespaces, snapshots, etc. Lustre and ZFS were also designed a long time ago. So what if we designed a new ZFS now, one that supports multiple mounts, or a single mount with extreme performance and large capacity?

@tiehexue
Author

Networks that are faster than a filesystem needs are, or should be, a game changer for filesystems. DeepSeek 3FS (https://github.com/deepseek-ai/3fs) is a good example. I am trying to make the SPA, the MOS, or the uberblock distributed, or even split the DMU into client and server, but that is too complex for me right now.

There are two rules: 1) the disks should contain all information; 2) the disks can be imported/mounted by multiple nodes. vdev_remote plus one new component in the ZFS kernel module would make it a distributed, parallel filesystem. Where should that component be added?
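
One reason rule 2 needs a new component is that, as far as I understand it, ZFS picks the active uberblock on import by taking the one with the highest transaction group number, using the timestamp as a tie-breaker. With two uncoordinated writers, whichever commits last simply wins. The sketch below shows only that selection rule; the struct and function names are hypothetical, the `writer_id` field exists purely for illustration, and the real OpenZFS comparison considers additional fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for an on-disk uberblock. */
typedef struct toy_uberblock {
    uint64_t txg;        /* transaction group that wrote it */
    uint64_t timestamp;  /* wall-clock time of the write */
    int      writer_id;  /* illustration only: which node wrote it */
} toy_uberblock_t;

/* Returns >0 if a is newer than b, <0 if older, 0 if equal. */
int toy_ub_compare(const toy_uberblock_t *a, const toy_uberblock_t *b)
{
    if (a->txg != b->txg)
        return a->txg > b->txg ? 1 : -1;
    if (a->timestamp != b->timestamp)
        return a->timestamp > b->timestamp ? 1 : -1;
    return 0;
}

/* Scan the candidates and return the newest one, or NULL if n == 0. */
const toy_uberblock_t *toy_ub_best(const toy_uberblock_t *ubs, size_t n)
{
    const toy_uberblock_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || toy_ub_compare(&ubs[i], best) > 0)
            best = &ubs[i];
    }
    return best;
}
```

If two nodes both write txg 101, the one with the later timestamp becomes the pool's view of the world and the other node's tree is silently orphaned, so any multi-import design would need some form of coordination above this selection step.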

wangy10 added 2 commits May 19, 2026 20:13
…ev_remote module for error handling, keep stable when vdev connection is lost.
…te in different machine, it works, sometimes crashes, and when the two clients write to the mounted filesystem, the last committed one succeeds, as we knew.
@lundman

lundman commented May 20, 2026

It is a good concept, but if you want it to go anywhere, such as upstream, it needs to be posted there. I don't have much say about what goes in there. The I/O would then need to be changed to POSIX style, letting SPL/libspl handle the conversion.

But as mentioned, upstream said that if you are going to use remote disks, just use NBD to create the /dev/disk (PHYSICALDRIVEx) and attach it to ZFS the usual way; there is no need to bake it into ZFS if it already exists.
