Add CPAL backend #786

Closed
roderickvd wants to merge 10 commits into librespot-org:dev from
roderickvd:cpal-backend

@roderickvd
Member

@roderickvd roderickvd commented Jun 1, 2021

This is a working CPAL backend based on the extensive initial work by @Johannesd3 (thanks!).

The intention is to deprecate Rodio in favour of CPAL: Rodio is based on CPAL, and builds on it, but there is nothing extra we need in Rodio that isn't already in CPAL. This has been discussed in various places, notably #648 and #734 (comment).

So far I have successfully tested this backend on Linux (S16, F32) and macOS (F32).

Todo

  • Pass the build on MSRV
  • Test on Windows (help needed: I don't have Windows machines to compile on)
  • Refactor slightly to bring it more in line with the other backends
  • Implement a companion CpalJack variant (like we also have RodioJack)
  • Promote it to default backend status

Open questions

  1. What do we want to do with the Rodio backend for the upcoming release: replace it with CPAL? Or keep it around for planned deprecation in a release sometime later?

  2. The code that checks whether the requested audio format is available, or falls back to the system default otherwise, actually returns the highest supported audio format by default. Currently librespot defaults to S16 but this code gives another opportunity: we could change format selection to an Option<AudioFormat>, selecting the highest quality by default unless specified otherwise. For other backends, which do not easily support querying supported formats, we could still default to S16 unless specified otherwise. Should we get this in or leave it be? This would be another PR but I'd first like to hear your thoughts before I put time in. No, it would work with idiosyncrasies for every backend and not be very transparent to the user.

  3. For an out-of-the-box experience on Windows, we need resampling. Do we want to add this, or forget about this PR and stick with Rodio? See: Add CPAL backend #786 (comment)
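For readers following open question 2, the selection logic it describes could look roughly like the sketch below. This is not the code in this PR: the two-variant `AudioFormat` enum, its ordering, and the `select_format` helper are assumptions made for illustration, and the list of supported formats is passed in directly rather than queried from CPAL.

```rust
/// Hypothetical, reduced format enum for illustration only.
/// The derived ordering treats later variants as "higher".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum AudioFormat {
    S16,
    F32,
}

/// Returns the requested format if the device supports it; otherwise
/// (or when nothing was requested) the highest supported format.
/// Returns `None` only when the device reports no formats at all.
pub fn select_format(
    requested: Option<AudioFormat>,
    supported: &[AudioFormat],
) -> Option<AudioFormat> {
    match requested {
        Some(format) if supported.contains(&format) => Some(format),
        // Not requested, or requested but unavailable: fall back.
        _ => supported.iter().copied().max(),
    }
}
```

With this shape, a device that only reports F32 (the CoreAudio case mentioned in the commit message) would yield `Some(AudioFormat::F32)` even when S16 is requested.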

Johannesd3 and others added 3 commits April 18, 2021 17:33
This provides an out-of-the-box experience. Particularly CoreAudio
systems (Macs) may report they only support F32 while librespot
defaults to S16.
@JasonLG1979
Contributor

A --format auto option would be pretty cool. I also like the idea of falling back to S16 if the requested format is not available. In theory I think you can do both of those things in the alsa backend too: fall back to S16 on error for the fallback, and tell it to pick the "nearest" to F32 when you set the sink up for the auto setting, maybe? But really, if CPAL works well enough and doesn't have much overhead, it could make the alsa backend obsolete. One backend to rule them all would make things much easier. I'd still keep the alsa backend around though, in at least maintenance mode, as a backup.

@roderickvd
Member Author

A --format auto option would be pretty cool. I also like the idea of falling back to S16 if the requested format is not available. In theory I think you can do both of those things in the alsa backend too: fall back to S16 on error for the fallback, and tell it to pick the "nearest" to F32 when you set the sink up for the auto setting, maybe?

Actually I'm scrapping the idea because it's more complicated than that, and wouldn't be very transparent to the user. F32 may seem better than S16 but when the backend doesn't dither internally to a device that does S16 in hardware then it really isn't higher quality, but lower quality.
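To make the dithering point concrete, here is a minimal sketch of what "dithering to S16" means when float samples are reduced to 16 bits. It is not taken from librespot or any backend: the `Lcg` generator and `quantize_dithered` function are hypothetical names, the generator is a toy and not of audio-grade quality, and the TPDF dither of one LSB peak amplitude is one common choice among several.

```rust
/// Toy linear congruential generator, used only to keep the sketch
/// free of external crates. Not suitable for production dithering.
pub struct Lcg(pub u64);

impl Lcg {
    /// Returns a pseudo-random value in [0, 1).
    pub fn next_unit(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Quantizes a normalized sample (nominally -1.0..=1.0) to i16 after
/// adding triangular (TPDF) dither with a peak of one 16-bit step.
pub fn quantize_dithered(sample: f64, rng: &mut Lcg) -> i16 {
    // The difference of two uniform values has a triangular distribution.
    let tpdf = rng.next_unit() - rng.next_unit();
    // `as i16` saturates on overflow, so full-scale input cannot wrap.
    (sample * 32768.0 + tpdf).round() as i16
}
```

Without the `tpdf` term the conversion is plain rounding, which is the situation described above: the extra float precision is discarded and the quantization error correlates with the signal.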

But really if CPAL works well enough and doesn't have much of any overhead it could make the alsa backend obsolete. One backend to rule them all would make things much easier. I'd still keep the alsa around though in at least maintenance mode as a backup.

This has been discussed in various places, and is not the intended direction. There are good reasons to keep at least some of the others around not only as backup, but for their features. e.g. CPAL doesn't offer S24, Alsa doesn't dither, some users latch onto Alsa's mixer mute switch, GStreamer can do DSP, and so on.

So while I think CPAL is great as a default, because it offers a good out-of-the-box experience, there's a place for (most of) the others as well.

@Johannesd3
Contributor

(btw one nice thing about a decoupled playback library would be the possibility of automated tests on Windows)

@Johannesd3
Contributor

Windows:

[2021-06-02T07:04:44Z INFO  librespot_playback::audio_backend::cpal] Using CPAL sink with format F32 and host: WASAPI
[2021-06-02T07:04:44Z INFO  librespot_playback::audio_backend::cpal] Using audio device: Speaker (Realtek(R) Audio)
thread '<unnamed>' panicked at 'Could not open output stream with that format: StreamConfigNotSupported', playback\src\audio_backend\cpal.rs:163:14

@Johannesd3
Contributor

My guess: the device doesn't natively support 44100 Hz as a sample rate, and resampling would be necessary. Rodio did that, cpal doesn't.

@roderickvd
Member Author

Thanks. Could you provide me with the binary?
And the output of --backend cpal --device ?

Can I set up a Rust compilation environment with MinGW? I don't have the disk space for all the Visual Studio SDK baggage.

@roderickvd
Member Author

My guess: the device doesn't natively support 44100 Hz as a sample rate, and resampling would be necessary. Rodio did that, cpal doesn't.

I think you're right. On Windows this seems to be a bit of a minefield, with various Realtek cards (and who knows which others) only taking 48 kHz and WASAPI not doing resampling. We certainly don't want to expose users to all this; it should just work out of the box.

If so there are two things we can do:

  1. Forget about it and keep Rodio in.

  2. Introduce resampling in librespot for CPAL on Windows. It would not need to be our own implementation; in fact, there is a nice pure-Rust crate, Rubato, that can do this with higher quality than Rodio's linear interpolation.
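As a point of reference for option 2, linear interpolation (the technique attributed to Rodio above) can be sketched in a few lines on interleaved samples. This is an illustrative sketch, not Rodio's or librespot's code: `resample_linear` is a hypothetical name, it processes one whole buffer at a time, and it does no low-pass filtering, which is the main reason a dedicated resampler would sound better.

```rust
/// Resamples interleaved audio by linear interpolation between
/// neighbouring frames. One-shot: it keeps no state between buffers.
pub fn resample_linear(
    input: &[f64],
    channels: usize,
    from_rate: u32,
    to_rate: u32,
) -> Vec<f64> {
    assert!(channels > 0 && from_rate > 0 && to_rate > 0);
    let in_frames = input.len() / channels;
    if in_frames == 0 {
        return Vec::new();
    }
    let out_frames = (in_frames as u64 * to_rate as u64 / from_rate as u64) as usize;
    // Distance between output frames, measured in input frames.
    let step = from_rate as f64 / to_rate as f64;

    let mut output = Vec::with_capacity(out_frames * channels);
    for n in 0..out_frames {
        let pos = n as f64 * step;
        let i = (pos.floor() as usize).min(in_frames - 1);
        let frac = pos - i as f64;
        // At the end of the buffer, repeat the last frame.
        let next = (i + 1).min(in_frames - 1);
        for ch in 0..channels {
            let a = input[i * channels + ch];
            let b = input[next * channels + ch];
            output.push(a + (b - a) * frac);
        }
    }
    output
}
```

For the case in this thread, `resample_linear(&samples, 2, 44_100, 48_000)` would turn 441 stereo frames into 480. A streaming implementation would also have to carry the fractional position and the last frame across buffer boundaries.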

@JasonLG1979
Contributor

Actually I'm scrapping the idea because it's more complicated than that, and wouldn't be very transparent to the user. F32 may seem better than S16 but when the backend doesn't dither internally to a device that does S16 in hardware then it really isn't higher quality, but lower quality.

Yep, pretty much everyone assumes that more bits, more better. Implementation is really the important part.

This has been discussed in various places, and is not the intended direction. There are good reasons to keep at least some of the others around not only as backup, but for their features. e.g. CPAL doesn't offer S24, Alsa doesn't dither, some users latch onto Alsa's mixer mute switch, GStreamer can do DSP, and so on.

So while I think CPAL is great as a default, because it offers a good out-of-the-box experience, there's a place for (most of) the others as well.

No 24 bit is kinda a deal breaker. I was not privy to those discussions. I'm sorry.

@roderickvd
Member Author

I briefly played around with the Rubato resampling crate but unfortunately its API is not directly compatible with ours. Rubato works with a vec![Vec<f64>; 2] of samples, with the samples for each channel in those separate Vec<f64>s. Our samples are stored in an interleaved, one-dimensional Vec<f64>.

To use Rubato we'd have to iterate over all samples two more times: once to split them into two vectors, then again to join them into a single vector. That looks awful and seems wasteful.
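The two extra passes described above would look something like this sketch. The function names `deinterleave` and `interleave` are hypothetical, and no Rubato call is shown: the resampler would sit between the two steps and operate on the per-channel vectors.

```rust
/// Splits interleaved samples (L R L R ...) into one Vec per channel.
/// Trailing samples that do not form a complete frame are dropped.
pub fn deinterleave(interleaved: &[f64], channels: usize) -> Vec<Vec<f64>> {
    assert!(channels > 0);
    let frames = interleaved.len() / channels;
    let mut planar: Vec<Vec<f64>> = (0..channels)
        .map(|_| Vec::with_capacity(frames))
        .collect();
    for frame in interleaved.chunks_exact(channels) {
        for (ch, &sample) in frame.iter().enumerate() {
            planar[ch].push(sample);
        }
    }
    planar
}

/// Joins per-channel vectors back into a single interleaved Vec.
/// If the channels differ in length, the shortest one sets the length.
pub fn interleave(planar: &[Vec<f64>]) -> Vec<f64> {
    let frames = planar.iter().map(|c| c.len()).min().unwrap_or(0);
    let mut out = Vec::with_capacity(frames * planar.len());
    for i in 0..frames {
        for channel in planar {
            out.push(channel[i]);
        }
    }
    out
}
```

Each function touches every sample once and allocates a new buffer, so the cost is two extra passes plus two allocations per packet, on top of the resampling itself.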

There are other resampling libraries that I haven't investigated, because they are wrappers around libresample and the like. I think we should only go for a pure Rust solution.

Options I see now:

  1. Forget about this and keep Rodio in
  2. Split, resample and join only in Windows
  3. Keep Rodio default on Windows, make CPAL default on other platforms

So as much as it pains me that it's only on Windows, I'm starting to feel it might be best to just leave Rodio be. Your opinion?

@Johannesd3
Contributor

Lewton doesn't deliver the samples interleaved by default: https://docs.rs/lewton/0.10.2/lewton/samples/trait.Samples.html#tymethod.from_floats. If you want to rewrite everything yet another time, it would be possible to use this fact.

Are you still interested in the binary?

And are we sure we don't need resampling for any other OS than Windows? Is it possible that some devices with linux don't support the usual sample rate? Is it possible that formats other than ogg (which librespot doesn't support for now) use different sample rates? Is it possible that Spotify HiFi will give us different sample rates?

I think it's another good argument to create a librespot-tailored alternative crate to rodio as I suggested before, @roderickvd. Just to have more flexibility for any of these cases. But well...

@JasonLG1979
Contributor

Is it possible that some devices with linux don't support the usual sample rate?

ALSA/Dmix defaults to resampling to 48 kHz if you go through the "Default" device, for the reason given above: 48 kHz is actually the most commonly supported sampling rate for integrated sound cards (or at least it was when that decision was made). PulseAudio, PipeWire and GStreamer all sit on top of ALSA/Dmix. So on Linux resampling isn't really a must-have. The only way it would really be a draw is if it were really high quality, comparable to sox.

@roderickvd
Member Author

roderickvd commented Jun 5, 2021

Lewton doesn't deliver the samples interleaved by default: https://docs.rs/lewton/0.10.2/lewton/samples/trait.Samples.html#tymethod.from_floats. If you want to rewrite everything yet another time, it would be possible to use this fact.

Yes for lewton that's a nice option. Meanwhile I am looking ahead, eying other crates for multi-format support. Today to fix some issues with tracks that are available as MP3 or AAC only (see #651), tomorrow for Spotify HiFi in FLAC. In fact I'm preparing a PR for this. For crates like minimp3, claxon, and Symphonia, the common denominator is interleaved samples.

Now suppose we would be able to extract non-interleaved samples from those crates. Then for the backends, it would probably add a bit of complexity. When writing non-interleaved samples (if the backends support it -- I haven't checked all of them although Alsa and GStreamer should be fine) you need to write periods ("chunks") of samples per channel. We would need to be careful to not run into latency issues and underruns.

And are we sure we don't need resampling for any other OS than Windows? Is it possible that some devices with linux don't support the usual sample rate? Is it possible that formats other than ogg (which librespot doesn't support for now) use different sample rates? Is it possible that Spotify HiFi will give us different sample rates?

The question isn't so much the source format, but rather the flexibility of the platform. Then yes, it's really only an issue on Windows because the other platforms have easy options to transparently (from a UX perspective) resample to whatever the hardware supports.

44.1 kHz remains the Red Book CD standard and that is what Spotify has also announced for Spotify HiFi. It's certainly possible that other sampling rates will also be offered in the future, perhaps for streaming video, or even multiples of 44.1 or 48 kHz. Again, this wouldn't be a big deal on most platforms except on Windows.

So I'm thinking: we can delve into changing vector layouts but aren't we introducing too much complexity for a single platform? Especially when the other options seem so much easier?

I think it's another good argument to create a librespot-tailored alternative crate to rodio as I suggested before, @roderickvd. Just to have more flexibility for any of these cases. But well...

I know, I noticed your earlier suggestions too 😆

I don't oppose the idea. In fact, I think the way you're handling formats with generics here in CPAL is a nice middle of the road between what we have now, and what you proposed in sinky. It got me thinking and I now do think that it would be good to refactor and use this everywhere, together with a Sample trait with a generic type system for converting from and to floats.

It's not the highest on my list but it's on there. Once refactoring is done, we could look into extracting it.
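As a rough idea of what such a Sample trait might look like, here is a minimal sketch. It is not the design from this branch or from sinky: the trait shape, the method names `from_f64` and `to_f64`, the 32768 scaling factor, and the `convert` helper are all assumptions made for illustration.

```rust
/// Hypothetical trait: a sample type that converts to and from the
/// normalized f64 representation used inside the player.
pub trait Sample: Copy {
    fn from_f64(value: f64) -> Self;
    fn to_f64(self) -> f64;
}

impl Sample for f32 {
    fn from_f64(value: f64) -> Self {
        value as f32
    }
    fn to_f64(self) -> f64 {
        f64::from(self)
    }
}

impl Sample for i16 {
    fn from_f64(value: f64) -> Self {
        // Float-to-int `as` casts saturate, so 1.0 maps to i16::MAX
        // instead of wrapping around.
        (value * 32768.0).round() as i16
    }
    fn to_f64(self) -> f64 {
        f64::from(self) / 32768.0
    }
}

/// Generic conversion a sink could call once for its own format.
pub fn convert<S: Sample>(samples: &[f64]) -> Vec<S> {
    samples.iter().map(|&s| S::from_f64(s)).collect()
}
```

A backend would then pick the type parameter once, for example `convert::<i16>(&packet)` for an S16 sink, instead of matching on the format for every packet. This sketch does no dithering.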

Are you still interested in the binary?

No I believe that you were correct in your analysis. Thanks.

@JasonLG1979
Contributor

JasonLG1979 commented Jun 5, 2021

44.1 kHz remains the Red Book CD standard and that is what Spotify has also announced for Spotify HiFi. It's certainly possible that other sampling rates will also be offered in the future, perhaps for streaming video, or even multiples of 44.1 or 48 kHz. Again, this wouldn't be a big deal on most platforms except on Windows.

That is true but not the answer to the question 😜

You basically just restated what I said in:

The question isn't so much the source format, but rather the flexibility of the platform. Then yes, it's really only an issue on Windows because the other platforms have easy options to transparently (from a UX perspective) resample to whatever the hardware supports.

It's honestly surprising that Windows doesn't automatically resample. It has a software mixer doesn't it?

But anyway I have nothing more to add so I'll see myself out ☮️ I don't own nor use a Windows machine on a regular basis.

@roderickvd
Member Author

That is true but not the answer to the question 😜

Which question did I forget to answer? 🤷‍♂️

It's honestly surprising that Windows doesn't automatically resample. It has a software mixer doesn't it?

Usually it should for most cards (except when in WASAPI exclusive mode -- which we aren't using) but Realtek cards in particular seem really aggressive in enforcing their own sampling rate. And they are pretty pervasive.

@JasonLG1979
Contributor

JasonLG1979 commented Jun 6, 2021

Which question did I forget to answer? 🤷‍♂️

I answered this question:

Is it possible that some devices with linux don't support the usual sample rate?

With:

ALSA/Dmix defaults to resampling to 48 kHz if you go through the "Default" device, for the reason given above: 48 kHz is actually the most commonly supported sampling rate for integrated sound cards (or at least it was when that decision was made). PulseAudio, PipeWire and GStreamer all sit on top of ALSA/Dmix. So on Linux resampling isn't really a must-have. The only way it would really be a draw is if it were really high quality, comparable to sox.

And then you rephrased my answer as:

The question isn't so much the source format, but rather the flexibility of the platform. Then yes, it's really only an issue on Windows because the other platforms have easy options to transparently (from a UX perspective) resample to whatever the hardware supports.

And responded with this for some reason?:

44.1 kHz remains the Red Book CD standard and that is what Spotify has also announced for Spotify HiFi. It's certainly possible that other sampling rates will also be offered in the future, perhaps for streaming video, or even multiples of 44.1 or 48 kHz. Again, this wouldn't be a big deal on most platforms except on Windows.

@JasonLG1979
Contributor

I kinda feel like I got mansplained...

@roderickvd
Member Author

Sorry, that should have been in response to @Johannesd3’s question of whether we’d ever need to support other sampling rates.

I updated my comment accordingly. No ill intention here.

@roderickvd
Member Author

Coming back to working with Vec<Vec<T>>, I came across this by the author of the rtrb crate:

If you want to use a rtrb::RingBuffer with multi-channel (non-interleaved) audio blocks and want to access the channels of each block as slices of slices (i.e. as &[&[f32]] for reading and &mut [&mut [f32]] for writing), you might find another crate of mine useful: https://github.com/mgeier/rsor

Still doesn't quite fit the bill but if it sparks anyone's creativity...

@roderickvd
Member Author

I'm scrubbing this idea for as long as we target Windows as a "tier 1" platform. Introducing resampling would just be redoing what Rodio already has, in spite of our ideas to do it a little better. One good thing that came from this is this idea for future work:

In fact, I think the way you're handling formats with generics here in CPAL is a nice middle of the road between what we have now, and what you proposed in sinky. It got me thinking and I now do think that it would be good to refactor and use this everywhere, together with a Sample trait with a generic type system for converting from and to floats.

#786 (comment)

I'll keep my cpal branch around. @JasonLG1979 this uses the rtrb crate if you want to have a look.

@roderickvd
Member Author

For reference, the Psst project is implementing a cpal backend with libsamplerate: jpochyla/psst#197. I still don't particularly care about reimplementing stuff that Rodio already does, but I'm open to PRs (my branch remains available to fork, and so is psst).
