Is your feature request related to a problem? Please describe.
Finding cluster markers requires either downsampling or pushing the entire normalized matrix into GPU memory.
Downsampling at this point isn't feasible - we're drastically undersampling.
We need the ability to identify cluster markers using Dask arrays.
Describe the solution you'd like
Allow the function to accept Dask arrays. Ideally, the regression could also include other control variables, for example identifying cell cluster markers while controlling for batch. Batch may already be controlled for in the embedding, but uneven cluster composition could still drive some batch effects.
Solution:
Logistic regression for Dask arrays is implemented here:
https://ml.dask.org/modules/generated/dask_ml.linear_model.LogisticRegression.html