Debug/default value error #18
Conversation
Okay, I think I can replicate now. It happens when values are returned in windows. Working on a fix. |
7d423cb to a544afc
@mukamel-lab, I think this last commit should solve the problem. If it indeed does, I will merge this and create a new release on PyPI and the conda channel. |
I've tried out the new branch |
…tion outputs the same as the cp.nanmean on the full matrix
@mukamel-lab, when you create
I added tests that take bigwig files and draw a batch from them with window_size=1 and with window_size=128. I then check that Batch(window_size=128) is the same as Batch(window_size=1) with a nanmean applied over each window. Are you creating the BigWigDataset (or PytorchBigWigDataset) something like this?

from bigwig_loader.dataset import BigWigDataset
import cupy as cp

dataset_with_window = BigWigDataset(
    regions_of_interest=merged_intervals,
    collection=bigwig_path,
    reference_genome_path=reference_genome_path,
    sequence_length=2048,
    center_bin_to_predict=2048,
    window_size=128,
    batch_size=32,
    batches_per_epoch=10,
    maximum_unknown_bases_fraction=0.1,
    default_value=cp.nan,
    return_batch_objects=True,
) |
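The comparison described in this comment (a windowed batch against a window_size=1 batch reduced with a nanmean over each window) can be sketched without a GPU. This is only an illustration in plain Python, not the project's actual test: `nanmean` and `window_nanmean` are hypothetical helpers, and the track values are made up.

```python
import math


def nanmean(values):
    """Mean of the non-NaN entries; NaN if every entry is NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def window_nanmean(track, window_size):
    """Reduce a per-base track to one value per non-overlapping window."""
    if len(track) % window_size != 0:
        raise ValueError("track length must be a multiple of window_size")
    return [
        nanmean(track[start:start + window_size])
        for start in range(0, len(track), window_size)
    ]


# Made-up per-base values (the window_size=1 view), with NaN as default_value.
nan = math.nan
track = [1.0, nan, 3.0, 5.0, nan, nan, 2.0, 2.0]

# What a window_size=2 view of the same region would be expected to contain.
pooled = window_nanmean(track, window_size=2)
print(pooled)
```

A window that contains only default (NaN) positions stays NaN after the reduction, while a partially covered window averages only the positions that have values.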
Sorry to ask, but did you git pull the latest changes on this branch? |
I found that your test (test_get_values_from_intervals_edge_case_1) fails if I use a default value other than 0 or nan. I realized this can be fixed by making sure default_value is np.float32. However, I'm still having trouble with PytorchBigWigDataset filling in 0 instead of default_value for bins that are not at the right-hand end of the query region. My code looks like this: |
This is very helpful. My apologies. I am digging into this now. |
…ared to the output of pyBigWig with a window function applied afterwards. Parametrized with different default values.
I came up with a fix (using ChatGPT! -- I have no experience with CUDA programming) which appears to work for me. Please check if this looks okay, and thanks! https://github.com/mukamel-lab/bigwig-loader/tree/mukamel-lab-patch-1 |
…stom_position_sampler and custom_track_sampler options to the dataset objects.
@mukamel-lab, I will look into your CUDA code. In the meantime I added more tests. There is also a test using the actual PytorchBigWigDataset (instead of the machinery underneath), and I still have trouble replicating what you're seeing (also on the bigwig file you sent me). To make things easier to replicate, I added a "custom_position_sampler" option to the dataset objects, which can simply be an Iterable of (chromosome, center position) tuples like [("chr1", 73674382), ("chr6", 725209)....]. Could you maybe change this test so that it fails on your file? Or tell me how what that test checks differs from what you mean. I also wondered: when you figured out that "default_value" was incorrectly not a float32 when it was not 0 or NaN (for which, thanks again), did you also try cp.float32(cp.nan) as default_value? What could have happened is that positions in the float32 tensor were somehow being overwritten with 64-bit values in a row, leaving the first 32 bits of the 64-bit NaN in each position. I am just guessing at this point. It is hard to tell since (unlike before) I am not finding those zeros. |
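The guess about 64-bit NaNs landing in a float32 buffer can be illustrated with the standard library alone. The sketch below only shows what the two 32-bit halves of a 64-bit NaN look like when read as float32 on a little-endian IEEE 754 machine. It does not touch bigwig-loader or CUDA, and it is not a confirmed diagnosis of the behavior discussed here.

```python
import math
import struct

# Pack a 64-bit NaN in little-endian byte order, the layout used in memory
# on x86 CPUs and NVIDIA GPUs.
nan64_bytes = struct.pack("<d", math.nan)

# Reinterpret the same 8 bytes as two consecutive float32 values.
low_half, high_half = struct.unpack("<2f", nan64_bytes)

print(nan64_bytes.hex())
print(low_half, high_half)
```

Under these assumptions the first four bytes are all zero, so the first float32 slot reads as 0.0 and only the second one reads as NaN. That is the kind of pattern in which zeros could appear where a NaN default was expected.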
It works for me now! I think I must have been loading the wrong version of the module. I am now able to get the expected behavior from PytorchBigWigDataset. Thank you!! |
Great! I was already afraid we were chasing a Heisenbug. |
see comment: #17 (comment)
I cannot replicate the behavior where there are zeros in the output matrix where there should not be.