Cannot learn from large numbers of interactions #719

@MariaZharova

Hi there!
I'm trying to fit a LightFM model on a fairly large dataset: it contains ~13 million items and ~38.5 million users. The main problem is that the number of interactions is more than 3 billion, which exceeds the signed 32-bit integer limit (2^31 - 1). I get the following error when calling the fit method:

File ~/.cache/pypoetry/virtualenvs/complementary-items-GrOWNq8P-py3.10/lib/python3.10/site-packages/lightfm/lightfm.py:684, in LightFM._run_epoch(self, item_features, user_features, interactions, sample_weight, num_threads, loss)
    677 """
    678 Run an individual epoch.
    679 """
    681 if loss in ("warp", "bpr", "warp-kos"):
    682     # The CSR conversion needs to happen before shuffle indices are created.
    683     # Calling .tocsr may result in a change in the data arrays of the COO matrix,
--> 684     positives_lookup = CSRMatrix(
    685         self._get_positives_lookup_matrix(interactions)
    686     )
    688 # Create shuffle indexes.
    689 shuffle_indices = np.arange(len(interactions.data), dtype=np.int32)

File lightfm/_lightfm_fast_openmp.pyx:167, in lightfm._lightfm_fast_openmp.CSRMatrix.__init__()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

I guess this problem is caused by the int type used for the interaction indices in the Cython-generated file _lightfm_fast_openmp.c. Is there a plan to widen this data type to long?
