feat: restore BigClam compatibility with networkx>=3.0 (no karateclub library dependency) #257

NouamaneA wants to merge 1 commit into GiulioRossetti:master from
Conversation
Pull request overview
Restores algorithms.big_clam using CDlib’s internal BigClam implementation compatible with NetworkX ≥ 3.0, removing the prior karateclub dependency and re-enabling API/test coverage for the algorithm.
Changes:
- Re-enable big_clam in overlapping_partition and route it to the internal BIGCLAM implementation.
- Update internal BigClam to use nx.to_numpy_array(), add a "fast" gradient variant, and add a threshold-based affiliation mode for overlapping outputs.
- Add/restore a unit test calling algorithms.big_clam().
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| cdlib/algorithms/overlapping_partition.py | Reintroduces the public big_clam() API and exports it via __all__. |
| cdlib/algorithms/internal/BIGCLAM.py | Updates adjacency conversion for NetworkX 3+, adds a faster gradient and a threshold affiliation option. |
| cdlib/test/test_community_discovery_models.py | Adds a basic runtime/type test for algorithms.big_clam(). |
```python
        "naive": naive,
        "affiliation_method": affiliation_method,
    },
    overlap=True,
```
NodeClustering(overlap=True) is always set, but with affiliation_method="argmax" the output is a disjoint partition. This flag is used by downstream evaluation/serialization, so it should reflect the actual result (e.g., overlap=(affiliation_method == "threshold")).
```diff
- overlap=True,
+ overlap=(affiliation_method == "threshold"),
```
```python
n, m = graph.number_of_nodes(), graph.number_of_edges()
epsilon = 2 * m / (n * (n - 1))
```
In the threshold affiliation path, epsilon = 2*m/(n*(n-1)) will divide by zero when n < 2, and epsilon can hit 1 (complete graph) leading to log(0) and delta=inf. Please guard small graphs and clamp epsilon to a safe range (or use nx.density() and cap to < 1.0) to avoid runtime errors/inf thresholds.
```diff
- n, m = graph.number_of_nodes(), graph.number_of_edges()
- epsilon = 2 * m / (n * (n - 1))
+ n = graph.number_of_nodes()
+ if n < 2:
+     epsilon = 0.0
+ else:
+     epsilon = nx.density(graph)
+     epsilon = min(max(epsilon, 0.0), np.nextafter(1.0, 0.0))
```
```python
list_communities = []
for com in dict_communities:
    list_communities.append(dict_communities[com])
```
get_communities() currently returns a fixed-length list with potentially many empty communities (especially with argmax when some communities get no assignments, or with threshold when delta is high). Consider filtering out empty communities before returning to match other algorithms in this module and avoid downstream metrics having to handle empty clusters.
```diff
- list_communities = []
- for com in dict_communities:
-     list_communities.append(dict_communities[com])
+ list_communities = [members for members in dict_communities.values() if members]
```
```python
self.assertEqual(type(com), list)
if len(com) > 0:
    self.assertEqual(type(com[0]), int)
```
big_clam now has new behavior knobs (naive, affiliation_method incl. the overlapping 'threshold' mode), but the test only exercises the default settings and does not cover the threshold-based overlapping output or invalid affiliation_method values. Adding focused assertions for these branches would help prevent regressions.
```python
threshold_coms = algorithms.big_clam(
    g, naive=True, affiliation_method="threshold"
)
self.assertEqual(type(threshold_coms.communities), list)
if len(threshold_coms.communities) > 0:
    for com in threshold_coms.communities:
        self.assertEqual(type(com), list)
        if len(com) > 0:
            self.assertEqual(type(com[0]), int)
with self.assertRaises(ValueError):
    algorithms.big_clam(g, affiliation_method="unsupported")
```
```python
coms = big_clam_communities(
    g_original,
    number_communities=dimensions,
    iterations=iterations,
    learning_rate=learning_rate,
    naive=naive,
    affiliation_method=affiliation_method,
)
```
big_clam advertises support for networkx/igraph inputs, but it passes g_original directly into big_clam_communities(), which calls nx.to_numpy_array() and will fail for igraph graphs. Convert the input with convert_graph_formats(g_original, nx.Graph) (and use the converted graph for the internal call) to keep behavior consistent with the other algorithms in this module.
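A minimal sketch of the suggested normalization. In CDlib the helper is `convert_graph_formats` from `cdlib.utils`; the stand-in below is illustrative and assumes only that non-networkx inputs (e.g. igraph) expose `get_edgelist()`:

```python
import networkx as nx

def ensure_nx_graph(graph):
    """Illustrative stand-in for cdlib's convert_graph_formats(g, nx.Graph):
    return the graph unchanged if it is already a networkx.Graph, otherwise
    rebuild it from its edge list (igraph graphs expose get_edgelist())."""
    if isinstance(graph, nx.Graph):
        return graph
    if hasattr(graph, "get_edgelist"):
        return nx.Graph(graph.get_edgelist())
    raise TypeError(f"Unsupported graph type: {type(graph)!r}")

# The normalized graph can then be passed safely to nx.to_numpy_array().
```

With this conversion applied before the internal call, igraph inputs behave the same as for the other algorithms in the module.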
```python
grad = np.zeros((C,))
for nb in neighbours:
    dotproduct = F[nb].dot(F[i])
    grad += F[nb] * (1 + sigm(dotproduct))
grad -= np.sum(F, axis=0)
grad += F[i]
```
gradient_fast() recomputes np.sum(F, axis=0) for every node update, which is O(N*C) per node and can erase the intended speedup on sparse graphs. Consider computing the global sum once per outer iteration (or maintaining it incrementally as F updates) and passing it into gradient_fast.
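A hedged sketch of the suggested refactor: compute the column sum of F once per outer iteration and pass it in, rather than recomputing `np.sum(F, axis=0)` inside every node update. The names mirror the PR's `gradient_fast`/`sigm`, but the exact signatures here are assumptions:

```python
import numpy as np

def sigm(x):
    # BigClam gradient helper: exp(-x) / (1 - exp(-x))
    return np.exp(-x) / (1.0 - np.exp(-x))

def gradient_fast(F, i, neighbours, F_sum):
    """Gradient for node i. F_sum is np.sum(F, axis=0) computed once per
    outer iteration (or maintained incrementally), not once per node."""
    grad = np.zeros(F.shape[1])
    for nb in neighbours:
        grad += F[nb] * (1 + sigm(F[nb].dot(F[i])))
    grad -= F_sum   # global affiliation sum, shared by every node update
    grad += F[i]    # add back node i's own row, which F_sum includes
    return grad

# Per outer iteration:
#   F_sum = np.sum(F, axis=0)
# and after updating row i, maintain it incrementally:
#   F_sum += F[i] - old_row_i
```

This keeps the per-node cost proportional to the node's degree instead of O(N*C), which is where the "fast" variant's speedup on sparse graphs comes from.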
```python
    for com in np.nonzero(membership)[0].tolist():
        dict_communities[com].append(node)
else:
    raise ValueError("Method not supported")
```
The raised ValueError("Method not supported") is hard to act on and drops the invalid value. Include the provided method and the allowed values (e.g., 'argmax'/'threshold') so callers can quickly correct their input.
```diff
- raise ValueError("Method not supported")
+ raise ValueError(
+     f"Method '{method}' not supported. Allowed values are: 'argmax', 'threshold'."
+ )
```
```diff
  F[person] += learning_rate * grad

- F[person] = np.maximum(0.001, F[person])  # F should be nonnegative
- log_likelihood(F, A)
+ F[person] = np.maximum(0.00001, F[person])  # F should be nonnegative
+ # log_likelihood(F, A)
```
Lowering the non-negativity floor from 0.001 to 1e-5 increases the chance that F_u^T F_v becomes extremely small, which makes sigm() blow up (division by ~0) and can produce inf/unstable gradients. Consider keeping the previous floor or making sigm() numerically safe (e.g., via expm1 and clipping) to prevent divergence.
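One way to make the helper numerically safe, assuming the standard BigClam form sigm(x) = exp(-x) / (1 - exp(-x)) (the exact definition in the PR is not shown here): compute the denominator with `expm1` and clip the argument away from zero.

```python
import numpy as np

def sigm_safe(x, floor=1e-10):
    """Numerically safe variant of sigm(x) = exp(-x) / (1 - exp(-x)).
    As x -> 0 the denominator vanishes; -np.expm1(-x) evaluates
    1 - exp(-x) accurately, and clipping x bounds the result."""
    x = np.maximum(x, floor)          # keep the denominator away from zero
    return np.exp(-x) / -np.expm1(-x)
```

With this guard in place, the lower non-negativity floor on F no longer risks producing inf gradients when F_u^T F_v is tiny.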
```python
# if len(coms.communities) > 0:
#     self.assertEqual(type(coms.communities[0]), list)
#     self.assertEqual(type(coms.communities[0][0]), int)
def test_bigClam(self):
```
The test name test_bigClam is inconsistent with the rest of this file’s snake_case test naming (e.g., test_markov_clustering, test_multicom). Renaming to test_big_clam improves consistency and discoverability in test reports.
```diff
- def test_bigClam(self):
+ def test_big_clam(self):
```
Problem

BigClam was commented out after support for karateclub ceased. The former internal implementation did not support networkx>=3.0 because it relied on nx.to_numpy_matrix(), which was removed in that release.

Related Issue: #245
Solution

Reusing the karateclub library was not considered here, since it no longer appears to be maintained. Instead:

- nx.to_numpy_matrix() -> nx.to_numpy_array() (returns a plain ndarray instead of np.matrix)

Testing

Tested with networkx 3.6.1 and Python 3.13. The test function is implemented in the test_community_discovery_models.py file.
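The migration described above amounts to a one-line change; a minimal illustration:

```python
import networkx as nx
import numpy as np

g = nx.karate_club_graph()

# Before networkx 3.0 (API removed in 3.0):
#   A = nx.to_numpy_matrix(g)   # returned np.matrix
# Replacement used by this PR:
A = nx.to_numpy_array(g)        # returns a plain ndarray

assert isinstance(A, np.ndarray)
```

Downstream code that relied on np.matrix semantics (e.g. `*` meaning matrix multiplication) must use explicit `@`/`dot` calls with the ndarray result.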