
feat: restore BigClam compatibility with networkx>=3.0 (no karateclub library dependency)#257

Open
NouamaneA wants to merge 1 commit into GiulioRossetti:master from NouamaneA:bigclam

Conversation

@NouamaneA

Problem

BigClam was commented out after support for karateclub ceased. The former internal implementation did not work with networkx>=3.0 because it relied on nx.to_numpy_matrix(), which was removed in that release.

Related Issue #245

Solution

Using the karateclub library was not considered here, since it no longer appears to be maintained.

  • Replace nx.to_numpy_matrix() with nx.to_numpy_array()
  • Review the changes required as a consequence: the rest of the internal implementation is already compatible with ndarray
  • Comment out the computation of the log-likelihood, since it is not needed for the gradient ascent
  • Add a faster computation of the gradient: eq. 4 in Yang & Leskovec (2013)
  • Add an option to get actual overlapping communities via the threshold method presented in section 5 of the paper; the threshold is based on the graph density, as suggested by the authors
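The density-based threshold from the last bullet can be sketched in isolation. This is a minimal sketch, assuming the delta = sqrt(-log(1 - epsilon)) rule from section 5 of the paper with epsilon taken as the graph density; the function name and signature are illustrative, not the PR's actual code:

```python
import math

def bigclam_threshold(n_nodes: int, n_edges: int) -> float:
    """Membership threshold per section 5 of Yang & Leskovec (2013):
    node u is assigned to community c when F[u, c] >= delta, with
    delta = sqrt(-log(1 - epsilon)) and epsilon the background edge
    probability, taken here as the graph density."""
    if n_nodes < 2:
        return 0.0  # degenerate graph: density is undefined
    epsilon = 2 * n_edges / (n_nodes * (n_nodes - 1))
    # clamp away from 1.0 so log(1 - epsilon) stays finite on complete graphs
    epsilon = min(epsilon, 1.0 - 1e-12)
    return math.sqrt(-math.log(1.0 - epsilon))
```

On a graph the size of the karate club (34 nodes, 78 edges) this gives a threshold of roughly 0.39.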

Testing

Tested with networkx 3.6.1 and Python 3.13. The test function is implemented in the test_community_discovery_models.py file.


Copilot AI left a comment


Pull request overview

Restores algorithms.big_clam using CDlib’s internal BigClam implementation compatible with NetworkX ≥ 3.0, removing the prior karateclub dependency and re-enabling API/test coverage for the algorithm.

Changes:

  • Re-enable big_clam in overlapping_partition and route it to the internal BIGCLAM implementation.
  • Update internal BigClam to use nx.to_numpy_array(), add a “fast” gradient variant, and add a threshold-based affiliation mode for overlapping outputs.
  • Add/restore a unit test calling algorithms.big_clam().

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

Files:

  • cdlib/algorithms/overlapping_partition.py: reintroduces the public big_clam() API and exports it via __all__.
  • cdlib/algorithms/internal/BIGCLAM.py: updates adjacency conversion for NetworkX 3+, adds the faster gradient and the threshold affiliation option.
  • cdlib/test/test_community_discovery_models.py: adds a basic runtime/type test for algorithms.big_clam().


"naive": naive,
"affiliation_method": affiliation_method,
},
overlap=True,

Copilot AI Apr 21, 2026


NodeClustering(overlap=True) is always set, but with affiliation_method="argmax" the output is a disjoint partition. This flag is used by downstream evaluation/serialization, so it should reflect the actual result (e.g., overlap=(affiliation_method == "threshold")).

Suggested change
overlap=True,
overlap=(affiliation_method == "threshold"),

Comment on lines +107 to +108
n, m = graph.number_of_nodes(), graph.number_of_edges()
epsilon = 2 * m / (n * (n - 1))

Copilot AI Apr 21, 2026


In the threshold affiliation path, epsilon = 2*m/(n*(n-1)) will divide by zero when n < 2, and epsilon can hit 1 (complete graph) leading to log(0) and delta=inf. Please guard small graphs and clamp epsilon to a safe range (or use nx.density() and cap to < 1.0) to avoid runtime errors/inf thresholds.

Suggested change
n, m = graph.number_of_nodes(), graph.number_of_edges()
epsilon = 2 * m / (n * (n - 1))
n = graph.number_of_nodes()
if n < 2:
    epsilon = 0.0
else:
    epsilon = nx.density(graph)
epsilon = min(max(epsilon, 0.0), np.nextafter(1.0, 0.0))

Comment on lines 119 to 121
list_communities = []
for com in dict_communities:
    list_communities.append(dict_communities[com])

Copilot AI Apr 21, 2026


get_communities() currently returns a fixed-length list with potentially many empty communities (especially with argmax when some communities get no assignments, or with threshold when delta is high). Consider filtering out empty communities before returning to match other algorithms in this module and avoid downstream metrics having to handle empty clusters.

Suggested change
list_communities = []
for com in dict_communities:
    list_communities.append(dict_communities[com])
list_communities = [members for members in dict_communities.values() if members]

self.assertEqual(type(com), list)
if len(com) > 0:
    self.assertEqual(type(com[0]), int)


Copilot AI Apr 21, 2026


big_clam now has new behavior knobs (naive, affiliation_method incl. the overlapping 'threshold' mode), but the test only exercises the default settings and does not cover the threshold-based overlapping output or invalid affiliation_method values. Adding focused assertions for these branches would help prevent regressions.

Suggested change
threshold_coms = algorithms.big_clam(
    g, naive=True, affiliation_method="threshold"
)
self.assertEqual(type(threshold_coms.communities), list)
if len(threshold_coms.communities) > 0:
    for com in threshold_coms.communities:
        self.assertEqual(type(com), list)
        if len(com) > 0:
            self.assertEqual(type(com[0]), int)
with self.assertRaises(ValueError):
    algorithms.big_clam(g, affiliation_method="unsupported")

Comment on lines +921 to +928
coms = big_clam_communities(
    g_original,
    number_communities=dimensions,
    iterations=iterations,
    learning_rate=learning_rate,
    naive=naive,
    affiliation_method=affiliation_method,
)

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

big_clam advertises support for networkx/igraph inputs, but it passes g_original directly into big_clam_communities(), which calls nx.to_numpy_array() and will fail for igraph graphs. Convert the input with convert_graph_formats(g_original, nx.Graph) (and use the converted graph for the internal call) to keep behavior consistent with the other algorithms in this module.

Comment on lines +74 to +79
grad = np.zeros((C,))
for nb in neighbours:
    dotproduct = F[nb].dot(F[i])
    grad += F[nb] * (1 + sigm(dotproduct))
grad -= np.sum(F, axis=0)
grad += F[i]

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gradient_fast() recomputes np.sum(F, axis=0) for every node update, which is O(N*C) per node and can erase the intended speedup on sparse graphs. Consider computing the global sum once per outer iteration (or maintaining it incrementally as F updates) and passing it into gradient_fast.
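The suggested restructuring can be sketched as follows. This is a pure-Python sketch under assumed names (the real code operates on NumPy arrays; `F`, `sigm`, and `neighbours` mirror the PR's internal identifiers, and `sweep` is a hypothetical driver): compute the column sums of F once per outer sweep, then keep them exact with an incremental update as each row of F changes.

```python
import math

def sigm(x):
    # assumed helper matching 1 / (e^x - 1), as in the gradient above
    return 1.0 / math.expm1(x)

def gradient_fast(F, i, neighbours, col_sums):
    # same algebra as the quoted loop, but the global sum of F is passed in
    C = len(F[0])
    grad = [0.0] * C
    for nb in neighbours:
        dot = sum(a * b for a, b in zip(F[nb], F[i]))
        s = 1.0 + sigm(dot)
        for c in range(C):
            grad[c] += F[nb][c] * s
    for c in range(C):
        grad[c] -= col_sums[c]  # replaces per-node np.sum(F, axis=0)
        grad[c] += F[i][c]
    return grad

def sweep(F, adjacency, learning_rate=0.005):
    # one outer iteration: the O(N*C) column sum is computed once here
    col_sums = [sum(col) for col in zip(*F)]
    for i, neighbours in enumerate(adjacency):
        grad = gradient_fast(F, i, neighbours, col_sums)
        for c in range(len(grad)):
            new = max(1e-5, F[i][c] + learning_rate * grad[c])
            col_sums[c] += new - F[i][c]  # incremental update keeps sums exact
            F[i][c] = new
    return F
```

The incremental update avoids both the per-node resummation and the staleness a once-per-sweep snapshot would introduce.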

        for com in np.nonzero(membership)[0].tolist():
            dict_communities[com].append(node)
    else:
        raise ValueError("Method not supported")

Copilot AI Apr 21, 2026


The raised ValueError("Method not supported") is hard to act on and drops the invalid value. Include the provided method and the allowed values (e.g., 'argmax'/'threshold') so callers can quickly correct their input.

Suggested change
raise ValueError("Method not supported")
raise ValueError(
    f"Method '{method}' not supported. Allowed values are: 'argmax', 'threshold'."
)

Comment on lines +94 to +97
F[person] += learning_rate * grad

F[person] = np.maximum(0.001, F[person]) # F should be nonnegative
log_likelihood(F, A)
F[person] = np.maximum(0.00001, F[person]) # F should be nonnegative
# log_likelihood(F, A)

Copilot AI Apr 21, 2026


Lowering the non-negativity floor from 0.001 to 1e-5 increases the chance that F_u^T F_v becomes extremely small, which makes sigm() blow up (division by ~0) and can produce inf/unstable gradients. Consider keeping the previous floor or making sigm() numerically safe (e.g., via expm1 and clipping) to prevent divergence.
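A numerically safe variant might look like this. It is a sketch assuming the module's sigm implements exp(-x) / (1 - exp(-x)) = 1 / (e^x - 1); the floor value is illustrative:

```python
import math

def sigm_safe(x: float, x_min: float = 1e-8) -> float:
    """sigm(x) = exp(-x) / (1 - exp(-x)) = 1 / (e^x - 1), made safe:
    expm1 avoids cancellation for small x, and flooring x keeps the
    result finite when F_u . F_v is nearly zero."""
    return 1.0 / math.expm1(max(x, x_min))
```

With this guard the gradient stays finite even when two membership rows sit at the non-negativity floor, at the cost of capping sigm at roughly 1 / x_min.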

# if len(coms.communities) > 0:
# self.assertEqual(type(coms.communities[0]), list)
# self.assertEqual(type(coms.communities[0][0]), int)
def test_bigClam(self):

Copilot AI Apr 21, 2026


The test name test_bigClam is inconsistent with the rest of this file’s snake_case test naming (e.g., test_markov_clustering, test_multicom). Renaming to test_big_clam improves consistency and discoverability in test reports.

Suggested change
def test_bigClam(self):
def test_big_clam(self):
