Addition of coauthorship dataset #180
Merged
6 commits, all by Hellsegga:

- 8737269 Sorting when converting set to tuple
- 6771f64 Merge branch 'main' of github.com:Hellsegga/TopoNetX
- bd93e4e Addition of coauthorship dataset
- 2d986a3 Suggested fixes on PR
- 4067f4c Merge branch 'pyt-team:main' into main
- 2beee70 Changed names and completed description of dataset
Binary file not shown.

    @@ -1,5 +1,6 @@
     """Various examples of named graphs represented as complexes."""
     from pathlib import Path
     from typing import Literal, overload
     import networkx as nx
    @@ -9,7 +10,9 @@
     from toponetx.algorithms.spectrum import hodge_laplacian_eigenvectors
     from toponetx.transform.graph_to_simplicial_complex import graph_to_clique_complex
    -__all__ = ["karate_club"]
    +__all__ = ["karate_club", "coauthorship"]
     DIR = Path(__file__).parent
     @overload
    @@ -135,3 +138,41 @@ def karate_club(
             return cx
         raise ValueError(f"complex_type must be 'simplicial' or 'cell' got {complex_type}")
    +
    +
    +def coauthorship() -> SimplicialComplex:
    +    """Load the coauthorship network from [SNN20] as a simplicial complex.

Hellsegga marked this conversation as resolved.

    +
    +    The coauthorship network is a simplicial complex where a paper with k authors is represented by a (k-1)-simplex.
    +    The dataset is pre-processed as in [SNN20]. From the Semantic Scholar Open Research Corpus 80 papers with number of citations between 5 and 10 were sampled.
    +    The papers constitute simplices in the complex, which is completed with subsimplices (seen as collaborations between subsets of authors) to form a simplicial complex.
    +    An attribute named "citations" is added to each simplex, corresponding to the sum of citations of all papers on which the authors represented by the simplex collaborated.
    +    The resulting simplicial complex is of dimension 10 and contains 24552 simplices in total. See [SNN20] for a more detailed description of the dataset.
    +
    +    References
    +    ----------
    +    [SNN20] Stefania Ebli, Michael Defferrard and Gard Spreemann.

Contributor comment: I do not think the description of the dataset is clear enough.

    +    Simplicial Neural Networks.
    +    Topological Data Analysis and Beyond workshop at NeurIPS.
    +    https://arxiv.org/abs/2010.03633
    +    https://github.com/stefaniaebli/simplicial_neural_networks
    +
    +    Returns
    +    -------

Hellsegga marked this conversation as resolved.

    +    SimplicialComplex
    +        The simplicial complex comes with the attribute "citations", the number of citations attributed to the given collaborations of k authors.
    +    """
    +    coauthorship = np.load(DIR / "coauthorship.npy", allow_pickle=True)
    +
    +    simplices = []
    +    for dim in range(len(coauthorship) - 1, -1, -1):
    +        simplices += [list(el) for el in coauthorship[dim].keys()]
    +
    +    sc = SimplicialComplex(simplices)
    +
    +    for i in range(len(coauthorship)):
    +        dic = {tuple(sorted(k)): v for k, v in coauthorship[i].items()}
    +        sc.set_simplex_attributes(dic, name="citations")
    +
    +    return sc

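The loader in the diff treats `coauthorship.npy` as a sequence indexed by dimension, where each entry maps an author set to a citation count. The sketch below mimics that layout on hypothetical toy data, without TopoNetX or the real file. `build_layers` is only our reading of the docstring's construction (every sub-collaboration of a paper inherits that paper's citations, summed over papers), and `simplices_top_down`, `canonical_attributes`, and `is_downward_closed` are illustrative helpers, not TopoNetX APIs. Using frozensets as the stored keys is also an assumption.

```python
from collections import defaultdict
from itertools import combinations

# ASSUMPTION: layer d maps each d-simplex (a frozenset of author IDs) to a
# citation count. This mirrors how the PR iterates the loaded data but is
# not taken from the real coauthorship.npy file.
Layer = dict[frozenset[int], int]


def build_layers(papers: list[tuple[tuple[int, ...], int]]) -> list[Layer]:
    """Hypothetical reconstruction: (authors, citations) -> per-dimension maps.

    Every non-empty subset of a paper's authors receives that paper's
    citations, so a face's value is the sum over all papers containing it.
    """
    max_dim = max(len(set(authors)) for authors, _ in papers) - 1
    layers = [defaultdict(int) for _ in range(max_dim + 1)]
    for authors, citations in papers:
        unique = sorted(set(authors))
        for r in range(1, len(unique) + 1):
            for face in combinations(unique, r):
                layers[r - 1][frozenset(face)] += citations
    return [dict(layer) for layer in layers]


def simplices_top_down(layers: list[Layer]) -> list[list[int]]:
    """Walk from the top dimension down to vertices, like the PR's loop.

    Unlike the PR (which uses list(el)), vertices are sorted for determinism.
    """
    simplices: list[list[int]] = []
    for dim in range(len(layers) - 1, -1, -1):
        simplices += [sorted(s) for s in layers[dim]]
    return simplices


def canonical_attributes(layers: list[Layer]) -> list[dict[tuple[int, ...], int]]:
    """Re-key each layer by sorted tuples, as the PR does before setting attributes."""
    return [{tuple(sorted(k)): v for k, v in layer.items()} for layer in layers]


def is_downward_closed(simplices: list[list[int]]) -> bool:
    """Check that every proper, non-empty face of each simplex is also present."""
    present = {frozenset(s) for s in simplices}
    return all(
        frozenset(face) in present
        for s in simplices
        for r in range(1, len(s))
        for face in combinations(s, r)
    )


# Toy usage with made-up papers: (author IDs, citation count).
papers = [((0, 1, 2), 5), ((0, 1), 7), ((3,), 10)]
layers = build_layers(papers)
simplices = simplices_top_down(layers)
attrs = canonical_attributes(layers)
print(len(simplices))                 # 8
print(attrs[1][(0, 1)])               # 12
print(is_downward_closed(simplices))  # True
```

Sorting before building tuple keys gives each simplex one canonical key regardless of set iteration order, which appears to be the intent of the commit "Sorting when converting set to tuple". The closure check captures what "completed with subsimplices" requires: a paper with k authors contributes all 2^k - 1 non-empty author subsets.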
Review comment: I am not sure adding this dataset to `graph` is justified; maybe it needs a different file altogether. Can you justify your choice of `graph`? Other people might not find it intuitive either.
Reply: Yes, can you suggest a name for the file? I'll be away for a week; I can fix the code based on the comments when I'm back.