
Commit 44b0500

bdpedigo, jwellan1, daxpryce, hugwuoke, and ktwillcode authored
Added density and group connection tests (#1032)
* Created new branch with files from jw-feature branch
* Fix seaborn syntax
* Increment bugfix version
* Cleared many mypy errors
* Corrected some typing and improved test
* More fixes and edits
* Fixing docs issues
* Exclude 3.6.1 I added to line 35. I assumed the asterisk on 3.3 meant all versions of 3.3 ex (3.3.1, 3.3.2) are all excluded from the matplotlib. Is this correct?
* Update setup.cfg
* types need to be AdjacencyMatrix
* bump scipy, which means deprecate python 3.7
* maybe fix labels error
* add beartype
* clean up docs and add a note
* clean up line lengths and indentation in docstring
* black
* fix line lengths
* add intersphinx
* add another intersphinx
* fix doc indentation
* more docstring work
* clarify number of nodes
* add math for null and alt
* remove unused imports
* typo
* add some math and notes and ref
* add ref
* disclaimers
* add some warnings
* try to fix changes from auto toc
* fix isort
* fix typing issues
* Update README.md Removed outdated Zenodo DOI from README.md
* Deleted bad kwargs from tests
* some typing/tests
* fix to non-unity probability ratio
* fix code format
* fix tests
* fix citation
* fix more citation
* remove file
* fix docs
* clean up tests
* docs
* fix up tutorials
* fix notebook
* fix int issue
* run black
* black with updated
* fix notebook math
* fix imports
* clean up tutorial

---------

Co-authored-by: Jeremy Welland <jwellan1@jhmi.edu>
Co-authored-by: Dax Pryce <daxpryce@microsoft.com>
Co-authored-by: hugwuoke <85888975+hugwuoke@users.noreply.github.com>
Co-authored-by: Kartikeya Tripathi <96724863+ktwillcode@users.noreply.github.com>
1 parent 13d0d46 commit 44b0500

17 files changed

Lines changed: 1278 additions & 10 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
 ## `graspologic` is a package for graph statistical algorithms.
-
+<!-- no toc -->
 - [Overview](#overview)
 - [Documentation](#documentation)
 - [System Requirements](#system-requirements)

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@
     "scipy": ("https://docs.scipy.org/doc/scipy", None),
     "seaborn": ("https://seaborn.pydata.org", None),
     "sklearn": ("https://scikit-learn.org/dev", None),
+    "statsmodels": ("https://www.statsmodels.org/stable", None),
 }
 
 intersphinx_disabled_reftypes = []

docs/reference/reference/inference.rst

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ Inference
 Two-graph hypothesis testing
 ----------------------------
 
+.. autofunction:: density_test
+
+.. autofunction:: group_connection_test
+
 .. autofunction:: latent_position_test
 
 .. autofunction:: latent_distribution_test

docs/sphinx-ext/toctree_filter.py

Lines changed: 7 additions & 4 deletions
@@ -1,13 +1,15 @@
 # Copied and modified from https://stackoverflow.com/questions/15001888/conditional-toctree-in-sphinx
 
 import re
+
 from sphinx.directives.other import TocTree
 
 
 def setup(app):
-    app.add_config_value('toc_filter_exclude', [], 'html')
-    app.add_directive('toctree-filt', TocTreeFilt)
-    return {'version': '1.0.0'}
+    app.add_config_value("toc_filter_exclude", [], "html")
+    app.add_directive("toctree-filt", TocTreeFilt)
+    return {"version": "1.0.0"}
+
 
 class TocTreeFilt(TocTree):
     """
@@ -21,7 +23,8 @@ class TocTreeFilt(TocTree):
     form `:secret:ultra-api` or `:draft:new-features` will be excuded from
     the final table of contents. Entries without a prefix are always included.
     """
-    hasPat = re.compile('\s*(.*)$')
+
+    hasPat = re.compile("\s*(.*)$")
 
     # Remove any entries in the content that we dont want and strip
     # out any filter prefixes that we want but obviously don't want the

docs/tutorials/index.rst

Lines changed: 2 additions & 0 deletions
@@ -80,6 +80,8 @@ are tutorials for robust statistical hypothesis testing on multiple graphs.
    :maxdepth: 1
    :titlesonly:
 
+   inference/density_test
+   inference/group_connection_test
    inference/latent_position_test
    inference/latent_distribution_test
 

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "9b546925",
+   "metadata": {},
+   "source": [
+    "# Testing Symmetry of Two Networks with the Density Test\n",
+    "The \"inference\" module of graspologic contains functions that enable quantitative comparison of two networks to assess whether they are statistically similar. This \"similarity\" can be assessed in a few different ways, depending on the details of the networks to be compared and the preferences of the user. \n",
+    "\n",
+    "The simplest test that can be performed is the density test, which is based upon the Erdos-Renyi model. Under this model, it is assumed that the probability of an edge between any two nodes of the network is some constant, p. To compare two networks, then, the question is whether the edge probability for the first network is different from the edge probability for the second network. This test can be performed easily with the inference module, and the procedure is described in greater detail below. "
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "75c9ea1b",
+   "metadata": {},
+   "source": [
+    "## The Erdos-Renyi (ER) model\n",
+    "The [**Erdos-Renyi (ER) model**\n",
+    "](https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model)\n",
+    "is one of the simplest network models. This model treats\n",
+    "the probability of each potential edge in the network occurring as the same. In\n",
+    "other words, all edges between any two nodes are equally likely.\n",
+    "\n",
+    "```{admonition} Math\n",
+    "Let $n$ be the number of nodes. We say that for all $(i, j), i \\neq j$, with $i$ and\n",
+    "$j$ both running\n",
+    "from $1$ to $n$, the probability of the edge $(i, j)$ occurring is:\n",
+    "\n",
+    "$$ P[A_{ij} = 1] = p_{ij} = p $$\n",
+    "\n",
+    "where $p$ is the global connection probability.\n",
+    "\n",
+    "Each element of the adjacency matrix $A$ is then sampled independently according to a\n",
+    "[Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution):\n",
+    "\n",
+    "$$ A_{ij} \\sim Bernoulli(p) $$\n",
+    "\n",
+    "For a network modeled as described above, we say it is distributed\n",
+    "\n",
+    "$$ A \\sim ER(n, p) $$\n",
+    "\n",
+    "```\n",
+    "\n",
+    "Thus, for this model, the only parameter of interest is the global connection\n",
+    "probability, $p$. This is sometimes also referred to as the **network density**."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "27daff33",
+   "metadata": {},
+   "source": [
+    "## Testing under the ER model\n",
+    "In order to compare two networks $A^{(L)}$ and $A^{(R)}$ under this model, we\n",
+    "simply need to compute their network densities ($p^{(L)}$ and $p^{(R)}$), and then\n",
+    "run a statistical test to see if these densities are significantly different.\n",
+    "\n",
+    "```{admonition} Math\n",
+    "Under this\n",
+    "model, the total number of edges $m$ comes from a $Binomial(n(n-1), p)$ distribution,\n",
+    "where $n$ is the number of nodes and $n(n-1)$ counts the ordered pairs $(i, j)$ with $i \\neq j$. This is because the number of edges is the sum of\n",
+    "independent Bernoulli trials with the same probability. If $m^{(L)}$ is the number of\n",
+    "edges in the first\n",
+    "(left) network, and $m^{(R)}$ is the number of edges in the second (right) network, then we have:\n",
+    "\n",
+    "$$m^{(L)} \\sim Binomial(n^{(L)}(n^{(L)} - 1), p^{(L)})$$\n",
+    "\n",
+    "and independently,\n",
+    "\n",
+    "$$m^{(R)} \\sim Binomial(n^{(R)}(n^{(R)} - 1), p^{(R)})$$\n",
+    "\n",
+    "To compare the two networks, we are just interested in a comparison of $p^{(L)}$ vs.\n",
+    "$p^{(R)}$. Formally, we are testing:\n",
+    "\n",
+    "$$H_0: p^{(L)} = p^{(R)}, \\quad H_a: p^{(L)} \\neq p^{(R)}$$\n",
+    "\n",
+    "Fortunately, the problem of testing for equal proportions is well studied.\n",
+    "Using graspologic.inference, we can conduct this comparison using either\n",
+    "Fisher's exact test or the chi-squared test by using method=\"fisher\" or\n",
+    "method=\"chi2\", respectively. In this example, we use Fisher's exact test.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e9a74b29",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-04-19T19:39:17.453921Z",
+     "iopub.status.busy": "2022-04-19T19:39:17.453693Z",
+     "iopub.status.idle": "2022-04-19T19:39:24.698064Z",
+     "shell.execute_reply": "2022-04-19T19:39:24.697216Z"
+    },
+    "tags": [
+     "hide-input"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "from graspologic.inference.density_test import density_test\n",
+    "from graspologic.simulations import er_np\n",
+    "from graspologic.plot import heatmap\n",
+    "\n",
+    "np.random.seed(8888)\n",
+    "\n",
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "6895aedd",
+   "metadata": {},
+   "source": [
+    "## Performing the Density Test\n",
+    "\n",
+    "To illustrate the density test, we first randomly generate two networks of known density and then compare them using the test."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a682bc4a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "A1 = er_np(500, 0.6)\n",
+    "A2 = er_np(400, 0.8)\n",
+    "heatmap(A1, title='Adjacency Matrix for Network 1')\n",
+    "heatmap(A2, title='Adjacency Matrix for Network 2')"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "18611379",
+   "metadata": {},
+   "source": [
+    "Visibly, these networks have very different densities. We can statistically confirm this difference by conducting a density test."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "44d3204c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "stat, pvalue, er_misc = density_test(A1, A2)\n",
+    "print(pvalue)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "-all",
+   "main_language": "python",
+   "notebook_metadata_filter": "-all"
+  },
+  "kernelspec": {
+   "display_name": "Python 3.9.5 ('venv')",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.5"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "7b0fa2133086a4b245bc2fc6826174141053e0208e756a5fae09980b942619c9"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
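
The tutorial added in this commit describes the density test as a comparison of two binomial proportions. As a rough, standalone illustration of that idea (not graspologic's implementation, and not part of this commit), the sketch below uses only the Python standard library. It samples two directed, loopless ER networks, estimates each density as m / (n(n-1)), and applies a Pearson chi-squared test without continuity correction. The helper names `sample_er`, `density`, and `chi2_two_proportions` are hypothetical, and the network sizes are smaller than the notebook's so the pure-Python loops stay fast.

```python
import math
import random


def sample_er(n, p, rng):
    """Sample a directed, loopless ER(n, p) adjacency matrix as nested lists."""
    return [
        [1 if i != j and rng.random() < p else 0 for j in range(n)]
        for i in range(n)
    ]


def density(adjacency):
    """Return (edge count m, possible edges n(n-1), estimated density)."""
    n = len(adjacency)
    m = sum(sum(row) for row in adjacency)
    possible = n * (n - 1)
    return m, possible, m / possible


def chi2_two_proportions(m_left, possible_left, m_right, possible_right):
    """Pearson chi-squared test (1 d.o.f., no continuity correction).

    Assumes every row and column of the 2x2 table has a nonzero sum.
    """
    # Rows: networks; columns: edges present, edges absent.
    table = [
        [m_left, possible_left - m_left],
        [m_right, possible_right - m_right],
    ]
    total = possible_left + possible_right
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][k] + table[1][k] for k in range(2)]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    # Chi-squared survival function with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2))
    return stat, pvalue


rng = random.Random(8888)  # illustrative seed, chosen to mirror the notebook
left = sample_er(60, 0.6, rng)
right = sample_er(50, 0.8, rng)
m_l, poss_l, p_l = density(left)
m_r, poss_r, p_r = density(right)
stat, pvalue = chi2_two_proportions(m_l, poss_l, m_r, poss_r)
print(f"p_left={p_l:.3f}, p_right={p_r:.3f}, stat={stat:.1f}, pvalue={pvalue:.3g}")
```

With true densities of 0.6 and 0.8, the estimated densities should land close to those values and the p-value should be very small, matching the tutorial's claim that the difference can be confirmed statistically. For real analyses, the `density_test` function documented in this commit is the intended entry point.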
