Some basic documentation #11
stellaprins
left a comment
Nice start on the docs. I've left some comments and pointed out one cause of the Sphinx CI failure, but there are still a few others.
To summarise:
- I would move the underlying maths (and possibly the maths notation) to specific function docstrings for people who want to dig into the technical details.
- Here instead I would explain basic graph terms (nodes, edges, edge table) and how they map to use cases for this specific topic (brain connectivity network).
- The most important thing for typical users is understanding the required input format, so I’d prioritise clear explanations and simple, domain‑relevant examples of the expected inputs.
I think we probably also want to give examples of very simple Excel files as input. These may be easier for users to generate, and we can convert them to an edge table after receiving the input.
Any deeper mathematical detail can live in the relevant functions or docstrings.
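As a rough illustration of the spreadsheet idea, here is a minimal sketch of converting a simple connectivity matrix into an edge table. The layout is an assumption, not something the package defines: the sheet is taken to be exported as CSV, with source areas down the first column, target areas across the first row, and blank cells for connections that are excluded. The function name `matrix_to_edge_table` and the area names are hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet, exported as CSV (layout is an assumption):
# first row = target areas, first column = source areas,
# blank cell = connection excluded from the network.
SHEET = """\
,alpha,bravo,charlie
alpha,,1,
bravo,2,,5
charlie,3,,
"""


def matrix_to_edge_table(text):
    """Turn a source-by-target matrix into (source, target, weight) rows."""
    rows = list(csv.reader(io.StringIO(text)))
    targets = rows[0][1:]  # header row, skipping the empty corner cell
    edges = []
    for row in rows[1:]:
        source, cells = row[0], row[1:]
        for target, cell in zip(targets, cells):
            if cell.strip():  # skip blank cells entirely
                edges.append((source, target, float(cell)))
    return edges


edge_table = matrix_to_edge_table(SHEET)
```

Blank cells are skipped rather than written as weight 0, so an excluded connection stays distinct from one recorded with zero strength. Mapping area names to node indices would be a separate step.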
> This package acts as an interface between neurological region-connectivity data and network (graph) analysis.
> This allows us to phrase questions such as "how many direct routes are there from region A to region B?" as purely mathematical tasks ("find all non-cyclic paths between region A and region B, sorted by total connection strength") which can be solved using well-established network algorithms.
Mathematical descriptions are probably not necessary for most users. Would it be better to move the more technical explanations into docstrings, where advanced users who want to use the API can still access them?
I went through the examples and these may give a good idea of what can be done:
Simple connectivity queries like
- Which brain areas receive direct input from brain area A?
Ranked connectivity queries like
- Sort all direct inputs to brain area A by connection strength.
- Sort all paths from brain area A to brain area B by number of steps.
Constrained connectivity queries like
- What is the shortest path between brain areas A and B that passes through area C?
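To make these three kinds of query concrete, here is a minimal pure-Python sketch. It does not use the package's API: the `EDGES` dictionary, the area names, and all function names are hypothetical, and "shortest" is taken to mean fewest steps among paths that visit no area twice.

```python
# Hypothetical toy network: (source, target) -> connection strength.
# Area names and weights are made up for illustration only.
EDGES = {
    ("A", "B"): 1.0,
    ("B", "A"): 2.0,
    ("B", "C"): 5.0,
    ("C", "A"): 3.0,
    ("D", "A"): 4.0,
}


def direct_inputs(edges, area):
    """Areas with a direct connection into `area`, strongest first."""
    inputs = [(src, w) for (src, dst), w in edges.items() if dst == area]
    return sorted(inputs, key=lambda pair: pair[1], reverse=True)


def simple_paths(edges, start, end):
    """All paths from `start` to `end` that visit no area twice, fewest steps first."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    found = []

    def walk(path):
        current = path[-1]
        if current == end:
            found.append(path)
            return
        for nxt in successors.get(current, []):
            if nxt not in path:  # never revisit an area
                walk(path + [nxt])

    walk([start])
    return sorted(found, key=len)


def shortest_path_via(edges, start, end, via):
    """Fewest-step simple path from `start` to `end` through `via`, or None."""
    candidates = [p for p in simple_paths(edges, start, end) if via in p]
    return candidates[0] if candidates else None
```

For example, `direct_inputs(EDGES, "A")` answers the simple and ranked input queries in one call, and `shortest_path_via(EDGES, "B", "A", "C")` answers the constrained one. Enumerating every simple path is only practical for small networks.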
Yeah, with this paragraph I was trying to "set the scene", as it were, for the package itself. I wanted to include one of these common neurological questions and its equivalent mathematical representation somewhere here, just to give a concrete example of what the package is doing under the hood.
But I've since updated the text (in the latest commit) to emphasise more that the package exists to let neuroscientists ask relevant neuroscience questions, which the package will translate into maths, solve, then translate back, all under the hood. Though you can let me know how well I've done with the new paragraph 😅
> ## Networks
I wonder whether the mathematical notation here may be distracting. I think it would be good to focus on what input we expect and to give a small example that allows users to relate the edge table to a network of connections in the brain.
It would be good to explain that the connectivity network (or graph) consists of:
- Nodes (also called vertices) that represent brain areas
- Edges that represent the connections between them
- Weights that represent the strength of those connections
The entire network can be described by listing its edges. This list is often called an edge list or edge table, and it is what this API uses as input.
Our API expects the edge table as a three‑column CSV file where each row specifies one connection in the form: source node index, target node index, weight.
For example:
0,1,1
1,0,2
1,2,5
2,0,3
This network consists of three brain areas, identified by their node indices 0, 1, and 2.
Direct connections:
- The connection between Area 0 and Area 1 is bidirectional. The connection from Area 0 to Area 1 is weak (strength 1), while the connection from Area 1 back to Area 0 is somewhat stronger (strength 2).
- The strongest connection in this network runs from Area 1 to Area 2 (strength 5).
- Area 2 has a moderate connection that runs back to Area 0 (strength 3).
Indirect connections:
- Area 0 is indirectly connected to Area 2 via Area 1.
- Area 1 is indirectly connected to Area 0 via Area 2, in addition to the direct connection between them.
- Area 2 is indirectly connected to Area 1 via Area 0.
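The reading of this example can be checked with a short sketch that parses the same four rows and lists the strongest edge and the two-step connections. This uses only the Python standard library rather than the package's API; `read_edge_table` and `two_step_connections` are hypothetical helper names.

```python
import csv
import io

# The example edge table from this comment: source, target, weight.
EDGE_TABLE = """\
0,1,1
1,0,2
1,2,5
2,0,3
"""


def read_edge_table(text):
    """Parse edge-table text into {source: {target: weight}}."""
    network = {}
    for source, target, weight in csv.reader(io.StringIO(text)):
        network.setdefault(int(source), {})[int(target)] = float(weight)
    return network


def two_step_connections(network):
    """Return (source, via, target) triples for paths of exactly two edges."""
    triples = []
    for source, targets in network.items():
        for via in targets:
            for target in network.get(via, {}):
                if target != source:  # ignore round trips back to the source
                    triples.append((source, via, target))
    return sorted(triples)


network = read_edge_table(EDGE_TABLE)
strongest = max(
    ((s, t, w) for s, targets in network.items() for t, w in targets.items()),
    key=lambda edge: edge[2],
)
```

Here `strongest` is the edge from Area 1 to Area 2, and `two_step_connections(network)` yields the three indirect connections listed above.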
> I wonder whether the mathematical notation here may be distracting.
I did try and limit the amount of notation, but ultimately I think we need to have some here in order to properly set the scene. This is the "neuroscience to maths" translation page - so we're going to need to establish some notation somewhere.
Including an example is a good idea though, so that after speaking in the abstract we can actually show how to go from neurological data to the network that represents it.
I agree that anything beyond the minimum required can be moved to docstrings (and possibly an "optional" section at the end of this page!). But we still need some notation standard somewhere in order to do this 😅
> By convention, regions that are not connected do not appear in the edge list.
> This is to distinguish regions that aren't connected, from regions that may happen to be connected with a weight of $0$, for example.
This is confusing.
I would write something like:
Connections that are excluded from the edge table can fall into several categories:
- No data: Following an extensive search, no reports were found with data relevant to the connection and suitable for inclusion in the network analysis.
- Unclear: It is unclear from the report if the connection exists or not. It is reasonable to infer that the connection is likely weak (at most), very weak, or absent.
- Evidence of absence: Evidence indicates that the connection does not exist.
- Same origin & termination: Within‑region connections are not included in the network.
I've moved this into its own NOTES section within the example, so there is some extra context. And I have shamelessly stolen your nice listing 😁
three to four components -> following components (because "three to four components" leaves some ambiguity about which component is optional; I am assuming the metadata one?)
specimen -> animal (characteristics like age and sex refer to the living animal before the samples or specimens were collected; specimens are typically preserved animals, samples or tissue)
…be/brainglobe-data-api-connectivity into wgraham/some-basic-docs
I did not know what "save for provision" meant
…erpreting analyses but also for reproducibility; also, on-hand should be written without a hyphen (on hand)
stellaprins
left a comment
@willGraham01 I've added changes to your branch while you were away: some more examples and, mainly, a restructuring of the documentation. I think it's okay to leave in alpha and bravo here (rather than real brain areas).
Great, thanks 👍 Since we've both touched these docs now, let's defer to @adamltyson's judgement of whether what we've written is actually what he's been expecting us to do!
Cleans up the placeholder file in the documentation and adds a semi-useful "maths to neuroscience" dictionary page.
TODOs