Commit 4cee395

Initial MS-DIAL EIC reader
0 parents  commit 4cee395

14 files changed

Lines changed: 1058 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 31 additions & 0 deletions

name: CI

on:
  push:
  pull_request:

jobs:
  python:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run tests
        run: PYTHONPATH=src python -m unittest discover -s tests -v

  rust:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Check formatting
        working-directory: rust
        run: cargo fmt --check
      - name: Run tests
        working-directory: rust
        run: cargo test

.gitignore

Lines changed: 24 additions & 0 deletions

.DS_Store

# Python
__pycache__/
*.py[cod]
.pytest_cache/
.mypy_cache/
.ruff_cache/
.venv/
venv/
build/
dist/
*.egg-info/

# Rust
target/

# Local data
*.aef
*.mzML
*.raw
*.RAW
*.ibd
*.ibf

CONTRIBUTING.md

Lines changed: 20 additions & 0 deletions

# Contributing

Contributions are welcome, especially compatibility reports against real
MS-DIAL versions.

Useful contributions:

- small synthetic `.EIC.aef` fixtures;
- examples from different MS-DIAL 5 releases;
- bug reports with the MS-DIAL version and acquisition mode;
- tests for edge cases such as ion mobility, RI mode, empty traces, or large
  traces;
- documentation improvements.

Please avoid committing large raw data files. If a real example is needed,
prefer a tiny synthetic archive plus a short note explaining what real-world
case it represents.

This project is unofficial. If MS-DIAL adds an official export or API for
aligned EIC traces, compatibility with that official path should take priority.

LICENSE

Lines changed: 21 additions & 0 deletions

MIT License

Copyright (c) 2026 Francois-Xavier Lehr

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 157 additions & 0 deletions

# msdial-eic-reader

Unofficial reader for MS-DIAL aligned EIC archives (`AlignResult*.EIC.aef`).

The goal is simple: inspect the chromatogram traces and integration boundaries
that MS-DIAL already used, so that a downstream QC dashboard can show them
without recomputing EICs from raw files.

This project is not affiliated with MS-DIAL, RIKEN, or the MS-DIAL maintainers.
The `.EIC.aef` format appears to be an internal binary format and may change in
future MS-DIAL releases.

## Why this exists

MS-DIAL can show aligned chromatograms in the GUI, but downstream tools often
only receive alignment tables and summary values. For QC, it is useful to see
the per-sample traces behind an aligned feature:

- did every replicate integrate the expected peak;
- did one sample pick a smaller neighboring peak;
- are the left, apex, and right markers plausible;
- can an operator review a batch without reopening the full MS-DIAL GUI.

Recomputing EICs outside MS-DIAL is not ideal for this use case, because the
external viewer may show traces or peak boundaries that differ from what
MS-DIAL actually integrated.

## What it reads

The reader targets MS-DIAL 5 `CSS1` EIC archives named like:

```text
AlignResult*.EIC.aef
```

For each aligned feature and sample trace, the JSON output includes:

- center values: RT, RI, m/z, drift, and chromatogram x-axis type;
- sample/file ID;
- peak markers: `left`, `top`, and `right`;
- EIC points as `{ "x": ..., "intensity": ... }`;
- original and returned point counts.

Feature indices are zero-based archive indices. In observed outputs they match
the row order of the paired MS-DIAL alignment table after its header, but treat
that as a practical convention rather than an official guarantee.

## Install from source

```bash
git clone https://github.com/Fraximov/msdial-eic-reader.git
cd msdial-eic-reader
python -m pip install .
```

## Python usage

```python
from msdial_eic_reader import MsdialEicArchive

archive = MsdialEicArchive("AlignResult-test.EIC.aef")

print(archive.summary())

feature = archive.read_feature(42, max_points_per_trace=900)
for peak in feature["peaks"]:
    print(peak["file_id"], peak["left"], peak["top"], peak["right"])
```
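
Building on the loop above, here is a minimal sketch of the "are the left, apex, and right markers plausible?" check from the motivation list. It relies only on the feature dictionary shape shown above and in `docs/css1-eic-aef-format.md` (`peaks`, `file_id`, `left`, `top`, `right`, `points`). The helper name `flag_implausible_markers` and its two rules (the boundaries must bracket the apex, and the apex should fall inside the returned points' x-range) are illustrative assumptions, not part of the package.

```python
from typing import Any, Dict, List


def flag_implausible_markers(feature: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical QC helper: list traces whose peak markers look inconsistent."""
    issues = []
    for peak in feature["peaks"]:
        left, top, right = peak["left"], peak["top"], peak["right"]
        problems = []
        # The integration boundaries should bracket the apex.
        if not left <= top <= right:
            problems.append("markers out of order")
        # The apex should sit inside the x-range of the returned points.
        xs = [point["x"] for point in peak.get("points", [])]
        if xs and not min(xs) <= top <= max(xs):
            problems.append("apex outside trace x-range")
        if problems:
            issues.append({"file_id": peak["file_id"], "problems": problems})
    return issues


# Hand-written feature in the documented JSON shape: file 1 has left > top.
example = {
    "peaks": [
        {"file_id": 0, "left": 5.02, "top": 5.11, "right": 5.20,
         "points": [{"x": 5.0, "intensity": 1000.0}, {"x": 5.2, "intensity": 900.0}]},
        {"file_id": 1, "left": 5.30, "top": 5.11, "right": 5.20,
         "points": [{"x": 5.0, "intensity": 800.0}, {"x": 5.2, "intensity": 700.0}]},
    ]
}
print(flag_implausible_markers(example))
# [{'file_id': 1, 'problems': ['markers out of order']}]
```

In a dashboard, the returned list could drive a per-sample warning badge instead of requiring an operator to inspect every trace.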

## Command line usage

Print archive metadata:

```bash
msdial-eic-reader summary AlignResult-test.EIC.aef
```

Read one aligned feature:

```bash
msdial-eic-reader feature AlignResult-test.EIC.aef 42 --max-points 900
```

Read a small window of features:

```bash
msdial-eic-reader window AlignResult-test.EIC.aef 40 5 --max-points 900
```

Set `--max-points 0` to return all points. The default downsamples long traces
to keep browser dashboards responsive.
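
The downsampling method itself is not described here. As one plausible approach, the sketch below keeps at most `max_points` points by picking evenly spaced indices and always retaining the first and last point; `max_points <= 0` returns everything, mirroring the `--max-points 0` convention. The function name `downsample_points` and the stride strategy are assumptions for illustration, not necessarily what the reader does.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, intensity)


def downsample_points(points: Sequence[Point], max_points: int) -> List[Point]:
    """Hypothetical helper: keep at most max_points evenly spaced points.

    max_points <= 0 returns every point, mirroring the `--max-points 0` convention.
    When max_points >= 2, the first and last points are always kept.
    """
    n = len(points)
    if max_points <= 0 or n <= max_points:
        return list(points)
    if max_points == 1:
        return [points[0]]
    # Spread max_points indices evenly over 0..n-1. Because step > 1 here,
    # the rounded indices are strictly increasing and distinct.
    step = (n - 1) / (max_points - 1)
    indices = [round(i * step) for i in range(max_points)]
    return [points[i] for i in indices]


trace = [(i * 0.01, float(i)) for i in range(1000)]
reduced = downsample_points(trace, 900)
print(len(reduced), reduced[0], reduced[-1])
```

Plain stride sampling can skip a narrow apex between kept points; for QC views, a peak-preserving variant (for example, keeping the minimum and maximum intensity in each bucket) may be a better fit.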

## Rust CLI

A dependency-free Rust CLI is included in `rust/` for batch or dashboard use:

```bash
cd rust
cargo run -- --file ../AlignResult-test.EIC.aef --index 42 --max-points 900
```

It also supports window reads, which are useful when a UI slider moves through
nearby features:

```bash
cargo run -- --file ../AlignResult-test.EIC.aef --start 40 --count 11 --max-points 900
```

## Format notes

The inferred binary layout is documented in
[`docs/css1-eic-aef-format.md`](docs/css1-eic-aef-format.md).

Short version:

```text
10 bytes version string, null-padded ASCII, observed: CSS1
int32 feature count
int64[] absolute offsets to aligned feature payloads

feature payload:
float32 center RT
float32 center RI
float32 center m/z
float32 center drift
uint8 chromatogram x-axis type
int32 sample trace count

sample trace:
int32 file ID
int32 point count
float32 top/apex x position
float32 left boundary x position
float32 right boundary x position
float32[] repeated x, intensity pairs
```

All numeric values are little-endian in observed files.
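
Following the short layout above, a tiny synthetic archive (the kind of fixture the Contributing section asks for) can be written with Python's standard `struct` module. This is a sketch for building test fixtures under the inferred layout, not an MS-DIAL writer; `build_css1_archive` and its tuple-based input shape are hypothetical. The `<` prefix forces little-endian byte order with no alignment padding, which matters because the `uint8` x-axis type leaves the following `int32` unaligned.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical input shapes for a fixture builder (not part of the package API).
Trace = Tuple[int, float, float, float, Sequence[Tuple[float, float]]]  # file_id, top, left, right, points
Feature = Tuple[float, float, float, float, int, Sequence[Trace]]       # rt, ri, mz, drift, x-axis type, traces


def build_css1_archive(features: Sequence[Feature]) -> bytes:
    """Serialize features using the inferred CSS1 layout (little-endian, unpadded)."""
    payloads: List[bytes] = []
    for rt, ri, mz, drift, axis_type, traces in features:
        parts = [struct.pack("<ffffBi", rt, ri, mz, drift, axis_type, len(traces))]
        for file_id, top, left, right, points in traces:
            parts.append(struct.pack("<iifff", file_id, len(points), top, left, right))
            for x, intensity in points:
                parts.append(struct.pack("<ff", x, intensity))
        payloads.append(b"".join(parts))

    header = b"CSS1".ljust(10, b"\x00") + struct.pack("<i", len(features))
    # Payloads start right after the header and the int64 offset table.
    offset = len(header) + 8 * len(features)
    offsets = []
    for payload in payloads:
        offsets.append(offset)
        offset += len(payload)
    return header + struct.pack(f"<{len(offsets)}q", *offsets) + b"".join(payloads)


archive = build_css1_archive([
    (5.12, 0.0, 302.1234, -1.0, 0, [(0, 5.11, 5.02, 5.20, [(5.0, 1000.0), (5.01, 1200.0)])]),
])
print(len(archive))  # 79
```

The 79 bytes break down as 14 header bytes, one 8-byte offset, a 21-byte feature header, a 20-byte trace header, and two 8-byte points.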

## Limitations

- This is an unofficial reader for an internal MS-DIAL file.
- It has been tested only against observed MS-DIAL 5 `CSS1` archives.
- It does not parse the paired alignment table; use the table to map feature
  indices to metabolite names, average RT, average m/z, annotations, and sample
  names.
- It is intended for QC visualization and traceability, not for replacing
  MS-DIAL's integration.

## Contributing

Real-world fixtures are the most useful contribution, but please do not commit
large raw data. Small synthetic `.EIC.aef` examples, version information, and
edge cases are ideal.

If the MS-DIAL project later exposes an official export or API for this data,
this reader should either adapt to that API or clearly point users to it.

docs/css1-eic-aef-format.md

Lines changed: 110 additions & 0 deletions

# CSS1 `.EIC.aef` Format Notes

These notes describe the `CSS1` layout observed in MS-DIAL 5
`AlignResult*.EIC.aef` files.

This is not an official MS-DIAL specification. Treat it as an implementation
note for interoperability and QC tooling.

## Byte order

Observed numeric values are little-endian.

## Archive header

| Offset (bytes) | Type | Description |
| --- | --- | --- |
| 0 | `char[10]` | Null-padded ASCII version string. Observed value: `CSS1`. |
| 10 | `int32` | Number of aligned features in the archive. |
| 14 | `int64[feature_count]` | Absolute byte offsets to feature payloads. |

The offset table allows one aligned feature to be read without loading the full
archive.
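
A minimal sketch of that random access, using only the standard library: read the 14-byte header and the offset table, then seek straight to one payload. The helper names `read_css1_header` and `seek_feature` are hypothetical, and the byte layout is the inferred one from the table above; the `<` struct prefix encodes the observed little-endian byte order.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def read_css1_header(stream: BinaryIO) -> Tuple[str, List[int]]:
    """Read the version string and the int64 offset table (inferred CSS1 layout)."""
    stream.seek(0)
    # 10-byte null-padded ASCII version string at offset 0.
    version = stream.read(10).split(b"\x00", 1)[0].decode("ascii")
    # Little-endian int32 feature count at offset 10.
    (feature_count,) = struct.unpack("<i", stream.read(4))
    if feature_count < 0:
        raise ValueError(f"negative feature count: {feature_count}")
    # One little-endian int64 absolute offset per feature, starting at offset 14.
    table = stream.read(8 * feature_count)
    if len(table) != 8 * feature_count:
        raise ValueError("truncated offset table")
    return version, list(struct.unpack(f"<{feature_count}q", table))


def seek_feature(stream: BinaryIO, offsets: List[int], index: int) -> None:
    """Jump to one feature payload without reading the ones before it."""
    if not 0 <= index < len(offsets):
        raise IndexError(f"feature index {index} out of range")
    stream.seek(offsets[index])


# Synthetic header only: two features at offsets 30 and 51 (payload bytes omitted).
buffer = io.BytesIO(b"CSS1".ljust(10, b"\x00") + struct.pack("<i2q", 2, 30, 51))
version, offsets = read_css1_header(buffer)
print(version, offsets)  # CSS1 [30, 51]
seek_feature(buffer, offsets, 1)
print(buffer.tell())  # 51
```

With a real archive, opening the file in binary mode (`open(path, "rb")`) and passing the file object should work the same way; only the header and the offset table are read up front.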

## Feature payload

Each offset points to one aligned feature payload.

| Type | Description |
| --- | --- |
| `float32` | Center retention time. |
| `float32` | Center retention index. |
| `float32` | Center m/z. |
| `float32` | Center drift value. |
| `uint8` | Chromatogram x-axis type. |
| `int32` | Number of sample traces. |

Observed chromatogram x-axis type mapping:

| Value | Label |
| --- | --- |
| `0` | `rt` |
| `1` | `ri` |
| `2` | `drift` |
| `3` | `mz` |
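
The fixed part of a feature payload is 21 bytes: four `float32`, one `uint8`, one `int32`. The sketch below decodes it with a precompiled `struct.Struct`; note that Python's native (`@`) mode would insert padding after the `uint8`, so the explicit `<` prefix is what keeps the size at 21. `parse_feature_header` and the `unknown:<n>` fallback label are assumptions for illustration.

```python
import struct
from typing import Any, Dict

# Observed x-axis type codes (from the table above); unknown codes become "unknown:<n>".
X_AXIS_TYPES = {0: "rt", 1: "ri", 2: "drift", 3: "mz"}

# 4 x float32 + uint8 + int32, little-endian with no padding: 21 bytes.
FEATURE_HEADER = struct.Struct("<ffffBi")


def parse_feature_header(buf: bytes, offset: int) -> Dict[str, Any]:
    """Decode the fixed-size part of a feature payload starting at `offset`."""
    rt, ri, mz, drift, axis_code, trace_count = FEATURE_HEADER.unpack_from(buf, offset)
    return {
        "center": {
            "rt": rt,
            "ri": ri,
            "mz": mz,
            "drift": drift,
            "main_type": X_AXIS_TYPES.get(axis_code, f"unknown:{axis_code}"),
        },
        "trace_count": trace_count,
        # Byte position where the first sample trace payload begins.
        "traces_offset": offset + FEATURE_HEADER.size,
    }


payload = struct.pack("<ffffBi", 5.12, 0.0, 302.1234, -1.0, 0, 2)
header = parse_feature_header(payload, 0)
print(header["center"]["main_type"], header["trace_count"], header["traces_offset"])  # rt 2 21
```

Values are stored as `float32`, so they come back with single-precision rounding (5.12 reads as roughly 5.1199999); comparisons against alignment-table values are safer with a tolerance.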

## Sample trace payload

Each feature contains one trace payload per sample/file.

| Type | Description |
| --- | --- |
| `int32` | File/sample ID. |
| `int32` | Number of chromatogram points. |
| `float32` | Top/apex x position. |
| `float32` | Left integration boundary x position. |
| `float32` | Right integration boundary x position. |
| `float32`, `float32` (repeated once per point) | Chromatogram point pairs: x value, then intensity. |

The x units depend on the feature's x-axis type. For ordinary LC-MS EICs this is
typically retention time.
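
A sketch of decoding one trace payload under this layout: a 20-byte header followed by `point_count` 8-byte (x, intensity) pairs. Returning the offset just past the trace lets a caller walk the traces of a feature one after another. `parse_trace` is a hypothetical name, and the bounds check is a defensive addition rather than documented behavior.

```python
import struct
from typing import Any, Dict, Tuple

TRACE_HEADER = struct.Struct("<iifff")  # file ID, point count, top, left, right (20 bytes)
POINT = struct.Struct("<ff")            # x, intensity (8 bytes)


def parse_trace(buf: bytes, offset: int) -> Tuple[Dict[str, Any], int]:
    """Decode one sample trace at `offset`; return it and the offset of the next trace."""
    file_id, point_count, top, left, right = TRACE_HEADER.unpack_from(buf, offset)
    if point_count < 0:
        raise ValueError(f"negative point count at byte {offset}")
    offset += TRACE_HEADER.size
    end = offset + point_count * POINT.size
    if end > len(buf):
        raise ValueError("trace runs past end of buffer")
    # iter_unpack walks the x, intensity pairs without manual index arithmetic.
    points = [
        {"x": x, "intensity": intensity}
        for x, intensity in POINT.iter_unpack(buf[offset:end])
    ]
    trace = {"file_id": file_id, "point_count": point_count,
             "top": top, "left": left, "right": right, "points": points}
    return trace, end


two_traces = (
    struct.pack("<iifff", 0, 2, 5.11, 5.02, 5.20) + struct.pack("<4f", 5.0, 1000.0, 5.01, 1200.0)
    + struct.pack("<iifff", 1, 0, 5.12, 5.03, 5.21)
)
first, next_offset = parse_trace(two_traces, 0)
second, end = parse_trace(two_traces, next_offset)
print(first["point_count"], next_offset, second["file_id"], second["points"], end)  # 2 36 1 [] 56
```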

## JSON shape

The Python and Rust readers expose one feature like this (abbreviated: only one
of the two traces and two of its 120 points are shown):

```json
{
  "version": "CSS1",
  "feature_count": 1234,
  "feature_index": 42,
  "center": {
    "rt": 5.12,
    "ri": 0.0,
    "mz": 302.1234,
    "drift": -1.0,
    "main_type": "rt"
  },
  "trace_count": 2,
  "peaks": [
    {
      "file_id": 0,
      "point_count": 120,
      "returned_point_count": 120,
      "top": 5.11,
      "left": 5.02,
      "right": 5.20,
      "points": [
        { "x": 5.0, "intensity": 1000.0 },
        { "x": 5.01, "intensity": 1200.0 }
      ]
    }
  ]
}
```

## Alignment table pairing

The archive does not contain metabolite names or sample names in the observed
layout. Pair it with the MS-DIAL alignment table exported from the same
alignment result.

Practical mapping used by the current reader:

- feature index `0` corresponds to the first data row after the `Alignment ID`
  header in the paired alignment table;
- `file_id` maps to the sample columns after `MS/MS spectrum` in observed
  MS-DIAL alignment tables.

These mappings should be validated against more MS-DIAL versions and acquisition
modes.
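
A minimal sketch of applying both conventions with the standard `csv` module. It assumes a plain, unquoted tab-separated export whose data rows follow the row starting with `Alignment ID`, and that `file_id` 0 is the first column after `MS/MS spectrum`; both come from the observations above, not from an MS-DIAL specification. The helpers `read_alignment_table` and `sample_column_for` are hypothetical, and a too-large `file_id` would silently land on any summary columns that follow the samples.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_alignment_table(handle: TextIO) -> Tuple[List[str], List[List[str]]]:
    """Return the 'Alignment ID' header row and the non-empty data rows below it."""
    # Assumes a plain tab-separated export without quoting.
    rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    header_index = next(
        (i for i, row in enumerate(rows) if row and row[0] == "Alignment ID"), None
    )
    if header_index is None:
        raise ValueError("no 'Alignment ID' header row found")
    data_rows = [row for row in rows[header_index + 1:] if row]
    return rows[header_index], data_rows


def sample_column_for(header: List[str], file_id: int) -> str:
    """Observed convention only: file_id 0 is the first column after 'MS/MS spectrum'."""
    return header[header.index("MS/MS spectrum") + 1 + file_id]


# Heavily simplified synthetic table; real exports have many more columns.
table = io.StringIO(
    "Class\t\t\t\tA\tB\n"
    "Alignment ID\tAverage Rt(min)\tAverage Mz\tMS/MS spectrum\tS1\tS2\n"
    "0\t5.12\t302.1234\t\t1000\t1100\n"
    "1\t6.40\t410.2000\t\t500\t450\n"
)
header, data_rows = read_alignment_table(table)
feature_index, file_id = 1, 0  # as read from the .EIC.aef archive
print(data_rows[feature_index][0], sample_column_for(header, file_id))  # 1 S1
```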
