Commit 536effe

Modernize CLI: Consolidate redundant scripts, implement robust CLI, add pyproject.toml, and enhance documentation with type hints and docstrings
1 parent 164a41d commit 536effe

17 files changed

Lines changed: 665 additions & 1215 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,2 +1,5 @@
 /data
 /venv
+__pycache__/
+*.py[cod]
+.pytest_cache/

README.md

Lines changed: 62 additions & 9 deletions
@@ -1,23 +1,76 @@
 # Feather RDF Mapper
 
-A simple mapper to map and convert .feather data files into RDF triples in a specified ontology, specifically in the N-Triples format.
+A robust CLI tool to map and convert `.feather` data files into RDF triples (N-Triples format) based on a specified ontology.
+
+## Features
+
+- **Flexible Input**: Process a single `.feather` file or an entire directory recursively.
+- **Metric Filtering**: Filter for specific metrics (e.g., `environment.temperature`, `org.dyamand.types.health.SpO2`).
+- **Resampling**: Support for resampling data at a specified rate (in seconds).
+- **Event Limiting**: Limit the number of processed events for quick testing.
+- **Modern Packaging**: Uses `pyproject.toml` for easy installation and dependency management.
+
+## Installation
+
+It is highly recommended to use a virtual environment.
+
+```bash
+# Clone the repository
+git clone <repo-url>
+cd stream-aggregator-evaluation-mapper
+
+# Install the package in editable mode
+pip install -e .
+```
+
+This will install all necessary dependencies and provide the `feather-mapper` command globally in your environment.
 
 ## Usage
 
-Note: It is highly advised to use a venv with the required packages installed, as described in the [requirements](./requirements.txt) file.
+You can run the mapper using the installed `feather-mapper` command:
+
+```bash
+feather-mapper -i <input_path> -o <output_file> [options]
+```
+
+Alternatively, you can still run it as a script:
+
+```bash
+python3 mapper/cli.py -i <input_path> -o <output_file> [options]
+```
+
+### Arguments
+
+- `-i`, `--input`: (Required) Path to a single `.feather` file or a directory containing `.feather` files.
+- `-o`, `--output`: (Required) Path to the output `.nt` (RDF) file.
+- `-m`, `--metrics`: (Optional) Space-separated list of metrics to filter by.
+- `-n`, `--limit`: (Optional) Maximum number of total events to process.
+- `-s`, `--sample-rate`: (Optional) Resampling rate in seconds.
+
+### Examples
+
+**Process a single file for specific metrics:**
+```bash
+feather-mapper -i data/participant6.feather -o output/results.nt -m environment.temperature wearable.skt
+```
+
+**Process a directory with resampling and an event limit:**
+```bash
+feather-mapper -i /path/to/dataset/ -o output/spo2_data.nt -m org.dyamand.types.health.SpO2 -n 1000 -s 2.0
+```
+
+## Testing
+
+To run the unit tests:
 
-- Specify the name of the .feather file and the .nt file in the `mapper.py` file inside the mapper folder.
-- Run the file as a script with,
 ```bash
-python3 mapper.py
+python3 -m unittest discover mapper/tests
 ```
-- The .nt file will be created.
 
 ## License
 
-This code is copyrighted by [Ghent University - imec](https://www.ugent.be/ea/idlab/en) and
-released under the [MIT Licence](./LICENCE)
+This code is copyrighted by [Ghent University - imec](https://www.ugent.be/ea/idlab/en) and released under the [MIT Licence](./LICENCE).
 
 ## Contact
 
-For any questions, please contact [Kush](mailto:kushagrasingh.bisen@ugent.be).
+For any questions, please contact [Kush](mailto:kushagrasingh.bisen@ugent.be).
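
Note on the "Flexible Input" feature: the README says `-i` accepts either a single `.feather` file or a directory that is searched recursively. The code that does this lives in `mapper/process.py`, which is not shown in this part of the diff, so the following is only a minimal sketch of how such input resolution could look. The helper name `collect_feather_files` is hypothetical and is not taken from the commit.

```python
from pathlib import Path
from typing import List


def collect_feather_files(input_path: str) -> List[Path]:
    """Hypothetical helper: resolve the -i/--input value to a list of .feather files."""
    path = Path(input_path)
    if path.is_file():
        # A single file is accepted only if it carries the .feather suffix.
        return [path] if path.suffix == ".feather" else []
    if path.is_dir():
        # rglob walks the directory tree recursively; sorting keeps runs reproducible.
        return sorted(path.rglob("*.feather"))
    raise FileNotFoundError(f"Input path does not exist: {input_path}")
```

Returning a sorted list is a design choice of this sketch: it makes the order in which files are processed deterministic, which matters when an event limit (`-n`) cuts processing short.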

mapper/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Mapper package

mapper/cli.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+import argparse
+import sys
+import os
+
+# Add the parent directory to sys.path to allow running this script directly
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from mapper.process import process_input
+from mapper.core import logger
+
+def main() -> None:
+    """
+    Entrypoint for the Feather to RDF Mapper CLI.
+    Parses command line arguments and initiates the data processing.
+    """
+    parser = argparse.ArgumentParser(description="Feather to RDF (N-Triples) Mapper CLI")
+
+    parser.add_argument("-i", "--input", required=True,
+                        help="Path to a single .feather file or a directory containing .feather files")
+
+    parser.add_argument("-o", "--output", required=True,
+                        help="Path to the output .nt (RDF) file")
+
+    parser.add_argument("-m", "--metrics", nargs="+",
+                        help="List of metrics to filter by (e.g., environment.temperature wearable.skt)")
+
+    parser.add_argument("-n", "--limit", type=int,
+                        help="Maximum number of total events to process")
+
+    parser.add_argument("-s", "--sample-rate", type=float,
+                        help="Resampling rate in seconds (optional)")
+
+    args = parser.parse_args()
+
+    # Ensure output directory exists
+    output_dir = os.path.dirname(args.output)
+    if output_dir and not os.path.exists(output_dir):
+        os.makedirs(output_dir)
+
+    try:
+        process_input(
+            input_path=args.input,
+            output_file=args.output,
+            metrics=args.metrics,
+            limit=args.limit,
+            sample_rate=args.sample_rate
+        )
+    except KeyboardInterrupt:
+        logger.info("Process interrupted by user.")
+        sys.exit(0)
+    except Exception as e:
+        logger.error(f"An unexpected error occurred: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
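
Note on checking the argument handling: because `main()` builds its parser inline, the options are easiest to exercise with a standalone copy. The sketch below mirrors the five options from `mapper/cli.py` in a hypothetical `build_parser()` function (the function name is an assumption; the commit does not define it) and parses the README's directory example without invoking the mapper itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical factory mirroring the options defined in mapper/cli.py."""
    parser = argparse.ArgumentParser(prog="feather-mapper")
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--output", required=True)
    parser.add_argument("-m", "--metrics", nargs="+")
    parser.add_argument("-n", "--limit", type=int)
    parser.add_argument("-s", "--sample-rate", type=float)
    return parser


# Arguments taken from the README's "directory with resampling" example.
args = build_parser().parse_args([
    "-i", "/path/to/dataset/",
    "-o", "output/spo2_data.nt",
    "-m", "org.dyamand.types.health.SpO2",
    "-n", "1000",
    "-s", "2.0",
])

# argparse exposes --sample-rate as the attribute sample_rate.
print(args.metrics, args.limit, args.sample_rate)
```

Optional arguments that are omitted come back as `None`, which is the value `main()` then forwards to `process_input`.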
