Commit 9f29e04

**feat: add reproducibility features, headless mode docs, and multi-platform CI**
**Added**
* Reproducibility support: `seed` parameter added to Bayesian statistical tests (`bayesian_sign_test`, `bayesian_signed_rank_test`)
* Reproducibility support: `seed` parameter added to `HistoPlot` for deterministic jitter
* Multi-platform GitHub Actions CI (Windows, Ubuntu, macOS)
* Smoke tests for CLI and API across all platforms
* New environment files: `requirements.txt`, `requirements-dev.txt`, `environment.yml`
* Full headless mode documentation and examples (Python, shell, Windows batch)
* New test suite `test_bayesian_seed.py` for seed reproducibility
* New reproducibility documentation page (`docs/usage/reproducibility.rst`)

**Changed**
* Updated `test_bayesian.py` to include seed usage and maintain backward compatibility
* Enhanced README with reproducibility and headless mode sections
* Refined documentation structure to integrate reproducibility guidance

**Fixed**
* Ensured all random operations can be made deterministic for fully reproducible results
1 parent 47bdf42 commit 9f29e04

17 files changed

Lines changed: 719 additions & 82 deletions
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
+# Multi-platform CI workflow for Software X publication requirements
+# Tests on Windows, Linux, and macOS with smoke tests
+
+name: Multi-Platform Tests
+
+on:
+  push:
+    branches:
+      - "main"
+      - "develop"
+      - "feature/**"
+  pull_request:
+    branches: [ "main" ]
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    name: Test on ${{ matrix.os }} with Python ${{ matrix.python-version }}
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+        python-version: ["3.10", "3.11", "3.12"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          uv pip install --system -e .[test,html]
+
+      - name: Run unit tests with coverage
+        run: |
+          coverage run -m unittest discover tests
+          coverage report
+          coverage xml
+
+      - name: Upload coverage to artifact
+        uses: actions/upload-artifact@v4
+        if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10'
+        with:
+          name: coverage-report
+          path: coverage.xml
+
+  smoke-tests:
+    name: Smoke Tests on ${{ matrix.os }}
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install SAES
+        run: |
+          uv pip install --system -e .[test]
+
+      - name: Smoke test - Import package
+        run: |
+          python -c "import SAES; print('SAES version:', SAES.__version__)"
+
+      - name: Smoke test - Command line help
+        run: |
+          python -m SAES -h
+
+      - name: Smoke test - LaTeX table generation (Unix)
+        if: runner.os != 'Windows'
+        run: |
+          python -m SAES -ls -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -s mean_median -op /tmp/test_output.tex
+
+      - name: Smoke test - LaTeX table generation (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          python -m SAES -ls -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -s mean_median -op $env:TEMP/test_output.tex
+
+      - name: Smoke test - Boxplot generation (Unix)
+        if: runner.os != 'Windows'
+        run: |
+          python -m SAES -bp -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -i DTLZ1 -op /tmp/test_boxplot.png
+
+      - name: Smoke test - Boxplot generation (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          python -m SAES -bp -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -i DTLZ1 -op $env:TEMP/test_boxplot.png
+
+      - name: Smoke test - Critical distance plot (Unix)
+        if: runner.os != 'Windows'
+        run: |
+          python -m SAES -cdp -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -op /tmp/test_cdplot.png
+
+      - name: Smoke test - Critical distance plot (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          python -m SAES -cdp -ds tests/test_data/swarmIntelligence.csv -ms tests/test_data/multiobjectiveMetrics.csv -m HV -op $env:TEMP/test_cdplot.png
+
+      - name: Smoke test - Statistical tests with seed (reproducibility)
+        run: |
+          python -c "
+          from SAES.statistical_tests.bayesian import bayesian_sign_test
+          import pandas as pd
+          import numpy as np
+
+          data = pd.DataFrame({
+              'Algorithm_A': [0.9, 0.85, 0.95, 0.9, 0.92],
+              'Algorithm_B': [0.5, 0.6, 0.55, 0.58, 0.52]
+          })
+
+          # Test reproducibility with seed
+          result1, _ = bayesian_sign_test(data, sample_size=1000, seed=42)
+          result2, _ = bayesian_sign_test(data, sample_size=1000, seed=42)
+
+          np.testing.assert_array_almost_equal(result1, result2, decimal=10)
+          print('✓ Reproducibility test passed: Results are identical with seed=42')
+          "
+
+      - name: Smoke test - Histoplot with seed (reproducibility)
+        run: |
+          python -c "
+          from SAES.plots.histoplot import HistoPlot
+          import pandas as pd
+
+          data = pd.read_csv('tests/test_data/swarmIntelligence.csv')
+          metrics = pd.read_csv('tests/test_data/multiobjectiveMetrics.csv')
+
+          # Test that histoplot can be initialized with seed
+          histoplot = HistoPlot(data, metrics, 'HV', seed=42)
+          print('✓ HistoPlot initialization with seed successful')
+          "
+
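
As a rough illustration of what the two `strategy.matrix` blocks in this workflow expand to, the Python sketch below enumerates the job combinations of the `test` job and applies the same condition as the coverage upload step. It is not part of the commit. The helper names `expand_test_matrix` and `uploads_coverage` are hypothetical; GitHub Actions performs the real expansion itself.

```python
from itertools import product

# Values copied from the `test` job matrix in the workflow.
OS_LIST = ["ubuntu-latest", "windows-latest", "macos-latest"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]


def expand_test_matrix():
    """Return one (os, python-version) pair per job in the `test` matrix."""
    return list(product(OS_LIST, PYTHON_VERSIONS))


def uploads_coverage(os_name, python_version):
    """Mirror the `if:` condition on the coverage upload step."""
    return os_name == "ubuntu-latest" and python_version == "3.10"


test_jobs = expand_test_matrix()
upload_jobs = [job for job in test_jobs if uploads_coverage(*job)]

print(len(test_jobs))  # 9 unit-test jobs: 3 operating systems x 3 Python versions
print(upload_jobs)     # [('ubuntu-latest', '3.10')]
```

The `smoke-tests` job adds three more jobs (one per operating system, all on Python 3.10). The version strings are quoted in the workflow for a reason: unquoted, YAML would read `3.10` as the number 3.1.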

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ __pycache__/
 *.py[cod]
 *$py.class
 
+htmls/*.ipynb
+
 # C extensions
 *.so
 

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -9,6 +9,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Released]
 
+## [1.5.0] - 2025-11-21
+
+### Added
+- **Reproducibility support**: Added `seed` parameter to Bayesian statistical tests (`bayesian_sign_test`, `bayesian_signed_rank_test`) for deterministic results
+- **Reproducibility support**: Added `seed` parameter to `HistoPlot` class for consistent jitter in histogram generation
+- **Multi-platform CI**: Comprehensive GitHub Actions workflow testing on Windows, Linux (Ubuntu), and macOS
+- **Smoke tests**: Added automated smoke tests for CLI and API functionality across all platforms
+- **Environment files**: Added `requirements.txt`, `requirements-dev.txt`, and `environment.yml` for broader compatibility
+- **Headless mode documentation**: Complete documentation and examples for running SAES without display (CI/CD, servers)
+- **Headless mode examples**: Python and shell script examples for automated workflows (`examples/headless_mode_example.py`, `examples/headless_cli_example.sh`, `examples/headless_cli_example.bat`)
+- **New test suite**: Added `test_bayesian_seed.py` specifically for verifying seed reproducibility
+- **Documentation**: New reproducibility documentation page (`docs/usage/reproducibility.rst`) with best practices
+
+### Changed
+- Updated `test_bayesian.py` to demonstrate both seed parameter usage and backward compatibility
+- Enhanced README with sections on reproducibility and headless mode
+- Improved documentation structure to include reproducibility guidance
+
+### Fixed
+- Ensured all random operations can be made deterministic for reproducible research
+
 ## [1.4.0] - 2025-11-15
 
 ### Added

README.md

Lines changed: 45 additions & 0 deletions
@@ -124,6 +124,51 @@ source venv/bin/activate # On Windows: venv\Scripts\activate
 pip install -e ".[dev]"
 ```
 
+### Using Environment Files
+
+For broader compatibility, environment files are provided:
+
+```sh
+# Using pip with requirements.txt
+pip install -r requirements.txt
+
+# Using conda with environment.yml
+conda env create -f environment.yml
+conda activate saes
+```
+
+## 🔄 Reproducibility
+
+SAES supports **deterministic seeds** for reproducible research:
+
+```python
+from SAES.statistical_tests.bayesian import bayesian_sign_test
+from SAES.plots.histoplot import HistoPlot
+
+# Bayesian tests with seed for reproducibility
+result, _ = bayesian_sign_test(data, sample_size=5000, seed=42)
+
+# Histogram plots with consistent jitter
+histoplot = HistoPlot(data, metrics, "Accuracy", seed=42)
+```
+
+See the [reproducibility documentation](https://jMetal.github.io/SAES/usage/reproducibility.html) for details.
+
+## 💻 Headless Mode
+
+SAES can run in headless mode (without display) for automated workflows, CI/CD pipelines, and server environments:
+
+```bash
+# Set matplotlib backend
+export MPLBACKEND=Agg
+
+# Run SAES commands
+python -m SAES -ls -ds data.csv -ms metrics.csv -m HV -s friedman -op results.tex
+python -m SAES -bp -ds data.csv -ms metrics.csv -m HV -i Problem1 -op boxplot.png
+```
+
+See `examples/headless_mode_example.py` for a complete Python example or `examples/headless_cli_example.sh` for CLI usage.
+
 ## 🤝 Contributors
 
 - [![GitHub](https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white)](https://github.com/rorro6787) **Emilio Rodrigo Carreira Villalta**
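
The Headless Mode section added to the README sets `MPLBACKEND` with a shell `export`. When the CLI is driven from Python instead, the same variable can be passed to the child process. The sketch below is not part of the commit: `headless_env` and `build_saes_command` are hypothetical helpers, and the file names are the placeholders from the README snippet rather than real files. It only assembles the environment and the command; it does not run SAES.

```python
import os
import sys


def headless_env(base=None):
    """Return a copy of the environment with matplotlib forced to the Agg backend."""
    env = dict(os.environ if base is None else base)
    env["MPLBACKEND"] = "Agg"  # non-interactive backend, no display needed
    return env


def build_saes_command(data, metrics, metric, instance, output):
    """Assemble the boxplot invocation from the README (flags copied from it)."""
    return [sys.executable, "-m", "SAES", "-bp",
            "-ds", data, "-ms", metrics,
            "-m", metric, "-i", instance, "-op", output]


env = headless_env()
cmd = build_saes_command("data.csv", "metrics.csv", "HV", "Problem1", "boxplot.png")

print(env["MPLBACKEND"])
print(" ".join(cmd[1:]))
# To actually run it (needs SAES installed and the input files present):
# import subprocess; subprocess.run(cmd, env=env, check=True)
```

Passing `env=` to the child process keeps the setting local to that invocation, so the parent process keeps whatever backend it already uses.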

SAES/statistical_tests/bayesian.py

Lines changed: 18 additions & 2 deletions
@@ -7,7 +7,8 @@ def bayesian_sign_test(data: pd.DataFrame,
                        rope_limits=[-0.01, 0.01],
                        prior_strength=0.5,
                        prior_place="rope",
-                       sample_size=5000) -> tuple:
+                       sample_size=5000,
+                       seed=None) -> tuple:
     """
     Performs the Bayesian sign test to compare the performance of two algorithms across multiple instances.
     The Bayesian sign test is a non-parametric statistical test used to compare the performance of two algorithms on multiple instances. The null hypothesis is that the algorithms perform equivalently, which implies their average ranks are equal.
@@ -40,6 +41,9 @@ def bayesian_sign_test(data: pd.DataFrame,
         sample_size (int):
             Total number of random_search samples generated. Default is 5000.
 
+        seed (int, optional):
+            Random seed for reproducibility. Default is None (non-deterministic).
+
     Returns:
         tuple: A tuple containing the posterior probabilities and the samples drawn from the Dirichlet process. List of posterior probabilities:
             - Pr(algorith_1 < algorithm_2)
@@ -62,6 +66,10 @@ def bayesian_sign_test(data: pd.DataFrame,
     else:
         raise ValueError("Initialization ERROR. Incorrect number of dimensions for axis 1")
 
+    # Set random seed for reproducibility
+    if seed is not None:
+        np.random.seed(seed)
+
     # Compute the differences
     Z = sample1 - sample2
 
@@ -93,7 +101,8 @@ def bayesian_signed_rank_test(data,
                               rope_limits=[-0.01, 0.01],
                               prior_strength=1.0,
                               prior_place="rope",
-                              sample_size=1000) -> tuple:
+                              sample_size=1000,
+                              seed=None) -> tuple:
     """
     Performs the Bayesian version of the signed rank test to compare the performance of two algorithms across multiple instances.
     The Bayesian sign test is a non-parametric statistical test used to compare the performance of two algorithms on multiple instances. The null hypothesis is that the algorithms perform equivalently, which implies their average ranks are equal.
@@ -126,6 +135,9 @@ def bayesian_signed_rank_test(data,
         sample_size (int):
             Total number of random_search samples generated. Default is 5000.
 
+        seed (int, optional):
+            Random seed for reproducibility. Default is None (non-deterministic).
+
     Returns:
         tuple: A tuple containing the posterior probabilities and the samples drawn from the Dirichlet process. List of posterior probabilities:
             - Pr(algorith_1 < algorithm_2)
@@ -153,6 +165,10 @@ def weights(n, s):
     else:
         raise ValueError("Initialization ERROR. Incorrect number of dimensions for axis 1")
 
+    # Set random seed for reproducibility
+    if seed is not None:
+        np.random.seed(seed)
+
     # Compute the differences
     Z = sample1 - sample2
     Z0 = [-float("Inf"), 0.0, float("Inf")][["left", "rope", "right"].index(prior_place)]
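
Both functions in this file seed NumPy's global generator with `np.random.seed(seed)`. The sketch below is not part of the commit. It shows the effect of that call on a Dirichlet draw and contrasts it with a local `np.random.default_rng(seed)` generator, a common alternative that leaves the global state untouched. The helper names `draw_global` and `draw_local` and the weight vector `ALPHA` are made up for the illustration and do not come from SAES.

```python
import numpy as np

ALPHA = [1.0, 2.0, 3.0]  # illustrative Dirichlet weights, not taken from SAES


def draw_global(seed=None, size=5):
    """Seed the global RNG the way the commit does, then sample."""
    if seed is not None:
        np.random.seed(seed)
    return np.random.dirichlet(ALPHA, size)


def draw_local(seed=None, size=5):
    """Alternative: a local Generator, so the global state is not modified."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(ALPHA, size)


a = draw_global(seed=42)
b = draw_global(seed=42)
c = draw_local(seed=42)
d = draw_local(seed=42)

print(np.array_equal(a, b))  # True: same seed, same draws from the global RNG
print(np.array_equal(c, d))  # True: same seed, same draws from the local generator
```

One consequence of the global approach: after a call with `seed=42` returns, later `np.random` calls in the same process continue from that seeded state, which may or may not be what the caller wants.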
