Exp/n50 fedavg baseline 20260411 #2263
- M2 priority subset: 36/36 complete (byzantine/label_flipping/model_replacement)
- M3 priority subset: 36/36 complete (krum/trimmed_mean/VeriFL)
- M1 baseline epochs=5: 5/24 (terminated, insufficient improvement)
- M1 epochs=1 results archived to old_epochs1/
- Updated scripts with GPU support, error handling, label_flipping params
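Ran the full set of 26 experiments with the frozen recipe from 《M1 闭环学术需求.md》:
- CIFAR-10 (ResNet18): 4/4 α configurations pass (α=0.1: MA=56.43% ≥ 55%; std is on the high side, so 5 seeds were used)
- MNIST (LeNet5): 4/4 pass (lowest 97.55% ≥ 90%)
- M1 overall verdict: ✅ met, ready to proceed to M2
Key fixes:
- run_experiment.sh: server_lr 0.3→1.0, weight_decay set per dataset, server_momentum→0
- E=1 (reverted from the earlier E=5), matching the standard FedAvg setting of Fang et al.
- α=0.1 CIFAR-10 increased to 5 seeds to cope with high variance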
- M1.5_EXPERIMENT_REPORT.md: code fixes (WI-1~9), seed test (WI-10), M1 rerun results (WI-11)
- 24 JSONL: 8 configs × 3 seeds (ResNet18/CIFAR-10 + LeNet5/MNIST, α∈{0.1,0.3,0.5,100})
- 24 YAML: persistent experiment configs under results/configs/
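The 8 configs × 3 seeds count above can be checked by enumerating the grid. This is a minimal sketch rather than the project's actual generator: the filename template is inferred from the MNIST config names listed in this PR, the `cifar10` dataset token is an assumption, and `baseline_config_names` is a hypothetical helper.

```python
from itertools import product

# (model, dataset) pairs from the commit message; the "cifar10" token is assumed.
MODEL_DATASETS = [("ResNet18", "cifar10"), ("LeNet5", "mnist")]
ALPHAS = ["0.1", "0.3", "0.5", "100"]  # alpha values from the commit message
SEEDS = [0, 1, 2]


def baseline_config_names():
    """Enumerate FedAvg no-attack baseline config filenames (hypothetical helper)."""
    names = []
    for (model, dataset), alpha, seed in product(MODEL_DATASETS, ALPHAS, SEEDS):
        names.append(
            f"config_{model}_{dataset}_fedavg_atknone_defnone"
            f"_a{alpha}_pmr0.0_seed{seed}.yaml"
        )
    return names


names = baseline_config_names()
print(len(names))  # 2 model/dataset pairs x 4 alphas x 3 seeds
```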
- 48 P1 JSONL results (24 LF + 24 SA main experiments)
- 3 SA gamma10 archive files (CIFAR-10 alpha=0.5 before gamma=1 overwrite)
- 48 failed maxsamples300 archive (first run with truncated data)
- 51 YAML configs for all P1 experiments
- Phase 0 smoke test archive (2 experiments)
- All results under python/examples/federate/prebuilt_jobs/shieldfl/results/
Changes:
- model_replacement_backdoor_attack.py: add scale_gamma='auto' support (gamma = total_samples / malicious_samples, ensures gamma*w_m = 1.0)
- run_p1.5_sa_gpu.sh: batch script for 35 P1.5 experiments
- Configs updated: attack_rounds=[99]/[49], scale_gamma=auto
- MNIST seed 3/4 added (D-P1.5-3)
Results (AC 7/11 PASS, 4/11 FAIL):
- CIFAR-10: 4/4 alpha mean ASR >= 0.80 (0.896~0.965)
- CIFAR-10: 1/12 NaN (alpha=0.1 seed=0, gamma=22.94, BN cascade)
- MNIST: 0/4 alpha median ASR >= 0.80 (single-round insufficient for LeNet5)
- Causality: 3/3 control groups delta_ASR >= 0.30
- All 4 failures are academic design limitations, not code bugs
Docs:
- M2_SA_FIX_PHASE1_ERROR_FIX.md: P1.5 implementation spec
- M2_SA_FIX_PHASE1_ERROR2.md: experiment report + code audit
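The scale_gamma='auto' rule in the commit message above can be illustrated with a few lines of arithmetic. This is a sketch under the assumption that w_m is the attacker's sample-weighted FedAvg share (malicious_samples / total_samples) for the round; `auto_scale_gamma` is a hypothetical name, not the function in model_replacement_backdoor_attack.py, and the sample counts are illustrative.

```python
def auto_scale_gamma(total_samples: int, malicious_samples: int) -> float:
    """Return gamma such that gamma * w_m == 1, where w_m is the attacker's
    sample-weighted share (malicious_samples / total_samples)."""
    if malicious_samples <= 0 or total_samples < malicious_samples:
        raise ValueError("need 0 < malicious_samples <= total_samples")
    return total_samples / malicious_samples


# Illustrative numbers only (not taken from the experiments).
total, malicious = 50_000, 2_500
gamma = auto_scale_gamma(total, malicious)
w_m = malicious / total
print(gamma, gamma * w_m)
```

With this choice the scaled malicious update enters the weighted average with an effective weight of about 1, which is the condition the commit message states as gamma*w_m = 1.0.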
…port
ShieldFL Scaling Attack (Model Replacement) formal experiment suite:
- 28 experiment groups: 9 CIFAR-10 + 15 MNIST attack + 2 γ=1 control + 2 baselines
- N=50 clients, FedAvg, PMR=20%, α∈{0.1,0.5,100}, 5 seeds each
Code changes:
- eval/metrics.py: add gamma_tag in JSONL filename to prevent collision
- trainer/shieldfl_aggregator.py: read gamma_actual via attacker.last_gamma
- model_replacement_backdoor_attack.py: persist self.last_gamma after scaling
- fedml_aggregator.py: fix GPU OOM in aggregation
- fedml/__init__.py: fix cpu_transfer MPI override
Artifacts:
- 26 JSONL result files + 28 YAML configs + batch_done.txt
- N50_EXPERIMENT_REPORT.md: full AC verification & code audit (§8)
- N50_EXPERIMENT_GUIDE.md: experiment execution guide
- M2_SA_FIX_PHASE1_ERROR2_FIX.md: 14 decisions formal spec
- .gitignore: exclude __pycache__, batch_logs, archives
AC results: AC-A-1/A-2 PASS, AC-A-3 FAIL (NaN), AC-A-7 FAIL (ΔASR)
Code audit confirms failures are academic design issues, not code bugs.
- Replace 17 if-elif branches in FedMLDefender with _DefenseRegistry (lazy-loaded, phase-declared, alias-supported)
- Add FedMLDefender.register_defense() public API for external defenses
- Explicit state reset in init() to prevent cross-experiment leakage
- Rename VeriFL → VeriFLv16 (aggregator, trainer, classes, config values)
- Fix leftover VeriFLTrainer._NORM_PARAMS reference in v16 trainer
- Add smoke tests (T1-T6 unit + T7 e2e GPU)
- Add PLUGGABLE_DEFENSE_CHANGELOG.md
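To make the registry idea above concrete, here is a minimal sketch of a name-to-factory registry with aliases and a declared phase. It is not the PR's _DefenseRegistry: the class name, method names, the phase string, and the alias are all hypothetical, and lazy loading is modeled by calling the factory only when a defense is requested.

```python
from typing import Callable, Dict


class DefenseRegistry:
    """Maps defense names (and aliases) to lazily invoked factories."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._aliases: Dict[str, str] = {}
        self._phases: Dict[str, str] = {}

    def register(self, name, factory, phase, aliases=()):
        key = name.lower()
        self._factories[key] = factory
        self._phases[key] = phase
        for alias in aliases:
            self._aliases[alias.lower()] = key

    def _resolve(self, name):
        key = name.lower()
        key = self._aliases.get(key, key)
        if key not in self._factories:
            raise ValueError(
                f"Unknown defense '{name}'. Available: {sorted(self._factories)}"
            )
        return key

    def create(self, name):
        # The factory runs only here, so unused defenses are never constructed.
        return self._factories[self._resolve(name)]()

    def phase_of(self, name):
        return self._phases[self._resolve(name)]


def _make_krum():
    # A real factory would import the defense module here to defer the cost.
    return {"defense": "krum"}


registry = DefenseRegistry()
registry.register("krum", _make_krum, phase="before_aggregation",
                  aliases=("krum_defense",))
defender = registry.create("krum_defense")
```

A table lookup like this replaces a chain of if-elif branches with one dictionary access, and external code can add entries through `register` without editing the dispatcher.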
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds ShieldFL “Phase 1 / N=50” experiment assets to the FedML prebuilt job, including model registry support, deterministic data loading, structured metrics/ASR evaluation, GPU mapping, and a large set of experiment configuration snapshots and run documentation.
Changes:
- Add ShieldFL prebuilt job entrypoint plus model hub (LeNet5/ResNet18/ResNet20/SimpleCNN) and dataset loader for MNIST/CIFAR-10.
- Add evaluation utilities for structured JSONL metrics logging and ASR (attack success rate) computation for backdoor attacks.
- Add many experiment YAML configs (FedAvg baselines + attack/label-flip variants) and Markdown run reports/guides.
Reviewed changes
Copilot reviewed 82 out of 655 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed4.yaml | Adds N=50 MNIST model-replacement config snapshot (seed 4). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed3.yaml | Adds N=50 MNIST model-replacement config snapshot (seed 3). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed2.yaml | Adds N=50 MNIST model-replacement config snapshot (seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed1.yaml | Adds N=50 MNIST model-replacement config snapshot (seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed0.yaml | Adds N=50 MNIST model-replacement config snapshot (seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed4.yaml | Adds MNIST model-replacement config snapshot (PMR 0.1, seed 4). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed3.yaml | Adds MNIST model-replacement config snapshot (PMR 0.1, seed 3). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed2.yaml | Adds MNIST model-replacement config snapshot (PMR 0.1, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed1.yaml | Adds MNIST model-replacement config snapshot (PMR 0.1, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed0.yaml | Adds MNIST model-replacement config snapshot (PMR 0.1, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed2.yaml | Adds MNIST label-flipping config snapshot (α=100, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed1.yaml | Adds MNIST label-flipping config snapshot (α=100, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed0.yaml | Adds MNIST label-flipping config snapshot (α=100, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed2.yaml | Adds MNIST label-flipping config snapshot (α=0.5, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed1.yaml | Adds MNIST label-flipping config snapshot (α=0.5, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed0.yaml | Adds MNIST label-flipping config snapshot (α=0.5, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed2.yaml | Adds MNIST label-flipping config snapshot (α=0.3, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed1.yaml | Adds MNIST label-flipping config snapshot (α=0.3, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed0.yaml | Adds MNIST label-flipping config snapshot (α=0.3, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed2.yaml | Adds MNIST label-flipping config snapshot (α=0.1, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed1.yaml | Adds MNIST label-flipping config snapshot (α=0.1, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed0.yaml | Adds MNIST label-flipping config snapshot (α=0.1, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed2.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed1.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed0.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed2.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed1.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed0.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed2.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed1.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed0.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed2.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 2). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed1.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 1). |
| python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed0.yaml | Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 0). |
| python/examples/federate/prebuilt_jobs/shieldfl/model/simple_cnn.py | Adds SimpleCNN model implementation. |
| python/examples/federate/prebuilt_jobs/shieldfl/model/resnet20.py | Adds ResNet20 model implementation. |
| python/examples/federate/prebuilt_jobs/shieldfl/model/resnet18.py | Adds CIFAR-10-adapted ResNet18 implementation. |
| python/examples/federate/prebuilt_jobs/shieldfl/model/model_hub.py | Adds model registry + factory for config-driven model creation. |
| python/examples/federate/prebuilt_jobs/shieldfl/model/lenet5.py | Adds MNIST LeNet5 implementation. |
| python/examples/federate/prebuilt_jobs/shieldfl/model/__init__.py | Exposes model creation helpers. |
| python/examples/federate/prebuilt_jobs/shieldfl/main_fedml_shieldfl.py | Adds ShieldFL FedML entrypoint wiring data/model/trainer/aggregator/runner. |
| python/examples/federate/prebuilt_jobs/shieldfl/eval/metrics.py | Adds structured JSONL metrics collector and metadata capture. |
| python/examples/federate/prebuilt_jobs/shieldfl/eval/asr.py | Adds ASR evaluation + trigger normalization helper. |
| python/examples/federate/prebuilt_jobs/shieldfl/data/data_loader.py | Adds deterministic dataset loading + stratified val/trust splits + Dirichlet partitioning. |
| python/examples/federate/prebuilt_jobs/shieldfl/data/__init__.py | Exposes data loader API. |
| python/examples/federate/prebuilt_jobs/shieldfl/config/gpu_mapping.yaml | Adds GPU mapping presets (incl. N=50 isolation mapping). |
| python/examples/federate/prebuilt_jobs/shieldfl/__init__.py | Adds package marker for ShieldFL prebuilt job. |
| python/examples/federate/prebuilt_jobs/shieldfl/.gitignore | Ignores generated logs/results/temp configs. |
| RUN_RECORD_PHASE1.md | Adds Phase 1 run record evidence and outcomes. |
| PLUGGABLE_DEFENSE_CHANGELOG.md | Adds defense-registry refactor + VeriFL v16 rename changelog. |
| PHASE1.md | Adds Phase 1 implementation checklist/spec. |
| N50_EXPERIMENT_REPORT.md | Adds N=50 experiment analysis report. |
| N50_EXPERIMENT_GUIDE.md | Adds N=50 execution guide and operational notes. |
| M2_SA_REPORT.md | Adds M2 scaling-attack report summary. |
| M2_SA_FIX_PHASE1_ERROR2.md | Adds P1.5 scaling-attack report with audit notes. |
| M2_LF_EXPERIMENT_REPORT.md | Adds label-flipping attack acceptance report. |
| M1_EXPERIMENT_REPORT.md | Adds M1 baseline report. |
| M1_EXPERIMENT_PARAMS.md | Adds frozen parameter recipe for M1. |
| M1_ACADEMIC_CONSISTENCY.md | Adds M1 academic consistency report. |
| M1.5_EXPERIMENT_REPORT.md | Adds M1.5 fixes + rerun acceptance report. |
| LF 复现与验收/威胁模型.md | Adds threat model writing skeleton and constraints. |
| LF 复现与验收/attack_list.md | Adds prioritized attack reproduction list and rationale. |
    non_target_mask = (labels != target_label)
    if non_target_mask.sum() == 0:
        continue
    images[:, :, -trigger_size:, -trigger_size:] = trigger
The trigger is currently injected into all images in the batch, including samples where label == target_label, but the docstring states the trigger should be injected only for label != target_label. Apply the trigger only to the subset indexed by non_target_mask (e.g., index the batch dimension) so the evaluation matches the stated ASR definition.
Suggested change:
    - images[:, :, -trigger_size:, -trigger_size:] = trigger
    + images[non_target_mask, :, -trigger_size:, -trigger_size:] = trigger
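As a reference point for the comment above, the following sketch computes ASR with target-class samples excluded. It assumes ASR is defined as the fraction of triggered, non-target-label samples that the model classifies as the target label; `attack_success_rate` is a hypothetical helper and the labels and predictions are made-up values, not results from this PR.

```python
def attack_success_rate(labels, triggered_preds, target_label):
    """ASR over samples whose true label differs from target_label.

    triggered_preds[i] is the model's prediction for sample i after the
    trigger has been stamped on it.
    """
    hits = 0
    total = 0
    for label, pred in zip(labels, triggered_preds):
        if label == target_label:
            continue  # target-class samples are excluded from the ASR
        total += 1
        hits += int(pred == target_label)
    return hits / total if total else float("nan")


labels = [0, 1, 2, 2, 3]
preds = [2, 2, 2, 2, 0]
asr = attack_success_rate(labels, preds, target_label=2)
print(asr)
```

Here three samples have a non-target label and two of them are predicted as the target, so the two target-class samples do not inflate the rate.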
    train_args:
      federated_optimizer: "FedAvg"
      client_id_list:
client_id_list: is present but has no value, which YAML parses as null. If the runtime expects a list (even an empty one), this can break config parsing or downstream logic. Prefer removing the key entirely or setting it explicitly (e.g., client_id_list: [], or providing the intended list). This same pattern appears across multiple newly added configs.
Suggested change:
    - client_id_list:
    + client_id_list: []
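If the configs cannot all be edited, the consumer side can also tolerate the null. The sketch below assumes the parsed config is a plain dict in which a bare `client_id_list:` key has become `None`; `normalized_client_ids` is a hypothetical helper, not part of FedML.

```python
def normalized_client_ids(train_args):
    """Return client_id_list as a list, treating a missing or null value as empty."""
    value = train_args.get("client_id_list")
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    raise TypeError(
        f"client_id_list must be a list or null, got {type(value).__name__}"
    )


# What a YAML loader yields for a bare `client_id_list:` key: the value is None.
parsed = {"federated_optimizer": "FedAvg", "client_id_list": None}
client_ids = normalized_client_ids(parsed)
print(client_ids)
```

Raising on any other type keeps a mistyped scalar (for example a single integer) from being silently accepted.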
    val_loader = _seeded_dataloader(val_subset, batch_size, True, seed, num_workers)
    trust_loader = _seeded_dataloader(trust_subset, batch_size, True, seed + 1, num_workers)
For val_loader/trust_loader, shuffle=True combined with a shared torch.Generator means the sample order will change on each new iterator creation (the generator state advances), which undermines run-to-run determinism and can change GA/validation behavior across rounds. If these loaders are used as fixed server reference sets, set shuffle=False (recommended) or re-seed/recreate the generator per epoch/round to keep ordering stable.
Suggested change:
    - val_loader = _seeded_dataloader(val_subset, batch_size, True, seed, num_workers)
    - trust_loader = _seeded_dataloader(trust_subset, batch_size, True, seed + 1, num_workers)
    + val_loader = _seeded_dataloader(val_subset, batch_size, False, seed, num_workers)
    + trust_loader = _seeded_dataloader(trust_subset, batch_size, False, seed + 1, num_workers)
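The effect described in this comment can be shown with a pure-Python analogy. The sketch uses `random.Random` as a stand-in for the shared `torch.Generator` and does not call PyTorch; `epoch_order` is a hypothetical helper that mimics how one pass over a loader draws its sample order.

```python
import random


def epoch_order(n, rng, shuffle):
    """Return the index order for one pass over n samples."""
    order = list(range(n))
    if shuffle:
        rng.shuffle(order)  # consumes state from the shared generator
    return order


rng = random.Random(42)
first = epoch_order(100, rng, shuffle=True)
second = epoch_order(100, rng, shuffle=True)  # same indices, different order

fixed_first = epoch_order(100, rng, shuffle=False)
fixed_second = epoch_order(100, rng, shuffle=False)
print(first == second, fixed_first == fixed_second)
```

Seeding fixes the sequence of permutations across runs, but each pass still sees a different permutation; disabling shuffling makes every pass identical.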
    def _seeded_dataloader(dataset, batch_size, shuffle, seed, num_workers=0):
        generator = torch.Generator()
        generator.manual_seed(int(seed))
        return DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=shuffle,
            num_workers=num_workers,
            generator=generator,
        )
When num_workers > 0, providing a generator alone typically doesn't fully seed Python/NumPy RNG state inside each worker, which can break determinism guarantees. Consider adding a worker_init_fn that seeds random, numpy, and torch per worker based on the base seed (and optionally setting persistent_workers appropriately) so deterministic modes remain deterministic even with multi-worker loading.
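One way to follow this suggestion is sketched below. `make_worker_init_fn` is a hypothetical helper, and the per-worker seed rule (base seed plus worker id) is an assumption rather than the PR's code. NumPy and PyTorch are seeded only if they can be imported, so the sketch also runs where they are absent.

```python
import random


def make_worker_init_fn(base_seed):
    """Build a worker_init_fn that gives each worker a distinct, reproducible seed."""

    def _init(worker_id):
        # Keep the seed within the 32-bit range that NumPy's legacy seeding accepts.
        worker_seed = (int(base_seed) + int(worker_id)) % (2**32)
        random.seed(worker_seed)
        try:
            import numpy as np
            np.random.seed(worker_seed)
        except ImportError:
            pass  # NumPy not installed; nothing to seed
        try:
            import torch
            torch.manual_seed(worker_seed)
        except ImportError:
            pass  # PyTorch not installed; nothing to seed

    return _init


init_fn = make_worker_init_fn(base_seed=7)
init_fn(0)
a = random.random()
init_fn(0)
b = random.random()  # same value as a, since worker 0 is re-seeded identically
```

The returned function would be passed as `worker_init_fn=init_fn` to `DataLoader` alongside the existing `generator`. Under the spawn start method the callable must be picklable, so a module-level function or `functools.partial` may be needed instead of a closure.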
    from data.data_loader import load_shieldfl_data
    from model.model_hub import create_model
    from trainer.shieldfl_aggregator import ShieldFLAggregator
    from trainer.verifl_v16_trainer import VeriFLv16Trainer
    from utils.runtime import configure_runtime
These imports rely on data/, model/, and trainer/ being importable as top-level modules (usually via working-directory/PYTHONPATH setup). This is brittle and can also conflict with other modules named data in a larger environment. Prefer package-qualified imports (e.g., from shieldfl.data ...) or relative imports with execution via python -m ... to make the entrypoint robust across invocation contexts.
Suggested change:
    - from data.data_loader import load_shieldfl_data
    - from model.model_hub import create_model
    - from trainer.shieldfl_aggregator import ShieldFLAggregator
    - from trainer.verifl_v16_trainer import VeriFLv16Trainer
    - from utils.runtime import configure_runtime
    + from .data.data_loader import load_shieldfl_data
    + from .model.model_hub import create_model
    + from .trainer.shieldfl_aggregator import ShieldFLAggregator
    + from .trainer.verifl_v16_trainer import VeriFLv16Trainer
    + from .utils.runtime import configure_runtime
    def get_model_class(model_name: str):
        if model_name not in MODEL_REGISTRY:
            raise ValueError(
                f"Unknown model '{model_name}'. Available: {list(MODEL_REGISTRY.keys())}"
For more stable/debug-friendly output, consider emitting a deterministic ordering of available models (e.g., sorted keys) in the error message. This avoids minor noise if the registry definition changes insertion order over time and makes logs easier to diff across runs.
Suggested change:
    - f"Unknown model '{model_name}'. Available: {list(MODEL_REGISTRY.keys())}"
    + f"Unknown model '{model_name}'. Available: {sorted(MODEL_REGISTRY)}"