Exp/n50 fedavg baseline 20260411 #2263

Open
lvzaixian wants to merge 28 commits into FedML-AI:master from YKDZ:exp/n50-fedavg-baseline-20260411
@lvzaixian
No description provided.

YKDZ and others added 28 commits March 23, 2026 13:07
- M2 priority subset: 36/36 complete (byzantine/label_flipping/model_replacement)
- M3 priority subset: 36/36 complete (krum/trimmed_mean/VeriFL)
- M1 baseline epochs=5: 5/24 (terminated, insufficient improvement)
- M1 epochs=1 results archived to old_epochs1/
- Updated scripts with GPU support, error handling, label_flipping params
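Ran all 26 experiments with the frozen recipe from 《M1 闭环学术需求.md》:
- CIFAR-10 (ResNet18): 4/4 α configurations pass (α=0.1: MA=56.43% ≥ 55%; std is on the high side, so 5 seeds were used)
- MNIST (LeNet5): 4/4 pass (lowest 97.55% ≥ 90%)
- M1 overall verdict: ✅ criteria met, ready to proceed to M2

Key fixes:
- run_experiment.sh: server_lr 0.3→1.0, weight_decay set per dataset, server_momentum→0
- E=1 (reverted from the earlier E=5), matching the standard FedAvg setting of Fang et al.
- α=0.1 CIFAR-10 increased to 5 seeds to cope with high variance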
- M1.5_EXPERIMENT_REPORT.md: code fixes (WI-1~9), seed test (WI-10), M1 rerun results (WI-11)
- 24 JSONL: 8 configs × 3 seeds (ResNet18/CIFAR-10 + LeNet5/MNIST, α∈{0.1,0.3,0.5,100})
- 24 YAML: persistent experiment configs under results/configs/
- 48 P1 JSONL results (24 LF + 24 SA main experiments)
- 3 SA gamma10 archive files (CIFAR-10 alpha=0.5 before gamma=1 overwrite)
- 48 failed maxsamples300 archive (first run with truncated data)
- 51 YAML configs for all P1 experiments
- Phase 0 smoke test archive (2 experiments)
- All results under python/examples/federate/prebuilt_jobs/shieldfl/results/
Changes:
- model_replacement_backdoor_attack.py: add scale_gamma='auto' support
  (gamma = total_samples / malicious_samples, ensures gamma*w_m = 1.0)
- run_p1.5_sa_gpu.sh: batch script for 35 P1.5 experiments
- Configs updated: attack_rounds=[99]/[49], scale_gamma=auto
- MNIST seed 3/4 added (D-P1.5-3)
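
The `scale_gamma='auto'` rule described in the commit message above is plain arithmetic and can be sketched as follows. This is a minimal illustration, not the code in `model_replacement_backdoor_attack.py`: the function name `auto_scale_gamma` and the sample counts are hypothetical, and the sketch assumes `w_m` is the malicious clients' combined sample-weighted share of the FedAvg aggregate.

```python
def auto_scale_gamma(total_samples: int, malicious_samples: int) -> float:
    """Return gamma such that gamma * w_m == 1, where w_m is the malicious
    clients' sample-weighted share of the aggregate (assumed definition)."""
    if malicious_samples <= 0 or malicious_samples > total_samples:
        raise ValueError("malicious_samples must be in (0, total_samples]")
    return total_samples / malicious_samples


# Hypothetical sample counts, for illustration only.
total, malicious = 50_000, 2_500
gamma = auto_scale_gamma(total, malicious)
w_m = malicious / total          # aggregate weight of the malicious update
print(gamma, gamma * w_m)        # the product should be 1 up to rounding
```

The fewer samples the attacker holds, the larger gamma becomes, which is consistent with the large gamma values reported for the highly non-IID runs.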

Results (AC 7/11 PASS, 4/11 FAIL):
- CIFAR-10: 4/4 alpha mean ASR >= 0.80 (0.896~0.965)
- CIFAR-10: 1/12 NaN (alpha=0.1 seed=0, gamma=22.94, BN cascade)
- MNIST: 0/4 alpha median ASR >= 0.80 (single-round insufficient for LeNet5)
- Causality: 3/3 control groups delta_ASR >= 0.30
- All 4 failures are academic design limitations, not code bugs

Docs:
- M2_SA_FIX_PHASE1_ERROR_FIX.md: P1.5 implementation spec
- M2_SA_FIX_PHASE1_ERROR2.md: experiment report + code audit
…port

ShieldFL Scaling Attack (Model Replacement) formal experiment suite:
- 28 experiment groups: 9 CIFAR-10 + 15 MNIST attack + 2 γ=1 control + 2 baselines
- N=50 clients, FedAvg, PMR=20%, α∈{0.1,0.5,100}, 5 seeds each

Code changes:
- eval/metrics.py: add gamma_tag in JSONL filename to prevent collision
- trainer/shieldfl_aggregator.py: read gamma_actual via attacker.last_gamma
- model_replacement_backdoor_attack.py: persist self.last_gamma after scaling
- fedml_aggregator.py: fix GPU OOM in aggregation
- fedml/__init__.py: fix cpu_transfer MPI override

Artifacts:
- 26 JSONL result files + 28 YAML configs + batch_done.txt
- N50_EXPERIMENT_REPORT.md: full AC verification & code audit (§八)
- N50_EXPERIMENT_GUIDE.md: experiment execution guide
- M2_SA_FIX_PHASE1_ERROR2_FIX.md: 14 decisions formal spec
- .gitignore: exclude __pycache__, batch_logs, archives

AC results: AC-A-1/A-2 PASS, AC-A-3 FAIL (NaN), AC-A-7 FAIL (ΔASR)
Code audit confirms failures are academic design issues, not code bugs.
- Replace 17 if-elif branches in FedMLDefender with _DefenseRegistry
  (lazy-loaded, phase-declared, alias-supported)
- Add FedMLDefender.register_defense() public API for external defenses
- Explicit state reset in init() to prevent cross-experiment leakage
- Rename VeriFL → VeriFLv16 (aggregator, trainer, classes, config values)
- Fix leftover VeriFLTrainer._NORM_PARAMS reference in v16 trainer
- Add smoke tests (T1-T6 unit + T7 e2e GPU)
- Add PLUGGABLE_DEFENSE_CHANGELOG.md
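
A lazy-loaded, phase-declared, alias-supported registry of the kind this commit describes could look roughly like the sketch below. It is not the PR's `_DefenseRegistry`: the class name, method names, and phase string are hypothetical, and a standard-library class stands in for a real defense so the example runs on its own.

```python
import importlib
from typing import Dict, Iterable, Tuple


class DefenseRegistry:
    """Maps a defense name (or alias) to a class that is imported on first use."""

    def __init__(self) -> None:
        self._specs: Dict[str, Tuple[str, str, str]] = {}  # name -> (module, class, phase)
        self._aliases: Dict[str, str] = {}                 # alias -> canonical name
        self._cache: Dict[str, type] = {}                  # name -> imported class

    def register(self, name: str, module: str, class_name: str,
                 phase: str, aliases: Iterable[str] = ()) -> None:
        self._specs[name] = (module, class_name, phase)
        for alias in aliases:
            self._aliases[alias] = name

    def _canonical(self, name: str) -> str:
        canonical = self._aliases.get(name, name)
        if canonical not in self._specs:
            raise ValueError(
                f"Unknown defense '{name}'. Available: {sorted(self._specs)}"
            )
        return canonical

    def phase_of(self, name: str) -> str:
        return self._specs[self._canonical(name)][2]

    def resolve(self, name: str) -> type:
        canonical = self._canonical(name)
        if canonical not in self._cache:
            module, class_name, _ = self._specs[canonical]
            # The import happens here, not at registration time.
            self._cache[canonical] = getattr(importlib.import_module(module), class_name)
        return self._cache[canonical]


registry = DefenseRegistry()
# Stand-in entry: a stdlib class plays the role of a defense implementation.
registry.register("ordered", "collections", "OrderedDict",
                  phase="before_aggregation", aliases=("od",))
defense_cls = registry.resolve("od")
```

Compared with a chain of if-elif branches, adding a defense becomes a single `register` call, and unused defense modules are never imported.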
Copilot AI review requested due to automatic review settings April 11, 2026 01:53

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds ShieldFL “Phase 1 / N=50” experiment assets to the FedML prebuilt job, including model registry support, deterministic data loading, structured metrics/ASR evaluation, GPU mapping, and a large set of experiment configuration snapshots and run documentation.

Changes:

  • Add ShieldFL prebuilt job entrypoint plus model hub (LeNet5/ResNet18/ResNet20/SimpleCNN) and dataset loader for MNIST/CIFAR-10.
  • Add evaluation utilities for structured JSONL metrics logging and ASR (backdoor success rate) computation.
  • Add many experiment YAML configs (FedAvg baselines + attack/label-flip variants) and Markdown run reports/guides.

Reviewed changes

Copilot reviewed 82 out of 655 changed files in this pull request and generated 6 comments.

File Description
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed4.yaml Adds N=50 MNIST model-replacement config snapshot (seed 4).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed3.yaml Adds N=50 MNIST model-replacement config snapshot (seed 3).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed2.yaml Adds N=50 MNIST model-replacement config snapshot (seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed1.yaml Adds N=50 MNIST model-replacement config snapshot (seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.2_seed0.yaml Adds N=50 MNIST model-replacement config snapshot (seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed4.yaml Adds MNIST model-replacement config snapshot (PMR 0.1, seed 4).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed3.yaml Adds MNIST model-replacement config snapshot (PMR 0.1, seed 3).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed2.yaml Adds MNIST model-replacement config snapshot (PMR 0.1, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed1.yaml Adds MNIST model-replacement config snapshot (PMR 0.1, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atkmodel_replacement_defnone_a0.1_pmr0.1_seed0.yaml Adds MNIST model-replacement config snapshot (PMR 0.1, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed2.yaml Adds MNIST label-flipping config snapshot (α=100, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed1.yaml Adds MNIST label-flipping config snapshot (α=100, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a100_pmr0.3_seed0.yaml Adds MNIST label-flipping config snapshot (α=100, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed2.yaml Adds MNIST label-flipping config snapshot (α=0.5, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed1.yaml Adds MNIST label-flipping config snapshot (α=0.5, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.5_pmr0.3_seed0.yaml Adds MNIST label-flipping config snapshot (α=0.5, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed2.yaml Adds MNIST label-flipping config snapshot (α=0.3, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed1.yaml Adds MNIST label-flipping config snapshot (α=0.3, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.3_pmr0.3_seed0.yaml Adds MNIST label-flipping config snapshot (α=0.3, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed2.yaml Adds MNIST label-flipping config snapshot (α=0.1, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed1.yaml Adds MNIST label-flipping config snapshot (α=0.1, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_shieldfl_atklabel_flipping_defnone_a0.1_pmr0.3_seed0.yaml Adds MNIST label-flipping config snapshot (α=0.1, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed2.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed1.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a100_pmr0.0_seed0.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=100, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed2.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed1.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.5_pmr0.0_seed0.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.5, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed2.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed1.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.3_pmr0.0_seed0.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.3, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed2.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 2).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed1.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 1).
python/examples/federate/prebuilt_jobs/shieldfl/results/configs/config_LeNet5_mnist_fedavg_atknone_defnone_a0.1_pmr0.0_seed0.yaml Adds MNIST FedAvg no-attack baseline config snapshot (α=0.1, seed 0).
python/examples/federate/prebuilt_jobs/shieldfl/model/simple_cnn.py Adds SimpleCNN model implementation.
python/examples/federate/prebuilt_jobs/shieldfl/model/resnet20.py Adds ResNet20 model implementation.
python/examples/federate/prebuilt_jobs/shieldfl/model/resnet18.py Adds CIFAR-10-adapted ResNet18 implementation.
python/examples/federate/prebuilt_jobs/shieldfl/model/model_hub.py Adds model registry + factory for config-driven model creation.
python/examples/federate/prebuilt_jobs/shieldfl/model/lenet5.py Adds MNIST LeNet5 implementation.
python/examples/federate/prebuilt_jobs/shieldfl/model/__init__.py Exposes model creation helpers.
python/examples/federate/prebuilt_jobs/shieldfl/main_fedml_shieldfl.py Adds ShieldFL FedML entrypoint wiring data/model/trainer/aggregator/runner.
python/examples/federate/prebuilt_jobs/shieldfl/eval/metrics.py Adds structured JSONL metrics collector and metadata capture.
python/examples/federate/prebuilt_jobs/shieldfl/eval/asr.py Adds ASR evaluation + trigger normalization helper.
python/examples/federate/prebuilt_jobs/shieldfl/data/data_loader.py Adds deterministic dataset loading + stratified val/trust splits + Dirichlet partitioning.
python/examples/federate/prebuilt_jobs/shieldfl/data/__init__.py Exposes data loader API.
python/examples/federate/prebuilt_jobs/shieldfl/config/gpu_mapping.yaml Adds GPU mapping presets (incl. N=50 isolation mapping).
python/examples/federate/prebuilt_jobs/shieldfl/__init__.py Adds package marker for ShieldFL prebuilt job.
python/examples/federate/prebuilt_jobs/shieldfl/.gitignore Ignores generated logs/results/temp configs.
RUN_RECORD_PHASE1.md Adds Phase 1 run record evidence and outcomes.
PLUGGABLE_DEFENSE_CHANGELOG.md Adds defense-registry refactor + VeriFL v16 rename changelog.
PHASE1.md Adds Phase 1 implementation checklist/spec.
N50_EXPERIMENT_REPORT.md Adds N=50 experiment analysis report.
N50_EXPERIMENT_GUIDE.md Adds N=50 execution guide and operational notes.
M2_SA_REPORT.md Adds M2 scaling-attack report summary.
M2_SA_FIX_PHASE1_ERROR2.md Adds P1.5 scaling-attack report with audit notes.
M2_LF_EXPERIMENT_REPORT.md Adds label-flipping attack acceptance report.
M1_EXPERIMENT_REPORT.md Adds M1 baseline report.
M1_EXPERIMENT_PARAMS.md Adds frozen parameter recipe for M1.
M1_ACADEMIC_CONSISTENCY.md Adds M1 academic consistency report.
M1.5_EXPERIMENT_REPORT.md Adds M1.5 fixes + rerun acceptance report.
LF 复现与验收/威胁模型.md Adds threat model writing skeleton and constraints.
LF 复现与验收/attack_list.md Adds prioritized attack reproduction list and rationale.

non_target_mask = (labels != target_label)
if non_target_mask.sum() == 0:
    continue
images[:, :, -trigger_size:, -trigger_size:] = trigger
Copilot AI Apr 11, 2026

The trigger is currently injected into all images in the batch, including samples where label == target_label, but the docstring states the trigger should be injected only for label != target_label. Apply the trigger only to the subset indexed by non_target_mask (e.g., index the batch dimension) so the evaluation matches the stated ASR definition.

Suggested change
- images[:, :, -trigger_size:, -trigger_size:] = trigger
+ images[non_target_mask, :, -trigger_size:, -trigger_size:] = trigger
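
The effect of indexing the batch dimension with the mask can be checked in isolation. The sketch below uses NumPy arrays as a stand-in for the PyTorch tensors in `eval/asr.py`; boolean-mask assignment on the first axis should behave the same way for tensors. The helper name `apply_trigger_to_non_target` and the toy shapes are assumptions made for illustration.

```python
import numpy as np


def apply_trigger_to_non_target(images, labels, target_label,
                                trigger_size, trigger_value=1.0):
    """Stamp a square trigger into the bottom-right corner of every image
    whose label differs from target_label. Returns (patched copy, mask)."""
    patched = images.copy()                      # leave the caller's batch untouched
    non_target_mask = labels != target_label     # shape: (batch,)
    patched[non_target_mask, :, -trigger_size:, -trigger_size:] = trigger_value
    return patched, non_target_mask


# Toy batch: four blank single-channel 8x8 images.
images = np.zeros((4, 1, 8, 8), dtype=np.float32)
labels = np.array([0, 3, 0, 7])
patched, mask = apply_trigger_to_non_target(
    images, labels, target_label=0, trigger_size=2
)
```

Samples already labelled as the target class stay clean, so they cannot inflate the attack success rate.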

train_args:
  federated_optimizer: "FedAvg"
  client_id_list:
Copilot AI Apr 11, 2026

client_id_list: is present but has no value, which YAML parses as null. If the runtime expects a list (even an empty one), this can break config parsing or downstream logic. Prefer removing the key entirely or setting it explicitly (e.g., client_id_list: [], or providing the intended list). This same pattern appears across multiple newly added configs.

Suggested change
- client_id_list:
+ client_id_list: []
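
A bare `client_id_list:` key loads as `None` in Python YAML loaders such as PyYAML. If the configs cannot all be edited, a defensive reader is another option. The sketch below is an assumption about how such a guard might look; `normalize_client_id_list` is a hypothetical helper, not part of FedML, and the dict literal stands in for an already-parsed config.

```python
def normalize_client_id_list(train_args: dict) -> list:
    """Treat a missing or null client_id_list as an empty list."""
    value = train_args.get("client_id_list")
    if value is None:
        return []
    if not isinstance(value, (list, tuple)):
        raise TypeError(
            f"client_id_list must be a list, got {type(value).__name__}"
        )
    return list(value)


# What the parsed train_args section looks like when the key has no value.
parsed = {"federated_optimizer": "FedAvg", "client_id_list": None}
client_ids = normalize_client_id_list(parsed)
```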

Comment on lines +271 to +272
val_loader = _seeded_dataloader(val_subset, batch_size, True, seed, num_workers)
trust_loader = _seeded_dataloader(trust_subset, batch_size, True, seed + 1, num_workers)
Copilot AI Apr 11, 2026

For val_loader/trust_loader, shuffle=True combined with a shared torch.Generator means the sample order will change on each new iterator creation (the generator state advances), which undermines run-to-run determinism and can change GA/validation behavior across rounds. If these loaders are used as fixed server reference sets, set shuffle=False (recommended) or re-seed/recreate the generator per epoch/round to keep ordering stable.

Suggested change
- val_loader = _seeded_dataloader(val_subset, batch_size, True, seed, num_workers)
- trust_loader = _seeded_dataloader(trust_subset, batch_size, True, seed + 1, num_workers)
+ val_loader = _seeded_dataloader(val_subset, batch_size, False, seed, num_workers)
+ trust_loader = _seeded_dataloader(trust_subset, batch_size, False, seed + 1, num_workers)
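
The state-advancing behaviour described in this comment can be reproduced without PyTorch. The sketch below uses Python's `random.Random` purely as a stand-in for a shared `torch.Generator`; the function `epoch_order` is hypothetical and only models one shuffled pass over a dataset's indices.

```python
import random


def epoch_order(n: int, rng: random.Random) -> list:
    """One shuffled pass over indices 0..n-1, drawing from the given generator."""
    order = list(range(n))
    rng.shuffle(order)
    return order


# Shared generator: each pass consumes random state, so the order drifts.
shared = random.Random(0)
first = epoch_order(16, shared)
second = epoch_order(16, shared)

# Re-seeding before each pass keeps the order stable across passes.
fixed_a = epoch_order(16, random.Random(0))
fixed_b = epoch_order(16, random.Random(0))
print(first == second, fixed_a == fixed_b)
```

Turning shuffling off for fixed reference sets, as suggested above, avoids the issue entirely and is the simpler of the two remedies.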

Comment on lines +25 to +34
def _seeded_dataloader(dataset, batch_size, shuffle, seed, num_workers=0):
    generator = torch.Generator()
    generator.manual_seed(int(seed))
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        generator=generator,
    )
Copilot AI Apr 11, 2026

When num_workers > 0, providing a generator alone typically doesn't fully seed Python/NumPy RNG state inside each worker, which can break determinism guarantees. Consider adding a worker_init_fn that seeds random, numpy, and torch per worker based on the base seed (and optionally setting persistent_workers appropriately) so deterministic modes remain deterministic even with multi-worker loading.
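
A per-worker seeding function along the lines this comment suggests might look like the sketch below. It is a minimal illustration under stated assumptions: `seed_worker` and the `base_seed + worker_id` scheme are hypothetical choices, and the PyTorch import is optional so the example runs without it. `worker_init_fn` itself is an existing `DataLoader` parameter that is called with the worker id.

```python
import functools
import random

import numpy as np


def seed_worker(worker_id: int, base_seed: int = 0) -> None:
    """Seed Python, NumPy and (if installed) PyTorch RNGs for one worker."""
    seed = (base_seed + worker_id) % (2**32)   # NumPy seeds must fit in 32 bits
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional: keeps the sketch runnable without PyTorch
        torch.manual_seed(seed)
    except ImportError:
        pass


# functools.partial of a module-level function stays picklable, which matters
# when worker processes are started with the "spawn" method.
worker_init_fn = functools.partial(seed_worker, base_seed=42)

worker_init_fn(0)
a = (random.random(), float(np.random.rand()))
worker_init_fn(0)
b = (random.random(), float(np.random.rand()))   # same draws as `a`
```

In `_seeded_dataloader` this would be passed as `worker_init_fn=...` next to the existing `generator=generator` argument.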

Comment on lines +4 to +8
from data.data_loader import load_shieldfl_data
from model.model_hub import create_model
from trainer.shieldfl_aggregator import ShieldFLAggregator
from trainer.verifl_v16_trainer import VeriFLv16Trainer
from utils.runtime import configure_runtime
Copilot AI Apr 11, 2026

These imports rely on data/, model/, and trainer/ being importable as top-level modules (usually via working-directory/PYTHONPATH setup). This is brittle and can also conflict with other modules named data in a larger environment. Prefer package-qualified imports (e.g., from shieldfl.data ...) or relative imports with execution via python -m ... to make the entrypoint robust across invocation contexts.

Suggested change
- from data.data_loader import load_shieldfl_data
- from model.model_hub import create_model
- from trainer.shieldfl_aggregator import ShieldFLAggregator
- from trainer.verifl_v16_trainer import VeriFLv16Trainer
- from utils.runtime import configure_runtime
+ from .data.data_loader import load_shieldfl_data
+ from .model.model_hub import create_model
+ from .trainer.shieldfl_aggregator import ShieldFLAggregator
+ from .trainer.verifl_v16_trainer import VeriFLv16Trainer
+ from .utils.runtime import configure_runtime

def get_model_class(model_name: str):
    if model_name not in MODEL_REGISTRY:
        raise ValueError(
            f"Unknown model '{model_name}'. Available: {list(MODEL_REGISTRY.keys())}"
Copilot AI Apr 11, 2026

For more stable/debug-friendly output, consider emitting a deterministic ordering of available models (e.g., sorted keys) in the error message. This avoids minor noise if the registry definition changes insertion order over time and makes logs easier to diff across runs.

Suggested change
f"Unknown model '{model_name}'. Available: {list(MODEL_REGISTRY.keys())}"
f"Unknown model '{model_name}'. Available: {sorted(MODEL_REGISTRY)}"
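
A self-contained version of the lookup with the sorted message is sketched below. The registry contents are placeholders (the real `MODEL_REGISTRY` maps names to model classes), and the insertion order is deliberately scrambled to show that the message no longer depends on it.

```python
# Placeholder registry: `object` stands in for the actual model classes.
MODEL_REGISTRY = {"ResNet18": object, "SimpleCNN": object, "LeNet5": object}


def get_model_class(model_name: str):
    if model_name not in MODEL_REGISTRY:
        raise ValueError(
            f"Unknown model '{model_name}'. Available: {sorted(MODEL_REGISTRY)}"
        )
    return MODEL_REGISTRY[model_name]


try:
    get_model_class("VGG16")
except ValueError as exc:
    message = str(exc)
    print(message)
```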

3 participants