This setup standardizes baseline inputs/outputs for local generation experiments.
- Repository root: C:\Users\User\Desktop\DN-main
- Python shim: not configured through pyenv (python asks for a pyenv version)
- Usable local Python: C:\Users\User\Desktop\DN-main\.venv\Scripts\python.exe
- .venv Python: 3.12.13
- .venv torch: 2.11.0+cpu
- CUDA: unavailable from torch
- nvidia-smi: not found
- OPENAI_API_KEY: not set
- HF_TOKEN / HUGGINGFACE_HUB_TOKEN: not set
- Input sample: data/input_samples.jsonl
- Text outputs: outputs/text/
- Image outputs: outputs/image/
Input JSONL schema:
{"id":"sample_001","prompt":"...","characters":["..."],"scene":"...","story_context":""}

| Baseline | Type | Source URL | Official | Local path | Current status | Smoke test |
|---|---|---|---|---|---|---|
| DOC | Text | https://github.com/facebookresearch/doc-storygen-v2 | Official, but skipped here | baselines/text/doc-storygen-v2 | User confirmed existing local DOC experiments are complete; not re-run | Skipped by request |
| Rolling | Text | DOC rolling baseline script / CCI description | Not an independent official repo | baselines/text/rolling | Not implemented per user instruction | Skipped |
| w/oIG | Text | CCI ablation description | Not an independent official repo | baselines/text/wo_ig | Not implemented per user instruction | Skipped |
| w/oMW | Text | CCI ablation description | Not an independent official repo | baselines/text/wo_mw | Not implemented per user instruction | Skipped |
| SDM-v2 | Image | https://huggingface.co/stabilityai/stable-diffusion-2-1-base | Official model via Diffusers wrapper | baselines/image/sdm_v2 | Wrapper supports full JSONL batch generation and low-VRAM flags; local .venv lacks diffusers; SSH GPU env has dependencies installed, but model download needs authorized HF access/cache | Blocked: no generated image; Hugging Face returned an auth/cache error |
| StoryDiffusion | Image | https://github.com/HVision-NKU/StoryDiffusion | Official code | baselines/image/StoryDiffusion | Official code copied from existing local repo; manifest wrapper implemented | Passed prepare-only manifest |
| IC-LoRA | Image | https://github.com/ali-vilab/In-Context-LoRA | Official repo/workflow | baselines/image/In-Context-LoRA | Official workflow copied from existing local repo; ComfyUI workflow preparation implemented | Passed prepare-only manifest |
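Before dispatching to any baseline, the unified input can be sanity-checked against the schema above. This is a minimal illustrative sketch (the field names come from the schema; the validation helper itself is not part of the repository):

```python
import json

# Field names taken from the unified input schema above.
REQUIRED_FIELDS = {"id", "prompt", "characters", "scene", "story_context"}

def validate_samples(jsonl_text):
    """Parse unified-input JSONL and fail fast on missing fields."""
    samples = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        samples.append(record)
    return samples

demo = '{"id":"sample_001","prompt":"...","characters":["..."],"scene":"...","story_context":""}'
print(len(validate_samples(demo)))  # 1 valid sample
```

In practice this would read data/input_samples.jsonl from disk; the string literal here just mirrors the schema example.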
pip install -r baselines/image/sdm_v2/requirements.txt

If the model requires Hugging Face authorization:

set HF_TOKEN=your_huggingface_token

PowerShell:

$env:HF_TOKEN="your_huggingface_token"

On the SSH GPU machine used for this run, the working environment is:
ssh -p 14077 root@connect.cqa1.seetacloud.com
mkdir -p /root/autodl-tmp/baselines
cd /root/autodl-tmp/baselines
/root/miniconda3/envs/dn/bin/python -m venv --system-site-packages sdm_v2_venv
sdm_v2_venv/bin/python -m pip install diffusers transformers accelerate safetensors

The venv reuses the existing CUDA torch from /root/miniconda3/envs/dn (torch 2.6.0+cu124) to avoid reinstalling multi-GB CUDA wheels. The model itself is not vendored in this repository. If it is not already cached, expect roughly 5-6 GB of Hugging Face downloads for stabilityai/stable-diffusion-2-1-base. If Hugging Face gates the repository, log in with a token that has accepted the model license/agreement.
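Since the environment report lists both HF_TOKEN and HUGGINGFACE_HUB_TOKEN, a wrapper can accept either. A hypothetical helper (not part of the repo; only the two variable names are taken from this document):

```python
import os

def resolve_hf_token():
    """Return the first Hugging Face token found in the environment, or None.

    Checks HF_TOKEN first, then HUGGINGFACE_HUB_TOKEN, matching the two
    variable names listed in the environment summary above.
    """
    for name in ("HF_TOKEN", "HUGGINGFACE_HUB_TOKEN"):
        value = os.environ.get(name)
        if value:
            return value
    return None
```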
Low-resource options:
- Use --dtype float16 only on CUDA.
- Use --attention_slicing.
- Use --cpu_offload if accelerate is installed.
- Reduce --num_inference_steps, --height, and --width.
- Use --overwrite only when intentionally replacing an existing index.jsonl or image files; by default the runner refuses to overwrite existing outputs.
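The flag surface and the overwrite guard can be sketched as follows. The flag names are the ones documented above; the parser and guard function are illustrative sketches, not the actual runner code:

```python
import argparse
from pathlib import Path

def build_parser():
    """Mirror the documented low-resource flags of the SDM-v2 runner (sketch)."""
    p = argparse.ArgumentParser()
    p.add_argument("--dtype", default="float32", choices=["float32", "float16"])
    p.add_argument("--attention_slicing", action="store_true")
    p.add_argument("--cpu_offload", action="store_true")
    p.add_argument("--num_inference_steps", type=int, default=20)
    p.add_argument("--height", type=int, default=512)
    p.add_argument("--width", type=int, default=512)
    p.add_argument("--overwrite", action="store_true")
    return p

def check_output_dir(output_dir, overwrite):
    """Refuse to clobber an existing index.jsonl unless --overwrite is set."""
    index = Path(output_dir) / "index.jsonl"
    if index.exists() and not overwrite:
        raise SystemExit(f"refusing to overwrite {index}; pass --overwrite")
```

The guard runs before any model loading, so a mistaken rerun fails fast instead of silently replacing earlier results.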
Official requirements are in:
baselines/image/StoryDiffusion/requirements.txt

Official low-VRAM interactive command:

cd baselines/image/StoryDiffusion
python gradio_app_sdxl_specific_id_low_vram.py

The official release exposes Notebook/Gradio image generation rather than a stable batch CLI. The local wrapper currently prepares prompt manifests from the unified JSONL input.
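The prepare-only step amounts to turning each unified JSONL record into a prompt file plus an index row. A minimal sketch (the file naming follows the outputs listed later in this document; the function itself is illustrative, not the actual wrapper):

```python
import json
from pathlib import Path

def prepare_manifest(samples, output_dir):
    """Write <id>.prompt.txt per sample and an index.jsonl manifest."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for sample in samples:
        prompt_path = out / f"{sample['id']}.prompt.txt"
        prompt_path.write_text(sample["prompt"], encoding="utf-8")
        rows.append({"id": sample["id"],
                     "prompt_file": prompt_path.name,
                     "status": "prepared"})
    index = out / "index.jsonl"
    index.write_text("\n".join(json.dumps(r) for r in rows) + "\n",
                     encoding="utf-8")
    return index
```

The manifest lets the interactive Gradio flow (or a later batch run) pick up exactly the prompts that the unified input defines.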
The official repository provides training config, model zoo links, and a ComfyUI workflow. To generate images, prepare a ComfyUI FLUX environment with:
- FLUX base model, for example
flux1-dev.safetensors ali-vilab/In-Context-LoRALoRA weights, for examplemovie-shots.safetensorsae.safetensors- T5/CLIP text encoders such as
t5xxl_fp8_e4m3fn.safetensorsandclip_l.safetensors
The wrapper prepares per-sample workflow JSON files with the unified prompt and seed.
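Injecting the unified prompt and seed into a workflow template might look like the sketch below. The flat "prompt"/"seed" keys are placeholders: a real ComfyUI workflow JSON is node-graph shaped, and the actual key paths depend on the exported workflow.

```python
import copy
import json
from pathlib import Path

def prepare_workflows(samples, template, output_dir, seed=42):
    """Emit one workflow JSON per sample with prompt and seed filled in."""
    out = Path(output_dir) / "workflows"
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for sample in samples:
        wf = copy.deepcopy(template)        # never mutate the shared template
        wf["prompt"] = sample["prompt"]     # placeholder key; see lead-in
        wf["seed"] = seed                   # fixed seed for reproducibility
        path = out / f"{sample['id']}.workflow.json"
        path.write_text(json.dumps(wf, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```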
DOC/text placeholders:
python scripts/run_doc.py --input data/input_samples.jsonl --output outputs/text/doc.jsonl
python scripts/run_rolling.py --input data/input_samples.jsonl --output outputs/text/rolling.jsonl
python scripts/run_woig.py --input data/input_samples.jsonl --output outputs/text/woig.jsonl
python scripts/run_womw.py --input data/input_samples.jsonl --output outputs/text/womw.jsonl

Image wrappers:
python scripts/run_sdm_v2.py --input data/input_samples.jsonl --output_dir outputs/image/sdm_v2 --seed 42 --num_inference_steps 10 --height 512 --width 512 --attention_slicing
python scripts/run_storydiffusion.py --input data/input_samples.jsonl --output_dir outputs/image/StoryDiffusion --prepare_only
python scripts/run_iclora.py --input data/input_samples.jsonl --output_dir outputs/image/In-Context-LoRA --prepare_only
python scripts/run_all_baselines.py --input data/input_samples.jsonl

SSH GPU SDM-v2 smoke command:
cd /root/autodl-tmp/baselines/DN-main
export HF_HOME=/root/autodl-tmp/hf-cache
export HF_ENDPOINT=https://hf-mirror.com # optional mirror when huggingface.co is unreachable
/root/autodl-tmp/baselines/sdm_v2_venv/bin/python scripts/run_sdm_v2.py \
--input data/input_samples.jsonl \
--output_dir outputs/image/sdm_v2 \
--seed 42 \
--num_inference_steps 10 \
--height 512 \
--width 512 \
--attention_slicing

Executed with:
C:\Users\User\Desktop\DN-main\.venv\Scripts\python.exe scripts\run_storydiffusion.py --input data\input_samples.jsonl --output_dir outputs\image\StoryDiffusion --prepare_only --launch_hint
C:\Users\User\Desktop\DN-main\.venv\Scripts\python.exe scripts\run_iclora.py --input data\input_samples.jsonl --output_dir outputs\image\In-Context-LoRA --prepare_only
C:\Users\User\Desktop\DN-main\.venv\Scripts\python.exe scripts\run_sdm_v2.py --input data\input_samples.jsonl --output_dir outputs\image\sdm_v2 --local_files_only --num_inference_steps 1 --height 256 --width 256

Results:
- StoryDiffusion prepare-only: passed; wrote outputs/image/StoryDiffusion/index.jsonl.
- IC-LoRA prepare-only: passed; wrote outputs/image/In-Context-LoRA/index.jsonl and per-sample workflow JSON under outputs/image/In-Context-LoRA/workflows/.
- SDM-v2 local generation: failed before model loading because .venv does not have diffusers; local torch is CPU-only (2.11.0+cpu) and CUDA is unavailable.
- SDM-v2 SSH GPU dependency check: passed in /root/autodl-tmp/baselines/sdm_v2_venv; torch is CUDA-enabled (2.6.0+cu124, CUDA 12.4, RTX 4090 visible).
- SDM-v2 SSH GPU smoke generation: blocked before generation; huggingface.co was unreachable from the GPU machine and hf-mirror.com returned 401 Unauthorized for stabilityai/stable-diffusion-2-1-base. No sample_001.png or success index row was written.
- Unified dispatcher: wrote outputs/baseline_run_summary.json; exits non-zero while SDM-v2 dependencies are missing.
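The dispatcher's reported behavior (write a summary JSON, exit non-zero on any failure) can be sketched as below; this is an illustrative sketch, not the actual run_all_baselines.py code, and the status vocabulary is assumed from the results above:

```python
import json
from pathlib import Path

def write_summary(statuses, summary_path):
    """Write a baseline summary JSON and return a process exit code.

    statuses: mapping of baseline name -> status string
    (e.g. "passed", "skipped", "failed", "blocked").
    """
    path = Path(summary_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(statuses, indent=2), encoding="utf-8")
    # Only fully successful or intentionally skipped baselines count as ok.
    ok = all(s in ("passed", "skipped") for s in statuses.values())
    return 0 if ok else 1
```

A non-zero exit keeps CI or batch scripts from treating a blocked SDM-v2 run as success while still leaving the per-baseline detail in the summary file.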
data/input_samples.jsonl
outputs/image/StoryDiffusion/index.jsonl
outputs/image/StoryDiffusion/sample_001.prompt.txt
outputs/image/In-Context-LoRA/index.jsonl
outputs/image/In-Context-LoRA/workflows/sample_001.film-storyboard.workflow.json
outputs/baseline_run_summary.json
No generated image is claimed for StoryDiffusion or IC-LoRA in the current environment; their current outputs are preparation artifacts only.
No generated SDM-v2 image is claimed yet. The expected smoke-test outputs after a successful authorized model load are:
outputs/image/sdm_v2/sample_001.png
outputs/image/sdm_v2/index.jsonl
- HF_TOKEN or HUGGINGFACE_HUB_TOKEN if Hugging Face requires authentication for SDM-v2 or FLUX-related assets.
- For stabilityai/stable-diffusion-2-1-base, accept the Hugging Face model license/agreement with the same account used by the token if the repository is gated.
- CUDA GPU/VRAM for practical SDM-v2, StoryDiffusion, and IC-LoRA generation.
- StoryDiffusion dependency environment if using the official Gradio/Notebook flow.
- ComfyUI + FLUX + IC-LoRA weights for IC-LoRA image generation.
- OPENAI_API_KEY only if you later rerun DOC/OpenAI-dependent text generation; it was not used here.
Replace data/input_samples.jsonl with the full dataset JSONL using the same schema, then run:
python scripts/run_sdm_v2.py --input data/full_dataset.jsonl --output_dir outputs/image/sdm_v2 --seed 42 --num_inference_steps 20 --height 512 --width 512 --attention_slicing
python scripts/run_storydiffusion.py --input data/full_dataset.jsonl --output_dir outputs/image/StoryDiffusion --prepare_only
python scripts/run_iclora.py --input data/full_dataset.jsonl --output_dir outputs/image/In-Context-LoRA --prepare_only

For the SSH GPU environment:
cd /root/autodl-tmp/baselines/DN-main
export HF_HOME=/root/autodl-tmp/hf-cache
export HF_TOKEN=your_huggingface_token
/root/autodl-tmp/baselines/sdm_v2_venv/bin/python scripts/run_sdm_v2.py \
--input data/full_dataset.jsonl \
--output_dir outputs/image/sdm_v2 \
--seed 42 \
--num_inference_steps 20 \
--height 512 \
--width 512 \
--attention_slicing

After installing image-generation dependencies and making the SDM-v2 weights available from Hugging Face or cache, rerun the corresponding wrapper or official workflow to produce actual images and JSONL indexes.