add deepseek_r1_distill_qwen-32b_in_process llm config#355

Draft
ArneBinder wants to merge 5 commits into main from llm/add-deepseek_r1_distill_qwen-32b_in_process

Contributor

@ArneBinder ArneBinder commented Feb 5, 2026

this implements #354

EDIT: We should wait for #282 #397, since the new LLM seems to work only with max_model_len=65536, which causes many more "too long" errors. This still needs to be verified, though (i.e. run again with the full max_model_len of 128k, but exclude H100-TRAILS).
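
To make the trade-off concrete, here is a minimal sketch of how a smaller max_model_len turns more documents into "too long" cases. The document names, prompt sizes, and the reserved generation budget are hypothetical and only illustrate the check; they are not taken from this PR or from the project code.

```python
def too_long_docs(prompt_tokens: dict[str, int], max_model_len: int,
                  reserved_output_tokens: int) -> list[str]:
    """Return the documents whose prompt plus generation budget exceeds the context window."""
    return sorted(
        name
        for name, n_tokens in prompt_tokens.items()
        if n_tokens + reserved_output_tokens > max_model_len
    )


# Hypothetical prompt sizes in tokens (assumption, for illustration only).
prompt_tokens = {"doc_a": 20_000, "doc_b": 60_000, "doc_c": 100_000}
RESERVED = 8_192  # assumed generation budget

for limit in (32_768, 65_536, 131_072):
    print(limit, too_long_docs(prompt_tokens, limit, RESERVED))
```

With these made-up sizes, two of three documents are rejected at 32k and 64k, and none at 128k.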

@ArneBinder ArneBinder self-assigned this Feb 5, 2026
Contributor Author

ArneBinder commented Feb 5, 2026

test it

./run_in_process.sh \
-pa "H100-SLT,H100-Trails,H100,A100-80GB" \
-u "-m kibad_llm.predict \
name=355_faktencheck_core_with_persona \
experiment/predict=faktencheck_core_fields_schema_with_evidence \
pdf_directory=/ds/text/kiba-d/dev-set-100 \
extractor/llm=deepseek_r1_distill_qwen-32b_in_process \
seed=42,1337,7331 \
--multirun"
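
For readers unfamiliar with the sweep syntax: with --multirun, a comma-separated override such as seed=42,1337,7331 is expanded into one job per value. The following is a simplified sketch of that expansion, not Hydra's actual sweeper (which also handles quoting, ranges, and more); `expand_sweep` is a hypothetical helper.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated override values into one override list per job."""
    choices = []
    for item in overrides:
        key, _, value = item.partition("=")
        choices.append([f"{key}={v}" for v in value.split(",")])
    # Cartesian product over all swept keys; here only `seed` has several values.
    return [list(combo) for combo in product(*choices)]


jobs = expand_sweep([
    "extractor/llm=deepseek_r1_distill_qwen-32b_in_process",
    "seed=42,1337,7331",
])
for index, job in enumerate(jobs):
    print(index, job)
```

This yields three jobs, which matches the three multirun sub-directories 0, 1, and 2 in the logs.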

started in screen session kibad-llm (screen -r kibad-llm)

JOB_NAME kiba-d_cd4bcd12-5d01-417b-8fc4-d39825edba2b
=============================================
srun: jobinfo: version v1.0.0
srun: Required node not available (down, drained or reserved)
srun: job 2513133 queued and waiting for resources

Monitor this job here: http://monitoring.pegasus.kl.dfki.de/d/slurm-job-details/job-details?var-jobid=2513133&from=1770253012000

Crashed with OOM; restarted with max_model_len: 32768 (as recommended in the HF documentation).
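
A back-of-the-envelope estimate shows why the full context length can run out of memory. The sketch below computes the KV-cache size for one full-length sequence. The architecture values (64 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions for a Qwen2.5-32B-style model and should be checked against the model's config.json.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens


# Assumed architecture values, not taken from this PR.
ASSUMED = dict(num_layers=64, num_kv_heads=8, head_dim=128)

for max_model_len in (32_768, 65_536, 131_072):
    gib = kv_cache_bytes(max_model_len, **ASSUMED) / GIB
    print(f"{max_model_len:>7} tokens -> {gib:.0f} GiB KV cache per full-length sequence")
```

Under these assumptions a single 128k-token sequence needs about 32 GiB of cache, on top of roughly 60 GiB for 32B parameters in 16-bit precision, which would not fit on one 80 GB GPU.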

JOB_NAME kiba-d_96821ca4-d984-439d-9af7-fdd42f971ca1
=============================================
srun: jobinfo: version v1.0.0
srun: Required node not available (down, drained or reserved)
srun: job 2513141 queued and waiting for resources
srun: job 2513141 has been allocated resources
Job 2513141: Running on node(s) serv-3342

Monitor this job here: http://monitoring.pegasus.kl.dfki.de/d/slurm-job-details/job-details?var-jobid=2513141&from=1770253776000

Got "no reasoning parser configure...", cancelled the job.
Restarted with reasoning_parser: "deepseek_r1"
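
As background on what a reasoning parser is for: R1-style models emit their chain of thought before the final answer, conventionally wrapped in `<think>...</think>`, so the answer has to be separated before it can be parsed as JSON. The following is a minimal sketch under that assumption. It is not vLLM's implementation, and `split_reasoning` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_reasoning(text: str, start_tag: str = "<think>",
                    end_tag: str = "</think>") -> Tuple[Optional[str], str]:
    """Split raw model output into (reasoning, answer)."""
    head, sep, tail = text.partition(end_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return None, text.strip()
    reasoning = head.strip().removeprefix(start_tag).strip()
    return reasoning, tail.strip()


raw = '<think>The text mentions forests.</think>\n{"habitat": ["forest"]}'
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

Without such a split, the reasoning text would precede the JSON and a strict JSON parser would fail on the output.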

JOB_NAME kiba-d_6f690001-6873-4e2a-a3b0-3806331af5bb
=============================================
srun: jobinfo: version v1.0.0
srun: Required node not available (down, drained or reserved)
srun: job 2513143 queued and waiting for resources
srun: job 2513143 has been allocated resources
Job 2513143: Running on node(s) serv-3342

Monitor this job here: http://monitoring.pegasus.kl.dfki.de/d/slurm-job-details/job-details?var-jobid=2513143&from=1770253997000

[2026-02-05 03:51:01,459][HYDRA] Contents of /netscratch/binder/projects/kibad-llm/logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/job_return_value.md:

| | branch | commit_hash | is_dirty | output_file | output_file_absolute | overrides.experiment/predict | overrides.extractor/llm | overrides.name | overrides.pdf_directory | overrides.seed | time_extraction | time_pdf_conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| seed=1337 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-45-25_225700/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-45-25_225700/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 1337 | 2026.33 | 0.0049665 |
| seed=42 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-13-25_545878/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-13-25_545878/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 42 | 1831.31 | 0.00509654 |
| seed=7331 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_03-20-41_715822/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_03-20-41_715822/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 7331 | 1740.66 | 0.00519758 |
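
The per-seed extraction times can be aggregated by hand as a sanity check against the time_extraction mean and std reported by the evaluation (1866.1 and 145.975). A small sketch, assuming the reported std is the sample standard deviation (ddof=1):

```python
from statistics import mean, stdev

# time_extraction per seed, copied from the predict output of this run.
time_extraction = {42: 1831.31, 1337: 2026.33, 7331: 1740.66}

values = list(time_extraction.values())
avg = mean(values)
sd = stdev(values)  # sample standard deviation (ddof=1), an assumption about the evaluation
print(f"mean={avg:.2f} std={sd:.2f}")
```

The sample standard deviation reproduces the reported value up to the rounding of the per-seed times; the population variant (ddof=0) would give a noticeably smaller number.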

f1

uv run -m kibad_llm.evaluate \
name=355_faktencheck_core_with_persona  \
experiment/evaluate=faktencheck_core_f1_micro_flat \
prediction_logs=logs/355_faktencheck_core_with_persona \
+hydra.callbacks.save_job_return.multirun_markdown_group_by=prediction.overrides.extractor/llm \
--multirun

[2026-02-05 03:59:39,685][HYDRA] Contents of /netscratch/binder/projects/kibad-llm/logs/355_faktencheck_core_with_persona/evaluate/multiruns/2026-02-05_03-59-37/job_return_value.md:

prediction.overrides.extractor/llm: deepseek_r1_distill_qwen-32b_in_process

| field | f1.mean | f1.std | precision.mean | precision.std | recall.mean | recall.std | support.mean | support.std |
|---|---|---|---|---|---|---|---|---|
| ALL | 0.216 | 0.008 | 0.298 | 0.013 | 0.169 | 0.009 | 792 | 0 |
| AVG | 0.233 | 0.011 | 0.314 | 0.016 | 0.198 | 0.013 | 132 | 0 |
| biodiversity_level | 0.211 | 0.034 | 0.204 | 0.031 | 0.219 | 0.038 | 67 | 0 |
| ecosystem_type.term | 0.2 | 0.043 | 0.176 | 0.039 | 0.233 | 0.047 | 53 | 0 |
| habitat | 0.362 | 0.017 | 0.605 | 0.027 | 0.258 | 0.015 | 138 | 0 |
| taxa.german_name | 0.043 | 0.011 | 0.087 | 0.03 | 0.029 | 0.007 | 231 | 0 |
| taxa.scientific_name | 0.236 | 0.037 | 0.331 | 0.047 | 0.184 | 0.036 | 197 | 0 |
| taxa.species_group | 0.343 | 0.023 | 0.48 | 0.028 | 0.267 | 0.02 | 106 | 0 |

| column | mean | std |
|---|---|---|
| prediction.job_return_value.time_extraction | 1866.1 | 145.975 |
| prediction.job_return_value.time_pdf_conversion | 0.005 | 0 |

| column | values |
|---|---|
| overrides.dataset.predictions.log | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/0', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/1', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/2'] |
| overrides.experiment/evaluate | ['faktencheck_core_f1_micro_flat', 'faktencheck_core_f1_micro_flat', 'faktencheck_core_f1_micro_flat'] |
| overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| overrides.prediction_logs | ['logs/355_faktencheck_core_with_persona', 'logs/355_faktencheck_core_with_persona', 'logs/355_faktencheck_core_with_persona'] |
| prediction.job_return_value.branch | ['llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process'] |
| prediction.job_return_value.commit_hash | ['41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555'] |
| prediction.job_return_value.is_dirty | [np.False_, np.False_, np.False_] |
| prediction.job_return_value.output_file | ['predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-13-25_545878/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-45-25_225700/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_03-20-41_715822/predictions.jsonl'] |
| prediction.job_return_value.output_file_absolute | ['/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-13-25_545878/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_02-45-25_225700/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_02-13-25/2026-02-05_03-20-41_715822/predictions.jsonl'] |
| prediction.overrides.experiment/predict | ['faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence'] |
| prediction.overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| prediction.overrides.pdf_directory | ['/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100'] |
| prediction.overrides.seed | ['42', '1337', '7331'] |
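
For orientation, a generic micro-averaged precision/recall/F1 over flat (field, value) pairs can be sketched as below. This is an illustration of the general technique only: the project's faktencheck_core_f1_micro_flat metric may normalise or match values differently, and the example values are made up.

```python
def micro_prf(gold: list[set], pred: list[set]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document sets of items."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # items found in both
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)  # what the tables likely call "support" (assumption)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted (field, value) pairs for two documents.
gold = [{("habitat", "forest"), ("habitat", "bog")}, {("taxa.species_group", "birds")}]
pred = [{("habitat", "forest")}, {("taxa.species_group", "birds"), ("taxa.species_group", "insects")}]

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because counts are pooled before dividing, fields with high support (such as taxa.german_name here) weigh more heavily in ALL than in a per-field average.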

error

uv run -m kibad_llm.evaluate \
name=355_faktencheck_core_with_persona  \
experiment/evaluate=prediction_errors \
prediction_logs=logs/355_faktencheck_core_with_persona \
+hydra.callbacks.save_job_return.multirun_markdown_group_by=prediction.overrides.extractor/llm \
--multirun

| column | value |
|---|---|
| prediction.overrides.extractor/llm | deepseek_r1_distill_qwen-32b_in_process |
| prediction.job_return_value.time_pdf_conversion | 0.00508688 |
| prediction.job_return_value.time_extraction | 1866.1 |
| no_error | 60.6667 |
| with_error | 39.3333 |
| ValueError | 39 |
| JSONDecodeError | 1 |
| prediction.job_return_value.commit_hash | ['41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555'] |
| prediction.job_return_value.branch | ['llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process'] |
| prediction.job_return_value.is_dirty | [np.False_, np.False_, np.False_] |
| prediction.overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| prediction.overrides.experiment/predict | ['faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence'] |
| prediction.overrides.pdf_directory | ['/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100'] |
| prediction.overrides.seed | ['42', '1337', '7331'] |
| overrides.dataset.predictions.log | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/0', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/1', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_02-13-25/2'] |
| overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| overrides.experiment/evaluate | ['prediction_errors', 'prediction_errors', 'prediction_errors'] |
| overrides.prediction_logs | ['logs/355_faktencheck_core_with_persona', 'logs/355_faktencheck_core_with_persona', 'logs/355_faktencheck_core_with_persona'] |

Contributor Author

ArneBinder commented Feb 5, 2026

max_model_len=65536

./run_in_process.sh \
-pa "H100-SLT,H100-Trails,H100,A100-80GB" \
-u "-m kibad_llm.predict \
name=355_faktencheck_core_with_persona \
experiment/predict=faktencheck_core_fields_schema_with_evidence \
pdf_directory=/ds/text/kiba-d/dev-set-100 \
extractor/llm=deepseek_r1_distill_qwen-32b_in_process \
extractor.llm.vllm_kwargs.max_model_len=65536 \
seed=42,1337,7331 \
--multirun"
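
The extra override extractor.llm.vllm_kwargs.max_model_len=65536 addresses a nested config key by its dotted path. The sketch below mimics that behaviour on a plain dict. It is not Hydra's or OmegaConf's implementation, `apply_override` is a hypothetical helper, and the starting value of 32768 is assumed from the earlier restart note.

```python
def apply_override(config: dict, override: str) -> dict:
    """Set a nested config value from a `dotted.key=value` override string."""
    dotted_key, _, raw = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(name, {})
    node[leaf] = int(raw) if raw.isdigit() else raw
    return config


# Assumed shape of the relevant config section (illustration only).
config = {"extractor": {"llm": {"vllm_kwargs": {"max_model_len": 32768}}}}
apply_override(config, "extractor.llm.vllm_kwargs.max_model_len=65536")
print(config["extractor"]["llm"]["vllm_kwargs"])
```

Note the difference from extractor/llm=..., which uses a slash to select a whole config group file rather than to set a single value.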

JOB_NAME kiba-d_e124fe18-4f92-44ff-a286-45c819d2779e
=============================================
srun: jobinfo: version v1.0.0
srun: Required node not available (down, drained or reserved)
srun: job 2513297 queued and waiting for resources

[2026-02-05 06:15:15,219][HYDRA] Contents of /netscratch/binder/projects/kibad-llm/logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/job_return_value.md:

| | branch | commit_hash | is_dirty | output_file | output_file_absolute | overrides.experiment/predict | overrides.extractor.llm.vllm_kwargs.max_model_len | overrides.extractor/llm | overrides.name | overrides.pdf_directory | overrides.seed | time_extraction | time_pdf_conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| seed=1337 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | 65536 | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 1337 | 2091.4 | 0.00252892 |
| seed=42 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | 65536 | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 42 | 2413.82 | 0.0211366 |
| seed=7331 | llm/add-deepseek_r1_distill_qwen-32b_in_process | 41d6de7 | False | predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl | /netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl | faktencheck_core_fields_schema_with_evidence | 65536 | deepseek_r1_distill_qwen-32b_in_process | 355_faktencheck_core_with_persona | /ds/text/kiba-d/dev-set-100 | 7331 | 2061.69 | 0.00323695 |

metrics

uv run -m kibad_llm.evaluate \
name=355_faktencheck_core_with_persona  \
experiment/evaluate=faktencheck_core_f1_micro_flat \
prediction_logs=logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19 \
+hydra.callbacks.save_job_return.multirun_markdown_group_by=prediction.overrides.extractor/llm \
--multirun

[2026-02-05 11:10:26,774][HYDRA] Contents of /netscratch/binder/projects/kibad-llm/logs/355_faktencheck_core_with_persona/evaluate/multiruns/2026-02-05_11-10-23/job_return_value.md:

prediction.overrides.extractor/llm: deepseek_r1_distill_qwen-32b_in_process

| field | f1.mean | f1.std | precision.mean | precision.std | recall.mean | recall.std | support.mean | support.std |
|---|---|---|---|---|---|---|---|---|
| ALL | 0.215 | 0.019 | 0.255 | 0.037 | 0.186 | 0.01 | 792 | 0 |
| AVG | 0.232 | 0.016 | 0.277 | 0.026 | 0.217 | 0.013 | 132 | 0 |
| biodiversity_level | 0.202 | 0.023 | 0.17 | 0.018 | 0.249 | 0.031 | 67 | 0 |
| ecosystem_type.term | 0.157 | 0.019 | 0.122 | 0.014 | 0.22 | 0.029 | 53 | 0 |
| habitat | 0.454 | 0.031 | 0.631 | 0.04 | 0.355 | 0.025 | 138 | 0 |
| taxa.german_name | 0.039 | 0.018 | 0.069 | 0.038 | 0.027 | 0.011 | 231 | 0 |
| taxa.scientific_name | 0.204 | 0.039 | 0.257 | 0.084 | 0.173 | 0.018 | 197 | 0 |
| taxa.species_group | 0.333 | 0.008 | 0.411 | 0.019 | 0.28 | 0.011 | 106 | 0 |

| column | mean | std |
|---|---|---|
| prediction.job_return_value.time_extraction | 2188.97 | 195.293 |
| prediction.job_return_value.time_pdf_conversion | 0.009 | 0.011 |

| column | values |
|---|---|
| overrides.dataset.predictions.log | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/0', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/1', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/2'] |
| overrides.experiment/evaluate | ['faktencheck_core_f1_micro_flat', 'faktencheck_core_f1_micro_flat', 'faktencheck_core_f1_micro_flat'] |
| overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| overrides.prediction_logs | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19'] |
| prediction.job_return_value.branch | ['llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process'] |
| prediction.job_return_value.commit_hash | ['41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555'] |
| prediction.job_return_value.is_dirty | [np.False_, np.False_, np.False_] |
| prediction.job_return_value.output_file | ['predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl'] |
| prediction.job_return_value.output_file_absolute | ['/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl'] |
| prediction.overrides.experiment/predict | ['faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence'] |
| prediction.overrides.extractor.llm.vllm_kwargs.max_model_len | ['65536', '65536', '65536'] |
| prediction.overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| prediction.overrides.pdf_directory | ['/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100'] |
| prediction.overrides.seed | ['42', '1337', '7331'] |

errors

uv run -m kibad_llm.evaluate \
name=355_faktencheck_core_with_persona  \
experiment/evaluate=prediction_errors \
prediction_logs=logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19 \
+hydra.callbacks.save_job_return.multirun_markdown_group_by=prediction.overrides.extractor/llm \
--multirun

[2026-02-05 11:12:55,302][HYDRA] Contents of /netscratch/binder/projects/kibad-llm/logs/355_faktencheck_core_with_persona/evaluate/multiruns/2026-02-05_11-12-54/job_return_value.md:

prediction.overrides.extractor/llm: deepseek_r1_distill_qwen-32b_in_process

| column | mean | std |
|---|---|---|
| ValueError | 25 | 0 |
| no_error | 75 | 0 |
| prediction.job_return_value.time_extraction | 2188.97 | 195.293 |
| prediction.job_return_value.time_pdf_conversion | 0.009 | 0.011 |
| with_error | 25 | 0 |

| column | values |
|---|---|
| overrides.dataset.predictions.log | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/0', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/1', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19/2'] |
| overrides.experiment/evaluate | ['prediction_errors', 'prediction_errors', 'prediction_errors'] |
| overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| overrides.prediction_logs | ['logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19', 'logs/355_faktencheck_core_with_persona/predict/multiruns/2026-02-05_04-19-19'] |
| prediction.job_return_value.branch | ['llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process', 'llm/add-deepseek_r1_distill_qwen-32b_in_process'] |
| prediction.job_return_value.commit_hash | ['41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555', '41d6de70cc55673611d916bdc090a53e4040e555'] |
| prediction.job_return_value.is_dirty | [np.False_, np.False_, np.False_] |
| prediction.job_return_value.output_file | ['predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl', 'predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl'] |
| prediction.job_return_value.output_file_absolute | ['/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_04-19-20_188148/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-03-37_378682/predictions.jsonl', '/netscratch/binder/projects/kibad-llm/predictions/355_faktencheck_core_with_persona/2026-02-05_04-19-19/2026-02-05_05-39-45_174485/predictions.jsonl'] |
| prediction.overrides.experiment/predict | ['faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence', 'faktencheck_core_fields_schema_with_evidence'] |
| prediction.overrides.extractor.llm.vllm_kwargs.max_model_len | ['65536', '65536', '65536'] |
| prediction.overrides.name | ['355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona', '355_faktencheck_core_with_persona'] |
| prediction.overrides.pdf_directory | ['/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100', '/ds/text/kiba-d/dev-set-100'] |
| prediction.overrides.seed | ['42', '1337', '7331'] |
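
To compare the two runs directly, the headline numbers can be put side by side. The values below are copied from the evaluation outputs in this thread (means over three seeds). That the first run used max_model_len=32768 is an assumption based on the restart note, since that run did not override the value on the command line.

```python
# Means over three seeds, copied from the evaluation outputs in this thread.
# Assumption: the first run used max_model_len=32768 (from the restart note).
runs = {
    32768: {"ALL.f1": 0.216, "habitat.f1": 0.362, "with_error": 39.3333, "time_extraction": 1866.1},
    65536: {"ALL.f1": 0.215, "habitat.f1": 0.454, "with_error": 25.0, "time_extraction": 2188.97},
}


def deltas(before: dict, after: dict) -> dict:
    """Return after - before for every metric present in both runs."""
    return {key: after[key] - before[key] for key in before if key in after}


delta = deltas(runs[32768], runs[65536])
for key, value in delta.items():
    print(f"{key:>16}: {value:+.3f}")
```

On these numbers the larger context mainly reduces the share of failed documents and improves habitat, while ALL.f1 stays within one standard deviation and extraction takes longer.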

@ArneBinder ArneBinder marked this pull request as draft February 5, 2026 09:35
@ArneBinder ArneBinder added the enhancement New feature or request label Mar 31, 2026
@ArneBinder ArneBinder force-pushed the llm/add-deepseek_r1_distill_qwen-32b_in_process branch from 5156b18 to e3f43c4 Compare April 27, 2026 11:54
@ArneBinder ArneBinder force-pushed the llm/add-deepseek_r1_distill_qwen-32b_in_process branch from a3cdaf6 to 5ef5797 Compare April 30, 2026 14:59
Contributor Author

ArneBinder commented Apr 30, 2026

with chunking

./run_in_process.sh \
-sr \
-r llm/add-deepseek_r1_distill_qwen-32b_in_process \
-pa "H100-SLT,H100-Trails,H100,A100-80GB" \
-u "-m kibad_llm.predict \
name=355_faktencheck_core \
experiment/predict=faktencheck_core_fields_schema_with_chunking \
pdf_directory=/ds/text/kiba-d/dev-set-100 \
extractor/llm=deepseek_r1_distill_qwen-32b_in_process \
seed=42,1337,7331 \
--multirun"

started in screen session kibad-llm-5 (screen -r kibad-llm-5)

>>> Syncing git refs (git fetch --prune --tags) in /netscratch/binder/projects/kibad-llm
From github-kibad-llm:DFKI-NLP/kibad-llm
 - [deleted]           (none)     -> origin/build_schema_description/improve-newline-handling
 - [deleted]           (none)     -> origin/feat/mkdocs-auto-docs
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 34 (delta 17), reused 29 (delta 13), pack-reused 0 (from 0)
Unpacking objects: 100% (34/34), 5.24 KiB | 4.00 KiB/s, done.
   096090bf..c988c775  prediction_results/add-organism_trends_with_chunking -> origin/prediction_results/add-organism_trends_with_chunking
 + a3cdaf67...5ef57979 llm/add-deepseek_r1_distill_qwen-32b_in_process -> origin/llm/add-deepseek_r1_distill_qwen-32b_in_process  (forced update)
>>> Validating git ref: llm/add-deepseek_r1_distill_qwen-32b_in_process                                                                                             
=============================================                                                                                                                       
>>> USING PARTITION H100-SLT,H100-Trails,H100,A100-80GB                                                                                                             
>>> MAX TIME 1-00:00:00                                                                                                                                             
>>> SUBMITTED Thu Apr 30 05:05:05 PM CEST 2026                                                                                                                      
>>> UV_ARGS --cache-dir /netscratch/binder/cache/uv -m kibad_llm.predict name=355_faktencheck_core experiment/predict=faktencheck_core_fields_schema_with_chunking pdf_directory=/ds/text/kiba-d/dev-set-100 extractor/llm=deepseek_r1_distill_qwen-32b_in_process seed=42,1337,7331 --multirun                                         
>>> JOB_NAME kiba-d_fc63e910-202a-4507-b01e-a26e09834766                                                                                                            
>>> GIT_REF llm/add-deepseek_r1_distill_qwen-32b_in_process                                                                                                         
=============================================                                                                                                                       
srun: jobinfo: version v1.0.0                                                                                                                                       
srun: job 2869628 queued and waiting for resources                                                                                                                  
srun: job 2869628 has been allocated resources                                                                                                                      
Job 2869628: Running on node(s) serv-3310                                                                                                                          
Job 2869628: Started at 2026-04-30 18:00:12+0200                                                                                                                    
Monitor this job here: http://monitoring.pegasus.kl.dfki.de/d/slurm-job-details/job-details?var-jobid=2869628&from=1777564812000                                    
>>> Using git ref: llm/add-deepseek_r1_distill_qwen-32b_in_process                                                                                                  
>>> Creating snapshot checkout in: /netscratch/binder/tmp/kiba-d_fc63e910-202a-4507-b01e-a26e09834766/repo                                                          
Cloning into '/netscratch/binder/tmp/kiba-d_fc63e910-202a-4507-b01e-a26e09834766/repo'...                                                                           
done.                                                                                                                                                               
fatal: ambiguous argument 'llm/add-deepseek_r1_distill_qwen-32b_in_process^{commit}': unknown revision or path not in the working tree.                             
Use '--' to separate paths from revisions, like this:                                                                                                               
'git <command> [<revision>...] -- [<file>...]'                                                                                                                      
srun: error: serv-3310: task 0: Exited with exit code 128                                               

crashed
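
The thread does not diagnose this crash. One plausible reading, stated here as an assumption: in a fresh clone the branch exists only as the remote-tracking ref origin/llm/add-deepseek_r1_distill_qwen-32b_in_process, so the bare branch name cannot be resolved to a commit. A hedged sketch of a resolver that falls back to the remote-tracking name follows; `candidate_refs` and `resolve_commit` are hypothetical helpers, not part of run_in_process.sh.

```python
import subprocess
from typing import List, Optional


def candidate_refs(ref: str, remote: str = "origin") -> List[str]:
    """Names to try in order: the ref as given, then its remote-tracking variant."""
    return [ref, f"{remote}/{ref}"]


def resolve_commit(repo_dir: str, ref: str, remote: str = "origin") -> Optional[str]:
    """Return the commit hash for `ref`, or None if no candidate resolves."""
    for name in candidate_refs(ref, remote):
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "--verify", "--quiet", f"{name}^{{commit}}"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout.strip()
    return None


print(candidate_refs("llm/add-deepseek_r1_distill_qwen-32b_in_process"))
```

If the fallback resolves where the bare name does not, that would support this reading; otherwise the cause lies elsewhere, for example in how the snapshot clone is created.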
