DFKI/SLM-Efficiency-EvalSuite

SLM-Efficiency-EvalSuite

About the Paper

Title

Mapping the Efficiency Landscape of Small Language Models

Authors

Fabian Reichwald, Lukas Schiesser, Christiane Plociennik, Leonhard Kunz, Simon Pukrop, Martin Ruskowski, Oliver Thomas

Abstract

Large language models (LLMs) dominate both everyday and specialized applications, but their high computational demand, energy consumption, and privacy risks are increasingly critiqued. Small language models (SLMs) mitigate these drawbacks and are gaining momentum in scenarios where full LLM capabilities are not required, such as agents, industrial systems, or edge devices. Nevertheless, a systematic comparison of model capabilities, energy usage, and scaling behavior has not been conducted yet. We evaluate 70+ SLMs from 2023–2025 on five task-specific benchmarks and compare them with two popular LLMs, revealing key trade-offs between energy, performance, and model selection. Our findings challenge common assumptions: First, smaller models are not automatically more efficient, and energy increases do not guarantee performance gains. Second, newer SLMs show clear improvements in performance–energy trade-offs, though the progress begins to plateau. Last, the efficiency landscape forms a clear Pareto frontier: initial energy increases yield substantial gains, but the last percentage points of performance need orders of magnitude more energy. These results highlight diminishing returns of scaling and emphasize the need for informed, task-aware model selection rather than size-driven choices.

Conference

International Joint Conference on Artificial Intelligence (IJCAI) 2026

Repository Structure

.
├── requirements.txt
└── scripts/
    ├── 1_load_benchmark_data.py
    ├── 2_1_inference_SLM.py
    ├── 2_2_inference_vLLM.py
    ├── 3_1_evaluation.py
    ├── 3_2_evaluation_summary.py
    └── utils/

  • requirements.txt lists the Python dependencies.
  • scripts/1_load_benchmark_data.py downloads and samples the benchmark datasets, then writes them into a unified CSV format.
  • scripts/2_1_inference_SLM.py runs Hugging Face model inference and records outputs, runtime metadata, and CodeCarbon energy measurements.
  • scripts/2_2_inference_vLLM.py runs inference through a vLLM server and records outputs, runtime metadata, and CodeCarbon energy measurements.
  • scripts/3_1_evaluation.py scores model outputs against the benchmark references.
  • scripts/3_2_evaluation_summary.py aggregates evaluated result files into compact metric summaries.
  • scripts/utils/ contains shared helpers, the evaluated SLM list, model release dates, and benchmark-specific scoring logic for XSum, MMLU-Redux, GSM8K, Berkeley Function Calling, BigCodeBench, and CoNLL-2003.
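As a rough illustration of how aggregated results relate to the energy-performance Pareto frontier described in the abstract, the following sketch averages scores per model, sums energy per model, and keeps only the models that no other model beats on both axes. It is not taken from the repository: the record fields (`model`, `score`, `energy_kwh`), the model names, the values, and the functions `summarize` and `pareto_frontier` are all hypothetical and only stand in for whatever 3_2_evaluation_summary.py actually produces.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-example records. Field names and values are illustrative
# only and do not reflect the repository's actual CSV schema or results.
records = [
    {"model": "model-a", "score": 1.0, "energy_kwh": 0.001},
    {"model": "model-a", "score": 0.0, "energy_kwh": 0.001},
    {"model": "model-b", "score": 1.0, "energy_kwh": 0.005},
    {"model": "model-b", "score": 1.0, "energy_kwh": 0.005},
    {"model": "model-c", "score": 0.0, "energy_kwh": 0.004},
    {"model": "model-c", "score": 1.0, "energy_kwh": 0.004},
]


def summarize(rows):
    """Return the mean score and total energy for each model."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row)
    return {
        model: {
            "mean_score": mean(r["score"] for r in model_rows),
            "total_energy_kwh": sum(r["energy_kwh"] for r in model_rows),
        }
        for model, model_rows in grouped.items()
    }


def pareto_frontier(summary):
    """Return the models that are not dominated, sorted by energy.

    A model is dominated if some other model uses no more energy and scores
    at least as well, and is strictly better on at least one of the two.
    """
    frontier = []
    for name, m in summary.items():
        dominated = any(
            o["total_energy_kwh"] <= m["total_energy_kwh"]
            and o["mean_score"] >= m["mean_score"]
            and (
                o["total_energy_kwh"] < m["total_energy_kwh"]
                or o["mean_score"] > m["mean_score"]
            )
            for other, o in summary.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: summary[n]["total_energy_kwh"])


if __name__ == "__main__":
    summary = summarize(records)
    for name in pareto_frontier(summary):
        print(name, summary[name])
```

In this toy data, model-c is dropped because model-a reaches the same mean score with less energy. The pairwise comparison is quadratic in the number of models, which should be fine for a model list of the size evaluated here.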

About

This repository contains the code for a paper submitted to IJCAI 2026.
