feat(server): add frequency/presence penalty sampling parameters by howard0su · Pull Request #250 · Luce-Org/lucebox-hub

howard0su · 2026-05-22T05:33:24Z

Add OpenAI-compatible frequency_penalty and presence_penalty to the C++ server's sampling pipeline. Implementation is model-independent and reusable across all backends (qwen35, qwen3, gemma4, laguna).

Changes:

Add freq_pen, pres_pen fields to SamplerCfg with needs_logit_processing() helper
Implement penalty logic in sample_logits() (applied before top-k/softmax)
Parse frequency_penalty, presence_penalty, repetition_penalty, rep_window from HTTP JSON
Fix temp=0 bypass bug: penalties now apply even with greedy decoding
Update all backends to use needs_logit_processing() instead of raw temp check
Spec decode correctly disabled when penalties are active
Add 19 unit tests (1364 assertions total)
Add 14 HTTP integration tests for sampling parameters
Add API.md documenting supported/TODO parameters with priority levels

Sampling chain: rep_penalty → freq/pres_penalty → top_k → softmax(temp) → top_p → draw

Add OpenAI-compatible frequency_penalty and presence_penalty to the C++ server's sampling pipeline. Implementation is model-independent and reusable across all backends (qwen35, qwen3, gemma4, laguna). Changes: - Add freq_pen, pres_pen fields to SamplerCfg with needs_logit_processing() helper - Implement penalty logic in sample_logits() (applied before top-k/softmax) - Parse frequency_penalty, presence_penalty, repetition_penalty, rep_window from HTTP JSON - Fix temp=0 bypass bug: penalties now apply even with greedy decoding - Update all backends to use needs_logit_processing() instead of raw temp check - Spec decode correctly disabled when penalties are active - Add 19 unit tests (1364 assertions total) - Add 14 HTTP integration tests for sampling parameters - Add API.md documenting supported/TODO parameters with priority levels Sampling chain: rep_penalty → freq/pres_penalty → top_k → softmax(temp) → top_p → draw Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cubic-dev-ai

3 issues found across 10 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="dflash/scripts/test_server_integration.py">

<violation number="1" location="dflash/scripts/test_server_integration.py:232">
P2: Flaky probabilistic integration test: `test_temperature_high_produces_variance` asserts `len(results) >= 2` across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.</violation>

<violation number="2" location="dflash/scripts/test_server_integration.py:279">
P2: `test_top_p_restrictive` asserts deterministic output without a fixed seed, which can cause flaky failures</violation>

<violation number="3" location="dflash/scripts/test_server_integration.py:343">
P2: Test assertion for frequency penalty is too weak to detect regressions</violation>
</file>

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

cubic-dev-ai · 2026-05-22T05:38:19Z

+    def test_temperature_high_produces_variance(self):
+        """High temperature with different seeds should produce different outputs."""
+        base = {
+            "model": MODEL_NAME,
+            "messages": [{"role": "user", "content": "Name a color."}],
+            "stream": False,
+            "max_tokens": 8,
+            "temperature": 1.5,
+        }
+        results = set()
+        for seed in [1, 2, 3, 4, 5]:
+            r = post_json("/v1/chat/completions", {**base, "seed": seed})
+            assert r.status_code == 200
+            results.add(r.json()["choices"][0]["message"]["content"].strip())
+        # With high temp and different seeds, we expect at least some variance
+        assert len(results) >= 2, f"Expected variance, got: {results}"


P2: Flaky probabilistic integration test: test_temperature_high_produces_variance asserts len(results) >= 2 across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 232: <comment>Flaky probabilistic integration test: `test_temperature_high_produces_variance` asserts `len(results) >= 2` across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.</comment> <file context> @@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self): + content = r.json()["choices"][0]["message"]["content"] + assert len(content) > 0 + + def test_temperature_high_produces_variance(self): + """High temperature with different seeds should produce different outputs.""" + base = { </file context>

Suggested change

def test_temperature_high_produces_variance(self):

"""High temperature with different seeds should produce different outputs."""

base = {

"model": MODEL_NAME,

"messages": [{"role": "user", "content": "Name a color."}],

"stream": False,

"max_tokens": 8,

"temperature": 1.5,

}

results = set()

for seed in [1, 2, 3, 4, 5]:

r = post_json("/v1/chat/completions", {**base, "seed": seed})

assert r.status_code == 200

results.add(r.json()["choices"][0]["message"]["content"].strip())

# With high temp and different seeds, we expect at least some variance

assert len(results) >= 2, f"Expected variance, got: {results}"

# Removed: testing statistical variance with small sample size is inherently flaky

# Temperature correctness is covered by test_temperature_nonzero_accepted,

# test_temperature_zero_is_deterministic, and test_seed_reproducibility.

cubic-dev-ai · 2026-05-22T05:38:19Z

+        text_no_pen = r_no_pen.json()["choices"][0]["message"]["content"].lower()
+        text_pen = r_pen.json()["choices"][0]["message"]["content"].lower()
+        # The penalized version should have fewer repetitions of "hello"
+        assert text_no_pen.count("hello") >= text_pen.count("hello")


P2: Test assertion for frequency penalty is too weak to detect regressions

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 343: <comment>Test assertion for frequency penalty is too weak to detect regressions</comment> <file context> @@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self): + text_no_pen = r_no_pen.json()["choices"][0]["message"]["content"].lower() + text_pen = r_pen.json()["choices"][0]["message"]["content"].lower() + # The penalized version should have fewer repetitions of "hello" + assert text_no_pen.count("hello") >= text_pen.count("hello") + + def test_presence_penalty_accepted(self): </file context>

cubic-dev-ai · 2026-05-22T05:38:19Z

+        assert r.status_code == 200
+        assert len(r.json()["choices"][0]["message"]["content"]) > 0
+
+    def test_top_p_restrictive(self):


P2: test_top_p_restrictive asserts deterministic output without a fixed seed, which can cause flaky failures

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 279: <comment>`test_top_p_restrictive` asserts deterministic output without a fixed seed, which can cause flaky failures</comment> <file context> @@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self): + assert r.status_code == 200 + assert len(r.json()["choices"][0]["message"]["content"]) > 0 + + def test_top_p_restrictive(self): + """Very low top_p should make output more deterministic.""" + body = { </file context>

cubic-dev-ai Bot reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(server): add frequency/presence penalty sampling parameters#250

feat(server): add frequency/presence penalty sampling parameters#250
howard0su wants to merge 1 commit into
Luce-Org:mainfrom
howard0su:temp_cpp

howard0su commented May 22, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

cubic-dev-ai Bot May 22, 2026

Uh oh!

cubic-dev-ai Bot May 22, 2026

Uh oh!

cubic-dev-ai Bot May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

howard0su commented May 22, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant