Skip to content

feat(server): add frequency/presence penalty sampling parameters#250

Open
howard0su wants to merge 1 commit into
Luce-Org:mainfrom
howard0su:temp_cpp
Open

feat(server): add frequency/presence penalty sampling parameters#250
howard0su wants to merge 1 commit into
Luce-Org:mainfrom
howard0su:temp_cpp

Conversation

@howard0su
Copy link
Copy Markdown
Contributor

Add OpenAI-compatible frequency_penalty and presence_penalty to the C++ server's sampling pipeline. Implementation is model-independent and reusable across all backends (qwen35, qwen3, gemma4, laguna).

Changes:

  • Add freq_pen, pres_pen fields to SamplerCfg with needs_logit_processing() helper
  • Implement penalty logic in sample_logits() (applied before top-k/softmax)
  • Parse frequency_penalty, presence_penalty, repetition_penalty, rep_window from HTTP JSON
  • Fix temp=0 bypass bug: penalties now apply even with greedy decoding
  • Update all backends to use needs_logit_processing() instead of raw temp check
  • Spec decode correctly disabled when penalties are active
  • Add 19 unit tests (1364 assertions total)
  • Add 14 HTTP integration tests for sampling parameters
  • Add API.md documenting supported/TODO parameters with priority levels

Sampling chain: rep_penalty → freq/pres_penalty → top_k → softmax(temp) → top_p → draw

Add OpenAI-compatible frequency_penalty and presence_penalty to the C++
server's sampling pipeline. Implementation is model-independent and
reusable across all backends (qwen35, qwen3, gemma4, laguna).

Changes:
- Add freq_pen, pres_pen fields to SamplerCfg with needs_logit_processing() helper
- Implement penalty logic in sample_logits() (applied before top-k/softmax)
- Parse frequency_penalty, presence_penalty, repetition_penalty, rep_window from HTTP JSON
- Fix temp=0 bypass bug: penalties now apply even with greedy decoding
- Update all backends to use needs_logit_processing() instead of raw temp check
- Spec decode correctly disabled when penalties are active
- Add 19 unit tests (1364 assertions total)
- Add 14 HTTP integration tests for sampling parameters
- Add API.md documenting supported/TODO parameters with priority levels

Sampling chain: rep_penalty → freq/pres_penalty → top_k → softmax(temp) → top_p → draw

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 10 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="dflash/scripts/test_server_integration.py">

<violation number="1" location="dflash/scripts/test_server_integration.py:232">
P2: Flaky probabilistic integration test: `test_temperature_high_produces_variance` asserts `len(results) >= 2` across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.</violation>

<violation number="2" location="dflash/scripts/test_server_integration.py:279">
P2: `test_top_p_restrictive` asserts deterministic output without a fixed seed, which can cause flaky failures</violation>

<violation number="3" location="dflash/scripts/test_server_integration.py:343">
P2: Test assertion for frequency penalty is too weak to detect regressions</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment on lines +232 to +247
def test_temperature_high_produces_variance(self):
"""High temperature with different seeds should produce different outputs."""
base = {
"model": MODEL_NAME,
"messages": [{"role": "user", "content": "Name a color."}],
"stream": False,
"max_tokens": 8,
"temperature": 1.5,
}
results = set()
for seed in [1, 2, 3, 4, 5]:
r = post_json("/v1/chat/completions", {**base, "seed": seed})
assert r.status_code == 200
results.add(r.json()["choices"][0]["message"]["content"].strip())
# With high temp and different seeds, we expect at least some variance
assert len(results) >= 2, f"Expected variance, got: {results}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Flaky probabilistic integration test: test_temperature_high_produces_variance asserts len(results) >= 2 across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 232:

<comment>Flaky probabilistic integration test: `test_temperature_high_produces_variance` asserts `len(results) >= 2` across only 5 generations with different seeds, but legitimate sampling behavior may still produce identical outputs for all seeds — especially for a prompt like "Name a color." where one token may dominate the distribution. This creates a flaky test that can fail on correct behavior.</comment>

<file context>
@@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self):
+        content = r.json()["choices"][0]["message"]["content"]
+        assert len(content) > 0
+
+    def test_temperature_high_produces_variance(self):
+        """High temperature with different seeds should produce different outputs."""
+        base = {
</file context>
Suggested change
def test_temperature_high_produces_variance(self):
"""High temperature with different seeds should produce different outputs."""
base = {
"model": MODEL_NAME,
"messages": [{"role": "user", "content": "Name a color."}],
"stream": False,
"max_tokens": 8,
"temperature": 1.5,
}
results = set()
for seed in [1, 2, 3, 4, 5]:
r = post_json("/v1/chat/completions", {**base, "seed": seed})
assert r.status_code == 200
results.add(r.json()["choices"][0]["message"]["content"].strip())
# With high temp and different seeds, we expect at least some variance
assert len(results) >= 2, f"Expected variance, got: {results}"
# Removed: testing statistical variance with small sample size is inherently flaky
# Temperature correctness is covered by test_temperature_nonzero_accepted,
# test_temperature_zero_is_deterministic, and test_seed_reproducibility.

text_no_pen = r_no_pen.json()["choices"][0]["message"]["content"].lower()
text_pen = r_pen.json()["choices"][0]["message"]["content"].lower()
# The penalized version should have fewer repetitions of "hello"
assert text_no_pen.count("hello") >= text_pen.count("hello")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Test assertion for frequency penalty is too weak to detect regressions

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 343:

<comment>Test assertion for frequency penalty is too weak to detect regressions</comment>

<file context>
@@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self):
+        text_no_pen = r_no_pen.json()["choices"][0]["message"]["content"].lower()
+        text_pen = r_pen.json()["choices"][0]["message"]["content"].lower()
+        # The penalized version should have fewer repetitions of "hello"
+        assert text_no_pen.count("hello") >= text_pen.count("hello")
+
+    def test_presence_penalty_accepted(self):
</file context>

assert r.status_code == 200
assert len(r.json()["choices"][0]["message"]["content"]) > 0

def test_top_p_restrictive(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: test_top_p_restrictive asserts deterministic output without a fixed seed, which can cause flaky failures

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At dflash/scripts/test_server_integration.py, line 279:

<comment>`test_top_p_restrictive` asserts deterministic output without a fixed seed, which can cause flaky failures</comment>

<file context>
@@ -209,6 +209,241 @@ def test_temperature_zero_is_deterministic(self):
+        assert r.status_code == 200
+        assert len(r.json()["choices"][0]["message"]["content"]) > 0
+
+    def test_top_p_restrictive(self):
+        """Very low top_p should make output more deterministic."""
+        body = {
</file context>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant