fix(classify): honor raw_scores flag (return logits, softmax only when raw_scores=False) by Anai-Guo · Pull Request #662 · michaelfeil/infinity

Anai-Guo · 2026-06-04T07:08:22Z

Problem

/classify ignores the raw_scores parameter — softmax probabilities are always returned (#658).

The classifiers run an HF text-classification pipeline(..., top_k=None), which applies softmax internally by default. The handler's post-processing block was a no-op:

if raw_scores:
    # perform softmax on scores
    pass

So the flag never had any effect, and raw logits were never reachable.

Fix

Mirror the existing rerank path (model returns raw scores, activation applied conditionally in the handler):

SentenceClassifier / OptimumClassifier: call the pipeline with function_to_apply="none" so it emits raw logits.
BatchHandler.classify: apply a numerically-stable softmax only when raw_scores is False.

raw_scores=True now returns logits; the default raw_scores=False path is unchanged (softmax is monotonic, so the descending order the pipeline produces is preserved, and the resulting probabilities are identical).
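As a rough illustration of the claim above for single-label models, the following sketch applies a max-subtracted softmax in pure Python and leaves the input ordering intact. The helper name `stable_softmax` and the sample logits are assumptions made for this example; they are not taken from the patch.

```python
import math


def stable_softmax(logits):
    """Softmax with the maximum subtracted first, so exp() cannot overflow."""
    if not logits:
        return []
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, already in the descending order the pipeline would emit.
logits = [3.2, 0.4, -1.7]
probs = stable_softmax(logits)

# Very large logits stay finite because only differences reach exp().
large = stable_softmax([1000.0, 999.0])
```

Because the exponential is strictly increasing, the largest logit always maps to the largest probability, so a list sorted by logit remains sorted by probability.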

🤖 Generated with Claude Code

…max conditionally

The text-classification pipeline applies softmax internally, so the `/classify` endpoint always returned softmax probabilities regardless of the `raw_scores` flag (the post-processing block was a no-op). Pass function_to_apply="none" so the model emits raw logits, then apply softmax in the batch handler only when raw_scores is False, mirroring the rerank path.

Fixes michaelfeil#658

gemini-code-assist

Code Review

This pull request updates the classification pipeline to retrieve raw logits from the underlying models by passing function_to_apply="none" to the pipelines, and manually computes the softmax probabilities in batch_handler.py when raw_scores is False. The review feedback highlights a potential breaking change where the default behavior of classify now returns raw logits instead of probabilities, and suggests changing the default value of raw_scores to False to preserve backward compatibility. Additionally, it is recommended to optimize the softmax computation using pure Python math.exp to avoid NumPy overhead on small lists, along with adding defensive checks for empty predictions.


gemini-code-assist · 2026-06-04T07:09:44Z

-        if raw_scores:
-            # perform softmax on scores
-            pass
+        if not raw_scores:


The default value of raw_scores in the classify method signature (line 224) is currently True.

Previously, because the pipeline applied softmax internally and the if raw_scores: block was a no-op, calling classify with default arguments returned softmax probabilities. With this change, calling classify with default arguments will now return raw logits, which is a breaking change for the default behavior of the API.

To preserve the original default behavior of returning softmax probabilities, please change the default value of raw_scores in the signature of classify (line 224) to False (matching the behavior of rerank).

gemini-code-assist · 2026-06-04T07:09:44Z

+            # the model returns raw logits; convert them to probabilities
+            for prediction in classifications:
+                logits = np.array([label["score"] for label in prediction])
+                exp = np.exp(logits - logits.max())
+                probs = exp / exp.sum()
+                for label, prob in zip(prediction, probs):
+                    label["score"] = float(prob)


Using NumPy array operations inside a loop on the main event loop introduces unnecessary overhead, especially since classification tasks typically have a very small number of classes (e.g., 2 to 10). We can optimize this by using pure Python with math.exp, which is significantly faster for small lists and avoids NumPy overhead.

Additionally, we should add defensive checks to handle cases where prediction might be empty to prevent potential runtime errors.

import math

# the model returns raw logits; convert them to probabilities
for prediction in classifications:
    scores = [label["score"] for label in prediction]
    if not scores:
        continue
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    sum_exps = sum(exps)
    for label, exp_val in zip(prediction, exps):
        label["score"] = exp_val / sum_exps if sum_exps > 0 else 0.0

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94af43e02f


chatgpt-codex-connector · 2026-06-04T07:11:02Z

+                logits = np.array([label["score"] for label in prediction])
+                exp = np.exp(logits - logits.max())
+                probs = exp / exp.sum()


Preserve sigmoid probabilities for multi-label classifiers

For multi-label sequence classifiers such as the documented GoEmotions model, labels are independent and the Transformers pipeline previously normalized raw logits with sigmoid rather than across-label softmax. This new default raw_scores=false path always divides by the sum over every label, forcing each prediction's scores to sum to 1 and suppressing valid co-occurring labels whenever more than one class applies, so /classify now returns non-HF probabilities for those models.
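To make the difference concrete, here is a small sketch comparing per-label sigmoid with across-label softmax on the same logits. The logits are invented for illustration (two labels that both apply, one that does not), and the helper names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits):
    """Max-subtracted softmax across all labels of one prediction."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical multi-label logits: the first two labels co-occur.
logits = [2.0, 2.0, -3.0]

independent = [sigmoid(x) for x in logits]  # each label scored on its own
normalized = softmax(logits)                # scores forced to sum to 1
```

With these inputs, both co-occurring labels score well above 0.5 under sigmoid but fall just below 0.5 under softmax, so a client thresholding at 0.5 would drop both.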


greptile-apps · 2026-06-04T07:13:30Z

Greptile Summary

This PR fixes a long-standing bug where the /classify endpoint's raw_scores flag was a no-op, always returning softmax probabilities. Both classifier backends now pass function_to_apply="none" to get raw logits, and the BatchHandler applies a numerically-stable softmax conditionally when raw_scores=False.

torch.py / optimum.py: encode_core now passes function_to_apply="none" to the HF pipeline so all downstream post-processing happens in one place.
batch_handler.py: The inverted condition (if not raw_scores) now correctly applies a log-sum-exp-stabilised softmax, mirroring the rerank path. However, the softmax is applied unconditionally for all model types — for multi-label classifiers the HF pipeline would normally apply sigmoid (independent per-label scores), not softmax (mutually-exclusive scores), so those models will silently return incorrect probabilities when raw_scores=False.

Confidence Score: 3/5

The flag inversion is correct for single-label classifiers, but multi-label models will silently receive softmax-transformed scores instead of the per-label sigmoid the HF pipeline would have applied.

The raw_scores flag fix works correctly for single-label text classifiers, which are the common case. However, by bypassing the pipeline's built-in activation-function selection with function_to_apply="none" and always applying softmax in the handler, any multi-label classification model (where problem_type == "multi_label_classification") will silently produce wrong probabilities — softmax forces mutually exclusive outputs, but multi-label tasks require independent sigmoid-scaled scores per class.

libs/infinity_emb/infinity_emb/inference/batch_handler.py — the softmax applied unconditionally needs to account for multi-label model types

Important Files Changed

Filename	Overview
libs/infinity_emb/infinity_emb/inference/batch_handler.py	Fixes the classify raw_scores flag with numerically-stable softmax, but unconditionally applies softmax for all model types, breaking multi-label classifiers that require sigmoid instead.
libs/infinity_emb/infinity_emb/transformer/classifier/torch.py	Correctly passes `function_to_apply="none"` to the HF pipeline so raw logits are returned for post-processing; trailing newline removed.
libs/infinity_emb/infinity_emb/transformer/classifier/optimum.py	Correctly passes `function_to_apply="none"` to the ONNX pipeline so raw logits are returned; trailing newline removed.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Server as infinity_server
    participant Engine as BatchHandler.classify
    participant Backend as SentenceClassifier / OptimumClassifier

    Client->>Server: "POST /classify {input, raw_scores}"
    Server->>Engine: classify(sentences, raw_scores)
    Engine->>Backend: _schedule(PredictSingle items)
    Backend->>Backend: "encode_core() pipeline(function_to_apply=none)"
    Note over Backend: Returns raw logits (no activation)
    Backend-->>Engine: classifications (raw logits)
    alt "raw_scores == False"
        Engine->>Engine: stable softmax over logits per prediction
        Note over Engine: softmax is wrong for multi-label models (should be sigmoid)
    end
    Engine-->>Server: classifications, usage
    Server-->>Client: JSON response


greptile-apps · 2026-06-04T07:13:33Z

+        if not raw_scores:
+            # the model returns raw logits; convert them to probabilities
+            for prediction in classifications:
+                logits = np.array([label["score"] for label in prediction])
+                exp = np.exp(logits - logits.max())
+                probs = exp / exp.sum()
+                for label, prob in zip(prediction, probs):
+                    label["score"] = float(prob)


Softmax incorrectly applied to multi-label classifiers

The HF text-classification pipeline applies sigmoid (not softmax) for models with problem_type == "multi_label_classification" or num_labels == 1. By calling function_to_apply="none" in both classifier backends and then unconditionally applying softmax here, a multi-label classifier will produce wrong probabilities — labels are forced to be mutually exclusive (sum to 1) rather than independently scored per-class. The correct fix would check the model's config and apply sigmoid or softmax accordingly, mirroring the pipeline's own default logic.
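One way the config check suggested here could look is sketched below. It encodes only the selection rule as stated in this comment; the real pipeline may handle further cases. The function names `choose_activation` and `apply_activation` are hypothetical and do not exist in infinity_emb or Transformers.

```python
import math
from typing import List, Optional


def choose_activation(problem_type: Optional[str], num_labels: int) -> str:
    """Pick an activation from model config values, per the rule described above."""
    if problem_type == "multi_label_classification" or num_labels == 1:
        return "sigmoid"
    if problem_type == "single_label_classification" or num_labels > 1:
        return "softmax"
    return "none"


def _sigmoid(x: float) -> float:
    # Branch on sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_activation(logits: List[float], activation: str) -> List[float]:
    """Convert raw logits to scores using the chosen activation."""
    if activation == "sigmoid":
        return [_sigmoid(x) for x in logits]
    if activation == "softmax" and logits:
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    return list(logits)  # "none" (or empty input): return logits unchanged
```

In a handler, the two values passed to `choose_activation` would presumably come from the loaded model's config, which this sketch does not attempt to read.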

greptile-apps · 2026-06-04T07:13:35Z

            return_length=False,
        ).encodings
-        return [len(t.tokens) for t in tks]
+        return [len(t.tokens) for t in tks]
\ No newline at end of file


Missing newline at end of file. Both changed classifier files (torch.py and optimum.py) lost their trailing newline, which will cause diff noise in future patches and may break some POSIX-compliant tooling.

Suggested change

return [len(t.tokens) for t in tks]

return [len(t.tokens) for t in tks]


Anai-Guo · 2026-06-07T00:59:57Z

Good catch on the multi-label case. You're right: HF's text-classification pipeline picks function_to_apply per model — softmax for single-label (problem_type == "single_label_classification" / num_labels > 1) but sigmoid for multi-label (problem_type == "multi_label_classification" / num_labels == 1). My current patch forces function_to_apply="none" in both backends and then unconditionally re-applies softmax in BatchHandler.classify, so a multi-label model would now return softmax instead of sigmoid when raw_scores=False — a regression for that model class. The single-label path (the common /classify use case) is unaffected.

The clean fix is to stop second-guessing the pipeline: thread raw_scores down to the backend and set function_to_apply="none" only when raw logits are requested, otherwise leave it unset so the pipeline applies its own (correct) default. That lets me drop the manual softmax in classify() entirely and handles both model types correctly.
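A minimal sketch of that direction, assuming the backend assembles keyword arguments for the pipeline call: `function_to_apply="none"` is added only when raw logits are requested, and otherwise the key is omitted so the pipeline falls back to its own default. The helper name `build_pipeline_kwargs` is hypothetical and not part of infinity_emb.

```python
from typing import Any, Dict


def build_pipeline_kwargs(raw_scores: bool) -> Dict[str, Any]:
    """Keyword arguments for the text-classification pipeline call (sketch)."""
    kwargs: Dict[str, Any] = {"top_k": None}  # return scores for every label
    if raw_scores:
        # Caller asked for logits: disable the pipeline's activation.
        kwargs["function_to_apply"] = "none"
    # Otherwise leave function_to_apply unset so the pipeline chooses
    # sigmoid or softmax from the model config.
    return kwargs


logit_kwargs = build_pipeline_kwargs(raw_scores=True)
default_kwargs = build_pipeline_kwargs(raw_scores=False)
```

The backend would then call something like `pipeline(sentences, **kwargs)`, which removes the need for any manual softmax in the handler.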

@michaelfeil happy to push that version if you'd prefer it over the current approach — just want to confirm the threading-through-the-scheduler direction is acceptable before reworking it.

Anai-Guo wants to merge 1 commit into michaelfeil:main from Anai-Guo:fix/classify-raw-scores
