fix(classify): honor raw_scores flag (return logits, softmax only when raw_scores=False) #662
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -244,9 +244,14 @@ async def classify( | |
| items = [PredictSingle(sentence=s) for s in sentences] | ||
| classifications, usage = await self._schedule(items) | ||
|
|
||
| - if raw_scores: | ||
| - # perform softmax on scores | ||
| - pass | ||
| + if not raw_scores: | ||
| + # the model returns raw logits; convert them to probabilities | ||
| + for prediction in classifications: | ||
| + logits = np.array([label["score"] for label in prediction]) | ||
| + exp = np.exp(logits - logits.max()) | ||
| + probs = exp / exp.sum() | ||
|
Comment on lines +250 to +252
For multi-label sequence classifiers such as the documented GoEmotions model, labels are independent, and the Transformers pipeline previously normalized raw logits with sigmoid rather than across-label softmax. This new default |
||
| + for label, prob in zip(prediction, probs): | ||
| + label["score"] = float(prob) | ||
|
Comment on lines +248 to +254
Using NumPy array operations inside a loop on the main event loop introduces unnecessary overhead, especially since classification tasks typically have a very small number of classes (e.g., 2 to 10). We can optimize this by using pure Python with math.exp. Additionally, we should add defensive checks to handle cases where a prediction has no scores or the sum of the exponentials is not positive.

import math
# the model returns raw logits; convert them to probabilities
for prediction in classifications:
    scores = [label["score"] for label in prediction]
    if not scores:
        continue
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    sum_exps = sum(exps)
    for label, exp_val in zip(prediction, exps):
        label["score"] = exp_val / sum_exps if sum_exps > 0 else 0.0
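A minimal, self-contained sketch of the pure-Python variant suggested above, wrapped in a hypothetical helper named `softmax_in_place` (the helper name and the sample data are assumptions, not part of the PR). It is meant to show why the maximum is subtracted before exponentiating: a logit such as 1000.0 would overflow a naive `math.exp` call.

```python
import math


def softmax_in_place(classifications):
    """Replace each label's raw logit with its softmax probability.

    `classifications` is assumed to be a list of predictions, each a list of
    {"label": str, "score": float} dicts, mirroring the shape used in the diff.
    """
    for prediction in classifications:
        scores = [label["score"] for label in prediction]
        if not scores:
            continue  # nothing to normalize for an empty prediction
        max_score = max(scores)
        # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow.
        exps = [math.exp(s - max_score) for s in scores]
        # For finite scores the total is >= 1.0, since the max contributes exp(0).
        total = sum(exps)
        for label, exp_val in zip(prediction, exps):
            label["score"] = exp_val / total
    return classifications


# Hypothetical logits, including values that would overflow a naive exp().
sample = [
    [{"label": "positive", "score": 1000.0}, {"label": "negative", "score": 998.0}],
    [],
]
result = softmax_in_place(sample)
```

After the call, the scores in the first prediction sum to 1 and the empty prediction is left untouched.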
Comment on lines +247 to +254
Contributor
The HF |
||
|
|
||
| return classifications, usage | ||
|
|
||
|
|
@@ -621,4 +626,4 @@ def _postprocess_batch(self): | |
| self._postprocess_queue.task_done() | ||
| except Exception as ex: | ||
| logger.exception(ex) | ||
| - raise ValueError("Postprocessor crashed") | ||
| + raise ValueError("Postprocessor crashed") | ||
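The multi-label concern raised in the review comment on lines +250 to +252 can be illustrated with a small sketch. It assumes hypothetical logits for three independent labels and compares across-label softmax with per-label sigmoid; the function names and values are illustrative only and are not taken from the PR.

```python
import math


def softmax(logits):
    """Normalize across labels; outputs sum to 1 (single-label assumption)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each label independently; outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical logits for three independent labels, two of them active.
logits = [3.0, 3.0, -4.0]

softmax_probs = softmax(logits)  # the two active labels split the probability mass
sigmoid_probs = sigmoid(logits)  # each active label scores high on its own
```

With softmax, neither of the two active labels can exceed 0.5 because they compete for the same mass, whereas sigmoid scores each of them above 0.95. That difference is why the choice of normalization matters for multi-label models.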
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -73,7 +73,13 @@ def encode_pre(self, sentences: list[str]): | |||||
|
|
||||||
| def encode_core(self, features): | ||||||
| """runs plain inference, on cpu/gpu""" | ||||||
| - return self._pipe(features, batch_size=256, truncation=True, padding=True) | ||||||
| + return self._pipe( | ||||||
| + features, | ||||||
| + batch_size=256, | ||||||
| + truncation=True, | ||||||
| + padding=True, | ||||||
| + function_to_apply="none", | ||||||
| + ) | ||||||
|
|
||||||
| def encode_post(self, classes) -> dict[str, float]: | ||||||
| """runs post encoding such as normalization""" | ||||||
|
|
@@ -88,4 +94,4 @@ def tokenize_lengths(self, sentences: list[str]) -> list[int]: | |||||
| return_attention_mask=False, | ||||||
| return_length=False, | ||||||
| ).encodings | ||||||
| - return [len(t.tokens) for t in tks] | ||||||
| + return [len(t.tokens) for t in tks] | ||||||
|
Contributor
Suggested change
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
||||||
The default value of `raw_scores` in the `classify` method signature (line 224) is currently `True`.

Previously, because the pipeline applied softmax internally and the `if raw_scores:` block was a no-op, calling `classify` with default arguments returned softmax probabilities. With this change, calling `classify` with default arguments will now return raw logits, which is a breaking change for the default behavior of the API.

To preserve the original default behavior of returning softmax probabilities, please change the default value of `raw_scores` in the signature of `classify` (line 224) to `False` (matching the behavior of `rerank`).
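The effect of the default can be sketched with a hypothetical stand-in for the post-processing step. The function `classify_scores` below is an assumption made for illustration, not the project's `classify` method; it only shows how a `raw_scores=False` default keeps probability output for callers who pass no flag.

```python
import math


def classify_scores(logits, raw_scores=False):
    """Hypothetical stand-in for the post-processing step of `classify`.

    Returns the logits unchanged when raw_scores=True, and softmax
    probabilities otherwise (the default proposed in the review).
    """
    if raw_scores:
        return list(logits)
    if not logits:
        return []
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.0]
default_result = classify_scores(logits)               # probabilities
raw_result = classify_scores(logits, raw_scores=True)  # untouched logits
```

Callers that relied on the old default keep receiving values that sum to 1, and only callers that opt in with `raw_scores=True` see logits.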