refactor embedding infonce loss by Jintao-Huang · Pull Request #9179 · modelscope/ms-swift

Jintao-Huang · 2026-04-22T06:01:30Z

No description provided.

gemini-code-assist

Code Review

This pull request refactors the InfonceLoss class into InfoNCELoss, modularizing the implementation into helper methods for configuration parsing, distributed data gathering, and specific loss calculation paths. The review feedback highlights a critical bug in the distributed gathering logic where torch.stack should be replaced with torch.cat to maintain the expected 1D label structure. Other suggestions include improving efficiency by using F.cross_entropy instead of instantiating loss classes within loops, removing redundant squeeze(1) calls that could cause errors with single-dimension embeddings, and addressing potential hangs in the Megatron path due to varying label shapes across ranks.

gemini-code-assist · 2026-04-22T06:03:51Z

+            all_sentences[idx] = all_sentences[idx].detach().to(sentences.device)
+        sentences = torch.cat(all_sentences, dim=0)
+        labels = [tensor.to(sentences.device) for tensor in labels]
+        labels = torch.stack(labels, dim=0)


The label tensors gathered from different ranks should be concatenated with torch.cat instead of torch.stack. torch.stack adds a new dimension, producing a 2D tensor (e.g., [world_size, batch_size_per_gpu]), which breaks the logic in _parse_multi_negative_sentences: that method expects a 1D tensor to identify split points. Concatenation preserves the 1D structure required to index the global sentences tensor correctly.

Suggested change

-        labels = torch.stack(labels, dim=0)
+        labels = torch.cat(labels, dim=0)
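The shape difference behind this suggestion can be checked in isolation. The snippet below is a minimal sketch with made-up label values (the real label layout used by _parse_multi_negative_sentences is not shown in this review); it only illustrates how torch.stack and torch.cat treat a list of 1D tensors.

```python
import torch

# Hypothetical per-rank label tensors, as if collected from 2 ranks.
rank_labels = [torch.tensor([1, 0, 0, 1, 0]), torch.tensor([1, 0, 1, 0, 0])]

stacked = torch.stack(rank_labels, dim=0)  # adds a leading "rank" dimension
joined = torch.cat(rank_labels, dim=0)     # stays 1D

print(stacked.shape)  # torch.Size([2, 5])
print(joined.shape)   # torch.Size([10])

# With unequal per-rank lengths, torch.cat still works while torch.stack raises.
uneven = [torch.tensor([1, 0, 0]), torch.tensor([1, 0])]
print(torch.cat(uneven, dim=0).shape)  # torch.Size([5])
try:
    torch.stack(uneven, dim=0)
    stack_failed = False
except RuntimeError:
    stack_failed = True
print(stack_failed)  # True
```

Besides changing the rank of the result, torch.stack also requires every input to have the same size, so it would fail outright whenever ranks hold different numbers of labels.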

gemini-code-assist · 2026-04-22T06:03:51Z

+            all_labels = [labels.new_empty_like(labels) for _ in range(world_size)]
+            dist.all_gather(all_labels, labels, group=dp_group)


In the Megatron path, labels are gathered using dist.all_gather with new_empty_like(labels), which assumes that labels have the same shape across all ranks. If the number of samples or negatives varies per rank, this will cause a hang or crash. Consider gathering the shapes of labels first (as done for sentences in lines 143-150) or using gather_object if performance allows for the labels tensor.
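One common way to gather tensors of differing lengths is to exchange the lengths first, pad to the global maximum, gather, and then trim. The sketch below assumes 1D integer labels and an already initialized process group; pad_1d and all_gather_varlen are hypothetical helper names, not functions from ms-swift, and the pad value of -1 is arbitrary because the padding is trimmed off again.

```python
import torch
import torch.distributed as dist


def pad_1d(t: torch.Tensor, length: int, pad_value: int = -1) -> torch.Tensor:
    """Right-pad a 1D tensor to `length` with `pad_value`."""
    out = t.new_full((length,), pad_value)
    out[: t.numel()] = t
    return out


def all_gather_varlen(labels: torch.Tensor, group=None) -> torch.Tensor:
    """Gather 1D label tensors whose lengths may differ per rank.

    Assumes torch.distributed has been initialized by the caller.
    """
    world_size = dist.get_world_size(group)
    # Step 1: exchange lengths; this tensor has the same shape on every rank.
    local_len = torch.tensor([labels.numel()], device=labels.device)
    all_lens = [torch.zeros_like(local_len) for _ in range(world_size)]
    dist.all_gather(all_lens, local_len, group=group)
    lens = [int(n.item()) for n in all_lens]
    # Step 2: pad to the global maximum so all_gather sees equal shapes.
    padded = pad_1d(labels, max(lens))
    buffers = [torch.empty_like(padded) for _ in range(world_size)]
    dist.all_gather(buffers, padded, group=group)
    # Step 3: trim each rank's padding and concatenate into one 1D tensor.
    return torch.cat([buf[:n] for buf, n in zip(buffers, lens)], dim=0)


# Local illustration of steps 2 and 3 without any process group:
per_rank = [torch.tensor([1, 0, 0]), torch.tensor([1, 0])]
lens = [t.numel() for t in per_rank]
padded = [pad_1d(t, max(lens)) for t in per_rank]
restored = torch.cat([p[:n] for p, n in zip(padded, lens)], dim=0)
print(restored.tolist())  # [1, 0, 0, 1, 0]
```

The extra length exchange costs one small collective per step, which is usually cheaper than gather_object because it avoids pickling the label tensor.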

gemini-code-assist · 2026-04-22T06:03:51Z

+            # [D] * [neg+1, D]
+            similarity = torch.matmul(tensor[0], tensor[1:].T) / temperature
+            target = torch.tensor(0, device=tensor.device)
+            loss += nn.CrossEntropyLoss()(similarity, target)


Instantiating nn.CrossEntropyLoss() inside a loop creates a new module object on every iteration, which is unnecessary overhead. It is better to use the functional interface F.cross_entropy, which is already imported.

Suggested change

-            loss += nn.CrossEntropyLoss()(similarity, target)
+            loss += F.cross_entropy(similarity.unsqueeze(0), target.unsqueeze(0))
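The two forms should produce the same value, so the change is about overhead and style rather than numerics. The following sketch checks this on random data; the tensor sizes and the temperature of 0.1 are illustrative assumptions, not values taken from the PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
temperature = 0.1             # illustrative value
tensor = torch.randn(5, 8)    # row 0: query; rows 1..4: positive plus 3 negatives

# [D] @ [D, neg+1] -> [neg+1]
similarity = torch.matmul(tensor[0], tensor[1:].T) / temperature
target = torch.tensor(0)      # the positive sits at index 0

# Original form: a fresh module per call, unbatched input.
loss_module = nn.CrossEntropyLoss()(similarity, target)
# Suggested form: functional call on a batch of size 1.
loss_functional = F.cross_entropy(similarity.unsqueeze(0), target.unsqueeze(0))

same = torch.allclose(loss_module, loss_functional)
print(same)  # True
```

With the default mean reduction, a batch of one sample averages to the same scalar as the unbatched call.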

gemini-code-assist · 2026-04-22T06:03:51Z

+        temperature = config['temperature']
+        sentences = torch.stack(split_tensors, dim=0)  # [B, neg+2, D]
+        # base q->d similarities (includes own positive and all in-batch documents)
+        queries = sentences[:, 0].squeeze(1)  # [B, D]


The squeeze(1) call on sentences[:, 0] is unnecessary and potentially risky. sentences[:, 0] already has the shape [B, D]. If the embedding dimension D happens to be 1, squeeze(1) will reduce the tensor to shape [B], which will cause the subsequent torch.matmul to fail or produce incorrect results.

Suggested change

-        queries = sentences[:, 0].squeeze(1)  # [B, D]
+        queries = sentences[:, 0]  # [B, D]
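The edge case is easy to reproduce. This minimal sketch uses arbitrary sizes (B=4, three sentences per row) and compares an embedding dimension of 1 with a wider one.

```python
import torch

B, N = 4, 3

# Edge case: embedding dimension D = 1.
narrow = torch.randn(B, N, 1)
with_squeeze = narrow[:, 0].squeeze(1)   # dim 1 has size 1, so it is dropped
without_squeeze = narrow[:, 0]           # keeps [B, D]

print(with_squeeze.shape)     # torch.Size([4])
print(without_squeeze.shape)  # torch.Size([4, 1])

# Typical case: D > 1, where squeeze(1) is a no-op.
wide = torch.randn(B, N, 8)
print(wide[:, 0].squeeze(1).shape)  # torch.Size([4, 8])
```

Since indexing with sentences[:, 0] already removes the middle dimension, dropping the squeeze changes nothing for D > 1 and keeps the [B, D] shape for D = 1.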

gemini-code-assist · 2026-04-22T06:03:51Z

+        # Optional d+->d (doc-doc) similarity; exclude self-positive column per row
+        dd_matrix = None
+        if config['include_dd']:
+            pos_docs = sentences[:, 1].squeeze(1)  # [B, D]


Similar to the queries extraction, squeeze(1) on sentences[:, 1] is unnecessary and could lead to issues if the embedding dimension is 1.

Suggested change

-            pos_docs = sentences[:, 1].squeeze(1)  # [B, D]
+            pos_docs = sentences[:, 1]  # [B, D]

gemini-code-assist · 2026-04-22T06:03:51Z

+                logits_parts.append(dd_vec)
+
+            logits_row = torch.cat(logits_parts, dim=-1) / temperature
+            loss += nn.CrossEntropyLoss()(logits_row.unsqueeze(0), target.unsqueeze(0))


Instantiating nn.CrossEntropyLoss() inside a loop is inefficient. Use F.cross_entropy instead.

Suggested change

-            loss += nn.CrossEntropyLoss()(logits_row.unsqueeze(0), target.unsqueeze(0))
+            loss += F.cross_entropy(logits_row.unsqueeze(0), target.unsqueeze(0))

refactor infonce loss

7cd69aa

gemini-code-assist Bot reviewed Apr 22, 2026

Jintao-Huang wants to merge 1 commit into modelscope:main from Jintao-Huang:fix_refactor_infonce_loss
