Add DepthScore #3318
…est cases through utils properly
…s missing, fixes ddp Skipped exception failures
…ohaib-Ahmed21/torchmetrics into feature/854_depth_score_metric
@bhimrazy @justusschock this PR is ready for review. Kindly approve the workflows and review it, thanks!
Codecov Report
❌ Patch coverage is …

@@           Coverage Diff           @@
##           master   #3318   +/-   ##
=======================================
- Coverage      37%     36%     -1%
=======================================
  Files         364     351     -13
  Lines       20098   20273    +175
=======================================
- Hits         7520    7396    -124
- Misses      12578   12877    +299
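As a quick consistency check on the Codecov numbers, the coverage percentages can be recomputed from the hit and line counts. This sketch assumes Codecov's definition of coverage as hits / (hits + misses + partials) and that there are no partial lines, since the table lists none; the dictionary keys and `coverage_pct` name are only for illustration.

```python
# Figures copied from the Codecov coverage diff for master vs. #3318.
base = {"lines": 20098, "hits": 7520, "misses": 12578}
head = {"lines": 20273, "hits": 7396, "misses": 12877}


def coverage_pct(report: dict) -> float:
    """Coverage as a percentage, assuming lines = hits + misses (no partials)."""
    return 100.0 * report["hits"] / report["lines"]


if __name__ == "__main__":
    for name, report in (("master", base), ("#3318", head)):
        print(name, f"{coverage_pct(report):.2f}%")
```

Rounded to whole percentages, these give the 37% and 36% shown in the report.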
Thank you @Sohaib-Ahmed21 for this PR. Will take a look...
justusschock left a comment
Why are we doing all the operations here in NumPy rather than PyTorch? It should be easier if we don't need to do conversions, no?
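To illustrate the reviewer's point, here is a minimal sketch contrasting a NumPy round trip with the equivalent PyTorch-only computation. Mean-centering embeddings is a hypothetical stand-in for whatever per-sentence operations the PR performs; `center_numpy` and `center_torch` are illustrative names, not functions from the PR.

```python
import numpy as np
import torch


def center_numpy(emb: torch.Tensor) -> torch.Tensor:
    # Round trip: detach, move to CPU, convert to NumPy, compute, convert back.
    # This forces a device sync and drops the autograd graph.
    arr = emb.detach().cpu().numpy()
    out = arr - arr.mean(axis=0, keepdims=True)
    return torch.from_numpy(out).to(emb.device)


def center_torch(emb: torch.Tensor) -> torch.Tensor:
    # Same computation without leaving PyTorch: stays on the input's device
    # and dtype, and remains differentiable.
    return emb - emb.mean(dim=0, keepdim=True)
```

Both return the same values, but the PyTorch version avoids the conversions entirely and keeps tensors on the GPU when the embeddings live there.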
return torch.stack(out, dim=0)
def cov_matrix(x: np.ndarray, robust: bool = False) -> np.ndarray:
What's the issue with `torch.cov`?
`sklearn` is a heavy requirement just for this.
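A sketch of the non-robust case with `torch.cov`, assuming embeddings are laid out as `(n_tokens, dim)` like the `x` argument of `cov_matrix`. `torch.cov` treats rows as variables and columns as observations, so the input is transposed. This does not cover the `robust=True` branch; whatever that branch currently relies on would need its own replacement. `cov_matrix_torch` is a hypothetical name.

```python
import numpy as np
import torch


def cov_matrix_torch(x: torch.Tensor) -> torch.Tensor:
    """Unbiased sample covariance of x with shape (n_observations, n_features).

    torch.cov expects variables as rows, so the input is transposed first.
    The default correction=1 matches np.cov's N - 1 normalization.
    """
    return torch.cov(x.T)


if __name__ == "__main__":
    emb = torch.randn(50, 6, dtype=torch.float64)
    # Compare against NumPy with features as columns (rowvar=False).
    print(np.allclose(cov_matrix_torch(emb).numpy(), np.cov(emb.numpy(), rowvar=False)))
```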
containing `"input_ids"` and `"attention_mask"`.
target: Reference sentence(s) as `str`, `Sequence[str]`, multi-reference
`Sequence[Sequence[str]]`, or tokenized dict containing `"input_ids"` and `"attention_mask"`.
model_name_or_path: Hugging Face model name/path used when `model` is not provided.
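The `target` forms listed in the docstring (a single `str`, `Sequence[str]`, or multi-reference `Sequence[Sequence[str]]`) can be normalized into one list of references per sample. This is a hypothetical helper, not code from the PR, and it assumes each element of a `Sequence[str]` is the single reference for the corresponding prediction. The tokenized-dict form is not handled here.

```python
from typing import List, Sequence, Union

Target = Union[str, Sequence[str], Sequence[Sequence[str]]]


def normalize_targets(target: Target) -> List[List[str]]:
    """Return one list of reference strings per sample."""
    if isinstance(target, str):
        # A single reference for a single prediction.
        return [[target]]
    normalized: List[List[str]] = []
    for item in target:
        # A plain string is one reference; a sequence holds multiple references.
        normalized.append([item] if isinstance(item, str) else list(item))
    return normalized
```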
Can we unify this with `model`? If `model` is a string, we use it as the name or path, and if it's a callable, we use it straight away?
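The suggested unification could look roughly like the following sketch. `resolve_model` and its `load_fn` parameter are hypothetical; the default path assumes Hugging Face `transformers` and its `AutoModel.from_pretrained`, imported lazily so the dependency is only needed when a name or path is passed.

```python
from typing import Any, Callable, Optional, Union


def resolve_model(
    model: Union[str, Callable[..., Any]],
    load_fn: Optional[Callable[[str], Callable[..., Any]]] = None,
) -> Callable[..., Any]:
    """Accept either a model name/path or an already constructed model."""
    if isinstance(model, str):
        if load_fn is None:
            # Lazy import: transformers is only required for the name/path case.
            from transformers import AutoModel

            load_fn = AutoModel.from_pretrained
        return load_fn(model)
    if callable(model):
        # An nn.Module or any other callable is used as-is.
        return model
    raise TypeError(f"`model` must be a str or a callable, got {type(model).__name__}")
```

This would let a single `model` argument replace the separate `model_name_or_path` parameter.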
Thanks for the review @justusschock. Will address this as soon as possible.
What does this PR do?
Adds the `DepthScore` metric for evaluating text generation. The implementation closely follows the BERTScore architecture and logic, as both metrics: …
The key difference is that DepthScore measures the distance between embedding distributions using depth-based statistical methods (integrated rank-weighted depth, Wasserstein distance, MMD, etc.) instead of token-level cosine similarity.
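To make "depth-based" concrete, here is a generic Monte Carlo sketch of the integrated rank-weighted (IRW) depth, one of the methods listed. It draws random unit directions, projects both the query points and a reference set of embeddings onto them, and averages the univariate halfspace depth min(F, 1 - F) over directions. This is a textbook-style approximation, not the code in this PR; `irw_depth`, `n_dirs`, and `seed` are hypothetical names.

```python
import torch


def irw_depth(points: torch.Tensor, sample: torch.Tensor, n_dirs: int = 1000, seed: int = 0) -> torch.Tensor:
    """Approximate IRW depth of each row of `points` w.r.t. the rows of `sample`.

    points: (m, d) query embeddings; sample: (n, d) reference embeddings.
    Returns an (m,) tensor; central points score near 0.5, outlying ones near 0.
    """
    gen = torch.Generator().manual_seed(seed)
    d = sample.shape[1]
    # Directions drawn uniformly on the unit sphere by normalizing Gaussian vectors.
    dirs = torch.randn(d, n_dirs, generator=gen, dtype=sample.dtype)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    proj_sample = sample @ dirs  # (n, n_dirs)
    proj_points = points @ dirs  # (m, n_dirs)
    # Fraction of projected sample at or below / at or above each projected query point.
    cmp_s, cmp_p = proj_sample.unsqueeze(0), proj_points.unsqueeze(1)  # (1, n, k), (m, 1, k)
    below = (cmp_s <= cmp_p).to(sample.dtype).mean(dim=1)  # (m, n_dirs)
    above = (cmp_s >= cmp_p).to(sample.dtype).mean(dim=1)
    # Univariate halfspace depth per direction, averaged over directions.
    return torch.minimum(below, above).mean(dim=1)
```

The comparison tensor has shape `(m, n, n_dirs)`, so for long sentences or many directions a chunked or sort-based variant would be preferable.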
Fixes #854
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
📚 Documentation preview 📚: https://torchmetrics--3318.org.readthedocs.build/en/3318/