Word-level embeddings #110

Mystic-Slice · 2025-06-23T15:09:07Z · Maintainer

The system currently considers each sentence to represent a single value. This is a good tradeoff between granularity/accuracy and the speed of recommendation.

Word-level embeddings could provide more granular and accurate classification. This discussion is for exploring that idea.
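To make the granularity tradeoff concrete, here is a minimal sketch contrasting one label per sentence with one label per word. It assumes hand-written 2-D toy vectors and two hypothetical value centroids ("positive", "negative"), and it uses mean pooling of word vectors as a stand-in for a real sentence encoder; none of these names come from the project's code.

```python
import math

# Hypothetical toy vectors standing in for a real encoder (assumption for illustration).
VALUE_CENTROIDS = {"positive": [1.0, 0.0], "negative": [0.0, 1.0]}
WORD_VECS = {"great": [0.9, 0.1], "idea": [0.8, 0.2], "idiot": [0.1, 0.95]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(vectors):
    """Average word vectors into one sentence vector (stand-in for a sentence encoder)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_value(vector, centroids):
    """Return the label of the value centroid most similar to `vector`."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


def sentence_level(tokens):
    # One label for the whole sentence: pool first, then classify once.
    return nearest_value(mean_pool([WORD_VECS[t] for t in tokens]), VALUE_CENTROIDS)


def word_level(tokens):
    # One label per word, so a single harmful word can be located.
    return [(t, nearest_value(WORD_VECS[t], VALUE_CENTROIDS)) for t in tokens]


tokens = ["great", "idea", "idiot"]
print(sentence_level(tokens))
print(word_level(tokens))
```

With these toy vectors the pooled sentence lands closest to "positive", which hides the one negative word, while the per-word pass flags only "idiot". The extra granularity costs one classification per token instead of one per sentence, which is the speed side of the tradeoff.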

santanavagner · 2025-08-26T15:48:47Z · Maintainer

Hi @Mystic-Slice,

Thank you for starting this thread.

We need to gather evidence that word-level embeddings would also let us identify input prompts more accurately.

How do you see this applied to the datasets and API endpoints we have today?


Mystic-Slice Aug 26, 2025
Maintainer Author

Hi @santanavagner
I opened this thread based on a suggestion from a community member, to learn more about his idea.

@ArionDas You are welcome to share your thoughts here. Thanks!

ArionDas · 2025-08-29T21:30:04Z

Thank you @Mystic-Slice.

Hello @santanavagner, here's some context on how I came to know about this project.
I attended a Cohere Labs call where @Mystic-Slice presented the work and showed the embedding value space. Each point in that space represents an entire sentence.

As I remember from the presentation, when a user enters a prompt, the application currently recommends whole sentences to replace the user's faulty sentences (negative sentiment, foul words, etc.). This is because we currently use sentence-level embeddings.

Two problems could arise here (correct me if I'm wrong):

1. To remove a few faulty words somewhere in a user's sentence, we discard the whole sentence and lose any positive sentiment or useful information it contained.
2. The sentence we recommend may be close to what the user meant, but it could miss essential domain-specific jargon (our static dataset of sentences doesn't contain such words).

I had proposed using word-level embeddings instead of whole-sentence embeddings, so the application can be more dynamic and recommend swapping faulty words for useful ones instead of removing entire sentences.
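A minimal sketch of the word-swapping idea, under loudly stated assumptions: the 3-D vectors, the single "harmful" direction, the 0.8 threshold, and the candidate replacement list are all hypothetical and are not part of the project. Only words whose vector is close to the harmful direction are replaced (by the closest non-flagged candidate); every other word in the sentence is kept.

```python
import math

# Hypothetical toy vectors: dimension 0 loosely encodes "harmfulness",
# dimensions 1-2 loosely encode meaning. A real system would use a trained encoder.
WORD_VECS = {
    "garbage": [0.9, 0.4, 0.0],
    "trash": [0.95, 0.3, 0.0],
    "poor": [0.3, 0.9, 0.0],
    "nice": [0.0, 0.0, 1.0],
}
HARMFUL = [1.0, 0.0, 0.0]  # hypothetical "harmful" direction


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_flagged(word, threshold):
    """A word is flagged if it has a vector close enough to the harmful direction."""
    vec = WORD_VECS.get(word)
    return vec is not None and cosine(vec, HARMFUL) >= threshold


def swap_flagged_words(tokens, candidates, threshold=0.8):
    """Replace only flagged words; all other words in the sentence are kept."""
    safe = [c for c in candidates if c in WORD_VECS and not is_flagged(c, threshold)]
    result = []
    for tok in tokens:
        if is_flagged(tok, threshold) and safe:
            # Pick the safe candidate whose vector is closest to the flagged word.
            tok = max(safe, key=lambda c: cosine(WORD_VECS[c], WORD_VECS[tok]))
        result.append(tok)
    return result


print(swap_flagged_words(["this", "code", "is", "garbage"], ["poor", "nice", "trash"]))
```

Here "garbage" is flagged, "trash" is excluded as a replacement because it is flagged too, and "poor" is chosen as the nearest safe word, so the rest of the sentence survives. Handling domain-specific jargon would still depend on having vectors for those terms.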

I'm happy to discuss this further, and now that I have a good understanding of the repository, I'd be glad to help implement it as well, if it seems feasible.

Thank you.


santanavagner · 2025-09-01T20:17:17Z · Maintainer

Hi @ArionDas,

Got it. I agree with you, and the results from our latest paper align with this: users don't like it when the system removes a whole sentence because one or more of its words are identified as harmful.

Actually, @luanssouza is implementing this idea of swapping terms in a feature that recommends a rephrasing of sentences identified as harmful (#17) instead of removing them.

@luanssouza,

Please consider sharing the working code you have so @ArionDas can collaborate with you on issue #17.

Thank you all!

Cheers


luanssouza Sep 2, 2025
Collaborator

Hello, @santanavagner and @ArionDas!

Interesting idea @ArionDas. The way I am doing the negative sentence rephrasing goes in a different direction: I am using prompts to ask LLMs to rephrase negative sentences considering our social values.

The work I have done so far can be found in the rephrase branch of our repository.
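For readers who have not seen the branch, a minimal sketch of what a tagged rephrasing prompt could look like. The function name, the tag names, and the wording are hypothetical and are not the prompt used in the rephrase branch; the sketch only builds the prompt string and does not call any LLM.

```python
def build_rephrase_prompt(sentence, values):
    """Build a tagged prompt asking an LLM to rephrase a harmful sentence (hypothetical template)."""
    value_lines = "\n".join(f"- {v}" for v in values)
    return (
        "<task>Rephrase the sentence so it is no longer harmful, "
        "keeping its original intent.</task>\n"
        f"<values>\n{value_lines}\n</values>\n"
        f"<sentence>{sentence}</sentence>"
    )


prompt = build_rephrase_prompt("This code is garbage.", ["respect", "honesty"])
print(prompt)
```

Wrapping each part in its own tag keeps the instruction, the social values, and the user's sentence clearly separated for the model, which is one plausible reason for a tagged design.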

Best regards,
Luan

ArionDas Sep 3, 2025

Hey @luanssouza,
I went through your code where you used an LLM to generate more dynamic, "word-level" recommendations.
Very nicely designed task with all the tags, etc.

Is there somewhere I could test it, maybe on the UI? I can't host the LLM on my personal machine.

Then maybe we can connect and discuss recommending "domain-specific jargon"?
Or let me know if you have something else in mind that you'd like my help with.

cc: @santanavagner @Mystic-Slice

Thank you!
