Fix embedding-based classification documentation #597

SebastianZettAA · 2024-03-08T12:29:34Z

The docstring of class EmbeddingBasedClassify is correct ("Scores will be between 0 and 1 but do not have to add up to one."), but the corresponding description in the classification example notebook classification.ipynb is not.

Two follow-up questions here:

The prompt-based classification is described as single-label classification, whereas the embedding-based classification is described as multi-label. I do not see a real reason for this distinction. In my opinion, both approaches (log-prob scores vs. cosine similarity scores) could be used for both classification tasks: either assign only the class with the highest score, or assign every class whose score exceeds a threshold that still has to be defined.
Is there a reason why the prompt-based classification scores are normalized (to sum to 1), but those of the embedding-based classification are not?
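
To make both questions concrete, here is a minimal sketch in plain Python. It is not the Intelligence Layer implementation: the function names, the example scores, and the thresholds are all hypothetical and only illustrate that normalization and the single-label vs. multi-label decision are independent of where the scores come from.

```python
from __future__ import annotations

import math


def normalize_log_probs(log_probs: dict[str, float]) -> dict[str, float]:
    """Softmax over per-class log-probabilities, so the scores sum to 1."""
    max_lp = max(log_probs.values())  # subtract the max for numerical stability
    exps = {label: math.exp(lp - max_lp) for label, lp in log_probs.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def single_label(scores: dict[str, float]) -> str:
    """Single-label decision: keep only the class with the highest score."""
    return max(scores, key=lambda label: scores[label])


def multi_label(scores: dict[str, float], threshold: float) -> set[str]:
    """Multi-label decision: keep every class whose score reaches the threshold."""
    return {label for label, score in scores.items() if score >= threshold}


# Hypothetical prompt-based scores: log-probs normalized to sum to 1.
prompt_scores = normalize_log_probs(
    {"positive": -0.2, "negative": -2.0, "neutral": -3.0}
)

# Hypothetical embedding-based scores: each in [0, 1], not summing to 1.
embedding_scores = {"positive": 0.83, "negative": 0.41, "neutral": 0.78}

# Both decision rules work on both kinds of scores.
print(single_label(prompt_scores), multi_label(prompt_scores, threshold=0.1))
print(single_label(embedding_scores), multi_label(embedding_scores, threshold=0.75))
```

The only practical difference in this sketch is that a sensible threshold depends on the score scale: normalized scores shrink as more classes are added, while unnormalized similarity scores do not.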

MerlinKallenbornTNG · 2024-04-08T06:28:42Z

Aloha,
what's the status of this PR? It looks like nothing has happened for two weeks. Can it be merged, or has it become obsolete?

SebastianZettAA · 2024-04-08T07:51:49Z

I opened the PR while trying to use the classification task(s) for a client project, when I was still on the customer team. At that time I felt neither responsible for nor entitled to working on the IL more directly than by asking the questions above.
Since I was told that @NickyHavoc primarily worked on this topic, I suggest we talk about this in person today or tomorrow.

SebastianZettAA self-assigned this Mar 8, 2024

NiklasKoehneckeAA force-pushed the fix-embedding-classification-docu branch from f9269f5 to 83131ab Compare March 28, 2024 09:15

SebastianZettAA added 2 commits April 16, 2024 08:55

fix documentation w.r.t. possible range of embedding scores

66456eb

fix typo in classification notebook documentation

48f4f63

FlorianSchepersAA force-pushed the fix-embedding-classification-docu branch 2 times, most recently from 57e0d29 to 48f4f63 Compare April 16, 2024 07:07

FlorianSchepersAA merged commit bdb5bf8 into main Apr 16, 2024
4 checks passed

FlorianSchepersAA deleted the fix-embedding-classification-docu branch April 16, 2024 07:15
