F13-114 simple safety classification task #758
ALL_FLAGS = "\n-".join([e.value for e in UnsafeOutputFlag])

INSTRUCTION_CONFIGS = {
The prompts are WIP and subject to findings from https://aleph-alpha.atlassian.net/wiki/spaces/EN/pages/605716507/Eliminate+all+of+our+key+customers+safety-related+blockers+IL+Safety+Module
A PromptBasedClassify task already exists; however, it comes with some caveats, and the tests looked less promising than with this custom prompt logic.
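For context, a minimal sketch of what such custom prompt logic might look like, built on the `UnsafeOutputFlag` enum and the `ALL_FLAGS` expression from the diff. The `build_instruction` template and the `parse_flags` helper are hypothetical and are not taken from this PR; the actual prompts are WIP.

```python
from enum import Enum
from typing import List


class UnsafeOutputFlag(Enum):
    HATESPEECH = "hatespeech"
    SEXUAL = "sexual"
    RELIGIOUS = "religious"
    MEDICAL = "medical"
    SELF_HARM = "self_harm"
    ILLEGAL_ACTIVITIES = "illegal_activities"
    VIOLENCE = "violence"
    INSULT = "insult"
    PROFANITY = "profanity"


# Same expression as in the diff. The first value gets no leading "-",
# so the surrounding template has to supply it.
ALL_FLAGS = "\n-".join([e.value for e in UnsafeOutputFlag])


def build_instruction(all_flags: str = ALL_FLAGS) -> str:
    # Hypothetical template (assumption); not the prompt used in the PR.
    return (
        "Classify the text. Answer with the matching flags, or 'safe'.\n"
        f"Flags:\n-{all_flags}"
    )


def parse_flags(completion: str) -> List[UnsafeOutputFlag]:
    # Hypothetical parser (assumption): keep every known flag value found
    # in a comma- or newline-separated completion, ignore everything else.
    known = {e.value: e for e in UnsafeOutputFlag}
    tokens = [
        t.strip(" -\t").lower()
        for t in completion.replace(",", "\n").splitlines()
    ]
    return [known[t] for t in tokens if t in known]
```

With this shape, a completion such as `"violence, insult"` would map to two flags, and anything unrecognised (including `"safe"`) yields an empty list. How robust this is against free-form model output is exactly what an evaluation would need to show.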
I suspect this "safety-classification" task will end up quite large and complex, because the underlying problem is very hard in itself. Have you looked at alternative ways to implement this? As far as I know, there are entire Python libraries that try to achieve this, such as guardrails.
Is 'core' the correct module for this task?
I would probably put it into use-case, but I am not 100% sure and this is up for discussion, since safety classification on its own is not a use case. However, since this is very close to the prompt-based classify task, it might fit better there.
@@ -8,6 +8,7 @@

### New Features
- feature: New `SafetyClassifier` allows to flag safe/unsafe text
Maybe we should mark this as beta or something? Have you evaluated the performance yet?
I am happy to mark it as beta here, or maybe not even 'document' it for now.
This task is kept simple because of time pressure to deliver something in this direction for the upcoming release; for the same reason, we have not evaluated it yet.
There is also a prompt-based classifier that could perhaps be adapted to return MultiLabelClassifyOutput as well.
These are all topics for future improvement.
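As a rough illustration of the multi-label direction mentioned above, the sketch below converts a list of flags into a set-of-labels result. `MultiLabelResult` is a hypothetical stand-in written for this example; it is not the library's `MultiLabelClassifyOutput`, whose actual fields should be checked before adapting anything.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable


class UnsafeOutputFlag(Enum):
    HATESPEECH = "hatespeech"
    SEXUAL = "sexual"
    RELIGIOUS = "religious"
    MEDICAL = "medical"
    SELF_HARM = "self_harm"
    ILLEGAL_ACTIVITIES = "illegal_activities"
    VIOLENCE = "violence"
    INSULT = "insult"
    PROFANITY = "profanity"


@dataclass(frozen=True)
class MultiLabelResult:
    # Hypothetical stand-in (assumption), NOT the library's output type.
    labels: FrozenSet[str]

    @property
    def is_safe(self) -> bool:
        # A text with no flags attached is treated as safe.
        return not self.labels


def to_multi_label(flags: Iterable[UnsafeOutputFlag]) -> MultiLabelResult:
    # Duplicate flags collapse because the labels are stored as a set.
    return MultiLabelResult(labels=frozenset(f.value for f in flags))
```

Storing labels as a set makes "safe" simply the empty result, so no separate safe/unsafe field is needed in this sketch.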
class UnsafeOutputFlag(Enum):
    HATESPEECH = "hatespeech"
    SEXUAL = "sexual"
    RELIGIOUS = "religious"
    MEDICAL = "medical"
    SELF_HARM = "self_harm"
    ILLEGAL_ACTIVITIES = "illegal_activities"
    VIOLENCE = "violence"
    INSULT = "insult"
    PROFANITY = "profanity"
Where did you get these labels from? Do they come from some sort of requirement, or did you come up with them yourself?
It was decided not to put this into IL as it is now.
Description
No description.
Before Merging
changelog.md
if necessary