
F13-114 simple safety classification task #758

Closed

Conversation

@SebastianWolfschmidtAA (Contributor) commented Apr 18, 2024

Description

No description.

Before Merging

- [ ] Review the code changes
  - Unused print / comments / TODOs
  - Missing docstrings for functions that should have them
  - Consistent variable names
  - ...
- [ ] Update changelog.md if necessary
- Commit messages should contain a semantic label and the ticket number
  - Consider squashing if this is not the case

@MartinAchtnerAA changed the title from "F13 114 f 13 il safety classifier" to "F13-114 f 13 il safety classifier" on Apr 18, 2024
@MartinAchtnerAA force-pushed the F13-114-f-13-il-safety-classifier branch 2 times, most recently from 639e53b to a18b596, on April 19, 2024 12:18
@MartinAchtnerAA force-pushed the F13-114-f-13-il-safety-classifier branch from a18b596 to 8d8e8ee on April 19, 2024 12:44
@MartinAchtnerAA force-pushed the F13-114-f-13-il-safety-classifier branch 2 times, most recently from 9eea9af to 3218b3f, on April 19, 2024 13:44
@MartinAchtnerAA changed the title from "F13-114 f 13 il safety classifier" to "F13-114 simple safety classifier task" on Apr 19, 2024
@MartinAchtnerAA force-pushed the F13-114-f-13-il-safety-classifier branch from 3218b3f to d338920 on April 19, 2024 13:46
@MartinAchtnerAA force-pushed the F13-114-f-13-il-safety-classifier branch from d338920 to 79e6739 on April 19, 2024 13:51
@MartinAchtnerAA changed the title from "F13-114 simple safety classifier task" to "F13-114 simple safety classification task" on Apr 19, 2024

ALL_FLAGS = "\n-".join([e.value for e in UnsafeOutputFlag])

INSTRUCTION_CONFIGS = {


The prompts are WIP and subject to findings from https://aleph-alpha.atlassian.net/wiki/spaces/EN/pages/605716507/Eliminate+all+of+our+key+customers+safety-related+blockers+IL+Safety+Module

There exists a PromptBasedClassify task; however, it comes with some caveats, and the tests looked less promising than with this custom prompt logic.
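
For context, a minimal sketch of what this custom prompt logic could look like; since the INSTRUCTION_CONFIGS excerpt above is truncated, the structure, keys, and instruction wording below are assumptions, not the PR's actual code:

# Hypothetical sketch only: the real INSTRUCTION_CONFIGS is cut off in the
# excerpt above, so the shape and wording here are assumptions.
from enum import Enum


class UnsafeOutputFlag(Enum):  # condensed; the full enum appears further down
    HATESPEECH = "hatespeech"
    SEXUAL = "sexual"
    VIOLENCE = "violence"


# From the diff: join all flag values into a newline-separated list.
ALL_FLAGS = "\n-".join([e.value for e in UnsafeOutputFlag])

# Assumed shape: instruction templates keyed by language, with the flag list
# interpolated so the model answers with matching flags (or nothing if safe).
INSTRUCTION_CONFIGS = {
    "en": (
        "Your task is to check whether the given text is safe or unsafe.\n"
        "If it is unsafe, answer with the matching flags from this list:\n"
        f"-{ALL_FLAGS}\n"
        "If it is safe, answer with nothing."
    ),
}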


I suspect this "safety-classification" task will be quite large / complex in the end, because solving this task is very hard in itself. Have you looked at alternatives for implementing this? AFAIK there are whole Python libraries that try to achieve this, like guardrails, etc.


Is 'core' the correct module for this task?


I would probably put it into use-case, but this is not 100% certain and up for discussion, since just having this safety classification is not a use case in itself. However, since this is very close to the prompt-based classify, it might fit better there.

@@ -8,6 +8,7 @@


### New Features
- feature: New `SafetyClassifier` allows flagging safe/unsafe text

Maybe we should mark this as beta or something? Have you evaluated the performance yet?


I am happy to mark it as beta here, or maybe not even 'document' it for now.


The simplicity of this task is due to time pressure to deliver something in this direction for the upcoming release; therefore, we have not evaluated it yet.

There also exists a prompt-based classifier, which could perhaps be adapted to also return `MultiLabelClassifyOutput`.

These are all topics for future improvement.
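
A rough sketch of what such an adaptation could look like; the `MultiLabelClassifyOutput` stub and its `scores` field below are assumptions for illustration, not the library's actual definition:

# Hypothetical sketch: turning a set of fired safety flags into a
# multi-label-style output. MultiLabelClassifyOutput is stubbed here;
# its real fields in the library may differ.
from pydantic import BaseModel


class MultiLabelClassifyOutput(BaseModel):  # stand-in, not the real class
    scores: dict[str, float]


def flags_to_multi_label(
    fired_flags: set[str], all_labels: list[str]
) -> MultiLabelClassifyOutput:
    # Labels that fired get score 1.0, everything else 0.0, since the
    # simple classifier yields binary flags rather than probabilities.
    return MultiLabelClassifyOutput(
        scores={label: (1.0 if label in fired_flags else 0.0) for label in all_labels}
    )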

Comment on lines +17 to +26
class UnsafeOutputFlag(Enum):
HATESPEECH = "hatespeech"
SEXUAL = "sexual"
RELIGIOUS = "religious"
MEDICAL = "medical"
SELF_HARM = "self_harm"
ILLEGAL_ACTIVITIES = "illegal_activities"
VIOLENCE = "violence"
INSULT = "insult"
PROFANITY = "profanity"

Where did you get these labels from? Are they from some sort of requirement, or did you just think of them?


@MartinAchtnerAA commented:

It was decided not to put this into IL as-is for now.

@NiklasKoehneckeAA deleted the F13-114-f-13-il-safety-classifier branch on April 30, 2024 13:53