bug: unsafe content shown when continuing the chat #708
Chris and I discussed this offline and agreed we'll wait until pipelines are ready to replace safety layers before addressing this.
**Some background**

**Problem statement**

In terms of safety layers, we currently save the offending human and/or AI message to the chat history. This is useful when viewing the transcript. When the AI response is the offending one, we:

1. save the message to history,
2. generate a safe response, and
3. return this safe response to the caller.

This all happens in `process_input`.
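For illustration, a minimal sketch of that flow, assuming nothing about the real implementation beyond the three steps above; the injected callables (`run_bot`, `is_safe`, `generate_safe_response`) and the plain-list history are hypothetical stand-ins, not the actual internals of `process_input`:

```python
# Illustrative sketch only; helper callables are hypothetical stand-ins.
def process_input(history: list[dict], human_message: str,
                  run_bot, is_safe, generate_safe_response) -> str:
    history.append({"role": "human", "content": human_message})
    ai_message = run_bot(human_message)

    if is_safe(ai_message):
        history.append({"role": "ai", "content": ai_message})
        return ai_message

    # 1. the offending AI message is still saved to history (useful for transcripts)
    history.append({"role": "ai", "content": ai_message})
    # 2. a safe response is generated ...
    safe_message = generate_safe_response()
    # 3. ... and only that safe response is returned to the caller
    return safe_message
```

Note that nothing in the saved history marks the offending message as unsafe, and the caller only ever receives the safe response; that gap is what the rest of this discussion is about.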
Two things to focus on. To solve this we need to:

1. distinguish offending messages from non-offending ones and be able to filter them out whenever needed, and
2. get hold of the offending message somehow, so that we can show it in the UI.

**The solution**

The solution is multi-faceted.
The points above are relatively easy and quick to implement. How we get that array of messages from the bot is where things start to get tricky. We have options.

**The proper way (LOE: High)**

The […] This approach will require us to update many areas of the code that expect a single string response to now expect an object instead. Thus, the LOE is high for this one.

**A less hacky way (LOE: Low)**

This approach doesn't require the task to return an array of messages, so number 2 above wouldn't be needed. The safe AI message should have, in its metadata, the ID of the message that it is the safe version of. This way we don't have to return an array of messages, only the actual AI message ID. We can then use that to find the unsafe message as well if we need to (see the sketch below).

**The more hacky way (LOE: Low)**

In this approach, we save the unsafe message as an attribute on the […]

@bderenzi Not sure if this changes anything regarding the decision to wait. cc @snopoke
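To make the "less hacky way" concrete, here is a minimal, self-contained sketch. The `ChatMessage` dataclass and the `is_unsafe` / `safe_version_of` metadata keys are illustrative names chosen for this example, not the project's actual schema:

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)


@dataclass
class ChatMessage:
    role: str
    content: str
    metadata: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_next_id))


def save_unsafe_ai_exchange(history: list[ChatMessage], unsafe_text: str, safe_text: str) -> ChatMessage:
    """Save the offending AI message, then the safe replacement that points back at it."""
    unsafe_msg = ChatMessage("ai", unsafe_text, metadata={"is_unsafe": True})
    history.append(unsafe_msg)
    safe_msg = ChatMessage("ai", safe_text, metadata={"safe_version_of": unsafe_msg.id})
    history.append(safe_msg)
    return safe_msg


def history_for_continued_chat(history: list[ChatMessage]) -> list[ChatMessage]:
    """Drop unsafe messages when the user continues the chat."""
    return [m for m in history if not m.metadata.get("is_unsafe")]


def unsafe_source_of(history: list[ChatMessage], safe_msg: ChatMessage) -> ChatMessage | None:
    """Given a safe AI message, recover the unsafe message it replaced (e.g. for the UI)."""
    unsafe_id = safe_msg.metadata.get("safe_version_of")
    return next((m for m in history if m.id == unsafe_id), None) if unsafe_id else None
```

Because the safe message carries only the ID of the message it replaces, the task can keep returning a single response, while the UI and the continue-chat path can still look up, or filter out, the unsafe original.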
https://chatbots.dimagi.com/a/bmgf-demos/experiments/e/1693/session/16764/ -- you'll have to spoof as me.
Since we're saving unsafe user and bot messages to the transcript now, they show up when the user chooses to "continue chat".
In the original chat, the unsafe message is not shown. I assume unsafe human messages will also show up when resuming a chat, but I didn't test that.
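A hedged sketch of where a fix could plug in, assuming (not verified against the code) that "continue chat" seeds the new session from the saved transcript; the function and metadata names are illustrative only:

```python
# Hypothetical resume-path filter, reusing the "is_unsafe" metadata tag from
# the sketch above. Names are illustrative, not the project's actual code.
def seed_continued_chat(saved_history: list[dict]) -> list[dict]:
    return [m for m in saved_history
            if not m.get("metadata", {}).get("is_unsafe")]
```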