-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: prefer serde structs over custom functions #2127
Conversation
518847b
to
bf24b76
Compare
d43ef3d
to
c326ffd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! Thansk for this PR looks much better imho.
There are some caveats wth #[serde(untagged)]
we should be super aware of, but at least now it's explicit rather than implicit.
The biggest issue with it is the lack of strictness in the deserialization.
} | ||
|
||
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq)] | ||
#[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[serde(untagged)]
is unfortunately quite dangerous.
It will stop on first match, which is in some cases not what you expect (String
matches a lot of thigns for instance).
This makes the entire thing order dependent in the struct and creates hard to reason about bugs.
I'll continue reading the PR, but since tokenizers
did it and we're having many issues with it, my current stance would be to stay away from it.
pub fn from_file<P: AsRef<Path>>(filename: P) -> Option<Self> { | ||
std::fs::read_to_string(filename) | ||
.ok() | ||
.and_then(|content| serde_json::from_str(&content).ok()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of prefer the earlier version, I find it easier to reason about.
Changing from Option
to result would make it even more ergonomic (not sure we care though)
@@ -121,35 +139,6 @@ pub(crate) enum GrammarType { | |||
Regex(String), | |||
} | |||
|
|||
mod token_serde { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to be exactly like #[serde(untagged)]
therefore no change in actual logic.
.into_iter() | ||
.map(|chunk| match chunk { | ||
MessageChunk::Text { text } => text, | ||
MessageChunk::ImageUrl { image_url } => format!("![]({})", image_url.url), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we remove that ? I see it's equivalent with what was before so it's ok, but I though we removed those.
@@ -635,7 +636,7 @@ async fn completions( | |||
)); | |||
} | |||
|
|||
if req.prompt.len() > info.max_client_batch_size { | |||
if req.prompt.0.len() > info.max_client_batch_size { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works.
Other options
let Prompt { text} = req.prompt
Or implement Deref
(if that .0
becomes ubiquitous and annoying; https://doc.rust-lang.org/std/ops/trait.Deref.html#when-to-implement-deref-or-derefmut
* fix: prefer enum for chat object * fix: adjust typo * fix: enum CompletionType not ObjectType * fix: adjust typo * feat: leverage serde for conditional deser * fix: adjust HubTokenizerConfig after rebase * fix: update create_post_processor logic for token type * fix: adjust unwrap syntax in template * Fixing the post processor. --------- Co-authored-by: Nicolas Patry <[email protected]>
* fix: prefer enum for chat object * fix: adjust typo * fix: enum CompletionType not ObjectType * fix: adjust typo * feat: leverage serde for conditional deser * fix: adjust HubTokenizerConfig after rebase * fix: update create_post_processor logic for token type * fix: adjust unwrap syntax in template * Fixing the post processor. --------- Co-authored-by: Nicolas Patry <[email protected]>
* fix: prefer enum for chat object * fix: adjust typo * fix: enum CompletionType not ObjectType * fix: adjust typo * feat: leverage serde for conditional deser * fix: adjust HubTokenizerConfig after rebase * fix: update create_post_processor logic for token type * fix: adjust unwrap syntax in template * Fixing the post processor. --------- Co-authored-by: Nicolas Patry <[email protected]>
**updated
This PR prefers an enum that wraps
ChatCompletionChunk
andChatCompletion
and uses the serdetag
macro to set the correct object value.unfortunately serde does not provide a
rename_all = "dot.case"
serde-rs/serde#2269 so each value has an explicit renamethe following enum is used
**update again
This PR has moved to leveraging serde where possible to remove custom deserialization functions and overall make the types more readable and idiomatic