Chat Template support for function calling and RAG #30621

Merged
Rocketknight1 merged 69 commits into main from new_chat_template_args on Jun 11, 2024

Member

@Rocketknight1 Rocketknight1 commented May 2, 2024

This PR updates our support of chat templates to cover tool-use and RAG use-cases. Specifically, it does the following:

  • Defines a recommended JSON schema spec for tool use
  • Adds tools and documents kwargs to apply_chat_template, with accompanying docstrings
  • Updates the chat template documentation to match
  • Adds a get_json_schema function to automatically generate a schema from a Python function
  • Updates apply_chat_template to automatically convert passed functions to JSON schemas
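
As a rough illustration of the last two bullets, the sketch below derives a schema-like dict from a function's signature and docstring using only the standard library. The helper name `sketch_json_schema`, the small type mapping, and the output layout are assumptions made for this example; it is not the `get_json_schema` implementation from this PR.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from a few Python types to JSON schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_json_schema(func):
    """Build a schema-like dict from a function's signature (illustrative only)."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        if name not in hints:
            raise TypeError(f"Argument {name} is missing a type hint")
        properties[name] = {"type": _JSON_TYPES[hints[name]]}
        # An argument without a default value is treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    # Use the first paragraph of the docstring as the description.
    description = (inspect.getdoc(func) or "").split("\n\n")[0]
    return {
        "name": func.__name__,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def multiply(a: float, b: float = 1.0):
    """Multiply two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a * b


schema = sketch_json_schema(multiply)
```

Here `schema["parameters"]["required"]` contains only `a`, because `b` has a default value.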

TODO:

  • Do we want to support Tuple?
  • In some cases, the helper generates a more complex anyOf when a simple list of types would be okay.
  • Community feedback before we make this a standard!
  • Fix up doc examples and docstring to match the newest format
  • Can we do an example template using return values?
  • Get a working Command-R template
  • Get a working Hermes-2-Pro template
  • Get a working Mixtral-8x22B template

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 Rocketknight1 force-pushed the new_chat_template_args branch 2 times, most recently from 295b89d to a94f2d7 Compare May 9, 2024 17:58
@Rocketknight1 Rocketknight1 marked this pull request as ready for review May 9, 2024 18:16
@TheodoreGalanos

RAG documents are defined by title and contents keys - is there anything else I'm missing?

If possible I would allow a metadata tag for RAG that can support an open ended dictionary.

It is highly unlikely production RAG systems will only use content and title, but at the same time it's hard to predict what metadata people might use.

Of course, I'm not actually sure this is possible or desirable from the point of view of a chat template.

@Rocketknight1
Member Author

@TheodoreGalanos That's a good point, yes, but I'm not sure we can include it in the chat template tooling itself! Basically, the goal of this PR is to standardize things like tool and document definitions that are going to be universal for tool use and RAG LLMs. Document metadata doesn't really have a standard that we can support yet.

That said, if we start seeing LLMs whose templates support it, we might consider making it part of the tooling standard.

@Rocketknight1 Rocketknight1 requested a review from amyeroberts May 16, 2024 19:07
@Rocketknight1
Member Author

Cancel this review request by the way - I realized I'll need to do some major refactors to support some of the existing tool use templates out there!

@amyeroberts
Collaborator

@Rocketknight1 OK, just ping when you want me to look at it again

@Rocketknight1
Member Author

@amyeroberts this should be ready for review now! I still have to write the Mixtral template, but I think the core code here is ready.

@Rocketknight1
Member Author

cc @osanseviero - I thought about our chat yesterday, and I removed the decorator entirely. You're right that they can seem like confusing magic for users.

Instead, the new API lets you simply pass functions in the tools argument, with no decorator or anything. apply_chat_template will convert them into JSON schemas for you. You can also pass JSON schemas directly to the argument instead.
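
A minimal sketch of that "functions or schemas" behaviour, assuming a hypothetical `normalize_tools` helper and a stand-in converter (`name_only`); neither name comes from the PR, and the real conversion reads type hints and docstrings rather than just the function name.

```python
from typing import Any, Callable, Dict, List, Union


def normalize_tools(
    tools: List[Union[Dict[str, Any], Callable]],
    to_schema: Callable[[Callable], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return a list of schema dicts, converting any callables with `to_schema`."""
    schemas = []
    for tool in tools:
        if isinstance(tool, dict):
            schemas.append(tool)  # already a schema: pass it through unchanged
        elif callable(tool):
            schemas.append(to_schema(tool))  # a function: convert it to a schema
        else:
            raise ValueError("Each tool must be a JSON schema dict or a function")
    return schemas


def get_weather(city: str):
    """Look up the weather for a city."""


def name_only(func):
    # Stand-in converter for this demo only.
    return {"name": func.__name__, "description": func.__doc__}


tools = normalize_tools([get_weather, {"name": "already_a_schema"}], name_only)
```

Both entries end up as plain dicts, so a template only ever has to handle one representation.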

@Rocketknight1 Rocketknight1 force-pushed the new_chat_template_args branch from 9ec58d3 to 1c1f5ba Compare May 23, 2024 14:06
Collaborator

@amyeroberts amyeroberts left a comment

Thanks for working on this and the extensive tests!

I'm a bit confused about the boundaries here of what's defined, what's in scope and the actual need for the tool descriptions. For example, the JSON schema doesn't include things like defaults (nor the functionality itself), so isn't a complete function definition. So the function needs to be defined, and the JSON schema describes it? What if a schema is passed in for a function that isn't defined in scope? What if the name of a function is found but parts of the schema don't match e.g. the signature?

How are they actually used in apply_chat_template? I'm assuming these are needed to create the jinja templates? Is there a limit to what these functions can and can't do or return? Can the functions depend on one another? Why would we define these functions? For example, in the doc example with current_time, how would this be used in the chat template?


Yes, you can! The only argument that `apply_chat_template` requires is `messages`. However, you can pass any keyword
argument to `apply_chat_template` and it will be accessible inside the template. This gives you a lot of freedom to use
chat templates for many things. There are no restrictions on the names or the format of these arguments - you can pass
Collaborator

No restrictions 😬

src/transformers/utils/chat_template_utils.py (resolved)
@@ -233,6 +233,178 @@ The sun.</s>

From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.

## Can I pass other arguments to the chat template?
Collaborator

I think this would be better structured as ## Advanced features or something like that, and then detailing additional arguments, tools and documents in their own subsections. For three reasons:

  • It makes the page easier to navigate when scrolling
  • It's easier to search and find in the documentation when using the doc's search (it will find the headers before the internal text from my experience)
  • This will make more sense in the rendered contents summary on the side of the docs page

Member Author

Done! Refactored the docs to put the recommended workflow up front, and to have more searchable headers

src/transformers/tokenization_utils_base.py (outdated, resolved)
tests/utils/test_chat_template_utils.py (resolved)
Comment on lines 228 to 232
if ... in get_args(hint):
    raise ValueError(
        "'...' is not supported in Tuple type hints. Use List[] types for variable-length"
        " inputs instead."
    )
Collaborator

It kind of is though

In [1]: from typing import Tuple

In [2]: Tuple[int, int, ...]
Out[2]: typing.Tuple[int, int, ...]

Do we need to assert this within the JSON schema? If not, then I'd just let the users' environment control how they can or can't use the objects in typing

Member Author

@Rocketknight1 Rocketknight1 May 24, 2024

Sorry, I should be clearer with the error message! ... is a valid type hint, but we don't support autoconverting it to JSON schema because it can get very messy outside of the simple cases (e.g. Tuple[str, int, ...] requires a combination of items and prefixItems that no existing model templates support). I'll reword!
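
To make the distinction concrete, here is a small sketch of how fixed-length Tuple hints and List hints could map to JSON schema, with `...` rejected. The helper name `sketch_sequence_schema` and the type mapping are assumptions for illustration; the use of `prefixItems` for positional elements and `items` for homogeneous lists follows JSON schema conventions and the comment above, not a confirmed reading of the merged code.

```python
from typing import List, Tuple, get_args, get_origin

# Assumed mapping from a few Python types to JSON schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_sequence_schema(hint):
    """Convert a List[...] or fixed-length Tuple[...] hint to a schema dict."""
    origin, args = get_origin(hint), get_args(hint)
    if origin is list:
        # Variable length, one element type: a single "items" schema.
        return {"type": "array", "items": {"type": _JSON_TYPES[args[0]]}}
    if origin is tuple:
        if ... in args:
            # A valid type hint, but converting it is out of scope for this sketch.
            raise ValueError("Conversion of '...' in Tuple hints is not supported here")
        # Fixed length: one schema per position.
        return {"type": "array", "prefixItems": [{"type": _JSON_TYPES[a]} for a in args]}
    raise ValueError(f"Unsupported hint: {hint}")


fixed = sketch_sequence_schema(Tuple[str, int])
variable = sketch_sequence_schema(List[str])
```

A hint like Tuple[str, int, ...] would need both keys at once, which is the messy case the comment describes.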

Comment on lines 218 to 227
if len(get_args(hint)) == 1:
    raise ValueError(
        "Tuple type hints should only be used when the argument has a fixed length and each "
        f"element has a specific type. The hint {hint} indicates a Tuple of length 1. "
        "This should be replaced with an unwrapped type hint instead like "
        f"{get_args(hint)[0]}. Alternatively, if the "
        "function can actually take a tuple with multiple elements, please either indicate "
        f"each element type (e.g. Tuple[{get_args(hint)[0]}, {get_args(hint)[0]}]), "
        f"or if the input can be variable length, use List[{get_args(hint)[0]}] instead."
    )
Collaborator

Is this a requirement of the JSON schema? I might want to be able to pass in a tuple of length 1.

Member Author

It is not, and we could support this! However, I suspect strongly that 99.9% of the time someone writes Tuple[str] as a type hint, they don't actually mean a tuple of length 1, and they actually meant "a tuple or list containing one or more strings".

I see this as a somewhat forceful nudge away from highly-likely-to-be-incorrect hints, as well as towards simpler function defs. However, we can remove the error and support this if you want!

Member Author

(I reworded the error to be a bit clearer)

src/transformers/utils/chat_template_utils.py (outdated, resolved)
In addition, if you pass functions in the `tools` argument, they will automatically be converted with this function:

```python
import datetime
```
Collaborator

Silly question - but I'm assuming that this relies on all dependencies e.g. datetime to be in the global scope when calling apply_chat_template?

Member Author

Nope! apply_chat_template will never actually call the functions you pass it. It will only extract the signatures, type hints, and docstrings.
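
The point can be checked with the standard library alone. In the sketch below, the `current_time` body refers to a module that is deliberately not imported, yet the signature, type hints, and docstring can still be read because the body is never executed. The function and its docstring are made up for this example.

```python
import inspect
from typing import get_type_hints


def current_time(timezone: str):
    """Get the current time in a given timezone.

    Args:
        timezone: The name of the timezone to use.
    """
    # `datetime` is deliberately not imported: the name is only looked up
    # if this body runs, and nothing below runs it.
    return datetime.datetime.now()


# Introspection only: none of these calls execute the function body.
signature = inspect.signature(current_time)
hints = get_type_hints(current_time)
summary = inspect.getdoc(current_time).splitlines()[0]
```

Actually calling `current_time` is a separate step that the user's own code performs, and only then do its dependencies need to be available.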

        self.assertEqual(schema, expected_schema)

    def test_multiple_complex_arguments(self):
        def fn(x: List[Union[int, float]], y: Optional[Union[int, str]] = None):
Collaborator

Do we still detect nullable if it's optional but has a non-None default?

Member Author

Yep! Our logic handles the default values and type hint totally separately. Optional[type] is treated like Union[None, type]. The type hint only affects the nullable parameter.

Meanwhile, the default value is only used to set the required parameter. So an argument is nullable iff NoneType is a valid input type, and required iff it has no default value.
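
A short sketch of that rule, assuming a hypothetical `nullable_and_required` helper: nullability is read from the type hint alone, and requiredness from the presence of a default alone, so all four combinations can occur.

```python
import inspect
from typing import List, Optional, Union, get_args, get_type_hints


def nullable_and_required(func):
    """Map each argument name to its (nullable, required) flags."""
    hints = get_type_hints(func)
    flags = {}
    for name, param in inspect.signature(func).parameters.items():
        # Nullable iff NoneType is one of the allowed input types.
        nullable = type(None) in get_args(hints[name])
        # Required iff the argument has no default value.
        required = param.default is inspect.Parameter.empty
        flags[name] = (nullable, required)
    return flags


def fn(
    x: List[Union[int, float]],
    w: Optional[str],
    y: Optional[Union[int, str]] = None,
    z: Optional[int] = 3,
    n: int = 0,
):
    return x


flags = nullable_and_required(fn)
```

Here `z` is nullable despite its non-None default, which is the case asked about above, and `w` is both nullable and required.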

@Rocketknight1 Rocketknight1 force-pushed the new_chat_template_args branch 2 times, most recently from f6fcf04 to b93da8c Compare May 24, 2024 16:07
@Rocketknight1
Member Author

@amyeroberts all comments should be addressed now! Failing tests are unrelated, I'm hoping a rebase fixes them soon.

@Rocketknight1 Rocketknight1 force-pushed the new_chat_template_args branch from b93da8c to a019b78 Compare May 27, 2024 12:05
@LysandreJik
Member

I'd be interested in having @molbap's review as well!

Contributor

@molbap molbap left a comment

I did a preliminary review, I think I'll come back to read more of it later - thanks for writing the tests, they helped!

src/transformers/utils/chat_template_utils.py (resolved)
src/transformers/utils/chat_template_utils.py (resolved)
src/transformers/utils/chat_template_utils.py (outdated, resolved)
Comment on lines 22 to 37
description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL)
args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL)
args_split_re = re.compile(r"(?:^|\n)\s*(\w+)\s*(?:\([\w\s\[\],.*]+\))?:\s*(.*?)\s*(?=\n\s*\w|\Z)", re.DOTALL)
returns_re = re.compile(r"\n\s*Returns:\n\s*(.*?)[\n\s]*(Raises:|\Z)", re.DOTALL)
Contributor

Here I'd suggest commenting the regexes a bit and perhaps giving them more explicit names. The third one is not trivial for me to read; re.VERBOSE would make it nicer!

# match the description of a function in google-style docstring
description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL)
# match the arguments after Args and before Returns, Raises or end of string
args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL)
args_split_re = re.compile(
    r"""
    ...
    (?:\(([^)]+)\))?    # match type hint in parentheses capturing content inside
    ...
    """,
    re.DOTALL | re.VERBOSE
)
# matches returned content 
returns_re = re.compile(r"\n\s*Returns:\n\s*(.*?)[\n\s]*(Raises:|\Z)", re.DOTALL)

Member Author

Fixed! I simplified the regex a lot by removing support for type hints in the docstring (like x (str): some input). We always discarded these anyway, but it made the regex a lot more complicated and created a few bugs that the tests weren't picking up.
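
For readers following the regex discussion, here is a sketch of parsing a Google-style docstring without type hints in the Args entries. `description_re` and `args_re` are copied from the diff excerpt above; the `re.VERBOSE` splitter and the `parse_google_docstring` helper are assumptions written for this example and are not the merged regex.

```python
import re

# Description: everything before the first section header (or end of string).
description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL)
# Args block: text after "Args:" up to "Returns:", "Raises:", or end of string.
args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL)
# Illustrative splitter (an assumption): entries of the form "name: description",
# where a description may continue over several lines.
args_split_re = re.compile(
    r"""
    (?:^|\n)          # start of the block or of a new line
    \s*(\w+):\s*      # argument name followed by a colon
    (.*?)\s*          # description, matched lazily
    (?=\n\s*\w+:|\Z)  # stop before the next name line or at the end
    """,
    re.DOTALL | re.VERBOSE,
)


def parse_google_docstring(docstring):
    """Return (description, {arg_name: arg_description})."""
    description = description_re.search(docstring).group(1).strip()
    args = {}
    args_match = args_re.search(docstring)
    if args_match:
        for name, text in args_split_re.findall(args_match.group(1)):
            # Collapse line breaks inside multi-line descriptions.
            args[name] = re.sub(r"\s*\n\s*", " ", text.strip())
    return description, args


doc = """Multiply two numbers.

Args:
    a: The first number.
    b: The second number,
        continued on another line.

Returns:
    The product.
"""

description, args = parse_google_docstring(doc)
```

One limitation of this splitter: a continuation line that itself starts with a word followed by a colon would be read as a new argument.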

src/transformers/utils/chat_template_utils.py (outdated, resolved)
@CISC
Contributor

CISC commented May 28, 2024

@Rocketknight1 This looks great, however I think it might be worth making the tools parameter identical to OpenAI's, i.e. move the schema down a level (inside function) and have type and function at the top level. They did this to be able to support different kinds of tools in the future.

You can easily extract and map the available functions in the template using tools|selectattr('type','eq','function')|map(attribute='function') like I do in Mistral-7B-Instruct-v0.3-SOTA-GGUF.
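
For reference, the same selection written in plain Python, assuming the OpenAI-style nesting described above. The tool entries are made up for this example, and the "retrieval" type is a hypothetical non-function tool.

```python
tools = [
    {"type": "function", "function": {"name": "get_weather", "parameters": {}}},
    # Hypothetical non-function tool, included to show that it is filtered out.
    {"type": "retrieval", "retrieval": {"name": "hypothetical_non_function_tool"}},
    {"type": "function", "function": {"name": "current_time", "parameters": {}}},
]

# Same selection as the Jinja filter chain: keep entries whose "type" equals
# "function", then pull the nested "function" schema out of each one.
functions = [tool["function"] for tool in tools if tool.get("type") == "function"]
names = [function["name"] for function in functions]
```

After this step a template can loop over `functions` exactly as it would over a flat list of schemas.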

@Rocketknight1
Member Author

Hi @CISC, yes, we considered that! We felt that it just added extra complexity, though, and noticed that Anthropic skips that extra layer for tool-use in Claude too. I suspect that later if we have "tools" that are not callable functions, we'll probably just put them in a separate argument.

@CISC
Contributor

CISC commented May 28, 2024

> Hi @CISC, yes, we considered that! We felt that it just added extra complexity, though, and noticed that Anthropic skips that extra layer for tool-use in Claude too. I suspect that later if we have "tools" that are not callable functions, we'll probably just put them in a separate argument.

TBH though that means it should be functions and not tools.

@Rocketknight1
Member Author

@CISC Are there any examples of 'tools' that aren't functions out there? I'm curious about what those would look like!

@Rocketknight1 Rocketknight1 force-pushed the new_chat_template_args branch from 1251411 to 0ab8c7f Compare June 11, 2024 12:43
@Rocketknight1 Rocketknight1 merged commit edc1dff into main Jun 11, 2024
21 checks passed
@Rocketknight1 Rocketknight1 deleted the new_chat_template_args branch June 11, 2024 14:46
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 14, 2024
* First draft, still missing automatic function conversion

* First draft of the automatic schema generator

* Lots of small fixes

* the walrus has betrayed me

* please stop committing your debug breakpoints

* Lots of cleanup and edge cases, looking better now

* Comments and bugfixes for the type hint parser

* More cleanup

* Add tests, update schema generator

* Update tests, proper handling of return values

* Small docstring change

* More doc updates

* More doc updates

* Add json_schema decorator

* Clean up the TODOs and finish the docs

* self.maxDiff = None to see the whole diff for the nested list test

* add import for add_json_schema

* Quick test fix

* Fix something that was bugging me in the chat template docstring

* Less "anyOf" when unnecessary

* Support return types for the templates that need them

* Proper return type tests

* Switch to Google format docstrings

* Update chat templating docs to match new format

* Stop putting the return type in with the other parameters

* Add Tuple support

* No more decorator - we just do it implicitly!

* Add enum support to get_json_schema

* Update docstring

* Add copyright header

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <[email protected]>

* Update docs/source/en/chat_templating.md

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/utils/chat_template_utils.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/utils/chat_template_utils.py

Co-authored-by: amyeroberts <[email protected]>

* Add copyright header

* make fixup

* Fix indentation

* Reformat chat_template_utils

* Correct return value

* Make regexes module-level

* Support more complex, multi-line arg docstrings

* Update error message for ...

* Update ruff

* Add document type validation

* Refactor docs

* Refactor docs

* Refactor docs

* Clean up Tuple error

* Add an extra test for very complex defs and docstrings and clean everything up for it

* Document enum block

* Quick test fixes

* Stop supporting type hints in docstring to fix bugs and simplify the regex

* Update docs for the regex change

* Clean up enum regex

* Wrap functions in {"type": "function", "function": ...}

* Update src/transformers/utils/chat_template_utils.py

Co-authored-by: Pablo Montalvo <[email protected]>

* Temporary tool calling commit

* Add type hints to chat template utils, partially update docs (incomplete!)

* Code cleanup based on @molbap's suggestion

* Add comments to explain regexes

* Fix up type parsing for unions and lists

* Add custom exception types and adjust tests to look for them

* Update docs with a demo!

* Docs cleanup

* Pass content as string

* Update tool call formatting

* Update docs with new function format

* Update docs

* Update docs with a second tool to show the model choosing correctly

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
itazap pushed a commit that referenced this pull request Jun 17, 2024
itazap pushed a commit that referenced this pull request Jun 17, 2024
itazap pushed a commit that referenced this pull request Jun 17, 2024
itazap pushed a commit that referenced this pull request Jun 18, 2024
itazap pushed a commit that referenced this pull request Jun 20, 2024
9 participants