RFC: WIP: add dynamic cyrillic translit linguistic skill #768

booxter · 2024-04-25T19:12:43Z

This is an attempt to add a language-aware Cyrillic-to-Latin transliteration skill.

It has not been validated and does not work well with the default prompt defined in the cli repo. (The prompt produces new instructions that do not retain the transliteration scheme.)

Whether we need a model to generate variations of the generated instructions is itself a topic for exploration. (Perhaps we could live with feeding the seed samples generated by the skill directly into fine-tuning.) But if we do, additional changes may be required on the cli side, e.g. to allow modified (or completely new?) prompts when a particular skill needs them.

This is a work in progress, posted as a discussion starter.
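
To make "language aware" concrete, here is a minimal sketch of how a generator could produce seed samples whose answers depend on the source language. It is not the code from this PR: the letter tables are deliberately tiny and incomplete, the function names `transliterate` and `make_seed_example` are hypothetical, and the `question`/`answer` keys are only assumed to mirror the shape of a qna.yaml seed example.

```python
# Illustrative sketch only: small, incomplete tables, not the scheme used in the PR.
# Letters shared by both languages in this toy example.
COMMON = {
    "а": "a", "б": "b", "в": "v", "д": "d", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "р": "r", "с": "s", "т": "t",
}

# Letters whose Latin form depends on the source language.
OVERRIDES = {
    "russian": {"г": "g", "и": "i"},
    "ukrainian": {"г": "h", "и": "y", "і": "i"},
}


def transliterate(text: str, language: str) -> str:
    """Transliterate text letter by letter using the table for `language`."""
    table = {**COMMON, **OVERRIDES[language]}
    out = []
    for ch in text:
        # Characters missing from the table are passed through unchanged.
        latin = table.get(ch.lower(), ch)
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def make_seed_example(word: str, language: str) -> dict:
    """Build one question/answer pair (hypothetical seed-example shape)."""
    return {
        "question": (
            f"Transliterate the {language.capitalize()} word "
            f"'{word}' into Latin script."
        ),
        "answer": transliterate(word, language),
    }


if __name__ == "__main__":
    for lang in ("russian", "ukrainian"):
        print(make_seed_example("гора", lang))
```

The same Cyrillic word yields different answers per language, which is exactly the property a prompt that rewrites these instructions would have to preserve.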


Signed-off-by: Ihar Hrachyshka <[email protected]>

instruct-lab-bot · 2024-04-25T19:12:55Z

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

@instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
@instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
@instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
@instructlab-bot help -- Print this help message again.

Note: Results or errors of these commands will be posted as a pull request check in the Checks section below.

Note: Currently only maintainers belonging to the taxonomy-triagers, taxonomy-approvers, taxonomy-maintainers, labrador-org-maintainers, or instruct-lab-bot-maintainers teams are allowed to run these commands.

bjhargrave · 2024-08-20T14:43:14Z

I don't think this is in the plan for the overall InstructLab project, so I think we should close this until dynamic qna.yaml generation is planned.

jjasghar · 2024-08-20T16:09:08Z

Agreed. This would probably be better in the dev-doc repo too, because it would require a much larger scope than just our taxonomy repository.

github-actions bot added the documentation, triage-needed, and skill labels Apr 25, 2024

booxter mentioned this pull request Apr 25, 2024

Support Dynamic taxonomies (QNA document generators) instructlab/ui#140

Open

jjasghar removed the triage-needed label May 14, 2024

jjasghar closed this Aug 20, 2024
