RFC: WIP: add dynamic cyrillic translit linguistic skill #768

Closed

@booxter (Contributor) commented on Apr 25, 2024

This is an attempt to add support for a Cyrillic-to-Latin transliteration skill that is also language-aware.

This has not been validated and does not work well with the default prompt defined in the cli repo. (The prompt produces new instructions that do not retain the transliteration scheme.)

Whether we need to use a model to generate variations of the generated instructions is itself a topic for exploration. (Perhaps we could live with feeding the seed samples generated by the skill directly into fine-tuning.) If we do need a model, additional changes may be required on the cli side, e.g. to allow modified (or completely new?) prompts where a particular skill needs them.

This is work-in-progress and is posted as a discussion starter.
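
To make the idea concrete for discussion, here is a minimal sketch of what "language-aware" transliteration feeding seed samples could look like. It is not the code from this PR. The names `BASE`, `OVERRIDES`, `transliterate`, and `make_seed_example` are hypothetical, the letter tables are deliberately tiny and do not implement any full romanization standard, and the `question`/`answer` keys are an assumption about how a generated sample might be shaped.

```python
# Hypothetical sketch: a shared base table plus per-language overrides.
# The tables are intentionally incomplete and only illustrate the idea.
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "і": "i", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "р": "r", "с": "s", "т": "t", "у": "u",
}

# Letters whose romanization differs by language (illustrative subset).
OVERRIDES = {
    "ru": {},
    "uk": {"г": "h", "ґ": "g", "и": "y"},
    "be": {"г": "h"},
}


def transliterate(text: str, lang: str) -> str:
    """Map Cyrillic letters to Latin using the table for `lang`.

    Characters missing from the table are passed through unchanged.
    """
    table = {**BASE, **OVERRIDES[lang]}
    out = []
    for ch in text:
        latin = table.get(ch.lower(), ch)
        # Preserve capitalization of the source letter.
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def make_seed_example(text: str, lang: str) -> dict:
    """Build one question/answer pair (assumed shape, not a confirmed schema)."""
    return {
        "question": f"Transliterate this text ({lang}) into Latin script: {text}",
        "answer": transliterate(text, lang),
    }
```

With these tables, `transliterate("гора", "ru")` gives `"gora"` while `transliterate("гора", "uk")` gives `"hora"`, which is the kind of scheme a rewritten instruction would need to retain.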

Signed-off-by: Ihar Hrachyshka <[email protected]>
@github-actions bot added labels on Apr 25, 2024: documentation ("Improvements or additions to documentation"), triage-needed ("(Auto labeled) skill is ready to be triaged"), skill ("(Auto labeled)")

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

  • @instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
  • @instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
  • @instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
  • @instructlab-bot help -- Print this help message again.

Note

Results or errors of these commands will be posted as a pull request check in the Checks section below.

Note

Currently only maintainers belonging to the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers labrador-org-maintainers instruct-lab-bot-maintainers]] teams are allowed to run these commands.

@jjasghar removed the triage-needed label ("(Auto labeled) skill is ready to be triaged") on May 14, 2024
@bjhargrave (Contributor)

I don't think this is in the plan for the overall InstructLab project, so I think we should close this until dynamic qna.yaml generation is planned.

@jjasghar (Member)

Agreed. This would actually probably be better in the dev-doc repo too, because it would require a much larger scope than just our taxonomy repository.

@jjasghar closed this on Aug 20, 2024