P2M‐Self‐Distill
We present the P2M-Self-Distill project, aimed at enhancing our existing framework. This involves improving input generation, output annotation, quality evaluation, model execution, and training.
Create new inputs based on instructions and a selected dataset.
- Utilizes diverse prompts, incorporating examples from the dataset for relevance.
- Generates inputs at gradually increasing temperature settings; this is a key focus.
- Importantly, it keeps adding new examples from the selected dataset and is not fine-tuned.
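The input-generation behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `temperature_schedule` and `build_prompt`, the linear schedule, and the default values (`start`, `step`, `cap`) are hypothetical and do not come from the project code.

```python
import random
from typing import List


def temperature_schedule(round_idx: int, start: float = 0.5,
                         step: float = 0.1, cap: float = 1.5) -> float:
    """Raise the sampling temperature linearly each round, up to a cap.

    The linear shape and the default numbers are assumptions for illustration.
    """
    return min(start + step * round_idx, cap)


def build_prompt(instruction: str, dataset: List[str], k: int,
                 rng: random.Random) -> str:
    """Compose a few-shot prompt from the instruction and k sampled dataset inputs."""
    shots = rng.sample(dataset, min(k, len(dataset)))
    lines = [instruction, ""]
    lines += [f"Example {i + 1}: {s}" for i, s in enumerate(shots)]
    # Leave the last slot open for the model to fill with a new input.
    lines.append(f"Example {len(shots) + 1}:")
    return "\n".join(lines)
```

Each round would call `build_prompt` with a freshly sampled set of examples and pass `temperature_schedule(round_idx)` to whatever sampling call the model backend exposes; no model weights are updated in this step.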
Similar to traditional few-shot learning, the OutputAnnotator generates multiple outputs for each input.
- It will be fine-tuned continuously.
- Fine-tuning options: use only the new examples from each round, use the whole dataset, or cut off low-scoring examples and use only the high-quality ones.
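The three fine-tuning data choices listed above could be expressed as a single selection function. This is a minimal sketch: the `Example` record, the strategy names (`"new_only"`, `"all"`, `"high_quality"`), and the `min_score` threshold are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    input_text: str
    output_text: str
    score: float      # quality score assigned by the evaluator
    round_idx: int    # round in which the example was produced


def select_training_set(examples: List[Example], strategy: str,
                        current_round: int, min_score: float = 0.5) -> List[Example]:
    """Pick the examples to fine-tune on under one of three strategies."""
    if strategy == "new_only":
        # Only the examples produced in the current round.
        return [e for e in examples if e.round_idx == current_round]
    if strategy == "all":
        # The whole accumulated dataset.
        return list(examples)
    if strategy == "high_quality":
        # Drop low-scoring examples, keep the rest.
        return [e for e in examples if e.score >= min_score]
    raise ValueError(f"unknown strategy: {strategy}")
```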
Evaluate/Score/Rank the input-output pairs for each specific input.
- Different ways of evaluation: “self-consistency as the score” and “model evaluation.”
- Different models as the evaluator for “model evaluation”: “base model itself,” “larger open-source model,” or even “ChatGPT evaluator” for the worst cases.
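One simple reading of "self-consistency as the score" is to score each distinct output by the fraction of sampled outputs that agree with it. The sketch below assumes exact string match after light normalization; the function names and the normalization rule are assumptions, and a real evaluator might compare answers more loosely.

```python
from collections import Counter
from typing import Dict, List, Tuple


def self_consistency_scores(outputs: List[str]) -> Dict[str, float]:
    """Score each distinct output by its share of all sampled outputs."""
    if not outputs:
        return {}
    # Normalize so trivial whitespace/case differences count as agreement.
    normalized = [o.strip().lower() for o in outputs]
    counts = Counter(normalized)
    total = len(normalized)
    return {answer: n / total for answer, n in counts.items()}


def rank_outputs(outputs: List[str]) -> List[Tuple[str, float]]:
    """Return (output, score) pairs sorted from most to least consistent."""
    scores = self_consistency_scores(outputs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For the "model evaluation" alternative, the same `rank_outputs` interface could be kept while the score comes from an evaluator model instead of agreement counts.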
An advanced version that incorporates quantization.
Combine RLAIF and LoRA.
- Initialize Framework: Establish base and mock classes.
- Implement New Components
- Set Up Benchmarks: Create benchmarks to evaluate new system performance.
- Set Up Contrast Experiments: Conduct experiments on factors like "temperature scaling" and "self-consistency vs. model evaluation", and so on.
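As an illustration of the first step in the plan above, base and mock classes might look like the sketch below. The class name `OutputAnnotator` appears in this proposal; `InputGenerator`, the method names, and their signatures are assumptions made for this example and may differ from the actual framework.

```python
from abc import ABC, abstractmethod
from typing import List


class InputGenerator(ABC):
    """Base interface: produce new inputs from an instruction and examples."""

    @abstractmethod
    def generate(self, instruction: str, examples: List[str], n: int) -> List[str]:
        ...


class OutputAnnotator(ABC):
    """Base interface: produce several candidate outputs for one input."""

    @abstractmethod
    def annotate(self, input_text: str, n: int) -> List[str]:
        ...


class MockInputGenerator(InputGenerator):
    """Deterministic stand-in used to test the pipeline without a model."""

    def generate(self, instruction: str, examples: List[str], n: int) -> List[str]:
        return [f"mock input {i}" for i in range(n)]


class MockOutputAnnotator(OutputAnnotator):
    """Deterministic stand-in that returns the same output n times."""

    def annotate(self, input_text: str, n: int) -> List[str]:
        return [f"mock output for {input_text}" for _ in range(n)]
```

Mocks like these let the benchmarks and contrast experiments be wired up and tested end to end before any real model is attached.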
Our dedicated team comprises:
- Vijay
- Chenyang
- Mingdao
- Sherry
- Graham
- TBD
This proposal outlines the ambitious goals of the P2M-Self-Distill project and the essential components driving its success.