Unable to Reproduce Results for Supervised Training on Echo Dataset with Mistral-7B-Instruct-v2 #154
I have the same question. @ThonyPan, do you have results on retrieval datasets, e.g., ArguAna and SciFact? My local evaluation is ~15% lower than the number reported in the paper for Llama 3 8B Instruct.
@ThonyPan, may I ask whether the split used is test or dev?
I have tested the self-trained model on more datasets, including some retrieval datasets. The SciFact score is 74.05, a 4.81-point decline, while the ArguAna score is 58.59, a 1.11-point increase. My results deviate unpredictably from the reported ones, with most subsets scoring lower than expected. I'm not sure whether this is caused by a package version issue or an oversight in dataset preprocessing.
All results are computed on the test split of the MTEB datasets.
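Whether a gap is read as absolute points or as a percentage of the reported score changes the picture (compare the earlier "~15% lower" with the point deltas above). As a quick sanity check, here is a minimal Python sketch that back-computes the paper scores implied by the stated deltas and expresses each gap relative to them. The "reported" values are derived from the deltas in this thread, not quoted from the paper, and the helper names are hypothetical.

```python
# Reproduced scores posted in this thread (MTEB test split).
reproduced = {"SciFact": 74.05, "ArguAna": 58.59}
# Signed deltas from the same comment: reproduced minus reported, in points.
delta_vs_paper = {"SciFact": -4.81, "ArguAna": +1.11}


def implied_reported(reproduced, delta):
    """Recover the paper's score from a reproduced score and its signed delta."""
    return {k: round(reproduced[k] - delta[k], 2) for k in reproduced}


def relative_gap(reproduced, reported):
    """Signed gap as a percentage of the reported score."""
    return {
        k: round(100 * (reproduced[k] - reported[k]) / reported[k], 2)
        for k in reproduced
    }


reported = implied_reported(reproduced, delta_vs_paper)
print("implied reported:", reported)
print("relative gap (%):", relative_gap(reproduced, reported))
```

Read this way, the SciFact drop is roughly 6% of the implied reported score, so a "~15% lower" figure on other datasets would be a considerably larger deviation if it is meant in relative terms.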
Hi everyone, I have a similar issue when training the supervised Llama 3.1 model based on the provided MNTP version. I use the provided training config and only switched out the local path. This is the training config I use on 8xA100:
I strongly believe the authors need to disclose their complete conda environment, training configuration, and training logs, as most people, including myself, have not been able to reproduce the paper's results so far.
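For anyone posting a reproduction report, a small Python sketch like the one below can capture interpreter, OS, and package versions in a single JSON blob. The package list is an assumption (common dependencies for this kind of training and evaluation), not the project's official requirements; `conda env export` or `pip freeze` gives a more complete picture.

```python
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages worth pinning; adjust to the project's actual requirements.
PACKAGES = ["torch", "transformers", "peft", "accelerate", "datasets", "mteb"]


def environment_report(packages=PACKAGES):
    """Collect interpreter, OS, and package versions into a JSON-serialisable dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = version(name)
        except PackageNotFoundError:
            # Record absence explicitly so missing dependencies show up in the report.
            report["packages"][name] = "not installed"
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2))
```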
@TianBaoGe @stefanhgm @ThonyPan, I completely agree. Unfortunately, we could not do this earlier due to the deadline rush and the data-purging rules of our university cluster. However, I have now started re-training to verify this issue, and I am logging everything carefully. I will report back on the findings here.
Hi @vaibhavad,
I’m currently working with an 8xA100 80G setup and attempting to replicate the supervised training process for the Mistral-7B-Instruct-v2 model as described. However, despite following the tutorial instructions, I haven’t been able to achieve the results reported in the paper on the echo dataset. Some of my outcomes are summarized in the table below.
I’ve noticed that your data_loader code references files like allnli_split1.jsonl, which are not present in the echo-data I downloaded. Could you clarify whether further preprocessing was applied to the echo-data before training, and if so, share the preprocessing steps? Alternatively, if no preprocessing was involved, would it be possible for you to provide a Docker image that reproduces the reported results?
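Before retraining, it may help to confirm which files the data loader expects are actually present in the downloaded data. A minimal Python sketch, assuming the expected names are listed by hand: only `allnli_split1.jsonl` comes from this thread, and `EXPECTED_FILES`, `list_data_files`, and `missing_files` are hypothetical helpers, not part of the project.

```python
from pathlib import Path

# Only "allnli_split1.jsonl" is named in this thread; any other entries you add
# are hypothetical placeholders for whatever the data loader actually opens.
EXPECTED_FILES = ["allnli_split1.jsonl"]


def list_data_files(data_dir):
    """Return the names of all files under data_dir, searched recursively."""
    return {p.name for p in Path(data_dir).rglob("*") if p.is_file()}


def missing_files(present, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `present`, in order."""
    present = set(present)
    return [name for name in expected if name not in present]
```

Usage would be something like `missing_files(list_data_files("path/to/echo-data"))`, where the path is a placeholder for the local download location; a non-empty result indicates the data loader is looking for files that the download does not contain.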
Environment Details:
• Model: Mistral-7B-Instruct-v2
• Hardware: 8xA100 80G GPUs
• Installed Version: 0.2.2 (installed via pip install -e . locally)
• Code: No modifications made to the original framework
Thank you very much for your help!