Training Data? #2

shailja-thakur · 2023-09-30T01:07:21Z

Could you also please share the training data used in fine-tuning codegen models?

luke-avionics · 2023-10-12T06:15:40Z

I think, as stated in their paper, they use this dataset: https://huggingface.co/datasets/shailja/Verilog_GitHub ... which might be from you? XD (It had not been released when they published their paper, so they may have reproduced the data collection themselves.)

There is some processing and filtering, though ... I cannot get the number of samples after processing to match the 8502 stated in their paper, which may be due to differences in the raw data and the pre-processing steps.
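For anyone trying to reproduce the count, here is a minimal sketch of the kind of filtering pass that could cause such a mismatch. Everything in it is an assumption for illustration: the function name `filter_verilog`, the 20000-character cap, and the three rules (exact deduplication, length cap, requiring a `module ... endmodule` pair) are hypothetical and not taken from the paper. The raw strings would come from the dataset linked above, for example loaded with the Hugging Face `datasets` library.

```python
import re

# Hypothetical rule: a sample must contain at least one complete module.
MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def filter_verilog(samples, max_chars=20000):
    """Return the samples that survive three assumed filtering rules.

    The rules and the max_chars default are illustrative guesses,
    not the paper's actual pre-processing.
    """
    seen = set()
    kept = []
    for text in samples:
        key = text.strip()
        # Drop empty samples and exact duplicates.
        if not key or key in seen:
            continue
        seen.add(key)
        # Drop very large files.
        if len(key) > max_chars:
            continue
        # Drop files without a module ... endmodule pair.
        if MODULE_RE.search(key) is None:
            continue
        kept.append(key)
    return kept


raw = [
    "module a(input x, output y); assign y = x; endmodule",
    "module a(input x, output y); assign y = x; endmodule",  # exact duplicate
    "// just a comment, no module",
    "",
]
print(len(filter_verilog(raw)))
```

Changing any one of these rules (for example the length cap, or deduplicating before versus after stripping whitespace) changes the final count, so a different total from the paper's 8502 is not surprising without the exact steps.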
