Hi,
Thank you so much for providing this work. It is very inspiring, and we are keen to use the resources to compare other newly proposed metrics.
However, I am not quite sure if I understand the paper and data correctly.
It seems that in Table 3 you split the unreasonable samples into 4 categories. In the provided data, however, each generation of each model has a score given as a list of 5 integers (which I assume are the overall scores from 5 annotators), but there is no label indicating which unreasonable type each story belongs to.
Have I missed a detail here? How do you decide which story belongs to which error type?
Also, when you mention in Section 4.2 that you assign reasonable and unreasonable samples the binary labels 1 and 0, does that mean all reasonable samples are counted four times, once for each problem type?
For example, for ROC, do you have 46 reasonable samples labeled 1 and 22 unreasonable samples labeled 0 for the Rept type, and then the same 46 reasonable samples labeled 1 again together with 319 unreasonable samples labeled 0 for the Unrel type?
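To make the question concrete, here is a minimal Python sketch of how I currently read the setup. It is only my guess, not the authors' code: the names `build_binary_labels`, `N_REASONABLE`, and `UNREASONABLE_COUNTS` are hypothetical, and the counts are the ROC numbers quoted above for Rept and Unrel only.

```python
# Hypothetical sketch of the setup being asked about; not the authors' code.
# Counts are the ROC numbers quoted above; the other two error types are omitted.
N_REASONABLE = 46
UNREASONABLE_COUNTS = {"Rept": 22, "Unrel": 319}


def build_binary_labels(n_reasonable, n_unreasonable):
    """Label reasonable samples 1 and unreasonable samples 0 for one error type."""
    return [1] * n_reasonable + [0] * n_unreasonable


# Assumption: the same 46 reasonable samples are reused in every per-type subset.
subsets = {
    error_type: build_binary_labels(N_REASONABLE, count)
    for error_type, count in UNREASONABLE_COUNTS.items()
}

for error_type, labels in subsets.items():
    # Prints the subset size and the number of reasonable (label 1) samples.
    print(error_type, len(labels), sum(labels))
```

Under this reading, the Rept subset would contain 68 samples and the Unrel subset 365, each including the same 46 reasonable ones. Is that the intended construction?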
Any clarification on this would be much appreciated.
Thank you in advance!