
Why is the performance very different from other papers? #4

Open
chiyuzhang94 opened this issue Dec 20, 2023 · 2 comments
Labels
question Further information is requested

Comments

@chiyuzhang94

Hi Andreea,

I noticed that the model performance reported in your paper differs considerably from the performance reported in the original papers.
For example, MINER (Li et al. 2019) reports AUC = 69.61 on the MIND-small dataset, but your reported performance is only AUC = 51.2.
This is also much lower than other reproductions of MINER; for example, this paper reports that their reproduced MINER model achieved an AUC of 63.88.
In general, most GeneralRec models in your Table 1 achieve AUC < 52.00, which differs substantially from the performance reported in other papers.
Could you comment on this?

@andreeaiana andreeaiana added the question Further information is requested label Dec 22, 2023
@andreeaiana
Copy link
Owner

andreeaiana commented Dec 22, 2023

Hi,

The data splits used in the other papers are most likely different from the ones we used. Neither the MINER paper nor the one you referenced explicitly mentions which split of the MIND dataset they use, so I assume they used the test portion, whose labels are not publicly available. In contrast, as explained in our paper (Section 2.5), we use the MINDdev portion of the dataset as our test split, and further split the MINDtrain dataset into training and validation portions, respectively.
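
A rough sketch of this splitting strategy, assuming the MIND-small behaviors.tsv files have been downloaded locally (the file paths, validation fraction, and random seed below are illustrative placeholders, not the exact values used in NewsRecLib):

```python
import pandas as pd

# MIND behaviors.tsv has no header; columns follow the dataset documentation.
COLS = ["impression_id", "user_id", "time", "history", "impressions"]

# Hypothetical local paths to the downloaded MIND-small portions.
train_behaviors = pd.read_csv("MINDsmall_train/behaviors.tsv", sep="\t", names=COLS)
dev_behaviors = pd.read_csv("MINDsmall_dev/behaviors.tsv", sep="\t", names=COLS)

# Carve a validation set out of MINDtrain (fraction and seed are illustrative).
val_frac = 0.1
val_behaviors = train_behaviors.sample(frac=val_frac, random_state=42)
train_behaviors = train_behaviors.drop(val_behaviors.index)

# Use MINDdev (which has public labels) as the test set,
# instead of the official MINDtest split whose labels are not released.
test_behaviors = dev_behaviors

print(len(train_behaviors), len(val_behaviors), len(test_behaviors))
```

Because the held-out test set here is MINDdev rather than the unlabeled official test set, AUC numbers are not directly comparable to papers that evaluate on MINDtest via the leaderboard.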

@chiyuzhang94
Author

Hi,

Yes, I understand that different data splits can lead to some variance, but a difference of more than 10 AUC points is too large. The dev and test sets come from the same dataset and should not show such a dramatic shift.
Have you verified the performance by running the official code from the original papers (e.g., MINER) on your data splits?

Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jun 11, 2024
Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jun 24, 2024