replicate Amazon131k-title Performance #13

Open
SvenStahlmann opened this issue Feb 21, 2023 · 7 comments

@SvenStahlmann

Hello, could you give a short guide on how to replicate the performance on the LF-AmazonTitles-131K dataset reported in the paper?

After running ./run_main.sh 0 SiameseXML++ LF-AmazonTitles-131K 0 108, I get the following results on the console:

Prediction time (total): 241.77 sec., Prediction time (per sample): 1.79 msec., P@k(%): (knn): 27.16,26.04,23.41,20.68,18.37 (clf): 29.69,28.08,24.90,21.82,19.24 (ens): 29.69,28.08,24.90,21.82,19.24

which means the best P@1 is 29.69%, am I correct? Did you use different config parameters in the paper, or how can we reproduce the 41.42% P@1 reported there?

@kunaldahiya
Collaborator

Hi

Thanks for checking out SiameseXML. Could you please share the list of files available in your data directory and the logs (saved in the results dir)? I'll take a look and see what's happening.

@SvenStahlmann
Author

SvenStahlmann commented Feb 21, 2023

Thank you for your fast reply! Here you can see my data folder.

[Image: screenshot of the data folder]

log_eval.txt does not have any labelled P@1 values; however, on the console I can see the following output:

...
Loading test data.
Fetching shortlist.
...
Prediction time (total): 241.77 sec., Prediction time (per sample): 1.79 msec., P@k(%): (knn): 27.16,26.04,23.41,20.68,18.37 (clf): 29.69,28.08,24.90,21.82,19.24 (ens): 29.69,28.08,24.90,21.82,19.24 
...

Here are the training and eval logs in the results folder.
SiameseXML/results/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/log_eval.txt:

classifier
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08
shortlist
38.52,31.52,26.29,22.27,19.26
38.52,38.80,40.02,41.30,42.44
35.00,37.28,39.85,42.47,45.00
35.00,36.49,38.16,39.57,40.68
beta: 0.10
39.47,32.17,26.65,22.51,19.42
39.47,39.59,40.66,41.90,42.99
35.14,37.45,40.02,42.67,45.18
35.14,36.68,38.36,39.78,40.88
beta: 0.25
39.98,32.56,26.95,22.75,19.59
39.98,40.07,41.13,42.36,43.43
35.31,37.67,40.25,42.94,45.45
35.31,36.90,38.59,40.04,41.14
beta: 0.50
40.43,32.99,27.30,23.02,19.80
40.43,40.59,41.66,42.87,43.94
35.53,37.99,40.60,43.28,45.78
35.53,37.21,38.94,40.38,41.49
beta: 0.75
40.81,33.31,27.56,23.20,19.97
40.81,40.99,42.08,43.27,44.36
35.76,38.27,40.88,43.54,46.08
35.76,37.50,39.25,40.69,41.81
beta: 0.90
41.03,33.46,27.67,23.30,20.05
41.03,41.20,42.29,43.49,44.57
35.93,38.41,41.02,43.68,46.20
35.93,37.66,39.42,40.87,41.97
beta: 1.00
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08

------------------------------
Training time (sec): 2294.79
Model size (MB): 1161.09
Avg. Prediction time (msec): 1.79
------------------------------

SiameseXML/results/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/extreme/log_train.txt:

Loading training data.
Loading validation data.
Updating shortlist at epoch: 0
ANN train time: 869.65 sec
Epoch: 0, loss: 1.640305, time: 30.52 sec
Model saved after epoch: 0
P@k (knn): 37.84,30.98,25.86,21.98,19.03 (clf): 36.48,30.14,25.22,21.46,18.61 (ens): 37.49,30.87,25.78,21.91,18.97 , loss: 1.558926, time: 14.12 sec
Epoch: 1, loss: 1.543295, time: 29.11 sec
Epoch: 2, loss: 1.476445, time: 30.52 sec
Epoch: 3, loss: 1.422906, time: 30.40 sec
Epoch: 4, loss: 1.379664, time: 31.10 sec
Epoch: 5, loss: 1.344310, time: 29.56 sec
Epoch: 6, loss: 1.315101, time: 31.20 sec
Epoch: 7, loss: 1.290727, time: 29.13 sec
Epoch: 8, loss: 1.270632, time: 29.50 sec
Epoch: 9, loss: 1.253221, time: 30.32 sec
Epoch: 10, loss: 1.238742, time: 30.53 sec
Model saved after epoch: 10
P@k (knn): 37.84,30.98,25.86,21.98,19.03 (clf): 40.72,33.25,27.54,23.24,20.00 (ens): 39.41,32.28,26.77,22.64,19.53 , loss: 1.363372, time: 14.30 sec
Epoch: 11, loss: 1.225710, time: 31.28 sec
Epoch: 12, loss: 1.214930, time: 30.91 sec
Epoch: 13, loss: 1.205106, time: 30.88 sec
Adjusted learning rate to: 0.0001
Epoch: 14, loss: 1.190397, time: 31.23 sec
Epoch: 15, loss: 1.187621, time: 30.50 sec
Epoch: 16, loss: 1.184311, time: 30.30 sec
Epoch: 17, loss: 1.181125, time: 30.34 sec
Epoch: 18, loss: 1.178006, time: 30.19 sec
Epoch: 19, loss: 1.175337, time: 30.52 sec
Training time: 608.03 sec, Validation time: 28.42 sec, Shortlist time: 869.65 sec, Model size: 1161.09 MB
Saving model at: /home/sstahlmann/projects/SiameseXML/models/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/extreme/model_network.pkl

@SvenStahlmann
Author

Do you have any updates?

@kunaldahiya
Collaborator

Hi

Yes, I checked it out and the final results are fine. Please see the numbers printed at the end (on the terminal) or the log_eval.txt file, which reports the numbers for different values of beta as separate blocks. The four rows in each block are precision@{1-5}, nDCG@{1-5}, propensity-scored precision@{1-5}, and propensity-scored nDCG@{1-5}. So the precision@1 is 40%+ (41.12 in the classifier block of your log). Moreover, the paper reports results with 512D embeddings, which also provide a small boost.
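
For anyone reading the file by hand, here is a minimal sketch of parsing that layout. It assumes each block starts with a non-numeric header line (classifier, shortlist, or beta: ...) followed by four comma-separated rows in the order listed above. The function name parse_log_eval and the metric keys are made up for illustration and are not part of the SiameseXML code; the sample rows are copied from the log posted earlier in this thread.

```python
# Hypothetical helper, not part of SiameseXML: labels the unlabeled rows of log_eval.txt.
METRICS = ("P", "nDCG", "PSP", "PSnDCG")  # assumed row order within each block

SAMPLE = """classifier
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08
beta: 0.50
40.43,32.99,27.30,23.02,19.80
40.43,40.59,41.66,42.87,43.94
35.53,37.99,40.60,43.28,45.78
35.53,37.21,38.94,40.38,41.49

------------------------------
Training time (sec): 2294.79
"""


def parse_log_eval(text):
    """Return {block header: {metric: [values for k = 1..5]}}."""
    results = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            values = [float(v) for v in line.split(",")]
        except ValueError:
            # Non-numeric line: a block header (or a footer line with no rows after it).
            header = line
            continue
        if header is None:
            continue  # numeric row before any header; ignore it
        block = results.setdefault(header, {})
        if len(block) < len(METRICS):
            block[METRICS[len(block)]] = values
    return results


results = parse_log_eval(SAMPLE)
print(results["classifier"]["P"][0])  # P@1 of the classifier block: 41.12
```

With the rows labelled this way, the first value of the first row in each block is the P@1 to compare against the paper.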

The intermediate numbers (with 29% P@1) are computed without applying the filter file. I will fix it. Thanks for pointing this out.
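
To illustrate why skipping the filter lowers the number, here is a toy sketch. It assumes the filter file lists (test point index, label index) pairs whose predictions should be dropped before evaluation; that format, the helper names, and the data below are all assumptions made for illustration, not taken from the SiameseXML code.

```python
# Toy illustration only: helper names, data, and the filter format are assumptions.

def precision_at_k(scores, relevant, k):
    """scores: one {label: score} dict per test point; relevant: one set of true labels per test point."""
    hits = 0
    for row, truth in zip(scores, relevant):
        top = sorted(row, key=row.get, reverse=True)[:k]
        hits += sum(1 for label in top if label in truth)
    return 100.0 * hits / (k * len(scores))


def apply_filter(scores, filter_pairs):
    """Drop every (test point, label) pair listed in filter_pairs from the predictions."""
    blocked = set(filter_pairs)
    return [
        {label: s for label, s in row.items() if (i, label) not in blocked}
        for i, row in enumerate(scores)
    ]


scores = [
    {0: 0.9, 1: 0.8, 2: 0.1},  # label 0 ranks first but is a filtered pair
    {0: 0.2, 1: 0.7, 2: 0.6},
]
relevant = [{1}, {1}]
filter_pairs = [(0, 0)]

print(precision_at_k(scores, relevant, 1))                              # 50.0
print(precision_at_k(apply_filter(scores, filter_pairs), relevant, 1))  # 100.0
```

In this toy case the filtered pair takes the top slot for the first test point, so P@1 is lower until that pair is removed.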

Apologies for the delay - I am travelling for a conference.

@pranjalks

pranjalks commented Mar 12, 2024

Hi @SvenStahlmann, can you please help me understand where to get the following files:

  1. data_stats.json
  2. surrogate_mapping.txt
  3. valid_labels.txt

@kunaldahiya
Collaborator

Hi @pranjalks

These files are auto-generated by the code. Can you please copy-paste any error you are facing?

@pranjalks

Hi @kunaldahiya,

I was able to run the code. Thanks!
