replicate Amazon131k-title Performance #13

SvenStahlmann · 2023-02-21T13:18:42Z

Hello, can you give a short guide on how to replicate the performance on the LF-AmazonTitles-131K dataset reported in the paper?

After running ./run_main.sh 0 SiameseXML++ LF-AmazonTitles-131K 0 108, I get the following results in the console:

Prediction time (total): 241.77 sec., Prediction time (per sample): 1.79 msec., P@k(%): (knn): 27.16,26.04,23.41,20.68,18.37 (clf): 29.69,28.08,24.90,21.82,19.24 (ens): 29.69,28.08,24.90,21.82,19.24

which means the best P@1 is 29.69%, am I correct? Did you use different config parameters in the paper, or how can we reproduce the 41.42% P@1 reported in the paper?

kunaldahiya · 2023-02-21T14:10:13Z

Hi

Thanks for checking out SiameseXML. Could you please share the list of files in your data directory and the logs (saved in the results dir)? I'll take a look and see what's happening.

SvenStahlmann · 2023-02-21T14:34:23Z

Thank you for your fast reply! Here you can see my data folder.

log_eval.txt does not have any P@1 values; however, in the console I can see the following output:

...
Loading test data.
Fetching shortlist.
...
Prediction time (total): 241.77 sec., Prediction time (per sample): 1.79 msec., P@k(%): (knn): 27.16,26.04,23.41,20.68,18.37 (clf): 29.69,28.08,24.90,21.82,19.24 (ens): 29.69,28.08,24.90,21.82,19.24 
...

Here are the training and eval logs in the results folder.
SiameseXML/results/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/log_eval.txt:

classifier
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08
shortlist
38.52,31.52,26.29,22.27,19.26
38.52,38.80,40.02,41.30,42.44
35.00,37.28,39.85,42.47,45.00
35.00,36.49,38.16,39.57,40.68
beta: 0.10
39.47,32.17,26.65,22.51,19.42
39.47,39.59,40.66,41.90,42.99
35.14,37.45,40.02,42.67,45.18
35.14,36.68,38.36,39.78,40.88
beta: 0.25
39.98,32.56,26.95,22.75,19.59
39.98,40.07,41.13,42.36,43.43
35.31,37.67,40.25,42.94,45.45
35.31,36.90,38.59,40.04,41.14
beta: 0.50
40.43,32.99,27.30,23.02,19.80
40.43,40.59,41.66,42.87,43.94
35.53,37.99,40.60,43.28,45.78
35.53,37.21,38.94,40.38,41.49
beta: 0.75
40.81,33.31,27.56,23.20,19.97
40.81,40.99,42.08,43.27,44.36
35.76,38.27,40.88,43.54,46.08
35.76,37.50,39.25,40.69,41.81
beta: 0.90
41.03,33.46,27.67,23.30,20.05
41.03,41.20,42.29,43.49,44.57
35.93,38.41,41.02,43.68,46.20
35.93,37.66,39.42,40.87,41.97
beta: 1.00
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08

------------------------------
Training time (sec): 2294.79
Model size (MB): 1161.09
Avg. Prediction time (msec): 1.79
------------------------------

SiameseXML/results/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/extreme/log_train.txt:

Loading training data.
Loading validation data.
Updating shortlist at epoch: 0
ANN train time: 869.65 sec
Epoch: 0, loss: 1.640305, time: 30.52 sec
Model saved after epoch: 0
P@k (knn): 37.84,30.98,25.86,21.98,19.03 (clf): 36.48,30.14,25.22,21.46,18.61 (ens): 37.49,30.87,25.78,21.91,18.97 , loss: 1.558926, time: 14.12 sec
Epoch: 1, loss: 1.543295, time: 29.11 sec
Epoch: 2, loss: 1.476445, time: 30.52 sec
Epoch: 3, loss: 1.422906, time: 30.40 sec
Epoch: 4, loss: 1.379664, time: 31.10 sec
Epoch: 5, loss: 1.344310, time: 29.56 sec
Epoch: 6, loss: 1.315101, time: 31.20 sec
Epoch: 7, loss: 1.290727, time: 29.13 sec
Epoch: 8, loss: 1.270632, time: 29.50 sec
Epoch: 9, loss: 1.253221, time: 30.32 sec
Epoch: 10, loss: 1.238742, time: 30.53 sec
Model saved after epoch: 10
P@k (knn): 37.84,30.98,25.86,21.98,19.03 (clf): 40.72,33.25,27.54,23.24,20.00 (ens): 39.41,32.28,26.77,22.64,19.53 , loss: 1.363372, time: 14.30 sec
Epoch: 11, loss: 1.225710, time: 31.28 sec
Epoch: 12, loss: 1.214930, time: 30.91 sec
Epoch: 13, loss: 1.205106, time: 30.88 sec
Adjusted learning rate to: 0.0001
Epoch: 14, loss: 1.190397, time: 31.23 sec
Epoch: 15, loss: 1.187621, time: 30.50 sec
Epoch: 16, loss: 1.184311, time: 30.30 sec
Epoch: 17, loss: 1.181125, time: 30.34 sec
Epoch: 18, loss: 1.178006, time: 30.19 sec
Epoch: 19, loss: 1.175337, time: 30.52 sec
Training time: 608.03 sec, Validation time: 28.42 sec, Shortlist time: 869.65 sec, Model size: 1161.09 MB
Saving model at: /home/sstahlmann/projects/SiameseXML/models/SiameseXML++/Astec/LF-AmazonTitles-131K/v_0_108/extreme/model_network.pkl

SvenStahlmann · 2023-02-28T07:21:49Z

Do you have any updates?

kunaldahiya · 2023-03-01T08:58:05Z

Hi

Yes, I checked it out and the final results are fine. Please see the numbers printed at the end (on the terminal) or in the log_eval.txt file. It reports numbers for different values of beta as separate blocks. The four rows in each block are precision@{1-5}, nDCG@{1-5}, propensity-scored precision@{1-5}, and propensity-scored nDCG@{1-5}. So P@1 is above 40%. Moreover, the paper reports results with 512D embeddings, which also provide a small boost.
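To read log_eval.txt programmatically, the block layout described above (a header line, then four rows of P, nDCG, PSP and PSnDCG for k = 1..5) can be parsed with a short script. This is a minimal sketch that assumes the exact layout shown in this thread; `parse_log_eval`, `ROW_NAMES` and the metric keys are hypothetical names, not part of the SiameseXML codebase.

```python
import re

# Hypothetical helper, assuming the log_eval.txt layout shown in this thread:
# a header line ("classifier", "shortlist" or "beta: <value>") followed by
# four comma-separated rows, one value per k = 1..5.
ROW_NAMES = ("P", "nDCG", "PSP", "PSnDCG")
NUMERIC_ROW = re.compile(r"[\d.]+(?:,[\d.]+)*")


def parse_log_eval(text):
    """Return {block_name: {metric: [value@1, ..., value@5]}}."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("---"):
            continue  # skip blank lines and the dashed separators
        if NUMERIC_ROW.fullmatch(line):
            # Assign rows to metrics in order: P, nDCG, PSP, PSnDCG.
            if current is not None and len(blocks[current]) < len(ROW_NAMES):
                metric = ROW_NAMES[len(blocks[current])]
                blocks[current][metric] = [float(v) for v in line.split(",")]
        elif ":" in line and not line.startswith("beta"):
            current = None  # summary lines such as "Model size (MB): ..." end the blocks
        else:
            current = line  # a new block header
            blocks[current] = {}
    return blocks


# Usage on an excerpt of the log posted above.
sample = """classifier
41.12,33.55,27.76,23.37,20.09
41.12,41.31,42.42,43.64,44.70
36.01,38.49,41.11,43.80,46.28
36.01,37.75,39.53,40.99,42.08
beta: 0.50
40.43,32.99,27.30,23.02,19.80
40.43,40.59,41.66,42.87,43.94
35.53,37.99,40.60,43.28,45.78
35.53,37.21,38.94,40.38,41.49
------------------------------
Training time (sec): 2294.79
"""
blocks = parse_log_eval(sample)
best = max(blocks, key=lambda name: blocks[name]["P"][0])
print(best, blocks[best]["P"][0])  # classifier 41.12
```

Run on the full log above, the "classifier" and "beta: 1.00" blocks both give P@1 = 41.12.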

The intermediate numbers (with 29% P@1) do not take the filter file into account; I will fix that. Thanks for pointing this out.
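The maintainer attributes the lower intermediate numbers to the filter file not being applied. As an illustration of what filtering changes when computing P@k, here is a minimal sketch that assumes the filter file lists one `<test_index> <label_index>` pair per line (as in the LF-* datasets of the Extreme Classification Repository) and drops those pairs from each test point's predictions before ranking. The function names and input formats are hypothetical; this is not the SiameseXML evaluation code.

```python
from collections import defaultdict


def load_filter_pairs(lines):
    """Parse '<test_index> <label_index>' lines (assumed format) into {doc: set(labels)}."""
    filt = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            doc, label = map(int, parts)
            filt[doc].add(label)
    return filt


def precision_at_k(scores, truth, k=5, filt=None):
    """Return P@1..P@k in percent.

    scores: list of {label: score} dicts, one per test point
    truth:  list of sets of relevant labels, one per test point
    filt:   optional {doc: set(labels)}; these pairs are removed before ranking
    """
    hits = [0.0] * k  # hits[j] = relevant labels found in the top j+1, summed over docs
    for doc, (pred, rel) in enumerate(zip(scores, truth)):
        blocked = filt.get(doc, set()) if filt else set()
        ranked = sorted((l for l in pred if l not in blocked),
                        key=lambda l: pred[l], reverse=True)[:k]
        for i, label in enumerate(ranked):
            if label in rel:
                for j in range(i, k):
                    hits[j] += 1
    n = len(scores)
    return [100.0 * hits[j] / (n * (j + 1)) for j in range(k)]


# Toy data: label 0 is a filtered pair for test point 0 and would otherwise rank first.
scores = [{0: 0.9, 1: 0.8, 2: 0.1}, {3: 0.7, 1: 0.6}]
truth = [{1}, {1}]
filt = load_filter_pairs(["0 0"])

print(precision_at_k(scores, truth, k=2))             # [0.0, 50.0]
print(precision_at_k(scores, truth, k=2, filt=filt))  # [50.0, 50.0]
```

Filtered-out pairs can occupy top positions in the raw ranking, so the same model can report noticeably different P@k with and without the filter.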

Apologies for the delay - I am travelling for a conference.

pranjalks · 2024-03-12T20:37:36Z

Hi @SvenStahlmann, can you please help me understand where to get the following files:

data_stats.json
surrogate_mapping.txt
valid_labels.txt

kunaldahiya · 2024-03-13T18:47:37Z

Hi @pranjalks

These files are auto-generated by the code. Can you please copy-paste any error you might be facing?

pranjalks · 2024-03-16T20:15:42Z

Hi @kunaldahiya,

I was able to run the code. Thanks!
