Replication on Quora dataset? #2
Comments
At first sight, I would say it is a memory problem: for such a large dataset there is not enough memory in the machine.
I tried with a smaller sample of the Quora dataset (24k/6k/1k) and it still crashes with the same error: Parameter 8 to routine SGEMM NTCSGEMV SGER was incorrect.
Does it run with the datasets we provide?
It runs with the provided datasets. I also installed the requirements. These are the packages installed in my env: beautifulsoup4==4.5.3
And this is the dataset I'm trying to run it on: https://drive.google.com/open?id=1-TV22E2ZY-NqGHIYiFa5r1eF6bWOs1ar
I generated my own quora.w2v with the following command:
Any clue on why I am getting the error?
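The generation command itself is not quoted above. Purely as a hedged illustration, a word2vec-format file like quora.w2v is commonly produced with gensim along the lines below; the corpus file name, window, and min_count are assumptions, and the 200 dimensions match the log further down.

# Illustrative sketch only -- not necessarily the command used in this issue.
# Assumes gensim 3.x and a hypothetical one-sentence-per-line corpus file.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("quora_sentences.txt")  # hypothetical corpus file
model = Word2Vec(sentences, size=200, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("quora.w2v", binary=True)  # binary format is an assumption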
Using a different dataset and embeddings falls outside the scope of this repository and our work. Nevertheless, if it runs with the provided dataset and embeddings, I would say the problem must be in the new dataset or the new embeddings. There are some issues with the train.tsv you provided:
I managed to train with 1000 samples from the dataset you provided by using the meta.w2v embeddings and changing the code to accept a smaller vocabulary. Check whether all the vocabulary from the dataset is represented in the embeddings.
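A minimal sketch of that coverage check, assuming gensim is available and that train.tsv follows the BiMPM layout (label in column 1, the two sentences in columns 2 and 3; the column layout and the binary flag are assumptions, the paths are taken from this thread):

import csv
from gensim.models import KeyedVectors

# Embedding path as used in this thread; binary=True is an assumption.
vectors = KeyedVectors.load_word2vec_format("models/quora2.w2v", binary=True)

# Collect every unique token from both sentence columns of train.tsv.
tokens = set()
with open("data/quora/train.tsv") as f:
    for row in csv.reader(f, delimiter="\t"):
        for sentence in row[1:3]:  # assumed: sentences in columns 2 and 3
            tokens.update(sentence.lower().split())

missing = [t for t in tokens if t not in vectors]
print("%d of %d unique tokens missing from the embeddings" % (len(missing), len(tokens)))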
Hi! I am trying to run an experiment on the Quora dataset. I am using the dataset split provided by https://github.com/zhiguowang/BiMPM and created a quora.w2v file in the same format as askubuntu.w2v and meta.w2v. I got the following error:
Using Theano backend.
INFO:Reading training sentence pairs from data/quora/train.tsv:
/ 298204 Elapsed Time: 0:10:34 /home/andrada.pumnea/anaconda3/lib/python3.6/site-packages/bs4/__init__.py:219: UserWarning: "b'.'" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
  'Beautiful Soup.' % markup)
| 384347 Elapsed Time: 0:13:40
INFO:...read 384348 pairs in 820.31 seconds.
INFO:...class distribution: 0 = 245042 (63.8%) | 1 = 139306 (36.2%)
INFO:Reading validation sentence pairs from data/quora/dev.tsv:
| 9999 Elapsed Time: 0:00:21
INFO:...read 10000 pairs in 21.21 seconds.
INFO:...class distribution: 0 = 5000 (50.0%) | 1 = 5000 (50.0%)
INFO:Reading testing sentence pairs from data/quora/test.tsv:
| 9999 Elapsed Time: 0:00:21
INFO:...read 10000 pairs in 21.26 seconds.
INFO:...class distribution: 0 = 5000 (50.0%) | 1 = 5000 (50.0%)
INFO:Vectorizing data:
INFO:...fitted tokenizer in 14.60 seconds;
INFO:...found 103831 unique tokens;
INFO:Load embeddings from models/quora2.w2v:
INFO:...read 36111 word embeddings in 2.82 seconds;
INFO:...created embedding matrix with shape (103832, 200);
INFO:...cached matrix in file models/quora2.w2v.min.cache.npy.
INFO:Creating CNN model:
INFO:...model created.
INFO:Compiling model:
INFO:...model 0105d13fe81945018824e64905d8f7ad compiled with optimizer: <keras.optimizers.SGD object at 0x7fd9dd23cef0>, lr (sgd-only): 0.005, loss: mse.
Model summary:
Layer (type)                               Output Shape       Param #    Connected to
======================================================================================
input_1 (InputLayer)                       (None, None)       0
input_2 (InputLayer)                       (None, None)       0
embedding_1 (Embedding)                    (None, None, 200)  20766400   input_1[0][0]
                                                                         input_2[0][0]
convolution1d_1 (Convolution1D)            (None, None, 300)  180300     embedding_1[0][0]
                                                                         embedding_1[1][0]
globalmaxpooling1d_1 (GlobalMaxPooling1D)  (None, 300)        0          convolution1d_1[0][0]
                                                                         convolution1d_1[1][0]
activation_1 (Activation)                  (None, 300)        0          globalmaxpooling1d_1[0][0]
                                                                         globalmaxpooling1d_1[1][0]
merge_1 (Merge)                            (None, 1)          0          activation_1[0][0]
                                                                         activation_1[1][0]
======================================================================================
Total params: 20946700
INFO:Train on 384348 samples, validate on 10000 samples
INFO:Epoch 1/1
2% (11127 of 384348) |### | Elapsed Time: 0:23:50 ETA: 13:16:51
Parameter 8 to routine SGEMM NTCSGEMV SGER was incorrect
Floating point exception (core dumped)
I am using Ubuntu 16.04.3.
Any idea why it happened and how it can be fixed?
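Two hedged sanity checks suggested by the log itself: the tokenizer found 103831 unique tokens but only 36111 embeddings were read, and the cached embedding matrix can be inspected directly (the file name comes from the log; this is a diagnostic sketch, not a confirmed fix for the SGEMM error):

import numpy as np

# Cache file name as printed in the log above.
matrix = np.load("models/quora2.w2v.min.cache.npy")

# The log reports shape (103832, 200): vocabulary size + 1 rows, 200 dims.
print("shape:", matrix.shape)

# NaN/Inf values, or all-zero rows for the ~67k tokens without embeddings,
# can destabilize training; they would not by themselves explain the
# "Parameter 8 to routine SGEMM" message, which points at a dimension argument.
print("NaNs:", np.isnan(matrix).sum(), "Infs:", np.isinf(matrix).sum())
print("all-zero rows:", int((~matrix.any(axis=1)).sum()))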