IndexError #9

franck-nkolongo · 2024-09-20T02:37:59Z

Hello, I have a problem with the following code:

    reviews = list(review_data[2])
    reviews = reviews[:5000]  # only consider the first 5k reviews

It fails with:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 5000 but corresponding boolean dimension is 1000.

It works with `reviews = reviews[:1000]`.

deepbot86 · 2024-09-22T04:37:43Z

Same here:
` File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/topicgpt/TopicRepresentation.py:310, in extract_topics_no_new_vocab_computation(corpus, vocab, document_embeddings, clusterer, vocab_embeddings, n_topwords, topword_extraction_methods, consider_outliers)
306 dim_red_centroids = umap_mapper.transform(np.array(list(centroid_dict.values()))) # map the centroids to low dimensional space
308 dim_red_centroid_dict = {label: centroid for label, centroid in zip(centroid_dict.keys(), dim_red_centroids)}
--> 310 word_topic_mat = extractor.compute_word_topic_mat(corpus, vocab, labels, consider_outliers = consider_outliers) # compute the word-topic matrix of the corpus
311 if "tfidf" in topword_extraction_methods:
312 tfidf_topwords, tfidf_dict = extractor.extract_topwords_tfidf(word_topic_mat = word_topic_mat, vocab = vocab, labels = labels, top_n_words = n_topwords) # extract the top-words according to tfidf

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/topicgpt/ExtractTopWords.py:308, in ExtractTopWords.compute_word_topic_mat(self, corpus, vocab, labels, consider_outliers)
305 word_topic_mat = np.zeros((len(vocab), len((np.unique(labels)))))
307 for i, label in tqdm(enumerate(np.unique(labels)), desc="Computing word-topic matrix", total=len(np.unique(labels))):
--> 308 topic_docs = corpus_arr[labels == label]
309 topic_doc_string = " ".join(topic_docs)
310 topic_doc_words = word_tokenize(topic_doc_string)

IndexError: boolean index did not match indexed array along dimension 0; dimension is 6969 but corresponding boolean dimension is 4999
`
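For readers following along, here is a minimal sketch of how this kind of `IndexError` arises when a boolean mask built from `labels` has a different length than the array it indexes, as in `corpus_arr[labels == label]` in the traceback above. The arrays and the `select_topic_docs` helper are hypothetical and are not part of topicgpt; the exact wording of the error message may also differ between NumPy versions.

```python
import numpy as np


def select_topic_docs(corpus_arr, labels, label):
    """Hypothetical helper: return the documents assigned to `label`.

    Checks the lengths first so that a mismatch produces a readable error.
    """
    if len(corpus_arr) != len(labels):
        raise ValueError(
            f"{len(corpus_arr)} documents but {len(labels)} labels; "
            "the labels were probably computed from a different corpus"
        )
    return corpus_arr[labels == label]


# Five documents, but labels for only three (e.g. computed from an older, smaller corpus).
corpus_arr = np.array(["doc a", "doc b", "doc c", "doc d", "doc e"])
stale_labels = np.array([0, 1, 0])

try:
    corpus_arr[stale_labels == 0]  # mask length 3 vs. array length 5
    raised_index_error = False
except IndexError:
    raised_index_error = True

# With one label per document the selection works as expected.
fresh_labels = np.array([0, 1, 0, 1, 0])
topic_docs = select_topic_docs(corpus_arr, fresh_labels, 0)
```

So the two numbers in the error message are the number of documents and the number of labels, which presumably come from two different runs.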

franck-nkolongo · 2024-09-23T02:54:59Z

4999

I've found the solution: delete the SaveEmeddings directory, which contains the embeddings.pkl file. In my case this file was originally created from 1000 documents, so it no longer matched the 5000 reviews I passed in afterwards. In your case, you must have first run it on a data set of 4999 documents.
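A minimal sketch of that cleanup step in Python, assuming the cache lives in a directory next to your script. The `clear_embedding_cache` helper is hypothetical, and the demo below uses a temporary directory as a stand-in for the real cache, so adjust the path to wherever embeddings.pkl is stored on your machine.

```python
import shutil
import tempfile
from pathlib import Path


def clear_embedding_cache(cache_dir):
    """Delete the cache directory if it exists.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Demo on a throwaway directory that stands in for the real cache directory.
demo_root = Path(tempfile.mkdtemp())
demo_cache = demo_root / "cache"
demo_cache.mkdir()
(demo_cache / "embeddings.pkl").write_bytes(b"placeholder")

removed_first = clear_embedding_cache(demo_cache)   # directory existed
removed_second = clear_embedding_cache(demo_cache)  # already gone

shutil.rmtree(demo_root)  # clean up the demo root
```

Run this before fitting again whenever the number of documents changes, so the embeddings are recomputed for the current corpus.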
