
What is W Precisely? #27

Open
rtrad89 opened this issue Jul 7, 2020 · 3 comments


rtrad89 commented Jul 7, 2020

During topic learning, one needs to supply W: int, the size of the vocabulary.

I tried to work out the meaning of W by reading Algorithm 1 (the Gibbs sampling algorithm for BTM) in the paper "BTM: Topic Modeling over Short Texts", but W is not an input there. It does seem data-dependent to me, so am I correct in assuming that W means the number of unique terms in the cleaned and preprocessed corpus? If so, is there a reason W is not calculated automatically from the corpus docs_pt? I'm afraid I'm missing something, hence my question.

Thank you.

@zhongpeixiang


W denotes the vocab size.

```bash
W=`wc -l < $voca_pt` # vocabulary size
```
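For illustration, here is a minimal sketch of how W could be computed directly from the preprocessed corpus, assuming docs_pt holds one whitespace-tokenized document per line (the file name, the format assumption, and the `vocab_size` helper are taken from this thread or invented for the example; they are not part of the repo):

```python
# Count the number of unique terms (W) in a preprocessed corpus,
# assuming one whitespace-tokenized document per line, as in docs_pt.
def vocab_size(path):
    vocab = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            vocab.update(line.split())
    return len(vocab)

if __name__ == "__main__":
    W = vocab_size("docs_pt")  # hypothetical path to the preprocessed corpus
    print(W)
```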


rtrad89 commented Aug 20, 2020


Can you clarify this part then, please?

> If so, is there a reason W is not calculated automatically from the corpus docs_pt?

@zhongpeixiang

$voca_pt is the vocabulary file, computed automatically from $doc_pt. See:

```bash
python indexDocs.py $doc_pt $dwid_pt $voca_pt
```
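As a rough sketch of what such an indexing step does (the actual indexDocs.py may differ in its details; the file paths and the one-entry-per-line vocabulary format are assumptions, while the invariant that W equals the line count of voca_pt comes from the shell snippet above):

```python
# Sketch of a doc-indexing step: build a word-to-id vocabulary from doc_pt,
# rewrite each document as a line of word ids (dwid_pt), and dump the
# vocabulary (voca_pt) with one word per line, so that
# W = number of lines in voca_pt.
def index_docs(doc_pt, dwid_pt, voca_pt):
    word2id = {}
    with open(doc_pt, encoding="utf-8") as fin, \
         open(dwid_pt, "w", encoding="utf-8") as fout:
        for line in fin:
            # setdefault assigns the next free id on first sight of a word
            ids = [word2id.setdefault(w, len(word2id)) for w in line.split()]
            fout.write(" ".join(map(str, ids)) + "\n")
    with open(voca_pt, "w", encoding="utf-8") as f:
        for w, i in sorted(word2id.items(), key=lambda kv: kv[1]):
            f.write(f"{i}\t{w}\n")

index_docs("doc_pt", "dwid_pt", "voca_pt")  # hypothetical file paths
```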
