Incorrect Result #2
Comments
Hi there, I am having some trouble implementing ESA as well (I got a Spearman correlation of about 0.65). Instead of calculating the Spearman correlation yourself, how about using the following web site? Could you also share your Spearman correlation coefficient when you are done? |
amiradib, why do you think that line is incorrect? Yes, the instructions listed in the Roadmap document are applied in this project. |
Wikiprep-ESA Spearman correlation detail for 20051105 dump from Gabrilovich et al: |
Thanks, I will review it to find the problem. Regarding "why do you think that line is incorrect?": the error arises when the content of the "ctitle" variable has a Unicode character, like "â". By changing this line I bypass the problem, but "\u" is prepended to every record! |
Thanks a lot for sharing the Spearman correlation data! This gave me hints about where my code went wrong. |
amiradib, I set Python's default encoding to UTF-8 while I was working; that might be the difference that causes problems for you. http://diveintopython.org/xml_processing/unicode.html |
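For readers hitting the same Unicode error, here is a minimal sketch of the explicit-encoding approach discussed in this thread. It is written for Python 3 as an illustration only; the project's scripts may target Python 2, where the usual trigger is an implicit ASCII conversion of a non-ASCII title. The helper names `encode_title` and `is_ascii_safe` and the sample title are hypothetical and do not come from the project's code.

```python
def encode_title(title):
    """Explicitly encode a text title to UTF-8 bytes (hypothetical helper)."""
    return title.encode("utf-8")


def is_ascii_safe(title):
    """Report whether the title would survive an ASCII-only conversion."""
    try:
        title.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


title = "Ch\u00e2teau"  # hypothetical article title containing "â"
print(is_ascii_safe(title))  # False: "â" is outside the ASCII range
print(encode_title(title))   # UTF-8 bytes that can be written or stored safely
```

Encoding explicitly at the point where the title is stored avoids depending on the interpreter's default encoding, which can differ between machines.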
I also had to change those lines to .encode("utf-8") even though my default encoding is UTF-8; no idea why. I also changed line 82, because the test should fail for Gabrilovich too (I believe). Having done that, I ran the following with no problem:
python scanLinks.py <hgw.xml file from Wikiprep dump>
java -cp esa-lucene.jar edu.wiki.index.ESAWikipediaIndexer
java -cp esa-lucene.jar edu.wiki.demo.TestESAVectors
However, the results of TestESAVectors don't seem right. I used "Bank of America" as the original paper did, but my results were wildly different: I'm not quite sure why they look nothing like the paper's results. Am I using the wrong function? Thanks, |
Hi David, Do you mind sharing to me whether or not I looked at the Java/Python code myself, and honestly I could not find anything Thanks, |
I ran TestWordsim353 and plugged my numbers into faraday's spreadsheet above. Rank correlation was 0.742. |
Thanks, David! |
You're welcome. Let me know if I can do anything else to help. I'm curious, are other people getting results similar to mine for TestESAVectors on "Bank of America"? |
I could finally run the code. Here is my result of running TestESAVector.java on "Bank of America". I set the following parameters to be 21139 North America 0.4362808871937762 David, would you mind sharing the parameter values you used to get a Spearman correlation of 0.742? Setting LINK_ALPHA = 0.0f and WINDOW_THRES = 0.005f gave me about 0.73. |
I don't believe I changed them at all. LINK_ALPHA = 0.5f |
I followed all your instructions, but I encountered an error in scandata.py (a problem with Unicode characters).
Somehow I worked around the error (I'm not sure I really corrected it!). I think the problem is with this code:
"articleBuffer.append((id, ctitle))", which I changed to
"articleBuffer.append((id,' \u ' + ctitle))"
But "\u" is now prepended to each record, so I deleted it from the records with SQL commands.
Still, I didn't get a suitable result: my correlation on WordSim353 is 0.4, whereas ESA's was 0.75.
My Spearman rank correlation algorithm is:
public static double spearmanRankCorrelationCoefficient(Double[] a,
                                                        Double[] b) {
    check(a, b);
    SortedMap<Double,Double> ranking = new TreeMap<Double,Double>();
    for (int i = 0; i < a.length; ++i) {
        ranking.put(a[i], b[i]);
    }
Please guide me toward a suitable result.
You listed some instructions at https://github.com/faraday/wikiprep-esa/wiki/roadmap ; I want to know whether you implemented these instructions in your code.
Regards
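A note on the Spearman snippet in the post above: a TreeMap keyed on a[i] keeps only one entry per distinct key, so if two values in a are equal, one pair is silently overwritten, which may distort the coefficient. The following is a minimal sketch of a tie-aware alternative, written in Python rather than Java and not taken from the project. It assigns average ranks to tied values and then computes the Pearson correlation of the two rank lists. All function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(a, b):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(a), average_ranks(b))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Passing the human scores and the system scores in the same pair order, for example `spearman(human_scores, esa_scores)`, gives a value that can be compared against the figures reported in this thread.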