Currently the class de.tudarmstadt.ukp.wikipedia.api.Page has a method
getOutlinkAnchors() that "only returns the anchors that are not equal to the
title of the page they are pointing to".
There should be another method that returns all outlink anchors, including the ones
that are equal to the title of the page they are pointing to.
Why?
There are word-sense-disambiguation applications that need to know how often an anchor
is used for a certain page. They use this relative frequency as a probability feature for
disambiguation algorithms and also as a baseline disambiguation, by choosing the most
frequent sense for a word.
example work:
http://www.cs.waikato.ac.nz/~ihw/papers/08-DNM-IHW-LearningToLinkWithWikipedia.pdf
http://www.di.unipi.it/~ferragin/cikm2010.pdf
http://cogcomp.cs.illinois.edu/papers/RatinovDoRo.pdf
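The anchor statistics described above can be sketched as follows. This is only a minimal illustration, not code from JWPL or from the cited papers: the class AnchorStats and its methods are hypothetical names, and it assumes the (anchor, target) pairs have already been collected from the outlinks of many pages. Note that links whose anchor equals the target title have to be counted too, otherwise the frequencies are skewed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of JWPL: counts how often each anchor text
// links to each target page.
public class AnchorStats {
    // anchor text -> (target title -> number of links using that anchor)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one link occurrence: the anchor text pointing to the target page. */
    public void addLink(String anchor, String target) {
        counts.computeIfAbsent(anchor, k -> new HashMap<>())
              .merge(target, 1, Integer::sum);
    }

    /** Relative frequency of target among all links with this anchor; 0.0 if unseen. */
    public double probability(String anchor, String target) {
        Map<String, Integer> targets = counts.get(anchor);
        if (targets == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : targets.values()) {
            total += c;
        }
        return targets.getOrDefault(target, 0) / (double) total;
    }

    /** Most-frequent-sense baseline: the target linked most often with this anchor, or null. */
    public String mostFrequentTarget(String anchor) {
        Map<String, Integer> targets = counts.get(anchor);
        if (targets == null) {
            return null;
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : targets.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Each anchor returned for a page would be fed into addLink together with its target title. A Set of anchors per target is enough for one page, but the counts have to be accumulated across pages.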
proposed method:
public Map<String, Set<String>> getAllOutlinkAnchors()
    throws WikiTitleParsingException
{
    // Maps each target page title to the set of anchor texts used to link to it.
    Map<String, Set<String>> outAnchors = new HashMap<String, Set<String>>();
    ParsedPage pp = getParsedPage();
    if (pp == null) {
        return outAnchors;
    }
    for (Link l : pp.getLinks()) {
        // Skip links without a target.
        if (l.getTarget().length() == 0) {
            continue;
        }
        String targetTitle = new Title(l.getTarget()).getPlainTitle();
        if (!l.getType().equals(Link.type.EXTERNAL) && !l.getType().equals(Link.type.IMAGE)
                && !l.getType().equals(Link.type.AUDIO) && !l.getType().equals(Link.type.VIDEO)
                && !targetTitle.contains(":")) // Wikipedia titles only contain colons if they
                                               // are categories or other meta data
        {
            String anchorText = l.getText();
            Set<String> anchors = outAnchors.get(targetTitle);
            if (anchors == null) {
                anchors = new HashSet<String>();
                outAnchors.put(targetTitle, anchors);
            }
            // Unlike getOutlinkAnchors(), anchors equal to the target title are kept.
            anchors.add(anchorText);
        }
    }
    return outAnchors;
}
Reported by SamyAteia on 2012-04-05 18:31:16
As we will not be developing the JWPL Parser any more, it has been moved into its own
module. JWPL will now be using the Sweble parser (www.sweble.org). I am currently migrating
the API methods that need Wiki markup parsing (like the anchor extractors) to the new
parser. I will change the semantics of the anchor extraction methods so that they will
return all anchors.
The old anchor extraction methods have been moved to de.tudarmstadt.ukp.wikipedia.parser.LinkAnchorExtractor
in the parser module.
Reported by oliver.ferschke on 2012-05-29 10:21:48
Originally reported on Google Code with ID 85