Wikifier documentation


To call the JSI Wikifier, send an HTTP GET request to a URL of the following form:

http://www.wikifier.org/annotate-article?text=...&lang=...&...
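Because the document text goes into the query string, it has to be URL-encoded. The following is a minimal sketch of building such a URL with the Python standard library. The helper name build_get_url is hypothetical, and only the text and lang parameters from the URL above are assumed; any other parameters are passed through unchanged.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the URL form above.
WIKIFIER_URL = "http://www.wikifier.org/annotate-article"

def build_get_url(text, lang="en", **extra):
    """Hypothetical helper: return a Wikifier GET URL for the given text."""
    params = {"text": text, "lang": lang}
    params.update(extra)  # any further query parameters, passed through as-is
    # urlencode percent-encodes spaces, apostrophes, non-ASCII characters, etc.
    return WIKIFIER_URL + "?" + urlencode(params)

url = build_get_url("New York City.", lang="en")
print(url)
# Decoding the query string recovers the original parameter values.
print(parse_qs(urlsplit(url).query))
```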

The server is still in development, so it is occasionally down.

The following parameters are supported:

Output format

The Wikifier returns a JSON response of the following form:

{
  "annotations": [ ... ],
  "spaces":["", " ", " ", "."],
  "words":["New", "York", "City"],
  "ranges": [ ... ]
}

The spaces and words arrays show how the input document was split into words. spaces always has exactly one more element than words, and concatenating spaces[0] + words[0] + spaces[1] + words[1] + ... + spaces[N-1] + words[N-1] + spaces[N] (where N is the length of words) gives exactly the input document (the text that was passed as the &text=... parameter).
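The concatenation rule can be checked directly. The following is a small sketch (the function name reconstruct is hypothetical) that interleaves the two arrays and rebuilds the input text, using the spaces and words values from the example response above.

```python
def reconstruct(spaces, words):
    """Interleave spaces and words to rebuild the original input document."""
    # The response guarantees len(spaces) == len(words) + 1.
    if len(spaces) != len(words) + 1:
        raise ValueError("spaces must have exactly one more element than words")
    parts = []
    # zip stops after the last word, pairing spaces[i] with words[i].
    for space, word in zip(spaces, words):
        parts.append(space)
        parts.append(word)
    parts.append(spaces[-1])  # trailing text after the last word
    return "".join(parts)

spaces = ["", " ", " ", "."]
words = ["New", "York", "City"]
print(reconstruct(spaces, words))  # New York City.
```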

annotations is an array of objects of the following form:

{
  "title":"New York City",
  "url":"http:\/\/en.wikipedia.org\/wiki\/New_York_City",
  "lang":"en",
  "pageRank":0.102831, "cosine":0.662925,
  "enTitle":"New York City",
  "enUrl":"http:\/\/en.wikipedia.org\/wiki\/New_York_City",
  "wikiDataClasses": [
    {"itemId":"Q515", "enLabel":"city"},
    {"itemId":"Q1549591", "enLabel":"big city"},
    ...
  ],
  "wikiDataClassIds": ["Q515", "Q1549591", ...],
  "dbPediaTypes":["City", "Settlement", "PopulatedPlace", ...],
  "dbPediaIri":"http:\/\/dbpedia.org\/resource\/New_York_City",
  "supportLen":2.000000,
  "support": [
    {"wFrom":0.000000, "wTo":1.000000, "pMentionGivenSurface":0.122591, "pageRank":0.018634},
    {"wFrom":0.000000, "wTo":2.000000, "pMentionGivenSurface":0.483354, "pageRank":0.073469}
  ]
}
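The Wikidata class information in each annotation makes it easy to keep only entities of a given type. The following is a sketch under the assumption that either wikiDataClasses or wikiDataClassIds may be missing from a response, depending on which one was requested; the helper names and the sample data are hypothetical and hand-written, not real Wikifier output.

```python
def class_ids(annotation):
    """Collect Wikidata class IDs from whichever class field is present."""
    ids = set(annotation.get("wikiDataClassIds", []))
    ids.update(c["itemId"] for c in annotation.get("wikiDataClasses", []))
    return ids

def annotations_with_class(annotations, item_id):
    """Return titles of annotations that belong to the Wikidata class item_id."""
    return [a["title"] for a in annotations if item_id in class_ids(a)]

# Hand-written response fragment for illustration only.
sample = [
    {"title": "New York City",
     "wikiDataClasses": [{"itemId": "Q515", "enLabel": "city"}]},
    {"title": "Syria", "wikiDataClassIds": ["Q6256"]},
    {"title": "Rebellion"},  # no class information at all
]
print(annotations_with_class(sample, "Q515"))  # ['New York City']
```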

ranges is an array of objects of the following form:

{
    "wFrom": 0, "wTo": 1, "pageRank":0.018634, "pMentionGivenSurface":0.122591,
    "candidates": [
        {"title":"New York", "url":"http:\/\/en.wikipedia.org\/wiki\/New_York", "cosine":0.578839, "linkCount":63626, "pageRank":0.049533},
        {"title":"New York City", "url":"http:\/\/en.wikipedia.org\/wiki\/New_York_City", "cosine":0.662925, "linkCount":11589, "pageRank":0.102831},
        {"title":"New York (magazine)", "url":"http:\/\/en.wikipedia.org\/wiki\/New_York_(magazine)", "cosine":0.431092, "linkCount":2159, "pageRank":0.030795},
        ...
    ]
}

The first four members (wFrom, wTo, pageRank and pMentionGivenSurface) have the same meaning as in support. wFrom and wTo are indices into the words array, and both are inclusive: in this example, wFrom = 0 and wTo = 1, so this object refers to the phrase "New York". The candidates array lists every Wikipedia page that is the target of links (from other Wikipedia pages) whose anchor text is identical to this phrase. For each such page, it gives the page title, the Wikipedia URL, the cosine similarity with the input document, the number of links with this anchor text that point to this particular page (linkCount), and the PageRank score of this candidate annotation. For phrases that generate too many candidates, some candidates may be left out of the PageRank computation; their pageRank is then shown as -1.
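Two common operations on a range are recovering its phrase from the words and spaces arrays and ranking its candidates. The following is a minimal sketch assuming inclusive wFrom/wTo word indices (as in the "New York" example) and treating pageRank == -1 as "not scored"; the helper names are hypothetical. The candidates reuse values from the example above, plus one made-up entry with pageRank -1.

```python
def phrase_text(spaces, words, w_from, w_to):
    """Return the phrase covered by words[w_from..w_to], both ends inclusive."""
    # support entries report wFrom/wTo as floats (e.g. 0.000000), so cast to int.
    w_from, w_to = int(w_from), int(w_to)
    text = words[w_from]
    for i in range(w_from + 1, w_to + 1):
        # spaces[i] is the separator that precedes words[i].
        text += spaces[i] + words[i]
    return text

def ranked_candidates(range_obj):
    """Sort candidates by pageRank, dropping those excluded from PageRank (-1)."""
    scored = [c for c in range_obj["candidates"] if c["pageRank"] != -1]
    return sorted(scored, key=lambda c: c["pageRank"], reverse=True)

spaces = ["", " ", " ", "."]
words = ["New", "York", "City"]
new_york_range = {
    "wFrom": 0, "wTo": 1,
    "candidates": [
        {"title": "New York", "pageRank": 0.049533},
        {"title": "New York City", "pageRank": 0.102831},
        {"title": "New York (magazine)", "pageRank": 0.030795},
        {"title": "Hypothetical unscored page", "pageRank": -1},
    ],
}
print(phrase_text(spaces, words, new_york_range["wFrom"], new_york_range["wTo"]))
for c in ranked_candidates(new_york_range):
    print(c["title"], c["pageRank"])
```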

Sample code in Python 3

Note: the following sample uses POST; if your input document is short, you can also use GET instead.

import urllib.parse, urllib.request, json

def CallWikifier(text, lang="en", threshold=0.8):
    # Prepare the URL-encoded request body (sent via POST).
    data = urllib.parse.urlencode([
        ("text", text), ("lang", lang),
        ("pageRankSqThreshold", "%g" % threshold),
        ("wikiDataClasses", "true"), ("wikiDataClassIds", "false"),
        ("support", "true"), ("ranges", "false"),
        ("includeCosines", "false"), ("maxMentionEntropy", "3")
        ])
    url = "http://www.wikifier.org/annotate-article"
    # Call the Wikifier and read the response.
    req = urllib.request.Request(url, data=data.encode("utf8"), method="POST")
    with urllib.request.urlopen(req, timeout = 60) as f:
        response = f.read()
        response = json.loads(response.decode("utf8"))
    # Output the annotations.
    for annotation in response["annotations"]:
        print("%s (%s)" % (annotation["title"], annotation["url"]))

CallWikifier("Syria's foreign minister has said Damascus is ready " +
    "to offer a prisoner exchange with rebels.")
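The note above says GET can be used instead of POST for short documents. A sketch of making that choice automatically is shown below. The 2000-character URL cutoff is an assumption chosen for illustration (the documentation gives no specific limit), and make_request is a hypothetical helper; the returned Request object can be passed to urllib.request.urlopen(req, timeout=60) as in the sample.

```python
import urllib.parse
import urllib.request

WIKIFIER_URL = "http://www.wikifier.org/annotate-article"
# Assumed cutoff; the documentation only says GET is fine for short inputs.
MAX_GET_URL_LEN = 2000

def make_request(params):
    """Build a GET request for short inputs and a POST request otherwise."""
    query = urllib.parse.urlencode(params)
    if len(WIKIFIER_URL) + 1 + len(query) <= MAX_GET_URL_LEN:
        # Short input: put the parameters in the query string.
        return urllib.request.Request(WIKIFIER_URL + "?" + query, method="GET")
    # Long input: send the same URL-encoded parameters in the request body.
    return urllib.request.Request(WIKIFIER_URL, data=query.encode("utf8"),
                                  method="POST")

req = make_request([("text", "Damascus is ready to offer a prisoner exchange."),
                    ("lang", "en")])
print(req.get_method(), req.full_url)
```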