jbzdarkid edited this page Feb 11, 2016 · 3 revisions

A Page object corresponds to a page on the wiki. It can be used to get the page text, get information about the page, or take actions on the page such as editing and deletion. Page objects are hashable and can be compared. They're considered to be equal if the wiki objects are equal and they have the same title, or if the title isn't set, the same pageid.
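The equality rule can be modeled with a small stand-in class. This is only a sketch of the behavior described above, not the wikitools implementation: `PageKey` and its fields are hypothetical names, and the wiki is represented by a plain string instead of a Wiki object.

```python
class PageKey(object):
    """Hypothetical stand-in that mimics the documented equality rule."""

    def __init__(self, site, title=None, pageid=None):
        self.site = site
        self.title = title
        self.pageid = pageid

    def _key(self):
        # Compare by title when it is set, otherwise fall back to the pageid.
        if self.title:
            return (self.site, 'title', self.title)
        return (self.site, 'pageid', self.pageid)

    def __eq__(self, other):
        return isinstance(other, PageKey) and self._key() == other._key()

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Equal objects share a key, so they also share a hash.
        return hash(self._key())


a = PageKey("https://en.wikipedia.org/w/api.php", title="Wikipedia:Sandbox")
b = PageKey("https://en.wikipedia.org/w/api.php", title="Wikipedia:Sandbox")
unique = set([a, b])  # equal objects collapse into a single set entry
```

Because equal objects hash alike, Page objects can be deduplicated with a set or used as dictionary keys.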

Constructor

Page has 8 constructor parameters, 1 of which is required.

  • site - A Wiki object (required)
  • title - The title of the page. Default: None
  • check - Whether or not to do API queries to set basic information about the page. Default: True
  • followRedir - Whether or not to follow redirects when doing the checks. Default: True
  • section - The name of a section on the page. See the note below about sections. Default: None
  • sectionnumber - The number of a section on the page. See the note below about sections. Default: None
  • pageid - The pageid value of the page
  • namespace - The namespace number of the page.

####Parameter combinations

Certain parameters used in combination with others can cause unexpected behavior.

  • The namespace number should not be set if the namespace is already in the title. If a namespace number is given, it will assume the title does not contain a namespace prefix.
  • Either title or pageid must be set.
  • Do not set both the pageid and the title, unless you are absolutely sure both are correct. If check is True and they don't correspond to the same page, the pageid value will override the title. If check is False, the behavior may be unexpected.
  • Only 1 method should be used for setting the section. sectionnumber will override section, which will override an anchor in the title.
  • Even if check is False, followRedir still sets an instance variable that will be checked if setPageInfo() is ever called on the object.
  • Even if check is False, section name validation (which does an API query) will still be done.

####Notes on sections

  • In addition to the section and sectionnumber parameters, a section can also be set by an anchor in the title (a '#' followed by the section name). The anchor will be stripped and used as the section name.
  • If a section is set, only getWikiText and edit apply to the section. All other functions apply to the entire page.
  • Validation is done on section names, but not on numbers.
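The precedence between the three ways of setting a section can be sketched as a plain function. This is an illustration of the documented rule, not wikitools code: `resolve_section` is a hypothetical name, and the sketch skips the lookup that turns a section name into a number.

```python
def resolve_section(title, section=None, sectionnumber=None):
    """Return (clean_title, chosen_section) following the documented precedence.

    Illustrative only: sectionnumber wins over section, which wins over
    an anchor in the title. The anchor is always stripped from the title.
    """
    anchor = None
    if '#' in title:
        title, anchor = title.split('#', 1)
    if sectionnumber is not None:
        return title, sectionnumber
    if section is not None:
        return title, section
    return title, anchor


# All three are given, so the number wins and the anchor is discarded.
result = resolve_section("Help:Contents#Editing", section="Links", sectionnumber=2)
```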

Methods

setPageInfo

setPageInfo()

Gets basic information about the page, like the pageid and whether it exists. Also resolves redirects if followRedir was set to True during construction. Returns the Page object. This is called automatically from the constructor if check was set to True and it may be called by other functions if they need the information.

setNamespace

setNamespace(newns, recheck=True)

Change the namespace of the page. newns should be an integer or a Namespace. recheck will call setPageInfo() again with the new namespace. Setting recheck to False is not recommended. Returns the new namespace number.

setSection

setSection(section=None, number=None)

Set a section on the page using either a section name (section) or a section number (number). See the [notes on sections](#notes-on-sections).

canHaveSubpages

canHaveSubpages()

Whether or not a subpage can be made for the current page. This is based on which namespace it's in and the settings of the wiki.

isRedir

isRedir()

Whether or not the current page is a redirect.

isTalk

isTalk()

Whether the current page is in a talk (odd-numbered) namespace.

toggleTalk

toggleTalk(check=True, followRedir=True)

If the current page is a talk page, it returns a new Page object for the corresponding non-talk page. If the current page is a non-talk page, it returns a new Page object for the corresponding talk page. This differs from setNamespace in that it does not change the Page object it's called on. check and followRedir have the same meanings as in the Constructor.
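In MediaWiki, each even-numbered subject namespace is paired with the talk namespace numbered one higher. The arithmetic behind isTalk and toggleTalk can be sketched on plain namespace numbers. The function names are hypothetical, and rejecting negative namespaces is an assumption based on Special (-1) and Media (-2) pages having no talk pages.

```python
def is_talk_namespace(ns):
    """True for odd, non-negative namespace numbers."""
    # The ns >= 0 check matters: in Python, -1 % 2 == 1.
    return ns >= 0 and ns % 2 == 1


def toggle_talk_namespace(ns):
    """Map a subject namespace to its talk namespace and vice versa."""
    if ns < 0:
        raise ValueError('namespace %d has no talk counterpart' % ns)
    return ns - 1 if is_talk_namespace(ns) else ns + 1


talk_of_main = toggle_talk_namespace(0)     # main namespace to its talk namespace
subject_of_user_talk = toggle_talk_namespace(3)  # User talk back to User
```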

getWikiText

getWikiText(expandtemplates=False, force=False)

Get the wikitext for the Page (or section, if set). If expandtemplates is True, templates in the text will be fully expanded. After calling this function, the text will be cached in the object. Set force to True to override the cache and get it from the server again.
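The cache-and-force pattern described here, which the other get* methods share, can be modeled in a few lines. This is a hypothetical model rather than wikitools code: `CachedText` and its `fetch` callable stand in for the Page object and the API request.

```python
class CachedText(object):
    """Hypothetical model of a cache with a force flag; not wikitools code."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable standing in for the API request
        self._text = None     # cached value, None until the first fetch
        self.requests = 0     # number of simulated server requests

    def get(self, force=False):
        # Fetch only when nothing is cached yet or a refresh is forced.
        if self._text is None or force:
            self._text = self._fetch()
            self.requests += 1
        return self._text


cache = CachedText(lambda: "Example wikitext")
cache.get()            # first call fetches
cache.get()            # second call is served from the cache
cache.get(force=True)  # force=True fetches again
```

In practice this means force=True is only worth passing when the page may have changed since the last call.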

getLinks

getLinks(force=False)

Returns a list of internal links on the page. After calling this function, the links will be cached in the object. Set force to True to override the cache and get them from the server again.

getProtection

getProtection(force=False)

Returns the current protection status of the page. After calling this function, the status will be cached in the object. Set force to True to override the cache and get it from the server again. The output is a dictionary like:

{'edit': {'cascading': False,
          'expiry': datetime.datetime(2015, 1, 1, 0, 53, 4),
          'level': 'sysop'},
 'move': {'cascading': False, 
          'expiry': 'infinity', 
          'level': 'sysop'}
}

Only protection levels that are actually set will be returned, so calling on an unprotected page will return an empty dict.
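A dictionary of this shape is easy to post-process. The helper below is a sketch with a hypothetical name (`describe_protection`); it only assumes the keys shown in the sample output above, with expiry being either a datetime or the string 'infinity'.

```python
import datetime


def describe_protection(protection):
    """Turn a protection dict into sorted, human-readable lines."""
    lines = []
    for action in sorted(protection):
        entry = protection[action]
        expiry = entry['expiry']
        # Expiry is either a datetime or the string 'infinity'.
        if isinstance(expiry, datetime.datetime):
            expiry = expiry.strftime('%Y-%m-%d %H:%M:%S')
        lines.append('%s: %s until %s' % (action, entry['level'], expiry))
    return lines


sample = {
    'edit': {'cascading': False,
             'expiry': datetime.datetime(2015, 1, 1, 0, 53, 4),
             'level': 'sysop'},
    'move': {'cascading': False,
             'expiry': 'infinity',
             'level': 'sysop'},
}
summary = describe_protection(sample)
```

With a real page, the input would be the return value of getProtection(); an unprotected page yields an empty list.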

getTemplates

getTemplates(force=False)

Returns a list of all the templates used on the page. After calling this function, the templates will be cached in the object. Set force to True to override the cache and get them from the server again.

getCategories

getCategories(force=False)

Returns a list of all the categories the page is in. After calling this function, the categories will be cached in the object. Set force to True to override the cache and get them from the server again.

getHistory

getHistory(direction='older', content=True, limit='all')

Get the history of the page. direction sets the order in which revisions are retrieved. If "older" (default), it will start with the newest revisions and get the older ones. If "newer", it will start with the oldest revisions and get the newer ones. limit is the maximum number of revisions to retrieve, or 'all' (default) to retrieve every revision. If content is True (the default), it will also get the wikitext of each revision. It will get the entire page, even if a section is set. Returns a list of dicts containing data about each revision. Each dict has the following format:

{'*': 'Page content', # Only returned when content=True
 'comment': 'Edit summary', 
 'contentformat': 'text/x-wiki', # Only returned when content=True
 'contentmodel': 'wikitext', # Only returned when content=True
 'parentid': 1083209, # ID of previous revision
 'revid': 1083211, # Revision ID
 'sha1': '315748b8e6fb6343efed3c17b56edc2da1d9e8b5', # SHA1 hash of wikitext
 'size': 157, # Size, in bytes
 'timestamp': '2014-07-30T19:26:53Z', # Timestamp of edit
 'user': 'Example', # Username of editor
 'userid': 587508 # User ID of editor
}
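Since the result is a plain list of dicts, ordinary Python tools are enough to analyze it. The sketch below uses hypothetical helper names and made-up sample revisions; it relies only on the 'user' and 'userid' keys shown above, and treats a userid of 0 as an unregistered editor.

```python
from collections import Counter


def edits_per_user(history):
    """Count revisions by username in a list of revision dicts."""
    return Counter(rev['user'] for rev in history)


def unregistered_edits(history):
    """Return revisions whose userid is 0 (unregistered editors)."""
    return [rev for rev in history if rev['userid'] == 0]


# Made-up sample data in the documented format (only the keys used here).
history = [
    {'revid': 3, 'user': 'Example', 'userid': 587508},
    {'revid': 2, 'user': '192.0.2.1', 'userid': 0},
    {'revid': 1, 'user': 'Example', 'userid': 587508},
]
counts = edits_per_user(history)
anonymous = unregistered_edits(history)
```

With a real page, `history` would come from getHistory(content=False), which skips downloading the wikitext of each revision.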

getHistoryGen

getHistoryGen(direction='older', content=True, limit='all')

The interface for this is the same as getHistory, except that it is a generator: instead of loading the entire history into memory, it gets 1 revision at a time and yields it. This is a better option for pages with thousands of revisions or thousands of bytes of content in each revision.
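The benefit of a generator is that the caller can stop early without retrieving the rest of the history. The sketch below demonstrates this with `fake_history`, a hypothetical stand-in for getHistoryGen() that records which revisions were actually produced.

```python
from itertools import islice


def newest_revisions(revisions, count):
    """Take at most `count` items from any iterable of revisions."""
    return list(islice(revisions, count))


def fake_history(total, log):
    """Hypothetical stand-in for a revision generator; records each fetch."""
    for revid in range(total, 0, -1):
        log.append(revid)
        yield {'revid': revid}


fetched = []
latest = newest_revisions(fake_history(1000, fetched), 3)
# Only three revisions were produced, although 1000 were available.
```

The same helper should work unchanged on a real generator, for example newest_revisions(p.getHistoryGen(), 3), although passing limit=3 directly is simpler when the count is known in advance.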

edit

edit(*args, **kwargs)

Edits the page. The parameters have the same names as those used in the API and are listed in the API docs. The full list of valid options from the code is:

validargs = set(['text', 'summary', 'minor', 'notminor', 'bot', 'basetimestamp', 'starttimestamp',
    'recreate', 'createonly', 'nocreate', 'watch', 'unwatch', 'watchlist', 'prependtext', 'appendtext',
    'section', 'sectiontitle', 'captchaword', 'captchaid', 'contentformat', 'contentmodel'])

Depending on your version of MediaWiki and extensions, some of these may not be available. The title, CSRF token, and MD5 hash will be added automatically. To skip the MD5 hash, add a skipmd5 = True argument. Returns the result object from the API.
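A misspelled keyword is easy to miss, so it can help to check arguments against the list above before calling edit. This is a sketch with hypothetical names (`VALID_EDIT_ARGS`, `unknown_edit_args`); the set is copied from the list above, with skipmd5 allowed in addition because it is described as an accepted argument.

```python
# Copied from the validargs list above.
VALID_EDIT_ARGS = set([
    'text', 'summary', 'minor', 'notminor', 'bot', 'basetimestamp',
    'starttimestamp', 'recreate', 'createonly', 'nocreate', 'watch',
    'unwatch', 'watchlist', 'prependtext', 'appendtext', 'section',
    'sectiontitle', 'captchaword', 'captchaid', 'contentformat',
    'contentmodel',
])
# Not an API parameter, but accepted to skip the MD5 hash.
EXTRA_ARGS = set(['skipmd5'])


def unknown_edit_args(**kwargs):
    """Return the sorted names of keyword arguments not in the valid set."""
    return sorted(set(kwargs) - VALID_EDIT_ARGS - EXTRA_ARGS)


ok = unknown_edit_args(text='New text', summary='Test', minor=True)
typo = unknown_edit_args(text='New text', sumary='Test')  # misspelled on purpose
```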

move

move(mvto, reason='', movetalk=False, noredirect=False, movesubpages=True, watch=False, unwatch=False, watchlist='preferences')

Moves the page to a page with the title mvto with an edit summary of reason. If noredirect is True, it will be done without leaving a redirect (this requires the "suppressredirect" user right, generally granted to sysops and bots). If movesubpages is True, all subpages of the page will also be moved. The watch and unwatch parameters are deprecated in MediaWiki and are used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".

If there are no errors, it updates the title/namespace variables of the Page object and returns the result object from the API.

protect

protect(restrictions={}, expirations={}, reason='', cascade=False, watch=False, watchlist='preferences')

Protects the page. restrictions is a dict of protection levels, such as {'edit':'autoconfirmed', 'move':'sysop'}, and expirations is a dict of expiration times in GNU date input format, such as {'edit':'3 weeks', 'move':'infinite'}. "infinite", "indefinite", and "never" are equivalent. Omitting an expiration is equivalent to setting it to "indefinite". Set a protection summary for the log in reason and set cascade to True to use cascading protection. The watch parameter is deprecated in MediaWiki and is used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".

Returns the result object from the API.
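The two dicts passed to protect are keyed by the same action names, so it can be useful to make the implicit defaults explicit before the call. The helper below is a sketch with a hypothetical name (`fill_expirations`); raising an error for an expiration without a matching restriction is a design choice of this example, not documented wikitools behavior.

```python
def fill_expirations(restrictions, expirations=None, default='indefinite'):
    """Return a copy of expirations with an explicit entry per restriction."""
    filled = dict(expirations or {})
    extra = set(filled) - set(restrictions)
    if extra:
        raise ValueError('expiration without restriction: %s' % sorted(extra))
    for action in restrictions:
        # An omitted expiration is documented as equivalent to "indefinite".
        filled.setdefault(action, default)
    return filled


restrictions = {'edit': 'autoconfirmed', 'move': 'sysop'}
expirations = fill_expirations(restrictions, {'edit': '3 weeks'})
```

The resulting dicts could then be passed along as protect(restrictions=restrictions, expirations=expirations, reason='...').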

delete

delete(reason='', watch=False, unwatch=False, watchlist='preferences')

Deletes the page. reason is the summary for the log. The watch and unwatch parameters are deprecated in MediaWiki and are used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".

If there are no errors, it resets instance variables to their settings for a non-existent page and returns the API result object.

Private methods

  • __getSection() - Gets section numbers from section names
  • __getHistoryInternal() - Does the API queries for getHistory and getHistoryGen
  • __extractToList() - Used by getLinks and similar functions to extract the relevant data from API results into a plain list

Instance variables

Public

  • site - The Wiki object passed to the constructor.
  • title - The title of the page, passed to the constructor or retrieved by setPageInfo()
  • pageid - The page_id database key, passed to the constructor or retrieved by setPageInfo()
  • followRedir - The value of followRedir given in the constructor, used by setPageInfo()
  • unprefixedtitle - The title without the namespace prefix
  • urltitle - The URL encoded title, with percent encoding and spaces replaced with underscores
  • wikitext - The text of the page, use getWikiText() to retrieve it
  • templates - A list of templates on the page, use getTemplates() to retrieve
  • links - A list of internal links on the page, use getLinks() to retrieve
  • categories - A list of categories on the page, use getCategories() to retrieve
  • exists - None if existence hasn't been checked with setPageInfo(); otherwise True if the page exists on the wiki and False if it doesn't
  • protection - A dictionary with page protection data, use getProtection() to retrieve it
  • namespace - The namespace number of the page
  • section - The section number if set, otherwise None

Examples

Get the wikitext of a page, modify it, and edit it

from wikitools import wiki, page
site = wiki.Wiki("https://en.wikipedia.org/w/api.php")
p = page.Page(site, "Wikipedia:Sandbox")
pagetext = p.getWikiText()
pagetext += '\nTesting'
p.edit(text=pagetext, summary='Test', minor=True)

Search through the page history

from wikitools import wiki, page
site = wiki.Wiki("https://www.mediawiki.org/w/api.php")
p = page.Page(site, "API:Main page")
for revision in p.getHistoryGen():
    if revision['userid'] == 0:
        # Do something with edits by unregistered users
        pass