page.Page
A Page object corresponds to a page on the wiki. It can be used to get the page text, get information about the page, or take actions on the page such as editing and deletion. Page objects are hashable and can be compared. Two Page objects are considered equal if their wiki objects are equal and they have the same title or, if the title isn't set, the same pageid.
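The comparison rule can be sketched with a small stand-in class. `PageKey` is a hypothetical name used only for illustration; it is not part of wikitools and does not claim to mirror how the library implements comparison internally.

```python
class PageKey(object):
    """Hypothetical stand-in that mimics the comparison rule described above."""

    def __init__(self, site, title=None, pageid=None):
        self.site = site
        self.title = title
        self.pageid = pageid

    def _identity(self):
        # Prefer the title; fall back to the pageid when no title is set.
        if self.title is not None:
            return (self.site, 'title', self.title)
        return (self.site, 'pageid', self.pageid)

    def __eq__(self, other):
        if not isinstance(other, PageKey):
            return NotImplemented
        return self._identity() == other._identity()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the same identity tuple.
        return hash(self._identity())
```

Because the objects are hashable, they can be used as dict keys or collected in a set to remove duplicates.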
Page has 8 constructor parameters, 1 of which is required.
- site - A Wiki object (required)
- title - The title of the page. Default: None
- check - Whether or not to do API queries to set basic information about the page. Default: True
- followRedir - Whether or not to follow redirects when doing the checks. Default: True
- section - The name of a section on the page. See the note below about sections. Default: None
- sectionnumber - The number of a section on the page. See the note below about sections. Default: None
- pageid - The pageid value of the page
- namespace - The namespace number of the page.

####Parameter combinations
Certain parameters used in combination with others can cause unexpected behavior.

- The namespace number should not be set if the namespace is already in the title. If a namespace number is given, the title is assumed not to contain a namespace prefix.
- Either title or pageid must be set.
- Do not set both the pageid and the title unless you are absolutely sure both are correct. If check is True and they don't correspond to the same page, the pageid value will override the title. If check is False, the behavior may be unexpected.
- Only 1 method should be used for setting the section. sectionnumber will override section, which will override an anchor in the title.
- Even if check is False, followRedir still sets an instance variable that will be checked if setPageInfo() is ever called on the object.
- Even if check is False, section name validation (which does an API query) will still be done.

####Notes on sections

- In addition to the section and sectionnumber parameters, a section can also be set by an anchor in the title (a '#' followed by the section name). The anchor will be stripped and used as the section name.
- If a section is set, only getWikiText and edit apply to the section. All other functions apply to the entire page.
- Validation is done on section names, but not on numbers.
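
The precedence between the three ways of naming a section can be sketched as a plain function. `resolve_section` is a hypothetical helper, not a wikitools function. It assumes the anchor is always stripped from the title, even when another selector wins.

```python
def resolve_section(title, section=None, sectionnumber=None):
    """Return (clean_title, kind, value) for the section selector that wins.

    kind is 'number', 'name', or None when no section was requested.
    """
    # Split off an anchor such as "Help:Contents#Editing".
    clean_title, _, anchor = title.partition('#')
    if sectionnumber is not None:
        return clean_title, 'number', sectionnumber  # highest precedence
    if section is not None:
        return clean_title, 'name', section          # overrides the anchor
    if anchor:
        return clean_title, 'name', anchor           # lowest precedence
    return clean_title, None, None
```

Passing only one selector, as recommended above, avoids relying on this ordering at all.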
setPageInfo()
Gets basic information about the page, like the pageid and whether it exists. Also resolves redirects if followRedir was set to True during construction. Returns the Page object. This is called automatically from the constructor if check
was set to True and it may be called by other functions if they need the information.
setNamespace(newns, recheck=True)
Change the namespace of the page. newns
should be an integer or a Namespace. recheck will call setPageInfo()
again with the new namespace. Setting recheck
to False is not recommended. Returns the new namespace number.
setSection(section=None, number=None)
Set a section on the page using either a section name (section) or a section number (number). See the [notes on sections](#notes-on-sections).
canHaveSubpages()
Whether or not a subpage can be made for the current page. This is based on which namespace it's in and the settings of the wiki.
isRedir()
Whether or not the current page is a redirect.
isTalk()
Whether the current page is in a talk (odd-numbered) namespace.
toggleTalk(check=True, followRedir=True)
If the current page is a talk page, it returns a new Page object for the corresponding non-talk page. If the current page is a non-talk page, it returns a new Page object for the corresponding talk page. This differs from setNamespace in that it does not change the Page object it's called on. check
and followRedir
have the same meanings as in the Constructor.
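
The namespace arithmetic behind isTalk and toggleTalk can be sketched as follows. In MediaWiki, each subject namespace with an even number n is paired with the talk namespace n + 1. The two function names are hypothetical and only illustrate the numbering; the real methods work on Page objects.

```python
def is_talk_namespace(ns):
    """Talk namespaces are the odd-numbered ones (1, 3, 5, ...)."""
    # The ns > 0 check matters because -1 % 2 == 1 in Python.
    return ns > 0 and ns % 2 == 1


def toggle_talk_namespace(ns):
    """Return the namespace number on the other side of the subject/talk pair."""
    if ns < 0:
        # Special (-1) and Media (-2) are virtual namespaces without talk pages.
        raise ValueError("namespace %d has no talk counterpart" % ns)
    return ns - 1 if is_talk_namespace(ns) else ns + 1
```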
getWikiText(expandtemplates=False, force=False)
Get the wikitext for the Page (or section, if set). If expandtemplates
is True, templates in the text will be fully expanded. After calling this function, the text will be cached in the object. Set force
to True to override the cache and get it from the server again.
getLinks(force=False)
Returns a list of internal links on the page. After calling this function, the links will be cached in the object. Set force
to True to override the cache and get them from the server again.
getProtection(force=False)
Returns the current protection status of the page. After calling this function, the status will be cached in the object. Set force
to True to override the cache and get it from the server again. The output is a dictionary like:
{'edit': {'cascading': False,
          'expiry': datetime.datetime(2015, 1, 1, 0, 53, 4),
          'level': 'sysop'},
 'move': {'cascading': False,
          'expiry': 'infinity',
          'level': 'sysop'}
}
Only protection levels that are actually set will be returned, so calling on an unprotected page will return an empty dict.
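
A dictionary in this format can be inspected with a small helper. `protection_level` is a hypothetical function, and the choice to treat an expiry in the past as "not protected" is an assumption made for this sketch, not documented library behavior. It also assumes naive datetime values, as in the sample output.

```python
import datetime


def protection_level(protection, action, now):
    """Return the level required for `action`, or None if it is unrestricted."""
    entry = protection.get(action)
    if entry is None:
        return None  # actions without protection are absent from the dict
    expiry = entry['expiry']
    # 'infinity' never expires; anything else is a datetime to compare.
    if expiry != 'infinity' and expiry <= now:
        return None  # assumption: an expiry in the past no longer applies
    return entry['level']


sample = {
    'edit': {'cascading': False,
             'expiry': datetime.datetime(2015, 1, 1, 0, 53, 4),
             'level': 'sysop'},
    'move': {'cascading': False, 'expiry': 'infinity', 'level': 'sysop'},
}
```

Using `.get()` on the outer dict keeps the helper safe for unprotected pages, where the result is an empty dict.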
getTemplates(force=False)
Returns a list of all the templates used on the page. After calling this function, the templates will be cached in the object. Set force
to True to override the cache and get them from the server again.
getCategories(force=False)
Returns a list of all the categories the page is in. After calling this function, the categories will be cached in the object. Set force
to True to override the cache and get them from the server again.
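
getWikiText, getLinks, getProtection, getTemplates, and getCategories all describe the same cache-unless-forced behavior. A minimal sketch of that pattern is shown here. `CachedFetcher` is a hypothetical class written for illustration and is not how wikitools is necessarily implemented.

```python
class CachedFetcher(object):
    """Hypothetical illustration of the cache-unless-forced behavior."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the expensive lookup
        self._cached = None
        self._loaded = False

    def get(self, force=False):
        # Query only on the first call, or when the caller asks for fresh data.
        if force or not self._loaded:
            self._cached = self._fetch()
            self._loaded = True
        return self._cached
```

The practical consequence is that a long-lived Page object can hold stale data after someone else edits the page, so passing force=True is worthwhile when freshness matters.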
getHistory(direction='older', content=True, limit='all')
Get the history of the page. direction is the order in which the revisions are retrieved. If "older" (default), it starts with the newest revisions and works back to older ones. If "newer", it starts with the oldest revisions and works forward. It returns a list of dicts containing data about each revision. If content is True (the default), it also gets the wikitext of each revision; this is always the text of the entire page, even if a section is set. limit is the maximum number of revisions to retrieve, or 'all' (default). Each dict in the result has the format:
{'*': 'Page content', # Only returned when content=True
'comment': 'Edit summary',
'contentformat': 'text/x-wiki', # Only returned when content=True
'contentmodel': 'wikitext', # Only returned when content=True
'parentid': 1083209, # ID of previous revision
'revid': 1083211, # Revision ID
'sha1': '315748b8e6fb6343efed3c17b56edc2da1d9e8b5', # SHA1 hash of wikitext
'size': 157, # Size, in bytes
'timestamp': '2014-07-30T19:26:53Z', # Timestamp of edit
'user': 'Example', # Username of editor
'userid': 587508 # User ID of editor
}
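
Since the result is a plain list of dicts, it can be processed with ordinary Python. The sketch below assumes a list in the default direction='older' order (newest first). `edits_per_user` and `size_changes` are hypothetical helper names.

```python
from collections import Counter


def edits_per_user(revisions):
    """Count revisions per username in a getHistory-style list of dicts."""
    return Counter(rev['user'] for rev in revisions)


def size_changes(revisions):
    """Return (revid, byte delta) pairs for a list ordered newest first.

    The oldest revision in the list is skipped because its predecessor
    is not part of the data.
    """
    deltas = []
    for newer, older in zip(revisions, revisions[1:]):
        deltas.append((newer['revid'], newer['size'] - older['size']))
    return deltas
```

Neither helper needs the '*' key, so the history could be fetched with content=False to keep the response small.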
getHistoryGen(direction='older', content=True, limit='all')
The interface is the same as getHistory, but instead of loading the entire history into memory, it is a generator that gets 1 revision at a time and yields it. This is a better option for pages with thousands of revisions or thousands of bytes of content in each revision.
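
One benefit of a generator is that a search can stop early without reading the rest of the history. The sketch below accepts any iterable of revision dicts, so it would work with the output of getHistoryGen as well as getHistory. `first_anonymous_edit` is a hypothetical helper, and it relies on unregistered users having a userid of 0.

```python
def first_anonymous_edit(revisions):
    """Return the first revision made by an unregistered user, or None."""
    for rev in revisions:
        if rev['userid'] == 0:
            return rev  # stop here; later revisions are never requested
    return None
```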
edit(*args, **kwargs)
Edits the page. The parameters have the same names as those used in the API and are listed in the API docs. The full list of valid options from the code is:
validargs = set(['text', 'summary', 'minor', 'notminor', 'bot', 'basetimestamp', 'starttimestamp',
'recreate', 'createonly', 'nocreate', 'watch', 'unwatch', 'watchlist', 'prependtext', 'appendtext',
'section', 'sectiontitle', 'captchaword', 'captchaid', 'contentformat', 'contentmodel'])
Depending on your version of MediaWiki and extensions, some of these may not be available. The title, CSRF token, and MD5 hash will be added automatically. To skip the MD5 hash, pass skipmd5=True. Returns the result object from the API.
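
A misspelled argument name is easy to miss, so it can help to check names against the list above before calling edit(). The sketch below also shows what an MD5 hash of the text looks like, assuming the text is hashed as UTF-8 bytes. Both function names are hypothetical, and this is not a copy of the library's own checks.

```python
import hashlib

# Copied from the validargs list above.
VALID_EDIT_ARGS = set([
    'text', 'summary', 'minor', 'notminor', 'bot', 'basetimestamp',
    'starttimestamp', 'recreate', 'createonly', 'nocreate', 'watch',
    'unwatch', 'watchlist', 'prependtext', 'appendtext', 'section',
    'sectiontitle', 'captchaword', 'captchaid', 'contentformat',
    'contentmodel'])


def unknown_edit_args(kwargs):
    """Return the sorted argument names that are not in the documented list."""
    # skipmd5 is accepted by edit() but is not an API parameter.
    return sorted(set(kwargs) - VALID_EDIT_ARGS - set(['skipmd5']))


def text_md5(text):
    """Hex MD5 digest of the text, assuming it is encoded as UTF-8."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()
```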
move(mvto, reason='', movetalk=False, noredirect=False, movesubpages=True, watch=False, unwatch=False, watchlist='preferences')
Moves the page to the title mvto with an edit summary of reason. If noredirect is True, the move is done without leaving a redirect (this requires the "suppressredirect" user right, generally granted to sysops and bots). If movesubpages is True, all subpages of the page will also be moved. The watch and unwatch parameters are deprecated in MediaWiki and are used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".
If there are no errors, it updates the title/namespace variables of the Page object and returns the result object from the API.
protect(restrictions={}, expirations={}, reason='', cascade=False, watch=False, watchlist='preferences')
Protects the page. restrictions is a dict of protection levels, for example {'edit':'autoconfirmed', 'move':'sysop'}, and expirations is a dict of expiration times in GNU date input format, for example {'edit':'3 weeks', 'move':'infinite'}. "infinite", "indefinite", and "never" are equivalent. Omitting an expiration is equivalent to setting it to "indefinite". Set a protection summary for the log in reason and set cascade to True to use cascading protection. The watch parameter is deprecated in MediaWiki and is used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".
Returns the result object from the API.
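
The relationship between the two dicts can be made explicit before calling protect(). The helper below is hypothetical. It fills in "indefinite" for any action without an expiration, which matches the rule stated above, and rejects expirations for actions that are not being restricted (an extra check chosen for this sketch).

```python
def build_protect_args(restrictions, expirations=None):
    """Return (restrictions, expirations) with an explicit expiry per action."""
    expirations = dict(expirations or {})
    unknown = set(expirations) - set(restrictions)
    if unknown:
        raise ValueError("expiration given for unprotected action(s): %s"
                         % ', '.join(sorted(unknown)))
    for action in restrictions:
        # Omitting an expiration is equivalent to "indefinite".
        expirations.setdefault(action, 'indefinite')
    return dict(restrictions), expirations
```

The returned dicts are copies, so the caller's originals are left unchanged.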
delete(reason='', watch=False, unwatch=False, watchlist='preferences')
Deletes the page. reason is the summary for the log. The watch and unwatch parameters are deprecated in MediaWiki and are used mostly for compatibility with older versions. Use the watchlist parameter instead to set watchlist settings. The options are "preferences" (default, use the setting in your user preferences), "watch", "unwatch", or "nochange".
If there are no errors, it resets instance variables to their settings for a non-existent page and returns the API result object.

- __getSection() - Gets section numbers from section names
- __getHistoryInternal - Does the API queries for getHistory and getHistoryGen
- __extractToList - Used by getLinks and similar functions to extract the relevant data from API results into a plain list

- site - The Wiki object passed to the constructor.
- title - The title of the page, passed to the constructor or retrieved by setPageInfo()
- pageid - The page_id database key, passed to the constructor or retrieved by setPageInfo()
- followRedir - The value of followRedir given in the constructor, used by setPageInfo()
- unprefixedtitle - The title without the namespace prefix
- urltitle - The URL encoded title, with percent encoding and spaces replaced with underscores
- wikitext - The text of the page, use getWikiText() to retrieve it
- templates - A list of templates on the page, use getTemplates() to retrieve it
- links - A list of internal links on the page, use getLinks() to retrieve it
- categories - A list of categories on the page, use getCategories() to retrieve it
- exists - None if existence hasn't been checked with setPageInfo(), otherwise True or False depending on whether the page exists on the wiki
- protection - A dictionary with page protection data, use getProtection() to retrieve it
- namespace - The namespace number of the page
- section - The section number if set, otherwise None

Get the wikitext of a page, modify it, and edit it
from wikitools import wiki, page
site = wiki.Wiki("https://en.wikipedia.org/w/api.php")
p = page.Page(site, "Wikipedia:Sandbox")
pagetext = p.getWikiText()
pagetext += '\nTesting'
p.edit(text=pagetext, summary='Test', minor=True)
Search through the page history
from wikitools import wiki, page
site = wiki.Wiki("https://www.mediawiki.org/w/api.php")
p = page.Page(site, "API:Main page")
for revision in p.getHistoryGen():
    if revision['userid'] == 0:
        # Do something with edits by unregistered users
        pass