-
Notifications
You must be signed in to change notification settings - Fork 4
/
PLUGINS
351 lines (240 loc) · 12.2 KB
/
PLUGINS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
# Writing rawdog plugins
## Introduction
As provided, rawdog provides a fairly small set of features. In order to
make it do more complex jobs, rawdog can be extended using plugin
modules written in Python. This document is intended for developers who
want to extend rawdog by writing plugins.
Extensions work by registering hook functions which are called by
various bits of rawdog's core as it runs. These functions can modify
rawdog's internal state in various interesting ways. An arbitrary number
of functions can be attached to each hook; they are called in the order
they were attached. Hook functions take various arguments depending on
where they're called from, and returns a boolean value indicating
whether further functions attached to the same hook should be called.
The "plugindirs" config option gives a list of directories to search for
plugins; all Python modules found in those directories will be loaded by
rawdog. In practice, this means that you need to call your file
something ending in ".py" to have it recognised as a plugin.
## The plugins module
All plugins should import the `rawdoglib.plugins` module, which provides
the functions for registering and calling hooks, along with some
utilities for plugins. Many plugins will also want to import the
`rawdoglib.rawdog` module, which contains rawdog's core functionality,
much of which is reusable.
### rawdoglib.plugins.attach_hook(hook_name, function)
The attach_hook function adds a hook function to the hook of the given
name.
### rawdoglib.plugins.Box
The Box class is used to pass immutable types by reference to hook
functions; this allows several plugins to modify a value. It contains a
single `value` attribute for the value it is holding.
## Plugin storage
Since some plugins will need to keep state between runs, the Rawdog
object that most hook functions are provided with has a
`get_plugin_storage` method, which when called with a plugin identifier
for your plugin as an argument will give you a reference to a dictionary
which will be persisted in the rawdog state file. The dictionary is empty to
start with; you may store any pickleable objects you like in it. Plugin
identifiers should be strings based on your email address, in order to be
globally unique -- for example, `org.offog.ats.archive`.
After changing a plugin storage dictionary, you must call "rawdog.modified()"
to ensure that rawdog will write out its state file.
## Hooks
Most hook functions are called with "rawdog" and "config" as their first
two arguments; these are references to the aggregator's Rawdog and
Config objects.
If you need a hook that doesn't currently exist, please contact me.
The following hooks are supported:
### startup(rawdog, config)
Run when rawdog starts up, after the state file and config file have
been loaded, but before rawdog starts processing command-line arguments.
### shutdown(rawdog, config)
Run just before rawdog saves the state file and exits.
### config_option(config, name, value)
* name: the option name
* value: the option value
Called when rawdog encounters a config file option that it doesn't
recognise. The rawdoglib.rawdog.parse_* functions will probably be
useful when dealing with config options. You can raise ValueError to
have rawdog print an appropriate error message. You should return False
from this hook if name is an option you recognise.
Note that using config.log in this hook will probably not do what you
want, because the verbose flag may not yet have been turned on.
### config_option_arglines(config, name, value, arglines)
* name: the option name
* value: the option value
* arglines: a list of extra indented lines given after the option (which
can be used to supply extra arguments for the option)
As config_option for options that can handle extra argument lines.
If the options you are implementing should not have extra arguments,
then use the config_option hook instead.
### output_sort_articles(rawdog, config, articles)
* articles: the mutable list of (date, feed_url, sequence_number,
article_hash) tuples
Called to sort the list of articles to write. The default action here is
to just call the list's sort method; if you sort the list in a different
way, you should return False from this hook to prevent rawdog from
resorting it afterwards.
Later versions of rawdog may add more items at the end of the tuple;
bear this in mind when you're manipulating the items.
### output_write(rawdog, config, articles)
* articles: the mutable list of Article objects
Called immediately before output_sorted_filter; this hook is here for
backwards compatibility, and should not be used in new plugins.
### output_sorted_filter(rawdog, config, articles)
* articles: the mutable list of Article objects
Called after rawdog sorts the list of articles to write, but before it
removes duplicate and excessively old articles. This hook can be used to
implement alternate duplicate-filtering methods. If you return False
from this hook, then rawdog will not do its usual duplicate-removing
filter pass.
### output_write_files(rawdog, config, articles, article_dates)
* articles: the mutable list of Article objects
* article_dates: a dictionary mapping Article objects to the dates that
were used to sort them
Called when rawdog is about to write its output to files. This hook can
be used to implement alternative output methods.
If you return False from this hook, then rawdog will not write any
output itself (and the later output_ hooks will thus not be called). I
would suggest not returning False here unless you plan to call the
rawdog.write_output_file method from your hook implementation; failure
to do so will most likely break other plugins.
### output_items_begin(rawdog, config, f)
* f: a writable file object (__items__)
Called before rawdog starts expanding the items template. This set of
hooks can be used to implement alternative date (or other section)
headings.
### output_items_heading(rawdog, config, f, article, date)
* f: a writable file object (__items__)
* article: the Article object about to be written
* date: the Article's date for sorting purposes
Called before each item is written. If you return False from this hook,
then rawdog's normal time-based section headings will not be written.
### output_items_end(rawdog, config, f)
* f: a writable file object (__items__)
Called after all items are written.
### output_bits(rawdog, config, bits)
* bits: a dictionary of template parameters
Called before expanding the page template. This hook can be used to add
extra template parameters.
Note that template parameters should be valid HTML, with entities
escaped, even if they're URLs or similar. You can use rawdog's
`rawdoglib.rawdog.string_to_html` function to do this for you:
the_thing = "This can contain arbitary text & stuff"
bits["thing"] = string_to_html(the_thing, config)
It's also good idea for template parameter names to be valid Python
identifiers, so that plugins that replace the template system with
something smarter can make them into local variables.
### output_item_bits(rawdog, config, feed, article, bits)
* feed: the Feed containing this article
* article: the Article being templated
* bits: a dictionary of template parameters
Called before expanding the item template for an article. This hook can
be used to add extra template parameters.
(See the documentation for `output_bits` for some advice on adding
template parameters.)
### pre_update_feed(rawdog, config, feed)
* feed: the Feed about to be updated
Called before a feed's content is fetched. This hook can be used to
perform extra actions before fetching a feed. Note that if `usethreads`
is set to a positive number in the config file, this hook may be called
from a worker thread.
### mid_update_feed(rawdog, config, feed, content)
* feed: the Feed being updated
* content: the feedparser output from the feed (may be None)
Called after a feed's content has been fetched, but before rawdog's
internal state has been updated. This hook can be used to modify
feedparser's output.
### post_update_feed(rawdog, config, feed, seen_articles)
* feed: the Feed that has been updated
* seen_articles: a boolean indicating whether any articles were read
from the feed
Called after a feed is updated.
### article_seen(rawdog, config, article, ignore)
* article: the Article that has been received
* ignore: a Boxed boolean indicating whether to ignore the article
Called when an article is received from a feed. This hook can be used to
modify or ignore incoming articles.
### article_updated(rawdog, config, article, now)
* article: the Article that has been updated
* now: the current time
Called after an article has been updated (when rawdog receives an
article from a feed that it already has).
### article_added(rawdog, config, article, now)
* article: the Article that has been added
* now: the current time
Called after a new article has been added.
### article_expired(rawdog, config, article, now)
* article: the Article that will be expired
* now: the current time
Called before an article is expired.
### fill_template(template, bits, result)
* template: the template string to fill
* bits: a dictionary of template arguments
* result: a Boxed Unicode string for the result of template expansion
Called whenever template expansion is performed. If you set the value
inside result to something other than None, then rawdog will treat that
value as the result of template expansion (rather than performing its
normal expansion process); you can thus use this hook either for
manipulating template parameters, or for replacing the template system
entirely.
### tidy_args(config, args, baseurl, inline)
* args: a dictionary of keyword arguments for Tidy
* baseurl: the URL at which the HTML was originally found
* inline: a boolean indicating whether the output should be inline HTML
or a block element
When HTML is being sanitised by rawdog and the "tidyhtml" option is
enabled, this hook will be called just before Tidy is run (either via
PyTidyLib or via mx.Tidy). It can be used to add or modify Tidy options;
for example, to make it produce XHTML output.
### clean_html(config, html, baseurl, inline)
* html: a Boxed Unicode string containing the HTML being cleaned
* baseurl: the URL at which the HTML was originally found
* inline: a boolean indicating whether the output should be inline HTML
or a block element
Called whenever HTML is being sanitised by rawdog (after its existing
HTML sanitisation processes). You can use this to implement extra
sanitisation passes. You'll need to update the boxed value with the new,
cleaned string.
### add_urllib2_handlers(rawdog, config, feed, handlers)
* feed: the Feed to which the request will be made
* handlers: the mutable list of urllib2 *Handler objects that will be
passed to feedparser
Called before feedparser is used to fetch feed content. This hook can be
used to add additional urllib2 handlers to cope with unusual protocol
requirements; use `handlers.append` to add extra handlers.
### feed_fetched(rawdog, config, feed, feed_data, error, non_fatal)
* feed: the Feed that has just been fetched
* feed_data: the data returned from feedparser.parse
* error: the error string if an error occurred, or None if no error
occurred
* non_fatal: if error is not None, a boolean indicating whether the
error was fatal
Called after feedparser has been called to fetch the feed. This hook can
be used to manipulate the received feed data or implement custom error
handling.
## Obsolete hooks
The following hooks existed in previous versions of rawdog, but are no
longer supported:
* output_filter (since rawdog 2.12); use output_sorted_filter instead
* output_sort (since rawdog 2.12); use output_sort_articles instead
## Examples
### backwards.py
This is probably the simplest useful example plugin: it reverses the
sort order of the output.
import rawdoglib.plugins
def backwards(rawdog, config, articles):
articles.sort()
articles.reverse()
return False
rawdoglib.plugins.attach_hook("output_sort_articles", backwards)
### option.py
This plugin shows how to handle a config file option.
import rawdoglib.plugins
def option(config, name, value):
if name == "myoption":
print "Test plugin option:", value
return False
else:
return True
rawdoglib.plugins.attach_hook("config_option", option)