Custom Downloader
This page explains how to set up your own custom downloader.
If the built-in PDF downloaders are not suitable for your research, you can write your own downloader.
A PDF downloader consists of three main functions: `preProcess`, `queryProcess`, and `downloadImpl`. The `preProcess` function usually returns three values: `queryUrl`, `headers`, and `enable`. `queryProcess` requests the `queryUrl` to get the real download URL. `downloadImpl` does three things in order:

1. It calls `preProcess`.
2. It calls `queryProcess` to resolve the download URL.
3. It downloads the PDF.

The paper being processed (`entityDraft`) goes through all enabled downloaders until its PDF is downloaded.
To add a custom downloader, open the preference window, click the downloader tab, and click the `+` button.
The default `downloadImpl` function is:

    const { queryUrl, headers, enable } = this.preProcess(entityDraft);
    if (enable) {
      const agent = this.getProxyAgent();
      // Resolve the real download URL from the query URL.
      const downloadUrl = await this.queryProcess(queryUrl, headers, entityDraft);
      if (downloadUrl) {
        this.sharedState.set("viewState.processInformation", "Downloading...");
        // Download the PDF using the proxy agent obtained above.
        const downloadedUrl = await downloadPDFs([downloadUrl], agent);
        if (downloadedUrl.length > 0) {
          // Point the paper's main file at the downloaded PDF.
          entityDraft.mainURL = downloadedUrl[0];
          return entityDraft;
        } else {
          return null;
        }
      } else {
        return null;
      }
    } else {
      // null means this downloader could not (or should not) fetch the PDF.
      return null;
    }

Usually, it is unnecessary to modify this function.
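You can experiment with this pipeline outside Paperlib. The sketch below is a self-contained TypeScript class that runs the same sequence: `preProcess`, then `queryProcess`, then the download. Everything Paperlib normally provides is replaced by a hypothetical stand-in:

- a `Map` stands in for `sharedState`;
- a stub `downloadPDFs` only echoes a fake local path and does no network I/O;
- the proxy agent is omitted.

`SketchDownloader` and `ArxivSketch` are illustrative names, not Paperlib classes. The subclass overrides only `preProcess` and `queryProcess`, which matches the advice that `downloadImpl` rarely needs changes.

```typescript
// Hypothetical, simplified types; Paperlib's real ones have more fields.
interface PaperDraft {
  arxiv: string;
  mainURL: string;
}

interface DownloaderRequest {
  queryUrl: string;
  headers: Record<string, string>;
  enable: boolean;
}

// Stub: pretend every URL downloads successfully to a local path.
async function downloadPDFs(urls: string[]): Promise<string[]> {
  return urls.map((url) => `/tmp/${url.split("/").pop()}`);
}

abstract class SketchDownloader {
  // Stand-in for Paperlib's shared state store.
  sharedState = new Map<string, string>();

  abstract preProcess(entityDraft: PaperDraft): DownloaderRequest;

  abstract queryProcess(
    queryUrl: string,
    headers: Record<string, string>,
    entityDraft: PaperDraft
  ): Promise<string>;

  // Same control flow as the default downloadImpl, written with early returns.
  async downloadImpl(entityDraft: PaperDraft): Promise<PaperDraft | null> {
    const { queryUrl, headers, enable } = this.preProcess(entityDraft);
    if (!enable) return null;

    const downloadUrl = await this.queryProcess(queryUrl, headers, entityDraft);
    if (!downloadUrl) return null;

    this.sharedState.set("viewState.processInformation", "Downloading...");
    const downloadedUrl = await downloadPDFs([downloadUrl]);
    if (downloadedUrl.length === 0) return null;

    entityDraft.mainURL = downloadedUrl[0];
    return entityDraft;
  }
}

// A minimal arXiv-style downloader that only overrides the two hooks.
class ArxivSketch extends SketchDownloader {
  preProcess(entityDraft: PaperDraft): DownloaderRequest {
    return {
      queryUrl: `https://arxiv.org/pdf/${entityDraft.arxiv}.pdf`,
      headers: {},
      enable: entityDraft.arxiv !== "",
    };
  }

  async queryProcess(queryUrl: string): Promise<string> {
    // The query URL is already the PDF URL, so return it unchanged.
    return queryUrl;
  }
}
```

The sketch's `preProcess` omits two things the real ArXiv downloader does: the preference check (`this.getEnable`) and the status message.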
Let's use the built-in ArXiv downloader as an example. Its `preProcess` function is:

    const enable = entityDraft.arxiv !== "" && this.getEnable("arxiv");
    const queryUrl = `https://arxiv.org/pdf/${entityDraft.arxiv}.pdf`;
    const headers = {
      "user-agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
    };
    if (enable) {
      // Tell Paperlib which downloader is working on this paper.
      this.sharedState.set(
        "viewState.processInformation",
        `Downloading PDF from ArXiv ...`
      );
    }
    // downloadImpl destructures these three values.
    return { queryUrl, headers, enable };

This function first determines whether the downloader should be enabled. Here, `enable` is `true` only if two conditions hold:

- the `entityDraft` has a non-empty `arxiv` property;
- you have enabled this downloader in the preference window.
Next, it constructs the `queryUrl` from the paper's arXiv ID and sets request `headers` with a browser-like `user-agent`.
Finally, if the downloader is enabled, it tells Paperlib (by setting `viewState.processInformation` in the shared state) that it is going to download the PDF of this paper.
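Your own `preProcess` will usually follow the same three steps. The sketch below adapts them to a hypothetical mirror that serves PDFs by DOI at `https://example.org/pdf/<doi>`. The following are all assumptions made for illustration:

- the host and URL pattern;
- the `doi` field;
- the `accept` header;
- the two callbacks. `isEnabled` stands in for `this.getEnable`, and `setStatus` stands in for `this.sharedState.set`.

It is not one of Paperlib's built-in downloaders.

```typescript
// Hypothetical draft shape: only the field this downloader reads.
interface PaperDraft {
  doi: string;
}

interface DownloaderRequest {
  queryUrl: string;
  headers: Record<string, string>;
  enable: boolean;
}

// Hypothetical preProcess for an imaginary DOI-based mirror at example.org.
function preProcess(
  entityDraft: PaperDraft,
  isEnabled: (name: string) => boolean, // stands in for this.getEnable
  setStatus: (message: string) => void // stands in for this.sharedState.set
): DownloaderRequest {
  // 1. Enable only if the paper has a DOI and the user turned this downloader on.
  const enable = entityDraft.doi !== "" && isEnabled("example-mirror");

  // 2. Build the query URL and the request headers.
  const queryUrl = `https://example.org/pdf/${entityDraft.doi}`;
  const headers: Record<string, string> = { accept: "application/pdf" };

  // 3. Report progress only when this downloader will actually run.
  if (enable) {
    setStatus("Downloading PDF from example mirror ...");
  }
  return { queryUrl, headers, enable };
}
```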
The ArXiv downloader's `queryProcess` is a single line:

    return queryUrl;

In general, `queryProcess` requests the `queryUrl` to get the real download URL. For arXiv, the `queryUrl` already is the real download URL, so it is returned directly.
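Some sources point the `queryUrl` to a landing page rather than to the PDF. For those, `queryProcess` has to do real work. The sketch below assumes the landing page exposes the PDF link in a `citation_pdf_url` meta tag, a convention many publishers follow for Google Scholar indexing, and extracts that link.

To keep the sketch self-contained and testable, the HTTP request goes through an injected `fetchText` function. `fetchText` is a hypothetical helper, not a Paperlib API; in practice it could wrap whichever HTTP client your downloader uses. The regex is a simplification that expects the `name` attribute before `content`.

```typescript
// Hypothetical helper type: fetch a URL and return the response body as text.
type FetchText = (url: string, headers: Record<string, string>) => Promise<string>;

// Resolve the real PDF URL from a landing page's citation_pdf_url meta tag.
// Returns "" when no link is found, which downloadImpl's
// `if (downloadUrl)` check treats as a failure.
async function queryProcess(
  queryUrl: string,
  headers: Record<string, string>,
  fetchText: FetchText
): Promise<string> {
  const html = await fetchText(queryUrl, headers);
  // Simplified pattern: expects name="citation_pdf_url" before content="...".
  const match = html.match(
    /<meta\s+name=["']citation_pdf_url["']\s+content=["']([^"']+)["']/i
  );
  if (!match) {
    return "";
  }
  // Resolve relative links against the landing page URL.
  return new URL(match[1], queryUrl).toString();
}
```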