arxiv: use arxiv.org urls
DonHaul committed Oct 29, 2024
1 parent ff47b1c commit a3b6de5
Showing 33 changed files with 86 additions and 86 deletions.
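Most of the changed lines in this commit look like one mechanical substitution: the `http://export.arxiv.org/` prefix becomes `https://arxiv.org/`. A minimal sketch of that rewrite follows, assuming plain string replacement is enough. The helper name `rewrite_arxiv_url` and the two constants are hypothetical and not part of the commit. The one change this would not reproduce is `ARXIV_PDF_URL_ALTERNATIVE` in `inspirehep/config.py`, which moves the other way (from `http://arxiv.org/pdf/...` to `https://export.arxiv.org/pdf/...`).

```python
# Hypothetical helper, not part of the commit: illustrates the URL substitution only.
OLD_PREFIX = "http://export.arxiv.org/"
NEW_PREFIX = "https://arxiv.org/"


def rewrite_arxiv_url(text):
    """Replace every old export-host URL prefix in ``text`` with the new one."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


if __name__ == "__main__":
    print(rewrite_arxiv_url("url='http://export.arxiv.org/oai2'"))
```

Because the match includes the scheme, a bare host name such as the recorded `Host: [export.arxiv.org]` header is left alone, which is consistent with that line appearing unchanged in one of the cassette hunks.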
12 changes: 6 additions & 6 deletions docs/e2e_tutorial.rst
@@ -71,7 +71,7 @@ Later on, we will add actual polling to see if the articles were harvested.
inspire_client.e2e.schedule_crawl(
spider='arXiv_single',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
identifier='oai:arXiv.org:1806.04664', # Non-core, will halt
)
@@ -98,9 +98,9 @@ After this all the requests (until disabling recording and/or switching the scen
current test session. Many of them (``test-indexer``, ``test-web-e2e.local``) are whitelisted and
not recorded. You might notice a few requests to ArXiv like so:

* ``GET http://export.arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai...``
* ``GET http://export.arxiv.org/pdf/1806.04664``
* ``GET http://export.arxiv.org/e-print/1806.04664``
* ``GET https://arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai...``
* ``GET https://arxiv.org/pdf/1806.04664``
* ``GET https://arxiv.org/e-print/1806.04664``

These are live interactions that are recorded, you can find them in
``tests/e2e/scenarios/arxiv_in_hp/ArxivService/``. If you need to re-record an interaction, simply
@@ -134,7 +134,7 @@ To make assertions we can use the ``inspire_client`` and more precisely its ``ho
inspire_client.e2e.schedule_crawl(
spider='arXiv_single',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
identifier='oai:arXiv.org:1806.04664',
)
@@ -184,7 +184,7 @@ We can then use the fixture in our test:
inspire_client.e2e.schedule_crawl(
spider='arXiv_single',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
identifier='oai:arXiv.org:1806.04664',
)
6 changes: 3 additions & 3 deletions inspirehep/config.py
@@ -1232,9 +1232,9 @@
"CERN": "CDS Hidden",
"FERMILAB": "Fermilab"
}
ARXIV_PDF_URL = "http://export.arxiv.org/pdf/{arxiv_id}"
ARXIV_PDF_URL_ALTERNATIVE = "http://arxiv.org/pdf/{arxiv_id}"
ARXIV_TARBALL_URL = "http://export.arxiv.org/e-print/{arxiv_id}"
ARXIV_PDF_URL = "https://arxiv.org/pdf/{arxiv_id}"
ARXIV_PDF_URL_ALTERNATIVE = "https://export.arxiv.org/pdf/{arxiv_id}"
ARXIV_TARBALL_URL = "https://arxiv.org/e-print/{arxiv_id}"

ARXIV_CATEGORIES = {
'core': [
2 changes: 1 addition & 1 deletion inspirehep/modules/arxiv/config.py
@@ -25,7 +25,7 @@
from __future__ import absolute_import, division, print_function


ARXIV_API_URL = 'http://export.arxiv.org/oai2'
ARXIV_API_URL = 'https://arxiv.org/oai2'

ARXIV_RESPONSE_CODES = {
'success': 200,
@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=physics&metadataPrefix=arXiv
url: https://arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=physics&metadataPrefix=arXiv
response:
body: !!binary |

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/oai2?from=2018-03-26&verb=ListRecords&set=physics&metadataPrefix=arXiv
url: https://arxiv.org/oai2?from=2018-03-26&verb=ListRecords&set=physics&metadataPrefix=arXiv
response:
body: !!binary |

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/e-print/1404.0579
url: https://arxiv.org/e-print/1404.0579
response:
body: !!binary |

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/pdf/1412.0200
url: https://arxiv.org/pdf/1412.0200
response:
body: !!binary |

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=q-bio&metadataPrefix=arXiv
url: https://arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=q-bio&metadataPrefix=arXiv
response:
body: !!binary |


Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/e-print/1404.0579
url: https://arxiv.org/e-print/1404.0579
response:
body: !!binary |
H4sICDDSqFoCA3JldmlzZWQxNDAzMTgudGV4AOz9fWPbRpYmjv5fnwLdg0RkTCqS4nR33KP5Xdtp
@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/pdf/1404.0579
url: https://arxiv.org/pdf/1404.0579
response:
body: !!binary |
JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PC9MZW5ndGggNiAwIFIvRmlsdGVyIC9GbGF0ZURlY29k
@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=physics&metadataPrefix=arXiv
url: https://arxiv.org/oai2?from=2018-03-25&verb=ListRecords&set=physics&metadataPrefix=arXiv
response:
body: !!binary |

@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/e-print/1806.05669
url: https://arxiv.org/e-print/1806.05669
response:
body: !!binary |
H4sICDDSqFoCA3JldmlzZWQxNDAzMTgudGV4AOz9fWPbRpYmjv5fnwLdg0RkTCqS4nR33KP5Xdtp
@@ -6,7 +6,7 @@ request:
Connection: [keep-alive]
User-Agent: [python-requests/2.18.4]
method: GET
url: http://export.arxiv.org/pdf/1806.05669
url: https://arxiv.org/pdf/1806.05669
response:
body: !!binary |
JVBERi0xLjQKJcfsj6IKNSAwIG9iago8PC9MZW5ndGggNiAwIFIvRmlsdGVyIC9GbGF0ZURlY29k
@@ -9,7 +9,7 @@ request:
Host: [export.arxiv.org]
User-Agent: [python-requests/2.19.1]
method: GET
url: http://export.arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai%3AarXiv.org%3A1806.05669
url: https://arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai%3AarXiv.org%3A1806.05669
response:
body: !!binary |
H4sIAAAAAAAAA7VWXW/bNhR9168gjD60QGTZSRtkhqJiaJGmQIImbYZ1e6PlK4sLRaok5Y/9+p1L
10 changes: 5 additions & 5 deletions tests/e2e/test_arxiv_harvest.py
@@ -93,7 +93,7 @@ def test_harvest_non_core_article_goes_in(inspire_client, mitm_client):
inspire_client.e2e.schedule_crawl(
spider='arXiv',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
sets='physics',
from_date='2018-03-25',
)
@@ -141,7 +141,7 @@ def test_harvest_core_article_goes_in(inspire_client, mitm_client):
inspire_client.e2e.schedule_crawl(
spider='arXiv',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
sets='physics',
from_date='2018-03-25',
)
@@ -192,7 +192,7 @@ def test_harvest_core_article_goes_in(inspire_client, mitm_client):
inspire_client.e2e.schedule_crawl(
spider='arXiv',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
sets='physics',
from_date='2018-03-26',
)
@@ -250,7 +250,7 @@ def test_harvest_core_article_manual_accept_goes_in(inspire_client, mitm_client)
inspire_client.e2e.schedule_crawl(
spider='arXiv',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
sets='q-bio',
from_date='2018-03-25',
)
@@ -318,7 +318,7 @@ def test_harvest_nucl_th_and_jlab_curation(inspire_client, mitm_client):
inspire_client.e2e.schedule_crawl(
spider='arXiv_single',
workflow='article',
url='http://export.arxiv.org/oai2',
url='https://arxiv.org/oai2',
identifier='oai:arXiv.org:1806.05669', # nucl-th record
)

2 changes: 1 addition & 1 deletion tests/integration/arxiv/fixtures/1305.0014.xml
@@ -1,7 +1,7 @@
<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2017-08-24T11:34:20Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.0014" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.0014" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<GetRecord>
<record>
<header>
2 changes: 1 addition & 1 deletion tests/integration/arxiv/fixtures/1305.0014v1.xml
@@ -1,7 +1,7 @@
<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2017-08-24T11:44:53Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.0014v1" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.0014v1" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<error code="idDoesNotExist">
This OAI interface supports only the notion of an arXiv article and not access to individual versions. You must not include the 'v1' at the end of the identifier.
</error>
2 changes: 1 addition & 1 deletion tests/integration/arxiv/fixtures/1305.9999.xml
@@ -1,7 +1,7 @@
<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2017-08-24T11:43:16Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.9999" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:1305.9999" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<error code="idDoesNotExist">
Identifier 'oai:arXiv.org:1305.9999' has correct form but does not correspond to an item in this repository
</error>
2 changes: 1 addition & 1 deletion tests/integration/arxiv/fixtures/is-malformed.xml
@@ -1,6 +1,6 @@
<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2017-08-24T11:40:26Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:is-malformed" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:is-malformed" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<error code="idDoesNotExist">Malformed identifier `oai:arXiv.org:is-malformed'</error>
</OAI-PMH>
8 changes: 4 additions & 4 deletions tests/integration/arxiv/test_arxiv_views.py
@@ -66,7 +66,7 @@ def log_in_as_scientist(users, app_client):
def test_arxiv_search_handles_the_response_when_the_request_is_valid(log_in_as_scientist, app_client):
with requests_mock.Mocker() as requests_mocker:
requests_mocker.register_uri(
'GET', 'http://export.arxiv.org/oai2',
'GET', 'https://arxiv.org/oai2',
text=pkg_resources.resource_string(
__name__, os.path.join('fixtures', '1305.0014.xml')),
)
@@ -97,7 +97,7 @@ def test_arxiv_search_handles_the_response_when_the_request_is_valid(log_in_as_s
def test_arxiv_search_handles_the_response_when_the_request_asks_for_a_malformed_id(log_in_as_scientist, app_client):
with requests_mock.Mocker() as requests_mocker:
requests_mocker.register_uri(
'GET', 'http://export.arxiv.org/oai2',
'GET', 'https://arxiv.org/oai2',
text=pkg_resources.resource_string(
__name__, os.path.join('fixtures', 'is-malformed.xml')),
)
@@ -116,7 +116,7 @@ def test_arxiv_search_handles_the_response_when_the_request_asks_for_a_malformed
def test_arxiv_search_handles_the_response_when_the_request_asks_for_a_non_existing_record(log_in_as_scientist, app_client):
with requests_mock.Mocker() as requests_mocker:
requests_mocker.register_uri(
'GET', 'http://export.arxiv.org/oai2',
'GET', 'https://arxiv.org/oai2',
text=pkg_resources.resource_string(
__name__, os.path.join('fixtures', '1305.9999.xml')),
)
@@ -135,7 +135,7 @@ def test_arxiv_search_handles_the_response_when_the_request_asks_for_a_non_exist
def test_arxiv_search_handles_the_response_when_the_request_asks_for_a_version(log_in_as_scientist, app_client):
with requests_mock.Mocker() as requests_mocker:
requests_mocker.register_uri(
'GET', 'http://export.arxiv.org/oai2',
'GET', 'https://arxiv.org/oai2',
text=pkg_resources.resource_string(
__name__, os.path.join('fixtures', '1305.0014v1.xml')),
)
@@ -199,7 +199,7 @@
"filename":"2208.00828.pdf",
"fulltext":true,
"material":"preprint",
"original_url":"http://export.arxiv.org/pdf/2208.00828"
"original_url":"https://arxiv.org/pdf/2208.00828"
}
],
"references":[
@@ -1,6 +1,6 @@
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2015-11-05T16:30:03Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:1407.7587" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:1407.7587" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<GetRecord>
<record>
<header>
@@ -1,6 +1,6 @@
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2015-11-05T16:30:03Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv.org:1511.01097" metadataPrefix="arXiv">http://export.arxiv.org/oai2</request>
<request verb="GetRecord" identifier="oai:arXiv.org:1511.01097" metadataPrefix="arXiv">https://arxiv.org/oai2</request>
<GetRecord>
<record>
<header>
4 changes: 2 additions & 2 deletions tests/integration/workflows/helpers/mocks.py
@@ -30,7 +30,7 @@

def fake_download_file(workflow, name, url):
"""Mock download_file_to_workflow func."""
if url == 'http://export.arxiv.org/e-print/1407.7587':
if url == 'https://arxiv.org/e-print/1407.7587':
workflow.files[name] = pkg_resources.resource_stream(
__name__,
os.path.join(
@@ -39,7 +39,7 @@ def fake_download_file(workflow, name, url):
)
)
return workflow.files[name]
elif url == 'http://export.arxiv.org/pdf/1407.7587':
elif url == 'https://arxiv.org/pdf/1407.7587':
workflow.files[name] = pkg_resources.resource_stream(
__name__,
os.path.join(
8 changes: 4 additions & 4 deletions tests/integration/workflows/test_article_workflow.py
@@ -388,7 +388,7 @@ def test_run_next_wf_is_not_starting_core_selection_wfs(
workflow = build_workflow(record, extra_data={"delay": 10})
mocked_external_services.register_uri(
"GET",
"http://export.arxiv.org/pdf/1802.08709.pdf",
"https://arxiv.org/pdf/1802.08709.pdf",
content=pkg_resources.resource_string(
__name__, os.path.join("fixtures", "1802.08709.pdf")
),
@@ -398,7 +398,7 @@ def test_run_next_wf_is_not_starting_core_selection_wfs(
)
mocked_external_services.register_uri(
"GET",
"http://export.arxiv.org/e-print/1802.08709.pdf",
"https://arxiv.org/e-print/1802.08709.pdf",
content=pkg_resources.resource_string(
__name__, os.path.join("fixtures", "1802.08709.pdf")
),
@@ -436,7 +436,7 @@ def test_run_next_wf_is_not_starting_core_selection_wfs(

mocked_external_services.register_uri(
"GET",
"http://export.arxiv.org/pdf/1802.08709.pdf",
"https://arxiv.org/pdf/1802.08709.pdf",
content=pkg_resources.resource_string(
__name__, os.path.join("fixtures", "1802.08709.pdf")
),
@@ -446,7 +446,7 @@ def test_run_next_wf_is_not_starting_core_selection_wfs(
)
mocked_external_services.register_uri(
"GET",
"http://export.arxiv.org/e-print/1802.08709.pdf",
"https://arxiv.org/e-print/1802.08709.pdf",
content=pkg_resources.resource_string(
__name__, os.path.join("fixtures", "1802.08709.pdf")
),
@@ -6,7 +6,7 @@ interactions:
Accept-Encoding: ['gzip, deflate']
Connection: [keep-alive]
method: GET
uri: http://export.arxiv.org/pdf/1407.7587
uri: https://arxiv.org/pdf/1407.7587
response:
body:
string: !!binary |
@@ -5419,7 +5419,7 @@ interactions:
Accept-Encoding: ['gzip, deflate']
Connection: [keep-alive]
method: GET
uri: http://export.arxiv.org/e-print/1407.7587
uri: https://arxiv.org/e-print/1407.7587
response:
body:
string: !!binary |
@@ -8590,7 +8590,7 @@ interactions:
Accept-Encoding: ['gzip, deflate']
Connection: [keep-alive]
method: GET
uri: http://export.arxiv.org/pdf/1407.7587
uri: https://arxiv.org/pdf/1407.7587
response:
body:
string: !!binary |
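
After this commit, `ARXIV_PDF_URL` in `inspirehep/config.py` points at `arxiv.org` and `ARXIV_PDF_URL_ALTERNATIVE` at `export.arxiv.org`. The diff does not show how the two settings are consumed, so the following is only a sketch under the assumption that the alternative is tried when the primary download fails. The helpers `candidate_pdf_urls` and `fetch_first_available`, and the caller-supplied `fetch` callable, are hypothetical names introduced for illustration.

```python
# Values copied from the new side of the inspirehep/config.py hunk.
ARXIV_PDF_URL = "https://arxiv.org/pdf/{arxiv_id}"
ARXIV_PDF_URL_ALTERNATIVE = "https://export.arxiv.org/pdf/{arxiv_id}"


def candidate_pdf_urls(arxiv_id):
    """Return the primary PDF URL followed by the alternative one."""
    return [
        ARXIV_PDF_URL.format(arxiv_id=arxiv_id),
        ARXIV_PDF_URL_ALTERNATIVE.format(arxiv_id=arxiv_id),
    ]


def fetch_first_available(arxiv_id, fetch):
    """Try each candidate URL in order and return ``(url, payload)``.

    ``fetch`` is any callable that takes a URL and either returns the
    payload or raises ``IOError`` (an assumption made for this sketch).
    """
    last_error = None
    for url in candidate_pdf_urls(arxiv_id):
        try:
            return url, fetch(url)
        except IOError as error:
            last_error = error
    raise IOError("no PDF available for {}".format(arxiv_id)) from last_error
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library, so it can be exercised with a stub in the same spirit as `fake_download_file` in the mocks above.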