pakistan_ppra_*: 503 (2023-05) Parse HTML listings? #1014

yolile · 2023-05-20T01:11:23Z

https://www.ppra.org.pk/api/index.php/api/release,
https://www.ppra.org.pk/api/index.php/api/records and
https://www.ppra.org.pk/api/index.php/api

all return HTTP 503 (Service Unavailable).

yolile · 2023-10-20T18:26:41Z

Note that if you go to https://www.ppra.org.pk/ and then click:

There is a list of tenders in OCDS format:

But the "download all" button doesn't work.

We could scrape the paginated listing, starting at https://www.ppra.org.pk/opendata.asp?PageNo=1, to get the list of links to download, e.g. https://www.ppra.org.pk/ocds.asp?id=523047
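A rough sketch of the link extraction, in case it helps the discussion. It assumes the listing pages contain plain `<a href>` links to `ocds.asp?id=...` (I haven't verified the actual markup), and the names `OCDSLinkParser` and `extract_ocds_links` are made up for illustration. It uses only the standard library and leaves fetching out entirely:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

# Assumption: relative links on the listing pages resolve against the site root.
BASE_URL = "https://www.ppra.org.pk/"


class OCDSLinkParser(HTMLParser):
    """Collect the href of every <a> tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_ocds_links(html, base_url=BASE_URL):
    """Return absolute ocds.asp?id=... URLs found in html, in order, without duplicates."""
    parser = OCDSLinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        parts = urlparse(url)
        # Keep only links to the per-tender OCDS download page.
        if parts.path.lower() == "/ocds.asp" and "id" in parse_qs(parts.query):
            if url not in links:
                links.append(url)
    return links
```

In a spider, the same filtering would more likely be done with selectors on the response; this is only meant to show the shape of the work.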

jpmckinney · 2023-10-20T18:39:36Z

Looks like they have 87 pages currently: https://www.ppra.org.pk/opendata.asp?PageNo=87
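For reference, building the listing URLs is simple once the last page number is known. A minimal sketch, where `listing_page_urls` is a hypothetical helper and the page count is passed in rather than hard-coded, since 87 is only the count at the time of writing:

```python
from urllib.parse import urlencode

LISTING_URL = "https://www.ppra.org.pk/opendata.asp"


def listing_page_urls(last_page):
    """Return the listing URLs for PageNo=1 through PageNo=last_page."""
    return [
        f"{LISTING_URL}?{urlencode({'PageNo': page})}"
        for page in range(1, last_page + 1)
    ]
```

The last page number would still have to be discovered somehow (e.g. from the pagination links), which is part of why this approach is more fragile than a single listing page.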

"Retrievable in full by software, either by using an HTML page listing bulk download URLs, or by using machine-readable data as the only input."

They don't really pass this one, as we mean a single HTML page listing (with links to bulk downloads).

If we think there's value, however, we can add it.

jpmckinney · 2024-08-19T15:53:45Z

@yolile Should we remove Pakistan? Scraping links from the individual HTML listing pages (https://www.ppra.org.pk/opendata.asp?PageNo=1) seems to be the only way, but that doesn't meet our minimum criteria for inclusion in Collect.

If we remove it from Collect, we can delete the Publication in the registry, since it has never succeeded in obtaining a collection.

yolile · 2024-08-19T19:13:46Z

Sounds good.

@allakulov, could you inform Carey about this so that we can decide whether to reach out to Pakistan and try to make them fix this?

allakulov · 2024-08-21T12:39:56Z

I have informed Carey and we are following up with PPRA. I'll keep you posted.

jpmckinney · 2024-10-29T20:21:53Z

@allakulov Any news?

yolile added the "existing spider" and "unavailable" (The data source is entirely unavailable) labels on May 20, 2023

jpmckinney removed the "unavailable" (The data source is entirely unavailable) label on Apr 10, 2024

jpmckinney changed the title from "pakistan_ppra_* no longer available" to "pakistan_ppra_*: Parse HTML listings" on Apr 10, 2024

jpmckinney changed the title from "pakistan_ppra_*: Parse HTML listings" to "pakistan_ppra_*: Parse HTML listings?" on Aug 19, 2024

jpmckinney added the "unavailable" (The data source is entirely unavailable) label on Aug 19, 2024

allakulov closed this as completed on Aug 21, 2024

allakulov reopened this on Aug 21, 2024

jpmckinney changed the title from "pakistan_ppra_*: Parse HTML listings?" to "pakistan_ppra_*: 503 (2023-05) Parse HTML listings?" on Oct 26, 2024
