Collection Areas - the list of Collections IDs is incomplete #676

Open
min2ha opened this issue May 26, 2022 · 14 comments
@min2ha
Contributor

min2ha commented May 26, 2022

Unassigned collections exist.

Attention is needed on Collection (example):
https://www.webarchive.org.uk/act/collections/369

Data source used for T&T on UI:
https://www.webarchive.org.uk/act/collections/allCollectionAreasAsJson/0

Collection Area is showing as blank on the Collection View pages!?

@min2ha
Contributor Author

min2ha commented May 26, 2022

we have a list of IDs for area 'Currently Working On' as well, but we don't expose it:
{"key":2945,"title":"Currently Working On","url":"/act/taxonomy/2945","select":false,"children":null,"collections_ids":[2946,3064,3098,3188,3866,4089]}
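To see which areas carry a `collections_ids` list, the JSON can be walked with a few lines of Python. This is only a minimal sketch: it assumes the endpoint returns a list of nodes shaped like the record above, and that `children` is either null or a nested list of the same shape (both assumptions, not checked against the live endpoint). The sample is inlined rather than fetched.

```python
import json

# Sample shaped like the record above; wrapping it in a list is an assumption.
SAMPLE = '''[{"key":2945,"title":"Currently Working On","url":"/act/taxonomy/2945",
"select":false,"children":null,"collections_ids":[2946,3064,3098,3188,3866,4089]}]'''


def area_collection_ids(nodes):
    """Map each area title to its collections_ids, walking any children."""
    result = {}
    for node in nodes or []:
        result[node["title"]] = list(node.get("collections_ids") or [])
        # children is null for leaf areas; the recursion handles None.
        result.update(area_collection_ids(node.get("children")))
    return result


areas = area_collection_ids(json.loads(SAMPLE))
for title, ids in areas.items():
    print(f"{title}: {len(ids)} collections {ids}")
```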

min2ha changed the title from "Collection Areas - the list of Collections IDs is too short" to "Collection Areas - the list of Collections IDs is incomplete" on May 26, 2022
@nicolabingham

@jasonwebber-bl both the New Media Writing Prize Collection and Climate Change are now visible under their respective higher-level categories on the website. Did you check them into the higher level categories (Collection Areas) in ACT as I didn't do this?

@crarugal

crarugal commented Jun 1, 2022

Investigating Collection Area for "Science, Technology & Medicine":
https://www.webarchive.org.uk/act/collections/list?s=title

The count is supposedly 31; however, only 29 collections are listed:

image
These are the 29 collections displayed when the "Science, Technology & Medicine" filter is applied:
image

There are two missing collections in the filtered view, highlighted in yellow:
image

They are not shown because they are sub sub-collections:
https://www.webarchive.org.uk/act/collections/2946
image

https://www.webarchive.org.uk/act/collections/4168
image

It's currently unclear how they are being tagged into the "Collection Areas". When you view the ACT record for a collection, "Collection Areas" is blank, even though the collection is displayed when filtering by "Science, Technology & Medicine".
Example, the Aging collection: https://www.webarchive.org.uk/act/collections/2367

image

Filtered view:
image

taxonomy table "Aging" collection info
image
image

These are the Collection Areas:
image
image

Where does the count of 31 come from?
It comes from the taxonomy_parents_all table:
image
image

SQL

Select
    taxonomy_parents_all.taxonomy_id,
    taxonomy_parents_all.parent_id
From
    taxonomy_parents_all
Where
    taxonomy_parents_all.taxonomy_id = 2938

This is why 31 results are returned:
image
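For anyone without access to the ACT database, the shape of that count can be reproduced against a throwaway SQLite table. A sketch only: the two column names come from the query above, but the column types and the rows are made up for illustration and are not real ACT data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names from the query above; INTEGER types are an assumption.
conn.execute(
    "CREATE TABLE taxonomy_parents_all (taxonomy_id INTEGER, parent_id INTEGER)"
)
# Toy rows (hypothetical): three rows for 2938, one row for another id.
conn.executemany(
    "INSERT INTO taxonomy_parents_all VALUES (?, ?)",
    [(2938, 101), (2938, 102), (2938, 103), (2941, 104)],
)
# Same filter as the query above, counting rows instead of listing them.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM taxonomy_parents_all WHERE taxonomy_id = ?", (2938,)
).fetchone()
print(count)
```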

@crarugal

crarugal commented Jun 7, 2022

On the Topics and Themes page, when filtering by "Science, Technology & Medicine", only 15 collections are presented when there should be 31:
![image](https://user-images.githubusercontent.com/18530934/172342411-db477b53-1d67-4e7c-a552-c36237541ba8.png)

The highlighted collections are the 15 being presented. The remaining 16 are the ones that are missing, according to ACT:
image

The 31 collections tagged into "Science, Technology & Medicine"
image

@crarugal

crarugal commented Jun 7, 2022

https://www.webarchive.org.uk/act/collections/allCollectionAreasAsJson/7

If we look at the list that's being pulled
image

And compare that to the 31 collections in ACT (green highlight = JSON list, yellow highlight = 15 presented), we can see that both lists are the same:
image
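The same comparison can be done without screenshots by diffing the ID lists as sets. A rough sketch, with a hypothetical helper name and placeholder IDs standing in for the real ACT, JSON and UI lists:

```python
def diff_ids(act_ids, json_ids, ui_ids):
    """Report which IDs drop out at each hop: ACT -> JSON feed -> UI."""
    act, feed, ui = set(act_ids), set(json_ids), set(ui_ids)
    return {
        "in_act_not_in_json": sorted(act - feed),
        "in_json_not_in_ui": sorted(feed - ui),
    }


# Placeholder IDs (hypothetical), not the real collection lists.
report = diff_ids(act_ids=[1, 2, 3, 4], json_ids=[1, 2, 3, 4], ui_ids=[1, 3])
print(report)
```

An empty `in_act_not_in_json` with a non-empty `in_json_not_in_ui` would match what the screenshots show: ACT and the JSON list agree, and the collections go missing further downstream.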

@jasonwebber-bl

Collections that are live but not viewable (missing) on T&T:

910 - Brexit
2456 - Credit crunch
629 - District councils
469 - Easter rising
689 - Scottish elections 2016
3866 - Duke of edinburgh
331 - Family history
370 - Forth bridge
990 - FTSE 100
9 - Health and social
520 - IT Collection
65 - Scottish Ind
3064 - Startup
851 - Queens birthday 2016
60 - UK Gen election 2015
283 - UK response, typhoon
3098 - UK Retail
2778 - Unfinished business
471 - VE day

@nicolabingham

@jasonwebber-bl I tried to tag UK General Election 2015 into a higher level subject category but it is already checked as belonging to 'Politics'

@nicolabingham

I have unchecked and checked the collection as belonging to 'Politics'. Let's review tomorrow to see if it appears on the UI

@anjackson
Contributor

Hmm, looking at the Solr database, it seems like the UK General Election 2015 collection is there and the data looks right, i.e. it looks like the other entries....

...
      {
        "id":"2798",
        "type":"collection",
        "name":"UK General Election 2019",
        "description":"A collection of websites representing the 2019 UK General Election. ",
        "collectionAreaId":[2941]},
      {
        "id":"60",
        "type":"collection",
        "name":"UK General Election 2015",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, focussing on the 2015 UK General Election which was held on 7 May 2015 to elect 650 members to the House of Commons. It was the first general election at the end of a fixed-term Parliament. \n",
        "collectionAreaId":[2941]},
      {
        "id":"2453",
        "type":"collection",
        "name":"UK General Election 2005",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, archived during and immediately after the UK general election campaign of 2005. The collection comprises a sample of candidate’s campaign sites and weblogs, local and national party sites, opinion polls, news and commentary, and the manifestos of a range of interest groups.",
        "collectionAreaId":[2941]},
      {
        "id":"1233",
        "type":"collection",
        "name":"UK General Election 2017",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, focussing on the United Kingdom general election of 2017 which took place on Thursday 8 June. Under the Fixed-term Parliaments Act 2011 an election had not been due until 7 May 2020, but a call by Prime Minister Theresa May for a snap election was ratified by the necessary supermajority in a 522-13 vote in the House of Commons on 19 April 2017.",
        "collectionAreaId":[2941]}
...

So it seems maybe this is an issue with how Solr/UKWA-UI is generating/processing the collection hierarchy.
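One way to sanity-check the Solr side is to group the collection documents by `collectionAreaId` and confirm each collection lands under the expected area. A small sketch using only the four documents quoted above (descriptions dropped for brevity); the helper name is hypothetical and this is not how UKWA-UI itself builds the hierarchy.

```python
from collections import defaultdict

# The four documents quoted above, trimmed to the fields used here.
docs = [
    {"id": "2798", "type": "collection", "name": "UK General Election 2019", "collectionAreaId": [2941]},
    {"id": "60", "type": "collection", "name": "UK General Election 2015", "collectionAreaId": [2941]},
    {"id": "2453", "type": "collection", "name": "UK General Election 2005", "collectionAreaId": [2941]},
    {"id": "1233", "type": "collection", "name": "UK General Election 2017", "collectionAreaId": [2941]},
]


def group_by_area(docs):
    """Map each collectionAreaId to the collection ids tagged with it."""
    by_area = defaultdict(list)
    for doc in docs:
        if doc.get("type") != "collection":
            continue
        for area_id in doc.get("collectionAreaId", []):
            by_area[area_id].append(doc["id"])
    return dict(by_area)


grouped = group_by_area(docs)
print(grouped)
```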

@min2ha
Contributor Author

min2ha commented Jun 9, 2022

so we have a data flow chain:
ACT DB -> middleware -> SOLR -> UI

it's about time to check the middleware then.

(Reminder: collection data in the ACT DB is hierarchical, i.e. a tree structure with collections inside collections. Areas point to lists of top-level collections only. A top-level collection may be assigned to more than one area.)
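To make the reminder concrete: if areas only list top-level collections, anything that needs the full set under an area has to walk down the tree. A minimal sketch with made-up IDs and a hypothetical child-to-parent map; the real ACT schema and middleware may represent this differently.

```python
from collections import defaultdict


def collections_under_area(top_ids, parent_of):
    """Return the top-level collections of an area plus all their descendants."""
    # Invert the child -> parent map into parent -> [children].
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    seen, stack = set(), list(top_ids)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(children[cid])
    return sorted(seen)


# Hypothetical data: 12 is a sub sub-collection of 10; 10 sits in two areas.
parent_of = {11: 10, 12: 11, 21: 20}
area_tops = {"Area A": [10], "Area B": [10, 20]}
expanded = {
    area: collections_under_area(tops, parent_of)
    for area, tops in area_tops.items()
}
print(expanded)
```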

@jasonwebber-bl
Copy link

Just to confirm: UK Gen Election 2015 (the collection that Nicola unticked and ticked again yesterday) didn't appear today.

@anjackson
Contributor

@min2ha as per #676 (comment) as far as I can tell, the data in Solr looks right. I think the issue is in UKWA-UI.

@min2ha
Contributor Author

min2ha commented Jun 9, 2022

> @min2ha as per #676 (comment) as far as I can tell, the data in Solr looks right. I think the issue is in UKWA-UI.

Thanks Andy!
I'll check against full SOLR Collection data then, not data for testing only.

@anjackson
Contributor

No worries @min2ha - I wrote it up as ukwa/ukwa-ui#353
