Infinite Load Times When Editing an Item or Browsing DSpace #3584

Closed
aseyedia opened this issue Oct 28, 2024 · 22 comments · Fixed by #3677
Labels
affects: main Issue impacts "main" (latest release). affects: 8.x Issue impacts 8.x releases bug error handling How errors are handled from REST API high priority performance / caching Related to performance, caching or embedded objects

Comments

@aseyedia
Contributor

aseyedia commented Oct 28, 2024

Describe the bug

I had been struggling with this bug for a long time before I realized I could replicate it on the demo site. Here it is.

Screen.Recording.2024-10-28.at.10.20.43.AM.mov

When browsing DSpace Angular as a logged-in administrator and particularly when navigating the "Edit Item" tab(s) (or even just clicking "Edit Item" once), we experience persistent issues with infinite load times and sluggishness in the DSpace frontend. Often this will require the user (admin) to refresh more than once to get the frontend to work, and this issue adds a considerable amount of time to the use of DSpace.

Sometimes, what I believe to be the same underlying issue will produce the "Error Loading Communities" error when navigating the homepage.

There's an idiosyncrasy with this bug: it will not reproduce (or will rarely reproduce) while Chrome DevTools is open. It doesn't matter whether you pop the DevTools window out or keep the Network tab closed; it just won't happen as often. This makes debugging incredibly difficult, because you can't even see what is being printed in the console, and it also made it difficult to detect the issue with Angular DevTools. Regardless, I have taken the time to fix some of the issues showing up in the console (e.g. messages about ePerson misconfiguration) and have disabled some of the customizations we've made to our site, still to no avail. I couldn't find anything in the backend log either, even after tinkering with the log4j configuration.

Something else that makes this issue difficult to diagnose is that it tends to occur more frequently, and more severely, on a production instance than on a development instance of DSpace. For example, on a development instance I may run into the issue once and a single refresh clears it, whereas on the production instance I run into it repeatedly and have to keep refreshing. Angular DevTools, which I have tried to use to understand this issue better, only works on development instances, not on production instances.

Environment

Like I said, this issue is reproducible on the demo site, but I'm including some potentially important information about our environment down below:

  • DSpace Version: 8.0
  • Web Browser: Google Chrome 129.0.0.0
  • User Operating System: macOS Sonoma 14.7

To Reproduce

Steps to reproduce the behavior:

  1. Do not open Chrome DevTools.
  2. Log in as admin.
  3. Click on any item.
  4. Click on 'Edit this Item.'
  5. If you haven't encountered the error yet, click around on the tabs until you eventually run into the infinite loading issue.

Expected behavior

I don't mind an occasional hiccup, but this bug is so persistent on our production instance that it makes actually using DSpace incredibly cumbersome. Our admin has to repeatedly refresh the page to get it to work, and according to her, a task that should take 30 seconds takes 10 minutes.

So the expected behavior is the occasional hiccup and not the complete inability to consistently use the app.

Related work

I opened a couple of issues before I knew this was reproducible on the demo site, so not all of them contain strictly relevant information, but I have listed them below, along with any PRs or other issues that I think might be important:

@aseyedia added the "bug" and "needs triage" labels Oct 28, 2024
@github-project-automation bot moved this to 🆕 Triage in DSpace Backlog Oct 28, 2024
@tdonohue
Member

tdonohue commented Oct 28, 2024

Thanks @aseyedia for the additional details!

I've finally reproduced it on https://demo.dspace.org using Chrome on Windows, and can confirm it only occurs when Chrome DevTools is NOT open. I reproduced it the same way that you did, by editing an Item & clicking on various tabs in the "Edit Item" page quickly in succession.

That said, I've noticed a small pattern. For me, the "hanging" behavior always seems to happen on the tab you click immediately after the Bitstreams tab. So if you click "Bitstreams" first, the next tab you click hangs. If you avoid clicking "Bitstreams", the hanging does not occur for a while (but I've still seen it pop up eventually).

I've also noticed that, if I open Chrome Dev Tools, and click on the "Bitstreams" tab, I see an EmptyError error in the Chrome DevTools Console that says:

Error {stack: 'Error\n    at https://demo.dspace.org/main.5c6b955e…mo.dspace.org/main.5c6b955e5daf2530.js:1:1882402)', name: 'EmptyError', message: 'no elements in sequence'}

This error seems to occur when the "Bitstreams" tab has no bitstreams listed. In other words, the Item has no Bitstreams.

I don't believe this is the actual bug in all scenarios, but it seems to be one scenario where the page "hangs" indefinitely (because of the error reported in DevTools). This implies to me that the DSpace UI is not handling or recovering from errors properly: I think the UI gets "stuck" on the loading image when an error occurs. In other words, the UI is not overwhelmed or waiting on a response; it has simply gotten stuck on the loading image.
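A minimal, framework-free sketch of the failure mode described above, assuming (as a simplification) that a tab clears its loading flag only when its data arrives. `TabState`, `loadTabUnsafe`, and `loadTabSafe` are hypothetical names, not DSpace code; the sketch only shows why an error that nobody handles can leave a spinner up indefinitely, and how resetting the flag on every exit path avoids that. In RxJS terms the safe variant corresponds roughly to handling the error callback or using an operator such as `finalize`.

```typescript
// Hypothetical model of a tab's loading state; not DSpace's actual components.
interface TabState {
  loading: boolean;
  error?: string;
}

// Resets `loading` only on the success path, so a thrown error leaves it set.
function loadTabUnsafe(state: TabState, fetchData: () => string[]): void {
  state.loading = true;
  try {
    fetchData();
    state.loading = false; // skipped when fetchData throws
  } catch (e) {
    state.error = e instanceof Error ? e.message : String(e);
    // loading is not reset here, so the spinner would stay visible
  }
}

// Resets `loading` in `finally`, so success and failure both end the loading state.
function loadTabSafe(state: TabState, fetchData: () => string[]): void {
  state.loading = true;
  try {
    fetchData();
  } catch (e) {
    state.error = e instanceof Error ? e.message : String(e);
  } finally {
    state.loading = false;
  }
}

// Simulates the EmptyError ("no elements in sequence") reported above.
const failingFetch = (): string[] => {
  throw new Error("no elements in sequence");
};

const unsafeTab: TabState = { loading: false };
loadTabUnsafe(unsafeTab, failingFetch);

const safeTab: TabState = { loading: false };
loadTabSafe(safeTab, failingFetch);

console.log(unsafeTab.loading, safeTab.loading); // true false
```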

That said, it's still very odd to me that, when DevTools is open, the "hanging" never occurs.

This needs a volunteer to investigate further. Flagging it as "high priority", as this definitely seems like a bug that can be very annoying. The only "workaround" appears to be reloading the page in the browser.

@tdonohue added the "high priority" and "error handling" labels and removed the "needs triage" label Oct 28, 2024
@tdonohue added the "performance / caching", "affects: main", and "affects: 8.x" labels Oct 28, 2024
@kshepherd
Member

kshepherd commented Oct 29, 2024

A note that I've encountered this in 7.6 and 8.x as well, including when switching from the full to the simple item view. Most of my DSpace activity is in local, dev-mode instances. I have not managed to find any patterns yet, but I'll keep the error @tdonohue noted in mind when I next look.

@aseyedia
Contributor Author

Here's a video of this issue happening right off the bat on the demo site.

Screen.Recording.2024-10-29.at.10.45.53.AM.mov

It's not in the video, but all I did was log into the demo site as the admin and click on the first available item on the home page.

@nibou230

nibou230 commented Nov 4, 2024

Hi, we have the same issue in our 8.0 installation. It's quite frequent and really annoying as it's unpredictable.

@jlipka

jlipka commented Nov 5, 2024

Hi everybody.

Could someone please test the following change? With this change (which of course is not the cause of the problem, but triggers it), I always get all click sequences to work correctly without getting stuck.

File: src/app/shared/dso-page/dso-edit-menu-resolver.service.ts
this.correctionTypeDataService.findByItem(dso.uuid, false).pipe(
to
this.correctionTypeDataService.findByItem(dso.uuid, true).pipe(

With this change, the resolver can give a positive response to the router and the routing will continue. Otherwise it gets stuck here ("NavigationEnd" event is never reached), and the loading animation will stay visible.

The underlying problem could be this service (file: base-data.service.ts) and its caching. In particular, the 'findListByHref' and 'findByHref' methods seem to cause problems, and in the end lead to an endless chain of NGRX actions.

I appreciate any feedback.

@kshepherd
Member

> Could someone please test the following change? With this change (which of course is not the cause of the problem, but triggers it), I always get all click sequences to work correctly without getting stuck.

I've been working on the simple/full item page stalls, cleaning up some subscriptions etc. I'm now testing your change above, and so far it's working really well, both when clicking around the item status pages and back to the item page, and when switching between the simple and full item pages (which also call the service method).

As you say, it may be a mitigation more than a 'fix' but I think it helps steer us in the direction of where things are going wrong...

@jlipka

jlipka commented Nov 5, 2024

Another bug, directly related to the base-data.service.ts is the following:

.findById(action.payload.submissionId, true, false, followLink('item')).pipe(

If you look at the line mentioned above and change the ‘useCachedVersionIfAvailable’ argument from true to false, there is an endless request loop when you open a submission and edit any section. (If necessary, open the browser's DevTools and follow the requests in the Network tab, or look at the list of actions in Redux DevTools. Thousands of actions are triggered.)

Of course, this is not about actually changing the argument or the behaviour in the effects class; I just want to point out that the data service does not seem to work correctly in the end. Unfortunately, I have not yet been able to find a solution to this.

@kshepherd
Copy link
Member

kshepherd commented Nov 5, 2024

@jlipka can you do some testing with the following skipWhile statements in findByHref and findListByHref commented out, and with your other useCachedVersionIfAvailable changes reverted?

    const response$: Observable<RemoteData<T>> = this.rdbService.buildSingle<T>(requestHref$, ...linksToFollow).pipe(
      // This skip ensures that if a stale object is present in the cache when you do a
      // call it isn't immediately returned, but we wait until the remote data for the new request
      // is created. If useCachedVersionIfAvailable is false it also ensures you don't get a
      // cached completed object
      skipWhile((rd: RemoteData<T>) => rd.isStale || (!useCachedVersionIfAvailable && rd.hasCompleted)),
....

In my testing just now, commenting out the skipWhile seems to also keep things stable.

I'm thinking that some RemoteData response is not (being marked as) completed, and that this is where the indefinite loading comes from. Setting useCachedVersionIfAvailable to true reduces the condition to rd.isStale || false, so I think the isStale flag is OK.

(that is, if my testing is accurate and it provides the same 'fix' as forcing useCachedVersion to true...)
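To make the predicate's effect easier to reason about, here is a plain-TypeScript model of that `skipWhile` (it does not use RxJS). `RemoteDataSnapshot` is a hypothetical, simplified stand-in for `RemoteData` with only the fields the predicate reads. The `cachedOnly` scenario assumes, purely for illustration, that the stream only ever re-emits an already completed cache entry; with `useCachedVersionIfAvailable` set to false every emission is skipped, which would look exactly like an indefinite load.

```typescript
// Simplified, hypothetical stand-in for RemoteData: only the flags the predicate reads.
interface RemoteDataSnapshot {
  isStale: boolean;
  hasCompleted: boolean;
  label: string;
}

// Same condition as the skipWhile quoted above.
function shouldSkip(rd: RemoteDataSnapshot, useCachedVersionIfAvailable: boolean): boolean {
  return rd.isStale || (!useCachedVersionIfAvailable && rd.hasCompleted);
}

// skipWhile semantics: drop values until the predicate first returns false,
// then pass that value and everything after it.
function applySkipWhile(
  emissions: RemoteDataSnapshot[],
  useCachedVersionIfAvailable: boolean,
): RemoteDataSnapshot[] {
  const firstKept = emissions.findIndex((rd) => !shouldSkip(rd, useCachedVersionIfAvailable));
  return firstKept === -1 ? [] : emissions.slice(firstKept);
}

// Hypothetical scenario: only the cached, completed entry is ever emitted.
const cachedOnly: RemoteDataSnapshot[] = [
  { isStale: false, hasCompleted: true, label: "cached response" },
  { isStale: false, hasCompleted: true, label: "cached response (re-emitted)" },
];

// Normal path: a stale entry, then the pending entry for the new request, then its response.
const withFreshRequest: RemoteDataSnapshot[] = [
  { isStale: true, hasCompleted: true, label: "stale cached response" },
  { isStale: false, hasCompleted: false, label: "new request pending" },
  { isStale: false, hasCompleted: true, label: "new response" },
];

console.log(applySkipWhile(cachedOnly, false).length); // 0: nothing reaches the subscriber
console.log(applySkipWhile(cachedOnly, true).length); // 2: the cached response passes
console.log(applySkipWhile(withFreshRequest, false).map((rd) => rd.label)); // pending entry and new response
```

In this model, the pending entry for the new request (`hasCompleted: false`) is what ends the skipping when `useCachedVersionIfAvailable` is false; if that entry never reaches the operator, nothing does.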

@kshepherd
Member

> I've also noticed that, if I open Chrome Dev Tools, and click on the "Bitstreams" tab, I see an EmptyError error in the Chrome DevTools Console that says:
>
> Error {stack: 'Error\n    at https://demo.dspace.org/main.5c6b955e…mo.dspace.org/main.5c6b955e5daf2530.js:1:1882402)', name: 'EmptyError', message: 'no elements in sequence'}
>
> This error seems to occur when the "Bitstreams" tab has no bitstreams listed. In other words, the Item has no Bitstreams.

Small update: the error occurs when the item has no bundles, rather than no bitstreams, but it still seems like a condition that should be handled more gracefully.

@jlipka

jlipka commented Nov 5, 2024

> Can you do some testing with the following `skipWhile` statements commented out in findByHref and findListByHref?

@kshepherd
As far as I can see, this also leads to unwanted behaviour (assuming my system is still stable after all this testing):
Newly added relationships (e.g. a new author relationship) in the submission are not displayed in the UI after commenting out the "skipWhile" line. After a browser refresh, the new relationship becomes visible.

And I'm not sure whether removing this line ("skipWhile") alone is the solution. I think it was added for a good reason:

// This skip ensures that if a stale object is in the cache when you make a
// call, we don't return it immediately, but wait until the remote data for the new request
// has been created.

@kshepherd
Member

@jlipka thanks, my testing must have been off or coincidental. I agree that this skipWhile statement has an important function; I just saw it as a potential place for silent, infinite 'skips' to occur.

@tdonohue moved this from 📋 To Do to 🏗 In Progress in DSpace 8.x and 7.6.x Maintenance Nov 7, 2024
@saschaszott
Contributor

saschaszott commented Nov 7, 2024

I was able to repeatedly reproduce the bug on this item (https://demo.dspace.org/items/023b36c4-bf3d-403d-a9d2-0ccf1e22a488) with the Chrome DevTools view open.

In every case, the last REST API request (GET) from the UI is https://demo.dspace.org/server/api/config/correctiontypes/search/findByItem?uuid=023b36c4-bf3d-403d-a9d2-0ccf1e22a488

The backend response (response code 200) contains this JSON:

{
  "_embedded" : {
    "correctiontypes" : [ {
      "id" : "request-withdrawn",
      "topic" : "REQUEST/WITHDRAWN",
      "type" : "correctiontype",
      "_links" : {
        "self" : {
          "href" : "https://demo.dspace.org/server/api/config/correctiontypes/request-withdrawn"
        }
      }
    } ]
  },
  "_links" : {
    "self" : {
      "href" : "https://demo.dspace.org/server/api/config/correctiontypes/search/findByItem?uuid=023b36c4-bf3d-403d-a9d2-0ccf1e22a488"
    }
  },
  "page" : {
    "size" : 20,
    "totalElements" : 1,
    "totalPages" : 1,
    "number" : 0
  }
}
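The body above is a normal, single-entry HAL page, which suggests the hang is on the client side rather than in the backend response itself. As a quick sanity check outside the UI, here is a hedged sketch that types that body and pulls out the embedded ids; `CorrectionTypePage`, `summarizeCorrectionTypes`, and the other names are hypothetical and not part of DSpace's codebase, and treating `_embedded` as optional is an assumption.

```typescript
// Hypothetical types mirroring the JSON above; these are not DSpace's own model classes.
interface HalLink {
  href: string;
}

interface CorrectionTypeEntry {
  id: string;
  topic: string;
  type: string;
  _links: { self: HalLink };
}

interface CorrectionTypePage {
  // Treated as optional in case an empty result omits _embedded (an assumption).
  _embedded?: { correctiontypes: CorrectionTypeEntry[] };
  _links: { self: HalLink };
  page: { size: number; totalElements: number; totalPages: number; number: number };
}

// Extracts the embedded ids and checks them against page.totalElements
// (valid for a single-page result like the one above).
function summarizeCorrectionTypes(body: string): { ids: string[]; consistent: boolean } {
  const parsed = JSON.parse(body) as CorrectionTypePage;
  const ids = (parsed._embedded?.correctiontypes ?? []).map((entry) => entry.id);
  return { ids, consistent: ids.length === parsed.page.totalElements };
}

// A copy of the response above.
const responseBody = JSON.stringify({
  _embedded: {
    correctiontypes: [
      {
        id: "request-withdrawn",
        topic: "REQUEST/WITHDRAWN",
        type: "correctiontype",
        _links: { self: { href: "https://demo.dspace.org/server/api/config/correctiontypes/request-withdrawn" } },
      },
    ],
  },
  _links: {
    self: {
      href: "https://demo.dspace.org/server/api/config/correctiontypes/search/findByItem?uuid=023b36c4-bf3d-403d-a9d2-0ccf1e22a488",
    },
  },
  page: { size: 20, totalElements: 1, totalPages: 1, number: 0 },
});

// Expected: ids ["request-withdrawn"], consistent true
console.log(summarizeCorrectionTypes(responseBody));
```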


@saschaszott
Contributor

RxJS's first operator can throw an EmptyError (which can be seen in the JS console):

Delivers an EmptyError to the Observer's error callback if the Observable completes before any next notification was sent. This is how first() is different from take(1) which completes instead.

https://rxjs.dev/api/operators/first
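A plain-TypeScript model of that difference for a source that completes without emitting, such as the empty bundle list mentioned earlier in this thread. This is not RxJS itself: `EmptyError`, `firstOf`, and `takeOne` are local stand-ins written for illustration.

```typescript
// Plain-TypeScript model of how first() and take(1) treat a source that has
// already completed; an illustration, not RxJS's implementation.
class EmptyError extends Error {
  constructor() {
    super("no elements in sequence");
    this.name = "EmptyError";
    // Keep instanceof working when compiling to ES5.
    Object.setPrototypeOf(this, EmptyError.prototype);
  }
}

// Like first(): the first value, or an EmptyError if the source completed empty.
function firstOf<T>(values: readonly T[]): T {
  if (values.length === 0) {
    throw new EmptyError();
  }
  return values[0];
}

// Like take(1): at most one value; an empty source simply completes.
function takeOne<T>(values: readonly T[]): T[] {
  return values.slice(0, 1);
}

// An item with no bundles produces an empty list.
const bundles: string[] = [];

console.log(takeOne(bundles).length); // 0: completes quietly
try {
  firstOf(bundles);
} catch (e) {
  console.log(e instanceof EmptyError); // true
}
```

In RxJS itself, `first()` also accepts a default value as its second argument; with one supplied, an empty source emits that default instead of erroring.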

@aseyedia
Contributor Author

aseyedia commented Nov 7, 2024

> I was able to repeatedly reproduce the bug on this item (https://demo.dspace.org/items/023b36c4-bf3d-403d-a9d2-0ccf1e22a488) when Chrome dev tools view is open.

Odd, I wonder why I had issues getting it to reproduce with the dev tools window... It was way less frequent.

@saschaszott
Contributor

The problem still exists in the main branch. I was able to reproduce the odd behaviour on my local dev instance (running the latest code of the main branch). However, I could not observe the EmptyError that occurs on the DSpace demo instance.

It is worth noting that we can see the same request pattern as above. The last HTTP request is /server/api/config/correctiontypes/search/findByItem?uuid=:id.

@saschaszott
Contributor

saschaszott commented Nov 8, 2024

I cannot reproduce the bug when I remove the last argument of combineLatest in DSOEditMenuResolverService.getItemMenu and remove both entries, withdrawn-item and reinstate-item, from the result array (correction is used in both entries):

this.correctionTypeDataService.findByItem(dso.uuid, false).pipe(

Of course, this modification is not a bug fix, but it points to the root cause of the bug.
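Removing the last `combineLatest` argument is telling, because `combineLatest` only emits once every one of its inputs has emitted at least once. The following plain-TypeScript model of that rule (not RxJS itself; the input values are hypothetical) shows how a single input that never emits, here standing in for the `findByItem` lookup, is enough to keep the combined result, and therefore the menu, from ever appearing.

```typescript
// Plain-TypeScript model of combineLatest's emission rule (not RxJS itself):
// nothing is emitted until every input has produced at least one value.
interface SourceEvent {
  source: number; // index of the input that emitted
  value: string;
}

function combineLatestModel(inputCount: number, events: SourceEvent[]): string[][] {
  const latest: string[] = new Array<string>(inputCount).fill("");
  const seen: boolean[] = new Array<boolean>(inputCount).fill(false);
  const output: string[][] = [];
  for (const event of events) {
    latest[event.source] = event.value;
    seen[event.source] = true;
    if (seen.every((hasEmitted) => hasEmitted)) {
      output.push(latest.slice()); // emit a snapshot of the latest values
    }
  }
  return output;
}

// Hypothetical inputs for the item menu; input 2 stands in for the
// correction-type lookup, which in the failing case never emits.
const withoutCorrections: SourceEvent[] = [
  { source: 0, value: "can-edit" },
  { source: 1, value: "is-admin" },
];

console.log(combineLatestModel(3, withoutCorrections).length); // 0: the menu never renders
console.log(combineLatestModel(2, withoutCorrections).length); // 1: dropping that input unblocks it
```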

@saschaszott
Contributor

@atarix83, as this is relatively new code (added in DS 8.0, PR #2871): do you see any potential problems with the correctionTypeDataService in this context?

@artlowel
Member

artlowel commented Nov 8, 2024

One problem is that it bypasses the cache completely to render the menu, so we make a bunch of needless requests on the off chance that the state has changed. The better way is to use the cached version, but invalidate the cache whenever the state can change.
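A minimal sketch of that approach, assuming a simple in-memory map in place of the real NgRx-backed cache: `HrefCache`, `get`, and `invalidate` are hypothetical names, and the href uses a placeholder uuid. It counts how many requests reach the "server" when the menu is rendered repeatedly with the cache bypassed, versus used and invalidated once after a state change.

```typescript
// Hypothetical in-memory cache keyed by href; `fetcher` stands in for the REST call.
// An illustration of "use the cache, invalidate on change", not DSpace code.
class HrefCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly fetcher: (href: string) => T;
  requestCount = 0;

  constructor(fetcher: (href: string) => T) {
    this.fetcher = fetcher;
  }

  // Mirrors the useCachedVersionIfAvailable flag: false always goes to the "server".
  get(href: string, useCachedVersionIfAvailable: boolean): T {
    const cached = this.entries.get(href);
    if (useCachedVersionIfAvailable && cached !== undefined) {
      return cached;
    }
    this.requestCount++;
    const fresh = this.fetcher(href);
    this.entries.set(href, fresh);
    return fresh;
  }

  // Call whenever the underlying state may have changed (e.g. after a withdrawal request).
  invalidate(href: string): void {
    this.entries.delete(href);
  }
}

const href = "/server/api/config/correctiontypes/search/findByItem?uuid=example-uuid";
const lookup = (): string[] => ["request-withdrawn"];

// Rendering the menu on five tab switches while bypassing the cache...
const bypassing = new HrefCache<string[]>(lookup);
for (let i = 0; i < 5; i++) bypassing.get(href, false);

// ...versus using the cache and invalidating once after a state change.
const caching = new HrefCache<string[]>(lookup);
for (let i = 0; i < 5; i++) caching.get(href, true);
caching.invalidate(href);
caching.get(href, true);

console.log(bypassing.requestCount, caching.requestCount); // 5 2
```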

That said, even if you bypass the cache and make very quick repeated requests, the data services should still behave predictably.

My current theory is that by switching tabs quickly, you fire the correctionType request so quickly that the second one goes out before the server has had time to respond to the first. There are already measures in place to deal with this, but it's possible they don't work in 100% of cases. We're looking into it.

@jlipka

jlipka commented Nov 8, 2024

Another attempt at a possible solution, without changing any logic:

distinctUntilKeyChanged('lastUpdated'),

Try replacing the above line with something like this:

distinctUntilChanged(isEqual),

I imported isEqual from Lodash.
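The difference between the two operators can be modelled in plain TypeScript (not RxJS itself). The sequence below is hypothetical: it assumes two successive emissions that share a `lastUpdated` value but differ in state, which is one way a key-based comparison could swallow an update a subscriber is waiting for. `EntrySnapshot` and the helper names are illustrative only, and `sameContent` is a field-by-field stand-in for Lodash `isEqual`.

```typescript
// Hypothetical, simplified cache-entry snapshot; not DSpace's RemoteData type.
interface EntrySnapshot {
  lastUpdated: number;
  state: string;
}

// Plain-TypeScript model of distinctUntilChanged: a value is dropped when
// `isSame` says it equals the last value that was let through.
function distinctUntilChangedModel<T>(values: T[], isSame: (prev: T, next: T) => boolean): T[] {
  const emitted: T[] = [];
  for (const value of values) {
    if (emitted.length === 0 || !isSame(emitted[emitted.length - 1], value)) {
      emitted.push(value);
    }
  }
  return emitted;
}

// Equivalent of distinctUntilKeyChanged('lastUpdated').
const sameLastUpdated = (a: EntrySnapshot, b: EntrySnapshot): boolean => a.lastUpdated === b.lastUpdated;

// Field-by-field stand-in for Lodash isEqual, sufficient for this flat type.
const sameContent = (a: EntrySnapshot, b: EntrySnapshot): boolean =>
  a.lastUpdated === b.lastUpdated && a.state === b.state;

// Hypothetical sequence: the state changes without lastUpdated changing.
const updates: EntrySnapshot[] = [
  { lastUpdated: 1000, state: "pending" },
  { lastUpdated: 1000, state: "success" },
  { lastUpdated: 1000, state: "success" }, // exact duplicate
];

console.log(distinctUntilChangedModel(updates, sameLastUpdated).length); // 1: "success" is never seen
console.log(distinctUntilChangedModel(updates, sameContent).length); // 2: duplicate dropped, change kept
```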

  • Do you still have problems navigating from one page to another?
  • Does this modification lead to other problems?

A side note:
On the other hand, this change leads to more NGRX actions being triggered. In my example, opening a submission form triggers about 1050 actions instead of the 850 triggered before (depending on the contents and relationships of your submission). We might open another bug or research ticket about duplicate actions being triggered multiple times (without any need, I think?).

@saschaszott
Copy link
Contributor

@jlipka, @artlowel: there is a new PR #3585 that solves this issue (and #3393 as well).

@artlowel
Member

artlowel commented Nov 8, 2024

@saschaszott I think that PR simply hides the symptoms, but doesn't fix the underlying issue.

@jlipka is on to something. We'd also concluded that distinctUntilKeyChanged line was involved. If you roll back this commit, it's fixed as well: 9e31f73

However that commit was added for a reason: to fix an issue where objects would be emitted again when their dependencies were updated (but there were no other changes). So we're looking into an alternative way to fix that problem now.

@saschaszott
Contributor

saschaszott commented Nov 8, 2024

@artlowel, you're right. The changes in PR #3585 do not solve the problem. I've merged the changes from the PR into main and the "infinite load" problem still occurs.

7 participants