
High Traffic Causes Incorrect Handle Redirection (HTTP 200 instead of 301) #2554

Open
bauermeisterdifu opened this issue Oct 20, 2023 · 2 comments
Labels
bug; component: SEO (Search Engine Optimization); help wanted (needs a volunteer to claim to move forward); high priority; performance / caching (related to performance, caching or embedded objects)

Comments

@bauermeisterdifu

During periods of high demand, handle redirects for existing items do not work as expected. Normally, handle requests are redirected with a 301 status code. Under heavy load, however, they return HTTP 200 with an empty page instead.

Steps to Reproduce:
I used the following Bash script to simulate high traffic. It repeatedly sends batches of 40 parallel wget requests to a DSpace handle URL and logs every time a 200 status code is returned.

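#!/bin/bash
# Log file to save 200 status codes
log_datei="wget_log.txt"

# Function to execute wget and check for status 200.
# --max-redirect=0 keeps wget from following the 301; otherwise -S also prints
# the 200 OK of the redirect target and every request would be logged.
check_status() {
    if wget --no-check-certificate --max-redirect=0 -S -O /dev/null "https://dspace.<domain>.de/handle/<nr>" 2>&1 | grep -q 'HTTP/1.1 200 OK'; then
        echo "[$(date)] 200 Status code received" >> "$log_datei"
    fi
}

# Number of parallel requests
num_requests=40

# Main loop: start num_requests checks in parallel, wait for all, repeat forever
while true; do
    for ((i = 1; i <= num_requests; i++)); do
        check_status &
    done
    wait
done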

Run the script and monitor the log file (wget_log.txt) for occurrences of the 200 status code.
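The wget check only sees status lines, so it cannot tell whether a 200 body is actually empty. A minimal sketch of an alternative probe with curl, assuming the same placeholder handle URL: curl does not follow redirects unless `-L` is given, so `%{http_code}` reports the status of the handle response itself, and `%{size_download}` records how many body bytes arrived. The byte threshold `EMPTY_MAX_BYTES`, the log file name and all function names are hypothetical choices for illustration.

```shell
#!/bin/bash
# Sketch only: the URL keeps the placeholders from the original report.
handle_url="https://dspace.<domain>.de/handle/<nr>"
log_file="curl_log.txt"   # hypothetical log file name
# Hypothetical threshold: 200 bodies at or below this size count as "empty".
EMPTY_MAX_BYTES=${EMPTY_MAX_BYTES:-1024}

# Classify one response from its status code and body size in bytes.
classify_response() {
    local code="$1" bytes="$2"
    if [[ "$code" == "301" ]]; then
        echo "redirect"              # expected behavior
    elif [[ "$code" == "200" && "$bytes" -le "$EMPTY_MAX_BYTES" ]]; then
        echo "empty-200"             # the failure described in this issue
    else
        echo "unexpected-$code"      # anything else, e.g. 000 on connection errors
    fi
}

# Send one request without following redirects; print "<status> <bytes>".
probe() {
    curl --insecure --silent --output /dev/null \
         --write-out '%{http_code} %{size_download}' "$1"
}

# Probe once and log every result that is not the expected redirect.
probe_and_log() {
    local code bytes kind
    read -r code bytes <<< "$(probe "$handle_url")"
    kind=$(classify_response "$code" "$bytes")
    if [[ "$kind" != "redirect" ]]; then
        echo "[$(date)] $kind (status $code, $bytes bytes)" >> "$log_file"
    fi
}

# Start num_requests probes in parallel, wait for all of them, repeat forever.
run_load() {
    local num_requests="${1:-40}" i
    while true; do
        for ((i = 1; i <= num_requests; i++)); do
            probe_and_log &
        done
        wait
    done
}
```

Calling `run_load 40` mirrors the load pattern of the original script. The threshold would need to be tuned to the real size of a rendered page on the instance under test.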

Expected Behavior:
Handle URLs should redirect with a 301 status code, pointing to the correct destination URL.

Actual Behavior:
During periods of high demand, handle requests return an empty page with an HTTP 200 status code.
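The expected behavior can be checked mechanically. A minimal sketch, assuming curl is available: `--dump-header -` writes the response headers of a GET request to stdout while `--output /dev/null` discards the body, and curl does not follow the redirect by default. The helper names are hypothetical, and the check only confirms that a `Location` header is present, not that it points to the correct item.

```shell
#!/bin/bash
# Read raw HTTP response headers on stdin; print "<status> <location>".
# Handles both "HTTP/1.1 301 ..." and "HTTP/2 301" status lines and strips CRs.
parse_redirect() {
    awk 'NR == 1 { status = $2 }
         tolower($1) == "location:" { location = $2 }
         END { print status, location }' | tr -d '\r'
}

# Succeed only if the headers on stdin describe a 301 with a Location header.
is_expected_redirect() {
    local status location
    read -r status location <<< "$(parse_redirect)"
    [[ "$status" == "301" && -n "$location" ]]
}

# Fetch the headers of a GET request to the given URL without following redirects.
fetch_headers() {
    curl --insecure --silent --output /dev/null --dump-header - "$1"
}
```

For example: `fetch_headers "https://dspace.<domain>.de/handle/<nr>" | is_expected_redirect && echo "301 with Location"`.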

Environment:
DSpace version: 7.6
Operating System: Debian 11
Web Server: Nginx 1.18.0
Database: Postgres v13.11
OpenJDK Version: 11.0.20

This problem makes the server-side rendering (SSR) cache unusable, since responses with status 200 are always cached.
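The caching argument can be illustrated with a toy model. This is a sketch under stated assumptions, not DSpace's SSR cache implementation: it assumes, as described above, that only 200 responses are stored, keyed by URL. Once a single empty 200 slips through under load, every later lookup for that URL gets the same empty page until the entry is evicted, even if the server itself answers correctly again. The function names are hypothetical.

```shell
#!/bin/bash
# Toy URL-keyed response cache (requires bash 4+ for associative arrays).
declare -A cache_status cache_body

# Store a response only when its status is 200, as described in the report.
cache_store() {
    local url="$1" status="$2" body="$3"
    if [[ "$status" == "200" ]]; then
        cache_status["$url"]="$status"
        cache_body["$url"]="$body"
    fi
}

# Print "<status> <body>" for a cached URL; return 1 on a cache miss.
cache_lookup() {
    local url="$1"
    [[ -n "${cache_status[$url]+set}" ]] || return 1
    printf '%s %s\n' "${cache_status[$url]}" "${cache_body[$url]}"
}
```

Storing an empty 200 for a handle URL and then a correct 301 for the same URL leaves `cache_lookup` returning the empty 200, because the 301 is never stored.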

bauermeisterdifu added the bug and needs triage labels on Oct 20, 2023
github-project-automation bot moved this to 🆕 Triage in DSpace Backlog on Oct 20, 2023
@tdonohue
Member

tdonohue commented Oct 20, 2023

This needs more details and investigation. Ideally, we'd find a way to check whether an error is logged at the moment a 200 OK occurs. My immediate suspicion is that something in the SSR (server-side rendering) process is erroring out, or is somehow improperly accessing the SSR cache.
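One way to act on this, sketched under assumptions: record the Unix timestamp of each bad response (for example with `date +%s` in the probe loop), then compute a small window around it to search the frontend and backend logs. The function name and the default ±5 second width are hypothetical, the conversion relies on GNU `date -d @SECONDS` (as shipped with Debian 11), and where the logs live depends on the installation.

```shell
#!/bin/bash
# Print "<start> <end>" as UTC ISO-8601 timestamps for a window of
# +/- width seconds around an epoch timestamp (GNU date required).
log_window() {
    local epoch="$1" width="${2:-5}"
    local start=$((epoch - width)) end=$((epoch + width))
    printf '%s %s\n' \
        "$(date -u -d "@$start" +%Y-%m-%dT%H:%M:%SZ)" \
        "$(date -u -d "@$end" +%Y-%m-%dT%H:%M:%SZ)"
}
```

For example, `log_window "$(date +%s)" 10` gives the range to grep for around a 200 that was just observed.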

May be related to changes in #2331 (which fixed #2265), simply because that PR is where we fixed the bug where the /handle/[prefix]/[suffix] redirects ALWAYS returned a 200 OK. They now properly return a 301, but maybe there's a scenario we missed in which they still revert to a 200 OK?

Moving this over to our 7.6.x maintenance board... but it will require both a volunteer & more investigation to locate the underlying problem. From looking at the code, it's really not obvious how a 301 Redirect could become a 200 OK under "high traffic".

tdonohue added the help wanted, component: SEO, high priority, and performance / caching labels and removed the needs triage label on Oct 20, 2023
@bram-atmire
Member

On https://demo.dspace.org/ (running DSpace 7.6.2-SNAPSHOT), I can still consistently reproduce HTTP 200 responses being served where they shouldn't be:

» curl --head https://demo.dspace.org/handle/10673/1150/browse
HTTP/2 200

» curl --head https://demo.dspace.org/handle/10673/1150/search-filter
HTTP/2 200
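Both of these URLs carry an extra path segment after the handle suffix, whereas the redirect fixed in #2331 is described for /handle/[prefix]/[suffix]. When collecting test URLs it may help to record which shape each one has. A sketch with hypothetical helpers `handle_path_kind` and `check_paths`; the regular expressions are assumptions about the URL layout, not DSpace's actual routing rules.

```shell
#!/bin/bash
# Classify a URL path as a bare handle, a handle sub-path, or something else.
handle_path_kind() {
    local path="$1"
    if [[ "$path" =~ ^/handle/[^/]+/[^/]+/?$ ]]; then
        echo "bare-handle"
    elif [[ "$path" =~ ^/handle/[^/]+/[^/]+/.+ ]]; then
        echo "handle-subpath"
    else
        echo "other"
    fi
}

# For each path, print its shape, the status of a HEAD request, and the path.
# curl does not follow redirects here, matching the curl --head checks above.
check_paths() {
    local base="$1" path code
    shift
    for path in "$@"; do
        code=$(curl --silent --head --output /dev/null --write-out '%{http_code}' "$base$path")
        printf '%s\t%s\t%s\n' "$(handle_path_kind "$path")" "$code" "$path"
    done
}
```

For example: `check_paths https://demo.dspace.org /handle/10673/1150 /handle/10673/1150/browse /handle/10673/1150/search-filter`.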


3 participants