Update robots.txt.ejs #2710

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

@bram-atmire (Member) commented Dec 12, 2023

Fixes #2709

@tdonohue added the `bug`, `component: SEO`, `1 APPROVAL`, and `port to dspace-7_x` labels Dec 12, 2023
@tdonohue self-requested a review February 15, 2024 15:58
@tdonohue added the `high priority` and `performance / caching` labels Mar 7, 2024
@tdonohue added this to the 8.0 milestone Apr 29, 2024
@tdonohue (Member) left a comment

👍 Thanks @bram-atmire ! Apologies it took so long to get back to this. But, this looks good to me. Also porting it to 7.x

@tdonohue merged commit a8f65ce into main Apr 29, 2024
16 checks passed
@tdonohue deleted the bram-atmire-patch-1-2709 branch April 29, 2024 21:29

@dspace-bot (Contributor) commented

Successfully created backport PR for dspace-7_x:

@tdonohue removed the `port to dspace-7_x` label Apr 29, 2024

@alanorth (Contributor) commented

@bram-atmire Revisiting this. I was curious to validate whether Googlebot supported this syntax. I used google/robotstxt and was happy to see that Google's robots.txt parser does indeed recognize that it is not allowed to crawl those entity search facets:

$ ./robots robots.txt 'Googlebot' 'https://repository.edu/entities/person/808206c2-7e1a-4aeb-8950-ad37f67d6bf7?f.author=Prasanna,%20B.M.,equals&spc.page=1'
user-agent 'Googlebot' with URI 'https://repository.edu/entities/person/808206c2-7e1a-4aeb-8950-ad37f67d6bf7?f.author=Prasanna,%20B.M.,equals&spc.page=1': DISALLOWED
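
For readers without the google/robotstxt binary, the check above can be approximated with a minimal Python sketch of wildcard matching for `Disallow` rules. The pattern `/entities/*?f.` is a hypothetical stand-in, since the actual rule added by this PR is not shown in this thread. The sketch also ignores `Allow` rules, longest-match precedence, and percent-encoding normalization, so it is not a substitute for Google's parser.

```python
import re
from urllib.parse import urlsplit

# HYPOTHETICAL rule for illustration only; the real robots.txt.ejs line
# from this PR is not quoted in the conversation.
HYPOTHETICAL_RULES = ["/entities/*?f."]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_patterns: list[str]) -> bool:
    """Return True if the URL's path plus query matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start of the string, as robots.txt rules do.
    return any(rule_to_regex(p).match(target) for p in disallow_patterns)


facet_url = (
    "https://repository.edu/entities/person/"
    "808206c2-7e1a-4aeb-8950-ad37f67d6bf7"
    "?f.author=Prasanna,%20B.M.,equals&spc.page=1"
)
plain_url = (
    "https://repository.edu/entities/person/"
    "808206c2-7e1a-4aeb-8950-ad37f67d6bf7"
)

print(is_disallowed(facet_url, HYPOTHETICAL_RULES))
print(is_disallowed(plain_url, HYPOTHETICAL_RULES))
```

Under the assumed rule, the facet URL is reported as disallowed while the plain entity page is not, which is the behavior the PR aims for: bots can still index entity pages but stay out of the facet links.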

Successfully merging this pull request may close these issues.

Avoid excess load of bots going into search facet links on entity pages
4 participants