feat(maven): Remove unnecessary HTML page fetches #32662

Open
wants to merge 7 commits into
base: main
from

Conversation

zharinov
Collaborator

Changes

Fetching the HTML index page no longer seems necessary, now that we can obtain all the information we need by querying the POM file of a particular package version.

Context

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • Both unit tests + ran on a real repository

@zharinov zharinov requested review from rarkins and viceice and removed request for rarkins November 21, 2024 18:57
@zharinov
Collaborator Author

Currently, for every package we're fetching:

  • maven-metadata.xml
  • GET the latest POM file (for homepage/sourceUrl)
  • HEAD the POM file of the selected/filtered release (often matches with GET of the latest one)
  • [Maven Central only] HTML index page

If the first three are done correctly, it seems we don't need the last one.
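
To make the list above concrete, here is a rough sketch of the requests it describes, expressed as URLs in the standard Maven repository layout. All names here (`PlannedRequest`, `planRequests`, and the parameters) are hypothetical and not taken from the Renovate codebase. The array follows the order of the list, which is not necessarily the order in which the requests are actually issued.

```typescript
// Hypothetical illustration only: none of these names come from Renovate.
interface PlannedRequest {
  method: 'GET' | 'HEAD';
  purpose: string;
  url: string;
}

function planRequests(
  repoUrl: string,
  groupId: string,
  artifactId: string,
  latestVersion: string,
  selectedVersion: string,
  isMavenCentral: boolean,
): PlannedRequest[] {
  // Standard Maven layout: dots in the groupId become path segments.
  const base = repoUrl.replace(/\/+$/, '');
  const packageUrl = `${base}/${groupId.split('.').join('/')}/${artifactId}`;
  const pomUrl = (version: string): string =>
    `${packageUrl}/${version}/${artifactId}-${version}.pom`;

  const requests: PlannedRequest[] = [
    { method: 'GET', purpose: 'version list', url: `${packageUrl}/maven-metadata.xml` },
    { method: 'GET', purpose: 'homepage/sourceUrl', url: pomUrl(latestVersion) },
    { method: 'HEAD', purpose: 'selected release', url: pomUrl(selectedVersion) },
  ];
  if (isMavenCentral) {
    // The extra request this PR proposes to drop (the index.html path is an assumption).
    requests.push({ method: 'GET', purpose: 'HTML index page', url: `${packageUrl}/index.html` });
  }
  return requests;
}
```

When `latestVersion` and `selectedVersion` are the same, the GET and HEAD entries point at the same POM URL, which is the "often matches" case mentioned in the list.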

@rarkins rarkins requested a review from Churro November 22, 2024 12:29
Collaborator

@Churro Churro left a comment

> Currently, for every package we're fetching:
>
> * `maven-metadata.xml`
> * GET the latest POM file (for `homepage`/`sourceUrl`)
> * HEAD the POM file of the selected/filtered release (often matches with GET of the latest one)
> * [Maven Central only] HTML index page
>
> If the first three are done correctly, it seems we don't need the last one.

If I'm not mistaken, "GET the latest POM file (for `homepage`/`sourceUrl`)" is done after populating the list of releases, i.e., any release that is unknown at that point won't be pulled. Hence, the "[Maven Central only] HTML index page" step is actually second, not last. Do you agree?

-    let releaseMap = await this.fetchReleasesFromMetadata(dependency, repoUrl);
-    releaseMap = await this.addReleasesFromIndexPage(
-      releaseMap,
+    const releaseMap = await this.fetchReleasesFromMetadata(
Collaborator

I assume this could have an impact on those packages on Maven Central that have no `maven-metadata.xml`? See https://maven.apache.org/repository/central-metadata.html

To cover such cases, would it work to call `addReleasesFromIndexPage` only if `releaseMap` is still empty after `fetchReleasesFromMetadata`, instead of calling it always? The cost would be one `index.html` query for packages that don't exist on Maven Central.
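
A minimal sketch of that fallback, assuming simplified shapes: `ReleaseMap`, `ReleaseSource`, and `collectReleases` are hypothetical names, and the argument lists are guesses based on the diff fragment above rather than the actual Renovate signatures.

```typescript
// Hypothetical simplified types; the real datasource types may differ.
type ReleaseMap = Record<string, unknown>;

interface ReleaseSource {
  fetchReleasesFromMetadata(dependency: string, repoUrl: string): Promise<ReleaseMap>;
  addReleasesFromIndexPage(
    releaseMap: ReleaseMap,
    dependency: string,
    repoUrl: string,
  ): Promise<ReleaseMap>;
}

async function collectReleases(
  source: ReleaseSource,
  dependency: string,
  repoUrl: string,
): Promise<ReleaseMap> {
  const releaseMap = await source.fetchReleasesFromMetadata(dependency, repoUrl);

  // Metadata produced releases: skip the HTML index page entirely.
  if (Object.keys(releaseMap).length > 0) {
    return releaseMap;
  }

  // Fallback: only now pay for the index page request. This also fires
  // for packages that do not exist in the repository at all.
  return source.addReleasesFromIndexPage(releaseMap, dependency, repoUrl);
}
```

With this shape, packages that have a usable `maven-metadata.xml` never trigger the index page request, while packages without one, including nonexistent packages, trigger exactly one.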

Collaborator Author

Yeah, but we recently discovered that it hadn't worked for a while, and no one suffered anyway, apart from the fact that it always produced 404s.

Collaborator Author

And yes, the order is different, just as you described.

2 participants