Provide more details about http failures #270

Open
mernst opened this issue Apr 27, 2019 · 16 comments

@mernst
Contributor

mernst commented Apr 27, 2019

In a message like this one:

44 anchor href attribute checked, 1 broken external link found.
Unknown host with href=https://douglascayers.com/2015/05/30/how-to-set-custom-java-path-after-installing-jdk-8/

it would be helpful to give more details about the failure, such as the exact error code.
I am able to browse to that URL, and a HEAD request also seems to work for me using curl, so I don't know exactly how to diagnose the problem that htmlSanityCheck is running into.
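A check that reports the concrete failure, instead of a bare "Unknown host", could look like the following minimal Java sketch. It sends a HEAD request (as the curl test above does) via java.net.HttpURLConnection and, on failure, reports the exception class, its message and its cause chain. The class HeadProbe and its methods are hypothetical and not part of htmlSanityCheck; the 5-second timeouts are an arbitrary assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not htmlSanityCheck code: probe a link with HEAD and
// return a detailed description of the outcome.
class HeadProbe {

    static String probe(String href) {
        HttpURLConnection conn = null;
        try {
            URI uri = URI.create(href);
            String scheme = uri.getScheme();
            if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                return href + " skipped: not an http(s) URL";
            }
            conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5_000); // assumed timeout values
            conn.setReadTimeout(5_000);
            return href + " returned status code " + conn.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            // Report the concrete exception type, e.g. UnknownHostException
            // vs. SSLHandshakeException, instead of one generic message.
            return href + " failed: " + describe(e);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Exception class and message, followed by the cause chain (max. 10 levels).
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder(t.toString());
        Throwable cause = t.getCause();
        for (int depth = 0; cause != null && depth < 10; depth++, cause = cause.getCause()) {
            sb.append(" <- caused by ").append(cause);
        }
        return sb.toString();
    }
}
```

Calling HeadProbe.probe with the douglascayers.com URL would then print either the status code or the full exception chain, which is the kind of detail asked for here.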

gernotstarke self-assigned this Apr 27, 2019
@gernotstarke
Member

Right you are. Will look into it ASAP (might take a few days, as I'm currently on holiday).

@gernotstarke
Member

(and I feel deeply honored that you care for my little library... although it by no means matches your high software quality standards...)

@gernotstarke
Member

@mernst - could you please provide the context of the error message (at best the complete report)?

In the corresponding Checker subclass, the HTTP status code is included in the output:

problem += """ ${href} returned statuscode ${responseCode}."""
checkingResults.addFinding(new Finding(problem))

And I verified that the URL you mentioned above returns status code 200...

Therefore, I need some additional information for debugging
(alternatively, you can point me to the GitHub repo where these checks cause problems).

regards, Gernot

@mernst
Contributor Author

mernst commented Apr 28, 2019

Thanks for the quick response! To reproduce the error, please run:

git clone https://github.com/randoop/randoop.git
cd randoop
./gradlew htmlSanityCheck

There is just one error -- the one I quoted above.

Please let me know if you need any more information or if I can do anything to help.

Thanks for your help!

@gernotstarke
Member

I had to remove the html5Validator requirement, as the (normal) build requires that prerequisite.

actually there are two distinct problems here:

  1. overly short and ineffective error reporting, which does not help in identifying the underlying problem.

  2. the URL https://douglascayers.com/2015/05/30/how-to-set-custom-java-path-after-installing-jdk-8/ leads to an unknown-host exception, which is definitely wrong. I'll open a new issue for that (valid URL yields "unknown host with href" message #272).

I'm investigating...
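For the first problem, one option is to give each exception type its own finding text, so that, for example, a TLS/certificate failure is no longer reported as "Unknown host". The following is a minimal Java sketch; the class FindingText and the wording of the messages are assumptions, not htmlSanityCheck's actual output.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import javax.net.ssl.SSLException;

// Hypothetical sketch: one finding text per failure category.
class FindingText {

    static String forFailure(String href, Exception e) {
        String detail = e.getClass().getSimpleName()
                + (e.getMessage() != null ? " (" + e.getMessage() + ")" : "");
        String category;
        if (e instanceof UnknownHostException) {
            category = "Unknown host (DNS lookup failed)";
        } else if (e instanceof SSLException) {
            // Also covers SSLHandshakeException, e.g. certificate validation errors.
            category = "TLS/certificate problem";
        } else if (e instanceof SocketTimeoutException) {
            category = "Timeout";
        } else if (e instanceof ConnectException) {
            category = "Could not connect";
        } else {
            category = "Unexpected error";
        }
        return category + " with href=" + href + " [" + detail + "]";
    }
}
```

Keeping the original exception name and message in brackets means that even the "Unexpected error" branch still tells the user what actually went wrong.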

@mernst
Contributor Author

mernst commented Apr 28, 2019

had to remove the html5Validator requirement - as the (normal) build requires that prerequisite...

Currently, the Randoop build requires installation of html5validator. Sorry I didn't mention that.

I'm investigating replacing it with htmlSanityCheck.

@gernotstarke
Member

Thanks to the hint from @double16, I'll try to come up with a more relaxed HttpsURLConnection class that does not try to validate the certificates...
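A "relaxed" connection of this kind is usually built from a trust manager that accepts every certificate chain plus a hostname verifier that accepts every host. The Java sketch below applies both to a single HttpsURLConnection rather than process-wide. The class RelaxedHttps is hypothetical, not the htmlSanityCheck implementation, and it switches off all certificate checks, which is only defensible for a link checker that never sends sensitive data.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// WARNING: disables certificate and hostname validation (hypothetical sketch).
class RelaxedHttps {

    // Trust manager that accepts every certificate chain without checking it.
    static final class TrustAllManager implements X509TrustManager {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    }

    // Hostname verifier that accepts any host name.
    static final HostnameVerifier ACCEPT_ANY_HOST = (hostname, session) -> true;

    static SSLContext trustAllContext() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, new TrustManager[] {new TrustAllManager()}, new SecureRandom());
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot create TLS context", e);
        }
    }

    // Relax a single connection only; the JVM-wide defaults stay untouched.
    static void relax(HttpsURLConnection conn) {
        conn.setSSLSocketFactory(trustAllContext().getSocketFactory());
        conn.setHostnameVerifier(ACCEPT_ANY_HOST);
    }
}
```

Usage would be: open the connection, cast it to HttpsURLConnection, call RelaxedHttps.relax(conn), and only then call getResponseCode(). Relaxing per connection avoids silently weakening every other HTTPS call in the same build JVM.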

@PirMei

PirMei commented Dec 12, 2024

Hi,
we're currently experiencing the same behaviour in an internal documentation project with ~2100 hrefs, which are checked by htmlSanityCheck in a GitLab CI pipeline.
Most of these hrefs point to the same host (e.g. https://doc-host.internal/differenturis.html), and most checks succeed.
However, in each job run there is at least one "Unknown host" error for a random href (a different one each time), even though all other checks against the same host succeed.

First, we tried establishing a DNS cache in the pipeline, because we suspected an intermittent DNS resolution issue.
Reading #272, it seems that "Unknown host" in this context is not always a DNS issue.

In the linked issue there was a problem with certificate verification.
That should not be the case here, as it's always the same host (and port) with the same certificate being checked.

Additional error details for this kind of error would be much appreciated!
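One way to get such details is to repeat the name lookup on its own whenever a check fails with "Unknown host" and to include the result in the report. If the separate lookup succeeds, either the original failure was transient or it did not come from DNS resolution at all. The following is a minimal Java sketch; DnsDiagnosis and its Resolver interface are hypothetical, and the interface only exists so the lookup can be replaced in tests.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Hypothetical sketch: re-run the DNS lookup separately to enrich a report.
class DnsDiagnosis {

    // Indirection so tests can inject a fake resolver.
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    // Uses the JVM resolver (which is subject to the JVM's own DNS cache).
    static final Resolver SYSTEM = InetAddress::getAllByName;

    static String diagnose(String host, Resolver resolver) {
        try {
            return "DNS ok: " + host + " -> " + Arrays.toString(resolver.resolve(host));
        } catch (UnknownHostException e) {
            return "DNS failed: " + host + " (" + e.getMessage() + ")";
        }
    }
}
```

In a real check, DnsDiagnosis.diagnose(uri.getHost(), DnsDiagnosis.SYSTEM) would be appended to the finding text only when a check fails, so successful runs pay no extra cost.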

@ascheman
Member

Thanks for adding your comment, @PirMei.

Additional error details for this kind of error would be much appreciated!

Is this a requirement for the component, that it should show more context information when the error occurs?

BTW: Which version of HSC are you using? Did you notice that we have come up with a version 2, which currently is mostly a modularisation of the project? Find more info at https://hsc.aim42.org.

@ascheman
Member

And one additional question, @PirMei: How do you run it in your GitLab CI pipeline? Do you use a container? If yes, what is its base (OS, JDK, Gradle, etc.)?

@PirMei

PirMei commented Dec 12, 2024

@ascheman Thanks for the quick reply 👍

We're currently using htmlSanityCheck gradle plugin version 1.1.6 (as the latest stable version) in a self-created container based on Ubuntu 24.04 with JDK 17.0.12-tem (installed via sdkman.io).
I have not seen v2, so we might try running this with an RC of HSC v2, and I will report back later.

Is this a requirement for the component, that it should show more context information when the error occurs?

More details for HTTP errors would just make troubleshooting a whole lot easier.
We've already burnt too much time trying to determine whether this was a DNS error and establishing workarounds...

ascheman added a commit that referenced this issue Dec 13, 2024
as we would like to be able to check more detailed
error conditions when the problems occur (#270).
ascheman added a commit that referenced this issue Dec 13, 2024
as we would like to be able to check more detailed
error conditions when the problems occur (#270).

Improve JavaDocs also (#343).
ascheman added a commit that referenced this issue Dec 14, 2024
Note that this is something of a hack, as the failure
elements already contained the (optional) suggestions,
and we now add stack traces as well. It would be even
better if the JUnit XML allowed for nested elements
instead of just writing CDATA sections (#270).

Align with code style also (#343).
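The idea in this commit (suggestions plus stack trace as the text content of a JUnit XML failure element) can be sketched in Java as follows. The class and method names are hypothetical, and the XML is assembled by hand for brevity, whereas a real reporter would more likely use an XML writer. The one subtle point is that a CDATA section cannot contain "]]>", so that sequence has to be split across two sections.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch of a JUnit XML <failure> element whose body carries
// the suggestion and the stack trace as CDATA.
class JUnitFailureXml {

    static String failureXml(String message, String suggestion, Throwable cause) {
        StringWriter trace = new StringWriter();
        cause.printStackTrace(new PrintWriter(trace, true));
        String body = suggestion + "\n\n" + trace;
        // "]]>" would end the CDATA section early: close it and open a new one.
        body = body.replace("]]>", "]]]]><![CDATA[>");
        return "<failure message=\"" + escapeAttribute(message) + "\""
                + " type=\"" + escapeAttribute(cause.getClass().getName()) + "\">"
                + "<![CDATA[" + body + "]]></failure>";
    }

    // Minimal escaping for a double-quoted XML attribute value.
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```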
ascheman added a commit that referenced this issue Dec 14, 2024
If an UnknownHostException occurs (#270), we retry a (configurable)
number of times, as this is sometimes due to DNS (caching) errors
in Java (or the underlying OS).

Fix typos and wording (#343).
ascheman assigned ascheman and unassigned gernotstarke Dec 14, 2024
@ascheman
Member

Meanwhile, I have put together a first approach to improve this:

  • I added a stack trace (printed by the ConsoleReporter, as well as by the HtmlReporter and the JUnitReporter). However, I guess in the case of @PirMei it will not help much, as we can already anticipate that it's a temporary DNS issue (perhaps caching).
  • I added a new configuration attribute (retries, of type Integer) that retries resolving the URL as many times as specified (I recommend no more than 3). Currently, this only retries on UnknownHostException.
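The retry behaviour described in the second bullet (retry only on UnknownHostException, up to a configured count) could be sketched in Java like this. RetryOnUnknownHost and its parameters are hypothetical rather than the code on the bugfix branch, and the linearly growing pause between attempts is an extra assumption (a pause of 0 retries immediately).

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical sketch: run a link check once, plus up to `retries` more times
// while it fails with UnknownHostException. Other exceptions are not retried.
class RetryOnUnknownHost {

    static <T> T withRetries(Callable<T> check, int retries, long pauseMillis) throws Exception {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        UnknownHostException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return check.call();
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < retries && pauseMillis > 0) {
                    Thread.sleep(pauseMillis * (attempt + 1)); // linearly growing pause
                }
            }
        }
        throw last; // every attempt failed with UnknownHostException
    }
}
```

The check passed in would typically open the connection and return its status code; because only UnknownHostException is caught, certificate or timeout errors still surface on the first attempt.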

Please give it a try, @PirMei: either clone the repo and build locally, or refer to https://jitpack.io/#aim42/htmlSanityCheck/bugfix~270-improve-exception-handling-SNAPSHOT, i.e.,

Add the JitPack repo to settings.gradle:

pluginManagement {
    repositories {
        maven { url 'https://jitpack.io' } // Make this the first entry!
...
        gradlePluginPortal()
    }
}
...

Choose the plugin version by setting it to the branch name, i.e.,

plugins {
    id 'org.aim42.htmlSanityCheck' version "bugfix~270-improve-exception-handling-SNAPSHOT"
}

@ascheman
Member

BTW: If you need better/closer support, @PirMei, feel free to join the respective Slack and reach out to me (ascheman), e.g.,

Or find me on Matrix.

@PirMei

PirMei commented Dec 16, 2024

"Unfortunately", the Unknown Host errors stopped immediately when my colleagues moved the documentation webserver to a different host. This means I have no way to test this any more, and my colleagues don't want to burn more working hours investigating it either, sorry :(

@ascheman
Member

"Unfortunately" the Unknown Host errors stopped immediately when my colleagues moved the documentation webserver to a different host which means I have no way to test this any more as my colleagues also don't want to burn more working hours investigating this, sorry :(

Thanks a lot for the information, @PirMei. I was already guessing (and hoping) that it would turn out to be an infrastructure problem. Such flaky behaviour is hard to track (and even harder to circumvent) from the application layer.

However, as I have spent some time on it, I will merge the improved error handling soon (but currently have other priorities).

@PirMei

PirMei commented Dec 17, 2024

After reading your replies again, I would like to add one more bit of information.
I don't think it was a temporary DNS issue (perhaps caching) in our case, since we had established a local DNS cache (dnsmasq) inside the pipeline job. So once the job container had resolved the destination host successfully, it would not have to ask the upstream DNS servers to resolve it again.

Since we only got the "Unknown host" errors for a handful of hrefs, and not for the vast majority of hrefs pointing to the same host, even after establishing the DNS cache, I would specifically rule out DNS problems.
Just my two cents; we currently won't test this further (with HSC 1.1.6 or an RC of HSC 2), though...
