Travis builds timing out due to module tests running > 10 mins #1198

Closed · shardulm94 opened this issue Jul 13, 2020 · 5 comments · Fixed by #1210

Comments

shardulm94 (Contributor) commented Jul 13, 2020

Travis kills a build when it receives no output from the build for 10 minutes. As we grow our test suite (especially the Spark tests), we will hit this timeout limit because, by default, Gradle produces no output while tests are passing.

I hit this issue in #1189 when I updated read tests to also test the vectorized codepaths. https://travis-ci.org/github/apache/iceberg/jobs/707083672#L1408. The spark2 module tests seem to run for > 10 mins now on Travis nodes.

There are a few ways I think we can tackle this issue.

  1. Reduce the number of tests we have (not ideal in the long term).
  2. Use travis_wait, as recommended by Travis for long-running tasks: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received. The issue I have with travis_wait is that it suppresses the log from the child command until it completes. Also, the timeout supplied to travis_wait applies to the whole child command (in our case ./gradlew check, which already takes about 20 minutes today).
  3. Run a background process that outputs a line to the log every minute. This is similar to what I have done in the past at https://github.com/linkedin/transport/blob/6fc9a9274c147e999b8b52ee286973a57bbbd7f2/.travis.yml#L21-L25. We can also use the travis_jigger command, which does a similar thing (and can take a timeout too).
  4. Make Gradle print the name of every test it runs so that it produces output as it progresses through the tests, by extending the test logging configuration that currently only logs events "failed". This will produce some noise whenever Gradle tests are run (even locally, although I am unsure if there is a way to apply this only on Travis). A sketch of this option follows the list.
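
For reference, a minimal sketch of what option 4 could look like in a Gradle build script. The surrounding subprojects/test block structure and the extra event names listed here are my assumptions, not the exact change that was eventually made:

    // Hypothetical sketch: emit a log line for every completed test so Travis
    // keeps receiving output. The existing configuration only logs failed tests.
    subprojects {
      test {
        testLogging {
          // Adding "passed" and "skipped" means each finished test prints a line;
          // keeping "failed" preserves the current behavior.
          events "passed", "skipped", "failed"
        }
      }
    }

With something like this enabled, even a quiet module produces one line per completed test, which resets Travis's 10-minute no-output timer.
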
rdblue (Contributor) commented Jul 13, 2020

I've been thinking about reducing the number of test classes to make tests run faster. When I run tests locally, the Spark start-up phase seems to take a long time, and we start a new session in lots of different tests. By merging some of those tests together, we could avoid that cost, making tests faster and avoiding this timeout.

rdsr (Contributor) commented Jul 14, 2020

I don't mind too much if tests take a little longer to run, especially on Travis. It doesn't hamper my dev cycle, where I mostly run specific tests while I'm developing and testing my changes. To keep Travis from timing out, can we simply print standard out and standard error on the console through Gradle? This will dump the logs while the tests run. The configuration for enabling that is

// show standard out and standard error of the test JVM(s) on the console
  testLogging.showStandardStreams = true

And if this produces too much noise on our local machines, we should be able to enable the configuration only on Travis by checking Travis-specific environment variables.
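
A small sketch of how that gating might look; using the TRAVIS environment variable (which Travis CI sets to true in its build environment) as the condition is my assumption, not something specified in the thread:

    // Hypothetical sketch: stream test stdout/stderr only on Travis so local runs stay quiet.
    test {
      if (System.getenv("TRAVIS") == "true") {
        // show standard out and standard error of the test JVM(s) on the console
        testLogging.showStandardStreams = true
      }
    }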

shardulm94 (Contributor, Author) commented Jul 14, 2020

Enabling stdout and stderr will produce way too much noise in the Travis log. If we have to go with the approach of making Gradle print something, printing the name of each test it ran (suggestion 4 from the original post) is pretty straightforward. If a single test method takes more than 10 minutes, it is not an ideal test case anyway.

rdblue (Contributor) commented Jul 15, 2020

I opened a PR with option 4: #1210

rdblue linked a pull request on Jul 15, 2020 that will close this issue.
rdblue (Contributor) commented Jul 15, 2020

Fixed by #1210. Thanks for finding all of the options to fix this, @shardulm94!

rdblue closed this as completed on Jul 15, 2020.