[DISCUSS] More extensive pre-release testing #13661

Open
Tracked by #13648
alamb opened this issue Dec 5, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 5, 2024

Is your feature request related to a problem or challenge?

Up to now, when we have made DataFusion releases, we have mostly focused on validating that DataFusion's own unit tests pass (see dev/release/README.md), but we haven't tested the upgrade with other downstream projects (like ballista, ray, InfluxDB IOx, etc.) until after we have released the code.

This sometimes results in downstream users finding issues after release. Some recent examples

Describe the solution you'd like

I would like to improve the testing / release process for DataFusion releases to reduce the number of regressions found after release.

This would likely take the form of updating the dev/release/README.md

Describe alternatives you've considered

One idea mentioned by @Omega359 and @andygrove on #13525 (comment):

> Upgrading our own subprojects (Ballista, Comet, DF Python, DF Ray) as part of the DataFusion release process makes a lot of sense to validate that the upgrade guide is complete.

Additional context

No response

@alamb alamb added the enhancement New feature or request label Dec 5, 2024
@alamb alamb changed the title from "[DISCUSS] More deliberate pre-release testing" to "[DISCUSS] More extensive pre-release testing" Dec 5, 2024
@alamb
Contributor Author

alamb commented Dec 5, 2024

BTW I often create WIP PRs to test releases of sqlparser-rs and arrow-rs with DataFusion before the release

For example:

This caught at least one regression before release recently (apache/datafusion-sqlparser-rs#1556, fixed by @goldmedal 🙏 )

@findepi
Member

findepi commented Dec 5, 2024

> I would like to improve the testing / release process for DataFusion releases to reduce the number of regressions found after release.

I agree with the goal, but I am concerned about the cost/overhead involved. We don't have infinite bandwidth at our disposal.
Additionally, orchestrating testing across several downstream projects creates new problems we didn't have before:

  • which downstream projects can stop the release train? what kinds of problems are able to stop the release train?
    • this is an especially important question for closed-source downstream projects
  • how long are we willing to wait for external teams to report back? Downstream project maintainers will obviously be willing and motivated to help, but their availability cannot be assumed

So what if we focused instead on:

  • improving testing within DF itself? If a downstream project is concerned about the stability of feature X, they can contribute to improve test coverage for feature X (e.g. [DISCUSSION] More SqlLogicTest test coverage for queries, including join queries #13470)
  • easy, low-ceremony, low-overhead (automated) releases. If we see a big problem / regression after a major release, this can be patched on a maintenance branch. A maintenance branch can release daily without human intervention.

@alamb
Contributor Author

alamb commented Dec 5, 2024

> I agree with the goal, but I am concerned about the cost/overhead involved. We don't have infinite bandwidth at our disposal.
> Additionally, orchestrating testing across several downstream projects creates new problems we didn't have before

Yes, this is true. I imagine an incremental rollout type approach -- where we start with one project (datafusion-python is a natural example, and maybe we can get delta-rs to help too). In my (likely naive) thinking, the downstream projects will be willing to help as they are directly affected.

@alamb
Contributor Author

alamb commented Dec 5, 2024

> easy low-ceremony low-overhead (automated) releases. If we see a big problem / regression after a major release, this can be patched on a maintenance branch. A maintenance branch can release daily without human intervention

It is my understanding that the Apache voting / approval process prevents automated builds

Creating a maintenance branch is also a compelling idea where we can focus on stability / shoring up test coverage 🤔

@findepi
Member

findepi commented Dec 7, 2024

> It is my understanding that the Apache voting / approval process prevents automated builds

That's my understanding too, but I hope this process isn't nonnegotiable.
Processes are there to serve the project & the community after all, not the other way around.

@alamb
Contributor Author

alamb commented Dec 9, 2024

> > It is my understanding that the Apache voting / approval process prevents automated builds
>
> That's my understanding too, but I hope this process isn't nonnegotiable. Processes are there to serve the project & the community after all, not the other way around.

I think as long as we make it clear that nightly builds are not "official" releases from the ASF point of view, we could create / publish them. 🤔

@findepi
Member

findepi commented Dec 10, 2024

That would work for me as long as these releases are the only ones we publish. I would want automation for 'the releases' that people use. Especially if we have a maintenance branch, it would be reasonable to release after every PR merge.
I don't know what problems the manual release process solves that cannot be solved with automated releases.

Anyway, did we hijack the thread?

@alamb
Contributor Author

alamb commented Dec 10, 2024

> Anyway, did we hijack the thread?

Yeah, we should probably file a separate discussion about more frequent releases if we want to pursue that option
