Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation guidelines for contribution content #13703

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 9, 2024

Which issue does this PR close?

Rationale for this change

Given recent discussions on improving DataFusion stability (see #13648) I think it is time to document the criteria to add new features to DataFusion to help ensure everyone's expectations are aligned

What changes are included in this PR?

  1. Add summary of conclusion of [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) #12357 to the docs

Are these changes tested?

CI

Are there any user-facing changes?

@alamb
Copy link
Contributor Author

alamb commented Dec 9, 2024

Rather than state some hard rule, I went with the approach of guidance on what would need more / less discussion prior to acceptance. I felt this would give us flexibility but still given potential contributors guidance

2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
5. "Small" functional improvements to existing features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. "Small" functional improvements to existing features
5. "Small" functional improvements to existing features

Non breaking improvements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good qualification -- non breaking changes are certainly more likely to get approved in my experience. I tried to clarify

5. "Small" functional improvements to existing features
6. Additional APIs for extending DataFusion's capabilities

Contributions that likely require discussion prior to acceptance include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should note the preferred way to discuss, GH Discussion/ASF slack, etc

Contributions that likely require discussion prior to acceptance include:

1. New functionality that is part of the "standard sql"
2. New functions that aren't part of the "standard sql"
Copy link
Contributor

@comphead comphead Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following, 🤦 looks like anything related to "standard sql" requires discussion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording like this so nothing was "required" but to give a hint that major new features are likely to involve more discussion

Contributions that will likely involve more discussion (see Discussing New 
Features above) prior to acceptance include:

1. Major new functionality (even if it is part of the "standard sql")
2. New functions, especially if they aren't part of "standard sql"
3. New data sources (e.g. support for Apache ORC)

1. Bug fixes for existing features
2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 love benchmarks

docs/source/contributor-guide/index.md Outdated Show resolved Hide resolved
@findepi
Copy link
Member

findepi commented Dec 10, 2024

BTW We should strive to make datafusion easier to extend. This is not only about particular APIs but also about internal design. We may be principled or not about what kind of SQL variant we provide out of the box, but we should assume extending systems provide something else (eg Ballista speaks Spark SQL, which is different than Datafusion SQL). This is obviously a teaser to longer discussion (#12723).

I approved this PR based on my understand that the wording proposed here doesn't preclude -- and actually encourages -- ground work features serving the extensibility story (the issue linked above, #12604, etc.)

Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Oleks V <[email protected]>
@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2024

BTW We should strive to make datafusion easier to extend.

I 100% agree

I approved this PR based on my understand that the wording proposed here doesn't preclude -- and actually encourages -- ground work features serving the extensibility story (the issue linked above, #12604, etc.)

Yes, this was my intention and I think it fits directly with the stated design goals (aka customizable everything):

  1. Work “out of the box”: Provide a very fast, world class query engine with minimal setup or required configuration.
  2. Customizable everything: All behavior should be customizable by implementing traits.
  3. Architecturally boring 🥱: Follow industrial best practice rather than trying cutting edge, but unproven, techniques.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb

@comphead
Copy link
Contributor

Should we let it hang for day-two to get other reviews?

@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2024

Should we let it hang for day-two to get other reviews?

Yes I think we should leave this open for several more days to make sure anyone who wants a chance to comment can do so. Obviously we can always update the wordiing with subsequent PRs too, but I don't think there is any reason to rush this one in

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

Comment on lines +100 to +101
1. Major new functionality (even if it is part of the "standard sql")
2. New functions, especially if they aren't part of "standard sql"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Major new functionality (even if it is part of the "standard sql")
2. New functions, especially if they aren't part of "standard sql"
1. Major new functionality (even if it is part of the "standard SQL")
2. New functions, especially if they aren't part of "standard SQL"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
5 participants