
Merge branch 'main' into mysql-users
mvzink committed Nov 28, 2024
2 parents 50506c4 + 5a510ac commit 93a4b03
Showing 33 changed files with 3,900 additions and 765 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -100,6 +100,23 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
issues and distinguish different syntaxes in the AST.


## WIP: Extracting source locations from AST nodes

This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress: many nodes currently report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for guidelines on contributing improvements.

```rust
use sqlparser::ast::Spanned;
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;
use sqlparser::tokenizer::{Location, Span};

// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

// The source span can be retrieved with start and end locations
assert_eq!(ast[0].span(), Span {
start: Location::of(1, 1),
end: Location::of(1, 16),
});
```

## SQL compliance

SQL was first standardized in 1987, and revisions of the standard have been
@@ -210,6 +227,18 @@ Our goal as maintainers is to facilitate the integration
of various features from various contributors, but not to provide the
implementations ourselves, as we simply don't have the resources.

### Benchmarking

There are several micro-benchmarks in the `sqlparser_bench` directory.
To compare your branch against `main`, run them on both:

```shell
git checkout main
cd sqlparser_bench
cargo bench
git checkout <your branch>
cargo bench
```

## Licensing

1 change: 1 addition & 0 deletions derive/Cargo.toml
@@ -28,6 +28,7 @@ license = "Apache-2.0"
include = [
"src/**/*.rs",
"Cargo.toml",
"LICENSE.TXT",
]
edition = "2021"

1 change: 1 addition & 0 deletions derive/LICENSE.TXT
52 changes: 52 additions & 0 deletions docs/source_spans.md
@@ -0,0 +1,52 @@

## Breaking Changes

These are the current breaking changes introduced by the source spans feature:

#### Added fields for spans (must be added to any existing pattern matches)
- `Ident` now stores a `Span`
- `Select`, `With`, `Cte`, `WildcardAdditionalOptions` now store a `TokenWithLocation`

#### Misc.
- `TokenWithLocation` stores a full `Span`, rather than just a source location. Users relying on `token.location` should use `token.location.start` instead.
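
A minimal sketch of that migration, using hypothetical stand-in types (the real `Location`, `Span`, and `TokenWithLocation` live in `sqlparser::tokenizer` and carry more data):

```rust
// Stand-in types for illustration only; not the crate's actual definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Location { pub line: u64, pub column: u64 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span { pub start: Location, pub end: Location }

pub struct TokenWithLocation { pub token: String, pub location: Span }

// Before this change `token.location` was a single Location; code that
// relied on it now reads `token.location.start` for the same value.
pub fn legacy_location(tok: &TokenWithLocation) -> Location {
    tok.location.start
}

fn main() {
    let tok = TokenWithLocation {
        token: "SELECT".to_string(),
        location: Span {
            start: Location { line: 1, column: 1 },
            end: Location { line: 1, column: 7 },
        },
    };
    assert_eq!(legacy_location(&tok), Location { line: 1, column: 1 });
}
```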
## Source Span Contributing Guidelines

When contributing source span improvements, in addition to the general [contribution guidelines](../README.md#contributing), please pay attention to the following:


### Source Span Design Considerations

- `Ident` always has a correct source span
- Downstream breaking-change impact should be kept as minimal as possible
- To this end, prefer recursive merging of spans over storing spans on all nodes
- Any metadata added to compute spans must not change semantics (`Eq`, `Ord`, `Hash`, etc.)
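
The recursive-merging idea can be sketched with stand-in types and a hypothetical `union` helper (the crate's real `Span` type is richer; this only illustrates the design point that parent spans are derived rather than stored):

```rust
// Stand-in types for illustration; not the crate's actual definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Location { pub line: u64, pub column: u64 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span { pub start: Location, pub end: Location }

impl Span {
    // Merge two spans into the smallest span covering both, so parent
    // nodes can derive their span from children instead of storing one.
    pub fn union(&self, other: &Span) -> Span {
        let key = |l: &Location| (l.line, l.column);
        Span {
            start: if key(&self.start) <= key(&other.start) { self.start } else { other.start },
            end: if key(&self.end) >= key(&other.end) { self.end } else { other.end },
        }
    }
}

fn main() {
    // e.g. "SELECT A" at 1:1..1:9 merged with "FROM B" at 1:10..1:16
    let a = Span { start: Location { line: 1, column: 1 }, end: Location { line: 1, column: 9 } };
    let b = Span { start: Location { line: 1, column: 10 }, end: Location { line: 1, column: 16 } };
    let merged = a.union(&b);
    assert_eq!(merged.start, Location { line: 1, column: 1 });
    assert_eq!(merged.end, Location { line: 1, column: 16 });
}
```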

The primary reason for missing or inaccurate source spans at this time is that many structures lack spans for keyword tokens and values, either due to lack of time or because adding them would significantly break downstream code.

When considering adding support for source spans on a type, consider the impact to consumers of that type and whether your change would require a consumer to do non-trivial changes to their code.

Example of a trivial change:
```rust
match node {
    ast::Query {
        field1,
        field2,
        location: _, // add a new line to ignore the location
    } => ...,
}
```

If adding source spans to a type would require a significant change like wrapping that type or similar, please open an issue to discuss.

### AST Node Equality and Hashes

When adding tokens to AST nodes, make sure to store them using the [AttachedToken](https://docs.rs/sqlparser/latest/sqlparser/ast/helpers/struct.AttachedToken.html) helper, which ensures that semantically equivalent AST nodes always compare as equal and hash to the same value. For example, `select 5` and `SELECT 5` would compare as different `Select` nodes if the `SELECT` token were stored directly:

```rust
struct Select {
select_token: AttachedToken, // only used for spans
    // remaining fields
field1,
field2,
...
}
```
24 changes: 19 additions & 5 deletions examples/cli.rs
@@ -17,9 +17,11 @@

#![warn(clippy::all)]

/// A small command-line app to run the parser.
/// Run with `cargo run --example cli`
//! A small command-line app to run the parser.
//! Run with `cargo run --example cli`
use std::fs;
use std::io::{stdin, Read};

use simple_logger::SimpleLogger;
use sqlparser::dialect::*;
@@ -38,6 +40,9 @@ $ cargo run --example cli FILENAME.sql [--dialectname]
To print the parse results as JSON:
$ cargo run --feature json_example --example cli FILENAME.sql [--dialectname]
To read from stdin instead of a file:
$ cargo run --example cli - [--dialectname]
"#,
);

@@ -57,9 +62,18 @@ $ cargo run --feature json_example --example cli FILENAME.sql [--dialectname]
s => panic!("Unexpected parameter: {s}"),
};

println!("Parsing from file '{}' using {:?}", &filename, dialect);
let contents = fs::read_to_string(&filename)
.unwrap_or_else(|_| panic!("Unable to read the file {}", &filename));
let contents = if filename == "-" {
println!("Parsing from stdin using {:?}", dialect);
let mut buf = Vec::new();
stdin()
.read_to_end(&mut buf)
.expect("failed to read from stdin");
String::from_utf8(buf).expect("stdin content wasn't valid utf8")
} else {
println!("Parsing from file '{}' using {:?}", &filename, dialect);
fs::read_to_string(&filename)
.unwrap_or_else(|_| panic!("Unable to read the file {}", &filename))
};
let without_bom = if contents.chars().next().unwrap() as u64 != 0xfeff {
contents.as_str()
} else {
1 change: 1 addition & 0 deletions sqlparser_bench/Cargo.toml
@@ -17,6 +17,7 @@

[package]
name = "sqlparser_bench"
description = "Benchmarks for sqlparser"
version = "0.1.0"
authors = ["Dandandan <[email protected]>"]
edition = "2018"
20 changes: 20 additions & 0 deletions sqlparser_bench/README.md
@@ -0,0 +1,20 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Benchmarks for sqlparser. See [the main README](../README.md) for more information.
10 changes: 5 additions & 5 deletions sqlparser_bench/benches/sqlparser_bench.rs
@@ -23,24 +23,24 @@ fn basic_queries(c: &mut Criterion) {
let mut group = c.benchmark_group("sqlparser-rs parsing benchmark");
let dialect = GenericDialect {};

let string = "SELECT * FROM table WHERE 1 = 1";
let string = "SELECT * FROM my_table WHERE 1 = 1";
group.bench_function("sqlparser::select", |b| {
b.iter(|| Parser::parse_sql(&dialect, string));
b.iter(|| Parser::parse_sql(&dialect, string).unwrap());
});

let with_query = "
WITH derived AS (
SELECT MAX(a) AS max_a,
COUNT(b) AS b_num,
user_id
FROM TABLE
FROM MY_TABLE
GROUP BY user_id
)
SELECT * FROM table
SELECT * FROM my_table
LEFT JOIN derived USING (user_id)
";
group.bench_function("sqlparser::with_select", |b| {
b.iter(|| Parser::parse_sql(&dialect, with_query));
b.iter(|| Parser::parse_sql(&dialect, with_query).unwrap());
});
}

82 changes: 82 additions & 0 deletions src/ast/helpers/attached_token.rs
@@ -0,0 +1,82 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
use core::fmt::{self, Debug, Formatter};
use core::hash::{Hash, Hasher};

use crate::tokenizer::{Token, TokenWithLocation};

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "visitor")]
use sqlparser_derive::{Visit, VisitMut};

/// A wrapper over a token that is ignored when AST nodes are compared or hashed.
/// Use this when a token is not relevant for semantics, but is still needed for
/// accurate source location tracking.
#[derive(Clone)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct AttachedToken(pub TokenWithLocation);

impl AttachedToken {
pub fn empty() -> Self {
AttachedToken(TokenWithLocation::wrap(Token::EOF))
}
}

// Conditional Implementations
impl Debug for AttachedToken {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}

// Blanket Implementations
impl PartialEq for AttachedToken {
fn eq(&self, _: &Self) -> bool {
true
}
}

impl Eq for AttachedToken {}

impl PartialOrd for AttachedToken {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}

impl Ord for AttachedToken {
fn cmp(&self, _: &Self) -> Ordering {
Ordering::Equal
}
}

impl Hash for AttachedToken {
fn hash<H: Hasher>(&self, _state: &mut H) {
// Do nothing
}
}

impl From<TokenWithLocation> for AttachedToken {
fn from(value: TokenWithLocation) -> Self {
AttachedToken(value)
}
}
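
The effect of those trait impls can be demonstrated with a self-contained stand-in (the real `AttachedToken` wraps a `TokenWithLocation`; this sketch wraps a `String` so it runs on its own):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for AttachedToken: the wrapped value is carried along but
// deliberately ignored by PartialEq and Hash, mirroring the impls above.
#[derive(Clone, Debug)]
pub struct Attached(pub String);

impl PartialEq for Attached {
    fn eq(&self, _: &Self) -> bool { true }
}
impl Eq for Attached {}
impl Hash for Attached {
    fn hash<H: Hasher>(&self, _state: &mut H) {}
}

pub fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

fn main() {
    // Tokens differing only in case (or source location) still compare
    // equal and hash identically, so AST equality stays semantic.
    let a = Attached("select".to_string());
    let b = Attached("SELECT".to_string());
    assert_eq!(a, b);
    assert_eq!(hash_of(&a), hash_of(&b));
}
```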
1 change: 1 addition & 0 deletions src/ast/helpers/mod.rs
@@ -14,5 +14,6 @@
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
pub mod attached_token;
pub mod stmt_create_table;
pub mod stmt_data_loading;
