use tokens before parsing #6
Hi! You are right: I was thinking of reimplementing it. As I don't have much free time this month, I wasn't going to start this work before March.
The project is perfectly fine for the prototype I'm currently working on, at least for the moment. But in my opinion the question is what target we should aim for. For example, at the moment the following is true.

So I'm not in a hurry at all and can hack in stuff if I need to, although I think having a nice, handy and robust parser would be really nice. I am also quite busy this month, but March looks good for me (at least at the moment).

FYI, my current state of the lexer (just a quick implementation, but it works quite nicely): https://gist.github.com/jaheba/8957181
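For readers who haven't opened the gist: a lexer of the kind discussed here can be sketched in a few lines of JS. This is a hypothetical, minimal version (not the gist's actual code), and the token categories are illustrative:

```javascript
// Minimal regex-based SQL lexer sketch (hypothetical, not the gist's code).
// It classifies each token so a parser can branch on token types
// instead of on raw substrings.
function tokenize(sql) {
  var rules = [
    [/^\s+/, null],                      // whitespace: matched but discarded
    [/^'(?:[^'\\]|\\.)*'/, 'string'],    // single-quoted string literal
    [/^\d+(?:\.\d+)?/, 'number'],        // integer or decimal
    [/^[A-Za-z_][A-Za-z0-9_]*/, 'word'], // keyword or identifier
    [/^[(),;*=]/, 'punct']               // common punctuation
  ];
  var tokens = [];
  while (sql.length > 0) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var m = rules[i][0].exec(sql);
      if (m) {
        if (rules[i][1]) tokens.push({ type: rules[i][1], value: m[0] });
        sql = sql.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error('Unexpected character: ' + sql[0]);
  }
  return tokens;
}
```

For example, `tokenize("SELECT name FROM users;")` yields five tokens: three words, nothing for the whitespace, and a `;` punctuation token.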
I agree: it works well for a prototype. I'm also starting to see the limits, so it definitely needs to be rewritten.

Regarding my needs, I don't care if the parser discards some parts of a SQL query it doesn't understand, as long as it can parse the rest. Also, I only need support for not-too-complex CRUD queries.

I read your lexer; it gives me a lot of ideas/questions, so it's a good start! For example, we should use higher-level functions.

I'm not an expert at writing parsers, but I read the first 3 lessons of this course: https://github.com/DmitrySoshnikov/Essentials-of-interpretation/ It is really interesting, so I recommend it!

Are you familiar with automated tests or Test Driven Development? Here I'm using QUnit to test whether my parser gives the expected result, through Grunt (with JSHint to check my JS syntax). I'm asking because this workflow really helped make the current parser more reliable, so I think it is something to keep.

Final question, and I'll stop for now to let you answer: what do you think of the structure of the AST generated by the parser? (I think this is something crucial.)
Same here. But I think an option to have the query validated would still be good: we want a live preview of the query, so filtering out some invalid queries would help us avoid sending them all to the database server.

My lexer is an approach to capture all tokens of SQL, and I'm not sure how viable it is in the end. However, I tested some large SQL queries I had and it did pretty well.

Besides, I am a fan of higher-level functions too, but I think one should not use them for their own sake.

I read some of the lessons and recognised the style you used for the conditions parser. I don't completely agree with how the code is organised, but the rest was indeed interesting.

I am familiar with TDD, but not in JavaScript (coming mainly from Python), so I'll have a look into it.

The AST generated looks good to me, and I like the JSON style.
Today, I started to re-think the AST. So I did a quick example here: https://gist.github.com/dsferruzza/9529264
What do you think of that?
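For context, an AST in the JSON style discussed might look like the following. This is an illustrative shape only; the node names (`select`, `from`, `where`) and the `expression` fields are assumptions, not the exact contents of the gist:

```javascript
// Illustrative AST for "SELECT name FROM users WHERE age > 18".
// Each node keeps the raw `expression` string alongside parsed details,
// so a consumer can pick whichever level of detail it needs.
var ast = {
  type: 'select',
  select: [{ expression: 'name', column: 'name', alias: null }],
  from: [{ table: 'users', alias: null }],
  where: {
    expression: 'age > 18',
    operator: '>',
    left: { expression: 'age', column: 'age' },
    right: { expression: '18', value: 18 }
  }
};
```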
It has been over two months since I last posted, but at least I have made some progress.

There are basically two approaches to writing a parser: top-down or bottom-up. Top-down is the "natural" approach, where you read tokens and then decide which rule to apply; that means that for each token you know what has to be done next. Top-down parsers can be written by hand, and the most common strategy is to use an LL(1) recursive descent parser. The problem you have when using those is that you have to eliminate left recursion from your grammar.

The parser I have written (some sort of LL(1) parser) is far from finished, but it can parse some SQL and supports things like subselects and nested brackets, distinguishing between logical and mathematical operators. If you like, I can share the current state.

I like the structure of your AST. However, it is difficult to keep/generate the expression nodes, and I don't really see the advantage, since you can apply the parser to them again later.
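The left-recursion problem mentioned above can be made concrete: a rule like `expr → expr '+' term` would recurse forever in a top-down parser, so it is rewritten as `expr → term ('+' term)*` and parsed with a loop. A minimal hand-written sketch (hypothetical code, not the parser discussed in the thread):

```javascript
// Tiny LL(1) recursive descent parser for '+' expressions over a token array.
// The left-recursive rule  expr -> expr '+' term  is replaced by iteration,
// which also keeps the resulting tree left-associative.
function parseExpr(tokens) {
  var pos = 0;
  function peek() { return tokens[pos]; }
  function next() { return tokens[pos++]; }

  function parseTerm() {
    var t = next();
    if (t === '(') {          // parenthesised sub-expression
      var e = expr();
      next();                 // consume ')'
      return e;
    }
    return { type: 'number', value: Number(t) };
  }

  function expr() {
    var node = parseTerm();
    while (peek() === '+') {  // the loop replaces left recursion
      next();
      node = { type: 'add', left: node, right: parseTerm() };
    }
    return node;
  }

  return expr();
}
```

`parseExpr(['1', '+', '2', '+', '3'])` builds `add(add(1, 2), 3)`, i.e. the same left-associative tree the left-recursive rule describes.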
I'm happy to hear from you! About writing the parser, I am trying to choose between 3 solutions: writing it by hand, using a parser generator, or using parser combinators.
So the next step for me is to try to make a proof of concept with a parser combinator.

About keeping expressions in nodes: the point was to let the user choose whether they want a fully parsed output or just some pieces of strings.

Do not hesitate to share your work, I'll have a look!
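To illustrate the parser-combinator idea without depending on a specific library (this is a toy sketch of the principle, not Parsimmon's actual API): a parser is a function from an input string to a value plus the remaining input, and small parsers compose into bigger ones.

```javascript
// Toy parser combinators. A parser: input -> { value, rest } or null on failure.
function regex(re) {
  return function (input) {
    var m = re.exec(input);
    return (m && m.index === 0)
      ? { value: m[0], rest: input.slice(m[0].length) }
      : null;
  };
}
// seq(p1, p2, ...) runs parsers in order, collecting their values.
function seq() {
  var parsers = arguments;
  return function (input) {
    var values = [];
    for (var i = 0; i < parsers.length; i++) {
      var r = parsers[i](input);
      if (!r) return null;
      values.push(r.value);
      input = r.rest;
    }
    return { value: values, rest: input };
  };
}
// map(p, fn) transforms a parser's value on success.
function map(parser, fn) {
  return function (input) {
    var r = parser(input);
    return r ? { value: fn(r.value), rest: r.rest } : null;
  };
}

// Composing them into a (very partial) SELECT parser:
var ws = regex(/\s+/);
var identifier = regex(/[A-Za-z_][A-Za-z0-9_]*/);
var selectStmt = map(
  seq(regex(/SELECT/i), ws, identifier, ws, regex(/FROM/i), ws, identifier),
  function (v) { return { type: 'select', column: v[2], table: v[6] }; }
);
```

`selectStmt("SELECT name FROM users")` succeeds with `{ type: 'select', column: 'name', table: 'users' }`, while a non-matching input simply returns `null`. Real libraries like Parsimmon add error reporting, backtracking control, and many more combinators on top of this core idea.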
I pushed a new branch: https://github.com/dsferruzza/simpleSqlParser/tree/v2
v2 is going well! |
What do I have to do to get require.js and Parsimmon running?
I don't really know require.js, so I can't tell for now... Also, maybe I can distribute a variant with an embedded version of Parsimmon. |
I have finally found some time to test it out a bit, and it looks good so far :) I'm not too much into parser generators, but I like the approach.

However, I think it is important to specify which dialect of SQL you are supporting. Some DBMS support schemas in addition to tables and columns. HANA uses double quotes instead of backticks for identifier strings. And I believe there are tons of things like that for every dialect. What a beautiful world we live in.

Do you plan to support expressions as AST nodes?
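As a small illustration of the dialect point: the identifier-quoting rule could be made a per-dialect parameter instead of being hard-coded. This is hypothetical code, not part of simpleSqlParser:

```javascript
// Hypothetical dialect table: MySQL quotes identifiers with `backticks`,
// while ANSI SQL (and HANA) use "double quotes".
var identifierRules = {
  mysql: /^`([^`]+)`/,
  ansi: /^"([^"]+)"/
};

// Returns the unquoted identifier, or null if the input does not start
// with a quoted identifier in the given dialect.
function readQuotedIdentifier(input, dialect) {
  var m = identifierRules[dialect].exec(input);
  return m ? m[1] : null;
}
```

So `readQuotedIdentifier('`my table`', 'mysql')` gives `'my table'`, while the same input under `'ansi'` fails, which is exactly the kind of divergence a dialect option would have to encode.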
At first I was skeptical about parser combinators, but after trying them (and having written a top-down parser by hand in the past) I really like the ease they offer to write parsers and to reason about them. Parser generators offer similar mechanisms to build parsers, but I prefer parser combinators (at least for now) as they don't require a specific syntax (everything is valid JS) nor a special build step. But I guess it's a trade-off between performance, maintainability and ease of use...

About specifying which dialect of SQL is supported: I agree that it is important, but I don't know if I will have the time/motivation to do that... For now, the policy is:
But I'm really open to discussion if you need more, have suggestions, or want to help!
I'm not sure I understand the question... If you are talking about
In some AST nodes, I only implemented the first level of parsing. That's why there is only an
I have dug a bit deeper into the code and have noted some issues which, in my opinion, are due to the fact that the input string is not tokenised first. I have started to write a tokeniser which handles the input quite well. I hope I find time this or next week to port it into the `sql2ast` function.

Things which cause problems:

- `Select` is not valid, only `SELECT` or `select`
- `ON` and `AS` always need to be written in capital letters
- `SELECT .. x = "some string"; .... "other string"` results in ignoring the `;` as a separator between the queries

I think there are more issues with the current way the parser works than those I just found.
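To make the proposed fix concrete: a tokeniser pass can match keywords case-insensitively and consume string literals whole, so a `;` inside a string is never mistaken for a query separator. A hypothetical sketch, not the actual tokeniser mentioned above:

```javascript
// Hypothetical tokeniser that fixes both listed issues:
// keywords are recognised in any case, and string literals are matched
// as single tokens, so ';' inside them never splits a query.
var KEYWORDS = ['select', 'from', 'where', 'on', 'as'];

function tokenizeQueries(sql) {
  // Alternatives are tried left to right: strings first, then words,
  // numbers, the ';' separator, and finally any other non-space char.
  var tokenRe = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[A-Za-z_][A-Za-z0-9_]*|\d+|;|[^\s]/g;
  var queries = [[]];
  var m;
  while ((m = tokenRe.exec(sql)) !== null) {
    var t = m[0];
    if (t === ';') {
      queries.push([]);   // a real separator: only reached outside strings
    } else if (KEYWORDS.indexOf(t.toLowerCase()) !== -1) {
      queries[queries.length - 1].push({ type: 'keyword', value: t.toUpperCase() });
    } else {
      queries[queries.length - 1].push({ type: 'token', value: t });
    }
  }
  return queries.filter(function (q) { return q.length > 0; });
}
```

With this pass, `Select`, `SELECT` and `select` all become the same keyword token, and `SELECT x = "a; b"; select 1` correctly splits into two queries because the first `;` is part of the string literal.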