Parser Fixes #128

Merged
merged 39 commits into from
Feb 1, 2012

StephanErb
Member

I've been working on some bugs deep down in our incremental parser and would be glad if someone could review my changes.

For more information, see ticket #113 and the Hudson job of this branch.

Thanks,
Stephan

During the recursive TextBlock instantiation and token move operation, we
may have to navigate back to our parent block. At this point the parent
block is not yet completely created and may contain empty leading
subblocks. We must be able to deal with this inconsistent state.

We now skip over these empty blocks, so the parser no longer crashes.

Unfortunately, I am not sure that this is the correct fix in all
cases. We may still compute the wrong original version in some
situations, so further investigation and review are needed.
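
The skipping of empty leading subblocks described above can be sketched as follows. This is a minimal illustration only: `SketchBlock`, `isEmpty()` and `firstNonEmptySubBlock()` are hypothetical names and do not come from the project's TextBlock model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a TextBlock (not the project's class).
class SketchBlock {
    final List<String> tokens = new ArrayList<String>();
    final List<SketchBlock> subBlocks = new ArrayList<SketchBlock>();

    // Empty means: no token in this block or in any nested sub-block.
    boolean isEmpty() {
        if (!tokens.isEmpty()) {
            return false;
        }
        for (SketchBlock sub : subBlocks) {
            if (!sub.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}

class EmptyBlockSkipping {
    // Returns the first sub-block that holds at least one token and skips
    // empty leading sub-blocks. Returns null if every sub-block is empty.
    static SketchBlock firstNonEmptySubBlock(SketchBlock parent) {
        for (SketchBlock sub : parent.subBlocks) {
            if (!sub.isEmpty()) {
                return sub;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SketchBlock parent = new SketchBlock();
        SketchBlock notYetFilled = new SketchBlock(); // created, tokens not moved yet
        SketchBlock filled = new SketchBlock();
        filled.tokens.add("x");
        parent.subBlocks.add(notYetFilled);
        parent.subBlocks.add(filled);
        System.out.println(firstNonEmptySubBlock(parent) == filled);
    }
}
```

Returning null for a parent that has only empty subblocks leaves the decision about that case to the caller, which matches the open question raised above about whether skipping is correct in all cases.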

ANTLR returns a fixed EOF token. We have to convert it to our custom
token implementation. Otherwise, we crash if our root block is empty
or contains only omitted tokens.

We have to use the absolute offset to completely replace all lexed tokens
with whitespace.

The incremental parser now considers fewer TextBlocks to be equal.
This fixes some problems, but we may re-use fewer elements than we used
to! There might be a better way to fix the problem described below.

We might have considered TextBlocks to be equal even though they have
different numbers of tokens. Those tokens then ended up as duplicates in
the re-used TB, resulting in an invalid TB model.

We only check for re-usable tokens on the current sub-node level (the level
where the lexer created these new tokens) and might then mistake tokens for
equal although they actually belong to different sub-blocks in a
correctly nested TB tree.

There seems to be no obvious way to remove these outdated, remaining tokens.

This fixes the problems observed in
TestParsingScenarios#testReparsingWithoutModiication*().

We now inspect each replace operation and determine the actual
textual differences. Only these differences are applied to the
TextBlocks model.

Subdiffs leverage our re-use capabilities during incremental parsing.
For example, when a text region is replaced with very similar text, we
can now re-use many tokens/textblocks. Before, the whole text range
(including the corresponding model elements) had to be re-created.
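
One simple way to narrow a replace operation down to its actual textual difference is to trim the common prefix and suffix of the old and new text. The sketch below assumes that technique. It is not taken from the commit, and `SubDiffSketch`, `Edit` and `minimalEdit()` are hypothetical names.

```java
class SubDiffSketch {
    // Describes: replace oldText[start, start + removedLength) with inserted.
    static final class Edit {
        final int start;
        final int removedLength;
        final String inserted;

        Edit(int start, int removedLength, String inserted) {
            this.start = start;
            this.removedLength = removedLength;
            this.inserted = inserted;
        }
    }

    // Shrinks "replace oldText with newText" to the region that really changed
    // by cutting off the longest common prefix and the longest common suffix.
    static Edit minimalEdit(String oldText, String newText) {
        int max = Math.min(oldText.length(), newText.length());
        int prefix = 0;
        while (prefix < max && oldText.charAt(prefix) == newText.charAt(prefix)) {
            prefix++;
        }
        int suffix = 0;
        // The suffix must not overlap the prefix in either string.
        while (suffix < max - prefix
                && oldText.charAt(oldText.length() - 1 - suffix)
                        == newText.charAt(newText.length() - 1 - suffix)) {
            suffix++;
        }
        return new Edit(prefix,
                oldText.length() - prefix - suffix,
                newText.substring(prefix, newText.length() - suffix));
    }

    public static void main(String[] args) {
        Edit e = minimalEdit("int foo = 1;", "int bar = 1;");
        System.out.println("start=" + e.start
                + " removed=" + e.removedLength
                + " inserted=" + e.inserted);
    }
}
```

For the example in `main`, only the three characters `foo` are replaced by `bar`, so the tokens before and after that span remain candidates for re-use.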

This commit improves upon 883e78e and also consumes omitted tokens in
empty regions without any lexed tokens.

Four new incremental parser tests are now failing. They create a new root
object, although they would be expected to reuse the old one.

We now correctly handle the substring case where the start offset
lies within another token (e.g., because some tokens were skipped
due to lexing problems).

Correctly report expected tokens (if known). This fixes the strange
"Found X but expected <BUG>" messages.

Implement a workaround to make error locations reported by the lexer
absolute. They used to be relative to the parsed region.
This fixes issue #121.
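
Converting a lexer error location from region-relative to absolute, as mentioned for the lexer workaround above, could look roughly like this. The sketch assumes 1-based lines and 0-based columns. `Location` and `toAbsolute()` are hypothetical and not the project's API.

```java
class LexerErrorLocationSketch {
    // Hypothetical location triple: character offset, line (1-based),
    // and column within that line (0-based).
    static final class Location {
        final int offset;
        final int line;
        final int column;

        Location(int offset, int line, int column) {
            this.offset = offset;
            this.line = line;
            this.column = column;
        }
    }

    // regionStart: where the re-lexed region begins in the whole document.
    // relative:    error location as reported relative to that region.
    static Location toAbsolute(Location regionStart, Location relative) {
        int offset = regionStart.offset + relative.offset;
        int line = regionStart.line + relative.line - 1;
        // Only the first line of the region is shifted horizontally;
        // later lines start at column 0 of the document anyway.
        int column = relative.line == 1
                ? regionStart.column + relative.column
                : relative.column;
        return new Location(offset, line, column);
    }

    public static void main(String[] args) {
        Location regionStart = new Location(100, 5, 10);
        Location abs = toAbsolute(regionStart, new Location(3, 1, 3));
        System.out.println(abs.offset + " " + abs.line + ":" + abs.column);
    }
}
```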

The old implementation was only capable of re-evaluating property inits
in non-empty conditions and alternatives. The new implementation is
based on TcsUtil.wasExecuted() and does not suffer from this problem.

This ensures that the referencedElements are always set. Before,
incremental parsing and token reuse could leave a reference already
resolved while the referencedElements of the tokens were still empty.

There may be more than one reference to a single property within
a textblock (e.g., the auto-parenthesis within operator templates
have the same sequence element as the property itself). We
therefore need to find out whether a property is still referenced by any
other token in the re-used TextBlock.
Only if it is no longer referenced from anywhere are we free
to unset it on the corresponding model element.
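
The "still referenced by another token" check can be sketched as below. This is only an illustration: `SketchToken`, `sequenceElementId` and `isStillReferenced()` are hypothetical, and the real model links tokens to sequence elements through its own classes.

```java
import java.util.Arrays;
import java.util.List;

class PropertyReferenceSketch {
    // Hypothetical token: its text plus the id of the sequence element
    // (and thus the property) it was created for.
    static final class SketchToken {
        final String value;
        final String sequenceElementId;

        SketchToken(String value, String sequenceElementId) {
            this.value = value;
            this.sequenceElementId = sequenceElementId;
        }
    }

    // True if some token other than 'removed' in the re-used block still
    // points at the same sequence element. Only when this returns false
    // would it be safe to unset the property on the model element.
    static boolean isStillReferenced(List<SketchToken> tokensInBlock, SketchToken removed) {
        for (SketchToken token : tokensInBlock) {
            if (token != removed
                    && token.sequenceElementId.equals(removed.sequenceElementId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SketchToken open = new SketchToken("(", "operand");
        SketchToken operand = new SketchToken("x", "operand");
        SketchToken close = new SketchToken(")", "operand");
        List<SketchToken> block = Arrays.asList(open, operand, close);
        System.out.println(isStillReferenced(block, operand));
    }
}
```

In the `main` example the parentheses share the sequence element of the operand, so removing the operand token alone does not justify unsetting the property.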

When deciding whether we can stop lexing or have to keep going,
we checked whether we had two off-channel tokens at hand. Having two tokens
of the same type implied that our re-lexed stream had converged with the
unmodified tokens.

This check failed when we commented out a line. We stopped lexing too early
and did not properly lex the last token at the end of the commented
line.

This commit fixes this by checking the more fine-grained token type.
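
The difference between the coarse channel check and the finer token type check can be illustrated with a small sketch. All names and constants here (`SketchToken`, `WS`, `COMMENT`, `OFF_CHANNEL`) are hypothetical, and whether the real check also still looks at the channel is an assumption left open.

```java
class LexerConvergenceSketch {
    // Hypothetical channel and token type constants, chosen for this sketch.
    static final int DEFAULT_CHANNEL = 0;
    static final int OFF_CHANNEL = 99;
    static final int WS = 1;
    static final int COMMENT = 2;

    static final class SketchToken {
        final int type;
        final int channel;
        final String text;

        SketchToken(int type, int channel, String text) {
            this.type = type;
            this.channel = channel;
            this.text = text;
        }
    }

    // Old, coarse check: any two off-channel tokens count as converged.
    static boolean convergedByChannel(SketchToken relexed, SketchToken old) {
        return relexed.channel != DEFAULT_CHANNEL && old.channel != DEFAULT_CHANNEL;
    }

    // Finer check: the tokens must have the same token type.
    static boolean convergedByType(SketchToken relexed, SketchToken old) {
        return relexed.type == old.type;
    }

    public static void main(String[] args) {
        // After commenting out a line, the re-lexed token is a comment,
        // while the old stream had whitespace at that position.
        SketchToken relexed = new SketchToken(COMMENT, OFF_CHANNEL, "// x = 1;");
        SketchToken old = new SketchToken(WS, OFF_CHANNEL, " ");
        System.out.println(convergedByChannel(relexed, old));
        System.out.println(convergedByType(relexed, old));
    }
}
```

With the channel check, the comment and the whitespace token look alike and lexing stops too early. The type check keeps lexing until the two streams really line up again.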

We now consider more TextBlocks to be equal and thus improve
our re-use capabilities. The trick is to check whether all old and new
tokens within a block correspond to each other.

I cannot say for sure which block should actually be reused. I've
therefore decided to simplify the test case. That way we can at least
assert our current behaviour.

Also check for the equality of the sequence elements instead of the
mere equality of token values.

StephanErb merged commit a8547cc into master on Feb 1, 2012