Tight binding of SwiLex to SwiParse makes some jobs difficult. #1
For example, I want to keep track of source location (line, column, etc.) and bundle that in with my tokens. One way I can do that is to not have any separators, recognize whitespace explicitly, and then filter the tokens myself:
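A minimal sketch of that workaround (assuming SwiLex's `SwiLexable` protocol and `lex(input:)` API, with a hypothetical `whitespace` terminal; the `type` field on the returned tokens is also an assumption):

```swift
import SwiLex

enum Terminal: String, SwiLexable {
    // no separators: whitespace is matched explicitly instead
    static var separators: Set<Character> = []

    case whitespace = "[ \\t\\n]+"
    case text = "[A-Za-z]+"
    case number = "[0-9]+"
    case eof
    case none
}

var lexer = SwiLex<Terminal>()
let tokens = try lexer.lex(input: "Hello\nthis is\n1 test")

// drop the whitespace tokens after lexing
// (assumes `type` holds the token's terminal kind)
let filtered = tokens.filter { $0.type != .whitespace }
```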
But there's really no way to interpose any of this into the process. Ideally the parser would be able to operate on a generic token type and could take a token stream rather than a string as input.

I can probably get out of this particular jam by mapping substring indices back into source locations, but it's quite awkward, and I'm pretty sure people will want to make their own lexers.

---

This kind of information (line number, column number) will be added to the tokens in the next versions. As you mentioned, this solution cannot be used with SwiParse, since the lexing there is done internally rather than manually, and using a lexer without separators adds some unnecessary constraints.

I suggest the following solution. We need the indices of all the newline characters in the base string:

```swift
// the indices of all the newline characters in the base string
let newlineIndices = input.indices.filter { input[$0] == "\n" }
```

And with the following function we can recover the line and column of any token (a `Substring`) in its base string:

```swift
/// - Returns: A tuple of the line number and the column number of
/// a token in its base string.
func position(of token: Substring) -> (Int, Int) {
    // the index of the previous newline character, or -1 if no
    // newline is found before the token
    let previousNewlineIndex = previousIndex(of: token.startIndex,
                                             in: newlineIndices)
    let lineStartIndex: String.Index
    if previousNewlineIndex >= 0 {
        // start counting the column number from the character after
        // the previous newline character in the base string
        lineStartIndex = token.base.index(after: newlineIndices[previousNewlineIndex])
    } else {
        // if no previous newline is found, the column number
        // counting starts from the beginning of the base string
        lineStartIndex = token.base.startIndex
    }
    let line = previousNewlineIndex + 1
    let column = token.base.distance(from: lineStartIndex,
                                     to: token.startIndex)
    return (line, column)
}
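
// For example, with the input used below ("Hello\nthis is\n1 test"),
// the token "is" starts after the first newline, so
// position(of: isToken) returns (1, 5): line 1, column 5 (zero-based).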
/// Binary search.
/// - Parameters:
///   - element: a `Comparable` item.
///   - array: an array of `Comparable` items.
/// - Returns: the index of the element of `array` preceding `element`,
///   or -1 if no element in `array` precedes it.
func previousIndex<T: Comparable>(of element: T, in array: [T]) -> Int {
    var beg = 0
    var end = array.count - 1
    while end >= beg {
        let mid = beg + (end - beg) / 2
        if array[mid] == element { return mid }
        else if array[mid] < element { beg = mid + 1 }
        else { end = mid - 1 }
    }
    return beg - 1
}
```

This solution can be used directly in the rule actions (the functions attached with `>>>>`). Here is a simple example using the solution; the parser concatenates the positions (line number, column number) of each token in a `String`:

```swift
enum Terminal: String, SwiLexable {
    static var separators: Set<Character> = [" ", "\n"]

    case text = "[A-Za-z]*"
    case number = "[0-9]*"
    case eof
    case none
}

enum NonTerminal: SwiParsable {
    case start // starting token
    case concat
}

let input = """
Hello
this is
1 test
"""
let newlineIndices = input.indices.filter { input[$0] == "\n" }

func singlePosition(s: Substring) -> String { "\(position(of: s))" }
func concatPosition(s1: String, s2: Substring) -> String { s1 + "\(position(of: s2))" }
let parser = try SwiParse<Terminal, NonTerminal>(rules: [
    .start => [Word(.concat)], // accepting rule
    .concat => [Word(.text)] >>>> singlePosition,
    .concat => [Word(.number)] >>>> singlePosition,
    .concat => [Word(.concat), Word(.text)] >>>> concatPosition,
    .concat => [Word(.concat), Word(.number)] >>>> concatPosition,
])
let result = try parser.parse(input: input)
print(result)
// Optional("(0, 0)(1, 0)(1, 5)(2, 0)(2, 2)") As mentioned before this information (line number, column number, size, ...) will be in the An abstraction of the Lexer (with a generic Token), supporting different parsers, adding some debugging tools (with graphviz), visualisation of the AST, supporting files, ... are planned. I just need some free time (and maybe contributors 😉) |