cky_algorithm

The Cocke–Younger–Kasami algorithm: a parsing algorithm for context-free grammars

https://medium.com/swlh/cyk-cky-f63e347cf9b4

Stats: This algorithm is a dynamic, bottom-up approach to parsing. In this context, “dynamic” means we are going to create a table that solves a lot of subproblems, and then in the end we will be able to understand the whole problem by looking through our amalgamation of subproblems in the table. Minna Sundberg’s illustration of language families is flipped upside down so that the root is at the top Minna Sundberg’s illustration of language families displays few roots and many branches Bottom-up means that when we read the table we created, we’ll look from the bottom of the branches, or the input, and track which parts of speech can be chained together in a tree on the way up, rather than starting with the first part of speech, or the root, and tracing our way down which part of speech can branch off from the first one until we get to the input.

In terms of speed, this is not the most efficient algorithm for parsing in all cases, but it is one of the best for worst-case scenarios. The algorithm has a cubic run time for the length of the input, with a multiplier for the size of the grammar.

High Level: In this algorithm, we will use the upper right hand portion of a matrix. We will start by storing the words in the string in cells that span a diagonal line from the top left of the matrix to the bottom right. Using the information previously stored in the matrix, we will incrementally build up the rest of the upper right hand portion of the matrix, storing subtrees that match what we have as we go. As we get farther away from the words, a split point or pivot point becomes key to this process. As we look through the table’s rows and columns (i-j) we will also be checking for a pivot point (k) where we might have a grammar rule for both [i,k] and [k,j]. So to have A -> B C in [i][j], we will need to have B in [i][k] and C in [k][j] while k is greater than i but less than j. This is possible because we have split up our grammar rules into Chomsky normal form, so we will only have at most two rules in a production, and can check for all of these possible outcomes easily, as opposed to splitting them up into a billion extra substrings and searching through each one. The pseudocode looks like this:

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
cky_parser.py		cky_parser.py
cky_parser.sh		cky_parser.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cky_algorithm

About

Releases

Packages

Languages

JaneHazen/cky_algorithm

Folders and files

Latest commit

History

Repository files navigation

cky_algorithm

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages