Commit 8eccc20

author
Maarten
committed
2 parents 3ed5956 + e19f369 commit 8eccc20

1 file changed

+10
-9
lines changed

README.md

Lines changed: 10 additions & 9 deletions
@@ -5,12 +5,9 @@
55

66
# Probabilistic Earley parser
77

8-
This is an implementation of a probabilistic Earley parsing algorithm, which can parse any Probabilistic Context Free Grammar (PCFG) (also
9-
known as Stochastic Context Free Grammar (SCFG)),
10-
or equivalently any language described in Backus-Naur Form (BNF). In these grammars,
11-
rewrite rules may be non-deterministic and have a probability attached to them.
12-
8+
This is a library for parsing a string of tokens (like words) into parse trees that are weighted by probability. For example: you might want to know the probabilities for all derivations of an English sentence, or the most likely table of contents structure for a list of paragraphs. This library allows you to do so efficiently, as long as you can describe the rules as a [Context-free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar) (CFG).
139

10+
The innovation of this library with respect to the gazillion other parsing libraries is that this one allows the production rules in your grammar to have a probability attached to them. This allows us to make a better choice in case of an ambiguous sentence: just select the derivation with the highest probability (this is called the Viterbi parse). If you do not need probabilities attached to your parse trees, you are probably better off using [nearley](http://nearley.js.org) instead.
1411

1512
For a theoretical grounding of this work, refer to [*Stolcke, An Efficient Probabilistic Context-Free
1613
Parsing Algorithm that Computes Prefix
@@ -140,6 +137,11 @@ console.log(treeify.asTree(makeTree(viterbi.parseTree)));
140137

141138
Written in TypeScript, published as a [commonjs module on NPM](https://www.npmjs.com/package/probabilistic-earley-parser) and a [single-file minified UMD module on Github](https://github.com/digitalheir/probabilistic-earley-parser-javascript/releases) in vulgar ES5.
142139

140+
This is an implementation of a probabilistic Earley parsing algorithm, which can parse any Probabilistic Context Free Grammar (PCFG) (also
141+
known as Stochastic Context Free Grammar (SCFG)),
142+
or equivalently any language described in Backus-Naur Form (BNF). In these grammars,
143+
rewrite rules may be non-deterministic and have a probability attached to them.
144+
143145
The probability of a parse is defined as the product of the probabilities of all the applied rules. Usually,
144146
we define probability as a number between 0 and 1 inclusive, and use common algebraic notions of addition and
145147
multiplication.
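
As a minimal sketch of that definition (the `AppliedRule` shape and the rule probabilities below are made up for illustration and are not the library's actual types), the probability of a derivation is simply the running product over its rules:

```typescript
// Hypothetical shape for illustration only; not the library's real rule type.
interface AppliedRule {
  name: string;
  probability: number; // a number between 0 and 1 inclusive
}

// Probability of a parse: the product of the probabilities of all applied rules.
function parseProbability(rules: AppliedRule[]): number {
  return rules.reduce((acc, rule) => acc * rule.probability, 1);
}

// A made-up derivation that uses three rules.
const derivation: AppliedRule[] = [
  { name: "S -> NP VP", probability: 1.0 },
  { name: "NP -> Det N", probability: 0.5 },
  { name: "VP -> V NP", probability: 0.25 },
];

const p = parseProbability(derivation);
console.log(p); // 0.125
```

Because every factor is at most 1, the product shrinks quickly as derivations get longer.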
@@ -151,6 +153,7 @@ semiring which holds the minus log of the probability. So that maps the numbers
151153
between infinity and zero, skewed towards lower probabilities:
152154

153155
#### Graph plot of f(x) = -log(x)
156+
154157
![Graph for f(x) = -log x](https://leibniz.cloudant.com/assets/_design/ddoc/graph%20for%20-log%20x.PNG)
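
The mapping in the plot can be sketched in a few lines. The helper names `toCost` and `toProbability` are assumptions made for this example, not functions exported by the library; the point is only that multiplying probabilities corresponds to adding their negative logs:

```typescript
// Map a probability in [0, 1] to a cost in [0, Infinity] using f(x) = -log(x).
function toCost(probability: number): number {
  return -Math.log(probability);
}

// Inverse mapping: recover the probability from a cost.
function toProbability(cost: number): number {
  return Math.exp(-cost);
}

const a = 0.5;
const b = 0.25;

// Multiplication in probability space...
const product = a * b;

// ...is addition in minus-log space.
const summedCost = toCost(a) + toCost(b);
const roundTrip = toProbability(summedCost);

// The endpoints of the mapping: probability 1 costs nothing, probability 0 costs infinity.
const certain = toCost(1);
const impossible = toCost(0);

console.log(product, roundTrip, certain, impossible);
```

Lower probabilities get spread over a much wider range of costs, which is the skew visible in the graph.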
155158

156159

@@ -167,12 +170,10 @@ Note that this implementation does not apply innovations such as [Joop Leo's imp
167170
For a faster parser that works on non-probabilistic grammars, look into [nearley](http://nearley.js.org).
168171

169172
### Limitations
170-
* I have not provisioned for ε-rules
173+
* I have not provisioned for ε-rules (rules with an empty right-hand side)
171174
* Rule probability estimation may be performed using the inside-outside algorithm, but is not currently implemented
172175
* Higher level concepts such as wildcards, * and + are not implemented
173-
* Viterbi parsing (querying the most likely parse tree) only returns one single parse. In the case of an ambiguous sentence, the returned parse is not guaranteed the left-most parse.
174-
* Behavior for strangely defined grammars is not defined, such as when the same rule is defined multiple times with
175-
a different probability
176+
* Viterbi parsing (querying the most likely parse tree) only returns one single parse. In the case of an ambiguous sentence in which multiple derivations share the highest probability, the returned parse is not guaranteed to be the left-most parse (I think).
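
To illustrate the tie case in the last limitation, here is a sketch of selecting a highest-probability derivation from a list of candidates. The `Derivation` type and `pickViterbi` function are hypothetical and do not reflect how the library computes its Viterbi parse internally; they only show that when several derivations share the top score, the winner depends on iteration order:

```typescript
// Hypothetical shape for illustration; the library's real result type may differ.
interface Derivation {
  label: string;
  probability: number;
}

// Return the first derivation that reaches the maximum probability.
// On a tie, whichever tied candidate is visited first wins.
function pickViterbi(derivations: Derivation[]): Derivation | undefined {
  let best: Derivation | undefined = undefined;
  for (const d of derivations) {
    if (best === undefined || d.probability > best.probability) {
      best = d;
    }
  }
  return best;
}

// Made-up candidates: the first two are tied for the highest probability.
const candidates: Derivation[] = [
  { label: "attach PP to verb", probability: 0.03 },
  { label: "attach PP to noun", probability: 0.03 },
  { label: "other", probability: 0.01 },
];

const forward = pickViterbi(candidates);
const backward = pickViterbi([...candidates].reverse());

console.log(forward?.label);  // "attach PP to verb"
console.log(backward?.label); // "attach PP to noun"
```

Both results are equally valid most-likely parses, so callers that need a deterministic choice would have to impose their own tie-breaking rule.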
176177

177178
## License
178179
This software is licensed under a permissive [MIT license](https://opensource.org/licenses/MIT).
