# SearchExpressionParser

Parses search strings (as in: what you put into a search engine) into evaluable expressions.

## Parsing

Call `Parser.parse(searchString:)`. It returns a tree of the parsed expression combinations. You can then ask the resulting `Expression` object whether it matches a given haystack, for example:

```swift
import SearchExpressionParser

guard let expr = try? Parser.parse(searchString: "Hello") else { fatalError() }
expr.isSatisfied(by: "Hello World!") // true
```

Empty search strings evaluate to a wildcard matching anything.

### Efficient full-text search

To use search expressions efficiently in an app, I found it beneficial to operate on an all-lowercase representation of the text and use C's `strstr`.

So in a note-taking app, for example, you should consider lowercasing your notes in memory and then using C-string comparison for the expressions.
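
Stripped of the library, the idea boils down to something like the following sketch. `lowercasedCString(_:)` is a hypothetical helper, not part of SearchExpressionParser. Note that the needle has to go through the same lowercasing as the haystack; otherwise `strstr` would never find search terms that contain uppercase letters.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical helper, not part of SearchExpressionParser: lowercase the text,
// normalize it to precomposed (NFC) characters, and keep the UTF-8 C string.
func lowercasedCString(_ text: String) -> [CChar] {
    return text
        .lowercased()
        .precomposedStringWithCanonicalMapping
        .cString(using: .utf8)!
}

// Convert the haystack once, e.g. when a note is loaded, and keep it in memory.
let haystack = lowercasedCString("Hello World!")

// Search terms go through the same conversion before the lookup.
let needle = lowercasedCString("WORLD")

// `strstr` returns a pointer to the first occurrence of `needle`, or nil.
let found = strstr(haystack, needle) != nil
print(found) // true
```

The conversion cost is paid once per note, while every search afterwards is a plain byte-wise `strstr` scan.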

First, make your text type conform to the `CStringExpressionSatisfiable` protocol:

```swift
import Foundation
import SearchExpressionParser

struct Note {
    let text: String
    private let cString: [CChar]

    init(text: String) {
        self.text = text
        self.cString = text
            // Lowercase once, up front, so matching ignores case
            .lowercased()
            // Favor precomposed characters (e.g. "é" as one scalar) over
            // a base letter followed by a combining mark
            .precomposedStringWithCanonicalMapping
            .cString(using: .utf8)!
    }
}

extension Note: CStringExpressionSatisfiable {
    func matches(needle: [CChar]) -> Bool {
        // strstr returns nil if `needle` does not occur in the note
        return strstr(self.cString, needle) != nil
    }
}
```

Then pass an instance of it to the expression:

```swift
let warAndPeace = Note(text: try! String(contentsOfFile: "books/Tolstoy/War-and-Peace.txt", encoding: .utf8))
let protagonist = try! Parser.parse(searchString: "\"Pierre Bezukhov\" OR \"Pyotr Kirillovich\"")
protagonist.isSatisfied(by: warAndPeace) // true
```

This sadly puts the burden of implementing the matching algorithm on your side, but that is by design: you keep the C string around instead of relying on the framework to convert the text for you on the fly, which would be useless because the conversion would then run again on every search. The speed gain is well worth the couple of lines of code compared to regular `String.contains` matching, which gets even slower when emoji are involved.

### Operators

Operators are all caps: `AND`, `OR`, `NOT`/`!`.

- `foo bar baz` is equivalent to `foo AND bar AND baz`
- `NOT b` equals `!b`
- `! b` (note the space) is `! AND b`
- `"!b"` is a phrase search for "!b", matching the literal exclamation mark
- Instead of a phrase search, you can also escape the exclamation mark: `\!b`
- Escaping inside phrase searches works, too: `hello "you \"lovely\" specimen"`
- Escaping operator keywords makes them literal: `\AND`. Note that a lowercase "and" is not treated as an operator; only all-caps keywords are.
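
To make these rules concrete, here is a hypothetical tokenizer sketch. It is not SearchExpressionParser's implementation: `Token` and `tokenize(_:)` are made-up names, and the sketch only splits the input into terms, operators, and parentheses, leaving implicit `AND` and tree building aside.

```swift
// Hypothetical tokenizer illustrating the operator and escaping rules above.
// Not the library's parser; all names here are invented for illustration.
enum Token: Equatable {
    case term(String)          // a word or phrase to search for literally
    case and, or, not          // operators
    case openParen, closeParen
}

func tokenize(_ input: String) -> [Token] {
    let chars = Array(input)
    var tokens: [Token] = []
    var i = 0
    while i < chars.count {
        let c = chars[i]
        if c.isWhitespace {
            i += 1
        } else if c == "(" {
            tokens.append(.openParen); i += 1
        } else if c == ")" {
            tokens.append(.closeParen); i += 1
        } else if c == "!", i + 1 < chars.count, !chars[i + 1].isWhitespace {
            // `!b` negates; a lone `!` followed by a space becomes a plain term below.
            tokens.append(.not); i += 1
        } else if c == "\"" {
            // Phrase search: read up to the next unescaped quote.
            var phrase = ""
            i += 1
            while i < chars.count, chars[i] != "\"" {
                if chars[i] == "\\", i + 1 < chars.count { i += 1 }  // `\"` -> `"`
                phrase.append(chars[i]); i += 1
            }
            i += 1  // skip the closing quote
            tokens.append(.term(phrase))
        } else {
            // Bare word: a backslash makes the next character literal.
            var word = ""
            var escaped = false
            while i < chars.count, !chars[i].isWhitespace, chars[i] != "(", chars[i] != ")" {
                if chars[i] == "\\", i + 1 < chars.count { i += 1; escaped = true }
                word.append(chars[i]); i += 1
            }
            // Only unescaped, all-caps keywords are operators.
            switch (word, escaped) {
            case ("AND", false): tokens.append(.and)
            case ("OR", false): tokens.append(.or)
            case ("NOT", false): tokens.append(.not)
            default: tokens.append(.term(word))
            }
        }
    }
    return tokens
}

// `! b` has a space, so `!` is a plain term: [.term("!"), .term("b")]
let example = tokenize("! b")
```

A parser would then join adjacent terms with an implicit `AND`, which is how `! b` ends up meaning `! AND b`.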

You can parenthesize expressions:

    !(foo OR (baz AND !bar))

... is, of course, equivalent to:

    !foo AND (!baz OR bar)
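
De Morgan's laws get you there in two steps: `!(foo OR X)` becomes `!foo AND !X`, and `!(baz AND !bar)` becomes `!baz OR bar`. The following brute-force check is plain Swift, independent of the library; it compares both forms for every combination of the three terms being present (`true`) or absent (`false`):

```swift
// The original expression: !(foo OR (baz AND !bar))
func original(_ foo: Bool, _ bar: Bool, _ baz: Bool) -> Bool {
    return !(foo || (baz && !bar))
}

// The rewritten expression: !foo AND (!baz OR bar)
func rewritten(_ foo: Bool, _ bar: Bool, _ baz: Bool) -> Bool {
    return !foo && (!baz || bar)
}

// Compare both on all 2^3 = 8 truth assignments.
let values = [false, true]
var allEqual = true
for foo in values {
    for bar in values {
        for baz in values where original(foo, bar, baz) != rewritten(foo, bar, baz) {
            allEqual = false
            print("Mismatch at foo=\(foo) bar=\(bar) baz=\(baz)")
        }
    }
}
print(allEqual) // true
```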

As of yet, there is no real operator precedence implementation, because the full-text search context I was using this in didn't need it. Use parentheses when the grouping matters.

The `Expression` object of this nested term looks like this:

    // !(foo OR (baz AND !bar))
    NotNode(
        OrNode(lhs: ContainsNode("foo"),
               rhs: AndNode(lhs: ContainsNode("baz"),
                            rhs: NotNode(ContainsNode("bar")))))

### Expressions

When you call the high-level `Parser.parse(searchString:)` entry point, you get back an object that conforms to `Expression`.

The `Expression` protocol is:

    public protocol Expression {
        func isSatisfied(by satisfiable: StringExpressionSatisfiable) -> Bool
        func isSatisfied(by satisfiable: CStringExpressionSatisfiable) -> Bool
    }

You pass the haystack, i.e. the text you want to search, to `isSatisfied(by:)`.

When the case of words doesn't matter, remember that it's much faster to make the text you want to search conform to `CStringExpressionSatisfiable` and pass _that_ in instead. See "Efficient full-text search" above for details.

The expressions provided are:

- `AnythingNode` matches anything you pass in; it's the wildcard, or empty search.
- `ContainsNode` represents a check similar to `String.contains`.
- `NotNode` wraps one other node and inverts its result.
- `AndNode` and `OrNode` both take two other nodes and combine their results with the equivalent boolean operator.
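
To see how these node kinds combine, here is a self-contained sketch that models them with a single hypothetical `Node` enum instead of the library's separate types; the enum and its cases are illustrative, not SearchExpressionParser's API. It rebuilds the `!(foo OR (baz AND !bar))` tree from the Operators section and evaluates it against a few strings, with `contains` doing a plain, case-sensitive `String.contains` check:

```swift
import Foundation

// Hypothetical stand-in for the library's node types, for illustration only.
indirect enum Node {
    case anything              // like AnythingNode: the wildcard / empty search
    case contains(String)      // like ContainsNode: substring check
    case not(Node)             // like NotNode: inverts its operand
    case and(Node, Node)       // like AndNode
    case or(Node, Node)        // like OrNode

    func isSatisfied(by haystack: String) -> Bool {
        switch self {
        case .anything:
            return true
        case .contains(let needle):
            return haystack.contains(needle)
        case .not(let node):
            return !node.isSatisfied(by: haystack)
        case .and(let lhs, let rhs):
            return lhs.isSatisfied(by: haystack) && rhs.isSatisfied(by: haystack)
        case .or(let lhs, let rhs):
            return lhs.isSatisfied(by: haystack) || rhs.isSatisfied(by: haystack)
        }
    }
}

// !(foo OR (baz AND !bar))
let tree = Node.not(
    .or(.contains("foo"),
        .and(.contains("baz"), .not(.contains("bar")))))

print(tree.isSatisfied(by: "baz and bar")) // true
print(tree.isSatisfied(by: "only baz"))    // false
print(tree.isSatisfied(by: "foo"))         // false
print(Node.anything.isSatisfied(by: ""))   // true
```

Evaluation is a straightforward recursive walk: each composite node asks its children and combines their answers.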

## License

Copyright (c) 2018 Christian Tietze. Distributed under the MIT License.

## Apps that use this

- [The Archive](https://zettelkasten.de/the-archive/), a fast plain-text note-taking app for macOS.