|
| 1 | +# Typed Grammar Issues |
| 2 | + |
| 3 | +This document catalogs issues with generated typed grammars that cause fallback to the basic parser. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The typed parser uses auto-generated Lark grammars for each MF6 component type (e.g., `gwf-nam.lark`, `gwf-chd.lark`, `gwf-disv.lark`). When these grammars have errors or are incomplete, parsing falls back to the basic (untyped) parser. |
| 8 | + |
| 9 | +The basic parser returns blocks as flat lists of lines, requiring the mapper layer to use heuristic workarounds. The typed parser with `__default__()` returns properly structured dicts, eliminating the need for workarounds. |
| 10 | + |
| 11 | +## Issue 1: Rule Name Conflicts |
| 12 | + |
| 13 | +**Component**: GWF (model-level name file) |
| 14 | +**File**: `flopy4/mf6/codec/reader/grammar/generated/gwf-nam.lark` |
| 15 | +**Error**: `Rule 'list' defined more than once` |
| 16 | + |
| 17 | +### Root Cause |
| 18 | + |
| 19 | +The grammar imports a `list` rule from the typed module (for structured list/recarray data), then tries to define its own `list` rule for the LIST filename option: |
| 20 | + |
| 21 | +```lark |
| 22 | +%import typed.list -> list # Line 10: imports 'list' rule |
| 23 | +... |
| 24 | +list: "list"i string # Line 29: field rule for LIST option |
| 25 | +``` |
| 26 | + |
| 27 | +Lark doesn't allow duplicate rule names, so the grammar fails to compile. |
| 28 | + |
| 29 | +### Why This Happens |
| 30 | + |
| 31 | +The grammar generator creates a rule for each field using the field's name. When a field name conflicts with an imported rule name (`list`, `record`, `array`, etc.), we get a collision. |
| 32 | + |
| 33 | +### Fix Options |
| 34 | + |
| 35 | +**Option A**: Grammar generator should detect conflicts and rename field rules |
| 36 | +```lark |
| 37 | +list_option: "list"i string # Renamed to avoid conflict |
| 38 | +``` |
| 39 | + |
| 40 | +**Option B**: Grammar generator should use a consistent prefix for field rules |
| 41 | +```lark |
| 42 | +field_list: "list"i string |
| 43 | +field_newton: "newton"i "under_relaxation"i |
| 44 | +``` |
| 45 | + |
| 46 | +**Option C**: Don't import unused rules (if `list` structured data isn't used in this file) |
| 47 | + |
| 48 | +### Impact |
| 49 | + |
| 50 | +- GWF model-level files always fall back to basic parser |
| 51 | +- Affects every simulation (all have GWF name files) |
| 52 | +- Lower priority since name files are simple (just file listings) |
| 53 | + |
| 54 | +### Workaround |
| 55 | + |
| 56 | +The basic parser handles name files fine, just less cleanly. Not urgent. |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Issue 2: Missing Keywords in Grammar |
| 61 | + |
| 62 | +**Component**: NPF (Node Property Flow package) |
| 63 | +**File**: `flopy4/mf6/codec/reader/grammar/generated/gwf-npf.lark` |
| 64 | +**Error**: `Unexpected token Token('__ANON_2', 'wetfct') at line 3, column 10` |
| 65 | + |
| 66 | +### Root Cause |
| 67 | + |
| 68 | +The NPF file contains a `wetfct` keyword that's not defined in the generated grammar. This suggests: |
| 69 | +1. The grammar generator missed a field from the definition file |
| 70 | +2. The definition file is incomplete |
| 71 | +3. The keyword is deprecated or from a different MF6 version |
| 72 | + |
| 73 | +### Example File Content |
| 74 | + |
| 75 | +``` |
| 76 | +BEGIN options |
| 77 | + SAVE_FLOWS |
| 78 | + wetfct 0.1 |
| 79 | + ... |
| 80 | +END options |
| 81 | +``` |
| 82 | + |
| 83 | +The grammar likely has: |
| 84 | +```lark |
| 85 | +options_fields: (save_flows | ... )* |
| 86 | +# Missing: wetfct rule |
| 87 | +``` |
| 88 | + |
| 89 | +### Fix |
| 90 | + |
| 91 | +Check the NPF definition file (DFN) and ensure all valid options are included in the grammar generation process. Add: |
| 92 | +```lark |
| 93 | +wetfct: "wetfct"i double |
| 94 | +``` |
| 95 | + |
| 96 | +### Impact |
| 97 | + |
| 98 | +- NPF files with `wetfct` option fall back to basic parser |
| 99 | +- Affects models using wetting/drying functionality |
| 100 | +- Medium priority (common in certain model types) |
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +## Issue 3: Strict Whitespace Handling |
| 105 | + |
| 106 | +**Component**: DISV (Vertex discretization) |
| 107 | +**File**: `flopy4/mf6/codec/reader/grammar/generated/gwf-disv.lark` |
| 108 | +**Error**: `Unexpected token Token('NEWLINE', '\n') at line 8, column 10. Expected one of: END, NVERT, NCPL, NLAY` |
| 109 | + |
| 110 | +### Root Cause |
| 111 | + |
| 112 | +The grammar expects dimension fields to appear consecutively without extra whitespace: |
| 113 | + |
| 114 | +```lark |
| 115 | +dimensions_fields: (nlay | ncpl | nvert)* |
| 116 | +``` |
| 117 | + |
| 118 | +But MF6 files often have blank lines or extra spacing: |
| 119 | +``` |
| 120 | +BEGIN dimensions |
| 121 | + NLAY 1 |
| 122 | + NCPL 121 |
| 123 | + # <-- This blank line causes the error |
| 124 | + NVERT 148 |
| 125 | +END dimensions |
| 126 | +``` |
| 127 | + |
| 128 | +### Why This Happens |
| 129 | + |
| 130 | +The grammar correctly ignores whitespace within lines (`%ignore WS`) and comments (`%ignore SH_COMMENT`), but doesn't handle optional newlines between field definitions gracefully. |
| 131 | + |
| 132 | +The `*` operator means "zero or more fields", but the parser gets confused when it sees a NEWLINE after fields but before END. |
| 133 | + |
| 134 | +### Fix Options |
| 135 | + |
| 136 | +**Option A**: Make the grammar more permissive about newlines |
| 137 | +```lark |
| 138 | +dimensions_fields: (NEWLINE* (nlay | ncpl | nvert) NEWLINE*)* |
| 139 | +``` |
| 140 | + |
| 141 | +**Option B**: Add explicit newline handling in the grammar |
| 142 | +```lark |
| 143 | +dimensions_fields: (dimension_field)* |
| 144 | +dimension_field: (nlay | ncpl | nvert) NEWLINE+ |
| 145 | +``` |
| 146 | + |
| 147 | +**Option C**: Preprocess files to remove blank lines (not recommended) |
| 148 | + |
| 149 | +### Impact |
| 150 | + |
| 151 | +- DISV models always fall back to basic parser |
| 152 | +- Affects all vertex discretization models |
| 153 | +- **High priority** - common model type |
| 154 | + |
| 155 | +### Workaround |
| 156 | + |
| 157 | +The basic parser handles DISV files, but loses the clean dict structure benefits. The mapper workarounds compensate. |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## Issue 4: OC Print Format Keywords |
| 162 | + |
| 163 | +**Component**: OC (Output Control) |
| 164 | +**File**: `flopy4/mf6/codec/reader/grammar/generated/gwf-oc.lark` |
| 165 | +**Error**: `Unexpected token Token('__ANON_2', 'COLUMNS') at line 5, column 23` |
| 166 | + |
| 167 | +### Root Cause |
| 168 | + |
| 169 | +The OC file has print format keywords that aren't in the grammar: |
| 170 | + |
| 171 | +``` |
| 172 | +BEGIN options |
| 173 | + HEAD PRINT_FORMAT COLUMNS 10 WIDTH 15 DIGITS 6 GENERAL |
| 174 | + ... |
| 175 | +END options |
| 176 | +``` |
| 177 | + |
| 178 | +The grammar likely only has: |
| 179 | +```lark |
| 180 | +head: "head"i "fileout"i word |
| 181 | +``` |
| 182 | + |
| 183 | +But not: |
| 184 | +```lark |
| 185 | +head_format: "head"i "print_format"i "columns"i integer "width"i integer ... |
| 186 | +``` |
| 187 | + |
| 188 | +### Fix |
| 189 | + |
| 190 | +The grammar needs to handle both HEAD FILEOUT and HEAD PRINT_FORMAT variants: |
| 191 | +```lark |
| 192 | +head: head_fileout | head_format |
| 193 | +head_fileout: "head"i "fileout"i word |
| 194 | +head_format: "head"i "print_format"i format_spec |
| 195 | +format_spec: ("columns"i integer)? ("width"i integer)? ... |
| 196 | +``` |
| 197 | + |
| 198 | +### Impact |
| 199 | + |
| 200 | +- OC files with print format options fall back to basic parser |
| 201 | +- Common in models with detailed output control |
| 202 | +- Medium priority |
| 203 | + |
| 204 | +--- |
| 205 | + |
| 206 | +## Statistics |
| 207 | + |
| 208 | +### Fallback Frequency |
| 209 | + |
| 210 | +To measure fallback frequency, search for warnings in test output: |
| 211 | +```python |
| 212 | +grep "Typed parsing failed for" test_output.txt | sort | uniq -c |
| 213 | +``` |
| 214 | + |
| 215 | +Expected common failures: |
| 216 | +- **Gwf**: Every simulation (name file has `list` conflict) |
| 217 | +- **Disv**: Most vertex grid models (whitespace issue) |
| 218 | +- **Npf**: Some models (missing keywords) |
| 219 | +- **Oc**: Some models (format keywords) |
| 220 | + |
| 221 | +### Success Rate |
| 222 | + |
| 223 | +Components with **working** typed grammars: |
| 224 | +- CHD (Constant Head) |
| 225 | +- DRN (Drain) |
| 226 | +- WEL (Well) |
| 227 | +- RIV (River) |
| 228 | +- GHB (General Head Boundary) |
| 229 | +- RCH (Recharge) |
| 230 | +- Most other boundary packages |
| 231 | + |
| 232 | +These benefit from the `__default__()` structured output immediately. |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## TODO |
| 237 | + |
| 238 | +1. Fix high-priority grammar issues: |
| 239 | + - DISV whitespace handling (affects many models) |
| 240 | + - Rule name conflicts in GWF/GWT/GWE name files |
| 241 | +2. Validate generated grammars: add CI check that all grammars compile |
| 242 | +3. Document known limitations: which keywords/options aren't supported |
| 243 | +4. Consider whether basic parser can structure blocks as dicts too? |
| 244 | + |
| 245 | +## Testing Strategy |
| 246 | + |
| 247 | +### Verify Typed Parsing Works |
| 248 | + |
| 249 | +```python |
| 250 | +from flopy4.mf6.codec.reader import loads |
| 251 | +from flopy4.mf6.gwf.chd import Chd |
| 252 | + |
| 253 | +# Should NOT see fallback warning |
| 254 | +data = loads(chd_file_content, component_type=Chd) |
| 255 | +assert isinstance(data['dimensions'], dict) # Clean structure |
| 256 | +``` |
| 257 | + |
| 258 | +### Verify Fallback Still Works |
| 259 | + |
| 260 | +```python |
| 261 | +# Even if typed parsing fails, basic parser should work |
| 262 | +data = loads(disv_file_content, component_type=Disv) |
| 263 | +# Will see fallback warning, but still get data |
| 264 | +assert 'dimensions' in data |
| 265 | +``` |
0 commit comments