syntax-tokenize is the first stage of Oak's parsing pipeline. It converts raw Oak source text into a stream of typed token objects, preserving shebangs, newlines, and comments alongside the semantic tokens. The token stream is later filtered and consumed by syntax-parse.
tok := import('syntax-tokenize')
{ Tokenizer: Tokenizer, renderToken: renderToken, renderPos: renderPos, shebang?: shebang? } := import('syntax-tokenize')
Returns true when text begins with a shebang (#!) line.
shebang?('#!/usr/bin/env oak\nfn main ...') // => true
shebang?('fn main ...') // => false
Formats a position triple [index, line, col] as a [line:col] string.
renderPos([0, 3, 12]) // => '[3:12]'
Returns a human-readable string describing a token, including its type, optional value, and position. Useful for error messages.
renderToken({ type: :identifier, val: 'foo', pos: [0, 1, 1] })
// => ':identifier(foo) [1:1]'
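
As a sketch of the error-message use case, the rendered token can be appended to a message prefix. The helper name `unexpectedTokenError` and the message wording are assumptions made for illustration; they are not part of syntax-tokenize.

```oak
{ renderToken: renderToken } := import('syntax-tokenize')

// unexpectedTokenError is a hypothetical helper, not a library function.
// It prefixes the rendered token with a short description of the problem.
fn unexpectedTokenError(tok) 'Unexpected token ' + renderToken(tok)

unexpectedTokenError({ type: :identifier, val: 'foo', pos: [0, 1, 1] })
// => 'Unexpected token :identifier(foo) [1:1]', given the renderToken format documented here
```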
Creates a tokenizer object for source. Call .tokenize() on it to produce the full token list.
{ tokenize: tokenize } := Tokenizer(sourceCode)
tokens := tokenize()
The returned list contains every token, including :newline and :comment tokens. Filter these out before passing the list to the parser to build an AST.
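
One way to do that filtering is with `filter` from Oak's standard library. This is a minimal sketch, not necessarily how syntax-parse does it; `sourceCode` and `semanticTokens` are names chosen for the example.

```oak
{ filter: filter } := import('std')
{ Tokenizer: Tokenizer } := import('syntax-tokenize')

// Hypothetical input; any Oak source string works here.
sourceCode := 'x := 1 // the answer\n'

{ tokenize: tokenize } := Tokenizer(sourceCode)

// Drop :newline and :comment tokens, keeping everything else in order.
semanticTokens := tokenize() |> filter(fn(tok) {
    (tok.type != :newline) & (tok.type != :comment)
})
```

Tools that need to preserve formatting, such as a pretty-printer, would likely skip this step and work from the unfiltered list.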
Each token is an object with the following fields:

| Field | Type | Description |
|---|---|---|
| `type` | atom | Token type (see the tables below). |
| `val` | string | Token value, or `?` (null) for punctuation with no content. |
| `pos` | list | `[byteIndex, line, col]`; line and column are 1-based. |
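
To make the field layout concrete, here is a hand-written token of the shape described above. The values are illustrative (they assume the `:=` in `x := 1` and a 0-based byte index) and were not produced by running the tokenizer.

```oak
// A token for punctuation: val is ? because := carries no content.
tok := { type: :assign, val: ?, pos: [2, 1, 3] }

tok.type // => :assign
tok.val // => ?
tok.pos.0 // => 2, the byte index
tok.pos.1 // => 1, the line
tok.pos.2 // => 3, the column
```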

| Type | Example |
|---|---|
| `:stringLiteral` | `'hello'` |
| `:numberLiteral` | `42`, `3.14` |
| `:trueLiteral` | `true` |
| `:falseLiteral` | `false` |

| Type | Example |
|---|---|
| `:identifier` | `foo`, `bar?` |
| `:atom` | `:ok`, `:error` |
| `:ifKeyword` | `if` |
| `:fnKeyword` | `fn` |
| `:withKeyword` | `with` |
| `:csKeyword` | `cs` |
| `:underscore` | `_` |

| Type | Symbol | Type | Symbol |
|---|---|---|---|
| `:assign` | `:=` | `:nonlocalAssign` | `<-` |
| `:pipeArrow` | `\|>` | `:branchArrow` | `->` |
| `:pushArrow` | `<<` | `:rshift` | `>>` |
| `:plus` | `+` | `:minus` | `-` |
| `:times` | `*` | `:divide` | `/` |
| `:modulus` | `%` | `:xor` | `^` |
| `:and` | `&` | `:or` | `\|` |
| `:eq` | `=` | `:neq` | `!=` |
| `:greater` | `>` | `:less` | `<` |
| `:geq` | `>=` | `:leq` | `<=` |
| `:exclam` | `!` | `:colon` | `:` |
| `:ellipsis` | `...` | `:qmark` | `?` |

| Type | Symbol |
|---|---|
| `:leftParen` | `(` |
| `:rightParen` | `)` |
| `:leftBracket` | `[` |
| `:rightBracket` | `]` |
| `:leftBrace` | `{` |
| `:rightBrace` | `}` |
| `:dot` | `.` |
| `:comma` | `,` |

| Type | Notes |
|---|---|
| `:newline` | Significant for semicolon insertion; usually filtered. |
| `:comment` | `// ...` line comment content. |
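
Because `type` is an atom, code that consumes tokens can branch on it directly with an `if` expression. The sketch below uses a hypothetical predicate, `literal?`, that is not part of syntax-tokenize; the token objects are hand-written for illustration.

```oak
// literal? is a hypothetical helper: true for the four literal token types.
fn literal?(tok) if tok.type {
    :stringLiteral -> true
    :numberLiteral -> true
    :trueLiteral -> true
    :falseLiteral -> true
    _ -> false
}

literal?({ type: :numberLiteral, val: '42', pos: [0, 1, 1] }) // => true
literal?({ type: :comma, val: ?, pos: [2, 1, 3] }) // => false
```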