Commit d821505

Adding technical documentation
1 parent 50eb94c commit d821505

2 files changed

Lines changed: 77 additions & 0 deletions

docs/main.tex

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,8 @@ \section{#2}
5353

5454
\importsection{sections/first-program}{The first program}
5555

56+
\importsection{sections/language-documentation}{Language Documentation}
57+
5658
\importsection{sections/flags}{Flags}
5759

5860
\end{document}
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
Tau is a very simple programming language. It implements only a Turing Machine and an interface to it, and that interface is the language itself.
2+
3+
\subsection{EBNF}
4+
The following EBNF describes how the language is parsed. Not everything that can be built with this grammar is also a valid program.
5+
Any amount of whitespace may appear between two tokens.
6+
Token names are written in uppercase letters.
7+
8+
\begin{verbatim}
9+
program = head, body;
10+
head = { statement };
11+
body = { state };
12+
13+
statement = IDENTIFIER, EQUALS, ( list | IDENTIFIER );
14+
state = IDENTIFIER, CURLY_OPEN, { rule }, CURLY_CLOSE;
15+
rule = (IDENTIFIER | UNDERSCORE), EQUALS, IDENTIFIER, COMMA, DIRECTION, COMMA, IDENTIFIER;
16+
17+
list = IDENTIFIER, { COMMA, IDENTIFIER };
18+
19+
IDENTIFIER = character - "_", { character };
20+
character = "A" .. "Z" | "a" .. "z" | "0" .. "9" | "_";
21+
22+
EQUALS = "=";
23+
COMMA = ",";
24+
UNDERSCORE = "_";
CURLY_OPEN = "{";
CURLY_CLOSE = "}";
25+
26+
DIRECTION = "RIGHT" | "R" | "LEFT" | "L" | "STAY" | "S";
27+
\end{verbatim}
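The token definitions above can be sketched as a small tokenizer. This is only an illustration in Python, not the tokenizer that Tau uses. The names \code{TOKEN\_SPEC} and \code{tokenize} are hypothetical. Three assumptions are made: identifiers may contain digits (the examples in this section use symbols such as \code{0} and \code{1}), the curly tokens are the literal characters \code{\{} and \code{\}}, and a direction keyword wins over an identifier with the same spelling.

```python
import re

# Token patterns modelled on the EBNF. Assumptions (not confirmed by the source):
# digits are allowed in identifiers, the curly tokens are "{" and "}", and a
# direction keyword takes priority over an identifier with the same spelling.
TOKEN_SPEC = [
    ("DIRECTION", r"(?:RIGHT|LEFT|STAY|R|L|S)(?![A-Za-z0-9_])"),
    ("IDENTIFIER", r"[A-Za-z0-9][A-Za-z0-9_]*"),
    ("UNDERSCORE", r"_"),
    ("EQUALS", r"="),
    ("COMMA", r","),
    ("CURLY_OPEN", r"\{"),
    ("CURLY_CLOSE", r"\}"),
    ("WHITESPACE", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_name, text) pairs, skipping whitespace."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at offset {position}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Because whitespace is dropped during tokenizing, the grammar itself never has to mention it.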
28+
29+
\subsection{Linking}
30+
There are two different linking steps:
31+
\begin{enumerate}
32+
\item Linking the head, where the statements are checked and converted.
33+
\item Linking the body, where the references between states are built.
34+
\end{enumerate}
35+
36+
\subsubsection{Head Linking}
37+
The emulator that executes the defined Turing Machine does not use strings.
38+
This means that all strings, such as the symbols, have to be converted to integers.
39+
This is done by using the index of each symbol.
40+
41+
Statements can be defined in any order, so this linking can only be done after
42+
the whole head has been parsed, that is, when the parser finds the divider.
43+
44+
The linker checks whether all necessary statements are defined and applies default values for those that are missing.
45+
It also checks if the blank is a valid symbol.
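
The head linking described above can be sketched as follows. This is a minimal illustration in Python, not the actual linker. The statement names \code{symbols} and \code{blank}, the default values and the function \code{link\_head} are all hypothetical, because this section does not name them.

```python
# Hypothetical statement names and default values; the real ones are not
# specified in this section.
DEFAULTS = {"symbols": ["0", "1"], "blank": "0"}


def link_head(statements):
    """Convert parsed head statements (name -> value) into integer-based data."""
    # Statements may arrive in any order, so defaults are merged in only after
    # the whole head has been collected.
    head = {**DEFAULTS, **statements}
    symbols = head["symbols"]
    # Each symbol string is replaced by its index in the symbol list.
    symbol_index = {symbol: index for index, symbol in enumerate(symbols)}
    if head["blank"] not in symbol_index:
        raise ValueError(f"blank {head['blank']!r} is not a defined symbol")
    return {"symbol_index": symbol_index, "blank": symbol_index[head["blank"]]}
```

After this step the emulator only needs the integers; the mapping back to strings matters only for error messages and output.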
46+
47+
\subsubsection{Body Linking}
48+
States can be defined in any order. This means that a rule can reference a state
49+
that has not been parsed yet. The body is therefore linked only after the parser finds the end
50+
of the file.
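
Resolving references only after the whole file has been parsed can be sketched like this. The data layout (a dictionary from state name to a list of rule tuples) and the function \code{link\_states} are assumptions made for this illustration; they are not taken from the Tau source.

```python
def link_states(states):
    """Replace the state names used in rules with state indices.

    `states` maps a state name to a list of
    (read, write, direction, next_state) tuples, in parse order.
    """
    # All state names are known at this point, so forward references are fine.
    state_index = {name: index for index, name in enumerate(states)}
    linked = []
    for name, rules in states.items():
        linked_rules = []
        for read, write, direction, next_state in rules:
            if next_state not in state_index:
                raise NameError(f"state {name!r} refers to undefined state {next_state!r}")
            linked_rules.append((read, write, direction, state_index[next_state]))
        linked.append(linked_rules)
    return linked
```

With this layout a state \code{A} can refer to a state \code{B} that is defined further down in the file, because the index table is complete before any rule is resolved.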
51+
52+
The linker also requires every state to be exhaustive. This means that each state has to cover every symbol.
53+
For example, if the symbols are \code{0}, \code{1}, \code{2} and \code{3}, a state like this:
54+
\begin{verbatim}
55+
TEST {
56+
1 = 2, RIGHT, TEST
57+
2 = 2, STAY, HALT
58+
}
59+
\end{verbatim}
60+
will throw an error because no rule is defined for \code{0} and \code{3}. This happens even if the state can never read these symbols,
61+
because at this stage the linker does not know which rules will actually be used.
62+
63+
This can easily be avoided by adding a rule that matches any other symbol:
64+
\begin{verbatim}
65+
TEST {
66+
1 = 2, RIGHT, TEST
67+
2 = 2, STAY, HALT
68+
_ = 1, STAY, HALT
69+
}
70+
\end{verbatim}
71+
72+
The \code{\_} rule is executed whenever the symbol on the tape is not matched by any other rule of the state.
73+
It is like the \code{else} block in many programming languages.
74+
75+
A state cannot contain more than one \code{\_} rule.
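
The two checks described in this subsection, exhaustiveness and at most one \code{\_} rule per state, can be sketched as follows. The function \code{check\_state} and the tuple layout of a rule are hypothetical and only serve the illustration.

```python
WILDCARD = "_"


def check_state(name, rules, symbols):
    """Raise ValueError if a state is not exhaustive or has several '_' rules.

    `rules` is a list of (read, write, direction, next_state) tuples.
    """
    reads = [rule[0] for rule in rules]
    if reads.count(WILDCARD) > 1:
        raise ValueError(f"state {name!r} has more than one '_' rule")
    if WILDCARD in reads:
        return  # the wildcard covers every symbol without an explicit rule
    missing = [symbol for symbol in symbols if symbol not in reads]
    if missing:
        raise ValueError(f"state {name!r} has no rule for: {', '.join(missing)}")
```

Applied to the first \code{TEST} state above with the symbols \code{0} to \code{3}, this check reports \code{0} and \code{3} as missing; the second version with the \code{\_} rule passes.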
