* Add what to verify section
* Fix syntax
* Minor updates to verify paragraph in elf part
* Update updates
* Fix broken link
* Move useful info section from appendices
* Changes requested
* Minor fix on Nasm appendice
* Minor typo fixes
* Minor fix on cross compiling chapter
* Minor fix on cross compile chapter
* changes requested
00_Introduction/01_README.md
Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,6 @@ Below the list of parts that compose the book:
25
25
* *Inter-Process Communication (IPC)* - This part looks at how we might implement IPC for our kernel, allowing isolated programs to communicate with each other in a controlled way.
26
26
* *Virtual File System (VFS)* - This part will cover how a kernel presents different file systems to the rest of the system. We'll also take a look at implementing a 'tempfs' that is loaded from a tape archive (tar), similar to initrd.
27
27
* *The ELF format* - Once we have a file system we can load files from it, why not load a program? This part looks at writing a simple program loader for ELF64 binaries, and why you would want to use this format.
28
-
* *Going Beyond* - The final part (for now). We have implemented all the core components of a kernel, and we are free to go from here. This final chapter contains some ideas for new components that you might want to add, or at least begin thinking about.
28
+
* *Going Beyond* - The final part (for now). We have implemented all the core components of a kernel, and we are free to go from here. This final chapter contains some ideas for new components that we might want to add, or at least begin thinking about.
29
29
30
30
In the appendices we cover various topics: debugging tips, language-specific information, troubleshooting, etc.
You'll notice this function looks almost identical to the `create_process` function from before. That's because a lot of it is the same! The first part of the function is just inserting the new thread at the end of the list of threads.
252
+
We can see that this function looks almost identical to the `create_process` outlined earlier in this chapter. That's because a lot of it is the same! The first part of the function is just inserting the new thread at the end of the list of threads.
253
253
254
254
Let's look at how our `create_process` function would look now:
09_Loading_Elf/01_Elf_Theory.md
Lines changed: 3 additions & 3 deletions
@@ -78,12 +78,12 @@ Finding the program headers within an ELF binary is also quite straightforward.
78
78
Like section headers, each program header is tightly packed against the next one. This means that program headers can be treated as an array. As an example, it is possible to loop through the _phdrs_ as follows:
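The code that follows this sentence in the chapter is not part of the hunk shown here. As a rough sketch only (not the book's actual code), a loop over the program headers could look like the block below. The struct layouts follow the ELF64 specification; the function name `count_loadable_phdrs` and the assumption that the whole file is already in memory are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF64 file header, laid out as in the ELF64 specification. */
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;      /* file offset of the first program header */
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;  /* size of one program header */
    uint16_t e_phnum;      /* number of program headers */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf64_Ehdr;

/* ELF64 program header, laid out as in the ELF64 specification. */
typedef struct {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
} Elf64_Phdr;

static_assert(sizeof(Elf64_Ehdr) == 64, "unexpected ELF64 header size");
static_assert(sizeof(Elf64_Phdr) == 56, "unexpected ELF64 program header size");

#define PT_LOAD 1

/* Hypothetical helper: counts the PT_LOAD program headers of an ELF image
 * that is assumed to be fully loaded (and suitably aligned) at `image`. */
size_t count_loadable_phdrs(const void *image)
{
    const uint8_t *base = image;
    const Elf64_Ehdr *ehdr = image;
    size_t count = 0;

    for (uint16_t i = 0; i < ehdr->e_phnum; i++) {
        /* Headers are tightly packed, so entry i sits i * e_phentsize
         * bytes after the first one. */
        const Elf64_Phdr *phdr = (const Elf64_Phdr *)
            (base + ehdr->e_phoff + (size_t)i * ehdr->e_phentsize);
        if (phdr->p_type == PT_LOAD)
            count++;
    }
    return count;
}
```

Stepping by `e_phentsize` rather than `sizeof(Elf64_Phdr)` keeps the loop correct even if a file reports a larger entry size than the struct we declared.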
09_Loading_Elf/02_Loading_And_Running.md
Lines changed: 43 additions & 1 deletion
@@ -14,14 +14,56 @@ In the previous chapter we looked at the details of loading program headers, but
14
14
15
15
- First a copy of the ELF file to be loaded is needed. The recommended way is to load a file via the VFS, but it could be a bootloader module or even embedded into the kernel.
16
16
- Then once the ELF is loaded, we need to verify that its header is correct. Also check the architecture (machine type) matches the current machine, and that the bit-ness is correct (don't try to run a 32-bit program if you don't support it!).
17
-
- Find all the loadable program headers for the ELF, we'll need those in a moment.
17
+
- Find all the program headers with the type `PT_LOAD`, we'll need those in a moment.
18
18
- Create a new address space for the program to live in. This usually involves creating a new process with a new VMM instance, but the specifics will vary depending on your design. Don't forget to keep the kernel mappings in the higher half!
19
19
- Copy the loadable program headers into this new address space. Take care when writing this code, as the program headers may not be page-aligned. Don't forget to zero the extra bytes between `filesz` and `memsz`.
20
20
- Once loaded, set the appropriate permission on the memory each program header lives in: the write, execute (or no-execute) and user flags.
21
21
- Now we'll need to create a new thread to act as the main thread for this program, and set its entry point to the `e_entry` field in the ELF header. This field is the start function of the program. You'll also need to create a stack in the memory space of this program for the thread to use, if this wasn't already done as part of our thread creation.
22
22
23
23
If all of the above are done, then the program is ready to run! We now should be able to enqueue the main thread in the scheduler and let it run.
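To make the copy step more concrete, here is a minimal sketch of loading a single `PT_LOAD` program header. It assumes the destination memory for the segment is already mapped and writable, and it uses the hosted `memcpy`/`memset` from `<string.h>` for brevity (a kernel would use its own equivalents). The function name `load_segment` and its parameters are hypothetical; only the `Elf64_Phdr` layout comes from the ELF64 specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ELF64 program header, laid out as in the ELF64 specification. */
typedef struct {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;  /* where the segment's bytes start in the file */
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;  /* bytes present in the file */
    uint64_t p_memsz;   /* bytes the segment occupies in memory */
    uint64_t p_align;
} Elf64_Phdr;

static_assert(sizeof(Elf64_Phdr) == 56, "unexpected ELF64 program header size");

/* Hypothetical helper: copies one loadable segment from `file` to `dest`.
 * `dest` is assumed to point at memory already mapped for p_vaddr.
 * Returns 0 on success, -1 if the header is inconsistent with the file. */
int load_segment(void *dest, const void *file, uint64_t file_size,
                 const Elf64_Phdr *phdr)
{
    /* The in-memory size can never be smaller than the on-disk size. */
    if (phdr->p_filesz > phdr->p_memsz)
        return -1;

    /* The segment's bytes must lie entirely inside the file. */
    if (phdr->p_offset > file_size ||
        phdr->p_filesz > file_size - phdr->p_offset)
        return -1;

    /* Copy the part that is stored in the file... */
    memcpy(dest, (const uint8_t *)file + phdr->p_offset, phdr->p_filesz);

    /* ...and zero the remainder (for example .bss), which has no file data. */
    memset((uint8_t *)dest + phdr->p_filesz, 0,
           phdr->p_memsz - phdr->p_filesz);
    return 0;
}
```

The bounds checks come before the copy so that a malformed file cannot make the loader read past the end of the buffer.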
24
24
25
+
### Verifying an ELF file
26
+
27
+
When verifying an ELF file there are a few things we need to check in order to decide if an executable is valid. The fields to validate are at different points in the ELF header. Some can be found in the `e_ident` field, like the following:
28
+
29
+
* The first thing we want to check is the magic number (the `ELFMAG` part). It is expected to hold the following values:
30
+
31
+
| Value | Byte|
32
+
|-------|-----|
33
+
|`0x7f`| 0 |
34
+
|`E`| 1 |
35
+
|`L`| 2 |
36
+
|`F`| 3 |
37
+
38
+
* We need to check that the file class matches the one we support. There are two possible classes: 32-bit (`ELFCLASS32`, value 1) and 64-bit (`ELFCLASS64`, value 2). This is byte 4.
39
+
* The data field indicates the _endianness_, which again depends on the architecture used. It can take three values: None (0), LSB (1) and MSB (2). For example, the `x86_64` architecture is little-endian (LSB), so the value is expected to be 1. This field is byte 5.
40
+
* The version field is byte 6. For a valid ELF it has to be set to 1 (`EV_CURRENT`).
41
+
* The OS ABI and ABI version fields (bytes 7 and 8) identify the operating system and ABI the object targets, and the version of that ABI. For now we can ignore them; they should be 0.
42
+
43
+
The other fields that need validation (those that are not in the `e_ident` field) are:
44
+
45
+
* `e_type`: this identifies the type of ELF file. For our purposes the only value considered valid is 2, which indicates an executable file (`ET_EXEC`). There are other values that we could support in the future, for example the `ET_DYN` type used for position-independent code or shared objects, but they require more work.
46
+
* `e_machine`: this indicates the required architecture for the executable. The value depends on the architectures we support. For example, the value for the AMD64 architecture is `62` (`EM_X86_64`).
47
+
48
+
Be aware that most of the variables and their values have a specific naming convention, for more information refer to the ELF specs.
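The checks listed above can be sketched as a single function. This is only an illustration under some assumptions: it targets a 64-bit, little-endian `x86_64` kernel that accepts `ET_EXEC` files only, and the function name `elf_header_is_valid` is hypothetical. The constant names and the header layout follow the ELF specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Indices into e_ident and expected values, named as in the ELF spec. */
#define EI_CLASS     4
#define EI_DATA      5
#define EI_VERSION   6
#define ELFCLASS64   2
#define ELFDATA2LSB  1
#define EV_CURRENT   1
#define ET_EXEC      2
#define EM_X86_64    62

/* ELF64 file header, laid out as in the ELF64 specification. */
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf64_Ehdr;

static_assert(sizeof(Elf64_Ehdr) == 64, "unexpected ELF64 header size");

/* Hypothetical helper: returns true if the header describes a 64-bit,
 * little-endian, x86_64 executable. The OS ABI bytes are not checked,
 * matching the text's advice to ignore them for now. */
bool elf_header_is_valid(const Elf64_Ehdr *h)
{
    /* Magic number: 0x7f 'E' 'L' 'F' in bytes 0..3. */
    if (h->e_ident[0] != 0x7f || h->e_ident[1] != 'E' ||
        h->e_ident[2] != 'L'  || h->e_ident[3] != 'F')
        return false;

    if (h->e_ident[EI_CLASS] != ELFCLASS64)    /* byte 4: file class */
        return false;
    if (h->e_ident[EI_DATA] != ELFDATA2LSB)    /* byte 5: endianness */
        return false;
    if (h->e_ident[EI_VERSION] != EV_CURRENT)  /* byte 6: version */
        return false;

    if (h->e_type != ET_EXEC)                  /* only plain executables */
        return false;
    if (h->e_machine != EM_X86_64)             /* required architecture */
        return false;

    return true;
}
```

Returning a plain `bool` keeps the sketch short; a real loader would probably report which check failed, since that is useful when debugging a rejected binary.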
49
+
50
+
Beware that some compilers, when generating a simple executable, do not produce a file of type `ET_EXEC` but one of type `ET_REL` (value 1). To obtain an executable we need to link it with a linker. For example, if we generated `example.elf` with the `ET_REL` type, we can use `ld` (or another equivalent linker):
51
+
52
+
```sh
53
+
ld -o example.o example.elf
54
+
```
55
+
56
+
For basic executables, we most likely don't need to include any linker script.
57
+
58
+
If we want to know the type of an ELF file and we are on a Unix-like OS, we can use the `readelf` command:
59
+
```sh
60
+
readelf -e example.elf
61
+
```
62
+
63
+
This prints all the header information of the executable, including its type.
64
+
65
+
66
+
25
67
## Caveats
26
68
27
69
As we can already see from the above restrictions there is plenty of room for improvement. There are also some other things to keep in mind:
99_Appendices/C_Language_Info.md
Lines changed: 0 additions & 5 deletions
@@ -199,8 +199,3 @@ If unfamiliar with caching, each page can have a set of caching attributes appli
199
199
One of these is write-combine. It works by queueing up writes to nearby areas of memory, until a buffer is full, and then flushing them to main memory in a single access. This is much faster than accessing main memory each write.
200
200
201
201
However if we're working with an older x86 cpu, or another platform, this solution is not available. Hence `volatile` would do the job.
202
-
203
-
### Useful Links
204
-
205
-
* [IBM inline assembly guide](https://www.ibm.com/docs/en/xl-c-and-cpp-linux/13.1.4?topic=compatibility-inline-assembly-statements) This is not gcc related, but the syntax is identical, and it contains useful and concise info on the constraintsa
206
-
* [Fedora Inline assembly list of constraints](https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/how-to-use-inline-assembly-language-in-c-code.html#simple-constraints)
99_Appendices/D_Nasm.md
Lines changed: 4 additions & 3 deletions
@@ -79,15 +79,16 @@ quad_var:
79
79
dq 133.463 ; Example with a real number
80
80
```
81
81
82
-
But what if we want to declare a string? Well in this case we can use a different syntax for db:
82
+
If we want to declare a string, we need to use a different syntax for `db`:
83
83
84
84
```nasm
85
85
string_var:
86
86
db "Hello", 10
87
87
```
88
-
What does it mean? We are simply declaring a variable (string_variable) that starts at 'H', and fill the consecutive bytes with the next letters. But what about the last number? It is just an extra byte, that represents the newline character. So what we are really storing is the string _"Hello\\n"_
89
88
90
-
Now what we have seen so far is valid for a variable that can be initialized with a value, but what if we don't know the value yet, but we want just to "label" it with a variable name? Well is pretty simple, we have equivalent directives for reserving memory:
89
+
The above code means that we are declaring a variable (`string_var`) that starts at 'H', and filling the consecutive bytes with the next letters. And what about the last number? It is just an extra byte that represents the newline character, so what we are really storing is the string _"Hello\\n"_.
90
+
91
+
What we have seen so far is valid for a variable that can be initialized with a value, but what if we don't know the value yet and just want to "label" it with a variable name? It is pretty simple, as there are equivalent directives for reserving memory:
99_Appendices/E_Cross_Compilers.md
Lines changed: 4 additions & 4 deletions
@@ -36,7 +36,7 @@ There's three environment variables we're going to use during the build process:
36
36
37
37
```
38
38
export PREFIX="/your/path/to/cross/compiler"
39
-
export TARGET="riscv64-elf"
39
+
export TARGET="x86_64-elf"
40
40
export PATH="$PREFIX/bin:$PATH"
41
41
```
42
42
@@ -154,16 +154,16 @@ Qemu is quite a large program, so it's recommended to make use of all cores when
154
154
155
155
## GDB
156
156
157
-
The steps for building GDB are similar to binutils and GCC. We'll create a temporary working directory and move into it. Gdb has a few extra dependencies we'll need:
157
+
The steps for building GDB are similar to binutils and GCC. We'll create a temporary working directory and move into it. GDB has a few extra dependencies we'll need (package names can differ depending on the distribution used):
The last two options enable compiling the text-user-interface (`--enable-tui`) and source code highlighting (`--enable-source-highlight) which are nice-to-haves. These flags can be safely omitted if these aren't features we want.
166
+
The last two options enable compiling the text-user-interface (`--enable-tui`) and source code highlighting (`--enable-source-highlight`) which are nice-to-haves. These flags can be safely omitted if these aren't features we want.
167
167
168
168
The `--target=` flag is special here in that it can also take an option `all` which builds gdb with support for every single architecture it can support. If we're going to develop on one machine but test on multiple architectures (via qemu or real hardware) this is nice. It allows a single instance of gdb to debug multiple architectures without needing different versions of gdb. Often this is how the 'gdb-multiarch' package is created for distros that have it.
* [IBM inline assembly guide](https://www.ibm.com/docs/en/xl-c-and-cpp-linux/13.1.4?topic=compatibility-inline-assembly-statements) This is not gcc related, but the syntax is identical, and it contains useful and concise info on the constraints.
87
+
* [Fedora Inline assembly list of constraints](https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/how-to-use-inline-assembly-language-in-c-code.html#simple-constraints)