Commit cd00e55

Linux: try to improve coherency of page table and memory layout sections
1 parent 2eec82e commit cd00e55

1 file changed

Lines changed: 129 additions & 103 deletions

docs/Linux.md

Lines changed: 129 additions & 103 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ into our Linux on OpenRISC tutorials. We will cover:
1616
* Boot loaders - we need to get Linux onto the system, we will explain how this
1717
is done.
1818
* Device tree - how does Linux know what hardware is available in the system
19-
* Toolchains - We covered this before, but a quick refresher on linux
19+
* Toolchains - We covered this before, but a quick refresher on Linux
2020
specific toolchains
2121
* Rootfs - Applications
2222
* Memory layout - we explain how devices, Linux and our user processes
@@ -32,32 +32,38 @@ If you wish to skip this you can continue directly with our tutorials:
3232

3333
### Boot loaders
3434

35-
The job of the [boot loader](https://en.wikipedia.org/wiki/Bootloader) is to prepare the operating system to boot
36-
and then boot it. In the most simple sense this means loading the operating system kernel into memory and then
37-
jumping to the entry point. Traditionally the popular Linux boot loader is [GRUB](https://www.gnu.org/software/grub/).
38-
However, on embedded Linux platforms like OpenRISC Linux more simple loaders are used. These include:
35+
The job of the [boot loader](https://en.wikipedia.org/wiki/Bootloader) is to
36+
prepare the operating system to boot and then boot it. In the simplest sense
37+
this means loading the operating system kernel into memory and then jumping to
38+
the entry point. Traditionally the popular Linux boot loader is
39+
[GRUB](https://www.gnu.org/software/grub/). However, on embedded Linux
40+
platforms like OpenRISC Linux simpler loaders are used. These include:
3941

4042
- For Simulators - or1ksim and QEMU provide built in boot loaders
4143
- FPGA Boards - For larger FPGA boards with litex support we use the litex bios
4244
- Tiny FPGA Boards - For tiny FPGA boards we use GDB as a simple boot loader
4345

44-
Simulators like `or1ksim` and `QEMU` have the ability to be passed a kernel ELF image from the command
45-
line. When the simulator is initialized they can read the ELF binary and load the bits directly into the simulator memory.
46-
In `QEMU` it will additionally generate and load a device tree to describe to the kernel what hardware
47-
is available, dynamically. After the system and memory are initialized the simulator CPU will jump to `0x100`
48-
the entry point of the OpenRISC platform.
49-
50-
On typical FPGA boards there is storage available to store a bootloader and devices available to store the operating system.
51-
For example on the [Digilent Arty](https://digilent.com/shop/arty-a7-100t-artix-7-fpga-development-board/) when
52-
the FPGA bitstream is programmed a ROM is programmed with the [litex bios](https://github.com/enjoy-digital/litex/blob/master/litex/soc/software/bios/main.c).
53-
This firmware plus boot loader will train DDR3 RAM before loading and jumping to the kernel entry point.
54-
The litex bios can load the operating system from an SD-card or from TFTP over a network connection.
55-
56-
On very Tiny FPGA boards like a base De0 Nano lacking non-volatile storage,
57-
there is no means to load an OS via SD-card or network. We use GDB, a debugger
46+
Simulators like `or1ksim` and `QEMU` have the ability to be passed a kernel ELF
47+
image from the command line. When the simulator is initialized they will read
48+
the ELF binary and load the binary content directly into the simulator memory.
49+
In `QEMU` it will additionally generate and load a device tree to describe to
50+
the kernel what hardware is available, dynamically. After the system and memory
51+
are initialized the simulator CPU will jump to `0x100`, the entry point of the
52+
OpenRISC platform.
53+
54+
On typical FPGA boards there is storage available to store a bootloader and
55+
devices available to store the operating system. For example on the [Digilent Arty](https://digilent.com/shop/arty-a7-100t-artix-7-fpga-development-board/)
56+
when the FPGA bitstream is programmed a ROM is programmed with the [litex bios](https://github.com/enjoy-digital/litex/blob/master/litex/soc/software/bios/main.c).
57+
This firmware plus boot loader will train DDR3 RAM before loading and jumping to
58+
the kernel entry point. The litex bios can load the operating system from an
59+
SD-card or from TFTP over a network connection.
60+
61+
On very tiny FPGA boards like a base De0 Nano lacking non-volatile storage,
62+
there may be no means to load an OS via SD-card or network. We use GDB, a debugger
5863
typically used to read and write CPU and memory state. We can leverage this to
59-
load ELF kernel images into memory over the JTAG debug interface. Once, memory
64+
load ELF kernel images into memory over a JTAG debug interface. Once memory
6065
is loaded we can reset the CPU to have it jump to `0x100` and boot the kernel.
66+
Address `0x100` is the OpenRISC default reset vector.
6167

6268
### Device tree
6369

@@ -69,13 +75,14 @@ a boot parameter via register `r3`.
6975

7076
The below is a very simple device tree source file describing an OpenRISC system
7177
with:
78+
7279
- 1 CPU
7380
- 1 UART at 0x90000000
7481
- 32 MB main memory at address 0x0
7582
- 20 MHz clock
7683

7784
The device tree will be compiled down to a `.dtb` binary file using the device
78-
tree compiler (`dtc`) durig the build processes. During the boot process the
85+
tree compiler (`dtc`) during the build process. During the boot process the
7986
kernel uses the device tree definitions to initialize devices and memory.
8087

8188
```
@@ -128,8 +135,9 @@ kernel uses the device tree definitions to initialize devices and memory.
128135

129136
To compile the Linux kernel itself the toolchain used is not very important,
130137
as the kernel doesn't depend on any toolchain runtime features. You can use
131-
any toolchain to build the kernel.
132-
However, if you want to build userspace applications choosing the correct
138+
any toolchain to build the kernel, as long as it is a recent OpenRISC
139+
toolchain.
140+
However, if you want to build user space applications choosing the correct
133141
toolchain requires some thought. The main choices are:
134142

135143
- [musl](../musl.html) - A lightweight and efficient toolchain
@@ -141,104 +149,98 @@ runtime installed.
141149

142150
### Rootfs
143151

144-
The rootfs is like the Linux distribution for an embedded linux.
152+
The rootfs is like the Linux distribution for an embedded Linux.
145153

146154
We provide some [prebuilt rootfs images](https://github.com/stffrdhrn/or1k-rootfs-build) to
147-
help get you started. The main choices are:
155+
help get you started. The top choices are:
148156

149157
- buildroot - a fully featured rootfs ideal for boards with an SD card, with
150-
well known utilties like `bash`.
151-
- busybox - a lightweight single binary rootfs, comming in at under 3MB
158+
well known utilities like `bash`.
159+
- busybox - a lightweight single binary rootfs, coming in at under 3MB
152160

153161
### Memory Layout
154162

155163
The OpenRISC is able to address up to 32-bits of address space giving us up
156164
to 4GB of addressable memory. The space is shared between user space, the
157-
kernel and hardware devices.
165+
kernel and hardware devices. Memory protection between processes is achieved
166+
using the OpenRISC memory management unit **MMU**.
158167

159-
Paging
168+
The OpenRISC MMU uses 8KB (13-bit) pages, leaving the most significant 19 bits
169+
for indexing into a software page table. The architecture uses a 2-level [page table](linux/mm/page_tables.rst)
170+
using 8 bits to index a 256-entry page directory and 11 bits to index leaf page tables of 2048 entries each.
160171

161-
Openrisc uses 2-level paging
172+
The **page global directory** or **pgd** looks like the following in OpenRISC:
162173

163174
```
164-
_ 11 bits for pte offset
165-
/
166-
| __-- 13 bit pages
167-
|/ \
168-
| |
169-
/ \ |
170-
0xfffe0000
171-
\/
172-
\_ top 8 bit used for pgd
173-
174-
175+
PGD (256 entries)
176+
177+
--> +-----+ PTE (2048 entries)
178+
| ptr |-------> +-----+
179+
| ptr |- | ptr |-------> PAGE
180+
| ptr | \ | ptr |
181+
| ptr | \ ...
182+
| ... | \
183+
| ptr | \ PTE
184+
+-----+ +----> +-----+
185+
| ptr |-------> PAGE
186+
| ptr |
187+
...
188+
189+
PMD, PUD and P4D are folded up on OpenRISC
175190
```
176191

177-
Notes for or1k PGD
192+
Virtual address bits are used to index into the page table
193+
and derive the physical address as below:
178194

179195
```
180-
PGD - dir top 8 bits - 256 enties pgd_offset
181-
PMD - mid 1
182-
PTE - entry least sig 11 bits of page - 2048 entries in PTE page
183-
184-
pte_offset
185-
return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
186-
187-
[ 8 ][ 11 ][ 13 ]
188-
189-
13 + 13-2 => 24
190-
1 << 24
191-
192-
#define PGDIR_SHIFT (PAGE_SHIFT + (PAGE_SHIFT-2))
193-
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
196+
+--------+--------+--------+--------+
197+
| 31 24 | 23 16 | 15 8 | 7 0 |
198+
+--------+--------+--------+--------+
199+
| | |
200+
| | v
201+
| | [12:0] in-page offset
202+
| +-------> [23:13] PTE index
203+
   +-----------------> [31:24] PGD index
204+
```
194205

195-
1 << 8
206+
These are defined in `page.h` and `pgtable.h` as follows:
196207

197-
1 Page per PTE / 4 => 2048
198-
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-2))
208+
From page.h:
199209

200-
2048
210+
```
211+
#define PAGE_SHIFT 13 // 8KB
212+
```
201213

202-
#define PTRS_PER_PGD (1UL << (32-PGDIR_SHIFT))
214+
From pgtable.h:
203215

204-
256
216+
```
217+
#define PGDIR_SHIFT (PAGE_SHIFT + (PAGE_SHIFT-2)) // 24
218+
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-2)) // 2048
219+
#define PTRS_PER_PGD (1UL << (32-PGDIR_SHIFT)) // 256
205220
221+
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
206222
#define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
207-
128
208-
209-
swapper_pg_dir[PTRS_PER_PGD];
210-
211-
if (ret) {
212-
memset(ret, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
213-
memcpy(ret + USER_PTRS_PER_PGD,
214-
swapper_pg_dir + USER_PTRS_PER_PGD,
215-
(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
216-
217-
}
218-
219-
0-128 - zeroed for users
220-
128-256 - copied from kernel
221-
222-
page
223-
31 ... 13 - this is what it should be
224-
225-
31 ... 10
226-
227-
* An OR32 PTE looks like this:
228-
*
229-
* | 31 ... 10 | 9 | 8 ... 6 | 5 | 4 | 3 | 2 | 1 | 0 |
230-
* Phys pg.num L PP Index D A WOM WBC CI CC
231-
*
232223
```
233224

225+
The definition of `USER_PTRS_PER_PGD` evaluates to 128. This macro is used to
226+
reserve the first 128 page directory entries for user space, leaving entries 128
227+
to 255 for kernel space.
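
The index arithmetic above can be sketched in C. This is a minimal, illustrative sketch rather than kernel code: the macros repeat the definitions quoted from `page.h` and `pgtable.h`, `TASK_SIZE` is assumed to be `0x80000000` (consistent with `USER_PTRS_PER_PGD` evaluating to 128), and `struct va_parts`, `split_vaddr` and `pgd_index_is_user` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Definitions repeated from page.h and pgtable.h as quoted above. */
#define PAGE_SHIFT   13                                /* 8KB pages */
#define PGDIR_SHIFT  (PAGE_SHIFT + (PAGE_SHIFT - 2))   /* 24 */
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT - 2))         /* 2048 */
#define PTRS_PER_PGD (1UL << (32 - PGDIR_SHIFT))       /* 256 */
#define PGDIR_SIZE   (1UL << PGDIR_SHIFT)

/* Assumption: user space covers the low 2GB of the virtual address space. */
#define TASK_SIZE         0x80000000UL
#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)

/* Compile-time checks that the macros give the values stated in the text. */
static_assert(PTRS_PER_PTE == 2048, "11 bits of PTE index");
static_assert(PTRS_PER_PGD == 256, "8 bits of PGD index");
static_assert(USER_PTRS_PER_PGD == 128, "lower half of the PGD is user space");

/* Hypothetical container for the three fields of a virtual address. */
struct va_parts {
    uint32_t pgd_index; /* bits [31:24] */
    uint32_t pte_index; /* bits [23:13] */
    uint32_t offset;    /* bits [12:0]  */
};

/* Split a 32-bit virtual address into its page table indexes and offset. */
struct va_parts split_vaddr(uint32_t vaddr)
{
    struct va_parts p;

    p.pgd_index = vaddr >> PGDIR_SHIFT;
    p.pte_index = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    p.offset = vaddr & ((1UL << PAGE_SHIFT) - 1);
    return p;
}

/* True when a PGD slot belongs to the user half of the directory. */
bool pgd_index_is_user(uint32_t pgd_index)
{
    return pgd_index < USER_PTRS_PER_PGD;
}
```

For example, `split_vaddr(0xc060b85c)` yields PGD index 192, PTE index 773 and in-page offset `0x185c`; index 192 falls in the kernel half of the page directory.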
228+
234229
#### Physical Addresses
235230

236231
In Linux SoCs our data caches are configured with a 31-bit address width.
237-
This means only the first 2GB of memory addresses are cached. This is useful
232+
This means only the first 2GB of the physical address space is cached. This is useful
238233
as it guarantees that all operations on addresses above `0x80000000` are not cached.
239234
We use these upper address ranges for IO devices which we do not want to be
240235
cached.
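
As a small illustration of this split, the check below treats any physical address with bit 31 set as uncached. It is a sketch under the assumptions stated in this section; `UNCACHED_BASE` and `paddr_is_cached` are hypothetical names, not kernel or hardware definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: first physical address outside the 31-bit cached range. */
#define UNCACHED_BASE 0x80000000u

static_assert(UNCACHED_BASE == (1u << 31), "cached range is 31 bits wide");

/* True when the physical address lies in the cached lower 2GB. */
bool paddr_is_cached(uint32_t paddr)
{
    return paddr < UNCACHED_BASE;
}
```

With the device tree example earlier in this document, main memory at `0x0` falls in the cached range while the UART at `0x90000000` does not.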
241236

237+
This means that technically OpenRISC systems cannot have more than 2GiB of main
238+
memory. However, due to the OpenRISC kernel not supporting highmem and some
239+
other reserved address space, the main memory limit is about 768MiB, which is
240+
plenty for OpenRISC embedded systems.
241+
242+
The physical address space looks like the following:
243+
242244
```
243245
Address Range | Description
244246
-------------------+---------------------------
@@ -250,8 +252,8 @@ Address Range | Description
250252
#### Virtual Memory
251253

252254
Virtual memory in Linux is split between kernel space and user space as below.
253-
There is 1GB reserved for the kernel, 2GB reserved for userspace and a 1GB hole
254-
which we reserver for other purposes.
255+
There is 1GB reserved for the kernel, 2GB reserved for user space and a 1GB hole
256+
which we reserve for other purposes.
255257

256258
OpenRISC uses 8KB pages.
257259

@@ -270,8 +272,8 @@ OpenRISC uses 8kb pages.
270272
+----
271273
```
272274

273-
If we look at the Linux kernel ELF binary we see the following.
274-
275+
We can see how this works in practice if we look at a Linux kernel ELF binary as
276+
below:
275277

276278
```
277279
readelf -S vmlinux
@@ -305,9 +307,22 @@ Section Headers:
305307
[23] .symtab SYMTAB 00000000 66aec58 066db0 10 24 14480 4
306308
[24] .strtab STRTAB 00000000 6715a08 069418 00 0 0 1
307309
[25] .shstrtab STRTAB 00000000 677ee20 0000ff 00 0 0 1
310+
311+
Program Headers:
312+
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
313+
LOAD 0x002000 0xc0000000 0x00000000 0x549344 0x549344 R E 0x2000
314+
LOAD 0x54c000 0xc054a000 0x0054a000 0x485ea0 0x499c20 RWE 0x2000
315+
NOTE 0x60d85c 0xc060b85c 0x0060b85c 0x00054 0x00054 R 0x4
316+
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
308317
```
309318

310-
For fat kernels the rootfs is built into `.data` section. As we can see below.
319+
Notice the **Program Headers** reveal that only some of the sections
320+
are loaded into memory. Many of the ELF binary sections above are used
321+
for debugging. The main executable section `.text` is loaded starting at physical address `0x0`.
322+
The other sections are added after that. The virtual addresses
323+
of the sections have a base of `0xc0000000`.
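
The relationship between the `VirtAddr` and `PhysAddr` columns above is a fixed offset, which can be sketched as below. The constant name `PAGE_OFFSET` follows common kernel naming, but treat it and the two helper functions as assumptions made for illustration; the value `0xc0000000` is read from the program headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kernel virtual base, taken from the VirtAddr column above. */
#define PAGE_OFFSET 0xc0000000u

static_assert((PAGE_OFFSET & 0x1fffu) == 0, "kernel base is 8KB page aligned");

/* Translate a kernel virtual address to the physical address it loads at. */
uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* Translate a physical load address back to its kernel virtual address. */
uint32_t kernel_phys_to_virt(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}
```

Applying `kernel_virt_to_phys` to the second `LOAD` segment's virtual address `0xc054a000` gives `0x0054a000`, matching its `PhysAddr` entry.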
324+
325+
For *"fat"* kernels a rootfs is built into the `.data` section, as we can see below.
311326

312327
```
313328
$ nm vmlinux | grep __irf_
@@ -324,15 +339,6 @@ In the above example, we can see the included data is about 3.4 MB in size.
324339
The rootfs is included into the kernel image using the Makefile and tools
325340
in the `usr/` directory of kernel source tree.
326341

327-
```
328-
Program Headers:
329-
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
330-
LOAD 0x002000 0xc0000000 0x00000000 0x549344 0x549344 R E 0x2000
331-
LOAD 0x54c000 0xc054a000 0x0054a000 0x485ea0 0x499c20 RWE 0x2000
332-
NOTE 0x60d85c 0xc060b85c 0x0060b85c 0x00054 0x00054 R 0x4
333-
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
334-
```
335-
336342
If we have a look at the ELF binary of a user space process we see the
337343
following:
338344

@@ -385,7 +391,10 @@ Program Headers:
385391
386392
```
387393

388-
When this is running we can see it maps into user space as follows.
394+
Notice how the virtual addresses of the loaded sections have a base address
395+
of `0x00000000`, not `0xc0000000` as we saw in the Linux kernel binary above.
396+
397+
When this binary is running we can see it maps into user space as follows.
389398

390399
```
391400
~ # cat /proc/1/maps
@@ -401,4 +410,21 @@ When this is running we can see it maps into user space as follows.
401410
7ff84000-7ffa6000 rw-p 00000000 00:00 0 [stack]
402411
```
403412

413+
We can see a few things looking at this map:
414+
415+
- The first page is not mapped; mapping starts at 0x2000. This
416+
  allows accesses to `0x0` to fault, which catches null pointer dereferences.
417+
- The binary sections are loaded into executable, read only and read write
418+
protected regions.
419+
- A dynamic heap has been allocated.
420+
- Shared libraries are mapped into memory space around the `0x30000000`
421+
range.
422+
- The stack is high in the virtual memory address space around `0x7fffffff`.
423+
It grows down.
424+
425+
### Conclusion
404426

427+
We have gone over some of the internals of the OpenRISC Linux implementation.
428+
We hope this helps you understand the fundamentals of embedded
429+
Linux and makes the Linux bring up tutorials that follow easier to work
430+
through.
