int

Thanks @MambaWong for pointing out the errors #22 of this article

related file
memory layout
how is element stored inside
small ints

related file

cpython/Objects/longobject.c
cpython/Include/longobject.h
cpython/Include/longintrepr.h

memory layout

After Python 3, there's only a type named int. The long type in Python 2.x is the int type in Python 3.x.

The structure of long object looks like the structure of tuple object. Obviously, there's only one field to store the real int value: ob_digit.

But how does CPython represent the variable-size int at the byte level? Let's see

how is element stored inside

ingeter 0

Notice: when the value is 0, the ob_digit field doesn't store anything. The value 0 in ob_size indicates that the long object represents integer 0

i = 0

ingeter 1

There are two different types of ob_digit depending on your system.

#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
typedef int64_t stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    30
#define _PyLong_DECIMAL_SHIFT   9 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)1000000000) /* 10 ** DECIMAL_SHIFT */
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit; /* signed variant of digit */
typedef unsigned long twodigits;
typedef long stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    15
#define _PyLong_DECIMAL_SHIFT   4 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)10000) /* 10 ** DECIMAL_SHIFT */

I've modified the source code to change the value of PYLONG_BITS_IN_DIGIT to 15

But when it represents integer 1, ob_size becomes 1 and the field in ob_digit represents the value 1 with type unsigned short

i = 1

ingeter -1

When i becomes -1, the only difference from integer 1 is the value in the ob_size field. CPython interprets ob_size as a signed type to differentiate between positive and negative signs

i = -1

ingeter 1023

The basic unit is type digit, which provides 2 bytes (16 bits) for storage. Since 1023 takes the rightmost 10 bits, the value of the ob_size field is still 1.

ingeter 32767

The integer 32767 is represented in the same way as usual

ingeter 32768

CPython doesn't use all the 16 bits in the digit field. The first bit of every digit is reserved; we will see why later

little endian and big endian

Notice: because digit is the smallest unit at the CPython abstract level, the order between bytes inside a single ob_digit is the same as your machine order (mine is little endian).

The order between digits in the ob_digit array is represented in most-significant-digit-in-the-rightmost order (little endian order).

We can have a better understanding with the integer value -262143.

The minus sign is stored in the ob_size field.

The integer 262143 (2^18 = 262144) in binary representation is 00000011 11111111 11111111

reserved bit

Why is the leftmost bit in digit reserved? Why is the order between digits in the ob_digit array represented as little-endian?

Let's try to add two integer values

i = 1073741824 - 1 # 1 << 30 == 1073741824
j = 1

k = i + j

First, initialize a temporary PyLongObject with size = max(size(i), size(j)) + 1

Step 1: sum the first digit in each ob_digit array into a variable named carry

Step 2: set the value in temp[0] to (carry & PyLong_MASK)

Step 3: right shift the carry up to the leftmost bit

Step 4: add the second digit in each ob_digit array to the result of carry

Step 5: set the value in temp[1] to (carry & PyLong_MASK)

Step 6: right shift the carry again

Go to step 4 and repeat until no more digit is left. Set the final carry to the last index of temp

The variable temp contains the sum. Now you see the reserved bit is used for the carry or borrow when you add/subtract an integer. The digits in the ob_digit array are stored in little-endian order so that the add/subtract operation can process each digit from left to right.

The subtraction operation is similar to the addition operation, so you can read the source code directly

small ints

CPython also uses a buffer pool to store frequently used integers

#define NSMALLPOSINTS           257
#define NSMALLNEGINTS           5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

let's see

c = 0
d = 0
e = 0
print(id(c), id(d), id(e)) # 4480940400 4480940400 4480940400
a = -5
b = -5
print(id(a), id(b)) # 4480940240 4480940240
f = 1313131313131313
g = 1313131313131313
print(id(f), id(g)) # 4484604176 4484604016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

int

contents

related file

memory layout

how is element stored inside

ingeter 0

ingeter 1

ingeter -1

ingeter 1023

ingeter 32767

ingeter 32768

little endian and big endian

reserved bit

small ints

FilesExpand file tree

long.md

Latest commit

History

long.md

File metadata and controls

int

contents

related file

memory layout

how is element stored inside

ingeter 0

ingeter 1

ingeter -1

ingeter 1023

ingeter 32767

ingeter 32768

little endian and big endian

reserved bit

small ints