Skip to content

Latest commit

 

History

History
214 lines (131 loc) · 7.25 KB

File metadata and controls

214 lines (131 loc) · 7.25 KB

int

Thanks @MambaWong for pointing out the errors #22 of this article

contents

related file

  • cpython/Objects/longobject.c
  • cpython/Include/longobject.h
  • cpython/Include/longintrepr.h

memory layout

memory layout

After Python 3, there's only a type named int. The long type in Python 2.x is the int type in Python 3.x.

The structure of long object looks like the structure of tuple object. Obviously, there's only one field to store the real int value: ob_digit.

But how does CPython represent the variable-size int at the byte level? Let's see

how is element stored inside

ingeter 0

Notice: when the value is 0, the ob_digit field doesn't store anything. The value 0 in ob_size indicates that the long object represents integer 0

i = 0

0

ingeter 1

There are two different types of ob_digit depending on your system.

#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
typedef int64_t stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    30
#define _PyLong_DECIMAL_SHIFT   9 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)1000000000) /* 10 ** DECIMAL_SHIFT */
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit; /* signed variant of digit */
typedef unsigned long twodigits;
typedef long stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT    15
#define _PyLong_DECIMAL_SHIFT   4 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE    ((digit)10000) /* 10 ** DECIMAL_SHIFT */

I've modified the source code to change the value of PYLONG_BITS_IN_DIGIT to 15

But when it represents integer 1, ob_size becomes 1 and the field in ob_digit represents the value 1 with type unsigned short

i = 1

1

ingeter -1

When i becomes -1, the only difference from integer 1 is the value in the ob_size field. CPython interprets ob_size as a signed type to differentiate between positive and negative signs

i = -1

-1

ingeter 1023

The basic unit is type digit, which provides 2 bytes (16 bits) for storage. Since 1023 takes the rightmost 10 bits, the value of the ob_size field is still 1.

1023

ingeter 32767

The integer 32767 is represented in the same way as usual

32767

ingeter 32768

CPython doesn't use all the 16 bits in the digit field. The first bit of every digit is reserved; we will see why later

32768

little endian and big endian

Notice: because digit is the smallest unit at the CPython abstract level, the order between bytes inside a single ob_digit is the same as your machine order (mine is little endian).

The order between digits in the ob_digit array is represented in most-significant-digit-in-the-rightmost order (little endian order).

We can have a better understanding with the integer value -262143.

The minus sign is stored in the ob_size field.

The integer 262143 (2^18 = 262144) in binary representation is 00000011 11111111 11111111

262143

reserved bit

Why is the leftmost bit in digit reserved? Why is the order between digits in the ob_digit array represented as little-endian?

Let's try to add two integer values

i = 1073741824 - 1 # 1 << 30 == 1073741824
j = 1

before_add

k = i + j

First, initialize a temporary PyLongObject with size = max(size(i), size(j)) + 1

temp_add

Step 1: sum the first digit in each ob_digit array into a variable named carry

step_1

Step 2: set the value in temp[0] to (carry & PyLong_MASK)

step_2

Step 3: right shift the carry up to the leftmost bit

step_3_rshift

Step 4: add the second digit in each ob_digit array to the result of carry

step_4

Step 5: set the value in temp[1] to (carry & PyLong_MASK)

step_4

Step 6: right shift the carry again

step_3_rshift

Go to step 4 and repeat until no more digit is left. Set the final carry to the last index of temp

step_final

The variable temp contains the sum. Now you see the reserved bit is used for the carry or borrow when you add/subtract an integer. The digits in the ob_digit array are stored in little-endian order so that the add/subtract operation can process each digit from left to right.

The subtraction operation is similar to the addition operation, so you can read the source code directly

k

small ints

CPython also uses a buffer pool to store frequently used integers

#define NSMALLPOSINTS           257
#define NSMALLNEGINTS           5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

let's see

c = 0
d = 0
e = 0
print(id(c), id(d), id(e)) # 4480940400 4480940400 4480940400
a = -5
b = -5
print(id(a), id(b)) # 4480940240 4480940240
f = 1313131313131313
g = 1313131313131313
print(id(f), id(g)) # 4484604176 4484604016

small_ints