Thanks @MambaWong for pointing out the errors #22 of this article
- cpython/Objects/longobject.c
- cpython/Include/longobject.h
- cpython/Include/longintrepr.h
After Python 3, there's only a type named int. The long type in Python 2.x is the int type in Python 3.x.
The structure of long object looks like the structure of tuple object. Obviously, there's only one field to store the real int value: ob_digit.
But how does CPython represent the variable-size int at the byte level? Let's see
Notice: when the value is 0, the ob_digit field doesn't store anything. The value 0 in ob_size indicates that the long object represents integer 0
i = 0There are two different types of ob_digit depending on your system.
#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
typedef int64_t stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT 30
#define _PyLong_DECIMAL_SHIFT 9 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE ((digit)1000000000) /* 10 ** DECIMAL_SHIFT */
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit; /* signed variant of digit */
typedef unsigned long twodigits;
typedef long stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT 15
#define _PyLong_DECIMAL_SHIFT 4 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE ((digit)10000) /* 10 ** DECIMAL_SHIFT */
I've modified the source code to change the value of PYLONG_BITS_IN_DIGIT to 15
But when it represents integer 1, ob_size becomes 1 and the field in ob_digit represents the value 1 with type unsigned short
i = 1When i becomes -1, the only difference from integer 1 is the value in the ob_size field. CPython interprets ob_size as a signed type to differentiate between positive and negative signs
i = -1The basic unit is type digit, which provides 2 bytes (16 bits) for storage. Since 1023 takes the rightmost 10 bits, the value of the ob_size field is still 1.
The integer 32767 is represented in the same way as usual
CPython doesn't use all the 16 bits in the digit field. The first bit of every digit is reserved; we will see why later
Notice: because digit is the smallest unit at the CPython abstract level, the order between bytes inside a single ob_digit is the same as your machine order (mine is little endian).
The order between digits in the ob_digit array is represented in most-significant-digit-in-the-rightmost order (little endian order).
We can have a better understanding with the integer value -262143.
The minus sign is stored in the ob_size field.
The integer 262143 (2^18 = 262144) in binary representation is 00000011 11111111 11111111
Why is the leftmost bit in digit reserved? Why is the order between digits in the ob_digit array represented as little-endian?
Let's try to add two integer values
i = 1073741824 - 1 # 1 << 30 == 1073741824
j = 1k = i + jFirst, initialize a temporary PyLongObject with size = max(size(i), size(j)) + 1
Step 1: sum the first digit in each ob_digit array into a variable named carry
Step 2: set the value in temp[0] to (carry & PyLong_MASK)
Step 3: right shift the carry up to the leftmost bit
Step 4: add the second digit in each ob_digit array to the result of carry
Step 5: set the value in temp[1] to (carry & PyLong_MASK)
Step 6: right shift the carry again
Go to step 4 and repeat until no more digit is left. Set the final carry to the last index of temp
The variable temp contains the sum. Now you see the reserved bit is used for the carry or borrow when you add/subtract an integer. The digits in the ob_digit array are stored in little-endian order so that the add/subtract operation can process each digit from left to right.
The subtraction operation is similar to the addition operation, so you can read the source code directly
CPython also uses a buffer pool to store frequently used integers
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];let's see
c = 0
d = 0
e = 0
print(id(c), id(d), id(e)) # 4480940400 4480940400 4480940400
a = -5
b = -5
print(id(a), id(b)) # 4480940240 4480940240
f = 1313131313131313
g = 1313131313131313
print(id(f), id(g)) # 4484604176 4484604016
















