Python-100-Days-English/Day31-35/31.python-advanced.md at main · tamnd/Python-100-Days-English

Advanced Python Topics

Important Knowledge Points

How to use comprehensions

prices = {
    'AAPL': 191.88,
    'GOOG': 1186.96,
    'IBM': 149.24,
    'ORCL': 48.44,
    'ACN': 166.89,
    'FB': 208.09,
    'SYMC': 21.29
}
# Build a new dictionary with stocks whose price is greater than 100
prices2 = {key: value for key, value in prices.items() if value > 100}
print(prices2)

Note: Comprehensions can be used to make lists, sets, and dictionaries.

A pitfall of nested lists

names = ['Guan Yu', 'Zhang Fei', 'Zhao Yun', 'Ma Chao', 'Huang Zhong']
courses = ['Chinese', 'Math', 'English']
# Enter the scores of five students in three courses
# Wrong - see http://pythontutor.com/visualize.html#mode=edit
# scores = [[None] * len(courses)] * len(names)
scores = [[None] * len(courses) for _ in range(len(names))]
for row, name in enumerate(names):
    for col, course in enumerate(courses):
        scores[row][col] = float(input(f'Please enter {name}\'s {course} score: '))
        print(scores)

Python Tutor - VISUALIZE CODE AND GET LIVE HELP

The heapq module (heap sort)

"""
Find the largest or smallest N elements in a list
Heap structure (max heap / min heap)
"""
import heapq

list1 = [34, 25, 12, 99, 87, 63, 58, 78, 88, 92]
list2 = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]
print(heapq.nlargest(3, list1))
print(heapq.nsmallest(3, list1))
print(heapq.nlargest(2, list2, key=lambda x: x['price']))
print(heapq.nlargest(2, list2, key=lambda x: x['shares']))

The itertools module

"""
Iteration tools module
"""
import itertools

# Produce all permutations of ABCD
itertools.permutations('ABCD')
# Produce 3-element combinations from ABCDE
itertools.combinations('ABCDE', 3)
# Produce the Cartesian product of ABCD and 123
itertools.product('ABCD', '123')
# Produce an infinite cycle of ABC
itertools.cycle(('A', 'B', 'C'))

The collections module

Common utility classes:
- namedtuple: a named tuple. It is a class factory. It accepts a type name and an attribute list to create a class.
- deque: a double-ended queue. It is another way to implement a list. In Python, a list is based on an array, but deque is based on a doubly linked list. So when you need to add or delete elements at both ends, deque has better performance, and its asymptotic time complexity is O(1).
- Counter: a subclass of dict. The key is the element, and the value is the count of that element. Its most_common() method can help us get the elements that appear most often. I think the inheritance relationship between Counter and dict is open to discussion. According to the CARP principle, it would be more reasonable to design the relationship between Counter and dict as an association relationship.
- OrderedDict: a subclass of dict. It records the order in which key-value pairs are inserted. It looks like it has the behavior of both a dictionary and a linked list.
- defaultdict: similar to a dictionary type, but it can get a default value for a key through a factory function. Compared with the setdefault() method of a dictionary, this is more efficient.
```
"""
Find the elements that appear most often in a sequence
"""
from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
    'look', 'into', 'my', 'eyes', "you're", 'under'
]
counter = Counter(words)
print(counter.most_common(3))
```

Data Structures and Algorithms

Algorithm: the method and steps used to solve a problem.
To judge whether an algorithm is good or bad, we usually look at asymptotic time complexity and asymptotic space complexity.
Big-O notation for asymptotic time complexity:
- O(c) - constant time complexity - Bloom filters / hash storage
- O(log_2n) - logarithmic time complexity - binary search
- O(n) - linear time complexity - sequential search / counting sort
- O(n * log_2n) - log-linear time complexity - advanced sorting algorithms (merge sort, quicksort)
- O(n^2) - square time complexity - simple sorting algorithms (selection sort, insertion sort, bubble sort)
- O(n^3) - cubic time complexity - Floyd's algorithm / matrix multiplication
- O(2^n) - geometric-series time complexity - Tower of Hanoi
- O(n!) - factorial time complexity - traveling salesman problem / NPC

Sorting algorithms (selection sort, bubble sort, merge sort) and searching algorithms (sequential search and binary search)

def select_sort(items, comp=lambda x, y: x < y):
    """Simple selection sort"""
    items = items[:]
    for i in range(len(items) - 1):
        min_index = i
        for j in range(i + 1, len(items)):
            if comp(items[j], items[min_index]):
                min_index = j
        items[i], items[min_index] = items[min_index], items[i]
    return items

def bubble_sort(items, comp=lambda x, y: x > y):
    """Bubble sort"""
    items = items[:]
    for i in range(len(items) - 1):
        swapped = False
        for j in range(len(items) - 1 - i):
            if comp(items[j], items[j + 1]):
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break
    return items

def bubble_sort(items, comp=lambda x, y: x > y):
    """Cocktail sort (an upgraded version of bubble sort)"""
    items = items[:]
    for i in range(len(items) - 1):
        swapped = False
        for j in range(len(items) - 1 - i):
            if comp(items[j], items[j + 1]):
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if swapped:
            swapped = False
            for j in range(len(items) - 2 - i, i, -1):
                if comp(items[j - 1], items[j]):
                    items[j], items[j - 1] = items[j - 1], items[j]
                    swapped = True
        if not swapped:
            break
    return items

def merge(items1, items2, comp=lambda x, y: x < y):
    """Merge (merge two sorted lists into one sorted list)"""
    items = []
    index1, index2 = 0, 0
    while index1 < len(items1) and index2 < len(items2):
        if comp(items1[index1], items2[index2]):
            items.append(items1[index1])
            index1 += 1
        else:
            items.append(items2[index2])
            index2 += 1
    items += items1[index1:]
    items += items2[index2:]
    return items


def merge_sort(items, comp=lambda x, y: x < y):
    return _merge_sort(list(items), comp)


def _merge_sort(items, comp):
    """Merge sort"""
    if len(items) < 2:
        return items
    mid = len(items) // 2
    left = _merge_sort(items[:mid], comp)
    right = _merge_sort(items[mid:], comp)
    return merge(left, right, comp)

def seq_search(items, key):
    """Sequential search"""
    for index, item in enumerate(items):
        if item == key:
            return index
    return -1

def bin_search(items, key):
    """Binary search"""
    start, end = 0, len(items) - 1
    while start <= end:
        mid = (start + end) // 2
        if key > items[mid]:
            start = mid + 1
        elif key < items[mid]:
            end = mid - 1
        else:
            return mid
    return -1

Common algorithms:

Exhaustive method, also called brute-force cracking. Check all possibilities until the correct answer is found.
Greedy method. When solving a problem, always make what seems to be the best choice at the current moment. It does not try to find the best solution, but quickly finds a satisfactory solution.
Divide-and-conquer method. Break a complex problem into two or more same or similar subproblems, then keep breaking the subproblems into smaller subproblems until they can be solved directly. At last, combine the solutions of the subproblems to get the solution of the original problem.
Backtracking method. The backtracking method is also called the trial method. Search forward according to the better-choice condition. When the search reaches some step and finds that the original choice is not good enough or cannot reach the goal, go back one step and choose again.
Dynamic programming. The basic idea is also to break the problem to be solved into several subproblems, solve these subproblems first and save their solutions, so that a large amount of repeated computation can be avoided.

Example of the exhaustive method: the hundred-money hundred-chickens problem and the five people dividing fish problem.

# A rooster costs 5 yuan, a hen costs 3 yuan, and 3 chicks cost 1 yuan
# Use 100 yuan to buy 100 chickens, how many roosters / hens / chicks are there
for x in range(20):
    for y in range(33):
        z = 100 - x - y
        if 5 * x + 3 * y + z // 3 == 100 and z % 3 == 0:
            print(x, y, z)

# Five people A, B, C, D, and E caught fish together one night and then fell asleep
# The next morning, A woke up first. He divided the fish into 5 parts, threw away 1 extra fish,
# and took his own part. B woke up second and also divided the fish in the same way.
# Then C, D, and E woke up one after another and divided the fish in the same way.
# Ask how many fish they caught at least
fish = 6
while True:
    total = fish
    enough = True
    for _ in range(5):
        if (total - 1) % 5 == 0:
            total = (total - 1) // 5 * 4
        else:
            enough = False
            break
    if enough:
        print(fish)
        break
    fish += 5

Example of the greedy method: suppose a thief has a backpack that can hold at most 20 kilograms of stolen goods. He breaks into a house and finds the items shown in the table below. Clearly, he cannot put all the items into the backpack, so he must decide which items to take away and which items to leave.

Name	Price (USD)	Weight (kg)
Computer	200	20
Radio	20	4
Clock	175	10
Vase	50	2
Book	10	1
Oil Painting	90	9

"""
Greedy method: when solving a problem, always make what seems to be the best
choice at the current moment. It does not try to find the best solution, but
quickly finds a satisfactory solution.
Input:
20 6
Computer 200 20
Radio 20 4
Clock 175 10
Vase 50 2
Book 10 1
OilPainting 90 9
"""
class Thing(object):
    """Item"""

    def __init__(self, name, price, weight):
        self.name = name
        self.price = price
        self.weight = weight

    @property
    def value(self):
        """Price-weight ratio"""
        return self.price / self.weight


def input_thing():
    """Input item information"""
    name_str, price_str, weight_str = input().split()
    return name_str, int(price_str), int(weight_str)


def main():
    """Main function"""
    max_weight, num_of_things = map(int, input().split())
    all_things = []
    for _ in range(num_of_things):
        all_things.append(Thing(*input_thing()))
    all_things.sort(key=lambda x: x.value, reverse=True)
    total_weight = 0
    total_price = 0
    for thing in all_things:
        if total_weight + thing.weight <= max_weight:
            print(f'The thief took {thing.name}')
            total_weight += thing.weight
            total_price += thing.price
    print(f'Total value: {total_price} dollars')


if __name__ == '__main__':
    main()

Example of the divide-and-conquer method: quicksort.

"""
Quicksort - choose a pivot to divide the elements. The left side is all
smaller than the pivot, and the right side is all larger than the pivot
"""
def quick_sort(items, comp=lambda x, y: x <= y):
    items = list(items)[:]
    _quick_sort(items, 0, len(items) - 1, comp)
    return items


def _quick_sort(items, start, end, comp):
    if start < end:
        pos = _partition(items, start, end, comp)
        _quick_sort(items, start, pos - 1, comp)
        _quick_sort(items, pos + 1, end, comp)


def _partition(items, start, end, comp):
    pivot = items[end]
    i = start - 1
    for j in range(start, end):
        if comp(items[j], pivot):
            i += 1
            items[i], items[j] = items[j], items[i]
    items[i + 1], items[end] = items[end], items[i + 1]
    return i + 1

Example of the backtracking method: Knight's tour.

"""
Recursive backtracking: also called the trial method. Search forward according
to the better-choice condition. When the search reaches some step and finds
that the original choice is not good enough or cannot reach the goal, then go
back one step and choose again. Classic problems include knight's tour, eight
queens, and maze path finding.
"""
import sys
import time

SIZE = 5
total = 0


def print_board(board):
    for row in board:
        for col in row:
            print(str(col).center(4), end='')
        print()


def patrol(board, row, col, step=1):
    if row >= 0 and row < SIZE and \
        col >= 0 and col < SIZE and \
        board[row][col] == 0:
        board[row][col] = step
        if step == SIZE * SIZE:
            global total
            total += 1
            print(f'The {total}th way: ')
            print_board(board)
        patrol(board, row - 2, col - 1, step + 1)
        patrol(board, row - 1, col - 2, step + 1)
        patrol(board, row + 1, col - 2, step + 1)
        patrol(board, row + 2, col - 1, step + 1)
        patrol(board, row + 2, col + 1, step + 1)
        patrol(board, row + 1, col + 2, step + 1)
        patrol(board, row - 1, col + 2, step + 1)
        patrol(board, row - 2, col + 1, step + 1)
        board[row][col] = 0


def main():
    board = [[0] * SIZE for _ in range(SIZE)]
    patrol(board, SIZE - 1, SIZE - 1)


if __name__ == '__main__':
    main()

Example of dynamic programming: the maximum value of the sum of elements in a sublist.

Note: A sublist means a list made up of elements with continuous indexes in the list. The elements in the list are of type int, and they may include positive integers, 0, and negative integers. The program inputs the elements in the list and outputs the maximum value of the sum of elements in a sublist. For example:

Input: 1 -2 3 5 -3 2

Output: 8

Input: 0 -2 3 5 -1 2

Output: 9

Input: -9 -2 -3 -5 -3

Output: -2

def main():
    items = list(map(int, input().split()))
    overall = partial = items[0]
    for i in range(1, len(items)):
        partial = max(items[i], partial + items[i])
        overall = max(partial, overall)
    print(overall)


if __name__ == '__main__':
    main()

Note: The easiest solution to think of for this problem is to use two nested loops, but the time performance of the code becomes very bad. By using the idea of dynamic programming and only two more variables, the original problem with O(N^2) complexity becomes O(N).

Ways to Use Functions

Treat functions as "first-class citizens"
- Functions can be assigned to variables
- Functions can be used as parameters of functions
- Functions can be used as return values of functions

How to use higher-order functions (filter, map, and their replacements)

items1 = list(map(lambda x: x ** 2, filter(lambda x: x % 2, range(1, 10))))
items2 = [x ** 2 for x in range(1, 10) if x % 2]

Positional parameters, variable parameters, keyword parameters, named keyword parameters
Metadata of parameters (a code readability issue)
How to use anonymous functions and inline functions (lambda functions)
Closures and the scope problem
- Python's LEGB order for searching variables (Local >>> Embedded >>> Global >>> Built-in)
- The roles of the global and nonlocal keywords
  
  global: declare or define a global variable (either directly use an existing variable in the global scope, or define a variable and put it into the global scope).
  
  nonlocal: declare that you are using a variable in a nested scope (the variable must exist in the nested scope, otherwise an error is raised).

Decorator functions (using decorators and removing decorators)

Example: a decorator that outputs the execution time of a function.

from functools import wraps
from time import time


def record_time(func):
    """Custom decorator for decorating a function"""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time()
        result = func(*args, **kwargs)
        print(f'{func.__name__}: {time() - start} seconds')
        return result

    return wrapper

If the decorator does not want to be coupled with the print function, we can write a parameterized decorator.

from functools import wraps
from time import time


def record(output):
    """Parameterized decorator"""

    def decorate(func):

        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time()
            result = func(*args, **kwargs)
            output(func.__name__, time() - start)
            return result

        return wrapper

    return decorate

from functools import wraps
from time import time


class Record():
    """Define a decorator by defining a class"""

    def __init__(self, output):
        self.output = output

    def __call__(self, func):

        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time()
            result = func(*args, **kwargs)
            self.output(func.__name__, time() - start)
            return result

        return wrapper

Note: Because the function with decorator function is decorated with @wraps, we can get the function or class before decoration by func.__wrapped__, and then remove the effect of the decorator.

Example: use a decorator to implement the singleton pattern.

from functools import wraps


def singleton(cls):
    """Decorator for decorating a class"""
    instances = {}

    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper


@singleton
class President:
    """President, singleton class"""
    pass

Tip: The code above uses a closure. I do not know whether you have noticed it. There is also one small problem. The code above does not implement a thread-safe singleton. How should it be done if we want to implement a thread-safe singleton?

Thread-safe singleton decorator.

from functools import wraps
from threading import RLock


def singleton(cls):
    """Thread-safe singleton decorator"""
    instances = {}
    locker = RLock()

    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            with locker:
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper

Tip: The code above uses with context syntax to do the lock operation, because the lock object itself is a context manager object, it supports the magic methods __enter__ and __exit__. In the wrapper function, we first do one check without lock, and then do one check with lock. This does better in performance than checking directly with lock. If the object has already been created, there is no need to lock again. We can directly return the object.

Object-Oriented Knowledge

The three pillars: encapsulation, inheritance, polymorphism

Example: salary settlement system.

"""
Monthly salary settlement system:
department manager 15000 every month
programmer 200 per hour
salesperson 1800 base salary plus 5% commission of sales
"""
from abc import ABCMeta, abstractmethod


class Employee(metaclass=ABCMeta):
    """Employee, abstract class"""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def get_salary(self):
        """Settle monthly salary, abstract method"""
        pass


class Manager(Employee):
    """Department manager"""

    def get_salary(self):
        return 15000.0


class Programmer(Employee):
    """Programmer"""

    def __init__(self, name, working_hour=0):
        self.working_hour = working_hour
        super().__init__(name)

    def get_salary(self):
        return 200.0 * self.working_hour


class Salesman(Employee):
    """Salesperson"""

    def __init__(self, name, sales=0.0):
        self.sales = sales
        super().__init__(name)

    def get_salary(self):
        return 1800.0 + self.sales * 0.05


class EmployeeFactory:
    """Factory for creating employees.
    Factory pattern: use a factory to decouple the object user and the object.
    """

    @staticmethod
    def create(emp_type, *args, **kwargs):
        """Create employee"""
        all_emp_types = {'M': Manager, 'P': Programmer, 'S': Salesman}
        cls = all_emp_types[emp_type.upper()]
        return cls(*args, **kwargs) if cls else None


def main():
    """Main function"""
    emps = [
        EmployeeFactory.create('M', 'Cao Cao'),
        EmployeeFactory.create('P', 'Xun Yu', 120),
        EmployeeFactory.create('P', 'Guo Jia', 85),
        EmployeeFactory.create('S', 'Dian Wei', 123000),
    ]
    for emp in emps:
        print(f'{emp.name}: {emp.get_salary():.2f} yuan')


if __name__ == '__main__':
    main()

The relationship between classes and classes

is-a relationship: inheritance
has-a relationship: association / aggregation / composition
use-a relationship: dependency

Example: poker game.

"""
Experience: symbolic constants are always better than literal constants.
Enumeration type is the best choice to define symbolic constants.
"""
from enum import Enum, unique

import random


@unique
class Suite(Enum):
    """Suit"""

    SPADE, HEART, CLUB, DIAMOND = range(4)

    def __lt__(self, other):
        return self.value < other.value


class Card:
    """Card"""

    def __init__(self, suite, face):
        """Initialization method"""
        self.suite = suite
        self.face = face

    def show(self):
        """Show the face of the card"""
        suites = ['♠︎', '♥︎', '♣︎', '♦︎']
        faces = ['', 'A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
        return f'{suites[self.suite.value]}{faces[self.face]}'

    def __repr__(self):
        return self.show()


class Poker:
    """Poker"""

    def __init__(self):
        self.index = 0
        self.cards = [Card(suite, face)
                      for suite in Suite
                      for face in range(1, 14)]

    def shuffle(self):
        """Shuffle cards, random order"""
        random.shuffle(self.cards)
        self.index = 0

    def deal(self):
        """Deal cards"""
        card = self.cards[self.index]
        self.index += 1
        return card

    @property
    def has_more(self):
        return self.index < len(self.cards)


class Player:
    """Player"""

    def __init__(self, name):
        self.name = name
        self.cards = []

    def get_one(self, card):
        """Get one card"""
        self.cards.append(card)

    def sort(self, comp=lambda card: (card.suite, card.face)):
        """Arrange the cards in hand"""
        self.cards.sort(key=comp)


def main():
    """Main function"""
    poker = Poker()
    poker.shuffle()
    players = [Player('East Heretic'), Player('West Poison'),
               Player('South Emperor'), Player('North Beggar')]
    while poker.has_more:
        for player in players:
            player.get_one(poker.deal())
    for player in players:
        player.sort()
        print(player.name, end=': ')
        print(player.cards)


if __name__ == '__main__':
    main()

Note: The code above uses Emoji characters to represent the four suits of poker cards. On some systems that do not support Emoji characters, they may not display.

Object copying, deep copy / deep clone and shallow copy / shadow clone
Garbage collection, circular reference, and weak reference

Python uses automatic memory management. This mechanism is based on reference counting, and at the same time introduces mark-sweep and generational collection as supporting strategies.
```
typedef struct _object {
    /* reference count */
    int ob_refcnt;
    /* object pointer */
    struct _typeobject *ob_type;
} PyObject;
```
```
/* macro to increase reference count */
#define Py_INCREF(op)   ((op)->ob_refcnt++)
/* macro to decrease reference count */
#define Py_DECREF(op) \
    if (--(op)->ob_refcnt != 0) \
        ; \
    else \
        __Py_Dealloc((PyObject *)(op))
```
Situations that cause reference count +1:
- The object is created, for example a = 23
- The object is referenced, for example b = a
- The object is passed as a parameter into a function, for example f(a)
- The object is stored in a container as an element, for example list1 = [a, a]
Situations that cause reference count -1:
- The alias of the object is explicitly destroyed, for example del a
- The alias of the object is assigned a new object, for example a = 24
- An object leaves its scope, for example when function f finishes execution, local variables in f will, global variables will not
- The container containing the object is destroyed, or the object is deleted from the container
Reference counting may cause the problem of circular reference, and circular reference will cause memory leak, as shown in the code below. To solve this problem, Python introduces mark-sweep and generational collection. When an object is created, it is put in generation 1. If the object survives the garbage check in generation 1, then the object is put into generation 2. In the same way, if the object survives the garbage check in generation 2, then the object is put into generation 3.
```
# Circular reference causes memory leak.
# Besides reference counting, Python also introduces mark cleanup and generational collection.
# Before Python 3.6, if __del__ magic method was overridden, circular reference handling failed.
# If you do not want to cause circular reference, you can use weak reference.
list1 = []
list2 = []
list1.append(list2)
list2.append(list1)
```
The following situations will cause garbage collection:
- Call gc.collect()
- The counter of the gc module reaches the threshold
- The program exits
If two objects in a circular reference both define the __del__ method, the gc module will not destroy these unreachable objects, because the gc module does not know which object's __del__ method should be called first. This problem was solved in Python 3.6.

We can also solve the circular reference problem by creating weak references through the weakref module.
Magic attributes and methods, please refer to A Guide to Python Magic Methods

There are a few small questions for everyone to think about:
- Can custom objects use operators to do operations?
- Can custom objects be put into set? Can they remove duplicates?
- Can custom objects be used as keys of dict?
- Can custom objects use context syntax?

Mixin

Example: custom dictionary limits that key-value pairs can be set only when the specified key does not exist.

class SetOnceMappingMixin:
    """Custom mixin class"""
    __slots__ = ()

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(str(key) + ' already set')
        return super().__setitem__(key, value)


class SetOnceDict(SetOnceMappingMixin, dict):
    """Custom dictionary"""
    pass


my_dict = SetOnceDict()
try:
    my_dict['username'] = 'jackfrued'
    my_dict['username'] = 'hellokitty'
except KeyError:
    pass
print(my_dict)

Metaprogramming and metaclass

Objects are created through classes. Classes are created through metaclasses. Metaclasses provide the meta information for creating classes. All classes directly or indirectly inherit from object. All metaclasses directly or indirectly inherit from type.

Example: use metaclass to implement singleton pattern.

import threading


class SingletonMeta(type):
    """Custom metaclass"""

    def __init__(cls, *args, **kwargs):
        cls.__instance = None
        cls.__lock = threading.RLock()
        super().__init__(*args, **kwargs)

    def __call__(cls, *args, **kwargs):
        if cls.__instance is None:
            with cls.__lock:
                if cls.__instance is None:
                    cls.__instance = super().__call__(*args, **kwargs)
        return cls.__instance


class President(metaclass=SingletonMeta):
    """President, singleton class"""
    pass

Object-oriented design principles
- Single responsibility principle, SRP: a class only does what it should do, the design of the class should have high cohesion
- Open-closed principle, OCP: software entities should be open for extension and closed for modification
- Dependency inversion principle, DIP: programming facing abstraction, this has already been weakened in weakly typed languages
- Liskov substitution principle, LSP: at any time, a subclass object can replace a parent class object
- Interface segregation principle, ISP: interfaces should be small and specialized, not big and complete. In Python there is no concept of interface
- Composition aggregate reuse principle, CARP: prefer using strong association relationship instead of inheritance relationship to reuse code
- Least knowledge principle, Law of Demeter, LoD: do not send messages to objects that have no necessary connection
Note: The bold letters above together are called the SOLID principles in object-oriented programming.

GoF design patterns

Creational patterns: singleton, factory, builder, prototype
Structural patterns: adapter, facade, proxy
Behavioral patterns: iterator, observer, state, strategy

Example: pluggable hash algorithm, strategy pattern.

class StreamHasher:
    """Hash digest generator"""

    def __init__(self, alg='md5', size=4096):
        self.size = size
        alg = alg.lower()
        self.hasher = getattr(__import__('hashlib'), alg.lower())()

    def __call__(self, stream):
        return self.to_digest(stream)

    def to_digest(self, stream):
        """Generate digest in hexadecimal form"""
        for buf in iter(lambda: stream.read(self.size), b''):
            self.hasher.update(buf)
        return self.hasher.hexdigest()


def main():
    """Main function"""
    hasher1 = StreamHasher()
    with open('Python-3.7.6.tgz', 'rb') as stream:
        print(hasher1.to_digest(stream))
    hasher2 = StreamHasher('sha1')
    with open('Python-3.7.6.tgz', 'rb') as stream:
        print(hasher2(stream))


if __name__ == '__main__':
    main()

Iterators and Generators

An iterator is an object that implements the iterator protocol.

In Python there is no keyword like protocol or interface to define a protocol.
In Python, magic methods are used to represent protocols.
The magic methods __iter__ and __next__ are the iterator protocol.

class Fib(object):
    """Iterator"""

    def __init__(self, num):
        self.num = num
        self.a, self.b = 0, 1
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx < self.num:
            self.a, self.b = self.b, self.a + self.b
            self.idx += 1
            return self.a
        raise StopIteration()

A generator is an iterator in a syntax-simplified version.

def fib(num):
    """Generator"""
    a, b = 0, 1
    for _ in range(num):
        a, b = b, a + b
        yield a

Generators evolve into coroutines.

A generator object can use the send() method to send data, and the sent data becomes the value obtained by the yield expression in the generator function. In this way, generators can be used as coroutines. Simply speaking, coroutines are subprograms that can cooperate with each other.

def calc_avg():
    """Calculate average value in a stream way"""
    total, counter = 0, 0
    avg_value = None
    while True:
        value = yield avg_value
        total, counter = total + value, counter + 1
        avg_value = total / counter


gen = calc_avg()
next(gen)
print(gen.send(10))
print(gen.send(20))
print(gen.send(30))

Concurrent Programming

Python has three solutions for concurrent programming: multi-threading, multi-process, and asynchronous I/O. The good side of concurrent programming is that it can improve the execution efficiency of the program and improve user experience. The bad side is that concurrent programs are not easy to develop and debug, and to other programs they are not friendly either.

Multi-threading: Python provides the Thread class, together with Lock, Condition, Event, Semaphore, and Barrier. Python has GIL to prevent multiple threads from executing local bytecode at the same time. This lock is necessary for CPython, because the memory management of CPython is not thread-safe. Because of the existence of GIL, multi-threading cannot use the multi-core features of CPU.

"""
Interview question: what are the difference and relationship between process and thread?
Process: the basic unit for the operating system to allocate memory.
One process can contain one or more threads.
Thread: the basic unit for the operating system to allocate CPU.
Concurrent programming
1. Improve execution performance: let parts in the program that have no causal relationship execute concurrently.
2. Improve user experience: let time-consuming operations not cause fake death of the program.
"""
import glob
import os
import threading

from PIL import Image

PREFIX = 'thumbnails'


def generate_thumbnail(infile, size, format='PNG'):
    """Generate thumbnail of specified image file"""
    file, ext = os.path.splitext(infile)
    file = file[file.rfind('/') + 1:]
    outfile = f'{PREFIX}/{file}_{size[0]}_{size[1]}.{ext}'
    img = Image.open(infile)
    img.thumbnail(size, Image.ANTIALIAS)
    img.save(outfile, format)


def main():
    """Main function"""
    if not os.path.exists(PREFIX):
        os.mkdir(PREFIX)
    for infile in glob.glob('images/*.png'):
        for size in (32, 64, 128):
            # Create and start thread
            threading.Thread(
                target=generate_thumbnail,
                args=(infile, (size, size))
            ).start()


if __name__ == '__main__':
    main()

The situation where multiple threads compete for resources.

"""
Multi-threaded programs are usually easier if there is no competition for resources.
When multiple threads compete for critical resources,
if there is no necessary protection measure, data disorder will happen.
Note: critical resource means the resource competed for by multiple threads.
"""
import time
import threading

from concurrent.futures import ThreadPoolExecutor


class Account(object):
    """Bank account"""

    def __init__(self):
        self.balance = 0.0
        self.lock = threading.Lock()

    def deposit(self, money):
        # Use a lock to protect critical resources
        with self.lock:
            new_balance = self.balance + money
            time.sleep(0.001)
            self.balance = new_balance


def main():
    """Main function"""
    account = Account()
    # Create thread pool
    pool = ThreadPoolExecutor(max_workers=10)
    futures = []
    for _ in range(100):
        future = pool.submit(account.deposit, 1)
        futures.append(future)
    # Close thread pool
    pool.shutdown()
    for future in futures:
        future.result()
    print(account.balance)


if __name__ == '__main__':
    main()

Modify the program above. Start 5 threads to deposit money into the account, and 5 threads to take money from the account. When taking money, if the balance is not enough, then pause the thread and wait. To reach the goal above, we need to schedule the deposit and withdraw threads. When the balance is not enough, the withdraw thread pauses and releases the lock, and after the deposit thread puts money in, it should notify the withdraw thread, so that it wakes up from the paused state. We can use Condition in the threading module to implement thread scheduling. This object is also created based on a lock. The code is shown below:

"""
Multiple threads compete for one resource: protect critical resource, use lock, Lock / RLock
Multiple threads compete for multiple resources, thread count > resource count: semaphore, Semaphore
Scheduling of multiple threads: pause thread execution / wake up waiting thread, Condition
"""
from concurrent.futures import ThreadPoolExecutor
from random import randint
from time import sleep

import threading


class Account:
    """Bank account"""

    def __init__(self, balance=0):
        self.balance = balance
        lock = threading.RLock()
        self.condition = threading.Condition(lock)

    def withdraw(self, money):
        """Withdraw money"""
        with self.condition:
            while money > self.balance:
                self.condition.wait()
            new_balance = self.balance - money
            sleep(0.001)
            self.balance = new_balance

    def deposit(self, money):
        """Deposit money"""
        with self.condition:
            new_balance = self.balance + money
            sleep(0.001)
            self.balance = new_balance
            self.condition.notify_all()


def add_money(account):
    while True:
        money = randint(5, 10)
        account.deposit(money)
        print(threading.current_thread().name,
              ':', money, '====>', account.balance)
        sleep(0.5)


def sub_money(account):
    while True:
        money = randint(10, 30)
        account.withdraw(money)
        print(threading.current_thread().name,
              ':', money, '<====', account.balance)
        sleep(1)


def main():
    account = Account()
    with ThreadPoolExecutor(max_workers=15) as pool:
        for _ in range(5):
            pool.submit(add_money, account)
        for _ in range(10):
            pool.submit(sub_money, account)


if __name__ == '__main__':
    main()

Multi-process: multi-process can effectively solve the problem of GIL. The main class for implementing multi-process is Process. Other helper classes are similar to those in the threading module. For sharing data between processes, pipes and sockets can be used. In the multiprocessing module there is a Queue class. It provides a queue shared by multiple processes based on pipe and lock mechanism. Below is one example from the official documentation about multi-process and process pool.

"""
Use of multi-process and process pool.
Because of GIL, multi-threading cannot use the multi-core features of CPU.
For compute-intensive tasks, we should consider using multi-process.
time python3 example22.py
real    0m11.512s
user    0m39.319s
sys     0m0.169s
After using multi-process, the actual execution time is 11.512 seconds,
while the user time 39.319 seconds is about 4 times the actual execution time.
This proves that our program used the multi-core features of CPU through multi-process,
and this computer has a 4-core CPU.
"""
import concurrent.futures
import math

PRIMES = [
    1116281,
    1297337,
    104395303,
    472882027,
    533000389,
    817504243,
    982451653,
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419
] * 5


def is_prime(n):
    """Judge prime number"""
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True


def main():
    """Main function"""
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))


if __name__ == '__main__':
    main()

Key point: Comparison between multi-threading and multi-process.

The following situations need multi-threading:

The program needs to maintain many shared states, especially mutable states. Lists, dictionaries, and sets in Python are all thread-safe, so using threads instead of processes to maintain shared state has relatively small cost.

The program will spend a lot of time on I/O operations, and does not have much need for parallel computing, and does not need to occupy too much memory.

The following situations need multi-process:

The program executes compute-intensive tasks, for example bytecode operations, data processing, scientific computing.

The input of the program can be split into blocks in parallel, and the operation results can be merged.

The program has no limit at all in memory use, and does not strongly depend on I/O operations, for example reading and writing files and sockets.

Asynchronous processing: choose tasks from the task queue of the scheduler. The scheduler executes these tasks in an interleaving way. We cannot guarantee the tasks will execute in some specific order, because the execution order depends on whether one task in the queue is willing to give CPU processing time to another task. Asynchronous tasks are usually implemented through multi-task cooperative processing. Because the execution time and order are uncertain, we need callback style programming or a future object to get the execution result of the task. Python 3 supports asynchronous processing through the asyncio module and the await and async keywords, they officially became keywords in Python 3.7.
```
"""
Asynchronous I/O, async / await
"""
import asyncio


def num_generator(m, n):
    """Number generator in specified range"""
    yield from range(m, n + 1)


async def prime_filter(m, n):
    """Prime number filter"""
    primes = []
    for i in num_generator(m, n):
        flag = True
        for j in range(2, int(i ** 0.5 + 1)):
            if i % j == 0:
                flag = False
                break
        if flag:
            print('Prime =>', i)
            primes.append(i)

        await asyncio.sleep(0.001)
    return tuple(primes)


async def square_mapper(m, n):
    """Square mapper"""
    squares = []
    for i in num_generator(m, n):
        print('Square =>', i * i)
        squares.append(i * i)

        await asyncio.sleep(0.001)
    return squares


def main():
    """Main function"""
    loop = asyncio.get_event_loop()
    future = asyncio.gather(prime_filter(2, 100), square_mapper(1, 100))
    future.add_done_callback(lambda x: print(x.result()))
    loop.run_until_complete(future)
    loop.close()


if __name__ == '__main__':
    main()
```
Note: The code above uses the get_event_loop function to get the system default event loop. Through the gather function, we can get a future object. The add_done_callback of the future object can add a callback function to run when execution is completed. The run_until_complete method of the loop object can wait for the coroutine execution result obtained through the future object.

Python has a third-party library named aiohttp. It provides asynchronous HTTP client and server. This third-party library can work together with the asyncio module, and provides support for Future objects. Python 3.6 introduced async and await to define asynchronously executed functions and create asynchronous context, and in Python 3.7 they officially became keywords. The code below asynchronously gets pages from 5 URLs and uses named capture groups in regular expressions to extract the website titles.
```
import asyncio
import re

import aiohttp

PATTERN = re.compile(r'\<title\>(?P<title>.*)\<\/title\>')


async def fetch_page(session, url):
    async with session.get(url, ssl=False) as resp:
        return await resp.text()


async def show_title(url):
    async with aiohttp.ClientSession() as session:
        html = await fetch_page(session, url)
        print(PATTERN.search(html).group('title'))


def main():
    urls = ('https://www.python.org/',
            'https://git-scm.com/',
            'https://www.jd.com/',
            'https://www.taobao.com/',
            'https://www.douban.com/')
    loop = asyncio.get_event_loop()
    cos = [show_title(url) for url in urls]
    loop.run_until_complete(asyncio.wait(cos))
    loop.close()


if __name__ == '__main__':
    main()
```
Key point: Comparison between asynchronous I/O and multi-process.

When the program does not need real concurrency or parallelism, but depends more on asynchronous processing and callback, asyncio is a very good choice. If there is a large amount of waiting and sleeping in the program, asyncio should also be considered. It is very suitable for writing Web application servers that do not need real-time data processing.

Python also has many third-party libraries for handling parallel tasks, such as joblib and PyMP. In actual development, to improve the scalability and concurrency of the system, there are usually two ways: vertical scaling, increase the processing ability of a single node, and horizontal scaling, change one single node into multiple nodes. We can use message queues to decouple the application. A message queue is equal to an extended version of the multi-thread synchronous queue. Applications on different machines are equal to threads, and the shared distributed message queue is the original Queue in the program. The most popular and standardized implementation of message queue, message-oriented middleware, is AMQP, Advanced Message Queuing Protocol. AMQP comes from the financial industry, and provides queueing, routing, reliable transmission, security, and other functions. The most famous implementations include Apache ActiveMQ, RabbitMQ, and so on.

To make tasks asynchronous, we can use the third-party library named Celery. Celery is a distributed task queue written in Python. It works by using distributed messages, and can use RabbitMQ or Redis as the message broker in the backend.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Advanced Python Topics

Important Knowledge Points

Data Structures and Algorithms

Ways to Use Functions

Object-Oriented Knowledge

Iterators and Generators

Concurrent Programming

FilesExpand file tree

31.python-advanced.md

Latest commit

History

31.python-advanced.md

File metadata and controls

Advanced Python Topics

Important Knowledge Points

Data Structures and Algorithms

Ways to Use Functions

Object-Oriented Knowledge

Iterators and Generators

Concurrent Programming