-
How to use comprehensions
prices = { 'AAPL': 191.88, 'GOOG': 1186.96, 'IBM': 149.24, 'ORCL': 48.44, 'ACN': 166.89, 'FB': 208.09, 'SYMC': 21.29 } # Build a new dictionary with stocks whose price is greater than 100 prices2 = {key: value for key, value in prices.items() if value > 100} print(prices2)
Note: Comprehensions can be used to make lists, sets, and dictionaries.
-
A pitfall of nested lists
names = ['Guan Yu', 'Zhang Fei', 'Zhao Yun', 'Ma Chao', 'Huang Zhong'] courses = ['Chinese', 'Math', 'English'] # Enter the scores of five students in three courses # Wrong - see http://pythontutor.com/visualize.html#mode=edit # scores = [[None] * len(courses)] * len(names) scores = [[None] * len(courses) for _ in range(len(names))] for row, name in enumerate(names): for col, course in enumerate(courses): scores[row][col] = float(input(f'Please enter {name}\'s {course} score: ')) print(scores)
Python Tutor - VISUALIZE CODE AND GET LIVE HELP
-
The
heapqmodule (heap sort)""" Find the largest or smallest N elements in a list Heap structure (max heap / min heap) """ import heapq list1 = [34, 25, 12, 99, 87, 63, 58, 78, 88, 92] list2 = [ {'name': 'IBM', 'shares': 100, 'price': 91.1}, {'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}, {'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'ACME', 'shares': 75, 'price': 115.65} ] print(heapq.nlargest(3, list1)) print(heapq.nsmallest(3, list1)) print(heapq.nlargest(2, list2, key=lambda x: x['price'])) print(heapq.nlargest(2, list2, key=lambda x: x['shares']))
-
The
itertoolsmodule""" Iteration tools module """ import itertools # Produce all permutations of ABCD itertools.permutations('ABCD') # Produce 3-element combinations from ABCDE itertools.combinations('ABCDE', 3) # Produce the Cartesian product of ABCD and 123 itertools.product('ABCD', '123') # Produce an infinite cycle of ABC itertools.cycle(('A', 'B', 'C'))
-
The
collectionsmoduleCommon utility classes:
namedtuple: a named tuple. It is a class factory. It accepts a type name and an attribute list to create a class.deque: a double-ended queue. It is another way to implement a list. In Python, a list is based on an array, butdequeis based on a doubly linked list. So when you need to add or delete elements at both ends,dequehas better performance, and its asymptotic time complexity isO(1).Counter: a subclass ofdict. The key is the element, and the value is the count of that element. Itsmost_common()method can help us get the elements that appear most often. I think the inheritance relationship betweenCounteranddictis open to discussion. According to the CARP principle, it would be more reasonable to design the relationship betweenCounteranddictas an association relationship.OrderedDict: a subclass ofdict. It records the order in which key-value pairs are inserted. It looks like it has the behavior of both a dictionary and a linked list.defaultdict: similar to a dictionary type, but it can get a default value for a key through a factory function. Compared with thesetdefault()method of a dictionary, this is more efficient.
""" Find the elements that appear most often in a sequence """ from collections import Counter words = [ 'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes', 'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into', 'my', 'eyes', "you're", 'under' ] counter = Counter(words) print(counter.most_common(3))
-
Algorithm: the method and steps used to solve a problem.
-
To judge whether an algorithm is good or bad, we usually look at asymptotic time complexity and asymptotic space complexity.
-
Big-O notation for asymptotic time complexity:
O(c)- constant time complexity - Bloom filters / hash storageO(log_2n)- logarithmic time complexity - binary searchO(n)- linear time complexity - sequential search / counting sortO(n * log_2n)- log-linear time complexity - advanced sorting algorithms (merge sort, quicksort)O(n^2)- square time complexity - simple sorting algorithms (selection sort, insertion sort, bubble sort)O(n^3)- cubic time complexity - Floyd's algorithm / matrix multiplicationO(2^n)- geometric-series time complexity - Tower of HanoiO(n!)- factorial time complexity - traveling salesman problem / NPC
-
Sorting algorithms (selection sort, bubble sort, merge sort) and searching algorithms (sequential search and binary search)
def select_sort(items, comp=lambda x, y: x < y): """Simple selection sort""" items = items[:] for i in range(len(items) - 1): min_index = i for j in range(i + 1, len(items)): if comp(items[j], items[min_index]): min_index = j items[i], items[min_index] = items[min_index], items[i] return items
def bubble_sort(items, comp=lambda x, y: x > y): """Bubble sort""" items = items[:] for i in range(len(items) - 1): swapped = False for j in range(len(items) - 1 - i): if comp(items[j], items[j + 1]): items[j], items[j + 1] = items[j + 1], items[j] swapped = True if not swapped: break return items
def bubble_sort(items, comp=lambda x, y: x > y): """Cocktail sort (an upgraded version of bubble sort)""" items = items[:] for i in range(len(items) - 1): swapped = False for j in range(len(items) - 1 - i): if comp(items[j], items[j + 1]): items[j], items[j + 1] = items[j + 1], items[j] swapped = True if swapped: swapped = False for j in range(len(items) - 2 - i, i, -1): if comp(items[j - 1], items[j]): items[j], items[j - 1] = items[j - 1], items[j] swapped = True if not swapped: break return items
def merge(items1, items2, comp=lambda x, y: x < y): """Merge (merge two sorted lists into one sorted list)""" items = [] index1, index2 = 0, 0 while index1 < len(items1) and index2 < len(items2): if comp(items1[index1], items2[index2]): items.append(items1[index1]) index1 += 1 else: items.append(items2[index2]) index2 += 1 items += items1[index1:] items += items2[index2:] return items def merge_sort(items, comp=lambda x, y: x < y): return _merge_sort(list(items), comp) def _merge_sort(items, comp): """Merge sort""" if len(items) < 2: return items mid = len(items) // 2 left = _merge_sort(items[:mid], comp) right = _merge_sort(items[mid:], comp) return merge(left, right, comp)
def seq_search(items, key): """Sequential search""" for index, item in enumerate(items): if item == key: return index return -1
def bin_search(items, key): """Binary search""" start, end = 0, len(items) - 1 while start <= end: mid = (start + end) // 2 if key > items[mid]: start = mid + 1 elif key < items[mid]: end = mid - 1 else: return mid return -1
-
Common algorithms:
- Exhaustive method, also called brute-force cracking. Check all possibilities until the correct answer is found.
- Greedy method. When solving a problem, always make what seems to be the best choice at the current moment. It does not try to find the best solution, but quickly finds a satisfactory solution.
- Divide-and-conquer method. Break a complex problem into two or more same or similar subproblems, then keep breaking the subproblems into smaller subproblems until they can be solved directly. At last, combine the solutions of the subproblems to get the solution of the original problem.
- Backtracking method. The backtracking method is also called the trial method. Search forward according to the better-choice condition. When the search reaches some step and finds that the original choice is not good enough or cannot reach the goal, go back one step and choose again.
- Dynamic programming. The basic idea is also to break the problem to be solved into several subproblems, solve these subproblems first and save their solutions, so that a large amount of repeated computation can be avoided.
Example of the exhaustive method: the hundred-money hundred-chickens problem and the five people dividing fish problem.
# A rooster costs 5 yuan, a hen costs 3 yuan, and 3 chicks cost 1 yuan # Use 100 yuan to buy 100 chickens, how many roosters / hens / chicks are there for x in range(20): for y in range(33): z = 100 - x - y if 5 * x + 3 * y + z // 3 == 100 and z % 3 == 0: print(x, y, z) # Five people A, B, C, D, and E caught fish together one night and then fell asleep # The next morning, A woke up first. He divided the fish into 5 parts, threw away 1 extra fish, # and took his own part. B woke up second and also divided the fish in the same way. # Then C, D, and E woke up one after another and divided the fish in the same way. # Ask how many fish they caught at least fish = 6 while True: total = fish enough = True for _ in range(5): if (total - 1) % 5 == 0: total = (total - 1) // 5 * 4 else: enough = False break if enough: print(fish) break fish += 5
Example of the greedy method: suppose a thief has a backpack that can hold at most 20 kilograms of stolen goods. He breaks into a house and finds the items shown in the table below. Clearly, he cannot put all the items into the backpack, so he must decide which items to take away and which items to leave.
Name Price (USD) Weight (kg) Computer 200 20 Radio 20 4 Clock 175 10 Vase 50 2 Book 10 1 Oil Painting 90 9 """ Greedy method: when solving a problem, always make what seems to be the best choice at the current moment. It does not try to find the best solution, but quickly finds a satisfactory solution. Input: 20 6 Computer 200 20 Radio 20 4 Clock 175 10 Vase 50 2 Book 10 1 OilPainting 90 9 """ class Thing(object): """Item""" def __init__(self, name, price, weight): self.name = name self.price = price self.weight = weight @property def value(self): """Price-weight ratio""" return self.price / self.weight def input_thing(): """Input item information""" name_str, price_str, weight_str = input().split() return name_str, int(price_str), int(weight_str) def main(): """Main function""" max_weight, num_of_things = map(int, input().split()) all_things = [] for _ in range(num_of_things): all_things.append(Thing(*input_thing())) all_things.sort(key=lambda x: x.value, reverse=True) total_weight = 0 total_price = 0 for thing in all_things: if total_weight + thing.weight <= max_weight: print(f'The thief took {thing.name}') total_weight += thing.weight total_price += thing.price print(f'Total value: {total_price} dollars') if __name__ == '__main__': main()
Example of the divide-and-conquer method: quicksort.
""" Quicksort - choose a pivot to divide the elements. The left side is all smaller than the pivot, and the right side is all larger than the pivot """ def quick_sort(items, comp=lambda x, y: x <= y): items = list(items)[:] _quick_sort(items, 0, len(items) - 1, comp) return items def _quick_sort(items, start, end, comp): if start < end: pos = _partition(items, start, end, comp) _quick_sort(items, start, pos - 1, comp) _quick_sort(items, pos + 1, end, comp) def _partition(items, start, end, comp): pivot = items[end] i = start - 1 for j in range(start, end): if comp(items[j], pivot): i += 1 items[i], items[j] = items[j], items[i] items[i + 1], items[end] = items[end], items[i + 1] return i + 1
Example of the backtracking method: Knight's tour.
""" Recursive backtracking: also called the trial method. Search forward according to the better-choice condition. When the search reaches some step and finds that the original choice is not good enough or cannot reach the goal, then go back one step and choose again. Classic problems include knight's tour, eight queens, and maze path finding. """ import sys import time SIZE = 5 total = 0 def print_board(board): for row in board: for col in row: print(str(col).center(4), end='') print() def patrol(board, row, col, step=1): if row >= 0 and row < SIZE and \ col >= 0 and col < SIZE and \ board[row][col] == 0: board[row][col] = step if step == SIZE * SIZE: global total total += 1 print(f'The {total}th way: ') print_board(board) patrol(board, row - 2, col - 1, step + 1) patrol(board, row - 1, col - 2, step + 1) patrol(board, row + 1, col - 2, step + 1) patrol(board, row + 2, col - 1, step + 1) patrol(board, row + 2, col + 1, step + 1) patrol(board, row + 1, col + 2, step + 1) patrol(board, row - 1, col + 2, step + 1) patrol(board, row - 2, col + 1, step + 1) board[row][col] = 0 def main(): board = [[0] * SIZE for _ in range(SIZE)] patrol(board, SIZE - 1, SIZE - 1) if __name__ == '__main__': main()
Example of dynamic programming: the maximum value of the sum of elements in a sublist.
Note: A sublist means a list made up of elements with continuous indexes in the list. The elements in the list are of type
int, and they may include positive integers,0, and negative integers. The program inputs the elements in the list and outputs the maximum value of the sum of elements in a sublist. For example:Input:
1 -2 3 5 -3 2Output:
8Input:
0 -2 3 5 -1 2Output:
9Input:
-9 -2 -3 -5 -3Output:
-2def main(): items = list(map(int, input().split())) overall = partial = items[0] for i in range(1, len(items)): partial = max(items[i], partial + items[i]) overall = max(partial, overall) print(overall) if __name__ == '__main__': main()
Note: The easiest solution to think of for this problem is to use two nested loops, but the time performance of the code becomes very bad. By using the idea of dynamic programming and only two more variables, the original problem with
O(N^2)complexity becomesO(N).
-
Treat functions as "first-class citizens"
- Functions can be assigned to variables
- Functions can be used as parameters of functions
- Functions can be used as return values of functions
-
How to use higher-order functions (
filter,map, and their replacements)items1 = list(map(lambda x: x ** 2, filter(lambda x: x % 2, range(1, 10)))) items2 = [x ** 2 for x in range(1, 10) if x % 2]
-
Positional parameters, variable parameters, keyword parameters, named keyword parameters
-
Metadata of parameters (a code readability issue)
-
How to use anonymous functions and inline functions (
lambdafunctions) -
Closures and the scope problem
-
Python's LEGB order for searching variables (
Local >>> Embedded >>> Global >>> Built-in) -
The roles of the
globalandnonlocalkeywordsglobal: declare or define a global variable (either directly use an existing variable in the global scope, or define a variable and put it into the global scope).nonlocal: declare that you are using a variable in a nested scope (the variable must exist in the nested scope, otherwise an error is raised).
-
-
Decorator functions (using decorators and removing decorators)
Example: a decorator that outputs the execution time of a function.
from functools import wraps from time import time def record_time(func): """Custom decorator for decorating a function""" @wraps(func) def wrapper(*args, **kwargs): start = time() result = func(*args, **kwargs) print(f'{func.__name__}: {time() - start} seconds') return result return wrapper
If the decorator does not want to be coupled with the
printfunction, we can write a parameterized decorator.from functools import wraps from time import time def record(output): """Parameterized decorator""" def decorate(func): @wraps(func) def wrapper(*args, **kwargs): start = time() result = func(*args, **kwargs) output(func.__name__, time() - start) return result return wrapper return decorate
from functools import wraps from time import time class Record(): """Define a decorator by defining a class""" def __init__(self, output): self.output = output def __call__(self, func): @wraps(func) def wrapper(*args, **kwargs): start = time() result = func(*args, **kwargs) self.output(func.__name__, time() - start) return result return wrapper
Note: Because the function with decorator function is decorated with
@wraps, we can get the function or class before decoration byfunc.__wrapped__, and then remove the effect of the decorator.Example: use a decorator to implement the singleton pattern.
from functools import wraps def singleton(cls): """Decorator for decorating a class""" instances = {} @wraps(cls) def wrapper(*args, **kwargs): if cls not in instances: instances[cls] = cls(*args, **kwargs) return instances[cls] return wrapper @singleton class President: """President, singleton class""" pass
Tip: The code above uses a closure. I do not know whether you have noticed it. There is also one small problem. The code above does not implement a thread-safe singleton. How should it be done if we want to implement a thread-safe singleton?
Thread-safe singleton decorator.
from functools import wraps from threading import RLock def singleton(cls): """Thread-safe singleton decorator""" instances = {} locker = RLock() @wraps(cls) def wrapper(*args, **kwargs): if cls not in instances: with locker: if cls not in instances: instances[cls] = cls(*args, **kwargs) return instances[cls] return wrapper
Tip: The code above uses
withcontext syntax to do the lock operation, because the lock object itself is a context manager object, it supports the magic methods__enter__and__exit__. In thewrapperfunction, we first do one check without lock, and then do one check with lock. This does better in performance than checking directly with lock. If the object has already been created, there is no need to lock again. We can directly return the object.
-
The three pillars: encapsulation, inheritance, polymorphism
Example: salary settlement system.
""" Monthly salary settlement system: department manager 15000 every month programmer 200 per hour salesperson 1800 base salary plus 5% commission of sales """ from abc import ABCMeta, abstractmethod class Employee(metaclass=ABCMeta): """Employee, abstract class""" def __init__(self, name): self.name = name @abstractmethod def get_salary(self): """Settle monthly salary, abstract method""" pass class Manager(Employee): """Department manager""" def get_salary(self): return 15000.0 class Programmer(Employee): """Programmer""" def __init__(self, name, working_hour=0): self.working_hour = working_hour super().__init__(name) def get_salary(self): return 200.0 * self.working_hour class Salesman(Employee): """Salesperson""" def __init__(self, name, sales=0.0): self.sales = sales super().__init__(name) def get_salary(self): return 1800.0 + self.sales * 0.05 class EmployeeFactory: """Factory for creating employees. Factory pattern: use a factory to decouple the object user and the object. """ @staticmethod def create(emp_type, *args, **kwargs): """Create employee""" all_emp_types = {'M': Manager, 'P': Programmer, 'S': Salesman} cls = all_emp_types[emp_type.upper()] return cls(*args, **kwargs) if cls else None def main(): """Main function""" emps = [ EmployeeFactory.create('M', 'Cao Cao'), EmployeeFactory.create('P', 'Xun Yu', 120), EmployeeFactory.create('P', 'Guo Jia', 85), EmployeeFactory.create('S', 'Dian Wei', 123000), ] for emp in emps: print(f'{emp.name}: {emp.get_salary():.2f} yuan') if __name__ == '__main__': main()
-
The relationship between classes and classes
- is-a relationship: inheritance
- has-a relationship: association / aggregation / composition
- use-a relationship: dependency
Example: poker game.
""" Experience: symbolic constants are always better than literal constants. Enumeration type is the best choice to define symbolic constants. """ from enum import Enum, unique import random @unique class Suite(Enum): """Suit""" SPADE, HEART, CLUB, DIAMOND = range(4) def __lt__(self, other): return self.value < other.value class Card: """Card""" def __init__(self, suite, face): """Initialization method""" self.suite = suite self.face = face def show(self): """Show the face of the card""" suites = ['♠︎', '♥︎', '♣︎', '♦︎'] faces = ['', 'A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'] return f'{suites[self.suite.value]}{faces[self.face]}' def __repr__(self): return self.show() class Poker: """Poker""" def __init__(self): self.index = 0 self.cards = [Card(suite, face) for suite in Suite for face in range(1, 14)] def shuffle(self): """Shuffle cards, random order""" random.shuffle(self.cards) self.index = 0 def deal(self): """Deal cards""" card = self.cards[self.index] self.index += 1 return card @property def has_more(self): return self.index < len(self.cards) class Player: """Player""" def __init__(self, name): self.name = name self.cards = [] def get_one(self, card): """Get one card""" self.cards.append(card) def sort(self, comp=lambda card: (card.suite, card.face)): """Arrange the cards in hand""" self.cards.sort(key=comp) def main(): """Main function""" poker = Poker() poker.shuffle() players = [Player('East Heretic'), Player('West Poison'), Player('South Emperor'), Player('North Beggar')] while poker.has_more: for player in players: player.get_one(poker.deal()) for player in players: player.sort() print(player.name, end=': ') print(player.cards) if __name__ == '__main__': main()
Note: The code above uses Emoji characters to represent the four suits of poker cards. On some systems that do not support Emoji characters, they may not display.
-
Object copying, deep copy / deep clone and shallow copy / shadow clone
-
Garbage collection, circular reference, and weak reference
Python uses automatic memory management. This mechanism is based on reference counting, and at the same time introduces mark-sweep and generational collection as supporting strategies.
typedef struct _object { /* reference count */ int ob_refcnt; /* object pointer */ struct _typeobject *ob_type; } PyObject;
/* macro to increase reference count */ #define Py_INCREF(op) ((op)->ob_refcnt++) /* macro to decrease reference count */ #define Py_DECREF(op) \ if (--(op)->ob_refcnt != 0) \ ; \ else \ __Py_Dealloc((PyObject *)(op))
Situations that cause reference count +1:
- The object is created, for example
a = 23 - The object is referenced, for example
b = a - The object is passed as a parameter into a function, for example
f(a) - The object is stored in a container as an element, for example
list1 = [a, a]
Situations that cause reference count -1:
- The alias of the object is explicitly destroyed, for example
del a - The alias of the object is assigned a new object, for example
a = 24 - An object leaves its scope, for example when function
ffinishes execution, local variables infwill, global variables will not - The container containing the object is destroyed, or the object is deleted from the container
Reference counting may cause the problem of circular reference, and circular reference will cause memory leak, as shown in the code below. To solve this problem, Python introduces mark-sweep and generational collection. When an object is created, it is put in generation 1. If the object survives the garbage check in generation 1, then the object is put into generation 2. In the same way, if the object survives the garbage check in generation 2, then the object is put into generation 3.
# Circular reference causes memory leak. # Besides reference counting, Python also introduces mark cleanup and generational collection. # Before Python 3.6, if __del__ magic method was overridden, circular reference handling failed. # If you do not want to cause circular reference, you can use weak reference. list1 = [] list2 = [] list1.append(list2) list2.append(list1)
The following situations will cause garbage collection:
- Call
gc.collect() - The counter of the
gcmodule reaches the threshold - The program exits
If two objects in a circular reference both define the
__del__method, thegcmodule will not destroy these unreachable objects, because thegcmodule does not know which object's__del__method should be called first. This problem was solved in Python 3.6.We can also solve the circular reference problem by creating weak references through the
weakrefmodule. - The object is created, for example
-
Magic attributes and methods, please refer to A Guide to Python Magic Methods
There are a few small questions for everyone to think about:
- Can custom objects use operators to do operations?
- Can custom objects be put into
set? Can they remove duplicates? - Can custom objects be used as keys of
dict? - Can custom objects use context syntax?
-
Mixin
Example: custom dictionary limits that key-value pairs can be set only when the specified key does not exist.
class SetOnceMappingMixin: """Custom mixin class""" __slots__ = () def __setitem__(self, key, value): if key in self: raise KeyError(str(key) + ' already set') return super().__setitem__(key, value) class SetOnceDict(SetOnceMappingMixin, dict): """Custom dictionary""" pass my_dict = SetOnceDict() try: my_dict['username'] = 'jackfrued' my_dict['username'] = 'hellokitty' except KeyError: pass print(my_dict)
-
Metaprogramming and metaclass
Objects are created through classes. Classes are created through metaclasses. Metaclasses provide the meta information for creating classes. All classes directly or indirectly inherit from
object. All metaclasses directly or indirectly inherit fromtype.Example: use metaclass to implement singleton pattern.
import threading class SingletonMeta(type): """Custom metaclass""" def __init__(cls, *args, **kwargs): cls.__instance = None cls.__lock = threading.RLock() super().__init__(*args, **kwargs) def __call__(cls, *args, **kwargs): if cls.__instance is None: with cls.__lock: if cls.__instance is None: cls.__instance = super().__call__(*args, **kwargs) return cls.__instance class President(metaclass=SingletonMeta): """President, singleton class""" pass
-
Object-oriented design principles
- Single responsibility principle, SRP: a class only does what it should do, the design of the class should have high cohesion
- Open-closed principle, OCP: software entities should be open for extension and closed for modification
- Dependency inversion principle, DIP: programming facing abstraction, this has already been weakened in weakly typed languages
- Liskov substitution principle, LSP: at any time, a subclass object can replace a parent class object
- Interface segregation principle, ISP: interfaces should be small and specialized, not big and complete. In Python there is no concept of interface
- Composition aggregate reuse principle, CARP: prefer using strong association relationship instead of inheritance relationship to reuse code
- Least knowledge principle, Law of Demeter, LoD: do not send messages to objects that have no necessary connection
Note: The bold letters above together are called the SOLID principles in object-oriented programming.
-
GoF design patterns
- Creational patterns: singleton, factory, builder, prototype
- Structural patterns: adapter, facade, proxy
- Behavioral patterns: iterator, observer, state, strategy
Example: pluggable hash algorithm, strategy pattern.
class StreamHasher: """Hash digest generator""" def __init__(self, alg='md5', size=4096): self.size = size alg = alg.lower() self.hasher = getattr(__import__('hashlib'), alg.lower())() def __call__(self, stream): return self.to_digest(stream) def to_digest(self, stream): """Generate digest in hexadecimal form""" for buf in iter(lambda: stream.read(self.size), b''): self.hasher.update(buf) return self.hasher.hexdigest() def main(): """Main function""" hasher1 = StreamHasher() with open('Python-3.7.6.tgz', 'rb') as stream: print(hasher1.to_digest(stream)) hasher2 = StreamHasher('sha1') with open('Python-3.7.6.tgz', 'rb') as stream: print(hasher2(stream)) if __name__ == '__main__': main()
-
An iterator is an object that implements the iterator protocol.
- In Python there is no keyword like
protocolorinterfaceto define a protocol. - In Python, magic methods are used to represent protocols.
- The magic methods
__iter__and__next__are the iterator protocol.
class Fib(object): """Iterator""" def __init__(self, num): self.num = num self.a, self.b = 0, 1 self.idx = 0 def __iter__(self): return self def __next__(self): if self.idx < self.num: self.a, self.b = self.b, self.a + self.b self.idx += 1 return self.a raise StopIteration()
- In Python there is no keyword like
-
A generator is an iterator in a syntax-simplified version.
def fib(num): """Generator""" a, b = 0, 1 for _ in range(num): a, b = b, a + b yield a
-
Generators evolve into coroutines.
A generator object can use the
send()method to send data, and the sent data becomes the value obtained by theyieldexpression in the generator function. In this way, generators can be used as coroutines. Simply speaking, coroutines are subprograms that can cooperate with each other.def calc_avg(): """Calculate average value in a stream way""" total, counter = 0, 0 avg_value = None while True: value = yield avg_value total, counter = total + value, counter + 1 avg_value = total / counter gen = calc_avg() next(gen) print(gen.send(10)) print(gen.send(20)) print(gen.send(30))
Python has three solutions for concurrent programming: multi-threading, multi-process, and asynchronous I/O. The good side of concurrent programming is that it can improve the execution efficiency of the program and improve user experience. The bad side is that concurrent programs are not easy to develop and debug, and to other programs they are not friendly either.
-
Multi-threading: Python provides the
Threadclass, together withLock,Condition,Event,Semaphore, andBarrier. Python has GIL to prevent multiple threads from executing local bytecode at the same time. This lock is necessary for CPython, because the memory management of CPython is not thread-safe. Because of the existence of GIL, multi-threading cannot use the multi-core features of CPU.""" Interview question: what are the difference and relationship between process and thread? Process: the basic unit for the operating system to allocate memory. One process can contain one or more threads. Thread: the basic unit for the operating system to allocate CPU. Concurrent programming 1. Improve execution performance: let parts in the program that have no causal relationship execute concurrently. 2. Improve user experience: let time-consuming operations not cause fake death of the program. """ import glob import os import threading from PIL import Image PREFIX = 'thumbnails' def generate_thumbnail(infile, size, format='PNG'): """Generate thumbnail of specified image file""" file, ext = os.path.splitext(infile) file = file[file.rfind('/') + 1:] outfile = f'{PREFIX}/{file}_{size[0]}_{size[1]}.{ext}' img = Image.open(infile) img.thumbnail(size, Image.ANTIALIAS) img.save(outfile, format) def main(): """Main function""" if not os.path.exists(PREFIX): os.mkdir(PREFIX) for infile in glob.glob('images/*.png'): for size in (32, 64, 128): # Create and start thread threading.Thread( target=generate_thumbnail, args=(infile, (size, size)) ).start() if __name__ == '__main__': main()
The situation where multiple threads compete for resources.
""" Multi-threaded programs are usually easier if there is no competition for resources. When multiple threads compete for critical resources, if there is no necessary protection measure, data disorder will happen. Note: critical resource means the resource competed for by multiple threads. """ import time import threading from concurrent.futures import ThreadPoolExecutor class Account(object): """Bank account""" def __init__(self): self.balance = 0.0 self.lock = threading.Lock() def deposit(self, money): # Use a lock to protect critical resources with self.lock: new_balance = self.balance + money time.sleep(0.001) self.balance = new_balance def main(): """Main function""" account = Account() # Create thread pool pool = ThreadPoolExecutor(max_workers=10) futures = [] for _ in range(100): future = pool.submit(account.deposit, 1) futures.append(future) # Close thread pool pool.shutdown() for future in futures: future.result() print(account.balance) if __name__ == '__main__': main()
Modify the program above. Start 5 threads to deposit money into the account, and 5 threads to take money from the account. When taking money, if the balance is not enough, then pause the thread and wait. To reach the goal above, we need to schedule the deposit and withdraw threads. When the balance is not enough, the withdraw thread pauses and releases the lock, and after the deposit thread puts money in, it should notify the withdraw thread, so that it wakes up from the paused state. We can use
Conditionin thethreadingmodule to implement thread scheduling. This object is also created based on a lock. The code is shown below:""" Multiple threads compete for one resource: protect critical resource, use lock, Lock / RLock Multiple threads compete for multiple resources, thread count > resource count: semaphore, Semaphore Scheduling of multiple threads: pause thread execution / wake up waiting thread, Condition """ from concurrent.futures import ThreadPoolExecutor from random import randint from time import sleep import threading class Account: """Bank account""" def __init__(self, balance=0): self.balance = balance lock = threading.RLock() self.condition = threading.Condition(lock) def withdraw(self, money): """Withdraw money""" with self.condition: while money > self.balance: self.condition.wait() new_balance = self.balance - money sleep(0.001) self.balance = new_balance def deposit(self, money): """Deposit money""" with self.condition: new_balance = self.balance + money sleep(0.001) self.balance = new_balance self.condition.notify_all() def add_money(account): while True: money = randint(5, 10) account.deposit(money) print(threading.current_thread().name, ':', money, '====>', account.balance) sleep(0.5) def sub_money(account): while True: money = randint(10, 30) account.withdraw(money) print(threading.current_thread().name, ':', money, '<====', account.balance) sleep(1) def main(): account = Account() with ThreadPoolExecutor(max_workers=15) as pool: for _ in range(5): pool.submit(add_money, account) for _ in range(10): pool.submit(sub_money, account) if __name__ == '__main__': main()
-
Multi-process: multi-process can effectively solve the problem of GIL. The main class for implementing multi-process is
Process. Other helper classes are similar to those in thethreadingmodule. For sharing data between processes, pipes and sockets can be used. In themultiprocessingmodule there is aQueueclass. It provides a queue shared by multiple processes based on pipe and lock mechanism. Below is one example from the official documentation about multi-process and process pool.""" Use of multi-process and process pool. Because of GIL, multi-threading cannot use the multi-core features of CPU. For compute-intensive tasks, we should consider using multi-process. time python3 example22.py real 0m11.512s user 0m39.319s sys 0m0.169s After using multi-process, the actual execution time is 11.512 seconds, while the user time 39.319 seconds is about 4 times the actual execution time. This proves that our program used the multi-core features of CPU through multi-process, and this computer has a 4-core CPU. """ import concurrent.futures import math PRIMES = [ 1116281, 1297337, 104395303, 472882027, 533000389, 817504243, 982451653, 112272535095293, 112582705942171, 112272535095293, 115280095190773, 115797848077099, 1099726899285419 ] * 5 def is_prime(n): """Judge prime number""" if n % 2 == 0: return False sqrt_n = int(math.floor(math.sqrt(n))) for i in range(3, sqrt_n + 1, 2): if n % i == 0: return False return True def main(): """Main function""" with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)): print('%d is prime: %s' % (number, prime)) if __name__ == '__main__': main()
Key point: Comparison between multi-threading and multi-process.
The following situations need multi-threading:
- The program needs to maintain many shared states, especially mutable states. Lists, dictionaries, and sets in Python are all thread-safe, so using threads instead of processes to maintain shared state has relatively small cost.
- The program will spend a lot of time on I/O operations, and does not have much need for parallel computing, and does not need to occupy too much memory.
The following situations need multi-process:
- The program executes compute-intensive tasks, for example bytecode operations, data processing, scientific computing.
- The input of the program can be split into blocks in parallel, and the operation results can be merged.
- The program has no limit at all in memory use, and does not strongly depend on I/O operations, for example reading and writing files and sockets.
-
Asynchronous processing: choose tasks from the task queue of the scheduler. The scheduler executes these tasks in an interleaving way. We cannot guarantee the tasks will execute in some specific order, because the execution order depends on whether one task in the queue is willing to give CPU processing time to another task. Asynchronous tasks are usually implemented through multi-task cooperative processing. Because the execution time and order are uncertain, we need callback style programming or a
futureobject to get the execution result of the task. Python 3 supports asynchronous processing through theasynciomodule and theawaitandasynckeywords, they officially became keywords in Python 3.7.""" Asynchronous I/O, async / await """ import asyncio def num_generator(m, n): """Number generator in specified range""" yield from range(m, n + 1) async def prime_filter(m, n): """Prime number filter""" primes = [] for i in num_generator(m, n): flag = True for j in range(2, int(i ** 0.5 + 1)): if i % j == 0: flag = False break if flag: print('Prime =>', i) primes.append(i) await asyncio.sleep(0.001) return tuple(primes) async def square_mapper(m, n): """Square mapper""" squares = [] for i in num_generator(m, n): print('Square =>', i * i) squares.append(i * i) await asyncio.sleep(0.001) return squares def main(): """Main function""" loop = asyncio.get_event_loop() future = asyncio.gather(prime_filter(2, 100), square_mapper(1, 100)) future.add_done_callback(lambda x: print(x.result())) loop.run_until_complete(future) loop.close() if __name__ == '__main__': main()
Note: The code above uses the
get_event_loopfunction to get the system default event loop. Through thegatherfunction, we can get afutureobject. Theadd_done_callbackof thefutureobject can add a callback function to run when execution is completed. Therun_until_completemethod of theloopobject can wait for the coroutine execution result obtained through thefutureobject.Python has a third-party library named
aiohttp. It provides asynchronous HTTP client and server. This third-party library can work together with theasynciomodule, and provides support forFutureobjects. Python 3.6 introducedasyncandawaitto define asynchronously executed functions and create asynchronous context, and in Python 3.7 they officially became keywords. The code below asynchronously gets pages from 5 URLs and uses named capture groups in regular expressions to extract the website titles.import asyncio import re import aiohttp PATTERN = re.compile(r'\<title\>(?P<title>.*)\<\/title\>') async def fetch_page(session, url): async with session.get(url, ssl=False) as resp: return await resp.text() async def show_title(url): async with aiohttp.ClientSession() as session: html = await fetch_page(session, url) print(PATTERN.search(html).group('title')) def main(): urls = ('https://www.python.org/', 'https://git-scm.com/', 'https://www.jd.com/', 'https://www.taobao.com/', 'https://www.douban.com/') loop = asyncio.get_event_loop() cos = [show_title(url) for url in urls] loop.run_until_complete(asyncio.wait(cos)) loop.close() if __name__ == '__main__': main()
Key point: Comparison between asynchronous I/O and multi-process.
When the program does not need real concurrency or parallelism, but depends more on asynchronous processing and callback,
asynciois a very good choice. If there is a large amount of waiting and sleeping in the program,asyncioshould also be considered. It is very suitable for writing Web application servers that do not need real-time data processing.Python also has many third-party libraries for handling parallel tasks, such as
joblibandPyMP. In actual development, to improve the scalability and concurrency of the system, there are usually two ways: vertical scaling, increase the processing ability of a single node, and horizontal scaling, change one single node into multiple nodes. We can use message queues to decouple the application. A message queue is equal to an extended version of the multi-thread synchronous queue. Applications on different machines are equal to threads, and the shared distributed message queue is the originalQueuein the program. The most popular and standardized implementation of message queue, message-oriented middleware, is AMQP, Advanced Message Queuing Protocol. AMQP comes from the financial industry, and provides queueing, routing, reliable transmission, security, and other functions. The most famous implementations include Apache ActiveMQ, RabbitMQ, and so on.To make tasks asynchronous, we can use the third-party library named
Celery.Celeryis a distributed task queue written in Python. It works by using distributed messages, and can use RabbitMQ or Redis as the message broker in the backend.

