# Bridging Python and C/C++ Functions

Developers frequently need to incorporate custom operators into a machine learning framework. These operators implement new models, optimizers, data-processing functions, and more. Custom operators in particular are often implemented in C/C++ to achieve optimal performance, while also exposing Python interfaces so that developers can integrate them with existing machine learning workflows written in Python. This section delves into the implementation details of this process.

The Python interpreter, being implemented in C, enables the invocation of C and C++ functions within Python. Contemporary machine learning frameworks such as TensorFlow, PyTorch, and MindSpore rely on pybind11 to automatically generate Python functions from underlying C and C++ functions. This mechanism is known as *Python binding*. Before the advent of pybind11, Python binding was accomplished using one of the following approaches:

1. **Python C-API**: This approach requires including `Python.h` in C++ programs and using Python's C-API to perform Python operations. To work effectively with the C-API, developers must possess a solid understanding of Python's internals, such as reference counting.

2. **Simplified Wrapper and Interface Generator (SWIG)**: SWIG serves as a bridge between C/C++ code and Python, and it played a significant role in the early development of TensorFlow. Using SWIG involves crafting intricate interface declarations and relying on SWIG to automatically generate C code that interfaces with Python's C-API. Because the generated code is hard to read, its maintenance costs tend to be high.

3. **Python `ctypes` module**: This module provides the full range of C data types and allows direct invocation of dynamic link libraries. However, its heavy reliance on native C types leaves it with insufficient support for custom types.

4. **Cython**: In basic terms, Cython can be described as the fusion of Python syntax with static C types. It retains Python's syntax while compiling Cython functions into C/C++ code, enabling developers to seamlessly call C/C++ functions from Cython code.

5. **Boost.Python (a C++ library)**: Boost.Python exposes C++ functions as Python functions. It operates on principles similar to Python's C-API but provides a more user-friendly interface. However, its reliance on the Boost library introduces a significant third-party dependency, a potential drawback.
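
As a brief sketch of the `ctypes` approach, the snippet below calls the C standard library's `sqrt` directly from Python. It assumes a POSIX system where `ctypes.util.find_library` can locate the math library; it is an illustration added here, not one of the book's numbered code examples.

```python
import ctypes
import ctypes.util

# Locate and load the C math library (platform-dependent lookup)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # 3.0
```

Note that every argument and return type must be spelled out with native C types such as `ctypes.c_double`; this is exactly the limitation described in approach 3 above.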

In comparison with the above Python binding approaches, pybind11 matches Boost.Python in simplicity and usability, but it stands out for its focus on supporting C++11 and for eliminating the dependency on Boost. As a lightweight library, pybind11 is particularly suitable for exposing numerous Python functions in complex C++ projects such as the machine learning system discussed in this book. The combination of Code `ch02/code2.5.1` and Code `ch02/code2.5.2` is an example of adding a custom operator to PyTorch by integrating C++ and Python.

In C++:
**ch02/code2.5.1**

```cpp
// custom_add.cpp
#include <torch/extension.h>
#include <pybind11/pybind11.h>

torch::Tensor custom_add(torch::Tensor a, torch::Tensor b) {
    return a + b;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("custom_add", &custom_add, "A custom add function");
}
```
In Python:

**ch02/code2.5.2**

```python
import torch
from torch.utils.cpp_extension import load

# Compile and load the C++ extension at runtime
custom_extension = load(
    name='custom_extension',
    sources=['custom_add.cpp'],
    verbose=True
)

# Use the custom add function
a = torch.randn(10)
b = torch.randn(10)
c = custom_extension.custom_add(a, b)
```
# Functional Programming

In the following, we will discuss the reasons behind the growing trend of incorporating functional programming into the design of machine learning frameworks.

## Benefits of Functional Programming

Training is the most critical phase in machine learning, and how training is expressed hinges significantly on the optimizer algorithm. Contemporary machine learning tasks predominantly use first-order optimizers, favored for their ease of use. With machine learning advancing rapidly, and software and hardware incessantly updated to keep pace, a growing number of researchers are investigating higher-order optimizers, noted for their superior convergence. Frequently used second-order optimizers, such as the Newton method, quasi-Newton methods, and AdaHessian, require computing a Hessian matrix that carries second-order derivative information. Two considerable challenges arise from this computation: 1) how to manage such a heavy computational load efficiently; 2) how to express higher-order derivatives in a programming language.
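
As a one-dimensional illustration of why second-order information matters, Newton's method rescales the gradient step by the second derivative. The `newton_minimize` helper below is a hypothetical sketch in plain Python, not taken from any framework discussed in this book:

```python
def newton_minimize(f_prime, f_double_prime, x0, steps=10):
    # Newton's method for minimization: x <- x - f'(x) / f''(x)
    x = x0
    for _ in range(steps):
        x -= f_prime(x) / f_double_prime(x)
    return x

# Minimize f(x) = (x - 3)^2 + 1, so f'(x) = 2(x - 3) and f''(x) = 2
x_star = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)
print(x_star)  # 3.0: a quadratic converges in a single Newton step
```

In many dimensions `f_double_prime` becomes the Hessian matrix, which is exactly where the computational and expressiveness challenges above arise.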

In recent years, numerous large AI models have been introduced, including (with parameter counts in parentheses) OpenAI's GPT-3 (175B) in 2020; PanGu (100B), PanGu-$\alpha$ (200B), Google's Switch Transformer (1.6T), and WuDao (1.75T) in 2021; and Meta's NLLB-200 (54B) in 2022. The demand for ultra-large model training keeps escalating, and data parallelism alone cannot meet it; model parallelism, conversely, demands manual model partitioning, a time-consuming and laborious process. Consequently, the main challenge future machine learning frameworks must overcome is how to realize automatic parallelism. At its core, a machine learning model is a representation of a mathematical model. Hence, the ability to represent machine learning models succinctly has become a key concern in the design of programming paradigms for machine learning frameworks.

Recognizing these practical challenges in implementing machine learning frameworks, researchers have identified functional programming as a promising solution. In computer science, functional programming is a paradigm that treats computation as the evaluation of mathematical functions, actively avoiding state changes and mutable data. This paradigm harmonizes well with mathematical reasoning. Neural networks are composed of interconnected nodes, each performing basic mathematical operations. Functional programming languages let developers express these operations in a form that closely mirrors the underlying mathematics, enhancing the readability and maintainability of programs. At the same time, functions in functional languages are isolated from shared state, simplifying the management of concurrency and parallelism.
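
As a tiny illustration of the no-mutation style described above, the snippet below computes a sum of squares twice: imperatively, by mutating an accumulator, and functionally, as a single pure expression. This is a plain-Python sketch added for illustration, not from the original text:

```python
from functools import reduce

xs = [1.0, 2.0, 3.0]

# Imperative style: repeatedly mutates the accumulator `total`
total = 0.0
for x in xs:
    total += x * x

# Functional style: one pure expression, no variable is mutated
total_fn = reduce(lambda acc, x: acc + x * x, xs, 0.0)

print(total, total_fn)  # 14.0 14.0
```

Because the functional version has no mutable state, independent pieces of such a computation can be evaluated concurrently without coordination.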

In summary, functional programming is anticipated to confer the following benefits on machine learning frameworks:

1. It suits machine learning scenarios where higher-order derivatives are needed.

2. It simplifies the development of parallel programming interfaces.

3. It yields a more concise code representation.
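
To see benefit 1 in miniature: when a derivative is itself an ordinary function that can be passed around, higher-order derivatives fall out by simple composition. The `derivative` helper below is a hypothetical sketch using central finite differences, not an API of any framework discussed here:

```python
def derivative(f, h=1e-4):
    # Return a new function approximating f' via central differences.
    # Because the result is again a plain function, the transform composes.
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

f = lambda x: x ** 3
df = derivative(f)               # approximates 3x^2
d2f = derivative(derivative(f))  # approximates 6x, by composition

print(df(2.0), d2f(2.0))  # approximately 12.0 and 12.0
```

Frameworks such as JAX apply the same function-to-function idea with exact automatic differentiation (`jax.grad(jax.grad(f))`) rather than numerical approximation.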

## Framework Support for Functional Programming

Machine learning frameworks increasingly support functional programming. In 2018, Google rolled out JAX. In contrast to traditional machine learning frameworks, JAX unifies neural network computation and numerical computation. Its interfaces are compatible with Python's native data science libraries, such as NumPy and SciPy. Moreover, JAX provides distribution, vectorization, higher-order differentiation, and hardware acceleration in a functional programming style, characterized by lambda closures and the absence of side effects.

In 2020, Huawei introduced MindSpore, whose functional differential programming architecture allows users to concentrate on the native mathematical expression of machine learning models. In 2022, taking inspiration from Google's JAX, PyTorch launched functorch, a library providing composable vmap (vectorization) and autodiff transforms that work with PyTorch modules and PyTorch autograd while preserving good eager-mode performance. Functorch thereby moves toward meeting the requirements of distributed parallelism with PyTorch static graphs. Code `ch02/code2.4` gives an example of functorch.
**ch02/code2.4**

```python
from functorch import combine_state_for_ensemble, vmap

# MLP, device, data, and num_models are assumed to be defined elsewhere
minibatches = data[:num_models]
models = [MLP().to(device) for _ in range(num_models)]
fmodel, params, buffers = combine_state_for_ensemble(models)
# Vectorize the stateless model across the ensemble of parameter sets
predictions1_vmap = vmap(fmodel, out_dims=1)(params, buffers, minibatches)
```

Functorch introduces *vmap*, which stands for "vectorized map". Its role is to adapt functions designed for individual inputs so that they can handle batches of inputs, thereby enabling efficient vectorized computation. Unlike the batch processing capabilities of standard PyTorch modules, vmap can make any operation batch-aware without altering the operation's original structure. Moreover, vmap offers greater flexibility over batch dimensions, allowing users to specify which dimension should be treated as the batch dimension (via the `out_dims` argument), in contrast to standard PyTorch's default behavior of treating the first dimension as the batch dimension.
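
To make this concrete without requiring functorch, the hypothetical helper below emulates the idea with NumPy: it lifts a function written for a single input into one that maps over a chosen batch axis and stacks the results along a chosen output axis, loosely mirroring vmap's `in_dims`/`out_dims` arguments:

```python
import numpy as np

def vmap_like(f, in_axis=0, out_axis=0):
    # Lift f (written for one example) to operate over a batch axis
    def batched(x):
        outs = [f(xi) for xi in np.moveaxis(x, in_axis, 0)]
        return np.stack(outs, axis=out_axis)
    return batched

dot_self = lambda v: v @ v        # defined for a single vector
batch = np.ones((4, 3))           # a batch of 4 vectors
result = vmap_like(dot_self)(batch)
print(result)  # [3. 3. 3. 3.]
```

The real vmap achieves the same effect without the Python-level loop, compiling the per-example function into a genuinely vectorized operation.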

Tracing the development of machine learning frameworks makes it evident that the functional programming paradigm has become increasingly popular. This can be attributed to functional programming's ability to express machine learning models intuitively and its convenience for implementing automatic differentiation, higher-order derivation, and parallel execution. Consequently, future machine learning frameworks are likely to adopt layered frontend interfaces that are not exclusively designed for machine learning scenarios. Instead, they will primarily offer differential programming in their abstraction designs, making gradient-based software easy to develop for various applications.
