** Changes in TSFC so that PyOP2 has a better understanding of variable names
- [[https://github.com/OP2/PyOP2/blob/630e55118013966e84dcc62328c45fc9061196e6/pyop2/gpu/tile.py#L65-L79][Currently]] variable names are hard-coded for a CG-type FE kernel on
  triangular meshes.
- Once this has been done, it would be reasonable to tackle other elements.
*** Information to be fed from TSFC
- [ ] variable name of the action input
- [ ] variable name of the action output
- [ ] variable name of the mesh coordinates
- [ ] variable name of the quadrature weights
- [ ] quadrature iname
- [ ] DOF iname(s)
- [ ] tagging the instructions responsible for computing the Jacobian
- [ ] tagging the stages (init, update, assign) for each of the two sum
  reductions in the TSFC kernel

One way to solve this is to tag these names onto the loopy kernel in TSFC while
lowering the GEM representation to a loopy kernel.
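As a sketch, the information above could travel as a small metadata record attached to the generated loopy kernel. All names and fields below are hypothetical illustrations, not the actual TSFC/PyOP2 API:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class KernelMetadata:
    """Hypothetical record of the names TSFC would tag onto a loopy kernel."""
    action_input: str            # variable name of the action input
    action_output: str           # variable name of the action output
    coordinates: str             # variable name of the mesh coordinates
    quadrature_weights: str      # variable name of the quadrature weights
    quadrature_iname: str        # iname of the quadrature loop
    dof_inames: Tuple[str, ...]  # iname(s) of the DOF loops
    jacobian_tag: str = "jacobian"  # tag on instructions computing the Jacobian
    stage_tags: Tuple[str, ...] = ("init", "update", "assign")

# Example for a CG1 kernel on triangles (all names made up):
md = KernelMetadata(
    action_input="t0", action_output="A",
    coordinates="coords", quadrature_weights="w",
    quadrature_iname="ip", dof_inames=("i", "j"),
)
```

PyOP2's GPU transformations could then look names up in this record instead of assuming a fixed template.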
** Adding support for explicit matrix assembly
*** Proposed path
- The PyOP2 configuration should have a parameter ~backend~, which would be
  one of ~"cpu", "gpu.cuda", "gpu.opencl"~.
- Based on the ~backend~ parameter, the appropriate instances of ~Dat, Mat, Map, ...~
  should be initialized at runtime.
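A minimal sketch of the runtime dispatch, assuming a registry keyed by the ~backend~ string; the classes below are placeholders, not PyOP2's real ~Dat~/~Mat~/~Map~ hierarchy:

```python
# Stand-ins for the per-backend implementation classes.
class CPUDat: ...
class CUDADat: ...
class OpenCLDat: ...

# Registry mapping a backend string to its implementation classes.
_BACKENDS = {
    "cpu": {"Dat": CPUDat},
    "gpu.cuda": {"Dat": CUDADat},
    "gpu.opencl": {"Dat": OpenCLDat},
}

def make(cls_name, backend, *args, **kwargs):
    """Instantiate the backend-appropriate class at runtime."""
    try:
        cls = _BACKENDS[backend][cls_name]
    except KeyError:
        raise ValueError(f"unknown backend/class: {backend}/{cls_name}")
    return cls(*args, **kwargs)

dat = make("Dat", "gpu.cuda")
```

Keeping the lookup behind one factory keeps the rest of the code (and Firedrake) unaware of which backend is active.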
*** Obstacles
- [[https://github.com/OP2/PyOP2/blob/8e1c5720fe0a8f7b4e870a49c43608d97c66ad14/pyop2/op2.py#L45-L49][Currently in PyOP2]], backend selection happens only once, which would be
  incorrect: for example, when running a matrix-free kernel, ~op2.Map~ should
  stay in the device's address space, while during explicit assembly it should
  be part of the host's address space. (Similarly, kernel execution in the
  matrix-free case happens on the device, which is not the case for explicit
  assembly.)
- Is transformation strategy selection sufficient?
- This might lead to some refactoring in ~firedrake~, especially where the
  objects are instantiated.
- Backend switching would be a bit tricky for subclasses like [[https://github.com/firedrakeproject/firedrake/blob/3498fdf3e33721adda448755addc11c20bef75a9/firedrake/preconditioners/patch.py#L77][this one.]]
** Global reduction kernels, e.g. ~assemble(dot(f,f)*dx)~
- Currently all threads write to a single memory location atomically, thereby
  losing concurrency.
- Possible solution:
  - Fix the block size, say 256.
  - Map a single cell to a single thread.
  - Reduce across the threads of a block to get a per-block result.
  - Write each block's result to a global intermediary variable.
  - Finally, perform another reduction across the newly created intermediary
    variable.
- One starting step would be to map the '+=' to loopy's sum reduction node.
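The block-wise scheme above can be emulated on the host with NumPy to check the arithmetic; the real kernel would of course run on the device, with stage 1 using shared memory within each thread block:

```python
import numpy as np

def two_stage_sum(values, block_size=256):
    """Reduce in two stages: per-block partial sums, then a final reduction.

    Mirrors the proposed GPU scheme: each block of `block_size` "threads"
    reduces its cells to one partial result, the partials are written to a
    global intermediary array, and a final reduction combines them.
    """
    n = len(values)
    n_blocks = -(-n // block_size)     # ceil division
    partials = np.zeros(n_blocks)      # the global intermediary variable
    for b in range(n_blocks):          # stage 1: one partial sum per block
        partials[b] = values[b * block_size:(b + 1) * block_size].sum()
    return partials.sum()              # stage 2: reduce across the partials

rng = np.random.default_rng(0)
f = rng.random(1000)
assert np.isclose(two_stage_sum(f), f.sum())
```

Only the second stage (or a single atomic per block) touches the shared output location, so contention drops by a factor of the block size.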
** Do we need atomic additions of the output DoFs for a DG kernel?
** Tiling transformation logic fails for low orders
- The TSFC kernel has a slightly different representation at low orders like
  P_0, P_1, DG0, DG1, etc., because some loops are unrolled, causing it to
  diverge from the "assumed" template for the kernel's loop structure.