Commit 6b665db

some clarifications
1 parent f50af5e commit 6b665db

1 file changed, +6 -7 lines changed


pyop2/gpu/TODO.org

Lines changed: 6 additions & 7 deletions
@@ -1,8 +1,8 @@
  * Limitations/TODOs
  ** Changes in TSFC so that PyOP2 could have a better understanding of the variable names
- - [[https://github.com/OP2/PyOP2/blob/630e55118013966e84dcc62328c45fc9061196e6/pyop2/gpu/tile.py#L65-L79][Currently]] variable names have been hard coded for CG type FE kernel on
-   triangular meshes.
- - Once this has been done it would then be reasonable to tackle other elements
+ - [[https://github.com/OP2/PyOP2/blob/f50af5e819e726b97b1997f00b1ad4f66b0b574b/pyop2/gpu/tile.py#L117][Currently]], we go through a phase of metadata inference assuming a homogeneity
+   of kernel structure.
+ - Once this has been done it would then be reasonable to tackle more elements

  *** Information to be fed from TSFC
  - [ ] variable name of the action input
@@ -38,7 +38,7 @@ we are going from GEM representation to loopy kernel.

  ** Global reduction kernels. For ex. ~assemble(dot(f,f)*dx)~
  - Currently all the threads write to a single memory location atomically,
-   thereby losing concurrency.
+   thereby losing some concurrency.
  - Possible solution:
  - Fix the block size, say 256.
  - Map single cell to single thread.
@@ -47,9 +47,8 @@ we are going from GEM representation to loopy kernel.
  - Finally another reduction across the newly created intermediary variable.
  - One starting step would be to map the '+=' to a loopy's sum node.

- ** Do we need atomic additions of the output DoFs for a DG kernel?
-
- ** Tiling transformation logic fails for low orders
+ ** Atomic scatter reductions for DG elements, necessary?
+ ** Inner loop parallelization logic fails for low orders
  - The received TSFC kernel has a slightly different representation at low orders
    like P_0, P_1, DG0, DG1, etc. because some loops are unrolled, causing it to
    diverge from the "assumed" template of all the kernel's loop structures.
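
The "Global reduction kernels" item in the diff above contrasts a single atomically updated accumulator with a blocked scheme that ends in a second reduction over an intermediary variable. A minimal pure-Python sketch of that contrast follows. It is only an illustration, not PyOP2 or loopy code: the function names are hypothetical, and because the steps between the two hunks are not shown in this diff, the per-block partial sum in stage 1 is an assumption about what the intermediary variable holds.

```python
def atomic_style_sum(cell_values):
    """Baseline: every 'thread' (one per cell) adds into one accumulator.

    On a GPU each addition would be an atomic update of the same address,
    so the updates serialize.
    """
    total = 0.0
    for value in cell_values:
        total += value  # stands in for an atomic add
    return total


def two_stage_sum(cell_values, block_size=256):
    """Blocked reduction: one partial sum per block, then a final pass.

    Assumption: stage 1 reduces each block of `block_size` cells into its
    own slot of an intermediary list (hypothetical name: `partials`).
    """
    partials = []
    for start in range(0, len(cell_values), block_size):
        block = cell_values[start:start + block_size]
        partials.append(sum(block))  # stage 1: reduce within one block
    total = sum(partials)  # stage 2: reduce across the intermediary list
    return total, partials


# One value per cell, with the block size of 256 suggested in the note.
cells = [float(i) for i in range(1000)]
blocked_total, partials = two_stage_sum(cells, block_size=256)
```

With 1000 cells and a block size of 256 the intermediary list has 4 entries, so only those 4 values, rather than 1000, meet in the final reduction.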
