Commit 005c946
Implement C Data integration
This starts work towards supporting the C Data Interface for the Arrow
format, as documented
[here](https://arrow.apache.org/docs/format/CDataInterface.html#).
Currently, this PR includes struct definitions and basic methods for
getting a pointer to a C-compatible `ArrowSchema`/`ArrowArray` struct
that another implementation can then populate. For example, with this
PR you can do:
```julia
using Arrow, PyCall
pd = pyimport("pandas")
pa = pyimport("pyarrow")
# Build a small pandas DataFrame and convert it to a pyarrow RecordBatch
df = pd.DataFrame(py"""{'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']}"""o)
rb = pa.record_batch(df)
# Arrow.jl hands out a pointer to a C struct; pyarrow's exporter fills it in
sch = Arrow.CData.getschema() do ptr
    rb.schema._export_to_c(Int(ptr))
end
arr = Arrow.CData.getarray() do ptr
    rb._export_to_c(Int(ptr))
end
```
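
The structs being filled in above correspond to `struct ArrowSchema` and `struct ArrowArray` from the C Data Interface spec. A minimal Julia sketch of those layouts, plus a helper in the spirit of `getschema() do ptr ... end`, might look like the following. The field definitions follow the spec, but `withcstruct` is a hypothetical name and the actual definitions in this PR may differ.

```julia
# Field-for-field mirrors of `struct ArrowSchema` / `struct ArrowArray` from the
# Arrow C Data Interface spec (illustrative; not necessarily this PR's code).
struct CArrowSchema
    format::Ptr{UInt8}                 # type format string, e.g. "l" for int64
    name::Ptr{UInt8}                   # field name, may be NULL
    metadata::Ptr{UInt8}               # binary-encoded key/value pairs, may be NULL
    flags::Int64                       # ARROW_FLAG_* bits, e.g. 2 = nullable
    n_children::Int64
    children::Ptr{Ptr{CArrowSchema}}
    dictionary::Ptr{CArrowSchema}
    release::Ptr{Cvoid}                # void (*release)(struct ArrowSchema*)
    private_data::Ptr{Cvoid}           # opaque, owned by the producer
end

struct CArrowArray
    length::Int64
    null_count::Int64
    offset::Int64
    n_buffers::Int64
    n_children::Int64
    buffers::Ptr{Ptr{Cvoid}}           # e.g. [validity bitmap, values] for primitives
    children::Ptr{Ptr{CArrowArray}}
    dictionary::Ptr{CArrowArray}
    release::Ptr{Cvoid}                # void (*release)(struct ArrowArray*)
    private_data::Ptr{Cvoid}
end

# Hypothetical helper: allocate a zeroed struct in C memory, let the producer
# fill it in, then copy it out. The spec allows a consumer to move these
# structs with a bitwise copy.
function withcstruct(f, ::Type{T}) where {T}
    p = Ptr{T}(Libc.calloc(1, sizeof(T)))
    p == C_NULL && throw(OutOfMemoryError())
    try
        f(p)                # e.g. rb._export_to_c(Int(p)) via PyCall
        return unsafe_load(p)
    finally
        Libc.free(p)        # frees our allocation only, not the producer's buffers
    end
end
```

A consumer still has to call the copied struct's `release` callback eventually (the spec marks a released struct by setting `release` to NULL); this sketch leaves that to the caller.
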
These `ArrowSchema`/`ArrowArray` structs are still fairly bare-bones,
but they at least lay some groundwork for integration. Things we still
need (or want) to make all this nicer to use:
* Type format string parsing/conversion: we need to parse the type
  format strings as outlined
  [here](https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings)
  to figure out what type of data we'll get in the arrays. It'd
  probably be best to add a `type` field to the `ArrowSchema` struct that
  we'd populate when converting from `CArrowSchema` -> `ArrowSchema`.
* Add a method like `Arrow.ArrowVector(::ArrowSchema, ::ArrowArray)`
  that produces a concrete `ArrowVector` subtype, like
  `Arrow.Primitive`, `Arrow.List`, etc. This will be a bit tricky,
  because we have to follow all the same columnar layout trickery that we
  currently handle for IPC in the table.jl `build` methods. Perhaps we
  can refactor all that so we can reuse some code? Otherwise, we might
  just need to reimplement a bunch of that logic specifically for
  converting `ArrowArray`s.
* That should give a robust consuming story. For producing, we
  probably need a definition like
  `Arrow.ArrowSchema(a::Arrow.ArrowVector)` that produces a valid
  `ArrowSchema`, and then overloads per `ArrowVector` subtype, like
  `Arrow.ArrowArray(x::Arrow.Primitive)`, that produce the right
  `ArrowArray` for a concrete Arrow array.
* The last piece is figuring out the right mechanics for providing a
  pointer to the `CArrowSchema`/`CArrowArray` structs once they're
  populated.
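
As a possible starting point for the consuming side of the first two items, here is a minimal sketch of (a) mapping primitive format strings to Julia types and (b) reading a fixed-width primitive array's values buffer while honoring the validity bitmap and `offset`. The names `PRIMITIVE_FORMATS`, `juliatype`, and `read_primitive` are hypothetical, not existing Arrow.jl functions, and nested, temporal, and decimal formats are deliberately left out.

```julia
# Hypothetical consumer-side helpers (names are illustrative, not Arrow.jl API).

# Primitive format strings from the C Data Interface spec -> Julia types.
const PRIMITIVE_FORMATS = Dict{String,Type}(
    "n" => Missing,
    "b" => Bool,            # NB: bit-packed in memory, not one byte per value
    "c" => Int8,    "C" => UInt8,
    "s" => Int16,   "S" => UInt16,
    "i" => Int32,   "I" => UInt32,
    "l" => Int64,   "L" => UInt64,
    "e" => Float16, "f" => Float32, "g" => Float64,
    "z" => Vector{UInt8}, "Z" => Vector{UInt8},  # binary / large binary
    "u" => String,  "U" => String,               # utf8 / large utf8
)

function juliatype(fmt::AbstractString)
    t = get(PRIMITIVE_FORMATS, fmt, nothing)
    t === nothing || return t
    if startswith(fmt, "w:")  # fixed-width binary, e.g. "w:16"
        return NTuple{parse(Int, fmt[3:end]), UInt8}
    end
    # nested ("+l", "+s", ...), temporal ("t..."), decimal ("d:...") not handled
    throw(ArgumentError("unsupported format string: $fmt"))
end

# Copy `len` slots of a fixed-width primitive array into a Julia vector,
# honoring the array `offset` and the (optional) validity bitmap.
function read_primitive(::Type{T}, len::Integer, offset::Integer,
                        validity::Ptr{UInt8}, data::Ptr{T}) where {T}
    out = Vector{Union{Missing,T}}(undef, len)
    for i in 0:len-1
        j = i + offset  # physical slot in the buffers
        # Bitmaps use LSB bit order; a NULL bitmap means "all valid".
        valid = validity == C_NULL ||
                (unsafe_load(validity, j ÷ 8 + 1) >> (j % 8)) & 0x01 == 0x01
        out[i + 1] = valid ? unsafe_load(data, j + 1) : missing
    end
    return out
end
```

For example, with `data = Int64[10, 20, 30, 40]` and a one-byte bitmap `bitmap = UInt8[0b00001101]`, calling `read_primitive(juliatype("l"), 4, 0, pointer(bitmap), pointer(data))` inside `GC.@preserve data bitmap` yields `[10, missing, 30, 40]`. Boolean (`"b"`) arrays are bit-packed, so they would need their own reader, and a field's nullability comes from the `ARROW_FLAG_NULLABLE` bit (value 2) in the schema's `flags`.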

If anyone would like to help out, I'm happy to provide as much guidance
as possible so others can get their feet wet in some Arrow spec
nitty-gritty.

1 parent bdd0e54 commit 005c946
2 files changed
Lines changed: 166 additions & 0 deletions