Commit 2faccd8

[Feat][Spark] Implementation of PySpark bindings to Scala API (#300)
* Implementation draft & concept

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: .gitignore
    new file: pyspark/graphar_pysaprk/__init__.py
    new file: pyspark/graphar_pysaprk/enums.py
    new file: pyspark/graphar_pysaprk/graph.py
    new file: pyspark/graphar_pysaprk/info.py
    new file: pyspark/graphar_pysaprk/reader.py
    new file: pyspark/graphar_pysaprk/writer.py
    new file: pyspark/poetry.lock
    new file: pyspark/pyproject.toml

* Part of tests & update branch

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    renamed: pyspark/graphar_pysaprk/__init__.py -> pyspark/graphar_pyspark/__init__.py
    renamed: pyspark/graphar_pysaprk/enums.py -> pyspark/graphar_pyspark/enums.py
    renamed: pyspark/graphar_pysaprk/graph.py -> pyspark/graphar_pyspark/graph.py
    renamed: pyspark/graphar_pysaprk/info.py -> pyspark/graphar_pyspark/info.py
    renamed: pyspark/graphar_pysaprk/reader.py -> pyspark/graphar_pyspark/reader.py
    renamed: pyspark/graphar_pysaprk/writer.py -> pyspark/graphar_pyspark/writer.py
    modified: pyspark/poetry.lock
    modified: pyspark/pyproject.toml
    new file: pyspark/tests/__init__.py
    new file: pyspark/tests/conftest.py
    new file: pyspark/tests/test_enums.py
    new file: pyspark/tests/test_info.py

* Update VertexInfo.load_vertex_info & test & fixes

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: .gitignore
    modified: pyspark/graphar_pyspark/__init__.py
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/poetry.lock
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/test_info.py

* Push changes before pulling from upstream

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    new file: pyspark/README.rst
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/test_info.py

* Tests + fixes + updates from comments

  - update pyproject.toml
  - fix a lot of things
  - some work based on comments
  - license header everywhere
  - minor changes

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    new file: pyspark/Makefile
    modified: pyspark/graphar_pyspark/graph.py
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/graphar_pyspark/reader.py
    modified: pyspark/poetry.lock
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/__init__.py
    modified: pyspark/tests/conftest.py
    modified: pyspark/tests/test_enums.py
    modified: pyspark/tests/test_info.py
    new file: pyspark/tests/test_reader.py
  Changes not staged for commit:
    modified: spark/pom.xml
    modified: spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala
    modified: spark/src/main/scala/com/alibaba/graphar/GraphInfo.scala
    modified: spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala

* Fix init for GraphArSession

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/__init__.py
  Changes not staged for commit:
    modified: spark/pom.xml
    modified: spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala
    modified: spark/src/main/scala/com/alibaba/graphar/GraphInfo.scala
    modified: spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala

* Tests and fixes from comments

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/__init__.py
    modified: pyspark/graphar_pyspark/enums.py
    modified: pyspark/graphar_pyspark/graph.py
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/graphar_pyspark/reader.py
    new file: pyspark/graphar_pyspark/util.py
    modified: pyspark/graphar_pyspark/writer.py
    modified: pyspark/tests/test_info.py
    modified: pyspark/tests/test_reader.py
    new file: pyspark/tests/test_writer.py

* Make PR ready for review

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/util.py
    modified: pyspark/graphar_pyspark/writer.py
    modified: pyspark/tests/test_writer.py

* Fixes from comments && docs

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: .gitignore
    modified: docs/Makefile
    modified: docs/index.rst
    new file: docs/pyspark/api/graphar_pyspark.rst
    new file: docs/pyspark/api/modules.rst
    new file: docs/pyspark/index.rst
    new file: docs/pyspark/pyspark-lib.rst
    modified: pyspark/Makefile
    modified: pyspark/graphar_pyspark/__init__.py
    modified: pyspark/graphar_pyspark/enums.py
    new file: pyspark/graphar_pyspark/errors.py
    modified: pyspark/graphar_pyspark/graph.py
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/graphar_pyspark/reader.py
    modified: pyspark/graphar_pyspark/util.py
    modified: pyspark/graphar_pyspark/writer.py
    modified: pyspark/poetry.lock
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/__init__.py
    modified: pyspark/tests/conftest.py
    modified: pyspark/tests/test_enums.py
    modified: pyspark/tests/test_info.py
    modified: pyspark/tests/test_reader.py
    modified: pyspark/tests/test_writer.py

* Add license-header to pyspark/README + add poetry-lock file to .licenserc ignore section

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: .licenserc.yaml
    new file: pyspark/README.md
    deleted: pyspark/README.rst
    modified: pyspark/pyproject.toml

* Update tests && small fixes

  - new tests
  - improved coverage
  - updated Makefile for Python project
  - updated pyproject.toml

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/Makefile
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/graphar_pyspark/writer.py
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/test_info.py
    modified: pyspark/tests/test_writer.py

* Drop outdated comment and TODO

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/writer.py

* Fix broken commit

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/tests/test_writer.py

* Tests coverage 95% && docstrings && linting pass

  - ruff passed
  - coverage 95%+
  - docstrings for all the public API

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/__init__.py
    modified: pyspark/graphar_pyspark/enums.py
    modified: pyspark/graphar_pyspark/errors.py
    modified: pyspark/graphar_pyspark/graph.py
    modified: pyspark/graphar_pyspark/info.py
    modified: pyspark/graphar_pyspark/reader.py
    modified: pyspark/graphar_pyspark/util.py
    modified: pyspark/graphar_pyspark/writer.py
    modified: pyspark/poetry.lock
    modified: pyspark/pyproject.toml
    modified: pyspark/tests/test_reader.py
    new file: pyspark/tests/test_transform.py
    modified: pyspark/tests/test_writer.py

* Ci & docs

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    new file: .github/workflows/pyspark.yml
    new file: docs/pyspark/how-to.rst
    modified: docs/pyspark/index.rst

* Update branch && update README

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/README.md

* Fixes from comments

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: .github/workflows/pyspark.yml
    modified: pyspark/README.md

* Fix linter errors

  On branch 297-add-pyspark-bindings
  Changes to be committed:
    modified: pyspark/graphar_pyspark/info.py

1 parent 955b3a5 commit 2faccd8

29 files changed

Lines changed: 6326 additions & 1 deletion

.github/workflows/pyspark.yml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Copyright 2022-2023 Alibaba Group Holding Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: GraphAr PySpark CI
+
+on:
+  # Trigger the workflow on push or pull request,
+  # but only for the main branch
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.event.number || github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+  GraphAr-spark:
+    runs-on: ubuntu-20.04
+    steps:
+    - uses: actions/checkout@v3
+      with:
+        submodules: true
+
+    - name: Install Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: 3.9
+
+    - name: Install Poetry
+      uses: abatilo/actions-poetry@v2
+
+    - name: Install Spark Scala && PySpark
+      run: |
+        cd pyspark
+        make install_test
+
+    - name: Run PyTest
+      run: |
+        cd pyspark
+        make test
+
+    - name: Lint
+      run: |
+        cd pyspark
+        make install_lint
+        make lint
+
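
The `concurrency.group` value in this workflow chains `||`, which in GitHub Actions expressions yields the first truthy operand. The sketch below mimics that fallback in Python to show which key ends up in the group name. The function name, argument names, and the `owner/repo` value are hypothetical and chosen only for illustration; they are not part of the commit.

```python
def concurrency_group(repository, workflow, event_number=None, head_ref="", sha=""):
    """Mimic `${{ a || b || c }}`: use the first non-empty key."""
    # Python's `or` also returns the first truthy operand, like `||` in the workflow.
    key = event_number or head_ref or sha
    return f"{repository}-{key}-{workflow}"


# Pull request run: the PR number wins, so every push to the same PR shares a group.
pr_group = concurrency_group(
    "owner/repo",  # hypothetical repository name
    "GraphAr PySpark CI",
    event_number=300,
    head_ref="297-add-pyspark-bindings",
    sha="2faccd8",
)

# Push to main: assumed to have no PR number and an empty head ref,
# so the commit SHA is the key.
push_group = concurrency_group("owner/repo", "GraphAr PySpark CI", sha="2faccd8")

print(pr_group)
print(push_group)
```

With `cancel-in-progress: true`, a newer run in the same group cancels the older one, so repeated pushes to one pull request keep only the latest run, while pushes to `main` are keyed by SHA and should not cancel each other.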

.gitignore

Lines changed: 63 additions & 0 deletions
@@ -6,3 +6,66 @@
 .ccls-cache
 
 compile_commands.json
+
+### Python ###
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+pyspark/assets
+
+# Jupyter Notebook
+.ipynb_checkpoints
+*.ipynb
+
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Ruff
+.ruff_cache
+
+### Scala ###
+*.bloop
+*.metals
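
The `*.py[cod]` entry relies on a bracket character class. As a rough illustration, Python's `fnmatch` module handles this simple pattern the same way; it is only an approximation of gitignore matching, which treats `/` and `**` differently. The file names below are made up for the example.

```python
from fnmatch import fnmatchcase

# Pattern copied from the .gitignore hunk; [cod] matches exactly one of c, o, d.
pattern = "*.py[cod]"

# Hypothetical file names, used only to exercise the pattern.
names = ["info.pyc", "info.pyo", "info.pyd", "info.py", "info.pyx"]

ignored = [name for name in names if fnmatchcase(name, pattern)]
print(ignored)
```

Source files ending in plain `.py` stay tracked because the class requires one extra character after `py`.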

.licenserc.yaml

Lines changed: 2 additions & 1 deletion
@@ -40,11 +40,12 @@ header:
     - '*.md'
     - '*.rst'
     - '**/*.json'
+    - 'pyspark/poetry.lock' # This file is generated automatically by Poetry-tool; there is no way to add license header
 
   comment: on-failure
 
 # If you don't want to check dependencies' license compatibility, remove the following part
 dependency:
   files:
     - spark/pom.xml # If this is a maven project.
-    - java/pom.xml # If this is a maven project.
+    - java/pom.xml # If this is a maven project.

docs/Makefile

Lines changed: 22 additions & 0 deletions
@@ -53,3 +53,25 @@ html: cpp-apidoc spark-apidoc
 		--quiet
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+.PHONY: pyspark-apidoc
+pyspark-apidoc:
+	cd $(ROOTDIR)/pyspark && \
+	poetry run sphinx-apidoc -o $(ROOTDIR)/docs/pyspark/api graphar_pyspark/
+
+.PHONY: html-poetry
+html-poetry:
+	cd $(ROOTDIR)/pyspark && \
+	poetry run bash -c "cd $(ROOTDIR)/docs && $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html"
+	rm -fr $(BUILDDIR)/html/spark/reference
+	cp -fr $(ROOTDIR)/spark/target/site/scaladocs $(BUILDDIR)/html/spark/reference/
+	cd $(ROOTDIR)/java && \
+	mvn -P javadoc javadoc:aggregate \
+		-Dmaven.antrun.skip=true \
+		-DskipTests \
+		-Djavadoc.output.directory=$(ROOTDIR)/docs/$(BUILDDIR)/html/java/ \
+		-Djavadoc.output.destDir=reference \
+		--quiet
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
    C++ <cpp/index>
    Java <java/index>
    Spark <spark/index>
+   PySpark <pyspark/index>
 
 .. toctree::
    :maxdepth: 2
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+graphar\_pyspark package
+========================
+
+Submodules
+----------
+
+graphar\_pyspark.enums module
+-----------------------------
+
+.. automodule:: graphar_pyspark.enums
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.errors module
+------------------------------
+
+.. automodule:: graphar_pyspark.errors
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.graph module
+-----------------------------
+
+.. automodule:: graphar_pyspark.graph
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.info module
+----------------------------
+
+.. automodule:: graphar_pyspark.info
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.reader module
+------------------------------
+
+.. automodule:: graphar_pyspark.reader
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.util module
+----------------------------
+
+.. automodule:: graphar_pyspark.util
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+graphar\_pyspark.writer module
+------------------------------
+
+.. automodule:: graphar_pyspark.writer
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: graphar_pyspark
+   :members:
+   :undoc-members:
+   :show-inheritance:
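
Each submodule section in this file follows one fixed shape: an escaped title, an underline of the same length, and an `automodule` directive with three options. The sketch below reproduces that shape for a single module. `automodule_stanza` is a hypothetical helper written for illustration; it is not how `sphinx-apidoc` is implemented.

```python
def automodule_stanza(module, kind="module"):
    """Build one reStructuredText section shaped like those in the diff above."""
    # Underscores are backslash-escaped in the title, as in the generated file.
    title = f"{module} {kind}".replace("_", r"\_")
    return "\n".join(
        [
            title,
            "-" * len(title),  # the underline must be at least as long as the title
            "",
            f".. automodule:: {module}",
            "   :members:",
            "   :undoc-members:",
            "   :show-inheritance:",
        ]
    )


stanza = automodule_stanza("graphar_pyspark.enums")
print(stanza)
```

The directive options are indented under `.. automodule::` so that reStructuredText parses them as part of the directive.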

docs/pyspark/api/modules.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+graphar_pyspark
+===============
+
+.. toctree::
+   :maxdepth: 4
+
+   graphar_pyspark
