Commit da85682

docs(python): improve Python SDK documentation (#931)
* main readme
* cli readme
* after self-review
1 parent 16f0a56 commit da85682

2 files changed

Lines changed: 137 additions & 107 deletions

python/README.md

Lines changed: 76 additions & 49 deletions
@@ -1,108 +1,135 @@
11
# GraphAr Python SDK
22

3-
GraphAr Python SDK provides Python bindings for the GraphAr C++ library, allowing users to work with GraphAr formatted graph data in Python environments. It includes both a high-level API for data manipulation and a command-line interface for common operations.
3+
The GraphAr Python SDK provides Python bindings for the GraphAr C++ library.
4+
It lets Python applications read GraphAr metadata, use the high-level graph APIs,
5+
and run the bundled `graphar` command-line tool.
46

5-
## Installation
7+
This package is separate from the PySpark package in [`../pyspark`](../pyspark).
68

7-
### Prerequisites
9+
## Requirements
810

911
- Python >= 3.9
10-
- pip (latest version recommended)
11-
- CMake >= 3.15 (for building from source)
12-
- Apache Arrow >= 12.0 (for building from source)
12+
- pip
13+
- CMake >= 3.15, Apache Arrow >= 12.0, and a C++ toolchain when building from source
1314

14-
### Install from Pypi
15-
Install the latest released version from PyPI:
15+
## Install
16+
17+
### From PyPI
1618

1719
```bash
1820
pip install -U graphar
1921
```
2022

21-
### Install from Source
22-
23-
Clone the repository and install the Python package:
23+
Verify the installation:
2424

2525
```bash
26-
git clone https://github.com/apache/incubator-graphar.git
27-
cd incubator-graphar
28-
pip install ./python
26+
python -c "import graphar; print(graphar.GraphInfo)"
27+
graphar --help
2928
```
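The same check can be scripted. The following is a minimal sketch that uses only the standard library; `check_graphar_install` is a hypothetical helper, not part of the SDK. It reports whether a module named `graphar` can be located and whether a `graphar` executable is on `PATH`, without importing the package.

```python
import importlib.util
import shutil


def check_graphar_install(module_name="graphar", cli_name="graphar"):
    """Report whether the module and the CLI can be found (hypothetical helper)."""
    return {
        # find_spec returns None when a top-level module cannot be located.
        "module": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no such executable is on PATH.
        "cli": shutil.which(cli_name) is not None,
    }
```

Calling `check_graphar_install()` returns a dictionary with two booleans. If `cli` is false while `module` is true, the script directory of the active environment is probably missing from `PATH`.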
3029

31-
For verbose output during installation:
30+
### From Source
31+
32+
Clone the repository, then from its root:
3233

3334
```bash
34-
pip install -v ./python
35+
pip install ./python
36+
37+
# for local development
38+
pip install -e ./python
3539
```
3640

37-
### Using Docker (Recommended)
41+
### Docker
3842

39-
The easiest way to get started is by using our pre-configured Docker environment:
43+
The project also publishes a development image:
4044

4145
```bash
4246
docker run -it ghcr.io/apache/graphar-dev
4347
```
4448

4549
## Quick Start
4650

47-
### Importing the Package
48-
49-
After installation, you can import the GraphAr Python SDK in your Python scripts:
51+
Load graph metadata from a GraphAr YAML file:
5052

5153
```python
5254
import graphar
55+
56+
graph_info = graphar.GraphInfo.load("path/to/graph.graph.yml")
57+
58+
print(graph_info.get_name())
59+
print(graph_info.get_vertex_info("person").get_type())
60+
print(graph_info.get_edge_info("person", "knows", "person").get_edge_type())
5361
```
5462

55-
### Basic Usage
63+
Replace `path/to/graph.graph.yml` with the path to a GraphAr graph metadata file.
5664

57-
Loading graph information:
65+
## Modules
5866

59-
```python
60-
import graphar
67+
The Python SDK exposes the core GraphAr functionality through these modules:
68+
69+
- [`graphar.graph_info`](src/graphar/graph_info.py): graph, vertex, edge, property, and metadata APIs.
70+
- [`graphar.high_level`](src/graphar/high_level.py): high-level vertex and edge collection APIs.
71+
- [`graphar.types`](src/graphar/types.py): GraphAr enum types used by metadata and high-level APIs.
72+
73+
## Examples
74+
75+
Example scripts are available in [`python/example`](example):
6176

62-
# Load graph info from a YAML file
63-
graph_info = graphar.graph_info.GraphInfo.load("path/to/graph.yaml")
77+
- [`graph_info_example.py`](example/graph_info_example.py) shows how to load graph metadata and inspect vertex and edge information.
78+
- [`high_level_example.py`](example/high_level_example.py) shows how to use the high-level vertex and edge collection APIs.
6479

65-
# Access vertex information
66-
vertex_info = graph_info.get_vertex_info("person")
67-
print(f"Vertex type: {vertex_info.get_type()}")
80+
The examples expect `GAR_TEST_DATA` to point to a directory that contains the
81+
`ldbc_sample/parquet/ldbc_sample.graph.yml` test graph:
6882

69-
# Access edge information
70-
edge_info = graph_info.get_edge_info("person", "knows", "person")
71-
print(f"Edge type: {edge_info.get_edge_type()}")
83+
```bash
84+
bash dev/download_test_data.sh
85+
export GAR_TEST_DATA=/tmp/graphar-testing
86+
python python/example/graph_info_example.py
87+
python python/example/high_level_example.py
7288
```
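One way a script might resolve that path is sketched below. The environment variable name and the relative path come from the text above; `sample_graph_path` is a hypothetical helper, and the bundled example scripts may locate the file differently.

```python
import os
from pathlib import Path

# Relative location of the sample graph inside the test data directory.
SAMPLE_GRAPH = Path("ldbc_sample") / "parquet" / "ldbc_sample.graph.yml"


def sample_graph_path(environ=None):
    """Return the sample graph path under GAR_TEST_DATA (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    root = environ.get("GAR_TEST_DATA")
    if not root:
        # Fail early with a hint instead of passing a bad path further down.
        raise RuntimeError(
            "GAR_TEST_DATA is not set; run dev/download_test_data.sh first"
        )
    return Path(root) / SAMPLE_GRAPH
```

Passing the mapping as an argument keeps the helper easy to test without touching the real environment.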
7389

7490
## Command-Line Interface
7591

76-
GraphAr Python SDK also provides a command-line interface for common operations such as checking metadata, showing graph information, and importing data.
92+
The package installs a `graphar` command-line tool:
93+
94+
```bash
95+
graphar --help
96+
graphar show --path path/to/graph.graph.yml
97+
graphar check --path path/to/graph.graph.yml
98+
```
7799

78-
For detailed information about the CLI functionality, please see [CLI Documentation](src/cli/README.md).
100+
See [`python/src/cli/README.md`](src/cli/README.md) for more CLI examples.
79101

80102
## API Documentation
81103

82-
### build docs
104+
Build the Python API documentation from the `python` directory:
105+
83106
```bash
84107
make install_docs
85108
make docs
86109
```
87110

88-
The Python SDK exposes the core GraphAr functionality through several modules:
111+
The generated documentation is written to `python/docs`.
89112

90-
- `graphar.graph_info`: Main API for working with graph, vertex, and edge information
91-
- `graphar.high_level`: High-level API for data reading and writing
113+
## Development
92114

93-
## Examples
94-
> [!NOTE]
95-
> under development.
115+
Install test dependencies from the repository root:
96116

97-
You can find various examples in the [examples directory](../cpp/examples/) which demonstrate usage of the underlying C++ library. These concepts translate directly to the Python SDK.
117+
```bash
118+
pip install -e "./python[test]"
119+
```
98120

99-
## Development
121+
Run the Python tests from the `python` directory:
122+
123+
```bash
124+
pytest
125+
```
126+
127+
Some tests require `GAR_TEST_DATA`; use [`dev/download_test_data.sh`](../dev/download_test_data.sh)
128+
if the test data is not available locally.
100129

101-
To contribute to the Python SDK, please follow the guidelines in the main [CONTRIBUTING.md](../CONTRIBUTING.md) file.
130+
For general contribution guidelines, see [`../CONTRIBUTING.md`](../CONTRIBUTING.md).
102131

103132
## License
104133

105-
**GraphAr** is distributed under [Apache License
106-
2.0](https://github.com/apache/incubator-graphar/blob/main/LICENSE).
107-
Please note that third-party libraries may not have the same license as
108-
GraphAr.
134+
GraphAr is distributed under the Apache License 2.0. See [`../LICENSE`](../LICENSE)
135+
and [`../NOTICE`](../NOTICE) for details.

python/src/cli/README.md

Lines changed: 61 additions & 58 deletions
@@ -1,96 +1,99 @@
11
# GraphAr Python CLI
22

3-
GraphAr python cli uses [pybind11][] and [scikit-build-core][] to bind C++ code into Python and build command line tools through Python. Command line tools developed using [typer][].
3+
The GraphAr Python package installs a `graphar` command-line tool for inspecting
4+
GraphAr metadata and importing data into GraphAr format.
45

5-
[pybind11]: https://pybind11.readthedocs.io
6-
[scikit-build-core]: https://scikit-build-core.readthedocs.io
7-
[typer]: https://typer.tiangolo.com/
8-
9-
## Requirements
10-
11-
- Linux (work fine on Ubuntu 22.04)
12-
- Cmake >= 3.15
13-
- Arrow >= 12.0
14-
- Python >= 3.7
15-
- pip == latest
6+
The CLI is implemented with [Typer][] and uses the same Python bindings as the
7+
[`graphar` Python package](../../README.md).
168

9+
[Typer]: https://typer.tiangolo.com/
1710

18-
The best testing environment is `ghcr.io/apache/graphar-dev` Docker environment.
11+
## Requirements
1912

20-
And using Python in conda or venv is a good choice.
13+
- Python >= 3.9
14+
- pip
15+
- CMake >= 3.15, Apache Arrow >= 12.0, and a C++ toolchain when building from source
2116

2217
## Installation
2318

24-
### Install from Pypi
2519
Install the latest released version from PyPI:
2620

2721
```bash
2822
pip install -U graphar
2923
```
3024

31-
### Install from Source
25+
Or install from the repository root:
3226

33-
- Clone this repository
34-
- `pip install ./python` or set verbose level `pip install -v ./python`
27+
```bash
28+
pip install ./python
29+
```
3530

36-
## Usage
31+
Verify the CLI is available:
3732

3833
```bash
3934
graphar --help
40-
41-
# check the metadata, verify whether the vertex edge information and attribute information of the graph are valid
42-
graphar check -p ../testing/neo4j/MovieGraph.graph.yml
43-
44-
# show the vertex
45-
graphar show -p ../testing/neo4j/MovieGraph.graph.yml -v Person
46-
47-
# show the edge
48-
graphar show -p ../testing/neo4j/MovieGraph.graph.yml -es Person -e ACTED_IN -ed Movie
49-
50-
# import graph data by using a config file
51-
graphar import -c ../testing/neo4j/data/import.mini.yml
5235
```
5336

54-
## Import config file
37+
## Usage
5538

56-
The config file supports `yaml` data type. We provide two reference templates for it: full and mini.
39+
Replace the paths below with paths to your GraphAr metadata or import config
40+
files.
5741

58-
The full version of the configuration file contains all configurable fields, and additional fields will be automatically ignored.
42+
```bash
43+
# Show all graph metadata.
44+
graphar show --path path/to/graph.graph.yml
5945

60-
The mini version of the configuration file is a simplified version of the full configuration file, retaining the same functionality. It shows the essential parts of the configuration information.
46+
# Validate graph metadata.
47+
graphar check --path path/to/graph.graph.yml
6148

62-
For the full configuration file, if all fields can be set to their default values, you can simplify it to the mini version. However, it cannot be further reduced beyond the mini version.
49+
# Show one vertex type.
50+
graphar show --path path/to/graph.graph.yml --vertex Person
6351

64-
In the full `yaml` config file, we provide brief comments on the fields, which can be used as a reference.
52+
# Show one edge type.
53+
graphar show \
54+
--path path/to/graph.graph.yml \
55+
--edge-src Person \
56+
--edge ACTED_IN \
57+
--edge-dst Movie
6558

66-
**Example**
59+
# Import data with a config file.
60+
graphar import --config path/to/import.yml
61+
```
6762

68-
To import the movie graph data from the `testing` directory, you first need to prepare data files. Supported file types include `csv`, `json`(as well as`jsonline`, but should have the `.json` extension), `parquet`, and `orc` files. Please ensure the correct file extensions are set in advance, or specify the `file_type` field in the source section of the configuration. The `file_type` field will ignore the file extension.
63+
Short options are also available:
6964

70-
Next, write a configuration file following the provided sample. Any empty fields in the `graphar` configuration will be filled with default values. In the `import_schema`, empty fields will use the global configuration values from `graphar`. If fields in `import_schema` are not empty, they will override the values from `graphar`.
65+
```bash
66+
graphar show -p path/to/graph.graph.yml -v Person
67+
graphar show -p path/to/graph.graph.yml -es Person -e ACTED_IN -ed Movie
68+
graphar import -c path/to/import.yml
69+
```
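When the CLI is driven from Python, it can help to build the argument list in one place. The sketch below assembles an argument vector for `graphar show` from the long options listed above. `build_show_command` is a hypothetical helper, and requiring the three edge options together is an assumption of this sketch, not documented CLI behavior.

```python
def build_show_command(path, vertex=None, edge_src=None, edge=None, edge_dst=None):
    """Build an argv list for `graphar show` (hypothetical helper)."""
    argv = ["graphar", "show", "--path", str(path)]
    if vertex is not None:
        argv += ["--vertex", vertex]
    edge_parts = (edge_src, edge, edge_dst)
    if any(part is not None for part in edge_parts):
        # Assumption: an edge type is only identified by all three names.
        if not all(part is not None for part in edge_parts):
            raise ValueError("edge_src, edge, and edge_dst must be given together")
        argv += ["--edge-src", edge_src, "--edge", edge, "--edge-dst", edge_dst]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues with paths that contain spaces.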
7170

72-
A few important notes:
71+
## Import Config
7372

74-
1. The sources list specifies configuration for the data source files. For `csv` files, you can set the `delimiter`. The format of the `json` file should be given in the format of `jsonline`.
73+
The import command reads a YAML config file. A config describes source files,
74+
GraphAr output settings, and how source columns map to vertex or edge
75+
properties.
7576

76-
2. The columns dictionary maps column names in the data source to node or edge properties. Keys represent column names in the data source, and values represent property names.
77+
Supported source file types are `csv`, `json`, `parquet`, and `orc`. JSON input
78+
uses JSON Lines format and should use the `.json` extension. You can override
79+
extension-based detection by setting `file_type` in the source config.
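
The precedence described above can be illustrated with a short sketch. This is not the importer's actual code: `resolve_file_type` is a hypothetical function, and lowercasing the value is an assumption made here for convenience.

```python
from pathlib import Path

SUPPORTED_FILE_TYPES = {"csv", "json", "parquet", "orc"}


def resolve_file_type(path, file_type=None):
    """Pick a source file type; an explicit file_type wins over the extension."""
    chosen = file_type if file_type else Path(path).suffix.lstrip(".")
    chosen = chosen.lower()  # assumption: treat the value case-insensitively
    if chosen not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"unsupported source file type: {chosen!r}")
    return chosen
```

With this logic, `dump.txt` is accepted only when `file_type` is given explicitly, which mirrors the override rule in the paragraph above.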
7780

78-
3. Currently, edge properties cannot have the same names as the edge endpoints' properties; doing so will raise an exception.
81+
Important fields:
7982

80-
4. The following table lists the default fields, more of which are included in the full configuration.
83+
1. `sources` describes the input files. CSV sources can set a `delimiter`.
84+
2. `columns` maps source column names to GraphAr property names.
85+
3. Edge property names must not duplicate endpoint property names.
86+
4. Empty fields in `import_schema` use values from the top-level `graphar`
87+
config. Explicit `import_schema` values override the top-level defaults.
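
Rule 3 can be checked before running an import. The sketch below is a simplified reading of that rule, assuming the property names of the edge and of both endpoint vertex types are already available as plain lists. `find_conflicting_properties` is a hypothetical helper and does not reproduce the importer's own validation.

```python
def find_conflicting_properties(edge_properties, src_properties, dst_properties):
    """Return edge property names that also appear on an endpoint, sorted."""
    endpoint_names = set(src_properties) | set(dst_properties)
    return sorted(set(edge_properties) & endpoint_names)
```

An empty result means no edge property name collides with an endpoint property name under this reading.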
8188

89+
Common defaults:
8290

8391
| Field | Default value |
84-
| ----------- | ----------- |
85-
| `graphar.vertex_chunk_size` | `100` |
86-
| `graphar.edge_chunk_size` | `1024` |
87-
| `graphar.file_type` | `parquet` |
88-
| `graphar.adj_list_type` | `ordered_by_source` |
89-
| `graphar.validate_level` | `weak` |
90-
| `graphar.version` | `gar/v1` |
91-
| `property.nullable` | `true` |
92-
93-
94-
95-
96-
Wish you a happy use!
92+
| --------------------------------- | -------------------- |
93+
| `graphar.vertex_chunk_size` | `100` |
94+
| `graphar.edge_chunk_size` | `1024` |
95+
| `graphar.file_type` | `parquet` |
96+
| `graphar.adj_list_type` | `ordered_by_source` |
97+
| `graphar.validate_level` | `weak` |
98+
| `graphar.version` | `gar/v1` |
99+
| `property.nullable` | `true` |
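
To make the layering concrete, the sketch below merges the `graphar.*` defaults from the table with a top-level `graphar` section and with per-schema values. It is an illustration under assumptions, not the importer's implementation: `effective_settings` is a hypothetical function, "empty" is taken to mean `None` or an empty string, and every `graphar` key is assumed to be overridable per schema.

```python
# Defaults copied from the table above (graphar.* fields only).
GRAPHAR_DEFAULTS = {
    "vertex_chunk_size": 100,
    "edge_chunk_size": 1024,
    "file_type": "parquet",
    "adj_list_type": "ordered_by_source",
    "validate_level": "weak",
    "version": "gar/v1",
}


def _non_empty(mapping):
    """Drop keys whose value is None or an empty string (treated as empty)."""
    return {k: v for k, v in (mapping or {}).items() if v not in (None, "")}


def effective_settings(graphar_config=None, schema_overrides=None):
    """Layer defaults, then top-level graphar values, then schema values."""
    merged = dict(GRAPHAR_DEFAULTS)
    merged.update(_non_empty(graphar_config))
    merged.update(_non_empty(schema_overrides))
    return merged
```

Later layers win only where they carry a non-empty value, so an empty field falls back to the layer beneath it.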
