|
1 | 1 | # GraphAr Python CLI |
2 | 2 |
|
3 | | -GraphAr python cli uses [pybind11][] and [scikit-build-core][] to bind C++ code into Python and build command line tools through Python. Command line tools developed using [typer][]. |
| 3 | +The GraphAr Python package installs a `graphar` command-line tool for inspecting |
| 4 | +GraphAr metadata and importing data into GraphAr format. |
4 | 5 |
|
5 | | -[pybind11]: https://pybind11.readthedocs.io |
6 | | -[scikit-build-core]: https://scikit-build-core.readthedocs.io |
7 | | -[typer]: https://typer.tiangolo.com/ |
8 | | - |
9 | | -## Requirements |
10 | | - |
11 | | -- Linux (work fine on Ubuntu 22.04) |
12 | | -- Cmake >= 3.15 |
13 | | -- Arrow >= 12.0 |
14 | | -- Python >= 3.7 |
15 | | -- pip == latest |
| 6 | +The CLI is implemented with [Typer][] and uses the same Python bindings as the |
| 7 | +[`graphar` Python package](../../README.md). |
16 | 8 |
|
| 9 | +[Typer]: https://typer.tiangolo.com/ |
17 | 10 |
|
18 | | -The best testing environment is `ghcr.io/apache/graphar-dev` Docker environment. |
| 11 | +## Requirements |
19 | 12 |
|
20 | | -And using Python in conda or venv is a good choice. |
| 13 | +- Python >= 3.9 |
| 14 | +- pip |
| 15 | +- CMake >= 3.15, Apache Arrow >= 12.0, and a C++ toolchain when building from source |
21 | 16 |
|
22 | 17 | ## Installation |
23 | 18 |
|
24 | | -### Install from Pypi |
25 | 19 | Install the latest released version from PyPI: |
26 | 20 |
|
27 | 21 | ```bash |
28 | 22 | pip install -U graphar |
29 | 23 | ``` |
30 | 24 |
|
31 | | -### Install from Source |
| 25 | +Or install from the repository root: |
32 | 26 |
|
33 | | -- Clone this repository |
34 | | -- `pip install ./python` or set verbose level `pip install -v ./python` |
| 27 | +```bash |
| 28 | +pip install ./python |
| 29 | +``` |
35 | 30 |
|
36 | | -## Usage |
| 31 | +Verify the CLI is available: |
37 | 32 |
|
38 | 33 | ```bash |
39 | 34 | graphar --help |
40 | | - |
41 | | -# check the metadata, verify whether the vertex edge information and attribute information of the graph are valid |
42 | | -graphar check -p ../testing/neo4j/MovieGraph.graph.yml |
43 | | - |
44 | | -# show the vertex |
45 | | -graphar show -p ../testing/neo4j/MovieGraph.graph.yml -v Person |
46 | | - |
47 | | -# show the edge |
48 | | -graphar show -p ../testing/neo4j/MovieGraph.graph.yml -es Person -e ACTED_IN -ed Movie |
49 | | - |
50 | | -# import graph data by using a config file |
51 | | -graphar import -c ../testing/neo4j/data/import.mini.yml |
52 | 35 | ``` |
53 | 36 |
|
54 | | -## Import config file |
| 37 | +## Usage |
55 | 38 |
|
56 | | -The config file supports `yaml` data type. We provide two reference templates for it: full and mini. |
| 39 | +Replace the paths below with paths to your GraphAr metadata or import config |
| 40 | +files. |
57 | 41 |
|
58 | | -The full version of the configuration file contains all configurable fields, and additional fields will be automatically ignored. |
| 42 | +```bash |
| 43 | +# Show all graph metadata. |
| 44 | +graphar show --path path/to/graph.graph.yml |
59 | 45 |
|
60 | | -The mini version of the configuration file is a simplified version of the full configuration file, retaining the same functionality. It shows the essential parts of the configuration information. |
| 46 | +# Validate graph metadata. |
| 47 | +graphar check --path path/to/graph.graph.yml |
61 | 48 |
|
62 | | -For the full configuration file, if all fields can be set to their default values, you can simplify it to the mini version. However, it cannot be further reduced beyond the mini version. |
| 49 | +# Show one vertex type. |
| 50 | +graphar show --path path/to/graph.graph.yml --vertex Person |
63 | 51 |
|
64 | | -In the full `yaml` config file, we provide brief comments on the fields, which can be used as a reference. |
| 52 | +# Show one edge type. |
| 53 | +graphar show \ |
| 54 | + --path path/to/graph.graph.yml \ |
| 55 | + --edge-src Person \ |
| 56 | + --edge ACTED_IN \ |
| 57 | + --edge-dst Movie |
65 | 58 |
|
66 | | -**Example** |
| 59 | +# Import data with a config file. |
| 60 | +graphar import --config path/to/import.yml |
| 61 | +``` |
67 | 62 |
|
68 | | -To import the movie graph data from the `testing` directory, you first need to prepare data files. Supported file types include `csv`, `json`(as well as`jsonline`, but should have the `.json` extension), `parquet`, and `orc` files. Please ensure the correct file extensions are set in advance, or specify the `file_type` field in the source section of the configuration. The `file_type` field will ignore the file extension. |
| 63 | +Short options are also available: |
69 | 64 |
|
70 | | -Next, write a configuration file following the provided sample. Any empty fields in the `graphar` configuration will be filled with default values. In the `import_schema`, empty fields will use the global configuration values from `graphar`. If fields in `import_schema` are not empty, they will override the values from `graphar`. |
| 65 | +```bash |
| 66 | +graphar show -p path/to/graph.graph.yml -v Person |
| 67 | +graphar show -p path/to/graph.graph.yml -es Person -e ACTED_IN -ed Movie |
| 68 | +graphar import -c path/to/import.yml |
| 69 | +``` |
71 | 70 |
|
72 | | -A few important notes: |
| 71 | +## Import Config |
73 | 72 |
|
74 | | -1. The sources list specifies configuration for the data source files. For `csv` files, you can set the `delimiter`. The format of the `json` file should be given in the format of `jsonline`. |
| 73 | +The import command reads a YAML config file. A config describes source files, |
| 74 | +GraphAr output settings, and how source columns map to vertex or edge |
| 75 | +properties. |
75 | 76 |
|
76 | | -2. The columns dictionary maps column names in the data source to node or edge properties. Keys represent column names in the data source, and values represent property names. |
| 77 | +Supported source file types are `csv`, `json`, `parquet`, and `orc`. JSON input |
| 78 | +uses JSON Lines format and should use the `.json` extension. You can override |
| 79 | +extension-based detection by setting `file_type` in the source config. |
77 | 80 |
|
78 | | -3. Currently, edge properties cannot have the same names as the edge endpoints' properties; doing so will raise an exception. |
| 81 | +Important fields: |
79 | 82 |
|
80 | | -4. The following table lists the default fields, more of which are included in the full configuration. |
| 83 | +1. `sources` describes the input files. CSV sources can set a `delimiter`. |
| 84 | +2. `columns` maps source column names to GraphAr property names. |
| 85 | +3. Edge property names must not duplicate endpoint property names. |
| 86 | +4. Empty fields in `import_schema` use values from the top-level `graphar` |
| 87 | + config. Explicit `import_schema` values override the top-level defaults. |
81 | 88 |
|
| 89 | +Common defaults: |
82 | 90 |
|
83 | 91 | | Field | Default value | |
84 | | -| ----------- | ----------- | |
85 | | -| `graphar.vertex_chunk_size` | `100` | |
86 | | -| `graphar.edge_chunk_size` | `1024` | |
87 | | -| `graphar.file_type` | `parquet` | |
88 | | -| `graphar.adj_list_type` | `ordered_by_source` | |
89 | | -| `graphar.validate_level` | `weak` | |
90 | | -| `graphar.version` | `gar/v1` | |
91 | | -| `property.nullable` | `true` | |
92 | | - |
93 | | - |
94 | | - |
95 | | - |
96 | | -Wish you a happy use! |
| 92 | +| --------------------------------- | -------------------- | |
| 93 | +| `graphar.vertex_chunk_size` | `100` | |
| 94 | +| `graphar.edge_chunk_size` | `1024` | |
| 95 | +| `graphar.file_type` | `parquet` | |
| 96 | +| `graphar.adj_list_type` | `ordered_by_source` | |
| 97 | +| `graphar.validate_level` | `weak` | |
| 98 | +| `graphar.version` | `gar/v1` | |
| 99 | +| `property.nullable` | `true` | |
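
The override rule in item 4 and the defaults table above can be read as a three-layer lookup: built-in defaults, then the top-level `graphar` section, then the individual `import_schema` entry. The following is a minimal Python sketch of that reading, not the CLI's actual implementation. The function name `resolve_settings`, the treatment of `None` and `""` as "empty", and the assumption that each of these keys can be overridden per entry are all hypothetical; only the default values come from the table. `property.nullable` is left out because it is set per property rather than in the `graphar` section.

```python
# Built-in defaults, copied from the "Common defaults" table.
GRAPHAR_DEFAULTS = {
    "vertex_chunk_size": 100,
    "edge_chunk_size": 1024,
    "file_type": "parquet",
    "adj_list_type": "ordered_by_source",
    "validate_level": "weak",
    "version": "gar/v1",
}


def resolve_settings(graphar_section, schema_entry):
    """Hypothetical helper: layer defaults, the `graphar` section, and one
    `import_schema` entry. A later layer wins only when its value is non-empty."""
    resolved = dict(GRAPHAR_DEFAULTS)
    for layer in (graphar_section, schema_entry):
        for key in GRAPHAR_DEFAULTS:
            value = layer.get(key)
            # Assumption: None and "" both count as an empty field.
            if value is not None and value != "":
                resolved[key] = value
    return resolved


# Illustrative input: the top-level section sets csv, one entry overrides it.
graphar_section = {"file_type": "csv", "vertex_chunk_size": None}
edge_entry = {"file_type": "orc", "edge_chunk_size": ""}

settings = resolve_settings(graphar_section, edge_entry)
print(settings["file_type"])          # value from the import_schema entry
print(settings["vertex_chunk_size"])  # falls back to the built-in default
```

In this sketch an empty value never masks a value from an earlier layer, which matches the statement that empty `import_schema` fields use the top-level `graphar` values.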