Commit c188cd7

docs: update readme (#76)
* feat: readme for example module
* feat: table of contents
* feat: update readme
* refactor: example reademe
* refactor: readme
* chore: code format
1 parent 6b642e2 commit c188cd7

2 files changed

Lines changed: 412 additions & 20 deletions

README.md

Lines changed: 377 additions & 3 deletions
@@ -4,11 +4,385 @@
44
![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)
55
[![Maven Central](https://img.shields.io/maven-central/v/io.greptime/greptimedb-ingester.svg?label=maven%20central)](https://central.sonatype.com/search?q=io.greptime&name=ingester-all)
66

7-
A lightweight Java ingester for GreptimeDB designed for high-throughput data writing. It utilizes the gRPC protocol to provide a non-blocking, purely asynchronous API that is easy to use and integrate into your applications.
7+
The GreptimeDB Ingester for Java is a lightweight, high-performance client designed for efficient time-series data ingestion. It leverages the gRPC protocol to provide a non-blocking, purely asynchronous API that delivers exceptional throughput while maintaining seamless integration with your applications.
8+
9+
This client offers multiple ingestion methods optimized for various performance requirements and use cases. You can select the approach that best suits your specific needs—whether you require simple unary writes for low-latency operations or high-throughput bulk streaming for maximum efficiency when handling large volumes of time-series data.
810

911
## Documentation
1012
- [API Reference](https://javadoc.io/doc/io.greptime/ingester-protocol/latest/index.html)
1113
- [Examples](https://github.com/GreptimeTeam/greptimedb-ingester-java/tree/main/ingester-example)
1214

13-
# Features
14-
TODO
15+
## Features
16+
- Writing data using
17+
- [Regular Write API](https://github.com/GreptimeTeam/greptimedb-ingester-java/tree/main/ingester-example#regular-write-api)
18+
- [Batching Write](https://github.com/GreptimeTeam/greptimedb-ingester-java/tree/main/ingester-example#batching-write)
19+
- [Streaming Write](https://github.com/GreptimeTeam/greptimedb-ingester-java/tree/main/ingester-example#streaming-write)
20+
- [Bulk Write API](https://github.com/GreptimeTeam/greptimedb-ingester-java/tree/main/ingester-example#bulk-write-api)
21+
- Management API client for managing
22+
- Health check
23+
- Authorizations
24+
25+
## High Level Architecture
26+
27+
```
28+
+-----------------------------------+
29+
| Client Applications |
30+
| +------------------+ |
31+
| | Application Code | |
32+
| +------------------+ |
33+
+-------------+---------------------+
34+
|
35+
v
36+
+-------------+---------------------+
37+
| API Layer |
38+
| +---------------+ |
39+
| | GreptimeDB | |
40+
| +---------------+ |
41+
| / \ |
42+
| v v |
43+
| +-------------+ +-------------+ | +------------------+
44+
| | BulkWrite | | Write | | | Data Model |
45+
| | Interface | | Interface | |------->| |
46+
| +-------------+ +-------------+ | | +------------+ |
47+
+-------|----------------|----------+ | | Table | |
48+
| | | +------------+ |
49+
v v | | |
50+
+-------|----------------|----------+ | v |
51+
| Transport Layer | | +------------+ |
52+
| +-------------+ +-------------+ | | | TableSchema| |
53+
| | BulkWrite | | Write | | | +------------+ |
54+
| | Client | | Client | | +------------------+
55+
| +-------------+ +-------------+ |
56+
| | \ / | |
57+
| | \ / | |
58+
| | v v | |
59+
| | +-------------+ | |
60+
| | |RouterClient | | |
61+
+-----|--+-------------|---+--------+
62+
| | | |
63+
| | | |
64+
v v v |
65+
+-----|----------------|---|--------+
66+
| Network Layer |
67+
| +-------------+ +-------------+ |
68+
| | Arrow Flight| | gRPC Client | |
69+
| | Client | | | |
70+
| +-------------+ +-------------+ |
71+
| | | |
72+
+-----|----------------|------------+
73+
| |
74+
v v
75+
+-------------------------+
76+
| GreptimeDB Server |
77+
+-------------------------+
78+
```
79+
80+
- **API Layer**: Provides high-level interfaces for client applications to interact with GreptimeDB
81+
- **Data Model**: Defines the structure and organization of time series data with tables and schemas
82+
- **Transport Layer**: Handles communication logistics, request routing, and client management
83+
- **Network Layer**: Manages low-level protocol communications using Arrow Flight and gRPC
84+
85+
## How To Use
86+
87+
- [Installation](#installation)
88+
- [Client Initialization](#client-initialization)
89+
- [Writing Data](#writing-data)
90+
- [Creating and Writing Tables](#creating-and-writing-tables)
91+
- [TableSchema](#tableschema)
92+
- [Column Types](#column-types)
93+
- [Table](#table)
94+
- [Write Operations](#write-operations)
95+
- [Streaming Write](#streaming-write)
96+
- [Bulk Write](#bulk-write)
97+
- [Configuration](#configuration)
98+
- [Resource Management](#resource-management)
99+
- [Performance Tuning](#performance-tuning)
100+
- [Compression Options](#compression-options)
101+
- [Write Operation Comparison](#write-operation-comparison)
102+
- [Buffer Size Optimization](#buffer-size-optimization)
103+
104+
### Installation
105+
106+
GreptimeDB Java Ingester is hosted in the Maven Central Repository.
107+
108+
To use it with Maven, simply add the following dependency to your project:
109+
110+
```XML
111+
<dependency>
112+
<groupId>io.greptime</groupId>
113+
<artifactId>ingester-all</artifactId>
114+
<version>${latest_version}</version>
115+
</dependency>
116+
```
117+
118+
The latest version can be viewed [here](https://central.sonatype.com/search?q=io.greptime&name=ingester-all).
119+
120+
### Client Initialization
121+
122+
The entry point to the GreptimeDB Ingester Java client is the `GreptimeDB` class. You create a client instance by calling its static `create` method with the appropriate configuration options.
123+
124+
```java
125+
// GreptimeDB has a default database named "public" in the default catalog "greptime",
126+
// we can use it as the test database
127+
String database = "public";
128+
// By default, GreptimeDB listens on port 4001 using the gRPC protocol.
129+
// We can provide multiple endpoints that point to the same GreptimeDB cluster.
130+
// The client will make calls to these endpoints based on a load balancing strategy.
131+
// The client performs regular health checks and automatically routes requests to healthy nodes,
132+
// providing fault tolerance and improved reliability for your application.
133+
String[] endpoints = {"127.0.0.1:4001"};
134+
// Sets authentication information.
135+
AuthInfo authInfo = new AuthInfo("username", "password");
136+
GreptimeOptions opts = GreptimeOptions.newBuilder(endpoints, database)
137+
// If the database does not require authentication, we can use `AuthInfo.noAuthorization()` as the parameter.
138+
.authInfo(authInfo)
139+
// Enable secure connection if your server is secured by TLS
140+
//.tlsOptions(new TlsOptions())
141+
// A good start ^_^
142+
.build();
143+
144+
// Initialize the client
145+
GreptimeDB client = GreptimeDB.create(opts);
146+
```
147+
148+
### Writing Data
149+
150+
The ingester provides a unified approach for writing data to GreptimeDB through the `Table` abstraction. All data writing operations, including high-level APIs, are built on top of this fundamental structure. To write data, you create a `Table` with your time series data and write it to the database.
151+
152+
#### Creating and Writing Tables
153+
154+
Define a table schema and create a table:
155+
156+
```java
157+
// Create a table schema
158+
TableSchema schema = TableSchema.newBuilder("metrics")
159+
.addTag("host", DataType.String)
160+
.addTag("region", DataType.String)
161+
.addField("cpu_util", DataType.Float64)
162+
.addField("memory_util", DataType.Float64)
163+
.addTimestamp("ts", DataType.TimestampMillisecond)
164+
.build();
165+
166+
// Create a table from the schema
167+
Table table = Table.from(schema);
168+
169+
// Add rows to the table
170+
// The values must be provided in the same order as defined in the schema
171+
// In this case: addRow(host, region, cpu_util, memory_util, ts)
172+
table.addRow("host1", "us-west-1", 0.42, 0.78, System.currentTimeMillis());
173+
table.addRow("host2", "us-west-2", 0.46, 0.66, System.currentTimeMillis());
174+
// Add more rows
175+
// ..
176+
177+
// Complete the table to make it immutable. This finalizes the table for writing.
178+
// If users forget to call this method, it will automatically be called internally
179+
// before the table data is written.
180+
table.complete();
181+
182+
// Write the table to the database
183+
CompletableFuture<Result<WriteOk, Err>> future = client.write(table);
184+
```
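
Because `client.write(table)` returns a `CompletableFuture`, the outcome can be handled with a callback instead of blocking the calling thread. The following is a minimal sketch that uses only `java.util.concurrent`. The class `AsyncWriteHandling` and its `describe` method are hypothetical helpers, not part of the ingester, and the type parameter `T` stands in for `Result<WriteOk, Err>`, whose accessors are not shown in this section.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper (not part of the ingester): summarizes an async write outcome. */
final class AsyncWriteHandling {
    private AsyncWriteHandling() {}

    /**
     * Returns a future holding a short status string. Nothing blocks here:
     * the callback runs once the write future completes, normally or exceptionally.
     */
    static <T> CompletableFuture<String> describe(
            CompletableFuture<T> write, Function<T, String> onValue) {
        return write.handle((value, error) ->
                error != null ? "failed: " + error.getMessage() : onValue.apply(value));
    }

    public static void main(String[] args) {
        // Stand-in for the future returned by a write call (assumption for illustration).
        CompletableFuture<Integer> write = CompletableFuture.completedFuture(2);
        // prints "wrote 2 rows"
        System.out.println(describe(write, rows -> "wrote " + rows + " rows").join());
    }
}
```

Calling `join()` or `get()` on the returned future is only needed where the caller genuinely has to wait, for example at the end of a test or a batch job.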
185+
186+
##### TableSchema
187+
188+
The `TableSchema` defines the structure for writing data to GreptimeDB. It includes information about column names, semantic types, and data types.
189+
190+
##### Column Types
191+
192+
In GreptimeDB, columns are categorized into three semantic types:
193+
194+
- **Tag**: Columns used for filtering and grouping data
195+
- **Field**: Columns that store the actual measurement values
196+
- **Timestamp**: A special column that represents the time dimension
197+
198+
The timestamp column typically represents when data was sampled or when logs/events occurred. GreptimeDB identifies this column using a TIME INDEX constraint, which is why it's often referred to as the TIME INDEX column. If your schema contains multiple timestamp-type columns, only one can be designated as the TIME INDEX, while others must be defined as Field columns.
199+
200+
##### Table
201+
202+
The `Table` interface represents data that can be written to GreptimeDB. It provides methods for adding rows and manipulating the data. Essentially, `Table` temporarily stores data in memory, allowing you to accumulate multiple rows for batch processing before sending them to the database, which significantly improves write efficiency compared to writing individual rows.
203+
204+
A table goes through several distinct lifecycle stages:
205+
206+
1. **Creation**: Initialize a table from a schema using `Table.from(schema)`
207+
2. **Data Addition**: Populate the table with rows using the `addRow()` method
208+
3. **Completion**: Finalize the table with `complete()` when all rows have been added
209+
4. **Writing**: Send the completed table to the database
210+
211+
Important considerations:
212+
- Tables are not thread-safe and should be accessed from a single thread
213+
- Tables cannot be reused after writing - create a new instance for each write operation
214+
- The associated `TableSchema` is immutable and can be safely reused across multiple operations
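
The lifecycle and the considerations above can be modeled with a small stand-in class. This is a sketch for illustration only: `RowBuffer` is a hypothetical name and does not reflect how the ingester's `Table` is implemented. It shows the three rules that matter to callers: values follow the schema's column order, a completed buffer rejects further rows, and a fresh buffer is created for each write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in modeling the table lifecycle; not the ingester's Table. */
final class RowBuffer {
    private final List<String> columns;                  // column names in schema order
    private final List<Object[]> rows = new ArrayList<>();
    private boolean completed;

    RowBuffer(List<String> columns) {
        // Defensive copy: the schema stays immutable and can be shared across buffers.
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    /** Adds one row; expects exactly one value per column, in schema order. */
    RowBuffer addRow(Object... values) {
        if (completed) {
            throw new IllegalStateException("buffer is completed; create a new one");
        }
        if (values.length != columns.size()) {
            throw new IllegalArgumentException(
                    "expected " + columns.size() + " values, got " + values.length);
        }
        rows.add(values.clone());
        return this;
    }

    /** Freezes the buffer; later addRow calls fail. */
    RowBuffer complete() {
        completed = true;
        return this;
    }

    boolean isCompleted() {
        return completed;
    }

    int rowCount() {
        return rows.size();
    }
}
```

Like the real `Table`, this stand-in is not synchronized, so it should be confined to a single thread.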
215+
216+
### Write Operations
217+
218+
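Beyond the basic `client.write(table)` call shown earlier, you can write several tables at once, specify the write operation, and provide a custom `Context` for more control: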
219+
220+
```java
221+
Context ctx = Context.newDefault();
222+
// Add a hint to make the database create a table with the specified TTL (time-to-live)
223+
ctx = ctx.withHint("ttl", "3d");
224+
// Set the compression algorithm to Zstd.
225+
ctx = ctx.withCompression(Compression.Zstd);
226+
// Use the ctx when writing data to GreptimeDB
227+
CompletableFuture<Result<WriteOk, Err>> future = client.write(Arrays.asList(table1, table2), WriteOp.Insert, ctx);
228+
```
229+
230+
### Streaming Write
231+
232+
The streaming write API establishes a persistent connection to GreptimeDB, enabling continuous data ingestion over time with built-in rate limiting. This approach provides a convenient way to write data from multiple tables through a single stream, prioritizing ease of use and consistent throughput.
233+
234+
```java
235+
// Create a stream writer
236+
StreamWriter<Table, WriteOk> writer = client.streamWriter();
237+
238+
// Write multiple tables
239+
writer.write(table1)
240+
.write(table2);
241+
242+
// Complete the stream and get the result
243+
CompletableFuture<WriteOk> result = writer.completed();
244+
```
245+
246+
You can also set a rate limit for stream writing:
247+
248+
```java
249+
// Limit to 1000 points per second
250+
StreamWriter<Table, WriteOk> writer = client.streamWriter(1000);
251+
```
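
To make the effect of a points-per-second limit concrete, here is a minimal pacing sketch. It is an assumption-laden illustration, not the limiter the ingester uses: `PointPacer` and `reserve` are hypothetical names, and the clock is passed in explicitly so the arithmetic is easy to follow. Each reservation pushes the next permitted send time forward by `points / pointsPerSecond` seconds.

```java
/** Illustrative pacing model (hypothetical; not the ingester's rate limiter). */
final class PointPacer {
    private final double pointsPerSecond;
    private long nextFreeNanos; // earliest time at which the next reservation may start

    PointPacer(double pointsPerSecond, long nowNanos) {
        if (pointsPerSecond <= 0) {
            throw new IllegalArgumentException("pointsPerSecond must be positive");
        }
        this.pointsPerSecond = pointsPerSecond;
        this.nextFreeNanos = nowNanos;
    }

    /**
     * Reserves capacity for {@code points} and returns how many nanoseconds
     * the caller should wait before sending them.
     */
    long reserve(int points, long nowNanos) {
        if (points < 0) {
            throw new IllegalArgumentException("points must not be negative");
        }
        long start = Math.max(nowNanos, nextFreeNanos);
        long costNanos = (long) (points / pointsPerSecond * 1_000_000_000L);
        nextFreeNanos = start + costNanos;
        return start - nowNanos;
    }
}
```

With a limit of 1000 points per second, a first batch of 500 points is sent immediately, and a second batch of 500 requested at the same instant has to wait half a second.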
252+
253+
### Bulk Write
254+
255+
The Bulk Write API provides a high-performance, memory-efficient mechanism for ingesting large volumes of time-series data into GreptimeDB. It leverages off-heap memory management to achieve optimal throughput when writing batches of data.
256+
257+
This API supports writing to one table per stream and handles large data volumes (up to 200MB per write) with adaptive flow control. Performance advantages include:
258+
- Off-heap memory management with Arrow buffers
259+
- Efficient binary serialization and data transfer
260+
- Optional compression
261+
- Batched operations
262+
263+
This approach is particularly well-suited for:
264+
- Large-scale batch processing and data migrations
265+
- High-throughput log and sensor data ingestion
266+
- Time-series applications with demanding performance requirements
267+
- Systems processing high-frequency data collection
268+
269+
Here's a typical pattern for using the Bulk Write API:
270+
271+
```java
272+
// Create a BulkStreamWriter with the table schema
273+
try (BulkStreamWriter writer = client.bulkStreamWriter(schema)) {
274+
// Write multiple batches
275+
for (int batch = 0; batch < batchCount; batch++) {
276+
// Get a TableBufferRoot for this batch
277+
Table.TableBufferRoot table = writer.tableBufferRoot(1000); // column buffer size
278+
279+
// Add rows to the batch
280+
for (int row = 0; row < rowsPerBatch; row++) {
281+
Object[] rowData = generateRow(batch, row);
282+
table.addRow(rowData);
283+
}
284+
285+
// Complete the table to prepare for transmission
286+
table.complete();
287+
288+
// Send the batch and get a future for completion
289+
CompletableFuture<Integer> future = writer.writeNext();
290+
291+
// Wait for the batch to be processed (optional)
292+
Integer affectedRows = future.get();
293+
294+
System.out.println("Batch " + batch + " wrote " + affectedRows + " rows");
295+
}
296+
297+
// Signal completion of the stream
298+
writer.completed();
299+
}
300+
```
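
The pattern above leaves `batchCount`, `rowsPerBatch`, and `generateRow` undefined. When the rows are already held in memory, one way to derive the batches is to partition the list up front, so that each inner list feeds one `tableBufferRoot(...)` / `writeNext()` round. The sketch below uses only the standard library; `Batches.partition` is a hypothetical helper, not part of the ingester.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the ingester): splits rows into fixed-size batches. */
final class Batches {
    private Batches() {}

    /** Returns consecutive batches of at most {@code batchSize} rows, preserving order. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            // Clamp the last batch to the remaining rows.
            int end = start + Math.min(batchSize, rows.size() - start);
            batches.add(new ArrayList<>(rows.subList(start, end)));
            start = end;
        }
        return batches;
    }
}
```

For data that arrives continuously, filling each buffer as rows arrive avoids holding the whole data set in memory.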
301+
302+
#### Configuration
303+
304+
The Bulk Write API can be configured with several options to optimize performance:
305+
306+
```java
307+
BulkWrite.Config cfg = BulkWrite.Config.newBuilder()
308+
.allocatorInitReservation(64 * 1024 * 1024L) // Customize memory allocation: 64MB initial reservation
309+
.allocatorMaxAllocation(4 * 1024 * 1024 * 1024L) // Customize memory allocation: 4GB max allocation
310+
.timeoutMsPerMessage(60 * 1000) // 60 seconds timeout per request
311+
.maxRequestsInFlight(8) // Concurrency control: allow at most 8 in-flight requests
312+
.build();
313+
// Enable Zstd compression
314+
Context ctx = Context.newDefault().withCompression(Compression.Zstd);
315+
316+
BulkStreamWriter writer = client.bulkStreamWriter(schema, cfg, ctx);
317+
```
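
The `L` suffix in the size expressions above is not cosmetic. Java multiplies `int` operands in 32 bits, so a byte count of 2 GiB or more silently wraps unless at least one operand is a `long`. The sketch below demonstrates this with plain arithmetic; `ByteSizes` is a hypothetical helper and not part of the ingester.

```java
/** Hypothetical helper (not part of the ingester): byte-size arithmetic done in long. */
final class ByteSizes {
    private ByteSizes() {}

    /** n mebibytes, computed in 64-bit arithmetic. */
    static long mib(long n) {
        return n * 1024L * 1024L;
    }

    /** n gibibytes, computed in 64-bit arithmetic. */
    static long gib(long n) {
        return n * 1024L * 1024L * 1024L;
    }

    /** The same product in 32-bit int arithmetic; wraps around for n >= 2. */
    static int wrappedGib(int n) {
        return n * 1024 * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(mib(64));        // prints 67108864
        System.out.println(gib(4));         // prints 4294967296
        System.out.println(wrappedGib(4));  // prints 0 (int overflow)
    }
}
```

Helpers of this kind keep the configuration readable, for example `allocatorMaxAllocation(ByteSizes.gib(4))`, while ruling out accidental `int` overflow.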
318+
319+
### Resource Management
320+
321+
It's important to properly shut down the client when you're finished using it:
322+
323+
```java
324+
// Gracefully shut down the client
325+
client.shutdownGracefully();
326+
```
327+
328+
### Performance Tuning
329+
330+
#### Compression Options
331+
332+
The GreptimeDB Ingester Java client supports various compression algorithms to reduce network bandwidth and improve throughput.
333+
334+
```java
335+
// Set the compression algorithm to Zstd
336+
Context ctx = Context.newDefault().withCompression(Compression.Zstd);
337+
```
338+
339+
#### Write Operation Comparison
340+
341+
Understanding the performance characteristics of different write methods is crucial for optimizing data ingestion.
342+
343+
| Write Method | API | Throughput | Latency | Memory Efficiency | CPU Utilization |
344+
|--------------|-----|------------|---------|-------------------|-----------------|
345+
| Regular Write | `write(tables)` | Better | Good | High | Higher |
346+
| Stream Write | `streamWriter()` | Moderate | Good | Moderate | Moderate |
347+
| Bulk Write | `bulkStreamWriter()` | Best | Higher | Best | Moderate |
348+
349+
350+
| Write Method | API | Best For | Limitations |
351+
|-------------|-----|----------|-------------|
352+
| Regular Write | `write(tables)` | Simple applications, low latency requirements | Lower throughput for large volumes |
353+
| Stream Write | `streamWriter()` | Continuous data streams, moderate throughput | More complex to use than regular writes |
354+
| Bulk Write | `bulkStreamWriter()` | Maximum throughput, large batch operations | Higher latency, more resource-intensive |
355+
356+
#### Buffer Size Optimization
357+
358+
When using `BulkStreamWriter`, you can configure the column buffer size:
359+
360+
```java
361+
// Get the table buffer with a specific column buffer size
362+
Table.TableBufferRoot table = bulkStreamWriter.tableBufferRoot(columnBufferSize);
363+
```
364+
365+
This option can significantly improve the speed of data conversion to the underlying format. For optimal performance, we recommend setting the column buffer size to 1024 or larger, depending on your specific workload characteristics and available memory.
366+
367+
### Build Requirements
368+
369+
- Java 8+
370+
- Maven 3.6+
371+
372+
### Contributing
373+
374+
We welcome contributions to the GreptimeDB Ingester Java client! Here's how you can contribute:
375+
376+
1. Fork the repository on GitHub
377+
2. Create a feature branch for your changes
378+
3. Make your changes, ensuring they follow the project's code style
379+
4. Add appropriate tests for your changes
380+
5. Submit a pull request to the main repository
381+
382+
When submitting a pull request, please ensure that:
383+
384+
- All tests pass successfully
385+
- Code formatting is correct (run `mvn spotless:check`; this requires Java 17+)
386+
- Documentation is updated to reflect your changes if necessary
387+
388+
Thank you for helping improve the GreptimeDB Ingester Java client! Your contributions are greatly appreciated.
