
Commit 001eafd

LoCoBench Bot and claude committed
feat: US-006 - Create docgen-api-003: Kafka Consumer API reference
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 9132740 commit 001eafd

7 files changed

Lines changed: 787 additions & 1 deletion


Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+FROM ubuntu:24.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    ca-certificates \
+    git \
+    openjdk-21-jdk-headless \
+    python3 \
+    vim \
+    && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /repo
+
+# Clone Apache Kafka at specific commit (e678b4b, trunk 2026-02-16)
+# Use blobless clone + sparse-checkout for speed
+RUN git clone --filter=blob:none --no-checkout https://github.com/apache/kafka.git . && \
+    git sparse-checkout init --cone && \
+    git sparse-checkout set clients/src/main/java/org/apache/kafka/clients/consumer && \
+    git checkout e678b4b
+
+WORKDIR /workspace
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# Task: Generate API Reference Documentation for Kafka KafkaConsumer
+
+## Objective
+
+Generate comprehensive API reference documentation for the Apache Kafka `KafkaConsumer` Java API. The documentation should cover the complete API surface with emphasis on behavioral semantics, offset management strategies, rebalancing mechanics, and error handling patterns.
+
+## Scope
+
+Your documentation should cover the **KafkaConsumer API** from the `org.apache.kafka.clients.consumer` package, including:
+
+1. **Core Consumer Lifecycle**
+   - Constructor variants and configuration
+   - Subscription methods (dynamic and manual assignment)
+   - Polling and data fetching semantics
+   - Resource cleanup and closing
+
+2. **Offset Management**
+   - Synchronous and asynchronous commit strategies
+   - Offset queries and position control
+   - Seek operations and offset discovery
+   - Committed offset semantics
+
+3. **Consumer Group Mechanics**
+   - ConsumerRebalanceListener interface and callbacks
+   - Rebalance triggers and timing
+   - Group membership and heartbeat behavior
+   - Partition assignment vs subscription models
+
+4. **Flow Control and Position Management**
+   - Pause and resume functionality
+   - Position queries and manipulation
+   - Offset-to-timestamp lookups
+   - Beginning and end offset discovery
+
+5. **Error Handling**
+   - Exception types and recovery strategies
+   - CommitFailedException and group fencing
+   - WakeupException and thread interruption
+   - Timeout and authentication errors
+
+## Requirements
+
+### API Methods Documentation (40%)
+
+Document all public methods of the `KafkaConsumer` class with:
+- Method signatures including all overloads
+- Parameter semantics and validation rules
+- Return types and their meanings
+- Exception types thrown and conditions
+
+### Behavioral Notes (30%)
+
+Explain critical behavioral semantics:
+- **poll() blocking behavior**: when it returns immediately vs when it blocks, timeout handling, rebalance callback execution during poll
+- **Offset commit semantics**: difference between sync/async commits, retry behavior, commit failure handling
+- **Rebalance coordination**: when rebalances occur (only during poll), callback ordering (revoked then assigned), partition ownership guarantees
+- **Thread safety**: which methods are thread-safe, wakeup() special case, event loop model
+- **Group membership**: max.poll.interval.ms enforcement, proactive leave behavior, session timeout vs poll timeout
+- **Manual vs dynamic assignment**: mutually exclusive nature, use cases for each model
+- **Position vs committed offset**: the off-by-one relationship ("committed should be next offset to read")
+- **Transactional semantics**: read_committed isolation level, LSO boundary, filtered messages
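
Note on the "Position vs committed offset" bullet above: the rule is plain arithmetic, and a small model may make the off-by-one concrete. The sketch below is stdlib-only Java and does not call Kafka; `CommittedOffsetSketch`, `offsetToCommit`, `resumePosition`, and the sample offsets are hypothetical names and values introduced for illustration. In real client code the same value would typically be wrapped as `new OffsetAndMetadata(record.offset() + 1)` and passed to `commitSync`.

```java
import java.util.List;

/** Stdlib-only model of the "commit the next offset to read" rule (not Kafka client code). */
final class CommittedOffsetSketch {

    /** Offset to commit once the record at lastProcessedOffset has been fully processed. */
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    /** A consumer that restarts resumes reading at the committed offset itself. */
    static long resumePosition(long committedOffset) {
        return committedOffset;
    }

    public static void main(String[] args) {
        // Hypothetical batch: offsets of the records processed in one poll.
        List<Long> processed = List.of(40L, 41L, 42L);
        long last = processed.get(processed.size() - 1);
        long committed = offsetToCommit(last);

        System.out.println("last processed = " + last);
        System.out.println("offset to commit = " + committed);
        System.out.println("resume position = " + resumePosition(committed));
    }
}
```

Committing `last` rather than `last + 1` would make the consumer resume at the record it had already processed, so that record would be delivered a second time after a restart or rebalance.
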
+
+### Usage Examples (20%)
+
+Provide concrete code examples demonstrating:
+- **Basic subscription and polling loop**: subscribe to topics, poll for records, process messages
+- **Manual offset commit**: disable auto-commit, explicit commitSync/commitAsync after processing
+- **Rebalance listener**: implement ConsumerRebalanceListener, commit offsets on revoke, initialize positions on assign
+- **Seek operations**: seekToBeginning, seekToEnd, seek to specific offset, timestamp-based seeking
+- **Multi-threaded processing pattern**: single consumer thread with pause/resume coordination and worker pool
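
Note on the "timestamp-based seeking" item above: `KafkaConsumer.offsetsForTimes` maps a target timestamp to the earliest offset whose record timestamp is greater than or equal to it, and the result is then usually passed to `seek`. The following is a stdlib-only model of that lookup rule, not the client API; `TimestampLookupSketch`, the array layout, and the sample timestamps are assumptions made for illustration.

```java
import java.util.OptionalLong;

/** Stdlib-only model of an "earliest offset at or after a timestamp" lookup (not Kafka client code). */
final class TimestampLookupSketch {

    /**
     * timestampsMs[i] is the timestamp of the record at offset baseOffset + i.
     * Scans in offset order and returns the first offset whose timestamp is >= targetMs,
     * or an empty result when no record is new enough.
     */
    static OptionalLong earliestOffsetAtOrAfter(long baseOffset, long[] timestampsMs, long targetMs) {
        for (int i = 0; i < timestampsMs.length; i++) {
            if (timestampsMs[i] >= targetMs) {
                return OptionalLong.of(baseOffset + i);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Hypothetical partition holding offsets 10..13.
        long[] timestampsMs = {1000L, 2000L, 2000L, 3000L};

        System.out.println(earliestOffsetAtOrAfter(10L, timestampsMs, 2000L));
        System.out.println(earliestOffsetAtOrAfter(10L, timestampsMs, 3001L));
    }
}
```

The empty result in this model corresponds to the `null` entry that `offsetsForTimes` returns for a partition with no matching record, which calling code needs to check before seeking.
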
+
+### Documentation Structure (10%)
+
+Organize documentation with clear sections:
+- Overview and threading model
+- Core types (KafkaConsumer, ConsumerRebalanceListener, ConsumerRecords, etc.)
+- Subscription and assignment methods
+- Polling and data fetching
+- Offset management methods
+- Flow control and position queries
+- Metadata and monitoring
+- Lifecycle and resource management
+- Exception handling guide
+- Configuration-driven behaviors
+- Common patterns and best practices
+
+## Output Format
+
+Write your documentation to `/workspace/documentation.md` in Markdown format.
+
+## Notes
+
+- Focus on **behavioral semantics** that aren't obvious from method signatures alone
+- Include edge cases and gotchas (e.g., poll may block longer than timeout during rebalance callbacks)
+- Explain the relationship between different offset concepts (position, committed, beginning, end, LSO)
+- Cover both group-managed and standalone consumer patterns
+- Document configuration properties that significantly affect API behavior
+- Use the codebase to find real usage patterns in tests and internal components
+
+## Evaluation
+
+Your documentation will be evaluated on:
+1. **Completeness**: All key API methods and types documented
+2. **Accuracy**: Behavioral descriptions match actual implementation
+3. **Clarity**: Complex semantics explained clearly with examples
+4. **Practical value**: Real-world usage patterns and error handling strategies included
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+[task]
+category = "api_reference"
+language = "java"
+difficulty = "hard"
+time_limit_sec = 1200
