Commit b38c826

feat: harden extraction pipeline and overhaul documentation
1 parent e8e1240 commit b38c826

31 files changed

Lines changed: 1034 additions & 2082 deletions

README.MD

Lines changed: 22 additions & 86 deletions
````diff
@@ -9,144 +9,80 @@
 
 </div>
 
-Sophisticated NLP library for comprehensive text processing, tokenization, and analysis with LLM integration.
+Qirrel is a TypeScript NLP library for tokenization, rule-based entity extraction, optional LLM enrichment, and pipeline orchestration.
 
 [GitHub](https://github.com/dev-dami/qirrel) | [NPM](https://www.npmjs.com/package/qirrel) | Author: [Damilare Osibanjo](https://github.com/dev-dami)
 
-## Quick Start
+## Install
 
 ```bash
-npm install qirrel
+bun add qirrel
 ```
 
-### Extract Multiple Entity Types
+## Quick Start
 
 ```ts
 import { processText } from 'qirrel';
-const result = await processText('Contact John at john@example.com or call +1-555-123-4567');
+
+const result = await processText('Contact John at john@example.com or call +1 415 555 2671');
 console.log(result.data?.entities);
-// [
-// { type: 'email', value: 'john@example.com', ... },
-// { type: 'phone', value: '+1-555-123-4567', ... }
-// ]
 ```
 
-### Extract URLs and Numbers
+## Batch Processing
 
 ```ts
-import { processText } from 'qirrel';
-const result = await processText('Visit https://example.com for more info. Price: $29.99');
-console.log(result.data?.entities);
-// [
-// { type: 'url', value: 'https://example.com', ... },
-// { type: 'number', value: '29.99', ... }
-// ]
-```
+import { processTexts, Pipeline } from 'qirrel';
 
-### Advanced Pipeline Usage
+const results = await processTexts(['first input', 'second input'], undefined, { concurrency: 2 });
 
-```ts
-import { Pipeline } from 'qirrel';
 const pipeline = new Pipeline();
-const result = await pipeline.process('Check out https://github.com and email support@example.com');
-console.log(result.data?.entities);
+const ordered = await pipeline.processBatch(['a', 'b', 'c'], { concurrency: 3 });
 ```
 
 ## LLM Integration
 
-Connect to large language models for enhanced text analysis:
-
 ```ts
 import { Pipeline } from 'qirrel';
 
-// Enable LLM in your config
 const pipeline = new Pipeline('./config-with-llm.yaml');
-const llmAdapter = pipeline.getLLMAdapter();
+await pipeline.init();
 
+const llmAdapter = pipeline.getLLMAdapter();
 if (llmAdapter) {
-  // Use LLM for advanced analysis
-  const response = await llmAdapter.generateContent('Analyze this text...');
-  console.log(response);
+  const response = await llmAdapter.generate('Summarize this text: Hello world');
+  console.log(response.content);
 }
 ```
 
 ## Caching
 
-Qirrel includes built-in caching functionality to improve performance by storing frequently accessed results:
-
 ```ts
-import { Pipeline, LruCacheManager } from 'qirrel';
+import { Pipeline } from 'qirrel';
 
-// The pipeline automatically uses caching based on configuration
 const pipeline = new Pipeline();
 
-// Access the cache manager directly if needed
-const cacheManager = pipeline.getCacheManager();
-
-// Custom cache usage
-const cache = new LruCacheManager({
-  maxEntries: 1000, // Maximum number of cached items
-  ttl: 300000 // Time-to-live in milliseconds (5 minutes)
-});
-
-// Check if a result is already cached
 if (pipeline.isCached('some text')) {
-  const cachedResult = pipeline.getCached('some text');
-  console.log('Using cached result');
+  console.log(pipeline.getCached('some text'));
 } else {
   const result = await pipeline.process('some text');
-  console.log('Processing new text');
+  console.log(result.data?.entities);
 }
 ```
 
-## Pipeline Lifecycle Events
-
-Monitor and react to pipeline execution with comprehensive lifecycle events:
-
-```ts
-import { Pipeline, PipelineEvent } from 'qirrel';
-
-const pipeline = new Pipeline();
-
-// Subscribe to pipeline events
-pipeline.on(PipelineEvent.RunStart, (payload) => {
-  console.log('Pipeline started:', payload.context.meta?.requestId);
-});
-
-pipeline.on(PipelineEvent.ProcessorEnd, (payload) => {
-  console.log(`Processor ${payload.processorName} completed in ${payload.duration}ms`);
-});
-
-pipeline.on(PipelineEvent.RunEnd, (payload) => {
-  console.log('Pipeline completed in', payload.duration, 'ms');
-});
-
-const result = await pipeline.process('Your text here');
-```
-
-## Use Cases
-
-Qirrel is perfect for developers building solutions for:
-
-- **Business Applications**: Extract customer contact details, lead generation from documents, document processing for legal/medical/financial use cases
-- **Social Media Analysis**: Process social content for mentions, hashtags, links, and user-generated content patterns
-- **Research & Content**: Extract structured data from academic papers, news articles, or documentation
-- **Data Processing**: Unstructured data extraction, log file analysis, preprocessing for ML pipelines
-- **Communication Tools**: Email processing, chat applications, content moderation
-
 ## Documentation
+
 - [API Reference](./docs/api.md)
 - [Configuration Guide](./docs/configuration.md)
 - [Usage Examples](./docs/examples.md)
-- [Code Walkthrough](./docs/walkthrough.md)
 - [Basic Usage](./docs/usage/basic.md)
-- [Advanced Usage](./docs/usage/advanced.md)
 - [LLM Integration](./docs/integrations/llm.md)
 - [Pipeline Events](./docs/events.md)
 - [Caching](./docs/usage/caching.md)
 
 ## Contributing
-We welcome contributions from the community! Please read our [Contributing Guide](./docs/contributing.md) for more information.
+
+See [CONTRIBUTING.md](./CONTRIBUTING.md).
 
 ## License
-This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
+
+MIT. See [LICENSE](./LICENSE).
````
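The new Batch Processing section passes a `concurrency` option to both `processTexts` and `pipeline.processBatch`, and the variable name `ordered` suggests results come back in input order even when items finish out of order. The diff does not show how Qirrel does this internally. The sketch below shows one common way to get both properties; `mapWithConcurrency` is a hypothetical helper written for illustration, not part of Qirrel's API.

```ts
// Hypothetical helper for illustration; not part of Qirrel's API.
// Runs `worker` over `items` with at most `concurrency` calls in flight and
// returns the results in input order, regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item no runner has claimed yet

  // Each runner repeatedly claims the next index until the input is exhausted.
  // `next++` runs synchronously, so two runners never claim the same index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `concurrency` runners; fewer if there are fewer items.
  const runners = Array.from({ length: Math.min(concurrency, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Because each runner claims the next unclaimed index from a shared counter, a slow item occupies only one slot instead of holding back a fixed-size chunk, and writing into `results[index]` keeps the output aligned with the input. If a worker rejects, the returned promise rejects while the remaining runners keep draining in the background.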

default.yaml

Lines changed: 2 additions & 1 deletion
```diff
@@ -25,4 +25,5 @@ llm:
   model: gemini-2.5-flash
   temperature: 0.7
   maxTokens: 1024
-  timeout: 30000
+  timeout: 30000
+  cacheTtl: 300000
```
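The `default.yaml` hunk adds `cacheTtl: 300000` in the section under `llm:`. If the value is in milliseconds, like the `ttl: 300000` in the README example this commit removes (commented there as 5 minutes), it is five minutes. The commit does not show how Qirrel consumes this value. Assuming it bounds how long a cached result stays valid, an LRU cache with per-entry expiry can be sketched on top of `Map`, whose iteration order is insertion order; `TtlLruCache` and its option names are hypothetical, not Qirrel exports.

```ts
// Hypothetical sketch, not Qirrel's implementation: an LRU cache whose
// entries also expire `ttlMs` milliseconds after they were written.
interface TtlLruOptions {
  maxEntries: number; // evict least-recently-used entries beyond this count
  ttlMs: number; // e.g. 300000 to mirror `cacheTtl` in default.yaml
  now?: () => number; // injectable clock; defaults to Date.now()
}

class TtlLruCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly opts: TtlLruOptions;
  private readonly now: () => number;

  constructor(opts: TtlLruOptions) {
    this.opts = opts;
    this.now = opts.now ?? (() => Date.now());
  }

  // Peek without refreshing recency; expired entries count as misses.
  has(key: K): boolean {
    const entry = this.entries.get(key);
    return entry !== undefined && entry.expiresAt > this.now();
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily drop expired entries on read
      return undefined;
    }
    // Re-insert so the key moves to the most-recently-used end of the Map.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.opts.ttlMs });
    // Map iterates in insertion order, so the first key is the least recently used.
    while (this.entries.size > this.opts.maxEntries) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}
```

`has` deliberately leaves recency untouched, so a check-then-read pattern (similar in shape to the README's `isCached`/`getCached`) promotes an entry only once. Injecting the clock through `now` keeps expiry testable without real timers; a production cache might also sweep expired entries periodically instead of only on access.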
