Commit dc6faad

fix: test and pre-commit failures
Signed-off-by: dittops <dittops@gmail.com>
1 parent 65bde16 commit dc6faad

4 files changed

Lines changed: 415 additions & 51 deletions

PolyLingua/README.md

Lines changed: 25 additions & 17 deletions
@@ -30,6 +30,7 @@ cd PolyLingua
 ```
 
 You'll be prompted for:
+
 - **HuggingFace API Token** - Get from https://huggingface.co/settings/tokens
 - **Model ID** - Default: `swiss-ai/Apertus-8B-Instruct-2509` (translation-optimized model)
 - **Host IP** - Your server's IP address
@@ -42,6 +43,7 @@ You'll be prompted for:
 ```
 
 This builds:
+
 - Translation backend service
 - Next.js UI service
 
@@ -82,13 +84,13 @@ curl -X POST http://localhost:8888/v1/translation \
 
 Key variables in `.env`:
 
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `HF_TOKEN` | HuggingFace API token | Required |
+| Variable       | Description                  | Default                             |
+| -------------- | ---------------------------- | ----------------------------------- |
+| `HF_TOKEN`     | HuggingFace API token        | Required                            |
 | `LLM_MODEL_ID` | Model to use for translation | `swiss-ai/Apertus-8B-Instruct-2509` |
-| `MODEL_CACHE` | Directory for model storage | `./data` |
-| `host_ip` | Server IP address | `localhost` |
-| `NGINX_PORT` | External port for web access | `80` |
+| `MODEL_CACHE`  | Directory for model storage  | `./data`                            |
+| `host_ip`      | Server IP address            | `localhost`                         |
+| `NGINX_PORT`   | External port for web access | `80`                                |
 
 See `.env.example` for full configuration options.
 
@@ -99,7 +101,6 @@ The service works with any HuggingFace text generation model. Recommended models
 - **swiss-ai/Apertus-8B-Instruct-2509** - Multilingual translation (default)
 - **haoranxu/ALMA-7B** - Specialized translation model
 
-
 ## 🛠️ Development
 
 ### Project Structure
@@ -128,6 +129,7 @@ PolyLingua/
 ### Running Locally (Development)
 
 **Backend:**
+
 ```bash
 # Install dependencies
 pip install -r requirements.txt
@@ -142,6 +144,7 @@ python polylingua.py
 ```
 
 **Frontend:**
+
 ```bash
 cd ui
 npm install
@@ -155,6 +158,7 @@ npm run dev
 Translate text between languages.
 
 **Request:**
+
 ```json
 {
   "language_from": "English",
@@ -164,17 +168,20 @@ Translate text between languages.
 ```
 
 **Response:**
+
 ```json
 {
   "model": "polylingua",
-  "choices": [{
-    "index": 0,
-    "message": {
-      "role": "assistant",
-      "content": "Translated text here"
-    },
-    "finish_reason": "stop"
-  }],
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "Translated text here"
+      },
+      "finish_reason": "stop"
+    }
+  ],
   "usage": {}
 }
 ```
@@ -224,11 +231,13 @@ docker compose down -v
 ### Service won't start
 
 1. Check if ports are available:
+
    ```bash
    sudo lsof -i :80,8888,9000,8028,5173
    ```
 
 2. Verify environment variables:
+
    ```bash
    cat .env
    ```
@@ -258,8 +267,6 @@ docker compose down -v
 - Check if backend is running: `docker compose ps`
 - Test API directly: `curl http://localhost:8888/v1/translation`
 
-
-
 ## 🔗 Resources
 
 - [OPEA Project](https://github.com/opea-project)
@@ -270,6 +277,7 @@ docker compose down -v
 ## 📧 Support
 
 For issues and questions:
+
 - Open an issue on GitHub
 - Check existing issues for solutions
 - Review OPEA documentation
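
Note on the README's **Response** example: the reformatting leaves the payload shape unchanged, so the translated text still sits at `choices[0].message.content`. A minimal sketch of reading it on the client side, assuming the body matches the documented example exactly; `extract_translation` is a hypothetical helper name and is not part of PolyLingua:

```python
import json

# Response body copied from the README example in this commit.
RESPONSE_BODY = """
{
  "model": "polylingua",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Translated text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {}
}
"""


def extract_translation(body: str) -> str:
    """Return the translated text from the first choice (hypothetical helper)."""
    payload = json.loads(body)
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


print(extract_translation(RESPONSE_BODY))
```

Raising on an empty `choices` list is a design choice of this sketch; the README does not say how the service reports an empty result.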

PolyLingua/polylingua.py

Lines changed: 20 additions & 24 deletions
@@ -17,9 +17,6 @@
 import os
 import tempfile
 from pathlib import Path
-from langdetect import detect, LangDetectException
-from docling.document_converter import DocumentConverter
-from docling.datamodel.base_models import InputFormat
 
 from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
 from comps.cores.proto.api_protocol import (
@@ -29,8 +26,11 @@
     ChatMessage,
     UsageInfo,
 )
-from fastapi import Request, UploadFile, File, Form, HTTPException
+from docling.datamodel.base_models import InputFormat
+from docling.document_converter import DocumentConverter
+from fastapi import File, Form, HTTPException, Request, UploadFile
 from fastapi.responses import StreamingResponse
+from langdetect import LangDetectException, detect
 
 MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
 LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
@@ -79,8 +79,7 @@ def __init__(self):
         self.converter = DocumentConverter()
 
     async def process_file(self, file: UploadFile) -> list[str]:
-        """
-        Process an uploaded file and extract text content in chunks.
+        """Process an uploaded file and extract text content in chunks.
 
         Args:
             file: The uploaded file
@@ -100,8 +99,7 @@ async def process_file(self, file: UploadFile) -> list[str]:
         file_ext = Path(file.filename).suffix.lower()
         if file_ext not in SUPPORTED_EXTENSIONS:
             raise ValueError(
-                f"Unsupported file type: {file_ext}. "
-                f"Supported types: {', '.join(sorted(SUPPORTED_EXTENSIONS))}"
+                f"Unsupported file type: {file_ext}. " f"Supported types: {', '.join(sorted(SUPPORTED_EXTENSIONS))}"
             )
 
         page_texts = []
@@ -112,19 +110,19 @@ async def process_file(self, file: UploadFile) -> list[str]:
         print(f"Reading text file {file.filename}...")
         try:
             # Try UTF-8 first
-            text_content = contents.decode('utf-8')
+            text_content = contents.decode("utf-8")
         except UnicodeDecodeError:
             # Fallback to latin-1 for other encodings
             print("UTF-8 decode failed, trying latin-1...")
-            text_content = contents.decode('latin-1')
+            text_content = contents.decode("latin-1")
 
         print(f"Read {len(text_content)} characters from text file")
 
         # Split into chunks if needed
         if len(text_content) > CHUNK_SIZE:
             print(f"Splitting into chunks of {CHUNK_SIZE} chars")
             for i in range(0, len(text_content), CHUNK_SIZE):
-                chunk = text_content[i:i + CHUNK_SIZE]
+                chunk = text_content[i : i + CHUNK_SIZE]
                 page_texts.append(chunk)
                 print(f"Chunk {len(page_texts)}: {len(chunk)} chars")
         else:
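
Note on the decode fallback in the hunk above: the quote-style change does not alter behavior, but the pattern has a limit. A `latin-1` decode cannot raise, because every byte value maps to a code point, so the fallback always produces text and that text may be garbled if the file was written in some other encoding. A minimal standalone sketch of the same pattern; `decode_text` is a hypothetical name and does not exist in `polylingua.py`:

```python
def decode_text(contents: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1 (hypothetical helper)."""
    try:
        return contents.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps all 256 byte values, so this branch never raises;
        # it may still yield wrong characters for other encodings.
        return contents.decode("latin-1")


print(decode_text("café".encode("utf-8")))
print(decode_text("café".encode("latin-1")))
```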
@@ -145,7 +143,7 @@ async def process_file(self, file: UploadFile) -> list[str]:
         # Convert document using docling
         print(f"Converting document {file.filename}...")
         result = self.converter.convert(tmp_path)
-        print(f"Conversion completed")
+        print("Conversion completed")
 
         # Export entire document to markdown
         full_markdown = result.document.export_to_markdown()
@@ -156,7 +154,7 @@ async def process_file(self, file: UploadFile) -> list[str]:
             print(f"Splitting into chunks of {CHUNK_SIZE} chars")
             # Split into manageable chunks
             for i in range(0, len(full_markdown), CHUNK_SIZE):
-                chunk = full_markdown[i:i + CHUNK_SIZE]
+                chunk = full_markdown[i : i + CHUNK_SIZE]
                 page_texts.append(chunk)
                 print(f"Chunk {len(page_texts)}: {len(chunk)} chars")
         else:
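
Note on the chunking loops: the same fixed-size character slicing appears in both the text-file path and the docling path, and this commit only changes the slice spacing. A minimal sketch of that logic as one function, assuming the `else:` branches (not visible in these hunks) keep the whole text as a single chunk; `split_into_chunks` and the `chunk_size` parameter are hypothetical and stand in for the module's `CHUNK_SIZE` constant:

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive character chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        # Assumed behavior of the else branch: keep short text whole.
        return [text]
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


print(split_into_chunks("abcdefg", 3))
```

Slicing by character count can cut a sentence or a markdown construct in half, which may matter for translation quality; splitting on paragraph boundaries would be an alternative, not something this commit does.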
@@ -209,16 +207,14 @@ async def translate_page(self, page_text: str, language_from: str, language_to:
         {source_language}
 
         """
-        prompt = prompt_template.format(
-            language_from=language_from, language_to=language_to, source_language=page_text
-        )
+        prompt = prompt_template.format(language_from=language_from, language_to=language_to, source_language=page_text)
 
         # Create chat completion request with streaming
         chat_request_dict = {
             "model": LLM_MODEL_ID,
             "messages": [{"role": "user", "content": prompt}],
             "max_tokens": 4096,
-            "stream": True
+            "stream": True,
         }
 
         result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs=chat_request_dict)
@@ -235,21 +231,21 @@ async def translate_page(self, page_text: str, language_from: str, language_to:
 
         # Get the response body iterator
         async for chunk in response.body_iterator:
-            chunk_str = chunk.decode('utf-8') if isinstance(chunk, bytes) else chunk
+            chunk_str = chunk.decode("utf-8") if isinstance(chunk, bytes) else chunk
 
             # Parse SSE format
-            lines = chunk_str.split('\n')
+            lines = chunk_str.split("\n")
             for line in lines:
-                if line.startswith('data: '):
+                if line.startswith("data: "):
                     data = line[6:]  # Remove "data: " prefix
 
-                    if data == '[DONE]':
+                    if data == "[DONE]":
                         continue
 
                     try:
                         parsed = json.loads(data)
                         # Extract content from chat completion format
-                        text = parsed.get('choices', [{}])[0].get('delta', {}).get('content', '')
+                        text = parsed.get("choices", [{}])[0].get("delta", {}).get("content", "")
                         if text:
                             accumulated_text += text
                     except:
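
Note on the SSE parsing loop: the hunk ends at a bare `except:` whose body is not shown, so what the handler does is unknown here. A minimal standalone sketch of the same parsing steps, with two deliberate substitutions that are assumptions of this sketch and not the commit's code: it catches only `json.JSONDecodeError`, and it tolerates an empty `choices` list. `accumulate_sse_content` is a hypothetical name:

```python
import json


def accumulate_sse_content(chunk_str: str) -> str:
    """Collect delta content from the 'data: ' lines of an SSE chunk."""
    accumulated = ""
    for line in chunk_str.split("\n"):
        if not line.startswith("data: "):
            continue
        data = line[6:]  # Remove the "data: " prefix
        if data == "[DONE]":
            continue
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            # Assumption: skip payloads that are not valid JSON.
            continue
        choices = parsed.get("choices") or [{}]
        text = (choices[0].get("delta") or {}).get("content") or ""
        accumulated += text
    return accumulated


sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    "data: [DONE]\n"
)
print(accumulate_sse_content(sample))
```

The sketch assumes each chunk holds whole lines. If the transport can split a `data:` line across two chunks, a buffer carrying the trailing partial line to the next chunk would be needed.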
@@ -274,7 +270,7 @@ async def handle_request(self, request: Request):
         language_to = form_data.get("language_to")
         file = form_data.get("file")
 
-        if not file or not hasattr(file, 'filename'):
+        if not file or not hasattr(file, "filename"):
             raise HTTPException(status_code=400, detail="No file uploaded")
 
         if not language_to:
@@ -368,7 +364,7 @@ async def handle_request(self, request: Request):
         chat_request_dict = {
             "model": LLM_MODEL_ID,
             "messages": [{"role": "user", "content": prompt}],
-            "stream": True
+            "stream": True,
         }
 
         result_dict, runtime_graph = await self.megaservice.schedule(initial_inputs=chat_request_dict)

PolyLingua/requirements.txt

Lines changed: 10 additions & 10 deletions
@@ -1,17 +1,17 @@
-# OPEA GenAIComps Framework
-opea-comps>=1.3.0
-
-# Core Dependencies
-fastapi>=0.109.0
-uvicorn[standard]>=0.27.0
-python-multipart>=0.0.9
 
 # Async Support
 aiohttp>=3.9.0
 asyncio>=3.4.3
 
-# Language Detection
-langdetect>=1.0.9
-
 # Document Processing
 docling>=2.0.0
+
+# Core Dependencies
+fastapi>=0.109.0
+
+# Language Detection
+langdetect>=1.0.9
+# OPEA GenAIComps Framework
+opea-comps>=1.3.0
+python-multipart>=0.0.9
+uvicorn[standard]>=0.27.0
