HEO Lachemat Houss3m

Hi there, I'm Houssam 👋

I am a Speech and AI researcher/engineer working on automatic speech recognition (ASR), Arabic speech technologies, and robust speech modeling beyond adult read speech.

My current work focuses on building and evaluating ASR systems for challenging real-world conditions, including Arabic dialects, code-switching, children’s speech, long-form speech, and streaming ASR. I am especially interested in the gap between research prototypes and reliable production-ready speech systems.

About Me

🎙️ Research Assistant working on speech and AI at QCRI
🔭 Currently working on:
- Arabic and multilingual ASR
- Algerian dialect and code-switching ASR
- Children’s speech recognition
- Streaming ASR systems
- Robustness and adaptation under distribution shift
🧠 Interested in both research and engineering: model training, evaluation, deployment, and reproducibility

Research & Engineering Interests

Automatic Speech Recognition

End-to-end ASR systems
Arabic dialect ASR
Code-switching ASR
Children’s ASR
Streaming and low-latency ASR
Long-form ASR
ASR evaluation, normalization, and benchmarking

Speech Model Adaptation

Adult-to-child ASR adaptation
Robustness under domain shift
Fine-tuning and LoRA-based adaptation
Weight-space model merging
Retention-aware adaptation

NLP & Speech-Language Technologies

Arabic text normalization
Punctuation restoration
Speech-to-text post-processing
Multilingual and code-switched language modeling

ML Engineering

PyTorch and Hugging Face workflows
ESPnet, k2/icefall, Sherpa-ONNX, and Whisper-based pipelines
Dataset preparation and large-scale evaluation
Reproducible experiments and benchmarking
Deployment-oriented ASR pipelines

Current Projects

🎧 AlgerianSpeech Platform

AlgerianSpeech is a platform dedicated to advancing speech recognition for Algerian Arabic, especially in realistic multilingual and code-switched settings involving Arabic, French, and English.

The project includes:

A speech annotation platform for Algerian dialect and code-switched speech
Real-world spontaneous speech collected from online recordings
Transcription and annotation workflows for multilingual speech
ASR evaluation pipelines using metrics such as WER, CER, and MER
Resources for building more robust ASR systems for underrepresented Arabic dialects

This work supports the broader goal of improving speech technology for Arabic dialects and low-resource multilingual communities.
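
As a rough illustration of the WER and CER metrics mentioned in the list above, here is a minimal, dependency-free sketch. It is not the platform's actual evaluation pipeline: the function names (`edit_distance`, `wer`, `cer`) are hypothetical, no text normalization is applied, and CER is computed with spaces removed, which is only one of several conventions. MER is left out because its exact definition is not given here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


if __name__ == "__main__":
    # A made-up code-switched utterance, for illustration only.
    reference = "راني في la maison"
    hypothesis = "راني في la maisons"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

In practice, both reference and hypothesis would be normalized first (for Arabic, typically diacritics and letter variants), since normalization choices can change the reported scores considerably.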

Selected Areas I Work On

Arabic ASR benchmarking
Streaming ASR for Arabic
Code-switching recognition and analysis
Children’s speech recognition
ASR robustness and domain adaptation
Punctuation restoration for Arabic ASR output
Dataset curation, normalization, and evaluation design

Tools & Frameworks

Languages: Python, Bash, LaTeX
ML/DL: PyTorch, Hugging Face Transformers, NumPy, pandas
ASR: ESPnet, k2/icefall, Whisper, Sherpa-ONNX, NeMo
Evaluation: jiwer, custom WER/CER pipelines, Arabic normalization tools
Experimentation: Slurm, Conda, Git, Linux, GPU-based training/inference
Deployment/Inference: ONNX, streaming ASR pipelines, model serving workflows

Let's Collaborate

I am open to collaboration on projects related to:

Arabic ASR and dialectal speech technologies
Code-switching speech recognition
Children’s speech recognition
ASR benchmarking and evaluation
Open-source speech tools and datasets
Robust and deployable speech AI systems
