Responsibilities

Benchmark and evaluate TTS and ASR models using Arabic-specific test sets, measuring metrics such as Word Error Rate (WER), naturalness, and dialect coverage.
Fine-tune generative models for voice cloning, zero-shot speaker adaptation, and speech synthesis.
Build and maintain Arabic-focused data pipelines, including audio collection and preprocessing, diacritization (Tashkil), and data cleaning and augmentation.
Optimize model inference for production environments using quantization, KV-cache tuning, and streaming inference techniques.
Integrate and evaluate complete speech-to-speech conversational pipelines.
Conduct experiments based on recent research papers and convert findings into production-ready solutions.
Collaborate with engineering and product teams to deploy robust and scalable speech systems.

Basic qualifications

5+ years of experience in Machine Learning, Applied AI, or AI Research.
Strong programming skills in Python.
Extensive hands-on experience with PyTorch and the Hugging Face ecosystem.
Proven experience training and fine-tuning neural models for Text-to-Speech (TTS), Automatic Speech Recognition (ASR), and audio codecs.
Deep understanding of modern speech architectures such as Whisper, Conformer, HiFi-GAN, and diffusion-based models.
Experience with audio processing techniques including Voice Activity Detection (VAD), speaker diarization, and neural vocoders.
Demonstrated ability to implement and adapt research papers into practical production experiments.
Strong understanding of Arabic language challenges, including diacritization (Tashkil), dialectal variations, and code-switching.
Experience with inference optimization techniques such as quantization, streaming inference, and NVIDIA TensorRT.

Preferred qualifications

Experience developing custom NVIDIA CUDA kernels for high-performance model inference.
Familiarity with speculative decoding and other advanced acceleration techniques.
Experience deploying models at scale in cloud or GPU-based production environments.
Contributions to open-source speech or machine learning projects.

Benefits

Free benefits for all employees (our famous games room, daily breakfast, fruit, coffee and other hot drinks, soft drinks and juices, company days out, parties, and more).
Flexible and comfortable schedule.
Social insurance.
Paid annual leave and national holidays.
Remote work.
Competitive salaries.
Monetary rewards and incentives.
Career opportunities with a growing team.
Open-door management policy.
Full medical insurance.
Accommodation and transportation allowance.
Friendly environment that values innovation and efficiency.
Exciting opportunities for career growth and talent development.
A culture that encourages feedback.
Recognition and reward programs.
Fun committees.
Fun, smart and creative people.
Social benefits.
Natural Text-to-Speech (TTS)
Real-Time Automatic Speech Recognition (ASR)
End-to-End Speech-to-Speech Conversational Systems


Full-time, Remote, Machine Learning, AI