Responsibilities
- Benchmark and evaluate TTS and ASR models using Arabic-specific test sets, measuring metrics such as Word Error Rate (WER), naturalness, and dialect coverage.
- Fine-tune generative models for voice cloning, zero-shot speaker adaptation, and speech synthesis.
- Build and maintain Arabic-focused data pipelines, including:
  - Audio collection and preprocessing
  - Diacritization (Tashkil)
  - Data cleaning and augmentation
- Optimize model inference for production environments using:
  - Quantization
  - KV-cache tuning
  - Streaming inference techniques
- Integrate and evaluate complete speech-to-speech conversational pipelines.
- Conduct experiments based on recent research papers and convert findings into production-ready solutions.
- Collaborate with engineering and product teams to deploy robust and scalable speech systems.
Basic qualifications
- 5+ years of experience in Machine Learning, Applied AI, or AI Research.
- Strong programming skills in Python.
- Extensive hands-on experience with PyTorch and the Hugging Face ecosystem.
- Proven experience training and fine-tuning neural models for:
  - Text-to-Speech (TTS)
  - Automatic Speech Recognition (ASR)
  - Audio codecs
- Deep understanding of modern speech architectures such as:
  - Whisper
  - Conformer
  - HiFi-GAN
  - Diffusion-based models
- Experience with audio processing techniques including:
  - Voice Activity Detection (VAD)
  - Speaker diarization
  - Neural vocoders
- Demonstrated ability to implement research papers and adapt them into practical production experiments.
- Strong understanding of Arabic language challenges, including:
  - Diacritization (Tashkil)
  - Dialectal variations
  - Code-switching
- Experience with inference optimization techniques such as:
  - Quantization
  - Streaming inference
  - NVIDIA TensorRT
Preferred qualifications
- Experience developing custom NVIDIA CUDA kernels for high-performance model inference.
- Familiarity with speculative decoding and other advanced acceleration techniques.
- Experience deploying models at scale in cloud or GPU-based production environments.
- Contributions to open-source speech or machine learning projects.
Benefits
- Free benefits for all employees (our famous games room, daily breakfast, fruit, coffee and other hot drinks, soft drinks and juices, company days out and parties…).
- Flexible and comfortable schedule.
- Social insurance.
- Paid annual leave and national holidays.
- Working remotely.
- Competitive salaries.
- Monetary rewards and incentives.
- Career opportunities with a growing team.
- Open-door management policy.
- Full medical insurance.
- Accommodation and transportation allowance.
- Friendly environment that values innovation and efficiency.
- Exciting opportunities for career growth and talent development.
- A culture that encourages feedback.
- Recognition and reward programs.
- Fun committees.
- Fun, smart and creative people.
- Social benefits.
- Natural Text-to-Speech (TTS)
- Real-Time Automatic Speech Recognition (ASR)
- End-to-End Speech-to-Speech Conversational Systems
Tags & focus areas
- Full-time
- Remote
- Machine Learning
- AI