
Deep Learning Engineer

Norconsult Telematics

Riyadh, Saudi Arabia · SAR 12,500–16,667/mo · Posted today
Saudi Arabia · IT & Technology · Full Time

Skills Required

AWS, Azure, DevOps, ERP, Arabic, English

Job Description

- Lead the fine-tuning, compression, and deployment of deep learning models, especially LLMs, using distributed multi-GPU frameworks (e.g., DeepSpeed, FSDP) to enhance performance, scalability, and efficiency.
- Optimize model inference through quantization, pruning, and distillation, and deploy using ONNX, TensorRT, OpenVINO, or platforms like ModelMesh and NVIDIA Triton.
- Support Generative AI and RAG pipelines by improving latency, throughput, and GPU resource utilization across modern AI infrastructure, in alignment with MLOps/DevOps workflows.

Job Description & Responsibilities

- Fine-tune and optimize LLMs and deep learning models using distributed multi-GPU frameworks such as DeepSpeed, FSDP, and Hugging Face Accelerate.
- Apply model compression techniques, including quantization (e.g., INT8), pruning, and knowledge distillation, to improve inference efficiency.
- Convert and optimize models for low-latency inference using tools such as ONNX Runtime, TensorRT, OpenVINO, and TF Serving.
- Deploy and serve models using high-performance platforms such as ModelMesh and NVIDIA Triton.
- Collaborate with MLOps and DevOps teams on GPU resource planning, benchmarking, and building automated deployment pipelines.
- Support and enhance RAG pipelines, embedding models, and inference tuning for Generative AI applications.
- Ensure deployment readiness by focusing on model scalability, latency, and throughput, in line with enterprise infrastructure standards.

Qualifications & Experience

- Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related field.
- Minimum of 4 years of hands-on experience in deep learning, including multi-GPU distributed training and high-performance model optimization.
- Proficiency in PyTorch, TensorFlow, and deep learning toolkits such as Hugging Face Transformers and ONNX Runtime.
- Strong experience with model compression techniques (quantization, pruning, distillation) and inference optimization using ONNX, TensorRT, OpenVINO, or TF Serving.
- Knowledge of Transformer architectures, LLM internals, and generative AI workflows, including RAG.
- Familiarity with model serving platforms (e.g., Triton, ModelMesh) and deployment on OpenShift AI environments.
- Exposure to MLOps, GPU benchmarking, and cloud platforms (AWS, Azure, GCP) is preferred.
- Fluency in English (mandatory); Arabic proficiency is an added advantage.
- Certifications in AI/ML, deep learning, cloud platforms, or Red Hat OpenShift AI are a plus.
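For candidates unfamiliar with the compression techniques named above, the simplest of them (post-training dynamic INT8 quantization) can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration, not part of the posting: the toy two-layer network and its sizes are invented for the example.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network; sizes are arbitrary.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Dynamic quantization: Linear weights are stored as INT8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 10])
```

In practice, quantizing LLMs involves more specialized tooling (e.g., calibration for static INT8, or weight-only schemes), but the workflow above captures the basic idea of trading numeric precision for smaller, faster inference.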