JobsAisle

Deep Learning Engineer

Norconsult Telematics

Riyadh, Saudi Arabia · SAR 7,100-18,400/mo (AED 7,000-18,000/mo) · Today
Saudi Arabia · IT & Technology · Full Time

Skills Required

AWS, Azure, DevOps, ERP, Arabic, English

Job Description

Lead the fine-tuning, compression, and deployment of deep learning models, especially LLMs, using distributed multi-GPU frameworks (e.g., DeepSpeed, FSDP) to improve performance, scalability, and efficiency. Optimize model inference through quantization, pruning, and distillation, and deploy using ONNX, TensorRT, OpenVINO, or platforms such as ModelMesh and NVIDIA Triton. Support Generative AI and RAG pipelines by improving latency, throughput, and GPU resource utilization across modern AI infrastructure, in alignment with MLOps/DevOps workflows.

Responsibilities:

- Fine-tune and optimize LLMs and deep learning models using distributed multi-GPU frameworks such as DeepSpeed, FSDP, and Hugging Face Accelerate (a minimal FSDP sketch follows this listing).
- Apply model compression techniques, including quantization (e.g., INT8), pruning, and knowledge distillation, to improve inference efficiency (see the ONNX quantization sketch below).
- Convert and optimize models for low-latency inference using tools such as ONNX Runtime, TensorRT, OpenVINO, and TF Serving.
- Deploy and serve models on high-performance platforms such as ModelMesh and NVIDIA Triton (see the Triton client sketch below).
- Collaborate with MLOps and DevOps teams on GPU resource planning, benchmarking, and automated deployment pipelines.
- Support and enhance RAG pipelines, embedding models, and inference tuning for Generative AI applications (see the retrieval sketch below).
- Ensure deployment readiness by focusing on model scalability, latency, and throughput, aligned with enterprise infrastructure standards.

Qualifications & Experience:

- Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related field.
- Minimum of 4 years of hands-on experience in deep learning, including multi-GPU distributed training and high-performance model optimization.
- Proficiency in PyTorch, TensorFlow, and deep learning toolkits such as Hugging Face Transformers and ONNX Runtime.
- Strong experience with model compression techniques (quantization, pruning, distillation) and inference optimization using ONNX, TensorRT, OpenVINO, or TF Serving.
- Knowledge of Transformer architectures, LLM internals, and generative AI workflows, including RAG.
- Familiarity with model serving platforms (e.g., Triton, ModelMesh) and deployment in OpenShift AI environments.
- Exposure to MLOps, GPU benchmarking, and cloud platforms (AWS, Azure, GCP) is preferred.
- Fluent in English (mandatory); Arabic proficiency is an added advantage.
- Certifications in AI/ML, deep learning, cloud platforms, or Red Hat/OpenShift AI are a plus.
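To illustrate the distributed fine-tuning responsibility above, here is a minimal sketch assuming PyTorch FSDP on a single node launched with torchrun; the base model (gpt2), dummy batch, and learning rate are placeholder assumptions, not requirements from this posting.

```python
# Minimal sketch: shard a causal LM with PyTorch FSDP for fine-tuning.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_fsdp.py
# The base model, dummy batch, and learning rate are assumed placeholders.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank())

model = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed example model
model = FSDP(model.cuda())  # shards parameters, gradients, and optimizer state

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One illustrative optimization step on a dummy token batch.
batch = torch.randint(0, 50257, (2, 128), device="cuda")
outputs = model(input_ids=batch, labels=batch)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

dist.destroy_process_group()
```

In practice a real data loader, auto-wrap policy, and mixed precision would be layered on top (DeepSpeed or Hugging Face Accelerate offer similar sharding); the sketch only shows the basic wrap-and-step pattern.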
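For the INT8 quantization and ONNX conversion work listed above, the following minimal sketch exports a small Hugging Face Transformer to ONNX and applies dynamic INT8 weight quantization with ONNX Runtime. The model name, file paths, and export settings are illustrative assumptions.

```python
# Minimal sketch: export a Transformer encoder to ONNX, then apply dynamic
# INT8 weight quantization with ONNX Runtime. Model name and paths are
# illustrative assumptions only.
import torch
from transformers import AutoModel, AutoTokenizer
from onnxruntime.quantization import quantize_dynamic, QuantType

model_id = "distilbert-base-uncased"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

# Dummy batch used only to trace the graph during export.
dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=17,
)

# Dynamic quantization stores weights as INT8 and dequantizes at runtime,
# shrinking the model and often speeding up CPU inference.
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)
```

For GPU serving, static (calibrated) quantization or a TensorRT engine build is often preferred, but the export-then-quantize flow above is the basic pattern.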
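The posting also calls for serving models with NVIDIA Triton. Below is a minimal client-side sketch using the tritonclient HTTP API to query a deployed model; the server URL, model name, and tensor names/shapes are assumptions chosen for illustration.

```python
# Minimal sketch: query a model served by NVIDIA Triton over HTTP.
# The URL, model name, input/output names, and shapes are assumed examples.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Batch of token IDs shaped to match the (assumed) model configuration.
input_ids = np.ones((1, 128), dtype=np.int64)

infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT64")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(
    model_name="distilbert_onnx",  # assumed name in the Triton model repository
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("last_hidden_state")],
)

print(result.as_numpy("last_hidden_state").shape)
```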
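Finally, for the RAG pipeline and embedding-model work mentioned above, a minimal retrieval sketch is shown below using sentence-transformers and FAISS; the embedding model, sample documents, and index type are illustrative assumptions.

```python
# Minimal sketch of the retrieval step in a RAG pipeline: embed documents,
# index them with FAISS, and fetch the top matches for a query. The embedding
# model and sample documents are assumed examples.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Triton serves ONNX and TensorRT models behind a single endpoint.",
    "INT8 quantization reduces model size and can improve throughput.",
    "FSDP shards parameters and optimizer state across GPUs.",
]

# Normalized embeddings + inner-product index gives cosine-similarity search.
doc_emb = encoder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(np.asarray(doc_emb, dtype=np.float32))

query_emb = encoder.encode(
    ["How do I shrink a model for faster inference?"],
    normalize_embeddings=True,
)
scores, ids = index.search(np.asarray(query_emb, dtype=np.float32), k=2)

for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {documents[i]}")
```

The retrieved passages would then be inserted into the generator's prompt; production pipelines typically swap the flat index for an approximate-nearest-neighbor index and tune batch size and GPU placement for latency and throughput.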