AI Infrastructure Consultant

Yallo Retail

Riyadh, Saudi Arabia | AED 8,000-22,000/mo (≈ SAR 8.2K-22.4K/mo)

Saudi Arabia | IT & Technology | Full Time

Skills Required

AWS, Azure, Docker, Kubernetes, DevOps, Machine Learning

Job Description

Job Title: AI Infrastructure Consultant
Job Type: Permanent
Job Location: Riyadh, Saudi Arabia

Job Summary:
We are seeking a seasoned AI Infrastructure Consultant to lead the design, implementation, and optimization of our high-performance computing environment. This role is critical for bridging the gap between raw hardware capabilities (GPUs) and scalable AI/ML model deployment. You will be responsible for ensuring that our infrastructure is robust, cost-effective, and able to support complex machine learning workloads at scale.

Roles and Responsibilities:

Architecture & Design
- Assess AI/ML workload requirements to design end-to-end compute, storage, and networking architectures.
- Architect specialized GPU clusters (NVIDIA A100/H100 or similar) tailored for training and inference.
- Define high-speed networking requirements (e.g., InfiniBand, RoCE) and low-latency storage solutions for massive datasets.

Containerization & Orchestration
- Implement and manage Docker containerization for consistent model environments.
- Deploy and scale AI workloads using Kubernetes (or managed services such as EKS, GKE, or AKS), ensuring high availability and seamless resource scheduling.

MLOps & CI/CD Integration
- Build and maintain robust CI/CD pipelines specifically for AI models, automating the path from code to production.
- Integrate automated testing, model and data versioning, and deployment strategies (canary, blue-green).

Monitoring & Cost Optimization
- Establish comprehensive monitoring frameworks to track infrastructure utilization and GPU health.
- Analyze performance bottlenecks and implement cost-performance optimizations to maximize ROI on expensive compute resources.

Required Qualifications & Skills:
- Total Experience: 10+ years in IT infrastructure, systems engineering, or DevOps.
- AI Specialization: 2–3 years of hands-on experience specifically in AI/ML infrastructure.
- GPU Expertise: Proven track record in GPU setup, CUDA configuration, and managing hardware acceleration for deep learning.
- Orchestration: Expert-level knowledge of Kubernetes and the CNCF ecosystem.
- Cloud & Hybrid: Proficiency with major cloud providers (AWS, Azure, GCP) and on-premises data center environments.
- Soft Skills: Strong consultancy mindset, with the ability to translate complex technical requirements into actionable architectural roadmaps.