On-Prem AI Infra Engineer | Kubernetes & GitOps
MBR Partners
Dubai, UAE | AED 7,000-18,000/mo | Today
UAE | IT & Technology | Full Time
Skills Required
Python, Kubernetes, Git, Excel, DevOps, ERP, Supply Chain
Job Description
Our client is a young high-tech company incorporated in the heart of one of the world's fastest-growing tech hubs: Dubai, UAE. As the exclusive software partner to one of the world's largest ODMs in the networking equipment space, they develop the Network Operating Systems that power critical data centre and telecom routing & switching infrastructure. Building on this foundation, they have recently launched an AI division focused on designing their own chips to accelerate inference and training workloads.
What sets them apart is their unique position at the centre of a historic development: their ODM partner is establishing the first networking equipment factory of its kind in the GCC region, and they are the software engine driving this groundbreaking initiative. They are not just building technology; they are building a true networking vendor that serves regional interests while meeting the growing demand for networking equipment across the MENA region and beyond.
Their long-term vision extends beyond products to people: creating a thriving ecosystem for embedded systems and ASIC design talent that will produce generations of world-class professionals, establishing the region as a global centre of excellence for Enterprise Compute innovation.
As a rapidly growing company at the forefront of AI hardware innovation, they are constantly seeking talented and motivated individuals to join their team. They offer a dynamic and challenging work environment, with opportunities to make a significant impact on the future of AI technology.
Your Mission
Own the end-to-end design and operation of our on-premise infrastructure for AI and enterprise workloads: built as code, automated, observable, and secure. You will architect and run Kubernetes clusters for training/inference, manage servers, networks, and core services, and enable developers with reliable CI/CD and platform tooling.
This is where minutes, time-to-recovery, and cost-per-job directly impact AI velocity at scale.
Responsibilities
Design and operate on-prem infrastructure as code: author reusable Terraform/Ansible/Helm modules; build GitOps workflows (e.g., Argo CD) for repeatable, audited changes across environments.
Build and run Kubernetes for AI: configure multi-tenant GPU clusters (MIG/GPUDirect RDMA, NVIDIA device plugins/DCGM), scheduling/quotas, HPA/Cluster Autoscaler (where applicable), and workload isolation.
Administer servers, networks, and core services: OS lifecycle (Linux), identity/SSO (Keycloak/LDAP), secrets (Vault), DNS/DHCP/NTP, artifact registries, and internal package mirrors.
Provide storage for AI pipelines: integrate and operate high-bandwidth/low-latency storage, tune for dataset staging and checkpointing patterns.
Enable CI/CD: partner with developers to design fast, reproducible pipelines (GitLab CI/GitHub Actions), caching and runners on GPU/CPU nodes, artifact provenance (SBOM, SLSA).
You'll Collaborate With
Platform and ML engineers running training/inference at scale, silicon and systems teams integrating hardware in the lab, security engineers safeguarding credentials and supply chain, application developers delivering services via CI/CD, and site ops supporting data centre deployments. Together, we turn infrastructure into a product that accelerates the business.
Minimum Qualifications
5+ years in DevOps/SRE/Platform Engineering with hands-on ownership of on-prem environments.
Proven experience operating Kubernetes in production (multi-tenant RBAC, networking/CNI, storage, ingress, monitoring).
Proficiency with IaC and automation (Terraform, Ansible, Helm; GitOps with Argo CD/Flux).
Strong Linux administration, scripting (Bash/Python), and troubleshooting across the stack (compute, network, storage).
CI/CD expertise (GitLab CI/GitHub Actions), container build security (SBOM, image signing), and artifact management.
Solid networking fundamentals (L2/L3, routing, BGP, VLANs, EVPN/VXLAN, load balancing, TLS/mTLS).
Experience implementing observability (Prometheus/Grafana, logs, tracing) and running incident response.
Preferred (Nice-to-Haves)
GPU cluster operations for AI (NVIDIA drivers/operator, DCGM, MIG, GPUDirect RDMA, Slurm integration).
Storage for data-intensive workloads (Ceph, parallel filesystems, NVMe-oF) and performance tuning.
Secrets/identity platforms (Vault, Keycloak/LDAP/SSO), policy-as-code (OPA/Gatekeeper, Kyverno).
Security/compliance practices (CIS benchmarks, SLSA, supply-chain scanning) and zero-trust networking.
Data centre experience (rack/stack, power/cooling basics) and remote site rollout automation.
Familiarity with configuration management for network devices and API-driven switches/routers.

Reproducible environments by default: any engineer can spin up an identical dev/test stack (K8s namespace, storage, secrets, runners) from Git in ≤30 minutes, with audit trails for every change.
Solid CI/CD for AI workflows: model/build/test pipelines are deterministic and cache-efficient; median pipeline time down 30–50%, with artifact provenance (SBOM, signatures) and traceabled