
Operations Architect

Starlink Qatar

Location: Doha, Qatar
Salary: QAR 8,400-23,100/mo
Posted: Today
Category: IT & Technology
Employment type: Full Time

Skills Required

AWS, Azure, Kubernetes, DevOps, ERP, Leadership

Job Description

The Operations Architect defines and governs the operational model for enterprise platform capabilities delivered by multiple vendors, ensuring solutions are production-ready, observable, secure, and supportable at scale. The role designs end-to-end service management practices (SLOs/SLAs, monitoring, incident/change/problem management, DR, and capacity/cost controls) and ensures operational requirements are embedded from design through delivery.

Working with platform/cloud, security, and solution architects, as well as vendor teams and operations teams, the architect drives operations readiness reviews, creates runbooks and support processes, and enables a consistent, efficient operating model across cloud-agnostic deployments.

Duties & Responsibilities

- Define operational architecture and service management model across capabilities (ITIL-aligned where applicable).
- Establish observability standards: metrics/logs/traces/audits, OpenTelemetry instrumentation, dashboarding, alerting, and anomaly detection.
- Define SLOs/SLAs/OLAs, error budgets, and operational KPIs; ensure vendors deliver evidence and meet acceptance gates.
- Design incident management workflows (triage, escalation, RCA), integrate with ITSM, and standardize runbooks/playbooks.
- Define change and release management practices (CAB inputs, deployment rings, canary/rollback, feature flags coordination).
- Establish resiliency and DR requirements: backup/restore patterns, RPO/RTO targets, DR testing cadence, and failover runbooks.
- Define capacity, performance, and availability engineering processes (load testing, scaling policies, GPU/TPU capacity planning).
- Implement security operations integration: SIEM/SOAR alignment, alert routing, vulnerability/patch management SLAs.
- Define FinOps operational controls: tagging standards, showback/chargeback, budgets, anomaly detection, cost optimization playbooks.
- Lead operational readiness and handover: L1/L2/L3 training, reverse-shadowing, SOPs, and post-go-live stabilization plans.

Skills & Abilities

- Strong expertise in operating cloud-native platforms: SRE/ITIL practices, reliability engineering, and service management.
- Ability to turn NFRs into measurable SLOs, monitoring, and operational acceptance criteria.
- Solid understanding of observability stacks and telemetry design (OTel, APM, SIEM integration).
- Experience designing DR/BCP, backup strategies, and operational test plans in regulated environments.
- Proven capability to drive operational standardization across multiple vendors and teams.

Education & Background

- Bachelor’s degree in Computer Science, Information Technology, Cybersecurity, or related field; Master’s degree highly preferred.
- 8+ years in operations architecture, SRE, DevOps leadership, or service management for enterprise platforms.
- Experience running production systems on Azure plus exposure to at least one other cloud (GCP/AWS) and hybrid setups.
- Experience with ITSM tooling and processes (incident/change/problem, CMDB), including KPI/SLA reporting.
- Proven experience with monitoring/APM and security operations integration (SIEM, vulnerability management).
- Certifications desirable: ITIL, SRE-related training, Azure/AWS/GCP ops certs, Kubernetes CKA/CKS (optional).

Preferred Tools / Soft Skills

Preferred Tools

- ITSM & operations: ServiceNow (or equivalent), CMDB, PagerDuty/Opsgenie-style on-call tooling
- Security & cloud ops: Microsoft Sentinel, Defender for Cloud, Azure Monitor/Log Analytics, Kubernetes tooling

Soft Skills

- Calm, structured leadership during incidents and high-pressure escalations
- Strong facilitation skills for readiness reviews, RCAs, and cross-vendor alignment
- Clear documentation and operational discipline (runbooks, SOPs, checklists)
- Continuous improvement mindset and ability to drive measurable reliability gains
- Strong collaboration and influencing skills across engineering, security, and vendor teams