JobsAisle

Site Reliability Engineer

Avrioc Technologies

Abu Dhabi, UAE | AED 7,000-18,000/mo | Posted yesterday
UAE | IT & Technology | Full Time

Skills Required

Python, SQL, AWS, Azure, Kubernetes, Excel, DevOps, ERP, Communication

Job Description

We’re looking for a seasoned DevOps & Site Reliability Engineer (SRE) Lead to design, scale, and enhance our cloud infrastructure and observability ecosystem. If you’re passionate about automation, resilience, and reliability, this role is for you!

- Architect and deploy scalable, highly available cloud infrastructure for production workloads.
- Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
- Oversee and optimize CI/CD pipelines (Jenkins, Argo CD, or similar) for seamless deployments.
- Define and monitor SLOs & SLIs to ensure service reliability and uptime.
- Design and manage observability frameworks for monitoring, logging, and alerting (Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic).
- Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
- Implement auto-healing and proactive monitoring systems to prevent outages.
- Drive fault injection testing & chaos engineering (Chaos Mesh, Litmus, AWS FIS) for resilience validation.
- Collaborate with engineering and product teams to embed reliability into every phase of development.
- Maintain clear documentation on infrastructure, incidents, and operational processes.

- 8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
- Hands-on with AWS, GCP, or Azure (EC2, S3, RDS, Lambda, etc.).
- Strong with IaC tools (Terraform, CloudFormation, Ansible).
- Proven experience in CI/CD automation, monitoring, and incident response.
- Skilled in observability tools: Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic.
- Experience with AWS managed & self-managed databases (MySQL, Cassandra, etc.).
- Skilled in Python, Bash, or Go scripting.
- Experience designing and testing BCP/DR strategies.
- Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
- Excellent communication, documentation, and troubleshooting skills.

- Comply with Avrioc’s Information Security & Service Management policies.
- Maintain the confidentiality and integrity of all information assets.
- Attend mandatory information security trainings.
- Report any security incidents through official channels.