Senior Site Reliability Engineer

NTT DATA Corporation

Pune, India | ₹20,000–₹50,000/mo | ≈ AED 880-2.2K/mo | Today

India, Automation, Monitoring, Capacity Planning, Documentation, Ansible, Puppet, Python, Bash, Monitoring Tools, Google Cloud Platform (GCP), Google BI, AI/ML Tools, Looker, BigQuery ML, Vertex AI, RedHat OpenShift Administration, Incident Response, Terraform, CI/CD Pipelines, Networking Concepts, Full Time

Skills Required

Python, Azure, Kubernetes, Git, DevOps, Communication

Job Description

Role Overview:

As a Senior Site Reliability Engineer (SRE) at NTT DATA America, you will play a crucial role in ensuring the reliability, performance, and scalability of both on-premises and cloud-based systems. You will leverage your expertise in Google Cloud Platform (GCP), Google BI, AI/ML tools, and RedHat OpenShift to design, implement, and manage cloud infrastructure. Your focus will be on reducing costs for Google Cloud while collaborating closely with development and operations teams to enhance system reliability and performance.

Key Responsibilities:

- Ensure the reliability and uptime of critical services and infrastructure to maintain system stability.
- Design, implement, and manage cloud infrastructure using Google Cloud services to optimize performance.
- Develop and maintain automation scripts and tools to enhance system efficiency and reduce manual intervention.
- Implement monitoring solutions and respond to incidents promptly to minimize downtime and ensure quick recovery.
- Collaborate with cross-functional teams to improve system reliability and performance through effective communication.
- Conduct capacity planning and performance tuning to ensure systems can accommodate future growth.
- Create and maintain comprehensive documentation for system configurations, processes, and procedures to ensure transparency and knowledge sharing.

Qualifications:

- 5+ years of experience in site reliability engineering or a similar role.
- Proficiency in Google Cloud services (Compute Engine, Kubernetes Engine, Cloud Storage, BigQuery, Pub/Sub, etc.).
- Experience with Google BI and AI/ML tools (Looker, BigQuery ML, Vertex AI, etc.).
- Proficiency in automation tools such as Terraform, Ansible, and Puppet.
- Experience with CI/CD pipelines and tools such as Azure Pipelines, Jenkins, and GitLab CI.
- Strong scripting skills in Python, Bash, etc.
- Previous experience with networking concepts and protocols.
- Experience with monitoring tools such as Prometheus and Grafana.

Preferred Qualifications:

- Google Cloud Professional DevOps Engineer certification.
- Google Cloud Professional Cloud Architect certification.
- Red Hat Certified Engineer (RHCE) or similar Linux certification.
- Bachelor's degree in Computer Science, Engineering, or a related field.

Please note: Shift Timing Requirement is 1:30 pm IST - 10:30 pm IST.