JobsAisle

Site Reliability Engineer (SRE)

BHFT

Ajman, UAE | AED 16,667-25,000/mo | Yesterday
UAE | IT & Technology | Full Time

Skills Required

Python, English

Job Description

<div><p>We are looking for a Site Reliability Engineer who will be responsible for ensuring the reliable operation of our platform, working with metrics to improve production process efficiency, and participating in testing new product versions.</p><h3>Responsibilities</h3><ul><li><p>Production Stability Management: Ensure continuous compliance with external regulatory requirements and internal standards, including risk, security, technology, and trader needs. Support and automate validation and monitoring processes for adherence to necessary standards.</p></li><li><p>Incident Monitoring &amp; Management: Develop and improve monitoring and alerting systems to detect anomalies in key production metrics. Implement rapid response mechanisms and efficient solutions to maintain strategy performance.</p></li><li><p>Release &amp; Change Management: Enforce standards for managing releases and changes to minimize deployment risks. Implement strict acceptance testing for all releases.</p></li><li><p>Process Management: Develop and maintain Standard Operating Procedures (SOPs) for the team, manage task queues, and organize shift schedules to ensure continuous support and high availability of trading strategies.</p></li><li><p>Integration Projects: Lead initiatives to connect with new exchanges, brokers, and trading platforms, ensuring smooth and secure service integration.</p></li><li><p>Technical Performance Optimization: Continuously improve system availability and resilience (MTTR, MTBF) and reduce latency, while optimizing data exchange performance and order routing to maximize profitability.</p></li></ul><p><b>Qualifications</b></p><h3>Requirements</h3><ul><li>Deep understanding of trading processes and market microstructure, including colocation, trading on native exchange protocols, and algorithmic trading.</li><li>Experience in monitoring, alerting systems, and incident management for high-load environments.</li><li>Knowledge of regulatory compliance and security standards.</li><li>Proficiency in monitoring and incident management tools such as Grafana, ClickHouse, Prometheus, Opsgenie, Grafana OnCall, and PagerDuty.</li><li>Experience developing and managing SOPs and KPIs for service teams.</li><li>Experience managing integration projects with brokers and exchanges.</li></ul>
<p><b>Strong technical skill set including:</b></p><ul><li>Linux systems administration and optimization.</li><li>TCP/UDP multicast networking.</li><li>FIX-based and native exchange protocols.</li><li>Colocation infrastructure setup and management.</li><li>Python scripting for automation and monitoring.</li><li>English proficiency at C1 level or higher.</li></ul><h3>Remote Work</h3><p>Yes</p><h3>Employment Type</h3><p>Full-time</p></div>