More details will be shared during screening.
Job Description:
As a Site Reliability Engineer at a global organisation, you will be instrumental in ensuring the availability, scalability, and security of critical distributed systems. Working within a dynamic team passionate about operational excellence, you'll help design and maintain reliable infrastructure while driving continuous improvement across production environments.
Your expertise in Kubernetes, monitoring, and incident management will support our mission to deliver seamless, secure services that empower users worldwide.
Job Requirements:
What You’ll Bring
We’re looking for candidates with a strong foundation in site reliability principles and hands-on experience in the following areas:
- Proficiency with Kubernetes (K8s) and container orchestration at scale
- Experience implementing monitoring and observability tools such as Prometheus, Grafana, and Azure Monitor
- Strong skills in incident management, root cause analysis, and capacity planning
- Expertise working with distributed systems in production and secure environments
- Familiarity with cloud-native telemetry and performance optimization techniques
Benefits:
Why Join Us?
Enjoy a collaborative culture that values personal growth, innovation, and work-life balance. Benefit from flexible working arrangements, a supportive team environment, and opportunities to expand your skills through ongoing learning initiatives.
Required Skills:
Incident Management, Root Cause Analysis, Capacity Planning, Grafana, Kubernetes, Prometheus, Azure Monitor, Distributed Systems
Optional Skills:
Monitoring, Security, Telemetry, Azure, Performance optimization
Posted by:
Abhignan K
ak@hackertrail.com