Site Reliability Engineer at a global organisation

More details will be shared during screening.

Job Description:

As a Site Reliability Engineer at a global organisation, you will be instrumental in ensuring the availability, scalability, and security of critical distributed systems. Working within a dynamic team passionate about operational excellence, you'll help design and maintain reliable infrastructure while driving continuous improvement across production environments.

Your expertise in Kubernetes, monitoring, and incident management will support our mission to deliver seamless, secure services that empower users worldwide.

Job Requirements:

What You’ll Bring

We’re looking for candidates with a strong foundation in site reliability principles and hands-on experience in the following areas:

Proficiency with Kubernetes (K8s) and container orchestration at scale
Experience implementing monitoring and observability tools such as Prometheus, Grafana, and Azure Monitor
Strong skills in incident management, root cause analysis, and capacity planning
Expertise in working with distributed systems in production and in secure environments
Familiarity with cloud-native telemetry and performance optimization techniques

Benefits:

Why Join Us?

Enjoy a collaborative culture that values personal growth, innovation, and work-life balance. Benefit from flexible working arrangements, a supportive team environment, and opportunities to expand your skills through ongoing learning initiatives.

Required Skills:

Incident Management, Root Cause Analysis, Capacity Planning, Grafana, Kubernetes, Prometheus, Azure Monitor, Distributed Systems

Optional Skills:

Monitoring, Security, Telemetry, Azure, Performance Optimization

Posted by:

Abhignan K

ak@hackertrail.com