Still looking for work through search engines? Time to upgrade!
Instead of going through thousands of listings on your own, Jobify analyzes your resume and shows you only the jobs that truly fit you.
Over 80,000 jobs • 4,000 new every day
Free. No ads. No fine print.
Responsibilities:
* Champion observability best practices across engineering teams and heterogeneous system architectures.
* Lead the design and implementation of distributed tracing across hundreds of microservices, including asynchronous communication patterns (e.g., Kafka).
* Collaborate with service owners to define SLA/SLO targets and implement effective monitoring, alerting, and dashboards using tools like Prometheus, Grafana, and Elastic Stack.
* Operate, maintain, and enhance our observability stack, with a focus on:
  * Elastic Stack (Elasticsearch, Logstash, Kibana) for logging.
  * Grafana for visualization.
  * Prometheus for metrics.
* Integrate applications with APM solutions, with a strong preference for Elastic APM.
* Use programming skills (e.g., Python, Go, Java, scripting languages) to:
  * Develop custom tooling and integrations to enhance observability and automate SRE workflows.
  * Build advanced dashboards and data aggregation pipelines for deep system insights.
* Collaborate with development teams to embed observability and reliability into the SDLC, establishing standards and best practices.
* Participate in and help mature our incident response processes, including leading post-mortems.
* Proactively identify and resolve performance bottlenecks and system inefficiencies.
* Mentor engineers on SRE principles and observability techniques.
Country: Israel
City: Herzliya
Qualifications (Must-Haves):
* Proven experience in Platform Engineering, SRE, or a similar role with a strong focus on observability in distributed environments.
* Demonstrated success in driving observability adoption across multiple teams.
* Deep knowledge of Kubernetes operations, including Helm-based deployments, monitoring, and troubleshooting.
* Hands-on experience with distributed tracing in asynchronous systems (Kafka, etc.).
* Expertise in defining and tracking SLOs/SLIs and building alerting and dashboarding strategies.
* Proficiency with observability tools such as Elastic Stack, Grafana, and Prometheus.
* Strong experience with APM tools, especially Elastic APM.
* Solid programming/scripting skills (e.g., Java/Kotlin, Python, JavaScript/TypeScript).
* Familiarity with diverse application infrastructures and build systems (Java/Maven, Python, C#).
* Experience with CI/CD pipelines (GitLab preferred) and code analysis tools (e.g., SonarQube, Artifactory/Xray).
* Excellent communication and collaboration skills.
Qualifications (Highly Desirable):
* Experience contributing to broader platform engineering initiatives (e.g., service catalog portals).
* Proven success in implementing self-service infrastructure provisioning.
* Experience with standardization initiatives and creating code templates/scaffolding.
* Familiarity with public cloud platforms, especially AWS.