Final

Herzliya

Final Israel Ltd. is one of the world’s leading high-frequency trading (HFT) companies.

We use proprietary prediction and trading algorithms as well as highly innovative schemes for handling large amounts of data. As a major participant in the HFT industry, Final faces a twofold challenge: analyzing large and complex data sets offline, and processing massive flows of real-time data.

We are looking for a passionate, versatile, and experienced Site Reliability Engineer (SRE) to join our DevOps team.

Our production environments span 20+ sites worldwide and run financial transactions at unparalleled scale, taking disaster recovery (DR), high availability (HA), monitoring, log analysis, streaming services, and other technologies to the highest level.

As the first professional SRE on the team, you will have the opportunity to lead the design, implementation, and maintenance of this highly reliable and scalable infrastructure. You will mentor and guide other team members, drive process improvements, and collaborate with cross-functional teams to deliver exceptional performance and uptime for our systems.

Responsibilities:

Lead the design, implementation, and maintenance of highly available and fault-tolerant systems – multi-region, multi-cluster Kubernetes (K8s), Kafka, Docker registry, and more.
Collaborate with IT, DevOps, and Dev teams to ensure that monitoring, alerting, and logging systems are effective at proactively identifying and resolving issues.
Automate operational tasks to streamline deployment, configuration, and management processes.
Conduct incident response and root cause analysis, and implement preventive measures.
Stay up to date with industry trends and emerging technologies related to site reliability engineering.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related field.
2+ years of experience as a Site Reliability Engineer or a similar role.
Coding experience in one or more of: Python, Bash, Groovy, Java, Go, or similar.
Extensive experience with self-hosted Kubernetes (K8s) – debugging, installation, upgrades, monitoring, logging, and operation.
Experience with monitoring and logging tools – ELK, Grafana, Prometheus, etc.
Experience with repositories and registries – Git, Artifactory, Docker registry, etc.
Experience working with automated infrastructure-as-code (IaC) and configuration management (CM) tools (Terraform, Salt, Puppet, Ansible, Packer, Helm, etc.).
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration abilities.
Strong leadership and mentorship abilities, with the desire to help develop and grow NOC and operations teams toward an SRE/DevOps mindset.

Advantages: