Site Reliability Engineer

Guardio

Tel Aviv-Yafo

Full-time, hybrid
25,000-35,000 ₪ (AI-based estimate, not a salary provided by the employer)

Guardio is on a mission to redefine consumer cybersecurity for the modern internet.

We operate at consumer scale, protecting millions of people every day across devices, accounts, and digital touchpoints. In a world where phishing, fraud, and AI-powered scams evolve overnight, Guardio stays ahead of the curve.

We move fast, think deeply, and build with purpose. Our culture is rooted in transparency, feedback, and collaboration, along with shared wins, team dinners, company trips, and good times.

We’re a team of 100+ makers, doers, and boundary-breakers. If you’re ready to tackle meaningful challenges, grow at lightning speed, and help shape the next frontier of online safety, you belong here.

Let's cut to the chase. What's the job?

We're looking for a Site Reliability Engineer to own and establish Guardio's production reliability practice - across observability, alerting, SLOs, and incident response - and build it to support our next phase of scale. Your work will define how over a million users experience Guardio's product, how our engineers sleep at night, and how we build a production environment that's as resilient as the security product we deliver.

You will:

Define SLIs and SLOs with engineering leaders - translate reliability goals into measurable, actionable objectives across our key services. Help teams understand what good looks like in production.
Build AI-powered reliability tools - use LLMs and agents to correlate alerts, accelerate root cause analysis, and build a copilot for on-call engineers. AI is your force multiplier.
Improve observability across teams - build dashboards, tune alert thresholds, reduce noise, and ensure on-call means getting paged for the right reasons. Make observability actually useful.
Design and own on-call - establish our rotation, define escalation policies, write runbooks. Then build automated agents that monitor and begin mitigation before a human is even paged.
Automate toil, aggressively (create skills) - identify recurring manual operational work and replace it systematically. Not just scripts - intelligent automation that learns from incidents.
Own post-mortems - build a culture of learning from incidents. What broke, why, and what gets built to prevent recurrence.
Contribute to the full platform - CI/CD safety, deployment rollback, feature flags. Anything that helps engineers ship faster with less risk to our users.

Sounds great! Am I the right fit?

We're not checking boxes. We're looking for a specific kind of person.

You're probably a great fit if:

You're a builder at heart. You don't just operate systems - you build the tools that make systems better. You have something to show: a repo, a demo, a post-mortem you wrote, a system you built because it needed to exist.
You have strong software engineering roots. You've written production code. You understand distributed systems, APIs, and failure modes from the inside out.
You think in outcomes, not tasks. "I resolved the incident" is not a win. "I reduced MTTR by 50% and prevented the same incident from ever happening again" - that's a win.
You're AI-native. You already use AI tools to move faster. You've probably built something with LLM APIs, LangChain, or custom agents. And critically: you know when to verify the output before trusting it.
You make good calls under uncertainty. You've been the person in the room when things were broken and the data was unclear. You didn't freeze.

Talk nerdy to me.

Don't mind if we do. Our tech stack: