Guardio is on a mission to redefine consumer cybersecurity for the modern internet.
We operate at consumer scale, protecting millions of people every day across devices, accounts, and digital touchpoints. In a world where phishing, fraud, and AI-powered scams evolve overnight, Guardio stays ahead of the curve.
We move fast, think deeply, and build with purpose. Our culture is rooted in transparency, feedback, and collaboration along with shared wins, team dinners, company trips, and good times.
We’re a team of 100+ makers, doers, and boundary-breakers. If you’re ready to tackle meaningful challenges, grow at lightning speed, and help shape the next frontier of online safety, you belong here.
Let's cut to the chase. What's the job?
We're looking for a Site Reliability Engineer to own and establish Guardio's production reliability practice - across observability, alerting, SLOs, and incident response - and build it to support our next phase of scale. Your work will define how over a million users experience Guardio's product, how our engineers sleep at night, and how we build a production environment that's as resilient as the security product we deliver.
You will:
- Define SLIs and SLOs with engineering leaders - translate reliability goals into measurable, actionable objectives across our key services. Help teams understand what good looks like in production.
- Build AI-powered reliability tools - use LLMs and agents to correlate alerts, accelerate root cause analysis, and build a copilot for on-call engineers. AI is your force multiplier.
- Improve observability across teams - build dashboards, tune alert thresholds, reduce noise, and ensure on-call means getting paged for the right reasons. Make observability actually useful.
- Design and own on-call - establish our rotation, define escalation policies, write runbooks. Then build automated agents that monitor and begin mitigation before a human is even paged.
- Automate toil, aggressively (create skills) - identify recurring manual operational work and replace it systematically. Not just scripts, but intelligent automation that learns from incidents.
- Own post-mortems - build a culture of learning from incidents. What broke, why, and what gets built to prevent recurrence.
- Contribute to the full platform - CI/CD safety, deployment rollback, feature flags. Anything that helps engineers ship faster with less risk to our users.
We're not checking boxes. We're looking for a specific kind of person.
You're probably a great fit if:
- You're a builder at heart. You don't just operate systems - you build the tools that make systems better. You have something to show: a repo, a demo, a post-mortem you wrote, a system you built because it needed to exist.
- You have strong software engineering roots. You've written production code. You understand distributed systems, APIs, and failure modes from the inside out.
- You think in outcomes, not tasks. "I resolved the incident" is not a win. "I reduced MTTR by 50% and prevented the same incident from ever happening again" - that's a win.
- You're AI-native. You already use AI tools to move faster. You've probably built something with LLM APIs, LangChain, or custom agents. And critically: you know when to verify the output before trusting it.
- You make good calls under uncertainty. You've been the person in the room when things were broken and the data was unclear. You didn't freeze.
Don't mind if we do. Our tech stack:
- Cloud: GCP - GKE, Pub/Sub, BigQuery, Cloud Functions. We are all-in.
- CI/CD: GitHub Actions, Terraform + Argo CD
- Observability: Datadog (our primary)
- Languages: Python, TypeScript, Go
- Data: MySQL, Redis, BigQuery, ClickHouse
Questions and answers about the Site Reliability Engineer position
The core role of the Site Reliability Engineer at Guardio is to establish and run the company's production reliability practice, including monitoring, alerting, service level objectives (SLOs), and incident response. The goal is to support Guardio's next phase of growth and ensure a smooth experience for millions of users.