Still looking for a job through search engines? Time to upgrade!
Instead of going through thousands of listings on your own, Jobify analyzes your resume and shows you only the jobs that truly match you.
Over 80,000 jobs • 4,000 new every day
Free. No ads. No fine print.
Responsibilities:
Incident Management: Manage and respond to incidents efficiently to minimize the impact on services and meet SLA commitments. Perform root cause analysis and document findings.
Operational Monitoring: Use tools like Datadog, Coralogix, Grafana, Prometheus, and AWS CloudWatch to monitor AWS infrastructure. Ensure system availability, performance, and compliance with security standards.
24/7 Shifts: Be part of a 24/7 operations team, ensuring smooth operations around the clock by monitoring, detecting, and resolving issues in real time.
Communication: Provide timely updates during incidents, and document events to support future improvements and proactive measures.
Continuous Improvement: Collaborate with the team to drive automation initiatives and improve operational workflows, using infrastructure as code to optimize system reliability and scalability.
SLA Monitoring: Ensure adherence to SLAs, monitor performance metrics, and suggest improvements when necessary.
Requirements:
Familiarity with monitoring tools such as Datadog, Coralogix, Grafana, Prometheus, and AWS CloudWatch.
Excellent incident management skills, with the ability to work under pressure and handle multiple tasks.
Understanding of SLA concepts and experience in ensuring SLA adherence.
Familiarity with automation and infrastructure as code (Terraform, Python).
Strong analytical skills for summarizing incidents and identifying areas for future improvement.
Advantage:
Strong knowledge of AWS infrastructure, including EC2, S3, VPC, and Lambda.