NVIDIA is seeking a technical leader to define, craft, implement, and guide firmware architecture for reliability, availability, serviceability, and power management across next-generation NVIDIA Networking products and platforms. You will take a strong hands-on role, working with hardware, firmware, software, validation, customer engineering, and external partners to build robust, diagnosable, power-efficient systems for large-scale deployments.
NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as the AI computing company. We are looking to grow our teams with the smartest people in the world. If you're creative and autonomous, we want to hear from you!
What you'll be doing
- Define platform-level firmware architecture for RAS and power management across SoCs, accelerators, DPUs, servers, embedded systems, and data center platforms.
- Own error detection, classification, containment, recovery, escalation, and reporting architecture.
- Define firmware architecture for power sequencing, power states, reset flows, thermal and power fault handling, idle management, and recovery from power-related failures.
- Create firmware specifications for hardware error handling, health monitoring, crash capture, telemetry, diagnostics, debug data, and field serviceability.
- Define interfaces and contracts between firmware, hardware, operating systems, BMCs, management controllers, platform software, and cloud/service infrastructure.
- Drive architecture reviews, tradeoff discussions, failure-mode analysis, validation strategy, and long-term RAS and power management roadmap planning.
- Establish standards for error logs, event schemas, telemetry flows, recovery policies, service diagnostics, and production debug infrastructure.
- Guide engineering teams through implementation, validation, silicon bring-up, platform integration, and production deployment of RAS and power management features.
- Analyze customer and field failures, identify architectural gaps, and feed lessons learned into future platform designs.
What we need to see
- BSc, MS, or PhD in Electrical Engineering, Computer Science, Computer Engineering, or equivalent experience.
- 7+ years of relevant experience in firmware, platform architecture, embedded systems, or low-level systems software.
- Deep understanding of RAS principles, fault modeling, error containment, recovery policies, diagnosability, and serviceability requirements.
- Experience architecting firmware for complex hardware platforms such as SoCs, accelerators, DPUs, servers, networking devices, or embedded systems.
- Strong knowledge of power management concepts, including power sequencing, reset architecture, thermal and power fault handling, power state transitions, and platform recovery flows.
- Familiarity with boot firmware, UEFI/BIOS, BMC, embedded controllers, RTOS, embedded Linux, or platform management stacks.
- Strong understanding of hardware/software interfaces, registers, interrupts, telemetry paths, debug infrastructure, and firmware-to-hardware contracts.
- Programming and debugging fundamentals across languages such as C/C++, Python or Perl scripting, Verilog, and assembly (including RISC-V).
- Ability to lead cross-functional architecture discussions and drive alignment across hardware, firmware, software, validation, product, and customer-facing teams.
- Excellent communication skills, strong technical leadership, and a real passion for working collaboratively.
Ways to stand out from the crowd
- Experience with PCIe AER, CXL RAS, memory RAS, ECC/parity, accelerator RAS, networking RAS, high-availability systems, or large-scale data center platforms.
- Knowledge of ACPI, SMBIOS, UEFI, PLDM, MCTP, Redfish, IPMI, or cloud telemetry systems.
- Experience with power/thermal fault handling, dynamic power management, platform power sequencing, low-power states, or autonomous recovery mechanisms.
- Background in silicon bring-up, platform validation, production diagnostics, or customer failure analysis.
- Prior technical leadership experience as a firmware architect, principal engineer, platform lead, or domain owner.
JR2018727
Questions and answers for the Senior RAS and Power Management Firmware Architect position
The core role of the Senior RAS and Power Management Firmware Architect at NVIDIA is to define, design, implement, and lead the firmware architecture for reliability, availability, and serviceability (RAS) and power management across NVIDIA's next-generation networking products and platforms. This includes working closely with hardware, firmware, software, and validation teams to build robust, power-efficient systems for large-scale deployments.