Lead Site Reliability Engineer

Royal Bank of Canada
Toronto, CA; US

Job Description

What is the opportunity?

Join RBC as a Lead Site Reliability Engineer and take the lead in ensuring the reliability, scalability, and performance of our critical production systems and infrastructure. This is your chance to drive innovation through cutting-edge engineering practices, automation, and process optimization. Collaborate with cross-functional teams, manage key vendor relationships, and tackle complex, high-stakes challenges in a dynamic and supportive environment. With a focus on operational excellence and compliance, this role offers the opportunity to make a meaningful impact at one of the world’s most respected financial institutions. If you’re a visionary leader with expertise in modern infrastructure technologies and a passion for solving complex problems, this is your opportunity to elevate your career while shaping the future of RBC’s technology landscape.

What will you do?

  • Lead strategic direction for operations of a 4,000-ATM fleet, ensuring 99.7% availability
  • Drive continuous improvement and process optimization across ATM operations
  • Lead technology upgrade and operational change management initiatives
  • Serve as primary relationship owner for vendor field services, maintenance, and support
  • Develop strategic partnerships with vendors and internal technology teams
  • Deliver executive-level reporting and communication to senior leadership and business stakeholders
  • Establish and monitor performance metrics, SLAs, and KPIs for vendor and operational excellence
  • Define and maintain ATM-specific SLOs/SLAs/SLIs (e.g., transaction success rates, uptime, latency)
  • Ensure end-to-end reliability of ATM ecosystem (hardware, software, network)
  • Oversee regulatory compliance, security standards, and audit requirements with full accountability
  • Manage risk, business continuity, and disaster recovery planning
  • Act as final escalation point for critical outages and emergency response coordination
  • Lead 24/7 incident response, ensuring rapid resolution of customer-impacting issues
  • Perform root cause analysis (RCA) for AI/ML-related incidents and implement preventive measures
  • Implement real-time monitoring, alerting, and observability tools (hardware, transactions, network)
  • Automate routine tasks (software updates, configurations, log analysis)
  • Collaborate with data scientists, engineers, and operations teams on complex issues
  • Coordinate daily standups and project calls with development, QE, and management teams

What will you need to succeed?

Must have:

  • Bachelor’s degree in Business Administration, Information Technology, Engineering, or a related field
  • Minimum of 5-7 years of experience in ATM technology management or financial services technology
  • Proven experience managing vendor relationships and large-scale technology operations
  • Strong knowledge of Vendor and other ATM technology platforms and capabilities
  • Demonstrated leadership experience managing cross-functional teams and stakeholders
  • Knowledge of banking regulations, compliance requirements, and security standards
  • Experience with budget management, financial analysis, and cost optimization
  • Experience with ITIL framework and service management best practices
  • Strong project management skills with experience in technology deployment projects
  • Experience with PowerShell scripting and automation concepts (intermediate level)
  • Knowledge of SCCM and remote management technologies
  • Knowledge of cloud technologies and hybrid infrastructure management

Nice-to-have:

  • Experience managing and optimizing large-scale ATM ecosystems, including hardware, software, and network infrastructure, to ensure seamless operations.
  • Familiarity with financial transaction processing systems, including payment networks and protocols (e.g., ISO 8583), to support secure and efficient transactions.
  • Hands-on experience with cloud-based infrastructure and hybrid environments, leveraging tools like Kubernetes, Docker, or Terraform for scalability and automation.
  • Proficiency with advanced monitoring and observability tools (e.g., Prometheus, Grafana, Splunk) to enhance system reliability, performance, and proactive issue resolution.

What’s in it for you?

  • Lead and shape the reliability, scalability, and performance of RBC’s critical production systems, directly impacting millions of customers.
  • Work with cutting-edge technologies and drive innovation in a high-availability, mission-critical environment.
  • Collaborate with a diverse, talented team of professionals across development, security, quality assurance, and operations to solve complex challenges.
  • Access unparalleled professional growth opportunities, including leadership development, technical training, and exposure to large-scale, complex systems.
  • Thrive in a supportive and inclusive workplace culture that values your expertise, fosters innovation, and recognizes your contributions.
  • Enjoy competitive compensation, comprehensive benefits, and a strong focus on work-life balance.

Skills & Requirements

Technical Skills

PowerShell, SCCM, cloud technologies, hybrid infrastructure management, ITIL framework, service management best practices, budget management, financial analysis, cost optimization, Vendor and other ATM technology platforms, banking regulations, compliance requirements, security standards, project management, automation concepts, intermediate level scripting, leadership, collaboration, communication, problem-solving, team management, stakeholder management, finance, technology, operations, security, quality assurance, development

Level

mid

Posted

4/5/2026
