Job Description
What is the opportunity?
As a Staff Engineer on RBC Borealis' Lumina Production Engineering team, you will provide expert-level support for critical systems and services. You will also serve as a technical leader responsible for developing and maintaining the critical infrastructure that powers RBC's next-generation AI and innovation platforms.
The role demands strong problem-solving capabilities, architectural thinking, and the ability to mentor technical teams while managing high-impact incidents.
This is an ideal fit for experienced technical professionals seeking to leverage their expertise in a fast-paced, innovative environment while driving operational excellence across RBC's AI infrastructure.
Production Engineers at RBC Borealis build the foundational systems that enable every major innovation initiative within Lumina's portfolio. Working alongside industry-leading engineers within RBC's innovation hub, you'll contribute to code and systems that directly power breakthrough AI capabilities, advanced analytics platforms, and next-generation customer experiences.
What will you do?
Production Infrastructure Ownership
- Own and operate backend services that power Lumina's AI/ML platforms, real-time analytics engines, and experimental customer-facing applications
- Operate the infrastructure components that drive RBC Borealis' advances in artificial intelligence, machine learning, and data science, and provide tier 3 and tier 4 technical support for them
- Manage core services including large-scale data processing pipelines, model serving infrastructure, feature stores, and high-throughput API gateways
- Partner with innovation teams to ensure seamless integration of experimental technologies into RBC's enterprise architecture
Technical Leadership & Engineering Excellence
- Lead engineering initiatives by example, mentoring team members and driving technical excellence across Borealis innovation projects
- Write, review, and optimize code that operates at enterprise scale within fast-moving innovation cycles
- Develop comprehensive documentation, capacity planning models, and operational runbooks for rapidly evolving systems
- Debug complex production issues live on cutting-edge AI/ML infrastructure and experimental platforms
Operational Excellence & Incident Response
- Participate in on-call rotations and serve as an escalation contact for critical service incidents affecting Lumina's innovation platforms
- Drive post-incident reviews and implement systematic improvements to prevent recurrence in rapidly evolving systems
- Partner with stakeholder teams to establish and maintain service level objectives (SLOs) for experimental and production AI workloads
- Champion proactive monitoring, alerting, and automated remediation strategies for novel technology stacks
Cross-Functional Collaboration
- Work closely with data scientists, ML engineers, and product teams to ensure reliable deployment and operation of AI/ML models and features
- Collaborate with enterprise security, compliance, and risk teams to maintain compliance with regulatory requirements while enabling rapid innovation
- Partner with Borealis platform teams to optimize infrastructure for AI workloads and experimentation velocity
- Interface with external technology partners and vendors supporting RBC's innovation initiatives
What do you need to succeed?
Must have:
- 7+ years of experience in production engineering, platform engineering, or similar roles supporting large-scale distributed systems or AI/ML platforms
- Proficiency in Python, Go, or another high-level programming language, with demonstrated experience building production services that support data-intensive workloads
- Experience with container orchestration platforms (Kubernetes, OpenShift) and cloud-native architectures, particularly for AI/ML workloads
- Deep understanding of both analytical and transactional data stores from a data engineering platform perspective (e.g., PostgreSQL, MongoDB, Elasticsearch, Kafka, Redis, vector databases)
- Experience with CI/CD pipelines, infrastructure-as-code (Terraform, Ansible), and GitOps workflows supporting rapid iteration cycles
- Proven track record of operating mission-critical systems with high availability requirements in dynamic, fast-paced environments
- Experience with modern observability tools (Prometheus, Grafana, ELK stack) and distributed tracing for complex data pipelines
- Solid understanding of networking fundamentals, load balancing, and content delivery networks supporting high-throughput applications
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
Preferred Qualifications:
- Experience supporting AI/ML infrastructure, model serving platforms, or data science workloads in production
- Knowledge of enterprise governance, compliance frameworks, and regulatory requirements in innovation contexts