Senior Engineer (Production Support)

Scotiabank
Toronto; Ontario, CA; US
On-site

Job Description

Requisition ID: 253600

Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture.

About the Role

We are looking for a highly experienced Senior Java Developer with a strong background in production support for large-scale distributed systems. This role requires a deep understanding of Java internals, frameworks, application performance, and distributed architecture. Unlike typical production support roles, this position requires hands-on development skills to quickly assess and diagnose issues and to implement code fixes when necessary.

You will lead incident resolution, conduct root cause investigations, and collaborate with application teams to improve logging, monitoring, and overall system reliability.

Is this role right for you? In this role you will:

Advanced Production Support & Troubleshooting

  • Serve as the senior technical expert for high-priority Java application incidents in production.
  • Diagnose issues across large-scale distributed systems using logs, thread dumps, heap dumps, GC analysis, performance metrics, and dependency traces.
  • Quickly assess and implement code fixes, configuration changes, and performance optimizations to restore stability.
  • Work closely with QA and development teams to validate fixes and coordinate safe rollout.

Incident Management

  • Act as a technical lead during Sev1/Sev2 incidents, driving troubleshooting, communication, and timely decision-making.
  • Coordinate with application, infrastructure, database, and SRE teams to resolve complex cross-platform issues.
  • Provide clear, concise status updates to leadership and business stakeholders.

Root Cause & Problem Management

  • Lead end-to-end root cause analysis (RCA) for major incidents, identifying technical causes and long-term remediation steps.
  • Translate findings into engineering actions, code improvements, or operational enhancements.
  • Ensure preventive measures are implemented to avoid recurrence.

Monitoring, Logging & Observability

  • Enhance observability by improving application logs, error messages, metrics, and tracing.
  • Build and refine dashboards and alerts using:
      • Dynatrace (APM, Davis AI, Smartscape, synthetic monitoring)
      • Splunk (SPL queries, dashboards, correlation)
  • Reduce alert noise and improve signal quality across environments.

Development & Fix Implementation

  • Apply expert-level Java and Spring technical skills to identify, code, and deploy targeted fixes.
  • Collaborate with developers to improve resiliency, error handling, and performance.
  • Ensure all fixes follow best practices for high-scale distributed systems.

On-Call Support

  • Participate in an on-call rotation, including off-hours support for production issues.
  • Demonstrate strong ownership and a production-first mindset.

Do you have the skills that will enable you to succeed in this role? We’d love to work with you if you have:

  • 8-12+ years of Java development experience supporting large-scale enterprise applications.
  • Deep knowledge of:
      • Java (8/11/17), JVM internals, multithreading, memory management
      • Spring, Spring Boot, REST APIs, microservices
      • Distributed architecture and cloud-based systems
  • Proven experience diagnosing and fixing production issues in real time.
  • Strong hands-on skills with Dynatrace and Splunk.
  • Solid understanding of relational and non-relational databases and query optimization.
  • Experience with message queues (Kafka, MQ, RabbitMQ) and async event processing.
  • Familiarity with containerization (Docker, Kubernetes) and cloud (GCP/AWS).
  • Excellent communication skills, with the ability to lead incident bridges and engage with senior stakeholders.
  • Nice-to-Have Skills: Experience in SRE practices (SLIs/SLOs, error budgets).
  • Nice-to-Have Skills: Exposure to ITIL processes (Incident, Problem, Change).
  • Nice-to-Have Skills: Basic scripting for automation (Bash, Python).
  • Nice-to-Have Skills: Experience in financial services, telecom, or other mission-critical environments.
  • Rapid identification and resolution of production issues.
  • Durable fixes that prevent recurring incidents.
  • Improved monitoring, logging, and system resilience.
  • Clear communication and leadership during outages.
  • Strong partnership with development and SRE teams.

What's in it for you?

  • Diversity, Equity, Inclusion & Allyship - We strive to create an inclusive culture where every employee is empowered to reach their fullest potential, respected for who they are, and embraced through bias-free practices and inclusive values across Scotiabank. We embrace diversity and provide opportunities for all employees to learn, grow & participate through our various Employee Resource Groups (ERGs) that span diverse gender identities, ethnicity, race, age, ability & veterans.
  • Accessibility and Workplace Accommodations - We value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. Sco

Skills & Requirements

Technical Skills

Java, Spring, Spring Boot, REST APIs, Microservices, Distributed architecture, Cloud-based systems, Dynatrace, Splunk, Relational databases, Non-relational databases, Query optimization, Message queues, Containerization, Logging, Error messages, Metrics, Tracing, Dashboards, Alerts, Java development, Production support, Troubleshooting, Incident management, Root cause analysis, Problem management, Monitoring, Observability, Java internals, Application performance, Distributed systems, JVM internals, Multithreading, Memory management, Kafka, MQ, RabbitMQ, Async event processing, Docker, Kubernetes, Terraform, Prometheus, Grafana, ELK, Datadog, SRE practices, Compliance frameworks, Communication, Collaboration, Problem solving, Independent work, Leadership, Decision making, Time management, Stress management, Teamwork, Customer interaction

Soft Skills

Problem-solving, Communication, Teamwork

Domain Knowledge

Production support, Java development, Distributed systems

Employment Type

FULL TIME

Level

Senior

Posted

4/8/2026
