Security Monitoring Analyst
Purpose of the Role
The role staffs the Network Operations Centre on a rotating shift pattern to deliver continuous service monitoring of availability, performance, capacity, and security signals across Active Directory, Entra ID, Microsoft 365, SharePoint, Power Platform, Microsoft Fabric, and Azure — for the services that require 24/7 coverage as defined in the technical scope.
The post-holder triages incoming alerts, performs first-pass diagnostics, executes documented runbooks for known incident patterns, escalates to the relevant L2/L3 specialist within agreed timelines, opens communication bridges for P1 events, and ensures customer stakeholders are kept informed during major incidents. The role is central to the SLA: the speed of triage and escalation on shift determines whether the contractual 1-hour P1 response target is met.
Requirements
Key Technical Responsibilities
Continuous Monitoring and Alert Triage
- Operate the monitoring console stack — Microsoft Sentinel, Azure Monitor, Microsoft Defender for Cloud, Microsoft 365 Admin Center service health, Defender XDR alerts, Log Analytics workbooks, and the integrated ITSM ticketing platform — for the duration of every shift.
- Monitor availability and performance of Active Directory domain controllers, DNS / DHCP / time service, ADFS, AAD Connect sync health, Entra ID sign-in service health, Exchange Online, SharePoint Online, Teams, OneDrive, Power Platform environments, Microsoft Fabric capacity, Azure VMs, storage, networking, and PaaS services.
- Triage incoming alerts within 5 minutes of generation, applying the documented severity matrix; classify alerts as actionable, suppressible, or false-positive, and record the rationale in the ticketing platform.
- Correlate alerts across multiple sources (Sentinel, Defender, Azure Monitor, M365 service health) to identify the underlying incident rather than reacting to individual symptoms.
- Acknowledge alerts and update tickets at the agreed cadence (every 60 minutes during P1; every 4 hours during P2) until handover or closure.
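As an illustration of the correlation step described above, the following minimal Python sketch groups alerts that refer to the same entity and arrive close together in time, so that one incident is raised instead of several symptom tickets. The Alert fields, the correlate function, and the 15-minute window are assumptions made for this example; they are not part of Sentinel, Defender, or the documented severity matrix.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    source: str          # e.g. "Sentinel", "Defender", "Azure Monitor"
    entity: str          # hypothetical correlation key: user, host, or service
    raised_at: datetime


def correlate(alerts, window=timedelta(minutes=15)):
    """Group alerts that share an entity and follow each other within `window`.

    The 15-minute default is an assumed value for illustration only.
    """
    by_entity = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.raised_at):
        by_entity[alert.entity].append(alert)

    groups = []
    for entity_alerts in by_entity.values():
        current = [entity_alerts[0]]
        for alert in entity_alerts[1:]:
            if alert.raised_at - current[-1].raised_at <= window:
                current.append(alert)      # same candidate incident
            else:
                groups.append(current)     # gap too long: start a new group
                current = [alert]
        groups.append(current)
    return groups
```

Each returned group is a candidate incident. A group containing alerts from more than one source is a stronger hint that the alerts share an underlying cause, but the analyst still confirms this before linking tickets.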
Incident Response and Runbook Execution
- Execute Tier-1 incident response runbooks for known and documented patterns: Conditional Access misconfiguration rollback, AAD Connect sync failure restart, expired application secret rotation, Defender alert containment, mailbox / Teams reset operations, SharePoint sharing-link restoration, and Power Platform environment health checks.
- Initiate the major incident process for any P1 incident: page the duty L2/L3 specialist, open the Microsoft Teams incident bridge, notify the Service Delivery Manager and customer stakeholders per the agreed comms plan, and assume scribe duties on the bridge call.
- Maintain accurate incident timelines in the ticketing platform, recording every action, status check, and communication with a timestamp and operator initials, so that the record is suitable for post-incident review and audit.
- Execute documented automated containment playbooks (Sentinel Logic Apps) for high-confidence security events: disable risky users, force password reset, isolate device in Defender for Endpoint, block sender in Exchange Online.
- Hand over open incidents at shift change using the structured handover template (active incidents, watch-items, scheduled changes, planned maintenance, expected escalations).
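The timeline and handover requirements above can be pictured as two small records. This is a sketch only: the class and field names are hypothetical, and the real templates live in the ticketing platform. The five handover sections are the ones listed in the handover bullet.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    timestamp: datetime
    operator_initials: str
    kind: str            # "action", "status_check", or "communication"
    note: str


@dataclass
class ShiftHandover:
    active_incidents: list[str] = field(default_factory=list)
    watch_items: list[str] = field(default_factory=list)
    scheduled_changes: list[str] = field(default_factory=list)
    planned_maintenance: list[str] = field(default_factory=list)
    expected_escalations: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the handover as plain text, one block per section."""
        sections = [
            ("Active incidents", self.active_incidents),
            ("Watch-items", self.watch_items),
            ("Scheduled changes", self.scheduled_changes),
            ("Planned maintenance", self.planned_maintenance),
            ("Expected escalations", self.expected_escalations),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            # Empty sections are written out explicitly so nothing is silently skipped.
            lines.extend(f"  - {item}" for item in (items or ["none"]))
        return "\n".join(lines)
```

Writing "none" for an empty section makes it visible that the outgoing analyst checked that section, which is the point of a structured handover.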
Service Request Fulfilment During Out-of-Hours Windows
- Fulfil pre-approved standard service requests during out-of-hours windows where authorised — for example licence assignment for emergency onboarding, Teams meeting policy adjustments for live events, or pre-approved Conditional Access exclusions — strictly within the documented standing change envelope.
Monitoring Hygiene and Improvement
- Participate in alert tuning to reduce the false-positive rate and alert fatigue: review noisy rules weekly, propose threshold or filter changes through change control, and validate the result after each change.
- Maintain monitoring runbook accuracy: every time a runbook is executed, capture any deviations and feed them back to the engineering team for runbook updates.
- Contribute weekly to the Service Delivery Manager's service review with a shift-summary report (alerts handled, incidents raised, false-positive trends, runbook gaps).
Communication and Stakeholder Management
- Provide clear, factual, non-speculative communication during incidents in line with the proposed SLA Communication Plan — initial notification within 15 minutes of P1 declaration, updates at 60-minute intervals, and a wrap-up notification within 1 hour of resolution.
- Maintain the operational status page / Teams channel for customer stakeholders during major incidents.
- Comply strictly with EEA-only data processing requirements: no customer data is to leave the EEA boundary at any point during incident handling, and no screenshots / logs are to be transmitted via non-approved channels.
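To make the P1 communication timings above concrete, the sketch below derives the due times for a single incident. It assumes that the 60-minute updates are counted from the time of P1 declaration; the Communication Plan may define the reference point differently. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

INITIAL_NOTICE = timedelta(minutes=15)   # initial notification after P1 declaration
UPDATE_INTERVAL = timedelta(minutes=60)  # interval between stakeholder updates
WRAP_UP = timedelta(hours=1)             # wrap-up notification after resolution


def p1_comms_schedule(declared_at: datetime, resolved_at: datetime):
    """Return (label, due_by) pairs for one P1 incident."""
    schedule = [("initial notification", declared_at + INITIAL_NOTICE)]
    # Assumption: updates are counted from declaration, not from the first notice.
    due = declared_at + UPDATE_INTERVAL
    while due < resolved_at:
        schedule.append(("update", due))
        due += UPDATE_INTERVAL
    schedule.append(("wrap-up notification", resolved_at + WRAP_UP))
    return schedule
```

For an incident declared at 12:00 and resolved at 14:30, this gives an initial notification due by 12:15, updates due at 13:00 and 14:00, and a wrap-up notification due by 15:30.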
Mandatory Technical Skills
- Hands-on experience operating Microsoft Sentinel and Azure Monitor in a production