Senior Site Reliability Engineer

ClickUp
Remote
Career-pivot friendly

Why this role

Pace
Fast Paced
The pace is fast: the role involves managing capacity and performance so that infrastructure can scale quickly and efficiently.
Collaboration
High
Collaboration is key, as the job requires participating in the team's follow-the-sun model and improving the incident management process across the engineering organization, highlighting the need for teamwork.
Autonomy
High
Engineers are expected to own and drive improvements independently, such as defining SLOs and SLIs and building software solutions for reliability, showcasing a high level of autonomy.
Decision Impact
Team
Decisions made in this role can significantly impact the company's infrastructure and user experience, as seen in the responsibility to manage capacity and performance, and to automate critical portions of ClickUp's engineering processes.
Role Level
Team Lead
The complexity of the role is evident in the need to work with a wide range of technologies and systems, from cloud environments to databases and observability tools, requiring a deep understanding of multiple technical areas.
Career Pivot Friendly
Welcomes transferable skills
Individuals with a background in software engineering or operations, particularly those who have experience in cloud environments and infrastructure management, would find this role a natural progression, as the skills are highly transferable.

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

  • Improving the stability, availability, and reliability of ClickUp's globally distributed and cloud-based infrastructure
  • Defining SLOs and SLIs for all services and introducing error budgeting
  • Building software solutions to enable reliability and operability of large scale distributed systems
  • Automating critical portions of ClickUp engineering processes to minimize risk and maximize speed of innovation
Typical background
  • Strong software engineering background with a focus on operational, infrastructural, or SRE mentality
  • Experience in major cloud environments with CI/CD deployments, managed services, and infrastructure-as-code systems
  • Experience managing production-grade infrastructure with IaC tools or configuration management tools

Transferable backgrounds

  • Coming from Software Engineer at a tech startup
    software engineering · cloud experience
    Experience in software engineering and cloud environments directly aligns with the need to design and build systems for platform and infrastructure layers at ClickUp.
  • Coming from DevOps Engineer at a financial services firm
    infrastructure management · observability
    A background in managing production-grade infrastructure and setting up monitoring and alerting tools would be highly beneficial for improving ClickUp's observability and reliability.

Skills & requirements

Required

Software Engineering · Cloud Experience · Infrastructure Management · Operating Systems · Compute · Database · Observability

Preferred

CloudFormation/CDK · ECS · ElasticBeanstalk · PostgreSQL · DynamoDB · AuroraDB · TypeScript · JavaScript

Stack & domain

Software Engineering · Cloud Experience · Infrastructure Management · Operating Systems · Compute · Database · Observability · Site Reliability Engineering · Cloud · Infrastructure

About the role

As a Senior Site Reliability Engineer at ClickUp, you'll focus on enhancing the stability and reliability of ClickUp's global infrastructure, working closely with a dynamic team to solve complex challenges and streamline operations, making this role ideal for a proactive and detail-oriented engineer.

Original posting from ClickUp via Ashby

At ClickUp, we’re not just building software. We’re architecting the future of work! In a world overwhelmed by work sprawl, we saw a better way. That’s why we created the first truly converged AI workspace, unifying tasks, docs, chat, calendar, and enterprise search, all supercharged by context-driven AI, empowering millions of teams to break free from silos, reclaim their time, and unlock new levels of productivity. At ClickUp, you’ll have the opportunity to learn, use, and pioneer AI in ways that shape not only our product, but the future of work itself. Join us and be part of a bold, innovative team that’s redefining what’s possible! 🚀

We are looking for driven and innovative software engineers with a strong site reliability engineering (SRE) discipline, or an interest in this area, to help us make ClickUp the "one app to rule them all". As an SRE at ClickUp, your primary role will be improving the stability, availability and reliability of the globally distributed, cloud-based infrastructure that powers our app for thousands of users daily. If you are a rockstar engineer with an entrepreneurial, fast-paced mindset who is ready to own, drive and tackle some of the most complex problems out there, we would love to hear from you!

What you'll do:

  • Build a deep understanding of how ClickUp's systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
  • Own, drive and improve the incident management process across the engineering org and participate in the team's follow-the-sun model
  • Define SLOs and SLIs for all of our services and introduce error budgeting
  • Own and improve our observability on all of our services
  • Build software solutions to enable reliability and operability of large scale distributed systems handling petabytes of data and serving
  • Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across ClickUp Engineering
  • Automate critical portions of ClickUp engineering processes, to minimize risk and maximize the speed of innovation
  • Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world

What we’re looking for:

  • Software engineering: At the very core, we are looking for strong software engineers with an operational, infrastructural or SRE mentality who can design and build systems for platform and infrastructure layers
  • Cloud experience: Production experience in a major cloud environment: running CI/CD deployments, using managed services, bootstrapping and provisioning services via infrastructure-as-code (IaC) systems, and handling automation and operations
  • Infrastructure Management: You have worked with and managed production grade infrastructure with IaC tools or configuration management tools
  • Operating systems: Strong knowledge of *nix based operating systems, their internals and advanced troubleshooting commands
  • Compute: Experience of working with VMs, containers and container orchestration systems
  • Database: Experience working with RDBMS and NoSQL storage solutions in a production capacity, and you know your way around running and inspecting queries. A good understanding of indexing, locking, replication and sharding is a bonus!
  • Observability: You have worked with logging, monitoring and alerting tools before and you know how logs are collected, aggregated and ingested. You have set up monitors and alerts for production services and know your way around concepts such as SLOs and SLIs
  • Bonus points: We believe strong engineers can pick up any technology or tool fast and hit the ground running. Therefore, we avoid listing specific technologies. However, if you have worked with at least one of the technologies in our stack, that would definitely be a bonus point.
  • CloudFormation/CDK, ECS, ElasticBeanstalk
  • PostgreSQL, DynamoDB, AuroraDB
  • TypeScript or any JavaScript-based framework

Unsure if you meet all the qualifications of this job description but are deeply excited about the role? We hire based on ambition, grit, and a passion for improving the way people work. If you think ClickUp is the company for you, we encourage you to apply!

At ClickUp, we assess every candidate based on the potential impact they can have. We hire the best people for the job and support each person’s journey to build their boldest career.

Equal Opportunity Employer

ClickUp is an Equal Opportunity Employer, and qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.

Privacy Notice

ClickUp collects and processes personal data in accordance with applicable data protection laws. You can find further details by viewing our Global Candidate Privacy Notice https://doc.clickup-stg.com/333/p/h/ad-7887965/1013ea75ac66a01.

If you are a Philippine Job Applicant, please also see our Philippine Data Privacy Notice https://t333.s.clickup-attachments-stg.com/t333/d9a53ce2-6d2c-48cd-84b1-ed4e525f9613/Philippine%20Data%20Privacy%20Notice_2024.pdf?view=open for further details.

Visa Sponsorship

Please note we are unable to sponsor or take over sponsorship of an employment visa for roles outside of engineering and product at this time. Sponsorship for engineering and product roles is not guaranteed, but is instead based on the business needs for that specific role at that time. Please reach out to the recruiter with any questions.

Fraud Alert

ClickUp Talent Acquisition will only initiate contact via an @clickup.com email or through our official careers portal on clickup.com. We will never request fees, payments, or sensitive personal information. Please disregard any offers received outside these channels and report them to support@clickup.com.

AI Processing Notice

ClickUp may use artificial intelligence and machine learning technologies to help review and screen candidates' employment applications against role-related criteria. These tools support, but do not replace, human decision‑making. If you have questions or need an accommodation in the recruitment process, please contact us at AskPeople@ClickUp.com.

Source: ClickUp careers (Ashby)
