San Francisco, CA, USA
Dec 3, 2021   |  By JJ Tang
Although every company can benefit from SREs, some need SREs more than others.
Nov 24, 2021   |  By Quentin Rousseau
Six tips on how Site Reliability Engineers (SREs) can prepare for the reliability challenges of Black Friday and Cyber Monday 2021
Nov 19, 2021   |  By JJ Tang
A history of Site Reliability Engineering from its origins at Google in 2003 to the present.
Nov 12, 2021   |  By Quentin Rousseau
Follow these steps to write a great resume for an SRE job.
Nov 5, 2021   |  By JJ Tang
An explanation of SLAs, SLOs, and SLIs, and how SREs should use each concept to manage reliability.
Oct 29, 2021   |  By Quentin Rousseau
SREs and SWEs complement each other, but they perform different tasks and focus on different priorities.
Oct 22, 2021   |  By JJ Tang
Learn about the key roles within an incident response team, as well as optional incident roles you may not have thought about.
Oct 15, 2021   |  By Quentin Rousseau
A comparison of EKS, AKS, GKE, Rancher and OpenShift from an SRE’s perspective.
Oct 8, 2021   |  By JJ Tang
Facebook’s October 2021 outage was the type of event that gives SREs nightmares: a series of critical business apps crashed within minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing an estimated $60 million. As incidents go, this was a big one.
Oct 1, 2021   |  By Quentin Rousseau
The four key takeaways for SREs from Google’s State of DevOps 2021 report

Rootly is a turnkey incident response command centre that brings the best reliability practices from Google, Netflix, and Amazon to teams without a million-dollar budget.

Rootly is an all-in-one platform that streamlines collaboration, communication, and learning. It automates away the manual toil engineers suffer through today and captures data-driven insights. With Rootly, companies resolve incidents faster and learn how to prevent them in the future.

Teams depend on Rootly to improve their reliability:

  • Collaborate: Seamlessly hand off alerts from PagerDuty and quickly declare incidents from your tool of choice, such as Slack. Automatically involve all the right teams in seconds, not minutes, and go beyond engineering by looping in legal, support, and sales. With intelligent workflows, there is no more wondering which team owns which service or who is responsible for what. Rootly does the heavy lifting for you.
  • Communicate: Build your incident timeline through the web or Slack. Automatically link war rooms with our Zoom and Google Meet integrations. Rich, customizable private and public status pages keep everyone updated while you focus on what you do best: fighting fires.
  • Remediate: Enrich your timeline with automated Genius workflows. Fetch relevant information, such as recent Git commits for your impacted services. Customize your workflows based on any incident condition.
  • Retrospective: Learn from incidents with beautiful postmortems that engineers want to write, without the manual toil of copying and pasting. Accurately replay past incidents to simulate real-world disaster scenarios, training engineers faster and keeping their skills sharp. Postmortems stay organized and easy to share, not buried in a Google Doc that can’t be found.

All-in-one incident response platform for humans.