Mattermost Handbook
Need help?How to spend company moneyHow to update the HandbookRelease overview
0.2.1
0.2.1
  • Mattermost Handbook
  • Company
    • About Mattermost
      • List of terms
      • Business model
      • Mindsets
    • "How to" guides for staff
      • How to set up a 1-1 channel
      • How to update the handbook
      • How to manage Handbook notifications
      • How to change mobile device
        • How to handle a lost mobile device
      • How to do a mini-retrospective
      • How to autolink keywords in Mattermost
  • Operations
    • Company operations
      • Areas of Responsibility
      • Mattermost Leadership Team (MLT)
        • MLT cadence
      • Company measures
        • Metrics definitions
        • FY23 goals board
        • MLT metrics
      • Company cadence
      • Company policies
        • Community response policy
        • Security policy
      • Company processes
        • Issue/solution process
        • Company agreements
        • Publishing
          • Public web properties
          • Publishing guidelines
            • Brand and visual design guidelines
            • Voice, tone, and writing style guidelines
              • Contribute to documentation
            • Confidentiality guidelines
          • Post-publication quality control process
      • Handbook processes and policies
        • Handbook onboarding
      • Fiscal year planning
    • Research and Development
      • Organization
        • Tech Writing
        • Data engineering
        • Delivery
        • Cloud Platform
        • Site Reliability Engineering
        • GRC
        • Product Security
        • Security Operations
      • Processes
        • Feature Labels
      • Product
        • Product planning
          • Product philosophy and principles
          • Prioritization process
          • Release planning process
          • Roadmap views
          • Release plan
          • Launch plan
          • Feature requests
        • Development process
          • Mobile feature guidelines
          • Deprecation policy
          • Mattermost software requirements process
          • Jira ticket lifecycle
          • Creating new Jira bug tickets
            • Priority levels for tickets
            • Jira fix versions
        • Release process
          • Release overview
          • Feature release process
          • Dot release process
          • Security release process
          • Mobile app release process
          • Desktop app release process
          • Release tips
          • Release scorecard definitions
        • How-to guides for Product
          • How to use productboard
          • How to record a roadmap video
          • How to update integrations directory
          • How to write a feature release announcement
        • Product Management team handbook
          • Product Management Areas of Ownership
          • Product Manager onboarding
          • Product Manager levels
          • Professional development
        • Product Design team handbook
          • Product Design levels
        • Technical Writing team handbook
          • Work with us
          • User interface text guidelines
          • Documentation style guide
          • Our terminology
          • Guidelines for PMs and developers
          • Guidelines for community contributions
          • Technical Writer levels
          • Docathon 2021
            • Getting started with contributing
        • Growth
          • A/B testing methodology
          • PQL definition
        • Analytics
          • Product Analyst Engineer levels
          • Looker
            • Dashboards
            • Explores
          • Telemetry
        • Developer relations
        • Product team hangouts
      • Engineering
        • Infrastructure engineering
          • Cloud infrastructure cost KPIs
          • Cloud data export process
          • Cloud churn process
          • Reliability Manifesto
          • Production Readiness Review
          • Infrastructure Library
        • Integrations team processes
        • Plugin release process
        • Data Engineering
        • Sustained Engineering
          • On call
        • How to go to a conference
        • Public speaking
        • Core contributor expanded access policy
      • Quality Assurance
        • QA workflow
        • QA testing tips and tools
        • Rainforest process
    • Messaging and Math
      • How-to guides for M&M
        • How to create release announcements
        • How to create screenshots and GIFs
        • How to write Mattermost case studies
        • How to write guest blog posts for Mattermost apps and services
        • How to write Mattermost recipes
        • How to compose tweets
        • How to create a split test for web page
        • How to run meetups
        • How to run executive dinners
      • Checklists for M&M
        • Blog post checklist
        • Bio checklist
      • Mattermost websites
      • Demand generation reporting
      • M&M Asana guidelines
      • Content marketing
        • How to use the editorial calendar
        • Content development and distribution
        • Video content guidelines
        • How to contribute content
    • Sales
      • Deal Desk
      • Partner programs
      • Lead management
    • Deployment Engineering
      • Overview
      • Workflows
      • Frequently Asked Questions
      • Playbook for MME Sev 1 Outages
      • Status Update Template
    • Program Management
    • Customer Success
      • Customer Support
    • Legal
      • Contracts
      • Ironclad Basics
        • Company-Wide Workflows
        • Sales Contracts and Workflows
        • Signing a Contract and Contract Repository
    • Finance
      • Budget
      • How to use Airbase
        • Access Airbase
        • Navigate Airbase
        • How to submit a purchase request
        • How to submit a reimbursement request
        • How to review a reimbursement request
        • Vendor portal guide
        • Frequently asked questions
      • Onboarding
        • Vendor onboarding
        • ROW staff onboarding
      • Staff member expenses
        • How to spend company money
        • How to spend company money: Internships
        • Corporate credit card policy
        • How to access Airbase
        • Gifting policy
        • How to book airfare and travel
        • How to reimburse the company
        • How to convert currencies
        • How to get paid
      • Arrange a Bounty Program
      • Naming files and agreements
      • Risk management
        • Mattermost U.S. consulting agreements
      • Operations playbook
    • Security
      • Policies
      • Privacy
        • Data deletion requests
        • Data subject access requests
      • Product Security
        • Product Vulnerability Process
        • Working on security-sensitive pull requests
        • Secure Software Development guide
      • Security Operations
        • User guides
    • Workplace
      • PeopleOps
        • HR cadences
        • HR systems
        • HR Processes
        • Working at Mattermost
          • Onboarding
            • Things everyone must know
            • Staff onboarding
            • Engineer onboarding timeline and expectations
            • Manager onboarding
            • Frequently asked questions
          • Learning and development
          • Mattermost communication best practices
          • Paid time off
            • Out of office email example
          • Travel
            • Business travel insurance
          • Leaves of absence
            • Pregnancy leave
            • Baby bonding parental leave
            • Jury duty
          • Workplace program
          • Relocation
          • Total rewards
        • Performance reviews
          • Formal review process
          • New staff performance review
          • Informal review process
        • Transfers and promotions
        • Offboarding instructions for managers
        • People compliance
      • People policies
      • Groups
        • Staff Resource Groups
      • Approvals and iteration
      • IT
        • IT helpdesk
        • Hardware and software purchases
        • Hardware buy back policy
        • Software systems
  • Contributors
    • Contributors
      • Equity, diversity, and inclusion
      • How to contribute to Mattermost
        • Community Content program
        • Documentation contributions
        • Help Wanted tickets
        • Localization
        • Contribution events
      • Mattermost community
      • Contributor kindness
      • Community systems
      • Guidelines and playbooks
        • Social engagement guidelines
        • Contribution guidelines and code of conduct
        • Mattermost Community playbook
        • How to run a Hackathon
        • Hacktoberfest event organizer guide for Mattermost
    • MatterCon
      • Staff information privacy management
      • Mattermost events code of conduct
      • MatterCon2021
    • Join us
      • Ice-breakers
      • Help Wanted tickets
      • Localization
      • Mattermost GitHub sponsorship
      • Things candidates should know
      • Staff recruiting
      • Recruiting cadences
        • Product Manager hiring process
      • Exec recruiting
        • EA logistics
  • Help and support
    • Contact us
Powered by GitBook
On this page
  • Sample status update
  • Enterprise ABC SEV1 outage - 7:20pmET update

Was this helpful?

Edit on Git
Export as PDF
  1. Operations
  2. Deployment Engineering

Status Update Template

Below is a template for sharing status updates for MME Sev 1 Outages:

### <customer> SEV1 outage - <time> update

#### Status: <status>

#### Incident Time: <number of hours since incident began>

<summary of incident>

#### Customer Impact

<summary of customer impact>

#### Technical Notes

<summary of technical notes>

Sample status update

Enterprise ABC SEV1 outage - 7:20pmET update

Status: Identified

Incident time: 9 hours

We've identified the likely root cause. PR is in progress, with the goal to merge tonight, followed by a 7.8.6 dot release for ABC by tomorrow morning.

WebEx bridge with ABC has concluded. We will reconvene at 8am ET.

Huge thanks to @neill.collie for being on the bridge call for 9 hours. Additionally big thank you's go to Ben Cooke for identifying the root cause, plus Doug, Joram, Agniva, Chris Speller, Claudio, Ben Schumacher who all pitched in thus far, and Amy + Delivery team for being on standby for a patch release.

Next update will be shared at 8amET.

Customer impact

While the outage is ongoing, mood with ABC team felt much lighter after the root cause was identified. The call ended in a good note.

ABC is preparing comms for an emergency upgrade patch. Last time in September approvals were bypassed with Alice Evan's signoff (SVP - Senior Technical Manager at ABC). Alice was on call and was ready to sign off on the approvals.

Technical notes

The cause appears to be a recent change that results in a long-running bot query hitting the database directly more often instead of a cache. Delivery team has been notified for a patch release in v7.8.6, which ABC would first test in dev/uat, then roll to prod if it's deemed to solve the root cause.

On the API call to get the config for the client, we're adding back a cache around a query to count the total users. This API is called on every page load. The query was used in this case to determine if there are any existing users on the system, which is needed for some client logic. The cache was originally removed due to a bug that mis-reported the existence of users. We're adding it back in a different way that fixes the bug and reduces the query to happen only once.

PR is in progress, with the aim to merge the change tonight, followed by a 7.8.6 dot release.

The dot release will be shared by Neill over intralink, which ABC will push to dev/uat. ABC will smoke test, plus check the Grafana query sum(rate(mattermost_db_store_time_count{instance=~"$server",method="UserStore.Count"}[5m])) to see if there's a noticeable difference.

Afterwards, the patch will be pushed to prod.

Latest thread with full details: https://community.mattermost.com/private-core/pl/fp1ptysz5tbz5bogre8fzbr7za

PreviousPlaybook for MME Sev 1 OutagesNextProgram Management

Last updated 1 year ago

Was this helpful?