Mattermost Handbook

Playbook for MME Sev 1 Outages

Below is a codified playbook used to respond to MME Sev 1 Outages.

For the latest version, refer to the playbook in our Mattermost community instance: https://community.mattermost.com/playbooks/playbooks/9agdqr7jdtda7p4g8dxbppcibw

1 - Escalation

[ ] Create incident channel, run MME Sev 1 Playbook

  • Once an MME Sev 1 issue is escalated by the CSM, TAM, or CRE, create an incident channel and run the MME Sev 1 Playbook

[ ] Add CSM, TAM & DE to Incident Channel

  • Add CSM, TAM & DE leaders (@Brent Fox @Stu Doherty @Jason Blais) to the channel so they can add the appropriate staff members. Also add @Ian Tien to view Playbooks in motion for L2 and L1 incidents.

[ ] Start audio & screen share with customer

  • Include a Mattermost engineer and a customer DBA on the call who can run queries to support troubleshooting

[ ] Reply to customer (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication
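
The one-hour rule above can be expressed as a small check. The following Python sketch is illustrative only: the function name `ceo_loop_required` and the choice to measure from the outage start time are assumptions, not part of the playbook.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the playbook: an MME Sev 1 outage longer than 1 hour
# requires looping the CEO into customer communication.
CEO_LOOP_THRESHOLD = timedelta(hours=1)


def ceo_loop_required(outage_start: datetime, now: datetime) -> bool:
    """Return True once the outage has lasted strictly more than 1 hour.

    Measuring from `outage_start` is an assumption; the playbook does not
    say whether the clock starts at outage start or at escalation.
    """
    return (now - outage_start) > CEO_LOOP_THRESHOLD


# Example timestamps are made up for illustration.
start = datetime(2023, 1, 10, 14, 0, tzinfo=timezone.utc)
print(ceo_loop_required(start, start + timedelta(minutes=45)))  # False
print(ceo_loop_required(start, start + timedelta(minutes=61)))  # True
```

Using timezone-aware timestamps avoids off-by-hours errors when the customer and the responders are in different time zones.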

2 - Data gathering

[ ] Share system information

  • Includes relevant system configuration settings, database specs (with CPU, RAM) & application specs (with CPU, RAM)

[ ] Share Grafana screenshots

  • Include DB calls, API latency, Store latency, Top HTTP requests, Top API requests, CPU utilization, memory utilization

[ ] Share output from support bundle

  • Link to relevant docs

[ ] Share output from slow query logs

  • Link to relevant docs

[ ] Pin data to channel

  • Link to relevant docs
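
As a rough illustration of the data gathering checklist above, the sketch below tracks which artifacts have been shared and whether they have been pinned to the channel. The class `DataGatheringStatus` and its methods are hypothetical helpers, not part of Mattermost or the Playbooks product; only the item names come from the checklist.

```python
from dataclasses import dataclass, field

# Item names are taken from the data gathering checklist above.
DATA_GATHERING_ITEMS = (
    "system information",
    "Grafana screenshots",
    "support bundle output",
    "slow query log output",
)


@dataclass
class DataGatheringStatus:
    """Tracks shared artifacts and pinning (hypothetical helper)."""

    shared: set = field(default_factory=set)
    pinned: bool = False

    def mark_shared(self, item: str) -> None:
        # Reject anything that is not on the checklist.
        if item not in DATA_GATHERING_ITEMS:
            raise ValueError(f"unknown item: {item}")
        self.shared.add(item)

    def missing(self) -> list:
        # Preserve checklist order in the result.
        return [i for i in DATA_GATHERING_ITEMS if i not in self.shared]

    def ready_for_review(self) -> bool:
        # Data review can start once everything is shared and pinned.
        return not self.missing() and self.pinned


status = DataGatheringStatus()
status.mark_shared("system information")
status.mark_shared("Grafana screenshots")
print(status.missing())           # ['support bundle output', 'slow query log output']
print(status.ready_for_review())  # False
```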

3 - Data review

[ ] Review system configuration settings that may impact performance

  • Includes user typing timeout, user typing message, max notifications per channel & db replica lag settings

[ ] Review Grafana screenshots to identify potential issues

  • Includes XXX

[ ] Review support bundle output to identify potential issues

  • Includes XXX

[ ] Review slow query log output to identify potential issues

  • Includes XXX

[ ] Summarize findings from data review

  • Includes XXX

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication. Include a timeline for anticipated resolution.
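
For the configuration review step in this section, a short script can pull the four named settings out of a customer's configuration for side-by-side review. This is a sketch under assumptions: the `config.json` section and key names below are my best mapping of the settings named in the checklist and should be verified against the customer's Mattermost version, and the sample values are made up.

```python
# Assumed config.json locations for the settings named in the checklist.
# Verify these key names against the customer's Mattermost version.
SETTINGS_TO_REVIEW = {
    "user typing timeout": ("ServiceSettings", "TimeBetweenUserTypingUpdatesMilliseconds"),
    "user typing message": ("ServiceSettings", "EnableUserTypingMessages"),
    "max notifications per channel": ("TeamSettings", "MaxNotificationsPerChannel"),
    "db replica lag settings": ("SqlSettings", "ReplicaLagSettings"),
}


def extract_settings(config: dict) -> dict:
    """Return the settings under review, keyed by their checklist label."""
    result = {}
    for label, (section, key) in SETTINGS_TO_REVIEW.items():
        # Missing sections or keys are reported rather than raising.
        result[label] = config.get(section, {}).get(key, "<not set>")
    return result


# Illustrative config fragment; the values are invented for this example.
sample_config = {
    "ServiceSettings": {
        "EnableUserTypingMessages": True,
        "TimeBetweenUserTypingUpdatesMilliseconds": 5000,
    },
    "TeamSettings": {"MaxNotificationsPerChannel": 1000},
    "SqlSettings": {},
}

for label, value in extract_settings(sample_config).items():
    print(f"{label}: {value}")
```

In practice the dictionary would come from `json.load` on the configuration file shared during data gathering.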

4 - Code investigation

[ ] Based on findings from data review, identify areas of codebase with potential root cause

  • Includes XXX

[ ] Identify potential root cause based on the code

  • Includes XXX

[ ] Identify solution for root cause

  • Includes XXX

[ ] Submit PR for solution

  • Includes XXX

[ ] Determine whether verification of the fix is required for the release candidate

  • If yes, provide clear step-by-step instructions for QA to verify the fix, including specifications for the test server such as database type (MySQL vs. PostgreSQL)

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication. Include a timeline for anticipated resolution.

5 - Release preparation

[ ] Merge PR to master branch

  • Includes XXX

[ ] Cherry pick PR to dot release branch

  • Includes XXX

[ ] Cut dot release candidate

  • Includes XXX

[ ] Verify fix in dot release candidate

  • Includes XXX

[ ] Cut dot release

  • Includes XXX

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication. Include a timeline for anticipated resolution.

6 - Dot release deployment

[ ] Send dot release binary to customer

  • Includes XXX

[ ] Upgrade customer’s dev/staging environment with dot release

  • Includes XXX

[ ] Verify fix in customer’s dev/staging environment

  • Includes XXX

[ ] Upgrade customer’s production environment with dot release

  • Includes XXX

[ ] Verify fix in customer’s production environment

  • Includes XXX

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication. Include a timeline for anticipated resolution.

7 - Resolution

[ ] Monitor fix in customer environment for 24 hours

  • Includes XXX

[ ] Receive confirmation from customer about issue resolution

  • Includes XXX

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication

8 - Retrospective

[ ] Complete incident retrospective within 1 business day from resolution

  • Includes XXX

[ ] Draft incident summary analysis within 2 business days from resolution

  • Includes XXX

[ ] Send completed incident summary analysis to customer within 3 business days

  • Includes XXX

[ ] Reply to customer with update (CEO if MME Sev 1 > 1 hour)

  • An MME Sev 1 outage lasting more than 1 hour requires the CEO to be looped into customer communication
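
The retrospective deadlines above can be computed from the resolution date. The sketch below is a minimal illustration with stated assumptions: business days are taken to be Monday through Friday, public holidays are ignored, and the 3-business-day deadline is assumed to count from resolution like the other two. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Business days are assumed to be Monday through Friday; public
    holidays are ignored in this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def retrospective_deadlines(resolved_on: date) -> dict:
    """Map each retrospective step to its deadline."""
    return {
        "retrospective complete": add_business_days(resolved_on, 1),
        "summary analysis drafted": add_business_days(resolved_on, 2),
        "summary analysis sent to customer": add_business_days(resolved_on, 3),
    }


# Example: an incident resolved on Thursday, 12 January 2023.
for step, due in retrospective_deadlines(date(2023, 1, 12)).items():
    print(f"{step}: {due.isoformat()}")
# retrospective complete: 2023-01-13
# summary analysis drafted: 2023-01-16
# summary analysis sent to customer: 2023-01-17
```

Because the resolution in the example falls on a Thursday, the second and third deadlines skip the weekend.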

