Mattermost Handbook
A/B testing methodology


Last updated 3 years ago


This document provides an overview of Mattermost’s A/B testing methodology: a sequence of steps and considerations for designing valid A/B tests.

A/B Testing Methodology

Defining the Goal/Objective

Before you can determine the success criteria (i.e., the metric that measures success), you must clearly articulate the goal/objective of your A/B test. This goal/objective is a statement about the behavior you're trying to influence, the direction of that influence, and the method for achieving that influence.

An example of a goal/objective could be:

  1. Objective: Increase early user retention and engagement.

    • Method: Alter the location and design of the invite member feature.

  2. Objective: Increase Cloud free-to-paid conversions.

    • Method: Alter design of the purchasing process within the Billing & Subscription section of the Admin Console.

  3. Objective: Increase conversions of Mattermost Cloud Trial Sign Ups

    • Method: Alter paid ad messaging, visuals, and/or targeting.

All of the above goals/objectives have a clear metric that can be measured to compare the performance of the various test and control groups. It is important when conceptualizing an A/B test to define a goal/objective that produces a measurable outcome.

Success Criteria

First you must decide what the success criteria of your A/B test will be. The following questions must be answered in order to formulate a proper A/B testing hypothesis:

  1. What is the metric or measure you are trying to influence? (i.e. What action or outcome are you trying to influence?)

    • This must be a concise definition that prevents the metric from changing over time.

    • A/B tests tend to work best with proportions, i.e., the percentage of users who take an action, such as a conversion rate.

  2. Which direction are you trying to influence the measure (increase, decrease or no directional preference)?

  3. Can the measure be compared to historical benchmarks?

    • Is it a static measure (one that will not change as more time passes)?

    • Historical benchmarks simplify sample size requirement calculations and provide us with a better understanding of the time required to run a test (i.e. the amount of time required to collect enough data/reach a large enough sample).

  4. Once this has been established, you will need to establish the MDE (Minimum Detectable Effect Size). You can learn more about the MDE statistic in the Minimum Detectable Effect Size (MDE) section below.

The success criteria of an A/B test are based on one to two clearly defined measures, and how an intervention hopes to influence them, i.e., increase, decrease, or change (regardless of direction). Clearly defined measures ensure the ability to quantify the impact of interventions, thereby allowing you to measure the success or failure of a test.

Examples of valid success criteria metrics/measures:

  1. First 14-Day Invite Member Conversion Rate: Proportion of total users who invite new members within the first 14 days of first activity

    • I.e., the percentage of total users that trigger an invite member event within 14 days of the user’s first active timestamp.

  2. Cloud Workspace Creation Conversion Rate: Proportion of unique visitors to the Cloud Trial Signup Page that successfully create a cloud workspace
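As a sketch of how the first metric might be computed, the following Python snippet counts the proportion of users whose first invite event falls within 14 days of their first activity. The user records and field names here are hypothetical illustrations, not the actual telemetry schema:

```python
from datetime import datetime, timedelta

# Hypothetical user records: each has a first-active timestamp and the
# timestamp of the first invite-member event (None if the user never invited).
users = [
    {"first_active": datetime(2024, 1, 1), "first_invite": datetime(2024, 1, 5)},
    {"first_active": datetime(2024, 1, 2), "first_invite": None},
    {"first_active": datetime(2024, 1, 3), "first_invite": datetime(2024, 1, 20)},
]

window = timedelta(days=14)
converted = sum(
    1
    for u in users
    if u["first_invite"] is not None and u["first_invite"] - u["first_active"] <= window
)
conversion_rate = converted / len(users)  # proportion converting within 14 days
```

A concise, frozen definition like this (first invite event, 14-day window anchored to first activity) is what keeps the metric from drifting over the life of the test.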

Formulating Hypotheses

Examples of valid null and alternative hypotheses:

  • H₀ (Null Hypothesis): Adding an invite members icon to the banner will have no effect on the First 14-Day Invite Member Conversion Rate.

    • 𝞱 = 𝞱0

  • Ha (One-Sided Alternative Hypothesis): Adding an invite members icon to the banner will increase the First 14-Day Invite Member Conversion Rate.

    • 𝞱 > 𝞱0

  • Ha (Two-Sided Alternative Hypothesis): Adding an invite members icon to the banner will change (either direction) the First 14-Day Invite Member Conversion Rate.

    • 𝞱 ≠ 𝞱0

Minimum Detectable Effect Size (MDE)

The MDE is the minimum relative change between the test and control means needed to justify the change. A smaller MDE requires a larger sample size, i.e., the ability to confidently claim that a small change to the baseline in the test group is statistically significant. A relative MDE of 15%-20% is typical for proportions and conversion rates, but the MDE will vary based on the metric specified by the success criteria.

Example

The Product Team wants to increase the proportion of users that invite members within the first 14 days on the platform. They’re evaluating two potential UI changes (interventions) to effect this change:

  1. Test 1: Placing an icon at the top of the left-hand sidebar channel switcher.

  2. Test 2: Placing a banner button at the bottom of the left-hand sidebar channel switcher.

  3. Control: Hold the current UI constant.

The Product Team is targeting at least a 20% relative increase in the proportion of users that invite members within the first 14 days on the platform. This desired relative change in the proportion of users to “convert” will help govern our calculations for the minimum required sample size, and transitively the amount of time the A/B test will have to run in order to collect the minimum required sample size for each group (test 1, test 2, and the control group).
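Translating a relative MDE into the absolute effect size used by the sample size formulas is simple arithmetic. The 10% baseline below is a hypothetical placeholder, not an actual Mattermost benchmark:

```python
baseline = 0.10        # hypothetical historical 14-day invite conversion rate
relative_mde = 0.20    # the 20% relative increase targeted above

target = baseline * (1 + relative_mde)  # test group proportion to detect
absolute_mde = target - baseline        # absolute effect size (sigma)
```

With these placeholder numbers, a 20% relative lift on a 10% baseline means detecting a 2-percentage-point absolute change.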

Sample Size

The next step in formulating an A/B test involves determining the minimum required sample size to produce statistically significant results i.e. adequate confidence level and statistical power.

This step is key. It does not need to be done in advance, although doing so will help you better understand how long the test will need to run in order to achieve the desired statistical power (default = .8) and confidence level (default = .95).

The statistical power (1-𝜷), the confidence level (1-𝜶), the minimum detectable effect size (𝝈), the desired test outcome, as well as the historical measure benchmark (as a buffer for the control outcome) are all required to calculate the minimum required sample size. As the test progresses, the calculation for the minimum required sample size will change as you’re able to substitute hypothesized outcomes with actual results. The greater the change between groups, the smaller the sample size needed.

These variables and assumptions are listed below, as well as the formula for calculating the sample size for varying types of tests (1-tailed vs. 2-tailed).

Variables & Assumptions

n = Sample Size

  • This represents the sample size of each group. Ideally, the samples will be the same size for each test group and the control.

  • The sample size must be sufficiently large in order for the A/B test to have adequate statistical power (the typical statistical power for A/B tests is .8, which is 1 - β)

𝜶 = Alpha: Uncertainty Level (Probability of Committing Type I Error)

  • Default = .05

    • Meaning a 1/20 chance that there is no real effect, but you conclude there is one

𝜷 = Beta: (Probability of Committing Type II Error)

  • 1 - β = Statistical Power

𝞺1 = Proportion of group 1 to convert

𝞺2 = Proportion of group 2 to convert

𝝈 = |𝞺1 - 𝞺2|

  • Absolute difference between proportion of group 1 and group 2

Z = Z Score

  • Z Score dependent on the specified 𝜶 (alpha) & 𝜷 (beta) statistics

One-sided Test Sample Size Formula:

  • n = ( ( Z1-𝜶 √(𝞺1(1-𝞺1)) + Z1-𝜷 √(𝞺2(1-𝞺2)) ) / 𝝈 )^2

Two-sided Test Sample Size Formula:

  • n = ( ( Z1-𝜶/2 √(𝞺1(1-𝞺1)) + Z1-𝜷 √(𝞺2(1-𝞺2)) ) / 𝝈 )^2
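These formulas can be sketched in Python using only the standard library. The 10% and 12% proportions below are hypothetical illustrations (a 20% relative MDE on a 10% baseline), not Mattermost benchmarks:

```python
from math import ceil, sqrt
from statistics import NormalDist

def min_sample_size(p1, p2, alpha=0.05, power=0.80, two_sided=False):
    """Minimum sample size per group for comparing two proportions.

    p1: expected control proportion (e.g., a historical benchmark).
    p2: expected test proportion (control plus/minus the absolute MDE).
    """
    sigma = abs(p1 - p2)                   # absolute effect size
    a = alpha / 2 if two_sided else alpha  # alpha is halved for two-sided tests
    z_alpha = NormalDist().inv_cdf(1 - a)  # Z_(1-alpha) or Z_(1-alpha/2)
    z_beta = NormalDist().inv_cdf(power)   # Z_(1-beta)
    n = ((z_alpha * sqrt(p1 * (1 - p1)) + z_beta * sqrt(p2 * (1 - p2))) / sigma) ** 2
    return ceil(n)

# Hypothetical 10% baseline with a 20% relative MDE (target proportion 12%)
n_one_sided = min_sample_size(0.10, 0.12)
n_two_sided = min_sample_size(0.10, 0.12, two_sided=True)
```

Note how a smaller gap between 𝞺1 and 𝞺2 inflates n rapidly, which is why the MDE largely governs test duration.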

Design & Implementation

When designing an A/B test, it’s key to:

  1. Simplify the test changes as much as possible.

    • This ensures there are minimal confounding factors that could affect the outcome of a test.

      • I.e., do not make multiple changes in a single test group, or we will not be able to attribute a specific change as the factor affecting the test outcome.

      • For instance, don’t add a new icon to the UI while simultaneously adding emphasis to that icon, without testing for just the icon placement (with no emphasis) in another test group.

    • Minimize the number of test variations.

      • Running too many variations at a time can also cause the time required to achieve appropriate power and confidence to increase exponentially.

  2. Monitor data collection to ensure the sample sizes are (approximately) equally distributed.

    • If the sample sizes begin to diverge, additional calculations must be performed to determine the minimum required sample sizes for adequate confidence and statistical power. Significantly different sample sizes also require a pooled variance calculation to account for the imbalances.

  3. Close out the test once adequate confidence and statistical power have been reached.

    • This will allow us to expedite analysis of the test outcomes and implementation of the changes if the desired outcome is achieved.

Analyzing Results

  1. Calculate the mean value of the target metric for each test and control group.

  2. Calculate the variance and standard deviation for each test and control group.

    • For proportion-based target metrics the calculations are:

    • Variance

      • s1² = (𝞺1 * (1-𝞺1)) / n1

      • s2² = (𝞺2 * (1-𝞺2)) / n2

    • Standard deviation

      • s1 = √( (𝞺1 * (1-𝞺1)) / n1 )

      • s2 = √( (𝞺2 * (1-𝞺2)) / n2 )

  3. Calculate the p-value.

    • Ensure there is no overlap between the confidence intervals of the tests and the control.

  4. Reject or fail to reject the null hypothesis.
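The comparison step can be sketched as a pooled two-proportion z-test in Python. The invite counts below (360 and 300 conversions out of 2,000 users each) are hypothetical figures for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b, two_sided=True):
    """p-value for comparing two conversion proportions (test vs. control)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled proportion under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # pooled standard error
    z = (p_a - p_b) / se
    tail = 1 - NormalDist().cdf(abs(z))
    return 2 * tail if two_sided else tail

# Hypothetical outcome: test group 360/2000 invited, control group 300/2000
p_value = two_proportion_ztest(360, 2000, 300, 2000, two_sided=False)
reject_null = p_value < 0.05  # one-sided test at the default alpha
```

A p-value below alpha means the observed lift is unlikely under the null hypothesis, so the null is rejected; otherwise you fail to reject it.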

Once you’ve defined the target measure and established the success criteria, you must determine the null and alternative hypotheses of the test. The null hypothesis is always that the intervention will have no effect on the target measure, i.e., Test Outcome = Control Outcome. The alternative hypothesis determines the type of change we’re testing for. If the alternative hypothesis is that the intervention will affect the measure, regardless of direction (Test Outcome ≠ Control Outcome), then the test is two-sided. If the alternative hypothesis is that the intervention will increase (or decrease) the target measure (Test Outcome > or < Control Outcome), then the test is one-sided.

There are online A/B test sample size calculators that allow you to determine the minimum required sample size per test variation. These calculators let you quickly check the sample size required to achieve an adequate level of statistical power and confidence. This A/B Test Sample Size Calculator by AB Testguide provides the option to switch between one- and two-sided tests, as well as to specify industry-standard power and confidence levels. The duration of the test depends on the volume of users and instances being created over time: the fewer being created, the longer the test will take. Given Mattermost's current Cloud sign-up frequency, it would take at least 12-16 weeks for a single test on new end users to reach an industry-accepted level of statistical power and confidence. That requires a patience our fast-paced culture does not often permit, and is all the more reason to increase the breadth and depth of our marketing and business development initiatives.
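The duration arithmetic itself is straightforward. The per-group sample size and weekly sign-up volume below are hypothetical placeholders, not actual Mattermost figures:

```python
import math

n_per_group = 1500      # hypothetical minimum sample size per group
num_groups = 3          # e.g., test 1, test 2, and the control
weekly_signups = 300    # hypothetical new Cloud sign-ups per week

weeks_required = math.ceil(n_per_group * num_groups / weekly_signups)
```

With these placeholder numbers the test would need 15 weeks of sign-ups, which is consistent with the 12-16 week estimate above.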
