Mattermost Handbook

Data Engineering

Last updated 4 years ago

Data Engineering Infrastructure

GitHub

  • Mattermost Data Warehouse

    • To access the repository, you must be a member of the Core Developers team.

Amazon EKS

Airflow

  • This is a screenshot of our actual Airflow installation and gives an example of the UI.

Kubernetes Secrets

Snowflake

Virtual warehouse

Stage and COPY

dbt (data build tool)

  • We use it to transform raw data in our Snowflake warehouse into more easily usable tables and views.

    • Then, you can define models that reference the sources.

    • Example:

      • The filename (account_daily_arr) of the model file determines the object name in the database (in this case a table).

Meltano permissions

  • We’re actually just using a small piece that allows us to control our Snowflake user and role permissions in a fine-grained way.

Data sources

Telemetry

  • Telemetry data is data that's sent from Mattermost servers and makes its way to our data warehouse.

    • The data is available in its raw form in the Raw database, in the mattermost2 and mattermost_nps schemas.

  • Active User Counts

Push notifications

Server release

License data

Stitch data

  • Google Analytics

    • Overview

      • Google Analytics - Stitch integration has a lot of caveats and limitations.

      • Known limitations:

        • Each set of dimensions and measures from Google Analytics needs to have its own Stitch integration.

        • Each integration creates a schema in Snowflake that matches the name of the integration and adds a table called report.

          • Name: GA ChannelGrouping Source Users Org

          • Schema: analytics.ga_channelgrouping_source_users_org

          • Table: analytics.ga_channelgrouping_source_users_org.report

        • Once an integration is created, it can't be edited. If you need to make changes, you need to delete the integration and start over.

        • Data is only pulled at a daily level.

          • This is an issue because Unique Monthly Users is not the same as Aggregated Unique Daily Users.

    • Mattermost.com

      • Owner: Kevin Fayle

      • Stitch integrations:

        • GA Mattermost Com Pages Visits

          • Frequency: 6 hours

          • Dimensions: Page Path, Page Title

          • Measures: Page Visits, Unique Page Visits, Avg Time on Page

    • Developers.Mattermost.com

      • Owner: Kevin Fayle

      • Stitch integrations:

        • GA Developers Pages Visits

          • Frequency: 6 hours

          • Dimensions: Page Path, Page Title

          • Measures: Page Visits, Unique Page Visits, Avg Time on Page
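The daily-granularity limitation listed under Google Analytics above can be illustrated with a small sketch. The visitor names and dates below are made up for illustration; the point is only that summing daily unique counts overstates the number of distinct users in a month, because a returning visitor is counted once per day.

```python
# Hypothetical visitors per day; "alice" returns on all three days.
daily_visitors = {
    "2021-03-01": {"alice", "bob"},
    "2021-03-02": {"alice", "carol"},
    "2021-03-03": {"alice"},
}

# What you get by aggregating a daily-level report: each day's uniques added up.
sum_of_daily_uniques = sum(len(visitors) for visitors in daily_visitors.values())

# True unique users for the period: distinct visitors across all days.
unique_for_period = len(set().union(*daily_visitors.values()))

print(sum_of_daily_uniques, unique_for_period)
```

A report pulled only at the daily level carries the per-day counts but not the visitor identities, so the second number cannot be recovered from it.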

Amazon EKS link: https://aws.amazon.com/eks/

Amazon EKS is a managed Kubernetes service that allows us to deploy, orchestrate, and run our code. The main benefit of Kubernetes is that you declaratively specify the resources you need and how much CPU and memory they require, and Kubernetes figures out how to make it work. It also attempts to restart containers that have failed. We make use of Dockerfiles for our images. We also use Bitnami’s Airflow helm chart.

To keep our data and configuration confidential, we make use of Kubernetes Secrets, which are shared in LastPass only with team members who need access.

To access Airflow, you must be on the VPN and then go to this link. The Airflow credentials are stored in the Shared-BizOps LastPass folder.

Apache Airflow is a workflow orchestration tool built in Python that allows you to build and schedule DAGs (directed acyclic graphs of tasks). With these DAGs, we can schedule jobs to run using crontab style scheduling and also declare dependencies between jobs, so that data we’re processing doesn’t get overwritten. Airflow also has utilities for retrying failed jobs and for alerting on job and DAG failures.
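To make the idea of declared dependencies concrete, here is a minimal sketch using only the Python standard library. It is not Airflow's API and not our DAG code; the job names and the cron expression are hypothetical. It shows how a set of "runs after" declarations determines a safe execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical crontab style schedule: every day at 06:00.
SCHEDULE = "0 6 * * *"

# Hypothetical jobs. Each key maps to the set of jobs that must finish first.
dependencies = {
    "load_raw_events": set(),
    "build_daily_model": {"load_raw_events"},
    "refresh_dashboard": {"build_daily_model"},
}


def run_order(deps):
    """Return one execution order that respects every declared dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(SCHEDULE, order)
```

Because each job only starts after the jobs it depends on, a downstream model never reads a table while an upstream load is still writing it.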

We take advantage of Mattermost incoming webhooks to send DAG failures to a special internal Mattermost channel called BizOps, where team members can triage the failure. A failed task callback, specified in each DAG’s configuration, ensures that these failures get sent to Mattermost.
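A failure notification of this kind might look roughly like the sketch below. It assumes only that a Mattermost incoming webhook accepts a JSON body with a `text` field; the function names, message wording, and arguments are hypothetical and are not taken from our callback. In Airflow, a function like this would typically be wired in through the `on_failure_callback` setting.

```python
import json
import urllib.request


def build_failure_message(dag_id, task_id, log_url):
    """Build the JSON body for a Mattermost incoming webhook (hypothetical wording)."""
    text = f"Task `{task_id}` in DAG `{dag_id}` failed. Logs: {log_url}"
    return {"text": text}


def send_to_mattermost(webhook_url, payload):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the message needs no network access; sending it does.
payload = build_failure_message("daily_load", "copy_events", "http://airflow.example/log")
print(payload["text"])
```

Keeping message construction separate from the HTTP call makes the wording easy to check without posting to a real channel.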

We also use Bitnami’s "Get DAG files from a git repository" feature to automatically pull from the master branch of our GitHub repository every 60 seconds, so our DAGs are always up to date.

We use Airflow’s new KubernetesPodOperator, which allows each of our jobs to run in its own Kubernetes Pod. Because each job is simply a Kubernetes Pod running a process, we can run any job in any language. The operator also isolates the compute and memory for each job, and we can customize how much of each a job gets, so a job that requires more power can be granted it.

To keep our connection strings and other configuration items confidential, we use Kubernetes Secrets and inject them as environment variables into our Kubernetes Pods. To inject a secret into the environment of a job run through an Airflow DAG, you must specify it in kube_secrets.py, then import it in the DAG file, and finally inject it into the Operator object itself.
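From the job's side, an injected secret is just an environment variable. The sketch below shows one way a job could read its configuration and fail early when something is missing. The variable names are hypothetical and this is not the contents of kube_secrets.py.

```python
import os

# Hypothetical variable names; the real ones are defined alongside our secrets.
REQUIRED_SETTINGS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


def load_settings(environ=os.environ):
    """Read required settings from the environment, failing loudly if any are absent."""
    missing = [name for name in REQUIRED_SETTINGS if name not in environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"SNOWFLAKE_USER": "loader", "SNOWFLAKE_PASSWORD": "example"})
print(sorted(settings))
```

Failing at startup with the list of missing names is usually easier to triage than a connection error deep inside the job.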

Snowflake is a cloud-based, SQL-based data warehouse platform that allows you to separate query compute power from data storage. It uses a proprietary format for storing data and aims to provide a service that doesn’t need a DBA to constantly monitor and tune it to keep the warehouse performant.

Virtual warehouses are Snowflake’s concept for a cluster of compute resources that can execute queries. You are billed based on the size of the virtual warehouse and how long it runs.
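As a rough worked example of "size times running time", the sketch below estimates credit usage. The credits-per-hour table and the 60-second minimum reflect Snowflake's published rates for standard warehouses as we understand them; treat them as assumptions and check current pricing before relying on the numbers.

```python
# Assumed credits per hour by warehouse size (standard warehouses).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Assumed minimum billed time each time a warehouse starts or resumes.
MINIMUM_BILLED_SECONDS = 60


def credits_used(size, running_seconds):
    """Estimate credits for one continuous run of a warehouse of the given size."""
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# A MEDIUM warehouse running for 30 minutes.
print(credits_used("MEDIUM", 1800))
```

Each size step doubles the hourly rate, so a larger warehouse only saves money if it finishes the work in less than half the time.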

Stages in Snowflake allow you to specify an external data source that you want to load data from. Once a stage is specified, you can run a simple COPY INTO command with a pattern which, in our case, allows us to import data from S3 buckets. You can see how we utilize this here.
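The shape of such a load can be sketched as a small helper that builds the statement. The table name, stage name, and file pattern below are hypothetical and the helper is not our import code; it only illustrates the `COPY INTO ... FROM @stage PATTERN = ...` form.

```python
def copy_into_statement(table, stage, pattern, file_type="JSON"):
    """Build a Snowflake COPY INTO statement that loads matching files from a stage.

    The arguments are interpolated as-is, so they must come from trusted code,
    never from user input.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        f"PATTERN = '{pattern}'\n"
        f"FILE_FORMAT = (TYPE = {file_type});"
    )


# Hypothetical names for illustration only.
statement = copy_into_statement(
    table="raw.example.push_logs",
    stage="raw.example.push_logs_stage",
    pattern=".*2021/03/.*",
)
print(statement)
```

The pattern is a regular expression matched against file paths in the stage, which is what lets one statement pick up a day's or month's worth of files.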

dbt is a tool, written in Python, that allows you to execute the transform step of your ELT or ETL process.

Our dbt implementation is in the transform/snowflake-dbt directory of the Mattermost Data Warehouse repository.

dbt has a concept of sources and models.

An example sources file is here; it specifies already existing raw data that dbt can pull from to build models.

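Example model file: https://github.com/mattermost/mattermost-data-warehouse/blob/master/transform/snowflake-dbt/models/finance/account_daily_arr.sql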
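The naming rule for models (the filename determines the object name in the database) can be shown in a couple of lines. This is only an illustration of the convention, not how dbt resolves names internally.

```python
from pathlib import PurePosixPath


def model_name(model_path):
    """Return the database object name implied by a dbt model file path."""
    return PurePosixPath(model_path).stem


path = "transform/snowflake-dbt/models/finance/account_daily_arr.sql"
print(model_name(path))
```

Renaming a model file therefore renames the table or view that dbt builds, so anything that queries the old name needs updating too.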

Meltano is an overall framework that includes a lot more than what we’re using it for.

The specific piece we use is here.

We use this in the snowflake_permissions DAG. The interesting piece is the container_cmd lines. Essentially, we create an entire Meltano project but then use only a roles.yml file that we define to control the permissions.

This telemetry data is detailed here.

Currently, we use Segment to push this data to Snowflake.

We currently have a dbt model here that uses this raw data, and we will continue to add more.

Mattermost servers ping a CloudFront endpoint with some basic telemetry. It uses the log format specified here.

The import job uses the code here.

Mattermost runs a proxy service that allows mobile notifications to be sent through Apple’s and Google’s respective notification services. The log data is put into an S3 bucket and then ingested using Snowflake Stage and COPY. See here for more details.

To help us track how many Mattermost servers are being deployed, there's a pingback that gets logged to an S3 bucket, which we import. Details are here.

Mattermost’s enterprise license metadata is exported nightly to an S3 bucket, and we then import it daily. The code is here.

Links referenced on this page:

  • Mattermost Data Warehouse

  • https://aws.amazon.com/eks/

  • Dockerfiles

  • Bitnami’s Airflow helm chart

  • Kubernetes Secrets

  • Apache Airflow

  • DAGs

  • crontab style scheduling

  • Mattermost incoming webhooks

  • failed task callback

  • DAGs configuration

  • Bitnami’s Get DAG files from a git repository

  • GitHub repository

  • KubernetesPodOperator

  • kube_secrets.py

  • DAG file

  • Operator object

  • Snowflake

  • Virtual warehouses

  • Stages

  • dbt

  • sources

  • models

  • https://github.com/mattermost/mattermost-data-warehouse/blob/master/transform/snowflake-dbt/models/finance/account_daily_arr.sql

  • Meltano

  • snowflake_permissions DAG

  • roles.yml

  • Segment

  • Google Analytics Link

  • GA Mattermost Com Pages Visits

  • GA Developers Pages Visits