

Data Engineering Infrastructure

GitHub

Amazon EKS

Airflow

Kubernetes Secrets

Snowflake

  • Snowflake is a cloud-based, SQL-based data warehouse platform that separates query compute from data storage. It stores data in a proprietary format and aims to be a service that doesn’t need a DBA constantly monitoring and tuning it to keep the warehouse performant.

Virtual warehouse

  • Virtual warehouses are Snowflake’s concept for a cluster of compute resources that can execute queries. You are billed based on the size of the virtual warehouse and how long it runs.
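Billing depends on both warehouse size and running time. To see how those combine, here is a rough estimator. It assumes Snowflake's published rates for *standard* warehouses: the credit rate doubles with each size step, usage is billed per second, and each start or resume has a 60-second minimum. It leaves out cloud-services credits and the dollar price per credit, which depends on our edition and region. `estimate_credits` is a hypothetical helper, not part of any Snowflake API.

```python
import math

# Credits per hour for standard Snowflake warehouses; each size step doubles the rate.
CREDITS_PER_HOUR = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
    "2X-Large": 32,
    "3X-Large": 64,
    "4X-Large": 128,
}

# Each time a warehouse starts or resumes, it is billed for at least one minute.
MINIMUM_BILLED_SECONDS = 60


def estimate_credits(size: str, running_seconds: float) -> float:
    """Estimate credits for one continuous run (start/resume until suspend)."""
    # Per-second billing: round up to whole seconds, then apply the 60 s minimum.
    billed_seconds = max(math.ceil(running_seconds), MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600
```

For example, `estimate_credits("Medium", 1800)` gives 2.0 credits for 30 minutes on a Medium warehouse. A 10-second query on a freshly resumed X-Small warehouse is still billed as a full minute. This is why auto-suspend and right-sizing matter more than shaving a few seconds off a query.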

Stage and COPY

  • Stages in Snowflake let you specify an external data source that you want to load data from. Once a stage is defined, you can run a simple COPY INTO command with a file-name pattern; in our case, this is how we import data from S3 buckets. You can see how we use this here.
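The stage-then-COPY flow can be sketched as two SQL statements. The Python helpers below render them as strings so they could be passed to whatever executes SQL in a DAG. All stage, table, bucket, and storage-integration names are hypothetical placeholders, not the ones our DAGs actually use. The helpers also assume trusted, hard-coded identifiers, since they do no quoting or escaping.

```python
def create_stage_sql(stage: str, s3_url: str, storage_integration: str) -> str:
    """Render a CREATE STAGE statement that points Snowflake at an S3 prefix."""
    return (
        f"CREATE STAGE IF NOT EXISTS {stage}\n"
        f"  URL = '{s3_url}'\n"
        f"  STORAGE_INTEGRATION = {storage_integration};"
    )


def copy_into_sql(table: str, stage: str, pattern: str, file_type: str = "CSV") -> str:
    """Render a COPY INTO statement that loads staged files whose paths match `pattern`.

    `pattern` is a regular expression, e.g. '.*[.]csv[.]gz' for gzipped CSV files.
    """
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  PATTERN = '{pattern}'\n"
        f"  FILE_FORMAT = (TYPE = {file_type});"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_stage_sql("raw.push_proxy.logs_stage", "s3://example-bucket/push-proxy/", "example_s3_integration"))
    print(copy_into_sql("raw.push_proxy.logs", "raw.push_proxy.logs_stage", ".*[.]csv[.]gz"))
```

Re-running the same COPY INTO on a schedule is generally safe. By default, Snowflake keeps load metadata for each table and skips files it has already loaded (within its 64-day load-history window), so only new files in the bucket are imported.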

dbt (data build tool)

Meltano permissions

  • Meltano is a broader framework that includes much more than what we use it for.

    We only use a small piece of it, which lets us control our Snowflake user and role permissions in a fine-grained way.

  • The specific piece we use is here.

  • We use this in the snowflake_permissions DAG. The interesting part is the container_cmd lines: we create an entire Meltano project, but only use a roles.yml file that we define to control the permissions.

Data sources

Telemetry

  • Telemetry data is data that's sent from Mattermost servers and makes its way to our data warehouse.

  • Currently, we use Segment to push this data to Snowflake.

    • The data is available in its raw form in the Raw database, in the mattermost2 and mattermost_nps schemas.

  • We currently have a dbt model here that uses this raw data, and we will continue to add more.

  • Active User Counts

Push notifications

  • Mattermost runs a proxy service that sends mobile push notifications through Apple’s and Google’s respective notification services. The proxy’s log data is written to an S3 bucket and then ingested using a Snowflake Stage and COPY. See here for more details.

Server release

  • To help us track how many Mattermost servers are being deployed, there's a pingback that gets logged to an S3 bucket, which we then import. Details here.

License data

  • Mattermost’s enterprise license metadata is exported nightly to an S3 bucket, and we import it daily. Code here.

Stitch data

  • Google Analytics

    • Overview

      • The Google Analytics integration in Stitch has many caveats and limitations.

      • Known limitations:

        • Each set of dimensions and measures from Google Analytics needs to have its own Stitch integration.

        • Each integration creates a schema in Snowflake that matches the name of the integration and adds a table called report.

          • Example:

            • Name: GA ChannelGrouping Source Users Org

            • Schema: analytics.ga_channelgrouping_source_users_org

            • Table: analytics.ga_channelgrouping_source_users_org.report

        • Once an integration is created, it can't be edited. If you need to make changes, you need to delete the integration and start over.

        • Data is only pulled at a daily level.

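            • This is a problem because monthly unique users cannot be derived by adding up daily unique users: a user who is active on several days of the month is counted once per day in the daily figures, but only once in the monthly figure.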

    • Mattermost.com

    • Developers.Mattermost.com
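The daily-granularity limitation of the Google Analytics integration is easy to see with a toy example. The sketch below uses a hypothetical in-memory list of (day, user) events, not the Stitch or Google Analytics APIs. It shows that summing daily unique users overcounts the month's unique users.

```python
from datetime import date

# Hypothetical activity log for a single month: (day, user_id) pairs.
events = [
    (date(2020, 1, 1), "alice"),
    (date(2020, 1, 1), "bob"),
    (date(2020, 1, 2), "alice"),
    (date(2020, 1, 3), "alice"),
    (date(2020, 1, 3), "carol"),
]


def daily_unique_users(events):
    """Count distinct users per day, mirroring the daily-level rows Stitch loads."""
    users_by_day = {}
    for day, user in events:
        users_by_day.setdefault(day, set()).add(user)
    return {day: len(users) for day, users in users_by_day.items()}


def monthly_unique_users(events):
    """Count distinct users across all events (assumed to cover one month)."""
    return len({user for _, user in events})


summed_daily = sum(daily_unique_users(events).values())  # 2 + 1 + 2 = 5
monthly = monthly_unique_users(events)  # alice, bob, carol = 3
```

Here the summed daily figure is 5, while the true monthly figure is 3. Getting correct monthly uniques requires either a separate integration that pulls monthly-level data or user-level data we can deduplicate ourselves.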
