Snowplow

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP


Overview

Snowplow is a developer-first engine for collecting behavioral data.

Thousands of organizations around the world generate, enhance, and model behavioral data with Snowplow to fuel advanced analytics, AI/ML initiatives, or composable CDPs.


Why Snowplow?

  • 🏔️ Rock solid architecture capable of processing billions of events per day.
  • 🛠️ Over 20 SDKs to collect data from web, mobile, server-side, and other sources.
  • ✅ A unique approach based on schemas and validation ensures your data is as clean as possible.
  • 🪄 Over 15 enrichments to get the most out of your data.
  • 🏭 Send data to popular warehouses and streams — Snowplow fits nicely within the Modern Data Stack.
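
To make the schema-and-validation idea concrete, here is a minimal Python sketch. It is not Snowplow's actual validation code: the schema format, the `validate` and `split_events` helpers, and the link-click event fields are all hypothetical, chosen only to show how events that fail a schema check can be separated from clean ones.

```python
# Hypothetical, simplified schema: field name -> (expected type, required?).
# Real Snowplow schemas are richer; this only illustrates the idea.
LINK_CLICK_SCHEMA = {
    "target_url": (str, True),
    "element_id": (str, False),
}


def validate(event, schema):
    """Return a list of validation errors (empty if the event is valid)."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in event:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    # Reject fields the schema does not know about.
    for field in event:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


def split_events(events, schema):
    """Separate events into (valid, invalid) lists according to the schema."""
    good, bad = [], []
    for event in events:
        (bad if validate(event, schema) else good).append(event)
    return good, bad
```

Calling `split_events(events, LINK_CLICK_SCHEMA)` returns the events that match the schema and, separately, those that do not, so malformed data never mixes with clean data.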

➡ Where to start? ⬅️

  • Snowplow Open Source. Our Open Source solution equips you with everything you need to start creating behavioral data in a high-fidelity, machine-readable way. Head over to the Quick Start Guide to set things up.
  • Snowplow Behavioral Data Platform. Looking for an enterprise solution with a console, APIs, data governance, and workflow tooling? The Behavioral Data Platform is our managed service that runs in your AWS or GCP cloud. Check out Try Snowplow.

The documentation is a great place to learn more, especially:

  • Tracking design — discover how to approach creating your data the Snowplow way.
  • Pipelines — understand what’s under the hood of Snowplow.

Would you rather dive into the code? Then you are already in the right place!


Snowplow technology 101

Snowplow architecture

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats.

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 15 trackers, covering web, mobile, desktop, server, and IoT.
  • Collector receives Snowplow events from trackers. Currently we have one official collector implementation with different sinks: Amazon Kinesis, Google PubSub, Amazon SQS, Apache Kafka, and NSQ.
  • Enrich cleans up the raw Snowplow events, enriches them, and puts them into storage. Currently we have several implementations, built for different environments (GCP, AWS, Apache Kafka), and one core library.
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift, Postgres, Snowflake, and BigQuery databases.
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We officially support data models for Redshift, Snowflake and BigQuery.
  • Analytics are performed on the Snowplow events or on the aggregate tables.
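
As a rough illustration of how these sub-systems hand data to one another, the following Python sketch models each stage as a plain function. Every name here (`track`, `collect`, `enrich`, `store`, `model`, `analyze`, the event fields, and the in-memory storage list) is hypothetical and stands in for a real component; actual Snowplow components are separate services that exchange data through the standardized protocols and formats mentioned above, not through in-process calls.

```python
from collections import Counter


def track(user_id, page):
    """Tracker: fire a raw event (hypothetical fields)."""
    return {"user_id": user_id, "page": page}


def collect(raw_events):
    """Collector: receive events from trackers and pass them downstream."""
    return list(raw_events)


def enrich(events):
    """Enrich: clean up raw events and normalize their fields."""
    enriched = []
    for event in events:
        if not event.get("user_id"):
            continue  # simplification: skip events with no user id
        enriched.append({**event, "page": event["page"].strip().lower()})
    return enriched


def store(events, storage):
    """Storage: persist enriched events (an in-memory list stands in here)."""
    storage.extend(events)
    return storage


def model(storage):
    """Data modeling: aggregate event-level data into a smaller table."""
    return Counter(event["page"] for event in storage)


def analyze(page_views):
    """Analytics: query the aggregate table, here the most viewed page."""
    return page_views.most_common(1)[0][0]


# Wire the six stages together on three sample events.
raw = [track("u1", " /Home "), track("u2", "/home"), track("", "/pricing")]
storage = store(enrich(collect(raw)), [])
top_page = analyze(model(storage))
```

Because each stage only depends on the shape of the data it receives, any one function could be swapped for a different implementation without touching the others, which is the point of the loosely-coupled design.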

For more information on the current Snowplow architecture, please see the Technical architecture.

Version Compatibility Matrix

To make sure all the components work well together, we strongly recommend that you consult the compatibility matrix when setting up a Snowplow pipeline.


About this repository

This repository is an umbrella repository for all loosely-coupled Snowplow components and is updated on each component release.

Since June 2020, all components have been extracted into their own dedicated repositories (more info here). This repository now serves as an entry point for Snowplow users, as the home of our public roadmap, and as a historical artifact.

Components that have been extracted to their own repository are still here as git submodules.

Trackers

A full list of supported trackers can be found on our documentation site. Popular trackers and use cases include:

  • Web: JavaScript, AMP
  • Mobile: Android, iOS, React Native, Flutter
  • Gaming: Unity, C++, Lua
  • TV: Roku, iOS, Android, React Native
  • Desktop & Server: Command line, .NET, Go, Java, Node.js, PHP, Python, Ruby, Scala, C++, Rust, Lua

Collector

Enrich

Loaders

Iglu

Data modeling

Web

Mobile

Media

Retail

Testing

Parsing enriched event

Bad rows

Terraform Modules


Public Roadmap

This repository also contains the Snowplow Public Roadmap. The Public Roadmap lets you stay up to date and find out what's happening on the Snowplow Platform. Help us prioritize our cards: open the issue and leave a 👍 to vote for your favorites. Want us to build a feature or function? Tell us by heading to our Discourse forum 💬.

Community

We want to make it super easy for Snowplow users and contributors to talk to us and connect with one another, to share ideas, solve problems and help make Snowplow awesome. Join the conversation:

  • Meetups. Don’t miss your chance to talk to us in person. We are often on the move with meetups in Amsterdam, Berlin, Boston, London, and more.
  • Discourse. Our forum for all Snowplow users: engineers setting up Snowplow, data modelers structuring the data, and data consumers building insights. You can find guides, recipes, questions and answers from Snowplow users and the Snowplow team. All questions and contributions are welcome!
  • Twitter. Follow @Snowplow for official news and @SnowplowLabs for engineering-heavy conversations and release announcements.
  • GitHub. If you spot a bug, please raise an issue in the GitHub repository of the component in question. Likewise, if you have developed a cool new feature or an improvement, please open a pull request. We’ll be glad to integrate it into the codebase! For brainstorming a potential new feature, Discourse is the best place to start.
  • Email. If you want to talk to Snowplow directly, email is the easiest way. Get in touch at [email protected].

Copyright and license

Snowplow is copyright 2012-2023 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
