go-stash

go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch.

go-stash has about 5x the throughput of logstash and is easy to deploy: just one executable file.

Quick Start

Install

cd stash && go build stash.go

Quick Start

  • With binary
./stash -f etc/config.yaml
  • With docker, make sure the path of the config file is correct.
docker run -d -v `pwd`/etc:/app/etc kevinwan/go-stash

The config.yaml example is as follows:

Clusters:
- Input:
    Kafka:
      Name: go-stash
      Log:
        Mode: file
      Brokers:
      - "172.16.48.41:9092"
      - "172.16.48.42:9092"
      - "172.16.48.43:9092"
      Topic: ngapplog
      Group: stash
      Conns: 3
      Consumers: 10
      Processors: 60
      MinBytes: 1048576
      MaxBytes: 10485760
      Offset: first
  Filters:
  - Action: drop
    Conditions:
      - Key: status
        Value: 503
        Type: contains
      - Key: type
        Value: "app"
        Type: match
        Op: and
  - Action: remove_field
    Fields:
    - message
    - source
    - beat
    - fields
    - input_type
    - offset
    - "@version"
    - _score
    - _type
    - clientip
    - http_host
    - request_time
  Output:
    ElasticSearch:
      Hosts:
      - "http://172.16.188.73:9200"
      - "http://172.16.188.74:9200"
      - "http://172.16.188.75:9200"
      Index: "go-stash-{{yyyy.MM.dd}}"
      MaxChunkBytes: 5242880
      GracePeriod: 10s
      Compress: false
      TimeZone: UTC

Details

Input

Conns: 3
Consumers: 10
Processors: 60
MinBytes: 1048576
MaxBytes: 10485760
Offset: first

Conns

  • The number of connections to Kafka. Base it on the number of CPU cores; usually keep it <= the number of CPU cores.

Consumers

  • The number of consumer threads opened per connection. The effective total is Conns * Consumers, which should not exceed the total number of partitions; for example, if the topic has 30 partitions, keep Conns * Consumers <= 30.

Processors

  • The number of threads that process data. Base it on the number of CPU cores; it can be increased appropriately. Recommended: Conns * Consumers * 2 or Conns * Consumers * 3, e.g. 60 or 90.
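
As a worked example of the sizing rules above (a sketch; the values mirror the sample config, and the partition count is hypothetical):

package main

import "fmt"

func main() {
	conns := 3
	consumers := 10
	partitions := 30 // hypothetical partition count of the topic

	total := conns * consumers             // 3 * 10 = 30 consumers in total
	fmt.Println(total <= partitions)       // true: stays within the partition limit
	fmt.Println("processors 2x:", total*2) // 60
	fmt.Println("processors 3x:", total*3) // 90
}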

MinBytes MaxBytes

  • The size range of a data block fetched from Kafka, 1M~10M by default. If network and I/O are good enough, these can be raised.
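
These correspond to the standard Kafka fetch sizes. For intuition, here is how the equivalent knobs look in the segmentio/kafka-go client (an illustration only, not go-stash's internal code):

package main

import kafka "github.com/segmentio/kafka-go"

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:  []string{"172.16.48.41:9092"},
		GroupID:  "stash",
		Topic:    "ngapplog",
		MinBytes: 1048576,  // 1M: a fetch waits for at least this much data
		MaxBytes: 10485760, // 10M: upper bound for a single fetch
		// Offset: first roughly maps to starting from the earliest offset
		// when the group has no committed offset (cf. the Offset option below).
		StartOffset: kafka.FirstOffset,
	})
	defer r.Close()
}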

Offset

  • Optional values are first and last; the default is last. first means reading from the beginning of the topic, last means starting from the latest offset.

Filters

- Action: drop
  Conditions:
    - Key: k8s_container_name
      Value: "-rpc"
      Type: contains
    - Key: level
      Value: info
      Type: match
      Op: and
- Action: remove_field
  Fields:
    - message
    - _source
    - _type
    - _score
    - _id
    - "@version"
    - topic
    - index
    - beat
    - docker_container
    - offset
    - prospector
    - source
    - stream
- Action: transfer
  Field: message
  Target: data

- Action: drop

  • Drop flag: data that meets the conditions is removed during processing and is not written to ES
  • Each condition specifies a Key and a Value; the Type field can be contains (substring match) or match (exact match)
  • Conditions are joined with Op: and; or can also be used

- Action: remove_field

Removed fields flag: the fields to be removed are simply listed under Fields

- Action: transfer

Transfer field identifier: renames a field, for example the message field can be renamed to a data field; see the sketch below
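
A minimal sketch of how the three actions behave conceptually, assuming events are plain maps (illustrative only, not go-stash's actual implementation):

package main

import (
	"fmt"
	"strings"
)

type event = map[string]interface{}

// drop: Type contains is a substring test, Type match an exact test,
// and Op: and requires every condition to hold.
func shouldDrop(e event) bool {
	status, _ := e["status"].(string)
	typ, _ := e["type"].(string)
	return strings.Contains(status, "503") && typ == "app"
}

// remove_field: delete the listed fields from the event.
func removeFields(e event, fields ...string) {
	for _, f := range fields {
		delete(e, f)
	}
}

// transfer: rename a field to the target name.
func transfer(e event, field, target string) {
	if v, ok := e[field]; ok {
		e[target] = v
		delete(e, field)
	}
}

func main() {
	e := event{"status": "503 Service Unavailable", "type": "app", "message": "hello"}
	fmt.Println(shouldDrop(e)) // true: the event is dropped, never written to ES
	transfer(e, "message", "data")
	removeFields(e, "offset", "source")
	fmt.Println(e)
}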

Output

Index

  • The index name; supports date placeholders such as indexname-{{yyyy.MM.dd}} (year.month.day) or {{yyyy-MM-dd}}, in whatever format you need
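
For example, go-stash-{{yyyy.MM.dd}} resolves to a daily index name. A sketch of the expansion using Go's time formatting (go-stash performs this internally):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Go's reference layout "2006.01.02" corresponds to yyyy.MM.dd.
	index := "go-stash-" + time.Now().UTC().Format("2006.01.02")
	fmt.Println(index) // e.g. go-stash-2024.01.15
}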

MaxChunkBytes

  • The size of each bulk request submitted to ES, 5M by default; adjust it according to the I/O capacity of ES

GracePeriod

  • The default is 10s; after the program is asked to exit, it has this long to process the remaining consumed data before shutting down gracefully
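
Conceptually this is a shutdown window, as in the following sketch (illustrative only; go-stash handles this internally, and flushRemaining is hypothetical):

package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	<-stop // the program has been asked to exit

	// Allow up to the GracePeriod (10s) to flush in-flight data to ES.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	flushRemaining(ctx)
}

// flushRemaining is a hypothetical flush loop that honors the deadline.
func flushRemaining(ctx context.Context) {
	select {
	case <-time.After(time.Second): // pretend flushing takes 1s
		fmt.Println("remaining data flushed")
	case <-ctx.Done():
		fmt.Println("grace period expired, exiting anyway")
	}
}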

Compress

  • Whether to compress data sent to ES. Compression reduces the amount of data transferred, but costs some processing performance; optional values are true/false, default false

TimeZone

  • The default is UTC (Coordinated Universal Time)

ES performance write test

Test environment

  • stash servers: 3 machines, 4 cores and 8G each
  • ES servers: 15 machines, 16 cores and 64G each

Key configuration

- Input:
      Conns: 3
      Consumers: 10
      Processors: 60
      MinBytes: 1048576
      MaxBytes: 10485760
  Filters:
  - Action: remove_field
    Fields:
    - Message
    - source
    - beat
    - fields
    - input_type
    - offset
    - request_time
  Output:
      Index: "nginx_pro-{{yyyy.MM.d}}"
      Compress: false
      MaxChunkBytes: 5242880
      TimeZone: UTC

Write speed is above 150k/s on average

Acknowledgements

go-stash is powered by go-zero for great performance!

Give a Star!

If you like this project or are using it to learn or to start your own solution, please give it a star. Thanks!
