Project Name | Description | Stars | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---
Bigdata Notes | A getting-started guide to big data :star: | 14,410 | 2 months ago | | | 37 | | Java
Cookbook | The Data Engineering Cookbook | 12,447 | 2 days ago | | | 111 | apache-2.0 |
Cmak | CMAK is a tool for managing Apache Kafka clusters | 11,564 | 4 months ago | | | 513 | apache-2.0 | Scala
God Of Bigdata | Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... | 8,483 | 4 months ago | | | 3 | |
Kafka Ui | Open-Source Web UI for Apache Kafka Management | 7,377 | a day ago | 2 | December 09, 2022 | 330 | apache-2.0 | Java
Risingwave | The streaming database: redefining stream processing 🌊. PostgreSQL-compatible, highly performant, scalable, elastic, and reliable ☁️. | 5,446 | a day ago | 11 | November 10, 2023 | 891 | apache-2.0 | Rust
Bigdataguide | Learn big data from scratch, with learning videos for every stage and interview materials | 2,313 | 6 days ago | | | | | Java
Flinkstreamsql | Extends real-time SQL on top of open-source Flink; mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax | 1,873 | a year ago | | | 86 | apache-2.0 | Java
Bigdata Interview | :dart: :star2: [Big data interview questions] Big-data interview questions collected from around the web, with my own answer summaries; currently covers Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper | 1,397 | 2 years ago | | | | |
Bigdata Growth | A big data knowledge base covering data warehouse modeling, real-time computing, big data, data platforms, system design, Java, algorithms, and more. | 1,066 | a month ago | | | 1 | mit | Shell
CMAK (previously known as Kafka Manager) is a tool for managing Apache Kafka clusters. See below for details about the name change.
CMAK supports the following:
Cluster Management
Topic List
Topic View
Consumer List View
Consumed Topic View
Broker List
Broker View
The minimum configuration is the ZooKeeper hosts to be used for CMAK (pka Kafka Manager) state. This is set in the application.conf file in the conf directory. The same file is packaged in the distribution zip file; you may modify settings after unzipping it on the target server.
cmak.zkhosts="my.zookeeper.host.com:2181"
You can specify multiple zookeeper hosts by comma delimiting them, like so:
cmak.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181"
Alternatively, use the environment variable ZK_HOSTS if you don't want to hardcode any values.
ZK_HOSTS="my.zookeeper.host.com:2181"
You can optionally enable or disable the following functionality by modifying the default list in application.conf:
application.features=["KMClusterManagerFeature","KMTopicManagerFeature","KMPreferredReplicaElectionFeature","KMReassignPartitionsFeature"]
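For example, to run CMAK without partition reassignment support, you could drop KMReassignPartitionsFeature from the list (an illustrative trimmed configuration, not a recommendation):

application.features=["KMClusterManagerFeature","KMTopicManagerFeature","KMPreferredReplicaElectionFeature"]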
Consider setting these parameters for larger clusters with JMX enabled:
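As a sketch, the broker-view tuning knobs follow this shape in the upstream application.conf (verify the exact key names against your version):

cmak.broker-view-thread-pool-size=< 3 * number_of_brokers >
cmak.broker-view-max-queue-size=< 3 * total number of partitions across all topics >
cmak.broker-view-update-seconds=< cmak.broker-view-max-queue-size / (10 * number_of_brokers) >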
Here is an example for a Kafka cluster with 10 brokers and 100 topics, each topic having 10 partitions (1,000 partitions total), with JMX enabled:
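Plugging those numbers into the sizing formulas above gives values along these lines (illustrative, not a definitive recommendation):

cmak.broker-view-thread-pool-size=30
cmak.broker-view-max-queue-size=3000
cmak.broker-view-update-seconds=30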
The following settings control the consumer offset cache's thread pool and queue:
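In the upstream application.conf these appear roughly as follows (defaults as noted; verify against your version):

cmak.offset-cache-thread-pool-size=< default is number of processors >
cmak.offset-cache-max-queue-size=< default is 1000 >
cmak.kafka-admin-client-thread-pool-size=< default is number of processors >
cmak.kafka-admin-client-max-queue-size=< default is 1000 >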
You should increase the above for a large number of consumers with consumer polling enabled, though this mainly affects ZK-based consumer polling.
Kafka-managed consumer offsets are now consumed by KafkaManagedOffsetCache from the "__consumer_offsets" topic. Note that this has not been tested with a large number of tracked offsets. A single thread per cluster consumes this topic, so it may not be able to keep up with a large number of offsets being pushed to the topic.
Warning: you need SSL configured with CMAK (pka Kafka Manager) to ensure your credentials aren't passed unencrypted. Authenticating a user with LDAP is possible by passing the user credentials in the Authorization header. LDAP authentication is performed on the first visit; if successful, a cookie is set. On subsequent requests, the cookie value is compared with the credentials from the Authorization header. LDAP support is provided through the basic authentication filter.
Note: LDAP is unencrypted and insecure. LDAPS is a commonly implemented extension that adds an encryption layer in a manner similar to how HTTPS adds encryption to HTTP. LDAPS has not been documented, and the specification is not formally defined anywhere. LDAP + StartTLS is the currently recommended way to start an encrypted channel; it upgrades an existing LDAP connection to achieve this encryption.
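A sketch of what the basic authentication and LDAP section of application.conf can look like; the server, bind DN, password, and search filter below are placeholders, and the key names should be verified against the application.conf shipped with your version:

basicAuthentication.enabled=true
basicAuthentication.authType="LDAP"
basicAuthentication.ldap.enabled=true
basicAuthentication.ldap.server="ldap.example.com"
basicAuthentication.ldap.port=389
basicAuthentication.ldap.username="cn=admin,dc=example,dc=com"
basicAuthentication.ldap.password="password"
basicAuthentication.ldap.search-base-dn="dc=example,dc=com"
basicAuthentication.ldap.search-filter="(uid=$capturedLogin$)"
basicAuthentication.ldap.connection-pool-size=10
basicAuthentication.ldap.ssl=false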
The command below will create a zip file which can be used to deploy the application.
./sbt clean dist
Please refer to the Play framework documentation for production deployment and configuration.
If java is not in your path, or you need to build against a specific java version, please use the following (the example assumes zulu java11):
$ PATH=/usr/lib/jvm/zulu-11-amd64/bin:$PATH \
JAVA_HOME=/usr/lib/jvm/zulu-11-amd64 \
/path/to/sbt -java-home /usr/lib/jvm/zulu-11-amd64 clean dist
This ensures that the 'java' and 'javac' binaries in your path are first looked up in the correct location. Next, for all downstream tools that only listen to JAVA_HOME, it points them to the java11 location. Lastly, it tells sbt to use the java11 location as well.
After extracting the produced zipfile, and changing the working directory to it, you can run the service like this:
$ bin/cmak
By default, it will choose port 9000. This is overridable, as is the location of the configuration file. For example:
$ bin/cmak -Dconfig.file=/path/to/application.conf -Dhttp.port=8080
Again, if java is not in your path, or you need to run against a different version of java, add the -java-home option as follows:
$ bin/cmak -java-home /usr/lib/jvm/zulu-11-amd64
To add JAAS configuration for SASL, add the config file location at start:
$ bin/cmak -Djava.security.auth.login.config=/path/to/my-jaas.conf
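A minimal sketch of such a JAAS file for SASL/PLAIN; the login module depends on your SASL mechanism, and the username and password here are placeholders:

KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="placeholder-user"
  password="placeholder-password";
};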
NOTE: Make sure the user running CMAK (pka Kafka Manager) has read permissions on the JAAS config file.
If you'd like to create a Debian or RPM package instead, you can run one of:
sbt debian:packageBin
sbt rpm:packageBin
Most of the utils code has been adapted from Apache Kafka to work with Apache Curator.
CMAK was renamed from its previous name due to this issue. CMAK is designed to be used with Apache Kafka and is offered to support the needs of the Kafka community. This project is currently managed by employees at Verizon Media and the community members who support it.
Licensed under the terms of the Apache License 2.0. See accompanying LICENSE file for terms.
The producer offset is polled. The consumer offset is read from the offset topic for Kafka-based consumers. This means the reported lag may be negative, since the offset is consumed from the offset topic faster than the producer offset is polled. For example, if the consumer offset read from the offset topic is already 1,005 while the last polled producer offset is 1,000, the reported lag is -5. This is normal and not a problem.