Uber JVM Profiler provides a Java Agent to collect various metrics and stacktraces for Hadoop/Spark JVM processes in a distributed way, for example, CPU/Memory/IO metrics.
Uber JVM Profiler also provides advanced profiling capabilities to trace arbitrary Java methods and arguments in user code without requiring any change to that code. This feature can be used to trace HDFS name node call latency for each Spark application and identify bottlenecks in the name node. It can also trace the HDFS file paths each Spark application reads or writes and identify hot files for further optimization.
This profiler was initially created to profile Spark applications, which usually have dozens or hundreds of processes/machines for a single application, so that people can easily correlate the metrics of these different processes/machines. It is also a generic Java Agent and can be used for any JVM process.
mvn clean package
This command creates the jvm-profiler.jar file with the default reporters (ConsoleOutputReporter, FileOutputReporter and KafkaOutputReporter) bundled in it. To bundle a custom reporter such as RedisOutputReporter or InfluxDBOutputReporter in the jar file, pass the Maven profile id for that reporter in the build command. For example, to build a jar file with RedisOutputReporter, run mvn -P redis clean package. Check the pom.xml file for the available custom reporters and their profile ids.
You can upload the jvm-profiler jar file to HDFS so that the Spark application executors can access it. Then add configuration like the following when launching the Spark application:
--conf spark.jars=hdfs://hdfs_url/lib/jvm-profiler-1.0.0.jar
--conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar
The following command starts the example application with the profiler agent attached, which reports metrics to the console output:
java -javaagent:target/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,tag=mytag,metricInterval=5000,durationProfiling=com.uber.profiling.examples.HelloWorldApplication.publicSleepMethod,argumentProfiling=com.uber.profiling.examples.HelloWorldApplication.publicSleepMethod.1,sampleInterval=100 -cp target/jvm-profiler-1.0.0.jar com.uber.profiling.examples.HelloWorldApplication
Use the following command to run the JVM profiler with an executable jar application:
java -javaagent:/opt/jvm-profiler/target/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000,durationProfiling=foo.bar.FooApplication.barMethod,sampleInterval=5000 -jar foo-application.jar
Set the JVM profiler in CATALINA_OPTS before starting the Tomcat server. Check the logs/catalina.out file for metrics.
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:/opt/jvm-profiler/target/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000,durationProfiling=foo.bar.FooController.barMethod,sampleInterval=5000"
Use the following command to run the JVM profiler with Spring Boot 2.x. For Spring Boot 1.x, use -Drun.jvmArguments instead of -Dspring-boot.run.jvmArguments in the command.
mvn spring-boot:run -Dspring-boot.run.jvmArguments="-javaagent:/opt/jvm-profiler/target/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.ConsoleOutputReporter,metricInterval=5000,durationProfiling=foo.bar.FooController.barMethod,sampleInterval=5000"
Uber JVM Profiler supports sending metrics to Kafka. For example,
java -javaagent:target/jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,metricInterval=5000,brokerList=localhost:9092,topicPrefix=profiler_ -cp target/jvm-profiler-1.0.0.jar com.uber.profiling.examples.HelloWorldApplication
It will send metrics to the Kafka topic profiler_CpuAndMemory. See the bottom of this document for an example of the metrics.
Uber JVM Profiler supports the following features:
Debug memory usage for all your Spark application executors, including Java heap memory, non-heap memory, native memory (VmRSS, VmHWM), memory pool, and buffer pool (direct/mapped buffer).
Debug CPU usage and Garbage Collection time for all Spark executors.
Debug arbitrary Java class methods (how many times they run and how much time they take). We call it Duration Profiling.
Debug arbitrary Java class method calls and trace their argument values. We call it Argument Profiling.
Do Stacktrace Profiling and generate a flamegraph to visualize the CPU time spent by the Spark application.
Debug IO metrics (disk read/write bytes for the application, CPU iowait for the machine).
Debug JVM thread metrics such as the count of total threads, peak threads, live/active threads and newly created threads.
The Java agent supports the following parameters, which can be used on the Java command line like "-javaagent:agent_jar_file.jar=param1=value1,param2=value2":
reporter: class name for the reporter, e.g. com.uber.profiling.reporters.ConsoleOutputReporter, or com.uber.profiling.reporters.KafkaOutputReporter, which are already implemented in the code. You could implement your own reporter and set the name here.
configProvider: class name for the config provider, e.g. com.uber.profiling.YamlConfigProvider, which is already implemented in the code. You could implement your own config provider and set the name here.
configFile: config file path to be used by YamlConfigProvider (if configProvider is set to com.uber.profiling.YamlConfigProvider). This could be a local file path or HTTP URL.
tag: plain text string which will be reported together with the metrics.
metricInterval: how frequently to collect and report the metrics, in milliseconds.
durationProfiling: configure to profile a specific class and method, e.g. com.uber.profiling.examples.HelloWorldApplication.publicSleepMethod. It also supports a wildcard (*) for the method name, e.g. com.uber.profiling.examples.HelloWorldApplication.*.
argumentProfiling: configure to profile specific method argument, e.g. com.uber.profiling.examples.HelloWorldApplication.publicSleepMethod.1 (".1" means getting value for the first argument and sending out in the reporter).
sampleInterval: interval (in milliseconds) between stacktrace samples. If this value is not set or is zero, the profiler will not do stacktrace sampling.
ioProfiling: whether to profile IO metrics, could be true or false.
brokerList: broker list if using com.uber.profiling.reporters.KafkaOutputReporter.
topicPrefix: topic prefix if using com.uber.profiling.reporters.KafkaOutputReporter. KafkaOutputReporter will send metrics to multiple topics with this value as the prefix for topic names.
outputDir: output directory if using com.uber.profiling.reporters.FileOutputReporter. FileOutputReporter will write metrics into this directory.
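As a rough illustration of the "param1=value1,param2=value2" syntax described above, the following Python sketch assembles a -javaagent option from a dictionary of parameters. The helper name build_agent_option is hypothetical and not part of the profiler, and the sketch assumes that values contain no commas, since the comma separates parameters.

```python
def build_agent_option(jar_path, params):
    """Return a -javaagent option string for the given jar and parameters.

    Hypothetical helper for illustration only; it is not part of the profiler.
    """
    pairs = []
    for key, value in params.items():
        # Render Python booleans the way the README writes them (true/false).
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        # Assumption: a comma inside a value would be read as a separator.
        if "," in text:
            raise ValueError(f"value for {key!r} must not contain a comma")
        pairs.append(f"{key}={text}")
    if not pairs:
        return f"-javaagent:{jar_path}"
    return f"-javaagent:{jar_path}=" + ",".join(pairs)


option = build_agent_option(
    "target/jvm-profiler-1.0.0.jar",
    {
        "reporter": "com.uber.profiling.reporters.ConsoleOutputReporter",
        "tag": "mytag",
        "metricInterval": 5000,
        "sampleInterval": 100,
        "ioProfiling": True,
    },
)
print(option)
```

The resulting string can be placed directly after java on the command line, before -cp or -jar.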
The parameters can be provided as arguments in the java command, or in a YAML config file if you use configProvider=com.uber.profiling.YamlConfigProvider. The following is an example of the YAML config file:
reporter: com.uber.profiling.reporters.ConsoleOutputReporter
metricInterval: 5000
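A config file like the one above could also be generated from a script. The Python sketch below renders only flat scalar parameters, matching the two-line example; how list-valued or nested settings should be written is not covered here. The function name render_flat_yaml and the path /opt/jvm-profiler/profiler.yaml are hypothetical, and strings are emitted in double quotes, which YAML accepts.

```python
import json


def render_flat_yaml(params):
    """Render scalar parameters as flat 'key: value' YAML lines.

    Hypothetical helper; it handles only booleans, numbers and strings.
    """
    lines = []
    for key, value in params.items():
        if isinstance(value, bool):
            # bool must be checked before int, since bool is a subclass of int.
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # JSON double-quoted strings are also valid YAML scalars.
            rendered = json.dumps(str(value))
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


yaml_text = render_flat_yaml(
    {
        "reporter": "com.uber.profiling.reporters.ConsoleOutputReporter",
        "metricInterval": 5000,
    }
)
print(yaml_text, end="")

# Agent option that points the profiler at the file (the path is an assumption).
config_path = "/opt/jvm-profiler/profiler.yaml"
agent_option = (
    "-javaagent:jvm-profiler-1.0.0.jar="
    "configProvider=com.uber.profiling.YamlConfigProvider,"
    f"configFile={config_path}"
)
print(agent_option)
```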
The following is an example of CPU and Memory metrics when using ConsoleOutputReporter or KafkaOutputReporter:
{
"nonHeapMemoryTotalUsed": 11890584.0,
"bufferPools": [
{
"totalCapacity": 0,
"name": "direct",
"count": 0,
"memoryUsed": 0
},
{
"totalCapacity": 0,
"name": "mapped",
"count": 0,
"memoryUsed": 0
}
],
"heapMemoryTotalUsed": 24330736.0,
"epochMillis": 1515627003374,
"nonHeapMemoryCommitted": 13565952.0,
"heapMemoryCommitted": 257425408.0,
"memoryPools": [
{
"peakUsageMax": 251658240,
"usageMax": 251658240,
"peakUsageUsed": 1194496,
"name": "Code Cache",
"peakUsageCommitted": 2555904,
"usageUsed": 1173504,
"type": "Non-heap memory",
"usageCommitted": 2555904
},
{
"peakUsageMax": -1,
"usageMax": -1,
"peakUsageUsed": 9622920,
"name": "Metaspace",
"peakUsageCommitted": 9830400,
"usageUsed": 9622920,
"type": "Non-heap memory",
"usageCommitted": 9830400
},
{
"peakUsageMax": 1073741824,
"usageMax": 1073741824,
"peakUsageUsed": 1094160,
"name": "Compressed Class Space",
"peakUsageCommitted": 1179648,
"usageUsed": 1094160,
"type": "Non-heap memory",
"usageCommitted": 1179648
},
{
"peakUsageMax": 1409286144,
"usageMax": 1409286144,
"peakUsageUsed": 24330736,
"name": "PS Eden Space",
"peakUsageCommitted": 67108864,
"usageUsed": 24330736,
"type": "Heap memory",
"usageCommitted": 67108864
},
{
"peakUsageMax": 11010048,
"usageMax": 11010048,
"peakUsageUsed": 0,
"name": "PS Survivor Space",
"peakUsageCommitted": 11010048,
"usageUsed": 0,
"type": "Heap memory",
"usageCommitted": 11010048
},
{
"peakUsageMax": 2863661056,
"usageMax": 2863661056,
"peakUsageUsed": 0,
"name": "PS Old Gen",
"peakUsageCommitted": 179306496,
"usageUsed": 0,
"type": "Heap memory",
"usageCommitted": 179306496
}
],
"processCpuLoad": 0.0008024004394748531,
"systemCpuLoad": 0.23138430784607697,
"processCpuTime": 496918000,
"appId": null,
"name": "<pid>@machine01",
"host": "machine01",
"processUuid": "3c2ec835-749d-45ea-a7ec-e4b9fe17c23a",
"tag": "mytag",
"gc": [
{
"collectionTime": 0,
"name": "PS Scavenge",
"collectionCount": 0
},
{
"collectionTime": 0,
"name": "PS MarkSweep",
"collectionCount": 0
}
]
}
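To show how a metric record like the one above might be consumed, here is a small Python sketch that parses a trimmed copy of the sample and derives a few summary values. The field names come from the sample; the summarize function and the choice of summary values are illustrative assumptions, not something the profiler provides.

```python
import json

# Trimmed copy of the sample record above, keeping only the fields used here.
sample = """
{
  "heapMemoryTotalUsed": 24330736.0,
  "heapMemoryCommitted": 257425408.0,
  "host": "machine01",
  "tag": "mytag",
  "memoryPools": [
    {"name": "PS Eden Space", "type": "Heap memory", "usageUsed": 24330736},
    {"name": "Metaspace", "type": "Non-heap memory", "usageUsed": 9622920}
  ],
  "gc": [
    {"name": "PS Scavenge", "collectionTime": 0, "collectionCount": 0},
    {"name": "PS MarkSweep", "collectionTime": 0, "collectionCount": 0}
  ]
}
"""


def summarize(metric):
    """Derive a few summary values from one CpuAndMemory record."""
    heap_used = metric["heapMemoryTotalUsed"]
    heap_committed = metric["heapMemoryCommitted"]
    return {
        "host": metric["host"],
        "heap_used_mib": heap_used / (1024 * 1024),
        # Guard against a zero committed size to avoid division by zero.
        "heap_used_ratio": heap_used / heap_committed if heap_committed else None,
        "gc_time_total": sum(g["collectionTime"] for g in metric["gc"]),
        "gc_count_total": sum(g["collectionCount"] for g in metric["gc"]),
        "pool_usage": {p["name"]: p["usageUsed"] for p in metric["memoryPools"]},
    }


summary = summarize(json.loads(sample))
for key, value in summary.items():
    print(key, value)
```

With KafkaOutputReporter, the same function could be applied to each message value read from the metrics topic, assuming each message carries one JSON record.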
A list of all metrics and information corresponding to them can be found here.
The output of Stacktrace Profiling can be used to generate a flamegraph that visualizes CPU time. Using the Python script stackcollapse.py, the following command collapses the Stacktrace Profiling JSON output file into the input format for generating a flamegraph. The script flamegraph.pl can be found at FlameGraph.
python stackcollapse.py -i Stacktrace.json > Stacktrace.folded
flamegraph.pl Stacktrace.folded > Stacktrace.svg
Note that stacktrace sampling must be enabled in order to generate a flamegraph. To enable it, set the sampleInterval parameter. If it is not set or is zero, the profiler will not do stacktrace sampling.
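For readers curious about what the collapse step does, the Python sketch below shows the general idea. It is not the stackcollapse.py shipped with the project. It assumes each input line is a JSON object with a stacktrace list (innermost frame first) and a count field; both field names and the frame order are assumptions about the output format. The folded format expected by flamegraph.pl is one line per unique stack, with frames joined by semicolons from root to leaf, followed by a space and the sample count.

```python
import json
from collections import Counter


def collapse(lines):
    """Fold JSON stacktrace records into 'frame;frame;frame count' totals."""
    folded = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        frames = record.get("stacktrace") or []
        if not frames:
            continue
        # Assumption: frames are listed innermost first, so reverse them to
        # get the root-to-leaf order used by the folded format.
        key = ";".join(reversed(frames))
        folded[key] += int(record.get("count", 1))
    return folded


# Made-up records for illustration; real input would come from Stacktrace.json.
sample_lines = [
    '{"stacktrace": ["Foo.sleep", "Foo.run", "Main.main"], "count": 3}',
    '{"stacktrace": ["Foo.sleep", "Foo.run", "Main.main"], "count": 2}',
    '{"stacktrace": ["Bar.read", "Main.main"], "count": 1}',
]

folded = collapse(sample_lines)
for stack, count in sorted(folded.items()):
    print(f"{stack} {count}")
```

Identical stacks are merged by summing their counts, which is what lets the flamegraph draw wider boxes for frames that were sampled more often.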