
SZT-bigdata



   ___     ____   _____           _         _      __ _      _             _
  / __|   |_  /  |_   _|   ___   | |__     (_)    / _` |  __| |   __ _    | |_    __ _
  \__ \    / /     | |    |___|  | '_ \    | |    \__, | / _` |  / _` |   |  _|  / _` |
  |___/   /___|   _|_|_   _____  |_.__/   _|_|_   |___/  \__,_|  \__,_|   _\__|  \__,_|
_|"""""|_|"""""|_|"""""|_|     |_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|
"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'

  • ...

.file/.doc/SZT-bigdata-2.png


1-cn.java666.sztcommon.util.SZTData
2-cn.java666.etlflink.app.Jsons2Redis
3-cn.java666.etlspringboot.controller.RedisController#get
4-cn.java666.etlflink.app.Redis2ES
5-cn.java666.etlflink.app.Redis2Csv
6-Hive sql 
7-Spark
8-HUE  Hive 
9-cn.java666.etlflink.app.Redis2HBase
10, 14-cn.java666.szthbase.controller.KafkaListen#sink2Hbase
11-cn.java666.etlflink.app.Redis2HBase
12-CDH HDFS+HUE+Hbase+Hive 
13-cn.java666.etlflink.app.Redis2Kafka
15-cn.java666.sztflink.realtime.Kafka2MyCH
16-cn.java666.sztflink.realtime.sink.MyClickhouseSinkFun



+ + ()

  • Java-1.8/Scala-2.11
  • Flink-1.10 ETL
  • Redis-3.2 SSDB Win10|CentOS7|Docker Redis-3.2 CentOS REPL yum 3.2
  • Kafka-2.1 CP kafka-eagle-1.4.5 Ksql zk Kafka
    • KafkaOffsetMonitor
    • Kafka Manager CMAK Kafka 0.11 Kafka 2.4
    • Kafka
  • Zookeeper-3.4.5 ID
  • CDH-6.2
  • Docker-19 docker
  • SpringBoot-2.1.13 JAVA
  • knife4j-2.0 swagger-bootstrap-ui REST API
  • Elasticsearch-7
  • Kibana-7.4 ELK
  • ClickHouse nginx clickhouse PB
  • MongoDB-4.0 Json
  • Spark-2.3 spark Flink
  • Hive-2.1 Hadoop OLAP HQL Mysql
  • Impala-3.2 hive sql impala hive 80
  • HBase-2.1 + Phoenix Hadoop HBase rowkey hbase
  • Kylin-2.5
  • HUE-4.3 CDH hive + impala hdfs oozie
  • DataX FlinkX Flink
  • Oozie-5.1 UI HUE
  • Sqoop-1.4 Mysql HDFS
  • Mysql-5.7 SQL Mysql 8.0 MariaDB Mysql
  • Hadoop-3.0 HDFS+Yarn HDFS Yarn hadoop MR
  • DataV
  • ...

Apache CDH

  • Win10 VMware + Win10 VMware + CentOS7 SSD + HDFS


kafka


java scala IDEA VMware CDH


1- appKey

https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601

2-

2.1- cn.java666.etlspringboot.source.SZTData#saveData /tmp/szt-data/szt-data-page.jsons 1337 1000


2.2- cn.java666.etlflink.sink.RedisSinkPageJson#main etl redis redis 1337


2.3- redis redis-cli hget szt:pageJson 1

dbeaver


2.4- cn.java666.etlspringboot.EtlSApp#main knife4j REST API


2.5- cn.java666.etlflink.source.MyRedisSourceFun#run 133.7 9 station car_no

{
	"deal_date": "2018-08-31 21:15:55",
	"close_date": "2018-09-01 00:00:00",
	"card_no": "CBHGDEEJB",
	"deal_value": "0",
	"deal_type": "",
	"company_name": "",
	"car_no": "IGT-104",
	"station": "",
	"conn_mark": "0",
	"deal_money": "0",
	"equ_no": "263032104"
}
{
	"deal_date": "2018-09-01 05:24:22",
	"close_date": "2018-09-01 00:00:00",
	"card_no": "HHAAABGEH",
	"deal_value": "0",
	"deal_type": "",
	"company_name": "",
	"conn_mark": "0",
	"deal_money": "0",
	"equ_no": "268005140"
}
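The two sample records above differ in shape: the first carries 11 fields, the second only 9, with station and car_no missing. A minimal sketch of how such records could be told apart follows. The class RecordFields and its regex-based key scan are hypothetical and not taken from the project; a real job would use a JSON library such as fastjson or Gson, and whether incomplete records are dropped or kept is not stated here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: inspects the field names of a flat JSON record.
final class RecordFields {
    // Matches the key of each "key": value pair; only safe for flat records like the samples.
    private static final Pattern KEY = Pattern.compile("\"(\\w+)\"\\s*:");

    /** Returns the field names of a flat, single-level JSON record in order of appearance. */
    static Set<String> fieldNames(String json) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = KEY.matcher(json);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** True when the record carries both station and car_no. */
    static boolean isComplete(String json) {
        Set<String> names = fieldNames(json);
        return names.contains("station") && names.contains("car_no");
    }

    public static void main(String[] args) {
        // Shortened versions of the two sample records.
        String full = "{\"deal_date\":\"2018-08-31 21:15:55\",\"card_no\":\"CBHGDEEJB\","
                + "\"car_no\":\"IGT-104\",\"station\":\"\",\"equ_no\":\"263032104\"}";
        String partial = "{\"deal_date\":\"2018-09-01 05:24:22\",\"card_no\":\"HHAAABGEH\","
                + "\"equ_no\":\"268005140\"}";
        System.out.println(fieldNames(full).size() + " fields, complete=" + isComplete(full));
        System.out.println(fieldNames(partial).size() + " fields, complete=" + isComplete(partial));
    }
}
```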

2.6- cn.java666.etlflink.app.Redis2Kafka#main kafka topic-flink-szt-all 1337000 topic-flink-szt 1266039


2.7- kafka-eagle topic

ksql select * from "topic-flink-szt" where "partition" in (0) limit 1000


2.8- cn.java666.etlflink.app.Redis2Csv#main flink sink csv


2.9- cn.java666.etlflink.app.Redis2ES#main ES

ES

ES



2018-09-01 kibana 2018-09-01 00:00:00.000~2018-09-01 23:59:59.999

1266039 2018-09-01 1229180

2018-09-01 6~12 kibana

ETL

1337000 1266039 ES szt-data

1266039 1227234 2018-09-01

122 X 2

ES

  • ES kibana
    index
{
  "properties": {
    "deal_date": {
      "format": "yyyy-MM-dd HH:mm:ss",
      "type": "date"
    }
  }
}

ES 0 ES 0 kibana UTC kibana
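The deal_date strings carry no time zone. Elasticsearch reads a zone-less date as UTC, while Kibana by default renders timestamps in the browser's zone, so the same string can appear shifted. The sketch below shows the size of that shift for one sample value. The class DealDateZones is hypothetical, and Asia/Shanghai is an assumption about where the data was recorded.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the project.
final class DealDateZones {
    // Same pattern as the "format" of deal_date in the index mapping.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Epoch millis when the string is read as a wall-clock time in the given zone. */
    static long epochMillis(String dealDate, ZoneId zone) {
        return LocalDateTime.parse(dealDate, FORMAT).atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        String dealDate = "2018-09-01 05:24:22";
        long asUtc = epochMillis(dealDate, ZoneOffset.UTC);
        long asLocal = epochMillis(dealDate, ZoneId.of("Asia/Shanghai")); // assumed local zone
        // Difference between the two readings, in hours.
        System.out.println((asUtc - asLocal) / 3_600_000L);
    }
}
```

If the shift matters for a daily report, either convert the value to UTC before indexing or pin the time zone in Kibana's settings.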


  • ES json json
  • ES bean fastjson Gson

TIPS

  • Gson fastjsonGson fastjson

2.10- ES

J AA != 0 BCDEFGHIJ K


2.11- cn.java666.sztcommon.util.ParseCardNo#parse cn.java666.etlspringboot.controller.CardController#get REST API


3-

3.1-

---> ---> --->

3.2-

ODS DWD DWS ADS

  • ODS
ods/ods_szt_data/day=2018-09-01/   
# szt_szt_page/day=2018-09-01/  
  • DWD
    dim_ fact_
dwd_fact_szt_in_detail      
dwd_fact_szt_out_detail     
dwd_fact_szt_in_out_detail  
  • DWS
dws_card_record_day_wide  
  • ADS
	ads_in_station_day_top
	ads_out_station_day_top
	ads_in_out_station_day_top
	ads_card_deal_day_top
	ads_line_send_passengers_day_top
	ads_stations_send_passengers_day_top
	ads_line_single_ride_average_time_day_top
	ads_all_passengers_single_ride_spend_time_average
	ads_passenger_spend_time_day_top
	ads_station_in_equ_num_top
	ads_station_out_equ_num_top
	ads_line_in_equ_num_top
	ads_line_out_equ_num_top
	ads_station_deal_day_top
	ads_line_deal_day_top
	ads_conn_ratio_day_top
 9.5       
	ads_line_sale_ratio_top
	ads_conn_spend_time_top
	ads_on_line_min_top
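Most of the ADS tables above are daily "top" rankings. As a rough illustration of what a table such as ads_in_station_day_top holds, the sketch below counts events per station and sorts by count. The project builds these tables in Hive SQL; the Java class StationTop and the station names here are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project: a group-by-count-and-sort ranking.
final class StationTop {
    /** Counts occurrences per station and returns the highest counts first (ties by name). */
    static List<Map.Entry<String, Long>> top(List<String> stations, int limit) {
        Map<String, Long> counts = stations.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        Comparator<Map.Entry<String, Long>> byCountDesc =
                Map.Entry.<String, Long>comparingByValue().reversed();
        return counts.entrySet().stream()
                .sorted(byCountDesc.thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One element per entry event; the names are placeholders.
        List<String> entries = Arrays.asList(
                "station-A", "station-B", "station-A", "station-C", "station-A", "station-B");
        for (Map.Entry<String, Long> e : top(entries, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```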

3.3-

hdfs hive /warehouse
hue hue hue hue hive sql szt
ods dwd dws ads

/warehouse/szt.db/ods/
szt-etl-data.csv szt-etl-data_2018-09-01.csv szt-page.jsons

hdfs dfs -ls -h hdfs://cdh231:8020/warehouse/szt.db/ods/

HUE sql/hive.sql HQL .....

IDEA Database idea cdh hive https://github.com/timveil/hive-jdbc-uber-jar/releases

DBeaver Sqlyog navicat heidisql workbench debug DBeaver HUE



3.3.1 -

2018-09-01


3.3.2 -

2018-09-01


3.3.3-

**2018-09-01**


3.3.4-

**2018-09-01 48**


3.3.5-

2018-09-01


3.3.6-

2018-09-01>>>


3.3.7-

**2018-09-01 1500s 25 11 40 **


3.3.8-

**2018-09-01 1791 s 30 **
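The figure above is a ride duration in seconds; 1791 s is 29.85 minutes, which rounds to 30. The sketch below shows one way such an average could be computed from paired entry and exit timestamps in deal_date format. The class RideTime and the sample timestamps are hypothetical; the project computes this in Hive SQL (see ads_all_passengers_single_ride_spend_time_average).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of the project.
final class RideTime {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Seconds between an entry and an exit timestamp, both in deal_date format. */
    static long rideSeconds(String inTime, String outTime) {
        return Duration.between(
                LocalDateTime.parse(inTime, FORMAT),
                LocalDateTime.parse(outTime, FORMAT)).getSeconds();
    }

    /** Average of the given ride durations in seconds; 0 for an empty input. */
    static double averageSeconds(long[] rides) {
        return Arrays.stream(rides).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Illustrative timestamps, not taken from the dataset.
        long[] rides = {
            rideSeconds("2018-09-01 08:00:00", "2018-09-01 08:25:00"),
            rideSeconds("2018-09-01 18:10:00", "2018-09-01 18:45:00")
        };
        double avg = averageSeconds(rides);
        System.out.println(String.format(Locale.ROOT, "%.0f s = %.1f min", avg, avg / 60.0));
    }
}
```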


3.3.9-

**2018-09-01 17123 4.75 20 **


3.3.10-

2018-09-01


3.3.11-



3.3.12-

**2018-09-01 4 **


3.3.12-

**2018-09-01 1 30 **


3.3.13-

** 15.6% 9.42%**


3.3.14-

9.5 2018-09-01 90.36% 84.3%


3.3.15-


4- SZT-kafka-hbase

SZT-kafka-hbase project for Spring Boot 2
spring-boot-starter-hbase spring-data-hadoop-hbase API

hbase-2.1 + springboot-2.1.13 + kafka-2.0 hbase

  • knife4j hbase

  • hbase 10 10

  • hbase rowkey

  • hbase szt hbase

  • hbase SZT-kafka-hbase
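On the rowkey point in the list above: a common HBase technique is to reverse a key whose leading characters are poorly distributed, so that writes spread across regions. The variable name card_no_re used later in this README suggests the card number is reversed for that purpose, but this is an assumption. The class RowKeys below is a hypothetical sketch, not the project's code.

```java
// Hypothetical helper, not part of the project.
final class RowKeys {
    /** Reverses the card number so that keys sharing a common prefix no longer sort together. */
    static String reversed(String cardNo) {
        return new StringBuilder(cardNo).reverse().toString();
    }

    public static void main(String[] args) {
        // Card numbers taken from the sample records.
        System.out.println(reversed("CBHGDEEJB"));
        System.out.println(reversed("HHAAABGEH"));
    }
}
```

Reversal keeps the key the same length and is trivially invertible, so a lookup by card number only has to reverse the input before the get.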

api-debug

hue-hbase


hbase-shell

scan 'szt:data', {FORMATTER => 'toString',VERSIONS=>10}


  • kafka
    cn.java666.etlflink.app.Redis2Kafka
    SZT-kafka-hbase

hbase 2GB X 3

5- SZT-flink cn.java666.etlflink.app.Json2HBase

redis json hbase redis json kafka flink hbase flink:flink2hbase 1010

hbase bean JSON

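import scala.collection.JavaConverters._

// Write every JSON field as one HBase cell, so no bean class is needed.
// putCell is the project's helper; its arguments appear to be
// row key, column family, qualifier and value.
for (key <- jsonObj.keySet().asScala) {
	val value = jsonObj.getStr(key)
	putCell(card_no_re, cf, key, value)
}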

6- SZT-flink

flink kafka clickhouse


......


TODO:

  • [x] redis pageJson csv
  • [x] kafka
  • [x] elasticsearch kibana
  • [x] ODS DWD DWS ADS
  • [x] hive on spark
  • [x] spark on hive spark hive
  • [ ] hbase
  • [-] ~~oozie ~~;
  • [ ] flink
  • [ ] spark
  • [ ] DataV

  • 2022-05-28:

    • fastjson
    • 996 apache-2.0
  • 2020-05-25

    • flink flink kafka clickhouse
  • 2020-05-22:

  • 2020-05-14

    • RedisSinkPageJson moved from package cn.java666.etlflink.sink to package cn.java666.etlflink.app and renamed Jsons2Redis (json to redis)
  • 2020-05-01

    • redis json hbase
    • hbase-2.1 + springboot-2.1.13 + kafka-2.0
    • kafka hbase n
  • 2020-04-30

    • hbase-2.1 + springboot-2.1.13 hbase
  • 2020-04-27

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-devtools</artifactId>
	<scope>runtime</scope>
	<optional>true</optional>
</dependency>

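######################### devtools ###################################
# disable template caching so page changes show up immediately
spring.freemarker.cache=false
spring.thymeleaf.cache=false

# enable automatic restart
spring.devtools.restart.enabled=true
# enable livereload
spring.devtools.livereload.enabled=true
# additional paths to watch; a change there triggers a restart
spring.devtools.restart.additional-paths=src/main/*
# paths excluded from triggering a restart
#spring.devtools.restart.exclude=static/**,public/**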
  • 2020-04-27:
      • 45932
    • hive
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
  • 2020-04-24

  • 2020-04-23

  • 2020-04-22

  • 2020-04-21:

    • SZT-spark-hive spark Hive
    • Debug spark on hive yarn
  • 2020-04-20

    • logo
    • SQL hive 3.1 TEZ hive on spark MR 10
  • 2020-04-19

    • vmware rm -rf /usr/ HDFS Kafka ES cdh
    • hive on MR hive on spark
  • 2020-04-18

  • 2020-04-17

    • v0.12;
  • 2020-04-16

    • v0.1
  • 2020-04-15

    • common
    • REST API
    • ES
    • Redis2Csv csv
  • 2020-04-14

    • csv
    • GPL-3
    • ES ,kibana
  • 2020-04-13

    • redis
    • redis REST API
    • flink source redis
    • kafka

github

