Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description
---|---|---|---|---|---|---|---|---|---|---|---
Spark | 35,336 | 2,394 | 882 | | 15 hours ago | 46 | May 09, 2021 | 214 | apache-2.0 | Scala | Apache Spark - A unified analytics engine for large-scale data processing
Cookbook | 11,362 | | | | 3 months ago | | | 108 | apache-2.0 | | The Data Engineering Cookbook
God Of Bigdata | 7,992 | | | | 2 days ago | | | 2 | | | Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Zeppelin | 5,981 | 32 | 23 | | 5 days ago | 2 | June 21, 2017 | 134 | apache-2.0 | Java | Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Sparkinternals | 4,665 | | | | a year ago | | | 27 | | | Notes talking about the design and implementation of Apache Spark
Bigdl | 4,179 | 10 | | | a day ago | 16 | April 19, 2021 | 718 | apache-2.0 | Jupyter Notebook | Fast, distributed, secure AI for Big Data
Iceberg | 4,063 | | | | 14 hours ago | 4 | May 23, 2022 | 1,308 | apache-2.0 | Java | Apache Iceberg
Tensorflowonspark | 3,849 | 5 | | | 18 days ago | 32 | April 21, 2022 | 11 | apache-2.0 | Python | TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Koalas | 3,228 | 1 | 12 | | 3 months ago | 47 | October 19, 2021 | 109 | apache-2.0 | Python | Koalas: pandas API on Apache Spark
Spark Nlp | 3,160 | 2 | 2 | | 15 hours ago | 90 | March 05, 2021 | 35 | apache-2.0 | Scala | State of the Art Natural Language Processing
For more information about compilation and usage, please visit the Spark Doris Connector documentation.
Pull the Doris build environment image used to compile the connector:

```bash
docker pull apache/doris:build-env-ldb-toolchain-latest
```
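The compilation itself runs inside this container. The paths below are assumptions (a checkout of the connector source at `/your_path/spark-doris-connector`, mounted into the container, and a `build.sh` in the source tree); adjust them to your environment:

```bash
# start the build container with the connector source mounted (paths are examples)
docker run -it \
  -v /your_path/spark-doris-connector:/root/spark-doris-connector \
  apache/doris:build-env-ldb-toolchain-latest /bin/bash

# inside the container: run the connector build
# (build.sh is an assumption; some versions take the Spark/Scala versions as arguments)
cd /root/spark-doris-connector
sh build.sh
```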
The resulting jar is named something like `spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar`.
Download Spark from https://spark.apache.org/downloads.html. If you are in China, the Tencent mirror is a good alternative: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/
```bash
# download
wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
# decompress
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
```
Configure the Spark environment and copy the connector jar into Spark's jars directory:

```bash
# add Spark to the environment: edit /etc/profile, add the two export lines, then reload it
vim /etc/profile
export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile

# copy the compiled connector jar onto Spark's classpath
cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars
```
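A quick sanity check that the connector is now visible to Spark (purely optional):

```bash
# the connector jar should appear in Spark's jars directory
ls $SPARK_HOME/jars | grep -i doris
```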
Create the Doris database and table, and insert a row of test data.
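The SQL below can be executed from any MySQL-compatible client connected to the Doris FE; a minimal sketch, assuming the FE runs locally with the default query port 9030:

```bash
# connect to the Doris FE with the MySQL client (host and port are assumptions for a local default setup)
mysql -h 127.0.0.1 -P 9030 -u root
```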
```sql
create database mongo_doris;
use mongo_doris;
CREATE TABLE data_sync_test_simple
(
  _id VARCHAR(32) DEFAULT '',
  id VARCHAR(32) DEFAULT '',
  user_name VARCHAR(32) DEFAULT '',
  member_list VARCHAR(32) DEFAULT ''
)
DUPLICATE KEY(_id)
DISTRIBUTED BY HASH(_id) BUCKETS 10
PROPERTIES("replication_num" = "1");

INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');
```
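The Scala snippet that follows is meant to be run in an interactive spark-shell session. A minimal sketch of starting one; the `--jars` form is only needed if the connector jar was not already copied into `$SPARK_HOME/jars`:

```bash
# start spark-shell; the connector is picked up from $SPARK_HOME/jars
spark-shell

# alternatively, pass the connector jar explicitly (path is an example)
spark-shell --jars /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar
```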
```scala
import org.apache.doris.spark._

// read the Doris table created above as an RDD
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("mongo_doris.data_sync_test_simple"),
  cfg = Some(Map(
    "doris.fenodes" -> "127.0.0.1:8030",
    "doris.request.auth.user" -> "root",
    "doris.request.auth.password" -> ""
  ))
)

dorisSparkRDD.collect()
```
When submitting to a YARN cluster, upload the connector jar to HDFS and reference it through the `spark.yarn.jars` configuration:

```properties
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
```

See: https://github.com/apache/doris/discussions/9486
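A minimal sketch of how this fits together; the HDFS destination and the application script `your_app.py` are placeholders, and the jar name is the one used in the configuration above:

```bash
# upload the connector jar to HDFS (destination path is an example)
hdfs dfs -mkdir -p /spark-jars
hdfs dfs -put doris-spark-connector-3.1.2-2.12-1.0.0.jar /spark-jars/

# reference it when submitting to YARN (your_app.py is a hypothetical application)
spark-submit \
  --master yarn \
  --conf spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar \
  your_app.py
```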
Read the same table as a DataFrame from pyspark:

```python
# read the Doris table as a DataFrame (run inside a pyspark session)
dorisSparkDF = (spark.read.format("doris")
    .option("doris.table.identifier", "mongo_doris.data_sync_test_simple")
    .option("doris.fenodes", "127.0.0.1:8030")
    .option("user", "root")
    .option("password", "")
    .load())

# show 5 rows of data
dorisSparkDF.show(5)
```
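As with the Scala example, this assumes an interactive pyspark session with the connector on the classpath; a minimal sketch:

```bash
# start pyspark; the connector is picked up from $SPARK_HOME/jars
pyspark

# alternatively, pass the connector jar explicitly (path is an example)
pyspark --jars /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar
```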
If you find any bugs, feel free to file a GitHub issue or fix them by submitting a pull request.
Contact us through the following mailing list.
Name | Scope | | |
---|---|---|---|---
[email protected] | Development-related discussions | Subscribe | Unsubscribe | Archives