Doris Spark Connector

Spark Connector for Apache Doris


Spark Doris Connector

For more information about compilation and usage, please visit Spark Doris Connector.

License

Apache License, Version 2.0

QuickStart

  1. Download and compile the Spark Doris Connector from apache/doris-spark-connector. We suggest compiling it with the official Doris build image.
$ docker pull apache/doris:build-env-ldb-toolchain-latest
  2. The compiled jar is named like: spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar

  3. Download Spark from https://spark.apache.org/downloads.html. If you are in China, the Tencent mirror is a good alternative: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/

# download
wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
# decompress
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
  4. Configure the Spark environment.
# add the two export lines below to /etc/profile, then reload it with source
vim /etc/profile
export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
  5. Copy spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar to the Spark jars directory.
cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars
  6. Create the Doris database and table.

    create database mongo_doris;
    use mongo_doris;
    CREATE TABLE data_sync_test_simple
     (
             _id VARCHAR(32) DEFAULT '',
             id VARCHAR(32) DEFAULT '',
             user_name VARCHAR(32) DEFAULT '',
             member_list VARCHAR(32) DEFAULT ''
     )
     DUPLICATE KEY(_id)
     DISTRIBUTED BY HASH(_id) BUCKETS 10
     PROPERTIES("replication_num" = "1");
    INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');
    
  7. Enter this code in spark-shell.
import org.apache.doris.spark._
// read the Doris table created in the previous step as an RDD
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("mongo_doris.data_sync_test_simple"),
  cfg = Some(Map(
    "doris.fenodes" -> "127.0.0.1:8030",
    "doris.request.auth.user" -> "root",
    "doris.request.auth.password" -> ""
  ))
)
dorisSparkRDD.collect()
  • mongo_doris: Doris database name.
  • data_sync_test_simple: Doris table name (the table created in the previous step).
  • doris.fenodes: Doris FE address in the form IP:http_port.
  • doris.request.auth.user: Doris user name.
  • doris.request.auth.password: Doris password.
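
The parameters above are plain strings, so they can be assembled from their parts before being handed to the connector. A minimal Python sketch, assuming hypothetical helper names (`table_identifier`, `doris_rdd_config`) that are not part of the connector; the configuration keys are the ones used in the spark-shell example, and the validation rules are only illustrative:

```python
def table_identifier(database: str, table: str) -> str:
    """Join a Doris database and table name into the 'db.table' form."""
    if not database or not table:
        raise ValueError("database and table must both be non-empty")
    return f"{database}.{table}"


def doris_rdd_config(fe_host: str, http_port: int, user: str, password: str = "") -> dict:
    """Build a config map with the keys shown in the spark-shell example."""
    # Assumption for illustration: reject ports outside the TCP range.
    if not 0 < int(http_port) < 65536:
        raise ValueError("http_port must be a valid TCP port")
    return {
        "doris.fenodes": f"{fe_host}:{http_port}",  # FE IP:http_port
        "doris.request.auth.user": user,
        "doris.request.auth.password": password,
    }


if __name__ == "__main__":
    print(table_identifier("mongo_doris", "data_sync_test_simple"))
    print(doris_rdd_config("127.0.0.1", 8030, "root"))
```

Keeping the host and port separate until the last moment makes it harder to pass the FE query port where the HTTP port is expected.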
  8. If Spark runs in cluster mode, upload the jar to HDFS and add the doris-spark-connector jar's HDFS URL to spark.yarn.jars.
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar

Link: https://github.com/apache/doris/discussions/9486
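
When a job is launched from a script, the same setting can be passed on the command line as a `--conf key=value` pair to spark-submit. The following Python sketch only builds that argument list. The helper name `yarn_jars_conf` is hypothetical, and the sketch assumes that spark.yarn.jars takes a comma-separated list of jar locations, so check this against the Spark documentation for your version:

```python
def yarn_jars_conf(jar_urls):
    """Return spark-submit arguments that set spark.yarn.jars.

    Assumption: multiple jar locations are joined with commas.
    """
    urls = [u.strip() for u in jar_urls if u.strip()]
    if not urls:
        raise ValueError("at least one jar URL is required")
    for url in urls:
        # Illustrative check only: this README stores the jar on HDFS.
        if not url.startswith("hdfs://"):
            raise ValueError(f"expected an hdfs:// URL, got {url!r}")
    return ["--conf", "spark.yarn.jars=" + ",".join(urls)]


if __name__ == "__main__":
    args = yarn_jars_conf(
        ["hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar"]
    )
    print(" ".join(args))
```

The returned list can be appended to a spark-submit command line built elsewhere in the script.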

  9. In PySpark, enter this code in the pyspark shell.
dorisSparkDF = (
    spark.read.format("doris")
    .option("doris.table.identifier", "mongo_doris.data_sync_test_simple")
    .option("doris.fenodes", "127.0.0.1:8030")
    .option("user", "root")
    .option("password", "")
    .load()
)
# show the first 5 rows
dorisSparkDF.show(5)
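
If the same read is needed for several tables, the option chain can be wrapped in a small function. This is a minimal sketch rather than part of the connector: the function name `read_doris_table` is hypothetical, and it assumes the `spark` session of the pyspark shell is passed in and that the connector jar is already on the classpath. The option keys are the ones used in the pyspark step above.

```python
def read_doris_table(spark, table_identifier, fenodes, user, password=""):
    """Load a Doris table as a DataFrame using the options from this README.

    `spark` is an existing SparkSession, for example the one the pyspark
    shell provides; nothing is imported here, so the function can be
    pasted directly into the shell.
    """
    options = {
        "doris.table.identifier": table_identifier,  # "database.table"
        "doris.fenodes": fenodes,                    # "FE_IP:http_port"
        "user": user,
        "password": password,
    }
    reader = spark.read.format("doris")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()


# Usage in the pyspark shell (requires a reachable Doris FE):
# df = read_doris_table(spark, "mongo_doris.data_sync_test_simple",
#                       "127.0.0.1:8030", "root")
# df.show(5)
```

Passing the session in as an argument keeps the function free of global state, which also makes it easy to exercise with a stand-in object when no cluster is available.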

Report issues or submit pull requests

If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.

Contact Us

Contact us through the following mailing list.

Name                Scope
[email protected]   Development-related discussions (Subscribe / Unsubscribe / Archives)

