Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Xgboost | 24,219 | 796 | 574 | 3 hours ago | 65 | May 09, 2022 | 363 | apache-2.0 | C++ | |
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow | ||||||||||
Alink | 3,343 | 1 | 2 months ago | 16 | September 08, 2022 | 48 | apache-2.0 | Java | ||
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform. | ||||||||||
Ai_tutorial | 1,440 | an hour ago | ||||||||
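A curated collection of learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning, together with organized technical material on the architecture and algorithms of search, recommendation, and advertising systems. Includes a compilation of notes from algorithm experts | ||||||||||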
Featran | 465 | 1 | 11 | a month ago | 34 | December 04, 2019 | 11 | apache-2.0 | Scala | |
A Scala feature transformation library for data science and machine learning | ||||||||||
Cascading | 321 | 5 years ago | null | other | Java | |||||
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches. | ||||||||||
Flink Ml | 243 | 14 days ago | 17 | July 01, 2022 | 4 | apache-2.0 | Java | |||
Machine learning library of Apache Flink | ||||||||||
Bigdata Playground | 154 | 4 years ago | 4 | apache-2.0 | TypeScript | |||||
A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL | ||||||||||
Bigdata | 142 | 4 years ago | 20 | Shell | ||||||
hadoop,hbase,storm,spark,etc.. | ||||||||||
Toolbox | 104 | 3 years ago | 46 | apache-2.0 | Java | |||||
A Java Toolbox for Scalable Probabilistic Machine Learning | ||||||||||
Cloud Bigdata Book | 53 | 2 years ago | 86 | C | ||||||
write book |
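Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform.
PyAlink package names and versions:
PyAlink provides a separate Python package for each Flink version that Alink supports. The `pyalink` package always targets the latest supported Flink version, currently 1.13, while the `pyalink-flink-***` packages target older Flink versions: `pyalink-flink-1.12`, `pyalink-flink-1.11`, `pyalink-flink-1.10`, and `pyalink-flink-1.9`.
The Python package version matches the Alink version, for example 1.6.1.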
Install with pip, using the one command that matches your Flink version:
`pip install pyalink`
`pip install pyalink-flink-1.12`
`pip install pyalink-flink-1.11`
`pip install pyalink-flink-1.10`
`pip install pyalink-flink-1.9`
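The naming rule above can be sketched as a small helper. This is only an illustration: the function names `pyalink_package` and `pip_install_command` are hypothetical, and the version lists are copied from the text rather than queried from any package index.

```python
# Assumption taken from the text: the newest supported Flink line (1.13) maps
# to plain "pyalink"; each listed older line has its own suffixed package.
LATEST_FLINK = "1.13"
LEGACY_FLINK = ("1.12", "1.11", "1.10", "1.9")


def pyalink_package(flink_version):
    """Return the pip package name for a Flink major.minor version."""
    if flink_version == LATEST_FLINK:
        return "pyalink"
    if flink_version in LEGACY_FLINK:
        return "pyalink-flink-" + flink_version
    raise ValueError("no PyAlink package listed for Flink " + flink_version)


def pip_install_command(flink_version, alink_version=None):
    """Build the pip argument list, optionally pinned to an Alink version."""
    package = pyalink_package(flink_version)
    if alink_version is not None:
        package += "==" + alink_version
    return ["pip", "install", package]


print(" ".join(pip_install_command("1.12", "1.6.1")))
```

Pinning with `==` is optional; it is shown only because the package version is said to follow the Alink version.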
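Potential issues:
`pyalink` and `pyalink-flink-***` cannot be installed at the same time, and multiple versions cannot coexist. If `pyalink` or `pyalink-flink-***` is already installed, remove it first with `pip uninstall pyalink` or `pip uninstall pyalink-flink-***`.
If `pip install` is slow or fails, switch pip to a different package index, or download the whl file and install it with pip.
If several Python versions are installed, you may need a specific `pip`, such as `pip3`. If you use Anaconda, run the installation from the Anaconda prompt.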
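To check for the first issue before installing, one option is to list the PyAlink distributions already present in the current environment. The sketch below uses only the standard library and assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_pyalink_packages` is hypothetical.

```python
from importlib import metadata


def installed_pyalink_packages(names=None):
    """Return the installed distributions named pyalink or pyalink-flink-*.

    `names` can be passed explicitly for testing; by default the names are
    read from the current Python environment.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    found = set()
    for name in names:
        if not name:
            continue  # skip distributions with broken metadata
        normalized = name.lower().replace("_", "-")
        if normalized == "pyalink" or normalized.startswith("pyalink-flink-"):
            found.add(normalized)
    return sorted(found)


found = installed_pyalink_packages()
if found:
    # Anything listed here should be uninstalled before installing another variant.
    for package in found:
        print("pip uninstall " + package)
else:
    print("no PyAlink package installed")
```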
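We recommend using PyAlink from Jupyter Notebook.
Start Jupyter by running `jupyter notebook` in a terminal, then create a Python 3 notebook.
Import the pyalink package: `from pyalink.alink import *`
Create a local runtime environment: `useLocalEnv(parallelism, flinkHome=None, config=None)`
Here `parallelism` is the degree of parallelism used for execution, `flinkHome` is the full path of the Flink installation and usually does not need to be set, and `config` holds configuration parameters accepted by Flink.
When the environment has been initialized successfully, output like the following appears: JVM listening on ***
Then start writing PyAlink code, for example: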
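# Assumes the import and the useLocalEnv(...) call from the steps above have already run.
# Read the iris data set from a CSV file, declaring the column names and types.
source = CsvSourceBatchOp()\
.setSchemaStr("sepal_length double, sepal_width double, petal_length double, petal_width double, category string")\
.setFilePath("https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/iris.csv")
# Keep two columns, then collect the result into a pandas DataFrame.
res = source.select(["sepal_length", "sepal_width"])
df = res.collectToDataframe()
print(df)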
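In PyAlink, algorithm components expose essentially the same interface as the Java API: create a component with its default constructor, set its parameters with `setXXX`, and connect it to other components with `link`/`linkTo`/`linkFrom`.
Jupyter Notebook's auto-completion helps when writing this code.
A batch job is triggered by calling `print`/`collectToDataframe`/`collectToDataframes` on a batch component, or by calling `BatchOperator.execute()`.
A streaming job is started with `StreamOperator.execute()`.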
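The chaining style described above (setters that return the component, `link` calls that wire components together, and nothing running until a trigger method is called) can be illustrated with a toy model. This is not PyAlink code: `ToyOp`, its `set` method, and `plan` are hypothetical stand-ins written only to show the pattern, and they need no Flink or PyAlink installation.

```python
class ToyOp:
    """Toy stand-in for an operator; not part of PyAlink."""

    def __init__(self, name):
        self.name = name
        self.params = {}
        self.inputs = []

    def set(self, key, value):
        # Fluent setter: store the parameter and return self so calls chain.
        self.params[key] = value
        return self

    def linkFrom(self, *upstream):
        # Record the upstream operators; nothing is computed at this point.
        self.inputs = list(upstream)
        return self

    def link(self, downstream):
        # a.link(b) wires a into b and returns b, so the chain can continue.
        downstream.linkFrom(self)
        return downstream

    linkTo = link  # same behavior under a second name

    def plan(self):
        # Stand-in for a trigger: walk upstream and list operators in order.
        steps = []
        for op in self.inputs:
            steps.extend(op.plan())
        steps.append(self.name)
        return steps


source = ToyOp("source").set("filePath", "iris.csv")
select = ToyOp("select").set("clause", "sepal_length, sepal_width")
source.link(select)
print(select.plan())
```

In this model only `plan()` does any work, which mirrors the point that building and linking components does not run a job by itself.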
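Java API example (KMeans):
import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.source.CsvSourceBatchOp;
import com.alibaba.alink.pipeline.Pipeline;
import com.alibaba.alink.pipeline.clustering.KMeans;
import com.alibaba.alink.pipeline.dataproc.vector.VectorAssembler;

// Read the iris data set from a CSV file, declaring the column names and types.
String URL = "https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/iris.csv";
String SCHEMA_STR = "sepal_length double, sepal_width double, petal_length double, petal_width double, category string";
BatchOperator data = new CsvSourceBatchOp()
.setFilePath(URL)
.setSchemaStr(SCHEMA_STR);
// Combine the four numeric columns into a single vector column named "features".
VectorAssembler va = new VectorAssembler()
.setSelectedCols(new String[]{"sepal_length", "sepal_width", "petal_length", "petal_width"})
.setOutputCol("features");
// Cluster the feature vectors into 3 groups, keeping the original category column in the output.
KMeans kMeans = new KMeans().setVectorCol("features").setK(3)
.setPredictionCol("prediction_result")
.setPredictionDetailCol("prediction_detail")
.setReservedCols("category")
.setMaxIter(100);
// Chain both stages, fit the pipeline on the data, apply it to the same data, and print the result.
Pipeline pipeline = new Pipeline().add(va).add(kMeans);
pipeline.fit(data).transform(data).print();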
Maven dependencies for Flink 1.13:
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.13_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.13.0</version>
</dependency>
Maven dependencies for Flink 1.12:
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.12_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.12.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.12.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.12.1</version>
</dependency>
Maven dependencies for Flink 1.11:
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.11_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.11.0</version>
</dependency>
Maven dependencies for Flink 1.10:
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.10_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.10.0</version>
</dependency>
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.9_2.11</artifactId>
<version>1.6.1</version>
</dependency>
Maven dependencies for Flink 1.9:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.9.0</version>
</dependency>
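The dependency lists above differ only in the Flink version they target and in whether `flink-clients` is included. As a compact summary, the sketch below regenerates one list from a Flink major.minor version. The helper names `alink_dependencies` and `to_pom_xml` are hypothetical, and the version table is copied from the lists above, not looked up in a Maven repository.

```python
# Flink patch versions exactly as they appear in the dependency lists above.
FLINK_VERSIONS = {
    "1.13": "1.13.0",
    "1.12": "1.12.1",
    "1.11": "1.11.0",
    "1.10": "1.10.0",
    "1.9": "1.9.0",
}
# flink-clients is listed above only for these Flink lines.
WITH_CLIENTS = {"1.13", "1.12", "1.11"}


def alink_dependencies(flink_line, alink_version="1.6.1", scala="2.11"):
    """Return (groupId, artifactId, version) tuples for one Flink line."""
    flink_version = FLINK_VERSIONS[flink_line]
    deps = [
        ("com.alibaba.alink", f"alink_core_flink-{flink_line}_{scala}", alink_version),
        ("org.apache.flink", f"flink-streaming-scala_{scala}", flink_version),
        ("org.apache.flink", f"flink-table-planner_{scala}", flink_version),
    ]
    if flink_line in WITH_CLIENTS:
        deps.append(("org.apache.flink", f"flink-clients_{scala}", flink_version))
    return deps


def to_pom_xml(deps):
    """Render the tuples as <dependency> elements for a pom.xml."""
    template = (
        "<dependency>\n"
        "    <groupId>{}</groupId>\n"
        "    <artifactId>{}</artifactId>\n"
        "    <version>{}</version>\n"
        "</dependency>"
    )
    return "\n".join(template.format(*dep) for dep in deps)


print(to_pom_xml(alink_dependencies("1.12")))
```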
Prepare a Flink cluster:
wget https://archive.apache.org/dist/flink/flink-1.13.0/flink-1.13.0-bin-scala_2.11.tgz
tar -xf flink-1.13.0-bin-scala_2.11.tgz && cd flink-1.13.0
./bin/start-cluster.sh
Build the Alink jar from source:
git clone https://github.com/alibaba/Alink.git
# Add <scope>provided</scope> to the pom.xml of alink_examples before building.
cd Alink && mvn -Dmaven.test.skip=true clean package shade:shade
Run the Java examples from the Flink installation directory (flink-1.13.0):
./bin/flink run -p 1 -c com.alibaba.alink.ALSExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar
# ./bin/flink run -p 1 -c com.alibaba.alink.GBDTExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar
# ./bin/flink run -p 1 -c com.alibaba.alink.KMeansExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar
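The three `flink run` commands above share the same shape, so a script that submits several examples can build them from one function. The sketch below only prints the commands. The helper name `flink_run_command` is hypothetical and `/path/to/Alink` is a placeholder for wherever the repository was cloned.

```python
import shlex


def flink_run_command(main_class, jar_path, parallelism=1, flink_bin="./bin/flink"):
    """Build the argument list for `flink run -p N -c CLASS JAR`."""
    return [flink_bin, "run", "-p", str(parallelism), "-c", main_class, jar_path]


# Placeholder path: replace it with the jar produced by your own build.
EXAMPLES_JAR = "/path/to/Alink/examples/target/alink_examples-1.5-SNAPSHOT.jar"

for name in ("ALSExample", "GBDTExample", "KMeansExample"):
    command = flink_run_command("com.alibaba.alink." + name, EXAMPLES_JAR)
    # shlex.join (Python 3.8+) quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

The returned list can also be passed to `subprocess.run` with the Flink installation directory as the working directory, since `./bin/flink` is relative to it.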