TensorBase is a modern engineering effort to build a high-performance, cost-effective big data warehouse in an open source culture.
The core work and practices behind TensorBase will be presented with the Rust context in mind. In addition, the current progress (a.k.a. TensorBase 2020.11) will be shown where possible.
Let's meet at the talk, data nerds!
TensorBase has released milestone 0 as a developer preview, to invite more interested contributors to join in.
Development is active in the background, but this repo is not synced regularly, because different editions are planned starting from m1 and there are no external contributions yet.
In milestone 0, TensorBase has been demonstrated to query ~1.5 billion rows of the NYC taxi dataset with a total response time of ~100 milliseconds. This is about 6x faster than ClickHouse.
Aggregation results in Base's baseshell (95 - 118ms)
Aggregation result in ClickHouse client (0.642s or 642ms)
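The "about 6x" figure can be checked from the timings quoted above. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and not part of TensorBase.

```rust
/// Speedup of a candidate over a baseline, with both response times in milliseconds.
/// (Hypothetical helper, used only to illustrate the arithmetic.)
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Speedup range for a candidate whose response time varies between
/// `fastest_ms` and `slowest_ms`. Returns (lowest speedup, highest speedup).
fn speedup_range(baseline_ms: f64, fastest_ms: f64, slowest_ms: f64) -> (f64, f64) {
    (
        speedup(baseline_ms, slowest_ms),
        speedup(baseline_ms, fastest_ms),
    )
}
```

Calling `speedup_range(642.0, 95.0, 118.0)` with the ClickHouse time (642 ms) and the baseshell range (95 to 118 ms) gives roughly 5.4x to 6.8x, which brackets the rounded 6x claim.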
TensorBase is written from scratch in Rust and its friend C. With comfortable languages, minimal dependencies, and a from-scratch architecture, you can use the most familiar tools to tackle the most difficult problems.
If you like this project, please give it a star to help it grow.
The coming m1 will be the first milestone targeted at providing a production-friendly release.
A special edition will be shown to interested individuals and organizations. Subscribe to TensorBase's Newsletter here to be the first to get the news if you are interested.
TensorBase is developed for Linux, but should work on any Docker-enabled system (for example, Windows 10 with WSL2).
TensorBase follows the idiomatic Rust development flow. Make sure your Rust nightly toolchain works. If you only want to try it out, just follow the Quick Start. Thanks to the strong Rust ecosystem, it is not necessary to run a build first. Please check the prerequisites before running from source.
This mode is portable (but has some platform-dependent resource and performance effects).
Try it like this:
docker pull tensorbase/tensorbase:m0
docker run -ti tensorbase/tensorbase:m0 /bin/bash
>> /base/baseshell
Then run a sum aggregation query against the pre-shipped data (1MB):
select sum(trip_id) from nyc_taxi
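To make concrete what such a query computes, here is a minimal sketch of a sum aggregation over a single integer column. It is not TensorBase's implementation: the in-memory slice layout, the `i32` element type, and the function name `sum_column` are all assumptions made for illustration.

```rust
/// Sum one integer column held as a contiguous slice.
/// Assumption: the column stores 32-bit integers; the accumulator is widened
/// to i64 so that adding many 32-bit values does not overflow early.
fn sum_column(column: &[i32]) -> i64 {
    column.iter().map(|&value| i64::from(value)).sum()
}
```

For example, `sum_column(&[1, 2, 3, 4])` evaluates to `10`. Keeping the column contiguous is what lets a scan like this run as one tight loop over memory.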
TensorBase currently provides two binaries that enable the following workflow:
baseops: CLI/workbench for devops, including starting and stopping the various processes/roles
baseshell: query client (currently a monolith that includes everything). By design, m0 only supports sum aggregation queries over a single integer-typed column.
cargo run --bin baseops table create -c samples/nyc_taxi_create_table_sample.sql
Base explicitly separates write/mutation behaviors into the baseops CLI. The provided SQL file is just an ANSI SQL DDL script, which can be found in the samples directory of the repo.
cargo run --release --bin baseops import csv -c /jian/nyc-taxi.csv -i nyc_taxi:trip_id,pickup_datetime,passenger_count:0,2,10:51
The Base import tool uniquely supports importing only part of a CSV file into storage, as in the command above. Use help to get more information.
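The idea of a partial import can be sketched as projecting each CSV line onto a chosen set of field positions. This is an illustration only, not the baseops import code: the function name is hypothetical, the index list `0,2,10` from the command is read here as zero-based field positions (an assumption), the trailing `51` is not modeled, and the naive comma split ignores quoted fields.

```rust
/// Keep only the fields at `indexes` from one CSV line.
/// Returns None if the line has too few fields for any requested index.
/// Simplification: splits on every comma and does not handle quoted fields.
fn project_csv_line<'a>(line: &'a str, indexes: &[usize]) -> Option<Vec<&'a str>> {
    let fields: Vec<&str> = line.split(',').collect();
    // Collecting into Option<Vec<_>> yields None as soon as one index is missing.
    indexes.iter().map(|&i| fields.get(i).copied()).collect()
}
```

With `indexes = &[0, 2, 10]`, each line yields three values in the order the target columns were listed, and all other fields are skipped without being stored.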
cargo run --release --bin baseshell
The Dev Docs explain in a little more detail why the above commands work.
You are welcome to join us, data nerds!
Here are the ongoing efforts. If any of them interests you, do not hesitate to join us.
base ir (intermediate representation)
Feel free to report any problem via issues.
Mailing list: just open an issue with the label [type/discuss].
Thanks for your contributions!
TensorBase is distributed under the terms of the Apache License (Version 2.0).
See LICENSE for details.