
What is TensorBase

TensorBase is a modern engineering effort to build a high-performance, cost-effective big data warehouse in an open source culture.

News

TensorBase joins Rust Fest Global 2020!

The core work and practices behind TensorBase will be presented with the Rust context in mind. Current progress (a.k.a. TensorBase 2020.11) will also be shown where possible.

Let's meet at the talk, data nerds!

Status

TensorBase has released milestone 0 (m0) as a developer preview release, to invite more interested contributors to join in.

Development is active in the background. This repo is not synced regularly, because different editions are planned starting from m1 and there are no external contributions yet.

  • TensorBase is architected for performance.

In milestone 0, it has been demonstrated to query ~1.5 billion rows of the NYC taxi dataset with a total response time of ~100 milliseconds. This is about 6x faster than ClickHouse.

Aggregation results in Base's baseshell (95 - 118ms)

Aggregation result in ClickHouse client (0.642s or 642ms)

  • TensorBase is a highly hackable system.

TensorBase is written from scratch in Rust and its friend C. With comfortable languages, minimal dependencies and a from-scratch architecture, you can use the most familiar tools to take on the most difficult problems.
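
As a quick sanity check on the timings quoted for milestone 0 (95 - 118 ms in baseshell versus 642 ms in the ClickHouse client), the speedup can be worked out directly. The following is a minimal Rust sketch of that arithmetic; the function names are hypothetical and are not part of TensorBase.

```rust
/// Ratio of a baseline latency to a candidate latency (both in milliseconds).
/// A result of 6.0 means the candidate answered six times faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Speedup range for a candidate whose latency varies between
/// `fastest_ms` and `slowest_ms`. Returns (lowest, highest) speedup.
fn speedup_range(baseline_ms: f64, fastest_ms: f64, slowest_ms: f64) -> (f64, f64) {
    (
        speedup(baseline_ms, slowest_ms),
        speedup(baseline_ms, fastest_ms),
    )
}
```

With the quoted timings, speedup_range(642.0, 95.0, 118.0) gives roughly 5.4x to 6.8x, which brackets the "6x" figure.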

If you like this project, please give it a star to help it grow.

Roadmap

The upcoming m1 will be the first milestone that targets a production-friendly release.

A special edition will be shown to interested individuals and organizations. If you are interested, subscribe to TensorBase's Newsletter here to get first-hand information.

Try TensorBase

TensorBase is developed for Linux, but should work on any Docker-enabled system (for example, Windows 10 WSL2).

  • from source

TensorBase follows the idiomatic Rust development flow. Make sure your Rust nightly toolchain works. If you only want to run it, just follow Quick Start. Thanks to the strong Rust ecosystem, it is not necessary to run a separate build first. Please check the prerequisites before running from source.

  • docker

This mode is portable (but resource usage and performance depend on the platform).

Try like this:

docker pull tensorbase/tensorbase:m0
docker run -ti tensorbase/tensorbase:m0 /bin/bash
>> /base/baseshell

Then run a sum aggregation SQL query against the pre-shipped data (1MB):

select sum(trip_id) from nyc_taxi
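
To make the semantics of this query concrete, here is a minimal Rust sketch of what a sum aggregation over a single integer column computes. It is only an illustration, not TensorBase's implementation. The function name is hypothetical, and the 32-bit column type with a 64-bit accumulator is an assumption.

```rust
/// Sum one column of 32-bit integers, held as a contiguous slice.
/// Each value is widened to i64 before adding, so the running total
/// does not wrap around at the i32 limit.
/// Assumption: the column (e.g. trip_id) fits in i32; the source does
/// not state the actual storage type.
fn sum_column(column: &[i32]) -> i64 {
    column.iter().map(|&value| i64::from(value)).sum()
}
```

For example, sum_column(&[1, 2, 3]) returns 6, and an empty column sums to 0.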

Quick Start

TensorBase currently provides two binaries that enable the following workflow:

  • baseops: CLI/workbench for devops, including starting and stopping the various processes/roles

  • baseshell: query client (currently a monolith that includes everything). By design, m0 only supports queries with a sum aggregation over a single integer-typed column.

  1. run baseops to create a table definition in Base
cargo run --bin baseops table create -c samples/nyc_taxi_create_table_sample.sql

Base explicitly separates write/mutation behaviors into the baseops CLI. The provided SQL file is just an ANSI SQL DDL script, which can be found in the samples directory of the repo.

  2. run baseops to import the nyc_taxi csv dataset into Base
cargo run --release --bin baseops import csv -c /jian/nyc-taxi.csv -i nyc_taxi:trip_id,pickup_datetime,passenger_count:0,2,10:51

The Base import tool uniquely supports importing only part of a CSV file into storage, as shown above. Use help to get more information.

  3. run baseshell to issue queries against Base
cargo run --release --bin baseshell

Dev Docs provides a little more explanation of why the above commands work.
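
The partial CSV import in the second step can be pictured as a column projection: only some fields of each CSV row are kept. The Rust sketch below illustrates that idea under stated assumptions and is not the baseops implementation. It assumes that the 0,2,10 part of the -i argument lists zero-based field positions feeding trip_id, pickup_datetime and passenger_count. The trailing 51 is not interpreted here. The function name is hypothetical.

```rust
/// Keep only the fields at `indices` (zero-based) from one CSV line.
/// Returns None if the line has fewer fields than some index requires.
/// Simplification: the line is split on every comma, so quoted fields
/// that contain commas are not handled.
fn project_fields<'a>(line: &'a str, indices: &[usize]) -> Option<Vec<&'a str>> {
    let fields: Vec<&str> = line.split(',').collect();
    indices
        .iter()
        .map(|&index| fields.get(index).copied())
        .collect()
}
```

Calling project_fields on each input line with &[0, 2, 10] would yield just the three values destined for the table, so the remaining fields never need to be stored.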

Engineering Efforts

Data nerds, you are welcome to join us!

Here are the ongoing efforts. If you are interested in any of them, do not hesitate to join us.

subsystem component priority status
storage*
data layout
data read
data write
metadata
runtime
base language(sql)
parsing
base ir (intermediate representation)
codegen
jit compiler*
kernel execution
infra
common
lib
testing
bench
doc
project
client
baseshell
baseops
visualization

Communications

Feel free to report any problem via issues.

Mailing list: just open an issue with the label [type/discuss].

Slack Channel

Telegram

Contributing

Thanks for your contributions!

Dev Docs

License

TensorBase is distributed under the terms of the Apache License (Version 2.0).

See LICENSE for details.

