For data scientists and data engineers, DataBolt is a collection of Python-based libraries and products that reduce the time it takes to get your data ready for analysis and to collaborate with others.
The majority of time in data science is spent on tedious tasks unrelated to data analysis. DataBolt simplifies those tasks so you can experience up to 10x productivity gains.
The libraries are modular, so you can use them individually, but they also work well together to improve your entire data workflow.
Easily manage data workflows, including complex dependencies and parameters. With d6tflow you can chain together complex data flows and execute them intelligently. You can quickly load the input and output data for each task, which keeps your workflow clear and intuitive.
Learn more at https://github.com/d6t/d6tflow
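To make the idea of chained tasks with dependencies and parameters concrete, here is a minimal sketch in plain Python. It is not d6tflow's API: the `Task` class, the `run` function, the cache and the three example tasks are all hypothetical names invented for illustration. The sketch assumes that "intelligent" execution means running upstream tasks first and skipping work whose output is already available for the same parameters.

```python
# Hypothetical mini task runner for illustration only; NOT the d6tflow API.

class Task:
    """A named unit of work that may depend on other tasks."""

    def __init__(self, name, func, requires=()):
        self.name = name
        self.func = func
        self.requires = list(requires)


_cache = {}    # (task name, sorted parameters) -> stored output
run_log = []   # names of tasks that actually executed, in order


def run(task, **params):
    """Run a task after its dependencies, reusing cached outputs."""
    key = (task.name, tuple(sorted(params.items())))
    if key in _cache:
        return _cache[key]  # output already exists, skip the work
    # Resolve upstream tasks first; their outputs become this task's inputs.
    inputs = [run(dep, **params) for dep in task.requires]
    run_log.append(task.name)
    output = task.func(*inputs, **params)
    _cache[key] = output
    return output


def load_rows(scale=1):
    return [1 * scale, 2 * scale, 3 * scale]


def drop_small(rows, scale=1):
    return [r for r in rows if r > scale]


def add_up(rows, scale=1):
    return sum(rows)


get_data = Task("get_data", load_rows)
clean = Task("clean", drop_small, requires=[get_data])
total = Task("total", add_up, requires=[clean])

result = run(total, scale=2)
print(result)   # 10
print(run_log)  # ['get_data', 'clean', 'total']
```

Calling `run(total, scale=2)` a second time returns the cached result without executing any task again, while a different parameter value such as `scale=1` triggers a fresh run of the whole chain.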
d6tpipe is a Python library that makes it easier to exchange data. It's like git for data! But better, because you can include it in your data science code.
Learn more at https://github.com/d6t/d6tpipe
Quickly ingest raw files. d6tstack works with XLS, CSV and TXT files, which can be exported to CSV, Parquet, SQL and Pandas. It solves many performance and other problems typically encountered when ingesting raw files.
Learn more at https://github.com/d6t/d6tstack
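As an illustration of the kind of problem that comes up when ingesting raw files, the sketch below stacks several CSV inputs whose columns do not match. It uses only the Python standard library and is not d6tstack's API; the `stack_csv` function, the added `source` column and the sample data are assumptions made for this example.

```python
# Illustrative sketch using only the standard library; NOT the d6tstack API.
import csv
import io


def stack_csv(sources):
    """Combine CSV inputs with differing columns into one list of rows.

    sources: dict mapping a source name to an open file-like object.
    Returns (columns, rows), where every row has every column and
    values missing from a given file are filled with an empty string.
    """
    columns = []
    rows = []
    for name, handle in sources.items():
        reader = csv.DictReader(handle)
        # Build the union of column names, preserving first-seen order.
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)
        for record in reader:
            record["source"] = name  # remember which file each row came from
            rows.append(record)
    columns.append("source")
    rows = [{col: row.get(col, "") for col in columns} for row in rows]
    return columns, rows


# Two in-memory "files"; the second has an extra column.
jan = io.StringIO("date,sales\n2019-01-01,100\n")
feb = io.StringIO("date,sales,profit\n2019-02-01,200,20\n")

columns, rows = stack_csv({"jan.csv": jan, "feb.csv": feb})
print(columns)  # ['date', 'sales', 'profit', 'source']
for row in rows:
    print(row)
```

Tracking the source of each row makes it easier to trace a bad value back to the file it came from once everything is combined.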
Easily join different datasets using fuzzy matches, without writing custom code. d6tjoin does similarity joins on strings, dates and numbers. For example, you can quickly join similar but not identical stock tickers, addresses, names and dates without manual processing.
Learn more at https://github.com/d6t/d6tjoin
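The stock ticker example above can be sketched with the standard library's `difflib` module. This is a rough stand-in for a similarity join, not d6tjoin's API or its matching method: the `fuzzy_join` function, the 0.6 cutoff and the sample tickers are assumptions chosen for illustration.

```python
# Illustrative sketch using difflib; NOT the d6tjoin API or its algorithm.
import difflib


def fuzzy_join(left_keys, right_keys, cutoff=0.6):
    """Map each left key to its most similar right key, or None.

    Similarity is difflib's sequence ratio; keys scoring below
    `cutoff` are treated as unmatched.
    """
    matches = {}
    for key in left_keys:
        candidates = difflib.get_close_matches(key, right_keys, n=1, cutoff=cutoff)
        matches[key] = candidates[0] if candidates else None
    return matches


# Tickers that are similar but not identical across two datasets.
left = ["AAPL-US", "MSFT-US", "GOOG-US"]
right = ["AAPL", "MSFT", "TSLA"]

matches = fuzzy_join(left, right)
for left_key, right_key in matches.items():
    print(left_key, "->", right_key)
```

Unmatched keys come back as `None`, so they can be reviewed separately instead of being joined to a poor candidate.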
We encourage you to join the DataBolt blog to get updates, tips and tricks: http://blog.databolt.tech
For questions or comments contact: support-at-databolt.tech