Parquet Tools Assembly

Parquet-tools assembly and distribution

This repository provides a script that clones apache/parquet-mr and builds a distribution of its parquet-tools submodule, a command-line utility for reading Parquet files.


The repository has the following structure:

  • bin: binaries copied from parquet-tools with some minor changes, e.g. parquet-cat
  • lib: jar files that will be included in the distribution; acts as a staging folder
  • sbin: scripts to build the distribution
  • staging: staging folder for cloned repositories (one folder per tag)


The script creates tar.gz and zip distributions with or without the Hadoop dependency. In the archive name parquet-tools-dist-TAG-VERSION.tar.gz, TAG is the provided parquet-mr tag and VERSION is the version of this repository, not of parquet-tools or Hadoop. The suffix -dh is added to the name when the Hadoop client dependency is included. Some versions of parquet-tools have already been prepared; see releases for more info.
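The naming convention above can be sketched as a small shell function. This is only an illustration: dist_name is a hypothetical helper, not part of the repository, the version 0.1.0 is made up, and placing -dh directly after VERSION is an assumption, since the text only says the suffix is included in the name.

```shell
# Hypothetical helper (not part of the repository): compose an archive name.
# Arguments: TAG, VERSION, CLIENT (true/false), EXT (tar.gz or zip).
# Assumption: the -dh suffix is appended right after VERSION.
dist_name() {
  tag="$1"; version="$2"; client="$3"; ext="$4"
  suffix=""
  if [ "$client" = "true" ]; then
    suffix="-dh"
  fi
  printf 'parquet-tools-dist-%s-%s%s.%s\n' "$tag" "$version" "$suffix" "$ext"
}

# Example with a made-up repository version 0.1.0:
dist_name apache-parquet-1.8.1 0.1.0 true tar.gz
```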


To build parquet-tools you must have python, git and mvn installed; the script checks that they are available. The project currently works with, and has only been tested on, Python 2.7.x, but it should be trivial to extend it to Python 3.x.
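If you want to verify the prerequisites yourself before running the build, a minimal sketch follows. It is not the check the build script performs; missing_tools is a hypothetical helper that relies only on the standard command -v lookup.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Check the three tools the build requires.
missing="$(missing_tools python git mvn)"
if [ -n "$missing" ]; then
  echo "Missing required tools:" $missing >&2
else
  echo "All required tools found."
fi
```

Running python --version afterwards shows whether the interpreter on PATH is a 2.7.x release.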


You can build the distribution with or without the Hadoop dependency (see parquet-tools for more info), that is, with or without the Hadoop client library included in the uber-jar.

Without Hadoop dependency:

cd parquet-tools-assembly && sbin/ --tag=XYZ

With Hadoop dependency:

cd parquet-tools-assembly && sbin/ --tag=XYZ --client=true


  • --tag - the parquet-mr repository tag to use, e.g. apache-parquet-1.8.1. See all available tags.
  • --client - whether or not the Hadoop client library should be included. If true, the distribution name includes the -dh suffix.

Once the archives are built, extract them into the desired directory:

tar zxf parquet-tools-dist.tar.gz
cd parquet-tools-dist

Then use the scripts:

bin/parquet-schema /path/to/parquet-file
bin/parquet-head /path/to/parquet-file
bin/parquet-cat /path/to/parquet-file
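
To inspect several files in one go, the commands above can be wrapped in a small function. This is a sketch under assumptions: inspect_parquet and the PARQUET_TOOLS_HOME variable are hypothetical names, not something the distribution provides, and PARQUET_TOOLS_HOME is assumed to point at the extracted distribution directory.

```shell
# Hypothetical wrapper: print the schema, then the first records, of each file.
# PARQUET_TOOLS_HOME is an assumed variable naming the extracted distribution;
# it defaults to the current directory.
inspect_parquet() {
  home="${PARQUET_TOOLS_HOME:-.}"
  for file in "$@"; do
    echo "== $file =="
    "$home/bin/parquet-schema" "$file" || return 1
    "$home/bin/parquet-head" "$file" || return 1
  done
}
```

For example, from inside the extracted directory: inspect_parquet /path/to/parquet-file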