Parquet Tools Assembly

Parquet-tools assembly and distribution

This repository provides a script that clones apache/parquet-mr and builds a distribution of its parquet-tools submodule, a command-line utility for reading Parquet files.

Structure

The repository has the following structure:

  • bin - scripts copied from parquet-tools with some minor changes, e.g. parquet-cat
  • lib - JAR files that will be included in the distribution; acts as a staging folder
  • sbin - scripts that build the distribution
  • staging - staging folder for cloned repositories (one folder per tag)

Assembly

The script creates tar.gz and zip distributions, with or without the Hadoop dependency. In the archive name parquet-tools-dist-TAG-VERSION.tar.gz, TAG is the parquet-mr tag provided to the script and VERSION is the version of this repository, not of parquet-tools or Hadoop. The suffix -dh is included in the name when the Hadoop client dependency is included. Distributions for some versions of parquet-tools have already been prepared; see the releases for more info.
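The naming rules above can be checked mechanically. The sketch below is a hypothetical helper (describe_dist is not part of this repository): it reports the archive format and whether the Hadoop client is bundled, relying only on the .tar.gz/.zip extensions and the -dh marker, because the exact position of -dh within the name is not specified here.

```shell
#!/bin/sh
# describe_dist: hypothetical helper, not shipped with parquet-tools-assembly.
# Classifies a distribution archive by its extension and by the -dh marker
# that indicates the Hadoop client dependency is bundled.
describe_dist() {
  name=$(basename "$1")
  case "$name" in
    *.tar.gz) format="tar.gz" ;;
    *.zip)    format="zip" ;;
    *)        echo "unrecognised archive: $name" >&2; return 1 ;;
  esac
  case "$name" in
    *-dh*) variant="with Hadoop client" ;;
    *)     variant="without Hadoop client" ;;
  esac
  echo "$format, $variant"
}
```

For example, describe_dist parquet-tools-dist-apache-parquet-1.8.1-VERSION.zip prints "zip, without Hadoop client", where VERSION stands in for this repository's version.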

Requirements

To build parquet-tools, you must have python, git, and mvn installed; the script checks that these are available. Currently the project works with, and has been tested on, Python 2.7.x only, but it should be straightforward to extend it to Python 3.x.
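The availability check described above can be pictured as follows. This is a minimal POSIX sh sketch using command -v, written for illustration only; the function name check_tools is hypothetical and this is not the repository's actual implementation.

```shell
#!/bin/sh
# check_tools: hypothetical prerequisite check, illustrating the kind of
# test the build script performs before cloning and building.
# Returns non-zero if any of the named tools is missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_tools python git mvn; then
  # Python 2 prints its version to stderr, so merge the streams.
  # Only Python 2.7.x has been tested with this project.
  python --version 2>&1
fi
```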

Usage

You can build the distribution with or without the Hadoop dependency (see parquet-tools for more info), that is, with or without the Hadoop client library bundled into the uber-jar.

Without Hadoop dependency:

cd parquet-tools-assembly && sbin/make-distribution.sh --tag=XYZ

With Hadoop dependency:

cd parquet-tools-assembly && sbin/make-distribution.sh --tag=XYZ --client=true

where:

  • --tag - the parquet-mr repository tag to use, e.g. apache-parquet-1.8.1. See all available tags.
  • --client - whether or not the Hadoop client library should be included. If true, the distribution name will include the -dh suffix.
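Because --client only toggles whether the Hadoop client is bundled, both variants for one tag can be produced back to back. The wrapper below is a hypothetical convenience, not part of the repository; it uses only the --tag and --client=true options documented above and takes the script path as an argument so it can be pointed at sbin/make-distribution.sh.

```shell
#!/bin/sh
# build_variants: hypothetical wrapper that builds the plain distribution
# and the -dh (Hadoop client) distribution for a single parquet-mr tag.
# Usage: build_variants path/to/make-distribution.sh TAG
build_variants() {
  script=$1
  tag=$2
  if [ -z "$tag" ]; then
    echo "usage: build_variants SCRIPT TAG" >&2
    return 2
  fi
  # Stop at the first failure so a broken build is not masked.
  "$script" --tag="$tag" || return 1
  "$script" --tag="$tag" --client=true
}
```

For example, after defining the function in your shell: cd parquet-tools-assembly && build_variants sbin/make-distribution.sh apache-parquet-1.8.1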

Once the archives are built, extract them into the desired directory:

tar zxf parquet-tools-dist.tar.gz
cd parquet-tools-dist
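Since the script produces both tar.gz and zip archives, a small helper can unpack either one into a chosen directory. This is an illustrative sketch: extract_dist is a hypothetical name, the zip branch assumes the unzip utility is installed, and the archive name used here is the same placeholder as in the commands above.

```shell
#!/bin/sh
# extract_dist: hypothetical helper that unpacks a distribution archive
# (tar.gz or zip) into a destination directory, creating it if needed.
# Usage: extract_dist ARCHIVE [DEST]   (DEST defaults to the current dir)
extract_dist() {
  archive=$1
  dest=${2:-.}
  mkdir -p "$dest" || return 1
  case "$archive" in
    *.tar.gz) tar -xzf "$archive" -C "$dest" ;;
    *.zip)    unzip -q "$archive" -d "$dest" ;;   # requires unzip
    *)        echo "unsupported archive: $archive" >&2; return 1 ;;
  esac
}
```

For example, extract_dist parquet-tools-dist.tar.gz /opt followed by cd /opt/parquet-tools-dist mirrors the two commands above, assuming the archive unpacks into a top-level parquet-tools-dist directory as those commands suggest.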

Then use the scripts:

bin/parquet-schema /path/to/parquet-file
bin/parquet-head /path/to/parquet-file
bin/parquet-cat /path/to/parquet-file
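The scripts combine naturally when exploring an unfamiliar file. The function below is a hypothetical convenience, not part of the distribution: it assumes it is given the extracted distribution directory and calls only bin/parquet-schema and bin/parquet-head with a file path, exactly as shown above.

```shell
#!/bin/sh
# inspect_parquet: hypothetical helper that prints a Parquet file's schema
# followed by its first records, using the distribution's bin/ scripts.
# Usage: inspect_parquet DIST_DIR FILE
inspect_parquet() {
  dist=$1
  file=$2
  echo "== schema: $file"
  # Skip the head step if the schema cannot be read.
  "$dist/bin/parquet-schema" "$file" || return 1
  echo "== head: $file"
  "$dist/bin/parquet-head" "$file"
}
```

For example: for f in /data/*.parquet; do inspect_parquet /opt/parquet-tools-dist "$f"; done, where both paths are placeholders.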