csvz is the hot new open database standard that is taking the entire technological world by storm.
A csvz file is literally just a bunch of csv files, in a zip file, that has been renamed to have a ".csvz" file extension.
Are you using csvz? Why not?
csvz is the brave technology that unites the worlds of data science, sql and no-sql. Is it no-sql's answer to the rdbms? Or is it the rdbms's answer to no-sql? You decide.
csvz-0 - A csvz file is literally just a bunch of csv files, in a zip file with a file name that ends with ".csvz"
csvz-meta-tables - A csvz file can contain a file called tables.csv describing the contents of the file
csvz-meta-columns - A csvz file can contain a file called columns.csv
csvz-meta-relations - A csvz file can contain a file called relations.csv
csvz-meta-csv - A csvz file can contain a file called csv.csv
csvz-meta-per-file - The ability to include individual meta-files per csv file
csvz-compliant Tools and Libraries
The csvz specification is broken into meaningful fragments.
Files can call themselves csvz-compliant if they only comply with the first fragment of the specification, csvz-0.
They can also indicate other fragments of the specification that they have implemented, such as csvz-meta-tables, csvz-meta-relations etc.
csvz-0
A csvz file is literally just a bunch of csv files, in a zip file with a file name that ends with ".csvz"
A csvz file is compliant with csvz-0 if it is literally just a bunch of csv files, in a zip file, that has been renamed to have a ".csvz" file extension.
(Note that each fragment has a fragment identifier written at the beginning of the fragment. For example this is csvz-0 and the next fragment is csvz-meta-tables. Fragments are optional, but it is good to know which fragments you do or do not comply with.)
The csv files themselves should be parseable with most csv reading software.
(Anywhere that this spec refers to "a csv file" it means a file that complies with RFC 4180 or a compatible dialect as described by the CSV on the Web Working Group, unless a stricter definition is explicitly given.)
(Anywhere that the csvz specification refers to "this spec" it means the csvz specification.)
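As a rough illustration of csvz-0, the sketch below packs some rows into a ".csvz" archive and then checks that an archive looks compliant. It uses only Python's standard library. The function names `write_csvz` and `looks_like_csvz_0` are made up for this example and are not part of the spec, and the check is deliberately shallow: it looks at file names only and does not parse the csv contents.

```python
import csv
import io
import zipfile


def write_csvz(path, tables):
    """Write {member_name: rows} as csv members of a zip archive at `path`."""
    if not path.endswith(".csvz"):
        raise ValueError("a csvz file name must end with .csvz")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            buffer = io.StringIO()
            # csv.writer ends records with CRLF by default, as RFC 4180 does.
            csv.writer(buffer).writerows(rows)
            archive.writestr(name, buffer.getvalue())


def looks_like_csvz_0(path):
    """Shallow check: .csvz extension, a valid zip, and only .csv members."""
    if not path.endswith(".csvz") or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Directory entries end with "/" and are not files themselves.
        members = [n for n in archive.namelist() if not n.endswith("/")]
    return bool(members) and all(n.lower().endswith(".csv") for n in members)
```

For example, `write_csvz("demo.csvz", {"states.csv": [["code", "name"], ["WA", "Washington"]]})` produces an archive for which `looks_like_csvz_0("demo.csvz")` returns `True`.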
csvz-meta-tables
A csvz file can contain a file called tables.csv describing the contents of the file
Metadata about the contents of the csvz file is contained in a directory called "_meta". The file tables.csv, if present, is inside this directory.
(Assume that the csvz reserves the right to create other .csv files under the _meta folder, and to create more folders under it. Details appear in subsequent spec fragments.)
The file tables.csv contains metadata about all of the csv files included in the csvz file.
(The file tables.csv is a csv file.)
(Anywhere that this spec refers to a file with a name that ends with ".csv" it means the file is a "csv file", as described in csvz-0.)
The file tables.csv meets the following description:
[...] csvz file
bytes - the size of the file in bytes
rows - the number of rows in the file
columns - the number of columns in the file
description - a description of the file
published - the date the data in the file was first published
source - information about the source of the data in the file
has-column-names - a true/false value indicating if the file has a header row containing column names
skip-rows - How many rows need to be skipped, before the data begins? (Rarely need to specify this, but when you need it, you need it!)
(If there is also a csv.csv, then tables.csv has precedence over csv.csv, for the file it describes. For example csv.csv may indicate that all files have header rows, but a specific file may not, and this would be indicated in tables.csv)
tables.csv may also describe itself. See Russell. Note that bytes (for example) might cause a paradox.
(The word "must" is used for parts of the specification that are required for a file or tool to claim compliance with the standards described in this spec. The word "may" is used for parts which are not required; optional sections may be covered in more detail, as required elements in a subsequent fragment of this spec.)
(Whenever suggestions are provided, they are not required for conformance with the current spec fragment. These suggestions may be described more fully in later spec fragments, in which they may be required.)
(Expectations around the encoding of true/false values, and other fundamental data-types, are not currently defined.)
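One way a tool might generate a tables.csv is sketched below, using Python's standard library. Several details are assumptions made for the example rather than rules from the spec: the key column is called `filename`, true/false values are written as the strings "true" and "false", member files are UTF-8, and `rows` counts every record including any header row. The function name `build_tables_csv` is hypothetical.

```python
import csv
import io
import zipfile


def build_tables_csv(csvz_path, has_column_names=True):
    """Return the text of a tables.csv describing each data file in the archive."""
    out = io.StringIO()
    writer = csv.writer(out)
    # "filename" is an assumed column name; the others are named in the spec.
    writer.writerow(["filename", "bytes", "rows", "columns", "has-column-names"])
    with zipfile.ZipFile(csvz_path) as archive:
        for info in archive.infolist():
            name = info.filename
            if not name.endswith(".csv") or name.startswith("_meta/"):
                continue  # describe data files only, not metadata or folders
            text = archive.read(info).decode("utf-8")  # assumed encoding
            records = list(csv.reader(io.StringIO(text, newline="")))
            # Widest record wins, in case some rows are ragged.
            width = max((len(record) for record in records), default=0)
            writer.writerow([
                name,
                info.file_size,  # uncompressed size in bytes
                len(records),    # assumption: the header row is counted
                width,
                "true" if has_column_names else "false",  # assumed encoding
            ])
    return out.getvalue()
```

The returned text could then be stored in the archive as `_meta/tables.csv`. This sketch leaves tables.csv out of its own listing, which also sidesteps the `bytes` paradox mentioned above.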
csvz-meta-columns
A csvz file can contain a file called columns.csv
Metadata about the contents of the csvz file is contained in a directory called "_meta". The file columns.csv, if present, is inside this directory.
The file columns.csv contains metadata about all of the columns in all of the csv files included in the csvz file.
The file columns.csv meets the following description:
data-type - the type of the column. (Data-types are not described in this spec fragment, and will be covered in later spec fragments.)
nullable - a true/false value indicating if the column can be null
max-length - a nullable column, that describes the maximum length of the column, in cases where the data-type supports a maximum length
unique - a true/false value indicating if the values in the column should be unique
primary-key - a true/false value indicating if the column can serve as (part or whole of) the primary key of the table.
description - a description of the column
units - a nullable name or description of the unit of measure
ordinal - the order in which the columns have been written to the file. In cases where there is no header row, or where columns are re-ordered, this can be helpful.
published - the date the data in the file was first published
source - information about the source of the data in the file
(The word "should" is used for parts of the specification that are not required, but which will lead to difficulty for users of the data or the tools if they are not complied with.)
csvz-meta-relations
A csvz file can contain a file called relations.csv
Metadata about the contents of the csvz file is contained in a directory called "_meta". The file relations.csv, if present, is inside this directory.
The file relations.csv contains metadata about all of the relationships between any of the columns in any of the files in the csvz file.
The file relations.csv meets the following description:
[...] csvz file.
csvz-meta-csv
A csvz file can contain a file called csv.csv
(todo: this section is still very much a draft)
Metadata about the rules of the csvz file is contained in a directory called "_meta". The file csv.csv, if present, is inside this directory.
The file csv.csv contains metadata about how the csv files in this csvz file are formatted, from a general csv standards point of view.
(Later spec fragments will give exact definitions for the expected columns and supported columns, their possible values and the meanings of those values.)
But to comply with csvz-meta-csv the file csv.csv must:
strict-4180
field separator - examples comma, tab, semicolon, space, various emoji
row separator - examples CRLF, LF, CR, semicolon, exclamation point, backtick
qualifier - what qualifiers (if any) are used for embedding delimiters. Perhaps qualifiers are not used. Can single/double/mixed/other be used?
escaping - Are qualifiers doubled or escaped? (If escaped, escaped with what?)
null-values - how are null-values represented? e.g. the literal string null with no quotes? or NULL, or nil? Or are empty strings, unquoted, to be treated as NULLs?
has-column-names - a true/false value indicating if every csv file (other than this one) has a header row containing column names. (Can be overridden by a has-column-names value in the tables.csv file, if present.)
data-types can be handled elsewhere, but a limited number of common fundamental data-types could be most expediently described in csv.csv, such as:
(todo: See also csvw dialect descriptions)
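The has-column-names override described above (a value in tables.csv wins over the general value in csv.csv) could be resolved along these lines. This is only a sketch against a draft section. It assumes csv.csv has already been read into a single dictionary of settings, that tables.csv rows are dictionaries keyed by an assumed `filename` column, that true is encoded as the string "true", and that an empty value means "not specified". The function name is hypothetical.

```python
def effective_has_column_names(filename, csv_defaults, tables_rows):
    """Resolve has-column-names for one file; tables.csv wins over csv.csv."""
    for row in tables_rows:
        value = row.get("has-column-names", "")
        # Only a non-empty value for this exact file counts as an override.
        if row.get("filename") == filename and value != "":
            return value == "true"
    # No per-table value, so fall back to the archive-wide setting.
    return csv_defaults.get("has-column-names", "") == "true"
```

With `csv_defaults = {"has-column-names": "true"}` and a tables.csv row `{"filename": "raw.csv", "has-column-names": "false"}`, the function returns `False` for raw.csv and `True` for every other file.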
csvz-meta-per-file
The ability to include individual meta-files per csv file
This fragment extends all other csvz-meta-* fragments.
Consider an example where a single csv file, people.csv, inside the csvz follows different standards to the other files.
Its csv conventions could be described in a file, _meta/csv/people.csv, and those would be taken to override the conventions in _meta/csv.csv.
Similarly, a file can have its own _meta/tables/{filename}.csv file, _meta/columns/{filename}.csv and _meta/relations/{filename}.csv.
This method can be assumed to extend to all other _meta/*.csv files.
A per-file meta file is assumed to have higher precedence than the files directly contained in _meta/*.csv.
For example: if _meta/columns.csv described the columns of states.csv in one way, but _meta/columns/states.csv described those columns in another way, all details for states.csv in _meta/columns.csv should be ignored, and those in _meta/columns/states.csv used instead. (i.e. they are not combined).
(Note - combining might be more interesting, useful. Would let you build up/inherit attributes. But would also need a way to "erase" a rule, and I can't think of a way to do that so let's stick with "no combining")
(Suggestion for authors of tooling that reads these files: they may want to provide optional debug information that describes where meta data was sourced from, highlighting situations where precedence rules needed to be applied.)
You can also mix and match _meta/*.csv with per-file meta information, without loss of meaning.
For example the table states.csv may be described in _meta/tables.csv while its columns may be described in _meta/columns/states.csv
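The precedence rule above can be sketched as a small lookup that reports which meta file describes a given csv file. The function name `meta_source` is hypothetical, and the sketch assumes data files sit in the root of the archive, so that the per-file path is simply `_meta/{kind}/` followed by the data file's name. Returning the chosen path also gives tooling the "where was this sourced from" debug information suggested above.

```python
def meta_source(archive_names, kind, data_file):
    """Return the path of the meta file that describes `data_file`, or None.

    `kind` is one of "tables", "columns", "relations" or "csv".
    `archive_names` is the collection of member names in the csvz archive.
    """
    per_file = f"_meta/{kind}/{data_file}"
    shared = f"_meta/{kind}.csv"
    if per_file in archive_names:
        # A per-file meta file replaces the shared one; they are not combined.
        return per_file
    if shared in archive_names:
        return shared
    return None
```

Given an archive containing `_meta/tables.csv`, `_meta/columns.csv` and `_meta/columns/states.csv`, `meta_source(names, "columns", "states.csv")` returns `"_meta/columns/states.csv"`, while `meta_source(names, "tables", "states.csv")` returns `"_meta/tables.csv"`.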
More meta-* spec fragments may be needed to describe other meta files.
For example:
indexes - what indexes can/should be built on the tables (if the data)
data-types - what types are used, how are they encoded (e.g. dates: how? binary data base-64 encoded? etc), what ranges exist for numbers etc.
user-defined-types - how can types be extended?
schemas - consider situations where directories are used to describe separate schemas\databases (or other namespacing concepts)
directories - instead of defining schemas (or other namespaces) perhaps the concept of directories could be directly described, a kind of set of routing rules/conventions. In the directories.csv you might in effect say, the directories in the root directory (other than _meta) are to be treated as "server" names, the next level are to be treated as "share" names... or perhaps you will say, under "/databases" the directory names in there are treated as "database" names, and the names under that are "schema" names.
naming - perhaps you will define naming conventions, e.g. ways to pull data from names, or use names to know which files can be combined into one logical unit later.
deltas - csv files may hold operations on data, instead of data itself, i.e. details of insert, update, delete, (upsert) operations
constraints - what other constraints exist on the data
partitions - consider situations where a single table is split across multiple files, and/or a csvz itself is split amongst multiple files, including
formulas - are there calculated columns? what form do the calculations take?
csvz-compliant Tools and Libraries
The following tools and libraries are able to read, write or process .csvz files.
Tool | Actions | Compliance | Description |
---|---|---|---|
Sylvan.Data.CsvZip | Create / Read | csvz-0, csvz-meta-tables, csvz-meta-columns | Library for programmatically creating and reading .csvz files |
Sylvan.Tools.CsvZip | Create | csvz-0, csvz-meta-tables, csvz-meta-columns | .NET global tool for creating .csvz files from the commandline |
||| Packs a set of csv files into a new csvz file, and generates a tables.csv and columns.csv |
||| Converts a .csvz file into a .xlsx file, that can be opened by Excel. |
||| Converts a .csvz file into a .xlsx file, that can be opened by Excel. |
||| Converts a .xlsx file into a .csvz file (note that not all of Excel's features are respected.) |
||| Exports a sqlite database into a new .csvz file |
||| Creates a new sqlite database from a .csvz file |
||| Exports a mysql database into a new .csvz file |
||| Creates a new PostgreSQL database from a .csvz file |
||| Exports a PostgreSQL database into a new .csvz file |
||| Save a JSON file as a series of csv files and _meta files (ready for zipping) |
||| Load some or all of an unzipped csvz as a single json object (limited filtering ability) |
||| Validates which spec fragments a csvz file complies with |
||| (More tools...) |
If you know of a csvz compliant tool, or you have created one (hint hint), a pull request is welcome.
Suggestion: You can use existing csvz or csv libraries to build a new type of connection (e.g. a tool to create/read csvz files from an Oracle database, using existing libraries, would take some Oracle knowledge, and not much else.)
To experience the fun of contributing, see Contributing
Contributors definitely includes people who raise issues. Raising issues is the quickest way to contribute. Also look for issues marked good first issue or help wanted
A community forum for discussion/ideas for implementors and tool builders is much needed; follow issue #14 to find where the community will be built.
To the extent possible under law, Leon Bambrick has waived all copyright and related or neighboring rights to this work.
Some ideas are too smart to live; other ideas are too dumb to die.