Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width,
XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output
based on a schema written in JSON.
Min Golang Version: 1.14
In the example folders above you will find pairs of input files and their schema files. Their
corresponding output files are in the .snapshots subdirectory.
To try out schemas and inputs, yours or the existing samples, and see how ingestion and transform work,
use https://omniparser.herokuapp.com/ (the Heroku instance may need a few seconds to wake up).
- No good ETL transform/parser library exists in Golang.
- Even looking into Java and other languages, choices aren't many and all have limitations:
  - Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
  - BeanIO can't deal with EDI input.
  - Jolt can't deal with anything other than JSON input.
  - JSONata is still only a JSON -> JSON transform.
- Many of the parsers/transforms don't support streaming read, loading entire input into memory - not acceptable in some
Recent Major Feature Additions/Changes
- 1.0.0 Released!
- Added Transform.RawRecord() for callers of omniparser to access the raw ingested record.
- Deprecated custom_parse in favor of its replacement (custom_parse is still usable for
backward compatibility; it is just removed from all public docs and samples).
- Added NonValidatingReader EDI segment reader.
- Added fixed-length file format support in omniv21 handler.
- Added EDI file format support in omniv21 handler.
- Major restructure/refactoring
- Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
- Changed how we handle custom functions: previously, input parameters and results were always strings.
Now all types are supported for custom function input and output parameters.
- Changed how we package custom functions for extensions: previously we collected custom functions from all
extensions and passed all of them to the extension in use. This felt odd, so now only the custom
functions included in a particular extension are used in that extension.
- Renamed a number of packages.
- Added CSV file format support in omniv2 handler.
- Introduced IDR node cache for allocation recycling.
- Introduced IDR for in-memory data representation.
- Added trie based high performance
- Command line interface (one-off
transform cmd or long-running http
- JSON stream parser.
- Ability to provide custom functions.
- Ability to provide custom schema handler.
- Ability to customize the built-in omniv2 schema handler's parsing code.
- Ability to add support for a new file format to the built-in omniv2 schema handler.