Awesome Open Source
Awesome Open Source

Latest Stable Version License Build Status Coverage Status Quality Score Total Downloads

Okay, so you heard about the Extract / Transform / Load pattern and you're looking for a PHP library to do the stuff.

Alright, let's go!

Installation

composer require bentools/etl:^[email protected]

Warning: version 3.0 is a complete rewrite and a involves important BC breaks. Don't upgrade from ^2.0 unless you know what you're doing!

Usage

To sum up, you will apply transformations onto an iterable of any things in order to load them in some place. Sometimes your iterable is ready to go, sometimes you just don't need to perform transformations, but anyway you need to load that data somewhere.

Let's start with a really simple example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Loader\JsonFileLoader;

$data = [
    'foo',
    'bar',
];

$etl = EtlBuilder::init()
    ->loadInto(JsonFileLoader::toFile(__DIR__.'/data.json'))
    ->createEtl();
$etl->process($data);

Basically you just loaded the string ["foo","bar"] into data.json. Yay!

Now let's apply a basic uppercase transformation:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Loader\JsonFileLoader;

$data = [
    'foo',
    'bar',
];

$etl = EtlBuilder::init()
    ->transformWith(new CallableTransformer('strtoupper'))
    ->loadInto(JsonFileLoader::factory())
    ->createEtl();
$etl->process($data, __DIR__.'/data.json'); // You can also set the output file when processing :-)

Didn't you just write the string ["FOO","BAR"] into data.json ? Yes, you did!

Okay, but what if your source data is not an iterable (yet)? It can be a CSV file or a CSV string, for instance. Here's another example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Extractor\CsvExtractor;
use BenTools\ETL\Loader\JsonFileLoader;

$data = <<<CSV
country_code,country_name,president
US,USA,"Donald Trump"
RU,Russia,"Vladimir Putin"
CSV;

$etl = EtlBuilder::init()
    ->extractFrom(new CsvExtractor())
    ->loadInto(JsonFileLoader::factory(['json_options' => \JSON_PRETTY_PRINT]))
    ->createEtl();
$etl->process($data, __DIR__.'/data.json');

As you guessed, the following content was just written into presidents.json:

[
    {
        "country_code": "US",
        "country_name": "USA",
        "president": "Donald Trump"
    },
    {
        "country_code": "RU",
        "country_name": "Russia",
        "president": "Vladimir Putin"
    }
]

We provide helpful extractors and loaders to manipulate JSON, CSV, text, and you'll also find a DoctrineORMLoader for when your transformer yields Doctrine entities.

Because yes, a transformer must return a \Generator. Why? Because a single extracted item can lead to several output items. Let's take a more sophisticated example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Extractor\JsonExtractor;

$pdo = new \PDO('mysql:host=localhost;dbname=test');
$input = __DIR__.'/presidents.json';

$etl = EtlBuilder::init()
    ->extractFrom(new JsonExtractor())
    ->transformWith(
        function ($item) use ($pdo) {
            $stmt = $pdo->prepare('SELECT country_code FROM countries WHERE country_code = ?');
            $stmt->bindValue(1, $item['country_code'], \PDO::PARAM_STR);
            $stmt->execute();
            if (0 === $stmt->rowCount()) {
                yield ['INSERT INTO countries (country_code, country_name) VALUES (?, ?)', [$item['country_code'], $item['country_name']]];
            }

            yield ['REPLACE INTO presidents (country_code, president_name) VALUES (?, ?)', [$item['country_code'], $item['president']]];

        }
    )
    ->loadInto(
        $loader = function (\Generator $queries) use ($pdo) {
            foreach ($queries as $query) {
                list($sql, $params) = $query;
                $stmt = $pdo->prepare($sql);
                foreach ($params as $i => $value) {
                    $stmt->bindValue($i + 1, $value);
                }
                $stmt->execute();
            }
        }
    )
    ->createEtl();

$etl->process(__DIR__.'/presidents.json'); // The JsonExtractor will convert that file to a PHP array

As you can see, from a single item, we loaded up to 2 queries.

Your extractors, transformers and loaders can implement ExtractorInterface, TransformerInterface or LoaderInterface as well as being simple callables.

Skipping items

Each extractor / transformer / loader callback gets the current Etl object injected in their arguments.

This allows you to ask the ETL to skip an item, or even to stop the whole process:

use BenTools\ETL\Etl;
use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Transformer\CallableTransformer;

$fruits = [
    'apple',
    'banana',
    'strawberry',
    'pineapple',
    'pear',
];


$storage = [];
$etl = EtlBuilder::init()
    ->transformWith(new CallableTransformer('strtoupper'))
    ->loadInto(
        function ($generated, $key, Etl $etl) use (&$storage) {
            foreach ($generated as $fruit) {
                if ('BANANA' === $fruit) {
                    $etl->skipCurrentItem();
                    break;
                }
                if ('PINEAPPLE' === $fruit) {
                    $etl->stopProcessing();
                    break;
                }
                $storage[] = $fruit;
            }
        })
    ->createEtl();

$etl->process($fruits);

var_dump($storage); // ['APPLE', 'STRAWBERRY']

Events

Now you're wondering how you can hook on the ETL lifecycle, to log things, handle exceptions, ... This library ships with a built-in Event Dispatcher that you can leverage when:

  • The ETL starts
  • An item has been extracted
  • The extraction failed
  • An item has been transformed
  • Transformation failed
  • Loader is initialized (1st item is about to be loaded)
  • An item has been loaded
  • Loading failed
  • An item has been skipped
  • The ETL was stopped
  • A flush operation was completed
  • A rollback operation was completed
  • The ETL completed the whole process.

The ItemEvents (on extract, transform, load) will allow you to mark the current item to be skipped, or even handle runtime exceptions. Let's take another example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\EventDispatcher\Event\ItemExceptionEvent;

$fruits = [
    'apple',
    new \RuntimeException('Is tomato a fruit?'),
    'banana',
];


$storage = [];
$etl = EtlBuilder::init()
    ->transformWith(
        function ($item, $key) {
            if ($item instanceof \Exception) {
                throw $item;
            }

            yield $key => $item;
        })
    ->loadInto(
        function (iterable $transformed) use (&$storage) {
            foreach ($transformed as $fruit) {
                $storage[] = $fruit;
            }
        })
    ->onTransformException(
        function (ItemExceptionEvent $event) {
            echo $event->getException()->getMessage(); // Is tomato a fruit?
            $event->ignoreException();
        })
    ->createEtl();

$etl->process($fruits);

var_dump($storage); // ['apple', 'banana']

Here, we intentionnally threw an exception during the transform operation. But thanks to the event dispatcher, we could tell the ETL this exception can be safely ignored and it can pursue the rest of the process.

You can attach as many event listeners as you wish, and sort them by priority.

Recipes

A recipe is an ETL pattern that can be reused through different tasks. If you want to log everything that goes through an ETL for example, use our built-in Logger recipe:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Recipe\LoggerRecipe;

$etl = EtlBuilder::init()
    ->useRecipe(new LoggerRecipe($logger))
    ->createEtl();

You can also create your own recipes:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\EventDispatcher\Event\ItemEvent;
use BenTools\ETL\Extractor\JsonExtractor;
use BenTools\ETL\Loader\CsvFileLoader;
use BenTools\ETL\Recipe\Recipe;


class JSONToCSVRecipe extends Recipe
{
    /**
     * @inheritDoc
     */
    public function updateBuilder(EtlBuilder $builder): EtlBuilder
    {
        return $builder
            ->extractFrom(new JsonExtractor())
            ->loadInto($loader = CsvFileLoader::factory(['delimiter' => ';']))
            ->onLoaderInit(
                function (ItemEvent $event) use ($loader) {
                    $loader::factory(['keys' => array_keys($event->getItem())], $loader);
                })
            ;
    }

}

$builder = EtlBuilder::init()->useRecipe(new JSONToCSVRecipe());
$etl = $builder->createEtl();
$etl->process(__DIR__.'/presidents.json', __DIR__.'/presidents.csv');

The above example will result in presidents.csv containing:

country_code;country_name;president
US;USA;"Donald Trump"
RU;Russia;"Vladimir Putin"

To sum up, a recipe is a kind of an ETLBuilder factory, but keep in mind that a recipe will only add event listeners to the existing builder but can also replace the builder's extractor, transformer and/or loader.

Tests

./vendor/bin/phpunit

License

MIT


Get A Weekly Email With Trending Projects For These Topics
No Spam. Unsubscribe easily at any time.
php (15,969) 
transformer (167) 
loader (160) 
input (119) 
etl (94) 
extract (65) 
export (64) 
import (50) 
transform (46) 
pattern (45) 
extractor (22) 
loop (22) 
load (19) 

Find Open Source By Browsing 7,000 Topics Across 59 Categories

Advertising 📦 10
All Projects
Application Programming Interfaces 📦 124
Applications 📦 192
Artificial Intelligence 📦 78
Blockchain 📦 73
Build Tools 📦 113
Cloud Computing 📦 80
Code Quality 📦 28
Collaboration 📦 32
Command Line Interface 📦 49
Community 📦 83
Companies 📦 60
Compilers 📦 63
Computer Science 📦 80
Configuration Management 📦 42
Content Management 📦 175
Control Flow 📦 213
Data Formats 📦 78
Data Processing 📦 276
Data Storage 📦 135
Economics 📦 64
Frameworks 📦 215
Games 📦 129
Graphics 📦 110
Hardware 📦 152
Integrated Development Environments 📦 49
Learning Resources 📦 166
Legal 📦 29
Libraries 📦 129
Lists Of Projects 📦 22
Machine Learning 📦 347
Mapping 📦 64
Marketing 📦 15
Mathematics 📦 55
Media 📦 239
Messaging 📦 98
Networking 📦 315
Operating Systems 📦 89
Operations 📦 121
Package Managers 📦 55
Programming Languages 📦 245
Runtime Environments 📦 100
Science 📦 42
Security 📦 396
Social Media 📦 27
Software Architecture 📦 72
Software Development 📦 72
Software Performance 📦 58
Software Quality 📦 133
Text Editors 📦 49
Text Processing 📦 136
User Interface 📦 330
User Interface Components 📦 514
Version Control 📦 30
Virtualization 📦 71
Web Browsers 📦 42
Web Servers 📦 26
Web User Interface 📦 210