
⚡ PHP7 / Laravel Multi-format Streaming Parser


When parsing XML, CSV, JSON, or similar documents, there are two approaches to consider:

DOM loading: loads the entire document into memory, which makes it easy to navigate and parse and gives developers maximum flexibility.

Streaming: iterates through the document like a cursor, stopping at each element along the way, so memory usage stays low.

https://www.linkedin.com/pulse/processing-xml-documents-dom-vs-streaming-marius-ilina/

Thus, for big files, streaming lets the callbacks run while the file is still downloading, which is much more efficient as far as memory is concerned.
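
To make the difference concrete, here is a minimal sketch that reads the same small document both ways using only PHP's bundled SimpleXML and XMLReader extensions. It is an illustration of the two approaches, not a description of how this library is implemented; the sample XML and the variable names are made up for the example.

```php
<?php
// Sample document, kept in memory so the sketch is self-contained.
$xml = <<<XML
<bookstore>
    <book><title>The Iliad and The Odyssey</title><price>12.95</price></book>
    <book><title>Anthology of World Literature</title><price>24.95</price></book>
</bookstore>
XML;

// DOM-style loading: the whole tree is built in memory before any access.
$dom = simplexml_load_string($xml);
$domTitles = [];
foreach ($dom->book as $book) {
    $domTitles[] = (string) $book->title;
}

// Streaming: a cursor advances node by node and only the current <book>
// element is expanded into an object.
$reader = new XMLReader();
$reader->XML($xml);
$streamTitles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        $book = simplexml_load_string($reader->readOuterXml());
        $streamTitles[] = (string) $book->title;
    }
}
$reader->close();
```

Both loops collect the same titles. With a multi-gigabyte file, only the streaming variant avoids holding the full tree in memory.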

Installation

composer require rodenastyle/stream-parser

Recommended usage

Delegate the callback's work as much as possible so that it doesn't block the document reading:

(Laravel Queue based example)

use Tightenco\Collect\Support\Collection;

StreamParser::xml("https://example.com/users.xml")->each(function(Collection $user){
    dispatch(new App\Jobs\SendEmail($user));
});
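
Outside Laravel, the same idea can be sketched with plain PHP: the reading loop only enqueues each record, and the slow work happens later in a separate step. This is a framework-free illustration under stated assumptions. The `SplQueue` stands in for a real queue backend, the `php://memory` stream stands in for a remote document, and the e-mail addresses are invented.

```php
<?php
// Hypothetical stand-in for a real queue backend (Redis, SQS, database...).
$queue = new SplQueue();

// Simulated document stream; a real source would be a file or URL handle.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "ada@example.com\nlinus@example.com\n");
rewind($stream);

// Reader: the per-record "callback" does nothing but enqueue,
// so reading is never held up by slow work such as sending mail.
while (($line = fgets($stream)) !== false) {
    $queue->enqueue(trim($line));
}
fclose($stream);

// Worker: drains the queue independently of the reader.
$sent = [];
while (!$queue->isEmpty()) {
    $sent[] = $queue->dequeue();
}
```

In a real deployment the worker would run in another process, which is what `dispatch()` arranges in the Laravel example.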

Practical Input/Code/Output demos

XML

<bookstore>
    <book ISBN="10-000000-001">
        <title>The Iliad and The Odyssey</title>
        <price>12.95</price>
        <comments>
            <userComment rating="4">
                Best translation I've read.
            </userComment>
            <userComment rating="2">
                I like other versions better.
            </userComment>
        </comments>
    </book>
    [...]
</bookstore>

use Tightenco\Collect\Support\Collection;

StreamParser::xml("https://example.com/books.xml")->each(function(Collection $book){
    var_dump($book);
    var_dump($book->get('comments')->toArray());
});

class Tightenco\Collect\Support\Collection#19 (1) {
  protected $items =>
  array(4) {
    'ISBN' =>
    string(13) "10-000000-001"
    'title' =>
    string(25) "The Iliad and The Odyssey"
    'price' =>
    string(5) "12.95"
    'comments' =>
    class Tightenco\Collect\Support\Collection#17 (1) {
      protected $items =>
      array(2) {
        ...
      }
    }
  }
}
array(2) {
  [0] =>
  array(2) {
    'rating' =>
    string(1) "4"
    'userComment' =>
    string(27) "Best translation I've read."
  }
  [1] =>
  array(2) {
    'rating' =>
    string(1) "2"
    'userComment' =>
    string(29) "I like other versions better."
  }
}

Additionally, you can use ->withSeparatedParametersList() to get each element's attributes separated out under the __params property. Also, ->withoutSkippingFirstElement() can help when you need to parse the very first item, which is skipped by default (usually the element that contains the other elements).

JSON

[
  {
    "title": "The Iliad and The Odyssey",
    "price": 12.95,
    "comments": [
      {"comment": "Best translation I've read."},
      {"comment": "I like other versions better."}
    ]
  },
  {
    "title": "Anthology of World Literature",
    "price": 24.95,
    "comments": [
      {"comment": "Needs more modern literature."},
      {"comment": "Excellent overview of world literature."}
    ]
  }
]

use Tightenco\Collect\Support\Collection;

StreamParser::json("https://example.com/books.json")->each(function(Collection $book){
    var_dump($book->get('comments')->count());
});

int(2)
int(2)

CSV

title,price,comments
The Iliad and The Odyssey,12.95,"Best translation I've read.,I like other versions better."
Anthology of World Literature,24.95,"Needs more modern literature.,Excellent overview of world literature."

use Tightenco\Collect\Support\Collection;

StreamParser::csv("https://example.com/books.csv")->each(function(Collection $book){
    var_dump($book->get('comments')->last());
});

string(29) "I like other versions better."
string(39) "Excellent overview of world literature."

License

This library is released under the MIT license.

