AWS Glacier Multipart Upload

Script for uploading large files to AWS Glacier

Helpful AWS Glacier pages:

Running scripts in parallel:

Motivation

The one-liner upload-archive isn't recommended for files over 100 MB; you should use a multipart upload instead. The difficult part of a multipart upload is that it is really three major commands (initiate-multipart-upload, upload-multipart-part, and complete-multipart-upload), with the second needing to be repeated for every file chunk, and a custom byte range needs to be defined for each chunk that is uploaded. For example, with a 4 MB chunk size (4194304 bytes), the first three chunks need the following commands. This is repeated 1945 times for my 8 GB file.

  • aws glacier upload-multipart-part --body partaa --range 'bytes 0-4194303/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • aws glacier upload-multipart-part --body partab --range 'bytes 4194304-8388607/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • aws glacier upload-multipart-part --body partac --range 'bytes 8388608-12582911/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • 1941 commands later...
  • aws glacier upload-multipart-part --body partzbxu --range 'bytes 8153726976-8157921279/*' --account-id - --vault-name media1 --upload-id [your upload id here]

We need a script to handle the math and autogenerate the code.
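As a minimal sketch of that math (not the code from glacierupload.sh), the byte range for each part follows from its zero-based index and the chunk size. The function name part_range is hypothetical, and the total size of 8157921280 bytes is an assumption derived from the example above (1945 parts of 4194304 bytes).

```shell
# Print the --range value for one part.
# Arguments: zero-based part index, chunk size in bytes, total archive size in bytes.
# part_range is a hypothetical helper name, not part of the AWS CLI.
part_range() (
  index=$1
  chunk=$2
  total=$3
  start=$(( index * chunk ))
  end=$(( start + chunk - 1 ))
  # The final part may be shorter than the chunk size, so clamp to the last byte.
  if [ "$end" -gt $(( total - 1 )) ]; then
    end=$(( total - 1 ))
  fi
  printf 'bytes %s-%s/*\n' "$start" "$end"
)

part_range 0 4194304 8157921280      # bytes 0-4194303/*
part_range 1 4194304 8157921280      # bytes 4194304-8388607/*
part_range 1944 4194304 8157921280   # bytes 8153726976-8157921279/*
```

The three calls reproduce the first, second, and last ranges listed above.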

This script leverages GNU parallel, so my 1945 upload commands are submitted together but queued: each one waits until a core has finished its current upload before starting. There is even a built-in progress bar that shows the percent complete and an estimated time remaining.
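The queueing idea can be sketched as follows. This is an illustration under stated assumptions, not the contents of glacierupload.sh: the function name build_upload_commands, the file name commands.txt, and the placeholder upload id are all hypothetical. The function writes one upload-multipart-part command per part file, and the commented line at the end shows how such a list could be handed to GNU parallel.

```shell
# Write one upload command per part file into a command list.
# Arguments: directory holding part* files, chunk size in bytes,
#            vault name, upload id, output file.
# build_upload_commands is a hypothetical helper name.
build_upload_commands() (
  dir=$1
  chunk=$2
  vault=$3
  upload_id=$4
  out=$5
  index=0
  : > "$out"
  # The glob expands in sorted order, which matches the order split created the parts in.
  for f in "$dir"/part*; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    start=$(( index * chunk ))
    # Use the real file size so a shorter final part gets a shorter range.
    end=$(( start + size - 1 ))
    printf "aws glacier upload-multipart-part --body '%s' --range 'bytes %s-%s/*' --account-id - --vault-name %s --upload-id %s\n" \
      "$f" "$start" "$end" "$vault" "$upload_id" >> "$out"
    index=$(( index + 1 ))
  done
)

# Example (assumed names): build the list, then let parallel run one line per job
# with a progress bar. By default parallel runs one job per CPU core.
#   build_upload_commands . 4194304 media1 "[your upload id here]" commands.txt
#   parallel --bar -a commands.txt
```

Writing the commands to a file first makes it easy to inspect the generated ranges before anything is uploaded.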

Prerequisites

All of the following items in the Prerequisites section only need to be done once to set things up.

This script depends on jq for handling JSON and on parallel for submitting the upload commands in parallel. If you are using Fedora/CentOS/RHEL, run the following:

sudo dnf install jq
sudo dnf install parallel

It assumes you have an AWS account and have signed up for the Glacier service. In this example, I have already created the vault named media1 via the AWS console.

It also assumes that you have the AWS Command Line Interface installed on your machine. Again, if you are using Fedora/CentOS/RHEL, here is how you would get it:

sudo pip install awscli

Configure your machine to pass credentials automatically. This allows you to pass a single dash as the account-id argument.

aws configure

Before jumping into the script, verify that your connection works by describing the vault you have created, which is media1 in my case. Run this describe-vault command and you should see similar JSON results.

aws glacier describe-vault --vault-name media1 --account-id -
{
"SizeInBytes": 11360932143, 
"VaultARN": "arn:aws:glacier:us-east-1:<redacted>:vaults/media1", 
"LastInventoryDate": "2015-12-16T01:23:18.678Z", 
"NumberOfArchives": 7, 
"CreationDate": "2015-12-12T02:22:24.956Z", 
"VaultName": "media1"
}

Download the glacierupload.sh script:

wget https://raw.githubusercontent.com/benporter/aws-glacier-multipart-upload/master/glacierupload.sh

Make it executable:

chmod u+x glacierupload.sh

Script Usage

Tar and zip the files you want to upload:

tar -zcvf my-backup.tar.gz /location/to/zip/*

Now split your zipped file into equal-size chunks. Glacier requires the part size to be 1 MB multiplied by a power of two (1 MB, 2 MB, 4 MB, and so on). This example splits the my-backup.tar.gz file into 4 MB chunks, matching the 4194304-byte ranges shown earlier, and gives all of them the prefix part, which is what the script expects to see. If you choose something other than part, you'll need to edit the script.

split --bytes=4194304 --verbose my-backup.tar.gz part
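As a quick sanity check after splitting, the number of part files should equal the archive size divided by the chunk size, rounded up. The sketch below assumes a hypothetical helper named expected_parts; it is not part of the script.

```shell
# Print how many chunks a file of the given size produces.
# Arguments: file size in bytes, chunk size in bytes.
# expected_parts is a hypothetical helper name.
expected_parts() (
  size=$1
  chunk=$2
  # Integer ceiling division: a trailing partial chunk still counts as one part.
  echo $(( (size + chunk - 1) / chunk ))
)

expected_parts 8157921280 4194304   # 1945

# Example comparison against the files split produced (assumed file names):
#   expected_parts "$(wc -c < my-backup.tar.gz)" 4194304
#   ls part* | wc -l
```

If the two numbers differ, leftover part files from an earlier run are a likely cause.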

Now it is time to run the script. It assumes that your part* files are in the same directory as the script.

./glacierupload.sh