Ratarmount collects all file positions inside a TAR so that it can easily jump to and read from any file without extracting it. It then mounts the TAR using fusepy for read access just like archivemount. In contrast to libarchive, on which archivemount is based, random access and true seeking are supported. And in contrast to tarindexer, which also collects file positions for random access, ratarmount offers easy access via FUSE and support for compressed TARs.
TAR compressions supported for random access:
Other supported archive formats:
You can install ratarmount either by simply downloading the AppImage or via pip. The latter might require installing additional dependencies.
The AppImage files are attached under "Assets" on the releases page.
They require no installation and can be simply executed like a portable executable.
If you want to install it, you can simply copy it into any of the folders listed in your
wget 'https://github.com/mxmlnkn/ratarmount/releases/download/v0.10.0/ratarmount-manylinux2014_x86_64.AppImage'
chmod u+x 'ratarmount-manylinux2014_x86_64.AppImage'
./ratarmount-manylinux2014_x86_64.AppImage --help  # Simple test run
sudo cp ratarmount-manylinux2014_x86_64.AppImage /usr/local/bin/ratarmount  # Example installation
Python 3.6+, preferably pip 19.0+, FUSE, and sqlite3 are required. These should be preinstalled on most systems. On Debian-like systems like Ubuntu, you can install/update all dependencies using:
sudo apt install python3 python3-pip fuse sqlite3 unar
On macOS, you have to install macFUSE with:
brew install macfuse
If you are installing on a system for which there exists no manylinux wheel, then you'll have to install dependencies required to build from source:
sudo apt install python3 python3-pip fuse build-essential software-properties-common zlib1g-dev libzstd-dev liblzma-dev cffi
Then, you can simply install ratarmount from PyPI:
pip install ratarmount
Or, if you want to test the latest version:
python3 -m pip install --user --force-reinstall git+https://github.com/mxmlnkn/ratarmount.git@develop#egginfo=ratarmount
If there are troubles with the compression backend dependencies, you can try the pip
Ratarmount will work without the compression backends.
The hard requirements are
fusepy and for Python versions older than 3.7.0
For xz support, lzmaffi will be used if available. Because lzmaffi does not provide wheels and the build from source depends on cffi, which might be missing, only python-xz is a dependency of ratarmount.
If there are problems with xz files, please report any encountered issues. As a quick workaround, you can try to switch out the xz decoder backend by installing lzmaffi manually. Ratarmount will then use it with higher priority:
sudo apt install liblzma-dev
python3 -m pip install --user cffi  # Necessary because of missing pyproject.toml
python3 -m pip install --user lzmaffi
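The "use it if available, otherwise fall back" behavior described above can be sketched in a few lines. This is only an illustration of the idea, not ratarmount's actual selection code. The module names `lzmaffi` and `xz` (for python-xz) are assumptions about how these packages are imported.

```python
import importlib.util


def pick_backend(candidates):
    """Return the name of the first importable module in `candidates`, or None."""
    for name in candidates:
        # find_spec returns None for top-level modules that are not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Assumed import names: "lzmaffi" for lzmaffi and "xz" for python-xz.
xz_backend = pick_backend(["lzmaffi", "xz"])
print("xz backend:", xz_backend or "none installed")
```

Listing the preferred module first is what gives it the higher priority in this sketch.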
--asyncprogress option to give a progress indicator using the timestamp of a dummy file. Note that fuse-archive daemonizes instantly but the mount point will not be usable for a long time and everything trying to use it will hang until then when not using
-P 0, i.e., when not parallelizing. The gzip backend grows linearly with the archive size because the data for seeking is thousands of times larger than the simple two 64-bit offsets required for bzip2. The memory usage of the zstd backend only seems humongous because it uses
mmap to open. The memory used by
mmap is not even counted as used memory when showing the memory usage with
ratarmount -P 0 on most modern processors because it actually uses more than one core for decoding those compressions.
indexed_bzip2 supports block parallel decoding since version 1.2.0.
find on the mount point is an order of magnitude slower compared to archivemount. Because the C-based fuse-archive is even slower than ratarmount, the difference is very likely because archivemount uses the low-level FUSE interface while ratarmount and fuse-archive use the high-level FUSE interface.
O( (sizeOfFileToBeCopiedFromArchive / readChunkSize)^2 ). Both ratarmount and fuse-archive avoid this behavior. Because of this quadratic scaling, the average bandwidth with archivemount seems to decrease with the file size.
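The quadratic term can be illustrated with a toy model. It assumes, purely for illustration, that every read call has to reprocess all data in front of the requested chunk. Whether archivemount behaves exactly like this is an assumption of the sketch, and the function name is made up.

```python
def total_bytes_processed(file_size, chunk_size, restart_each_read):
    """Count the bytes a reader touches to deliver `file_size` bytes in chunks."""
    if not restart_each_read:
        return file_size
    chunks = -(-file_size // chunk_size)  # ceiling division
    # Hypothetical model: the i-th read reprocesses everything up to its end.
    return sum(min(i * chunk_size, file_size) for i in range(1, chunks + 1))


chunk = 128 * 1024
for size_mib in (1, 2, 4, 8):
    size = size_mib * 1024 * 1024
    work = total_bytes_processed(size, chunk, restart_each_read=True)
    # The work per delivered byte grows with the file size, so the
    # average bandwidth shrinks as files get larger.
    print(f"{size_mib} MiB file: {work / size:.1f} bytes processed per byte read")
```

With seekable access, the work per delivered byte stays constant instead.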
Further benchmarks can be viewed here.
You downloaded a large TAR file from the internet, for example the 1.31TB large ImageNet, and you now want to use it but lack the space, time, or a file system fast enough to extract all the 14.2 million image files.
time cat mounted/ILSVRC2012_val_00049975.JPEG | wc -c takes 250ms for archivemount and 2ms for ratarmount.
Tarindex is a command line tool written in Python which can create index files and then use the index file to extract single files from the tar fast. However, it also has some caveats which ratarmount tries to solve:
I didn't find out about TAR Browser before I finished the ratarmount script. That's also one of its cons:
Ratarmount creates an index file with file names, ownership, permission flags, and offset information.
This sidecar is stored at the TAR file's location or in
Ratarmount can load that index file in under a second if it exists and then offers FUSE mount integration for easy access to the files inside the archive.
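To make the idea of such an index concrete, here is a minimal sketch using only the Python standard library. The table layout and the function names are invented for this example and are not ratarmount's actual schema. It also only handles uncompressed TARs held in memory.

```python
import io
import sqlite3
import tarfile


def build_index(tar_bytes):
    """Store name, owner, mode, data offset, and size of each regular file in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (name TEXT, uid INTEGER, gid INTEGER, "
        "mode INTEGER, offset INTEGER, size INTEGER)"
    )
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                db.execute(
                    "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                    (member.name, member.uid, member.gid, member.mode,
                     member.offset_data, member.size),
                )
    return db


def read_file(tar_bytes, db, name):
    """Read one file by seeking to its recorded offset instead of scanning the TAR."""
    offset, size = db.execute(
        "SELECT offset, size FROM files WHERE name = ? ORDER BY rowid DESC", (name,)
    ).fetchone()
    archive = io.BytesIO(tar_bytes)
    archive.seek(offset)
    return archive.read(size)


def make_example_tar():
    """Create a small uncompressed TAR in memory for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in [("a.txt", b"first"), ("dir/b.txt", b"second file")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


tar_bytes = make_example_tar()
index = build_index(tar_bytes)
print(read_file(tar_bytes, index, "dir/b.txt"))
```

Once the index exists, reading a file costs one lookup and one seek, independent of where the file sits in the archive.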
The test with the first version (50e8dbb), which used the removed pickle backend for serializing the metadata index, for the ImageNet data set is promising:
The reading time for a small file simply verifies that random access using file seeking is working. The difference between the first read and subsequent reads is not because of ratarmount but because of operating system and file system caches.
Here is a more recent test for version 0.2.0 with the new default SQLite backend:
usage: ratarmount.py [-h] [-f] [-d DEBUG] [-c] [-r] [--recursion-depth RECURSION_DEPTH]
                     [-l] [-s] [--transform-recursive-mount-point REGEX_PATTERN REPLACEMENT]
                     [-gs GZIP_SEEK_POINT_SPACING] [-p PREFIX] [--password PASSWORD]
                     [--password-file PASSWORD_FILE] [-e ENCODING] [-i] [--gnu-incremental]
                     [--no-gnu-incremental] [--verify-mtime] [--index-file INDEX_FILE]
                     [--index-folders INDEX_FOLDERS] [-w WRITE_OVERLAY] [--commit-overlay]
                     [-o FUSE] [-u] [-P PARALLELIZATION] [-v]
                     mount_source [mount_source ...] [mount_point]

With ratarmount, you can:
  - Mount a (compressed) TAR file to a folder for read-only access
  - Mount a compressed file to `<mountpoint>/<filename>`
  - Bind mount a folder to another folder for read-only access
  - Union mount a list of TARs, compressed files, and folders to a mount point
    for read-only access

Optional Arguments:
  --password PASSWORD
        Specify a single password which shall be used for RAR and ZIP files. (default: )
  -P PARALLELIZATION, --parallelization PARALLELIZATION
        If an integer other than 1 is specified, then the threaded parallel bzip2 decoder
        will be used specified amount of block decoder threads. Further threads with
        lighter work may be started. A value of 0 will use all the available cores (24).
        (default: 0)
  -h, --help
        Show this help message and exit.
  -r, --recursive
        Mount archives inside archives recursively. Same as --recursion-depth -1.
        (default: False)
  -u, --unmount
        Unmount the given mount point. Equivalent to calling "fusermount -u".
        (default: False)
  -v, --version
        Print version information and exit.

Positional Options:
  mount_source
        The path to the TAR archive to be mounted. If multiple archives and/or folders are
        specified, then they will be mounted as if the arguments coming first were updated
        with the contents of the archives or folders specified thereafter, i.e., the list
        of TARs and folders will be union mounted.
  mount_point
        The path to a folder to mount the TAR contents into. If no mount path is specified,
        the TAR will be mounted to a folder of the same name but without a file extension.
        (default: None)

Index Options:
  --index-file INDEX_FILE
        Specify a path to the .index.sqlite file. Setting this will disable fallback index
        folders. If the given path is ":memory:", then the index will not be written out
        to disk. (default: None)
  --index-folders INDEX_FOLDERS
        Specify one or multiple paths for storing .index.sqlite files. Paths will be tested
        for suitability in the given order. An empty path will be interpreted as the
        location in which the TAR resides. If the argument begins with a bracket "[", then
        it will be interpreted as a JSON-formatted list. If the argument contains a comma
        ",", it will be interpreted as a comma-separated list of folders. Else, the whole
        string will be interpreted as one folder path. Examples: --index-folders ",~/.foo"
        will try to save besides the TAR and if that does not work, in ~/.foo.
        --index-folders '["~/.ratarmount", "foo,9000"]' will never try to save besides the
        TAR. --index-folder ~/.ratarmount will only test ~/.ratarmount as a storage
        location and nothing else. Instead, it will first try ~/.ratarmount and the folder
        "foo,9000". (default: ,~/.ratarmount)
  --verify-mtime
        By default, only the TAR file size is checked to match the one in the found
        existing ratarmount index. If this option is specified, then also check the
        modification timestamp. But beware that the mtime might change during copying or
        downloading without the contents changing. So, this check might cause false
        positives. (default: False)
  -c, --recreate-index
        If specified, pre-existing .index files will be deleted and newly created.
        (default: False)

Recursion Options:
  --recursion-depth RECURSION_DEPTH
        This option takes precedence over --recursive. Mount archives inside the mounted
        archives recursively up to the given depth. A negative value represents infinite
        depth. A value of 0 will turn off recursion (same as not specifying --recursive in
        the first place). A value of 1 will recursively mount all archives in the given
        archives but not any deeper. Note that this only has an effect when creating an
        index. If an index already exists, then this option will be effectively ignored.
        Recreate the index if you want change the recursive mounting policy anyways.
        (default: None)
  --transform-recursive-mount-point REGEX_PATTERN REPLACEMENT
        Specify a regex pattern and a replacement string, which will be applied via
        Python's re module to the full path of the archive to be recursively mounted.
        E.g., if there are recursive archives: /folder/archive.tar.gz, you can substitute
        '[.][^/]+$' to '' and it will be mounted to /folder/archive.tar. Or you can replace
        '^.*/([^/]+).tar.gz$' to '/' to mount all recursive folders under the top-level
        without extensions. (default: None)
  -l, --lazy
        When used with recursively bind-mounted folders, TAR files inside the mounted
        folder will only be mounted on first access to it. (default: False)
  -s, --strip-recursive-tar-extension
        If true, then recursively mounted TARs named <file>.tar will be mounted at <file>/.
        This might lead to folders of the same name being overwritten, so use with care.
        The index needs to be (re)created to apply this option! (default: False)

Tar Options:
  --gnu-incremental
        Will strip octal modification time prefixes from file paths, which appear in GNU
        incremental backups created with GNU tar with the --incremental or
        --listed-incremental options. (default: None)
  --no-gnu-incremental
        If specified, will never strip octal modification prefixes and will also not do
        automatic detection. (default: True)
  -e ENCODING, --encoding ENCODING
        Specify an input encoding used for file names among others in the TAR. This must
        be used when, e.g., trying to open a latin1 encoded TAR on an UTF-8 system.
        Possible encodings:
        https://docs.python.org/3/library/codecs.html#standard-encodings (default: utf-8)
  -i, --ignore-zeros
        Ignore zeroed blocks in archive. Normally, two consecutive 512-blocks filled with
        zeroes mean EOF and ratarmount stops reading after encountering them. This option
        instructs it to read further and is useful when reading archives created with the
        -A option. (default: False)

Write Overlay Options:
  --commit-overlay
        Apply deletions and content modifications done in the write overlay to the
        archive. (default: False)
  -w WRITE_OVERLAY, --write-overlay WRITE_OVERLAY
        Specify an existing folder to be used as a write overlay. The folder itself will
        be union-mounted on top such that files in this folder take precedence over all
        other existing ones. Furthermore, all file creations and modifications will be
        forwarded to files in this folder. Modifying a file inside a TAR will copy that
        file to the overlay folder and apply the modification to that writable copy.
        Deleting files or folders will update the hidden metadata database inside the
        overlay folder. (default: None)

Advanced Options:
  --password-file PASSWORD_FILE
        Specify a file with newline separated passwords for RAR and ZIP files. The
        passwords will be tried out in order of appearance in the file. (default: )
  -d DEBUG, --debug DEBUG
        Sets the debugging level. Higher means more output. Currently, 3 is the highest.
        (default: 1)
  -f, --foreground
        Keeps the python program in foreground so it can print debug output when the
        mounted path is accessed. (default: False)
  -gs GZIP_SEEK_POINT_SPACING, --gzip-seek-point-spacing GZIP_SEEK_POINT_SPACING
        This only is applied when the index is first created or recreated with the -c
        option. The spacing given in MiB specifies the seek point distance in the
        uncompressed data. A distance of 16MiB means that archives smaller than 16MiB in
        uncompressed size will not benefit from faster seek times. A seek point takes
        roughly 32kiB. So, smaller distances lead to more responsive seeking but may
        explode the index size! (default: 16)
  -o FUSE, --fuse FUSE
        Comma separated FUSE options. See "man mount.fuse" for help. Example: --fuse
        "allow_other,entry_timeout=2.8,gid=0". (default: )
  -p PREFIX, --prefix PREFIX
        [deprecated] Use "-o modules=subdir,subdir=<prefix>" instead. This standard way
        utilizes FUSE itself and will also work for other FUSE applications. So, it is
        preferable even if a bit more verbose. The specified path to the folder inside the
        TAR will be mounted to root. This can be useful when the archive as created with
        absolute paths. E.g., for an archive created with
        `tar -P cf /var/log/apt/history.log`, -p /var/log/apt/ can be specified so that
        the mount target directory >directly< contains history.log. (default: )
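The trade-off mentioned for -gs / --gzip-seek-point-spacing can be estimated with simple arithmetic. The sketch below takes the "roughly 32kiB per seek point" figure from the help text at face value. The function name is made up and the result is only a rough estimate.

```python
def gzip_index_size_bytes(uncompressed_size, spacing_mib=16, seek_point_kib=32):
    """Rough size of the gzip seek point data for a given uncompressed size."""
    spacing = spacing_mib * 1024 * 1024
    seek_points = uncompressed_size // spacing
    return seek_points * seek_point_kib * 1024


one_tib = 1024 ** 4
for spacing_mib in (1, 16, 64):
    size_mib = gzip_index_size_bytes(one_tib, spacing_mib) / 1024 ** 2
    print(f"spacing {spacing_mib:>2} MiB -> about {size_mib:.0f} MiB of seek points")
```

Halving the spacing doubles the estimated seek point data, which is why very small spacings can inflate the index.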
In order to reduce the mounting time, the created index for random access to files inside the tar will be saved to one of these locations. These locations are checked in order and the first, which works sufficiently, will be used. This is the default location order:
This list of fallback folders can be overwritten using the --index-folders option. Furthermore, an explicitly named index file may be specified using the --index-file option. If --index-file is used, then the fallback folders, including the default ones, will be ignored!
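The parsing rules for the fallback folder list, as stated in the help text, can be sketched as follows. This is not ratarmount's own code. The function names are invented, and "works sufficiently" is simplified here to "exists and is writable", which is an assumption.

```python
import json
import os


def parse_index_folders(argument):
    """Split an --index-folders value following the rules in the help text."""
    if argument.startswith("["):
        return json.loads(argument)  # JSON-formatted list
    if "," in argument:
        return argument.split(",")  # comma-separated list
    return [argument]  # a single folder path


def first_writable_folder(folders, tar_folder):
    """Return the first candidate that exists and is writable, or None.

    An empty entry stands for the folder in which the TAR resides.
    """
    for folder in folders:
        path = os.path.expanduser(folder) if folder else tar_folder
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


print(parse_index_folders(",~/.ratarmount"))
print(parse_index_folders('["~/.ratarmount", "foo,9000"]'))
```

The JSON form exists so that folder names containing commas can still be expressed.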
The mount sources can be TARs and/or folders. Because of that, ratarmount
can also be used to bind mount folders read-only to another path similar to
mount --bind. So, for:
ratarmount folder mountpoint
all files in
folder will now be visible in mountpoint.
If multiple mount sources are specified, the sources on the right side will be added to or update existing files from a mount source left of it. For example:
ratarmount folder1 folder2 mountpoint
will make the files from both folder1 and folder2 visible in mountpoint.
If a file exists in multiple mount sources, then the file from the rightmost
mount source will be used, which in the above example would be folder2.
If you want to update / overwrite a folder with the contents of a given TAR, you can specify the folder both as a mount source and as the mount point:
ratarmount folder file.tar folder
The FUSE option -o nonempty will be automatically added if such a usage is detected. If you instead want to update a TAR with a folder, you only have to swap the two mount sources:
ratarmount file.tar folder folder
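The "rightmost source wins" rule can be sketched for plain folders as follows. This only illustrates the lookup order. It does not handle TARs and is not how ratarmount implements union mounting. The function names are hypothetical.

```python
import os
import tempfile


def resolve(path, sources):
    """Return the real path for `path`, preferring the rightmost source that has it."""
    for source in reversed(sources):
        candidate = os.path.join(source, path)
        if os.path.lexists(candidate):
            return candidate
    return None


def list_union(sources):
    """List the names visible in the top level of the union of all sources."""
    names = set()
    for source in sources:
        names.update(os.listdir(source))
    return sorted(names)


with tempfile.TemporaryDirectory() as folder1, tempfile.TemporaryDirectory() as folder2:
    for folder, files in [(folder1, ["a", "shared"]), (folder2, ["b", "shared"])]:
        for name in files:
            with open(os.path.join(folder, name), "w") as file:
                file.write(f"{name} from {os.path.basename(folder)}")
    print(list_union([folder1, folder2]))
    # "shared" exists in both folders, so the copy from folder2 is chosen.
    print(resolve("shared", [folder1, folder2]) == os.path.join(folder2, "shared"))
```

Swapping the order of the sources swaps which copy of a duplicated file is visible.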
If a file exists multiple times in a TAR or in multiple mount sources, then
the hidden versions can be accessed through special
ratarmount folder updated.tar mountpoint
and the file foo exists both in the folder and as two different versions in
updated.tar. Then, you can list all three versions using:
ls -la mountpoint/foo.versions/
dr-xr-xr-x 2 user group     0 Apr 25 21:41 .
dr-x------ 2 user group 10240 Apr 26 15:59 ..
-r-x------ 2 user group   123 Apr 25 21:41 1
-r-x------ 2 user group   256 Apr 25 21:53 2
-r-x------ 2 user group  1024 Apr 25 22:13 3
In this example, the oldest version has only 123 bytes while the newest and by default shown version has 1024 bytes. So, in order to look at the oldest version, you can simply do:
Note that these version numbers are the same as when used with tar's
Use ratarmount -o modules=subdir,subdir=<prefix> to remove path prefixes
using the FUSE subdir module. Because it is a standard FUSE feature, the
-o ... argument should also work for other FUSE applications.
When mounting an archive created with absolute paths, e.g.,
tar -P cf /var/log/apt/history.log, you would see the whole var/log/apt
hierarchy under the mount point. To avoid that, specified prefixes can be
stripped from paths so that the mount target directory directly contains
history.log. Use ratarmount -o modules=subdir,subdir=/var/log/apt/ to do
so. The specified path to the folder inside the TAR will be mounted to root,
i.e., the mount point.
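The effect on the visible paths can be sketched as a simple path mapping. This only illustrates what ends up below the mount point. It is not how the FUSE subdir module is implemented, and the function name is made up.

```python
from pathlib import PurePosixPath


def strip_prefix(member_path, prefix):
    """Map an archive path to its path below the mount point.

    Returns None for paths outside the prefix, which would not be visible.
    """
    path = PurePosixPath("/") / member_path
    try:
        return str(path.relative_to(PurePosixPath("/") / prefix))
    except ValueError:
        return None


print(strip_prefix("/var/log/apt/history.log", "/var/log/apt/"))
print(strip_prefix("/etc/hostname", "/var/log/apt/"))
```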
If you want a compressed file not containing a TAR, e.g.,
you can also use ratarmount for that. The uncompressed view will then be
available at <mountpoint>/foo and you will be able to leverage ratarmount's
seeking capabilities when opening that file.
In contrast to bzip2 and gzip compressed files, true seeking on xz and zst files is only possible at block or frame boundaries. This wouldn't be noteworthy if the standard compressors for xz and zstd did not create unsuitable files by default. Even though both file formats support multiple frames and xz even contains a frame table at the end for easy seeking, both compressors write out only a single frame and/or block, making this feature unusable. In order to generate truly seekable compressed files, you'll have to use pixz for xz files. For zstd compressed files, you can try t2sz. The standard zstd tool does not support setting smaller block sizes yet, although an issue does exist. Alternatively, you can simply split the original file into parts, compress those parts, and then concatenate them to get a suitable multiframe zst file. Here is a bash function that can be used for that:
createMultiFrameZstd()
(
    # Detect being piped into
    if [ -t 0 ]; then
        file=$1
        frameSize=$2
        if [[ ! -f "$file" ]]; then echo "Could not find file '$file'." 1>&2; return 1; fi
        fileSize=$( stat -c %s -- "$file" )
    else
        if [ -t 1 ]; then echo 'You should pipe the output to somewhere!' 1>&2; return 1; fi
        echo 'Will compress from stdin...' 1>&2
        frameSize=$1
    fi
    if [[ ! $frameSize =~ ^[0-9]+$ ]]; then
        echo "Frame size '$frameSize' is not a valid number." 1>&2
        return 1
    fi

    # Create a temporary file. I avoid simply piping to zstd
    # because it wouldn't store the uncompressed size.
    if [[ -d /dev/shm ]]; then frameFile=$( mktemp --tmpdir=/dev/shm ); fi
    if [[ -z $frameFile ]]; then frameFile=$( mktemp ); fi
    if [[ -z $frameFile ]]; then
        echo "Could not create a temporary file for the frames." 1>&2
        return 1
    fi

    if [ -t 0 ]; then
        true > "$file.zst"
        for (( offset = 0; offset < fileSize; offset += frameSize )); do
            dd if="$file" of="$frameFile" bs=$(( 1024*1024 )) \
               iflag=skip_bytes,count_bytes skip="$offset" count="$frameSize" 2>/dev/null
            zstd -c -q -- "$frameFile" >> "$file.zst"
        done
    else
        while true; do
            dd of="$frameFile" bs=$(( 1024*1024 )) \
               iflag=count_bytes count="$frameSize" 2>/dev/null
            # pipe is finished when reading it yields no further data
            if [[ ! -s "$frameFile" ]]; then break; fi
            zstd -c -q -- "$frameFile"
        done
    fi

    'rm' -f -- "$frameFile"
)
In order to compress a file named foo into a multiframe zst file called foo.zst, which contains frames sized 4MiB of uncompressed data, you would call it like this:
createMultiFrameZstd foo $(( 4*1024*1024 ))
It also works when being piped to. This can be useful for recompressing files to avoid having to decompress them first to disk.
lbzip2 -cd well-compressed-file.bz2 | createMultiFrameZstd $(( 4*1024*1024 )) > recompressed.zst
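Why independently compressed frames make seeking cheap can be shown with a small sketch. It uses xz through Python's standard `lzma` module as a stand-in, because zstd bindings are not part of the standard library in many Python versions. The frame offsets are kept in a plain list instead of being parsed from the file format, and all function names are invented for this example.

```python
import lzma


def compress_in_frames(data, frame_size):
    """Compress fixed-size pieces independently and record where each frame starts."""
    compressed = bytearray()
    frame_offsets = []  # compressed offset of each frame
    for start in range(0, len(data), frame_size):
        frame_offsets.append(len(compressed))
        compressed += lzma.compress(data[start:start + frame_size])
    frame_offsets.append(len(compressed))  # end marker
    return bytes(compressed), frame_offsets


def read_at(compressed, frame_offsets, frame_size, position, length):
    """Return `length` bytes at uncompressed `position`, decoding only the frames needed."""
    result = bytearray()
    first = position // frame_size
    last = (position + length - 1) // frame_size
    for i in range(first, min(last, len(frame_offsets) - 2) + 1):
        frame = compressed[frame_offsets[i]:frame_offsets[i + 1]]
        result += lzma.decompress(frame)
    skip = position - first * frame_size
    return bytes(result[skip:skip + length])


data = bytes(range(256)) * 4096  # 1 MiB of sample data
frame_size = 64 * 1024
compressed, offsets = compress_in_frames(data, frame_size)
print(len(offsets) - 1, "frames")
print(read_at(compressed, offsets, frame_size, 100_000, 8) == data[100_000:100_008])
```

A read in the middle of the data only decompresses the one or two frames that cover the requested range, while the concatenated output can still be decompressed as a whole.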
Ratarmount can also be used as a library. Using ratarmountcore, files inside archives can be accessed directly from Python code without requiring FUSE. For a more detailed description, see the ratarmountcore readme here.
If ratarmount helped you out and satisfied you so much that you can't help but want to donate, you can toss a coin to your programmer through one of these addresses: