Is there a library in Python which allows virtual file system management in a single file?


I am working on a program (I do not think I need to show it here) and was wondering whether it is possible to create a virtual file system stored in a single file. For example, given a file named my_file_system.fs, is there a way to create a virtual file system inside that single file only? Basically:

/home/xcodz/
    |
    +--myfilesystem.fs
       |
       +--testdir
       +--test.txt
       +--downloads
          |
          +--example1.txt

I basically want a basic filesystem interface: no owners, dates, or other metadata. Zip seems like a good idea, but as far as I can tell it reads the whole file into memory at once and does not provide a file-like interface. So I require a very basic file system in a single file, in which I am able to use files like normal IO objects.

EDIT The files stored in the file system will be as big as 3 GB each, and I do not have that much RAM. Tar files don't seem to make my work any easier.

EDIT What I really mean is a filesystem in a file, just like the ones VirtualBox uses.


There are 2 best solutions below

9
Canopus On BEST ANSWER

You can use the SVFS package.

SVFS allows you to create a virtual filesystem inside a file on a real filesystem. It can be used to store multiple files, with a directory structure, inside a single file. Unlike archives, SVFS allows files to be modified in place. SVFS files use a file-like interface, so they can be used (pretty much) like regular Python file objects. Finally, it's implemented in pure Python and doesn't use any third-party modules, so it should be very portable. Tests show write speed to be around 10-12 MB/s and read speed to be around 26-28 MB/s.

1
root On

Solution #1 - TAR file

TAR files are basically a Unix filesystem in a single file. You can work with them in Python using the standard-library tarfile module.

Pros:

  • Works out of the box.
  • Preserves POSIX filesystem metadata such as permissions, ownership, timestamps, and links.
  • tarfile provides stream reader & writer APIs for files.

Cons:

  • Doesn't have non-POSIX features like encryption or memory mapped files.
  • Files can't be edited in place; you'd have to extract them and then re-add them.
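
As a rough illustration of the stream reader and writer APIs mentioned above, here is a minimal sketch using only tarfile from the standard library. The helper names add_bytes and read_in_chunks, and the default chunk size, are made up for this example. Reading goes through extractfile(), which hands back a file-like object, so a large member can be consumed piece by piece instead of being loaded whole.

```python
import io
import tarfile


def add_bytes(archive_path, member_name, payload):
    """Append one in-memory payload to an uncompressed archive.

    Hypothetical helper: mode "a" creates the archive if it does not exist
    and only works for uncompressed tar files.
    """
    with tarfile.open(archive_path, "a") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)  # the size must be set before addfile()
        tar.addfile(info, io.BytesIO(payload))


def read_in_chunks(archive_path, member_name, chunk_size=1024 * 1024):
    """Yield the contents of one member in pieces of at most chunk_size bytes."""
    with tarfile.open(archive_path, "r") as tar:
        # extractfile() raises KeyError for an unknown name and returns
        # None for members that are not regular files (e.g. directories).
        member = tar.extractfile(member_name)
        if member is None:
            raise IsADirectoryError(member_name)
        while True:
            chunk = member.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

To add a file that is already on disk without reading it into memory, tar.add(path, arcname=...) can be used in place of the addfile() call.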

Solution #2 - Loopback filesystem

If you can require that mounting is done in order to run your program, you can just use a loopback filesystem:

$ truncate -s 100M /tmp/loopback.ext4
$ mkfs -t ext4 /tmp/loopback.ext4
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done                            
Creating filesystem with 25600 4k blocks and 25600 inodes

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

$ sudo mkdir /mnt/loop
$ sudo mount -o loop /tmp/loopback.ext4 /mnt/loop/
$ df -T /mnt/loop
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop11    ext4   93M   72K   86M   1% /mnt/loop
$ sudo tree /mnt/loop/
/mnt/loop/
└── lost+found

1 directory, 0 files

Pros:

  • Used like a regular filesystem.
  • Accessible from outside the Python process, offline and online.
  • Very easy to debug.
  • You can add encryption, use memory mapped files, and any other feature of real filesystems.

Cons:

  • Requires root.
  • Requires mounting before running your process.
  • Requires unmounting (at the very least, in case of crashes).
  • The size has to be set up front; resizing is possible but not trivial.
  • Very difficult to support cross-platform.
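
Once the image is mounted, the Python side needs nothing special: files under the mount point are opened with the built-in open(). The sketch below only adds a guard for the "requires mounting before running your process" point. The function name open_on_loop_mount is made up for this example, and /mnt/loop is the mount point from the shell session above.

```python
import os


def open_on_loop_mount(mount_point, relative_path, mode="rb"):
    """Open a file below mount_point, refusing to run if nothing is mounted there.

    Hypothetical helper: without the check, a forgotten mount would silently
    read from or write to the empty directory on the host filesystem.
    """
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not a mount point; mount the image first")
    return open(os.path.join(mount_point, relative_path), mode)
```

For example, open_on_loop_mount("/mnt/loop", "test.txt", "wb") returns an ordinary file object, so seeking and chunked reads or writes work as usual.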

Solution #3 - DIY filesystem

Since you care most about file I/O, you can implement that using BytesIO. To support multiple files in a filesystem hierarchy, you can put those files in a trie. You need to serialize and deserialize all that, for which you can use pickle.

Pros:

  • Easier to customize than a TAR-based solution.
  • Can be made into a library and be nice and reusable.

Cons:

  • Requires more coding on your side.
  • Pickling the whole data structure every time is not scalable.
  • If you need crash safety, you need to pickle after every (relevant) modification to the trie or any of the files.
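
To make the idea concrete, here is a minimal sketch of such a DIY filesystem. The class name TinyFS and its methods are made up for this example, and a nested dict stands in for the trie: dicts are directories, BytesIO objects are files. It shares the scalability problem listed above, since everything lives in memory and the whole structure is pickled on every save.

```python
import io
import pickle


class TinyFS:
    """Hypothetical single-file filesystem: nested dicts hold BytesIO files."""

    def __init__(self):
        self.root = {}

    def _walk(self, parts, create):
        """Follow directory names from the root, optionally creating them."""
        node = self.root
        for part in parts:
            child = node.setdefault(part, {}) if create else node.get(part)
            if child is None:
                raise FileNotFoundError(part)
            if not isinstance(child, dict):
                raise NotADirectoryError(part)
            node = child
        return node

    def open(self, path, create=False):
        """Return the BytesIO for path, rewound to the start.

        Do not close() the returned object: a closed BytesIO discards its data.
        """
        *dirs, name = path.strip("/").split("/")
        folder = self._walk(dirs, create)
        if name not in folder:
            if not create:
                raise FileNotFoundError(path)
            folder[name] = io.BytesIO()
        handle = folder[name]
        if isinstance(handle, dict):
            raise IsADirectoryError(path)
        handle.seek(0)
        return handle

    def save(self, container_path):
        """Serialize the whole tree into one file on the real filesystem."""
        with open(container_path, "wb") as out:
            pickle.dump(self.root, out)

    @classmethod
    def load(cls, container_path):
        """Rebuild the tree; only unpickle container files you trust."""
        fs = cls()
        with open(container_path, "rb") as src:
            fs.root = pickle.load(src)
        return fs
```

Usage would look like fs.open("downloads/example1.txt", create=True).write(b"...") followed by fs.save("my_file_system.fs"), and later TinyFS.load("my_file_system.fs") to get the tree back.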

What to choose

Since your needs are very basic, go for #1 - TAR files.