Feature Request: Capability to use .zip files as .yml file store. #33

Open
rbruelisauer opened this issue May 29, 2019 · 8 comments

Comments

@rbruelisauer

Idea

Currently, the Unicorn Rainbow Filesystem Provider uses the Windows file system to store files.
It would be great if the provider could also read and write .yml files from/to .zip files.

User Story

I'd like to use .zip files to store Unicorn serialized .yml files on production systems.

  • All Unicorn Configurations can choose whether to read their content from a .zip file or directly from the file system.
  • Any Unicorn Configuration that uses a zip file storage is read-only (by default).
    Reserializing the content tree does not affect the .yml files in the .zip package.
  • Any Unicorn Configurations which store items in a physical folder can still be serialized as .yml.
    This is helpful for content migration.

Concept

In a few words:

If you can't find a .yml file physically, look for a .zip file containing that .yml file.

or

Mount .yml files into the tree as usual.
Mount .yml files contained in .zip files virtually into the tree.

For example, if an item is serialized under C:\Projects\myproject\Unicorn\Feature\ModuleA\SomeLong\PathsWithin\ThatModule\MyItem.yml,
then its .yml contents could also be stored in the entry SomeLong\PathsWithin\ThatModule\MyItem.yml of a file named C:\Projects\myproject\Unicorn\Feature\ModuleA.zip.
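
The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not Unicorn or Rainbow code: `zip_candidates` is a hypothetical helper, and it assumes that any ancestor folder of the item may have a sibling .zip file of the same name. Entry names are emitted with forward slashes, which is what the zip format uses internally.

```python
from pathlib import PureWindowsPath


def zip_candidates(yml_path):
    """Yield (zip_path, entry_name) pairs that could hold yml_path.

    Hypothetical helper: walks up from the item's folder, nearest first,
    and pairs each ancestor folder with a sibling "<folder>.zip".
    """
    path = PureWindowsPath(yml_path)
    for folder in path.parents:
        if not folder.name:  # reached the drive root, e.g. C:\
            break
        zip_path = folder.with_name(folder.name + ".zip")
        # Zip entries use forward slashes, regardless of the host OS.
        entry_name = path.relative_to(folder).as_posix()
        yield str(zip_path), entry_name


item = (r"C:\Projects\myproject\Unicorn\Feature\ModuleA"
        r"\SomeLong\PathsWithin\ThatModule\MyItem.yml")
for zip_path, entry_name in zip_candidates(item):
    print(zip_path, "->", entry_name)
```

A reader would try the physical file first and only then walk these candidates, stopping at the first .zip file that exists and contains the entry.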

Read only?

It would be ok if the .zip file storage is read-only, since we're mostly having issues with long paths on production systems where we just sync items into the database, not vice-versa. And if we re-sync from prod it's ok to use a pure .yml store.

Performance Impact

It also seems OK to me that a .zip file store could be a bit slower than direct file access, since it needs to read the .zip file and decompress the entries. But that can be done with memory-efficient in-memory streams using the System.IO.Compression namespace offered by the .NET Framework.
In the end, it might turn out to be faster to read a single file from storage into memory and process it there than to query the file system for thousands of small .yml files.
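
To make the "read once, process in memory" idea concrete, here is a minimal sketch. It uses Python's standard `zipfile` module as a stand-in for System.IO.Compression, and `load_yml_entries` is a hypothetical name, not part of Unicorn. The archive is built in memory here only so the example is self-contained.

```python
import io
import zipfile


def load_yml_entries(zip_bytes):
    """Return {entry_name: text} for every .yml entry of an in-memory zip."""
    entries = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(".yml"):
                entries[info.filename] = archive.read(info).decode("utf-8")
    return entries


# Build a tiny archive in memory to stand in for ModuleA.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("SomeLong/PathsWithin/ThatModule/MyItem.yml", "---\nID: example\n")
    archive.writestr("readme.txt", "not a serialized item")

# One read of the archive bytes, then everything is served from memory.
items = load_yml_entries(buffer.getvalue())
```

Whether this actually beats thousands of small file reads would have to be measured on the target storage; the sketch only shows that the access pattern is simple.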

Related Issues

Issue #27: long paths would also be solved for most situations.

@cassidydotdk
Member

While it probably will not be .zip files, I am actually currently working on a form of big-file implementation. Will also benefit Azure deployments immensely, to deal with a very few big files rather than 1000s of individual small files.

@rbruelisauer
Author

Nice to hear that some improvements are going to happen.

The advantage of using zip over "just inventing another binary package format" is significant.
I've seen quite a few products in the wild that use this kind of .zip file "magic" to save disk space and keep file handling easy. The zip format itself offers a great way to create file bundles; you can also set compression to "none" to avoid spending any CPU time on compression, so there should not be much performance overhead left other than "content dictionary management" (which you have to do anyway).
Another advantage is that there are a lot of standard OS tools that can view and pack .zip files. So it's easy to handle, easy to bundle, and offers flexibility in how you bundle your features.
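
As a small illustration of the "compression none" point, the following Python sketch packs the same made-up .yml content once with stored (uncompressed) entries and once with deflate. The file names and contents are invented for the example, and `pack` is a hypothetical helper.

```python
import io
import zipfile


def pack(files, compression):
    """Pack {name: text} into a zip archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Invented sample content standing in for serialized items.
files = {
    f"Item{i}.yml": "---\nID: example\nPath: /sitecore/content/Home\n" * 20
    for i in range(50)
}

stored = pack(files, zipfile.ZIP_STORED)      # no compression, just bundling
deflated = pack(files, zipfile.ZIP_DEFLATED)  # smaller, costs CPU time

print(len(stored), len(deflated))
```

With stored entries the archive is a plain container: each entry's bytes sit in the file unchanged, so reading one costs a lookup in the central directory plus a copy.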

Direct serialization into .zip files isn't really the goal, because a .yml file structure can be turned into .zip files in many other ways.
But directly injecting (read-only) .yml files stored in zip files into Unicorn's file system provider is definitely something that could come in handy for solution deployments and content migration.

From what I've seen so far, Unicorn uses "some kind of tree structure" for files, so it should be easy to tweak the part that reads the .yml files to look for "virtual .yml files" that may exist in .zip files.

Maybe I can invest some time to clone and try a proof-of-concept implementation myself.

@cassidydotdk
Member

It's important to keep in mind here that the concepts currently being worked on would allow some form of "redistribution" of the file assets related to a configuration. Something like verb=Deploy or so.

At the very core and foundation of Rainbow sits the original premise, the premise the entire toolset is built upon. And that is the change from Sitecore's original serialization format to the line-based YAML-like syntax, which allows for easy tool-less merging and git auto-merging.

Changing this into a native .zip format just isn't going to happen. Not as its default file store. That would have to live in a fork of your own if that's what you're after. It would completely shift the foundation of what Rainbow is and wants to achieve.

@akuryan

akuryan commented May 24, 2021

I am also trying to pursue the same solution as proposed by @rbruelisauer. We have tons of tiny YAML files, and their deployment to production via MsDeploy takes up to an hour (this looks like a problem with the NTFS volume, mounted via NFS file storage, because sometimes it is rocket fast, especially on lower environments, where we have a much faster deployment cadence).
So it would be a nice addition to deploy a zip file instead of unpacking tons of files, and have Unicorn read from the zip directly (mount it as a file system and do regular import operations), while keeping regular serialization to separate files in order to keep easy tool-less merging and git auto-merging.

@cassidydotdk
Member

cassidydotdk commented May 24, 2021

There are two things I am currently considering.

  1. The ability to "bundle" a configuration or predicate into 1 yml file. The idea being, you do ?verb=Bundle and Unicorn will run through configurations and generate 1 big yaml file per configuration which can then be consumed at an upstream environment. These bundles would be read-only.

  2. And the obvious one with the introduction of protobuf in 10.1; the ability to generate protobuf files based off Unicorn configurations. These would (obviously) again be read-only - but it would effectively completely remove the need to have Unicorn on anything other than the dev environment and Sync would no longer be required at all.

As for the specific question of zipping/unzipping, it might already be doable if you use the built-in PowerShell cmdlets.

For the record, I know the pain of deploying thousands of tiny YAML files onto upstream environments. I think, however, that it's more a matter of the number of files than of their actual size. Zipping would help, sure, but getting the number of files down from, say, 8,000 to 8 will mean a lot.

@akuryan

akuryan commented May 24, 2021

I like the idea of a big read-only YAML file for better portability between versions.
Thanks for the idea about the built-in PowerShell, but we do not expose it to the outside world. It would also have the same timing issue, just shifted to the moment Unicorn is executed: the unpacking time would be added before Unicorn runs, and I suppose it would be roughly the same.

@cassidydotdk
Member

If anyone is feeling adventurous, both the "big yaml" and "zipped yaml" could and should be implemented entirely by creating a new Rainbow.Storage.SerializationFileSystemDataStore. It's the one gateway between filesystem storage and the rest of the system.

For read-only, methods such as Save and MoveOrRenameItem can be ignored.
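
As a rough, language-neutral sketch of what such a read-only store could look like: the class below is hypothetical and written in Python, and it does not reproduce the real Rainbow data store interface. Only the names `save` and `move_or_rename_item` are borrowed from the methods mentioned above; everything else is an assumption made for illustration.

```python
import io
import zipfile


class ReadOnlyZipStore:
    """Hypothetical read-only store backed by a zip archive held in memory."""

    def __init__(self, zip_bytes):
        self._archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._archive.namelist())

    def get(self, entry_name):
        """Return the entry's text, or None if the archive has no such entry."""
        if entry_name not in self._names:
            return None
        return self._archive.read(entry_name).decode("utf-8")

    def save(self, entry_name, text):
        # Read-only store: the write is ignored and reported as not performed.
        return False

    def move_or_rename_item(self, old_name, new_name):
        # Read-only store: moves and renames are ignored as well.
        return False


# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Home/MyItem.yml", "---\nID: example\n")

store = ReadOnlyZipStore(buffer.getvalue())
store.save("Home/MyItem.yml", "changed")  # has no effect
print(store.get("Home/MyItem.yml"))
```

Silently ignoring writes matches the "can be ignored" wording; a real implementation might prefer to log or throw so that an accidental reserialize is noticed.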

@cassidydotdk
Member

Finally, once again, consider the PowerShell cmdlets. You could generate Sitecore packages at an earlier stage in your deployment process and send those upstream. PowerShell would not need to be active upstream, only in your dev environments.

https://github.com/SitecoreUnicorn/Unicorn/blob/master/src/Unicorn/PowerShell/NewUnicornItemSourceCommand.cs
