Implement S3 Object Storage for Package Repositories #291
Merged
Packages can come from either the packages or the system-packages directory, and both need to be checked to decide whether a cached package should be kept. Packages coming from the system-packages directory were not taken into account, so they were pruned from the builder cache on each build run only to then be re-uploaded.
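The check described above can be sketched as follows. This is a minimal illustration, not the code from this change: the function name, the directory handling and the package file names are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, Set


def packages_to_prune(cached: Iterable[str], source_dirs: Iterable[Path]) -> Set[str]:
    """Return cached package file names that are present in none of the
    given source directories (hypothetical helper, for illustration only)."""
    wanted: Set[str] = set()
    for directory in source_dirs:
        # A missing directory simply contributes no packages.
        if directory.is_dir():
            wanted.update(p.name for p in directory.iterdir() if p.is_file())
    # Anything cached that no source directory still provides can be dropped.
    return set(cached) - wanted
```

Passing both directories keeps system packages in the cache; passing only the packages directory reproduces the old behaviour where they were pruned and re-uploaded.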
As the comment indicated, earlier versions of Paramiko did not provide a rename/move operation that was compatible with BFS due to the use of hardlinks. Newer versions provide posix_rename, which can be used instead of the manual "mv" shell command.
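A minimal sketch of a move based on posix_rename, assuming "sftp" behaves like a paramiko.SFTPClient (which provides posix_rename(oldpath, newpath) in newer releases). The helper name and the paths in the usage note are made up for the example.

```python
import posixpath


def move_remote_file(sftp, source: str, destination_dir: str) -> str:
    """Move `source` into `destination_dir` on the remote side and return
    the new path. `sftp` is assumed to be a paramiko.SFTPClient or an
    object exposing the same posix_rename(oldpath, newpath) method."""
    destination = posixpath.join(destination_dir, posixpath.basename(source))
    # posix_rename replaces the earlier approach of running "mv" in a shell.
    sftp.posix_rename(source, destination)
    return destination
```

For example, move_remote_file(sftp, "/tmp/upload/foo.hpkg", "/cache/packages") would issue a single rename request instead of spawning a remote command.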
Since the host tools are now built as part of the container image build, the source volume is not actually needed for the backend, and the frontend never needed that volume in the first place. Originally the shared source volume was meant to reduce disk usage when running multiple instances. This is no longer needed, as the image contains and shares the host tools and the bootstrap process for getting the system-packages has been externalized. The only user of the shared sources was the set of built-in licenses in the Haiku repository. For now, provide these in the image as well. This could later also be moved to an external archive, as is done for the system-packages.
Adding the HaikuPorter sources to the image invalidates the cache for each change that is made. Move that install to the end and into separate steps so that package installation and minisign build can be cached.
This makes this work more out of the box.
This is only printed when system-packages are missing.
The echo command, introduced to make the output easier to read, was hiding the return value of the actual package repository creation command.
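The pitfall is general: whatever runs last determines the reported status, so the status of the real command has to be captured before any cosmetic output is produced. The following Python sketch illustrates the pattern only; the actual fix was made in the buildmaster scripts, and the function name here is hypothetical.

```python
import subprocess
from typing import Sequence


def run_and_report(command: Sequence[str]) -> int:
    """Run `command`, print a separator line for readability, and return
    the command's own exit status rather than that of the cosmetic output."""
    result = subprocess.run(command)
    # The separator is printed only after the status has been captured,
    # so it cannot mask a failure of the command itself.
    print()
    return result.returncode
```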
This furthers abstraction and will be needed when packages are not necessarily local anymore. Read and write are implemented as streaming operations using file objects to allow for various backends without the need for local temporary copies of files.
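A rough sketch of what a file-object based interface can look like. The class and method names are assumptions for illustration and do not reflect the actual HaikuPorter classes; a local directory stands in for whatever backend holds the packages.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import BinaryIO


class PackageStore(ABC):
    """Hypothetical interface: callers pass file objects, so no backend
    needs to create a local temporary copy of a package."""

    @abstractmethod
    def read_package(self, name: str, target: BinaryIO) -> None:
        """Stream the package called `name` into `target`."""

    @abstractmethod
    def write_package(self, name: str, source: BinaryIO) -> None:
        """Stream the contents of `source` into the package called `name`."""


class LocalDirectoryStore(PackageStore):
    """Simplest possible backend: packages are files in one directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read_package(self, name: str, target: BinaryIO) -> None:
        with open(self.root / name, "rb") as source:
            shutil.copyfileobj(source, target)  # copies in chunks

    def write_package(self, name: str, source: BinaryIO) -> None:
        with open(self.root / name, "wb") as target:
            shutil.copyfileobj(source, target)
```

Because both directions only rely on read() and write(), the same calls could be backed by an SFTP file handle or an object storage stream without changing the callers.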
That's what the member variable is called and what that list actually contains.
These are never used as the obsoletion is handled at the Repository and PackageRepository level.
The storage backend is used to hold the actual packages, while the local packages directory is only used to keep track of the current package list. New packages are spooled to the local packages directory as they are built and are kept there until they have been added to the package repo file (where package information is needed and the checksum is calculated). Once added to the repo, the packages are uploaded to object storage and the local copy is stubbed out to a 0-byte file. When dependency packages are needed on the builder (and are not already cached there), they are streamed directly from object storage without repopulating the local packages directory.
After the package repo is updated, it is uploaded to object storage as well, along with its info file, sha256 checksum and the package list file. This allows the object storage to be used as a complete package repo by pkgman directly. Finally, packages in object storage are pruned based on the list of current local stub package files to keep the state in sync.
Note that this requires a "package_repo" command that supports the "-t" argument to the "update" command, as only stub packages are available locally and the package info can therefore not be extracted from them. Instead, the package names are assumed to be canonical and the package info to be immutable. This is unproblematic, as the buildmaster setup ensures that packages cannot be overwritten (this would also have failed previously, as the checksums were intentionally not revalidated).
The storage backend config file path is given with a new "--storage-backend-config" option. It should point to a JSON file with a "backend_type" string (only "s3" is supported for now). A sample config is also included. An empty path is allowed and causes no storage backend to be used. The S3 storage backend needs an "endpoint_url", "access_key_id", "secret_access_key" and "bucket_name" to be specified in the config file. An optional "prefix" can also be supplied to place multiple instances into the same bucket.
Include the storage backend config option in the buildmaster scripts, fed from a "STORAGE_BACKEND_CONFIG" environment variable for easy configuration.
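As an illustration of the config handling described above, here is a minimal loader sketch. The key names come from the description; the flat JSON layout, the function name and the error messages are assumptions, and the sample config shipped with the change remains the authoritative reference.

```python
import json
from typing import Any, Dict, Optional

# Keys the description lists as required for the "s3" backend type.
REQUIRED_S3_KEYS = ("endpoint_url", "access_key_id", "secret_access_key", "bucket_name")


def load_storage_backend_config(path: str) -> Optional[Dict[str, Any]]:
    """Load and check a storage backend config (hypothetical helper).

    An empty path means that no storage backend is used. A flat JSON
    object is assumed here; the real file layout may differ."""
    if not path:
        return None
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    backend_type = config.get("backend_type")
    if backend_type != "s3":
        raise ValueError(f"unsupported backend_type: {backend_type!r}")
    missing = [key for key in REQUIRED_S3_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    # The prefix is optional and defaults to no prefix at all.
    config.setdefault("prefix", "")
    return config
```

A caller could pass os.environ.get("STORAGE_BACKEND_CONFIG", "") as the path, which mirrors the empty-path behaviour when the variable is unset.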
The packages repository never actually needed to be shared or separate and can just as well be located on the main buildmaster volume. It was originally shared only so that repositories for multiple architectures could be served from a single server. When using object storage as the storage backend, the repository directories are only used to keep the state and don't provide the actual repo or package files. In this case a separate volume is even less useful. Point the frontend container to the single buildmaster volume instead of the previously shared instances directory on the packages volume. This means that the frontend will generally not be shared across architectures anymore. Since it reduces the scope of the shared volumes, this eases deployment. The "repo_consistency.txt" and "report.txt" files, which report the consistency of the recipe and package repository respectively, are moved from the packages volume to the output directory, as this makes them accessible through the normal frontend.
kallisti5
approved these changes
Aug 28, 2024
Nice idea on the licenses from the build container! They were always a pain to groom.
This patchset is pretty amazing! It's going to solve a lot of maintenance issues we have had over the years. NICE WORK!
After deployment this should fix #258.