Use BoltDB to map "Names" to Files #8
As far as caching is concerned, it doesn't really care what the original name was. Why do you need to know the name at all? Seems like
That's only the start of the ugliness I do in there. It basically says: if the key can be base64-encoded and fit in a standard filename, then do that (so the key is decodable from the filename). Otherwise, hash the key and use the hash as the filename, then create another file named "${hash}.key" and write the key to it so it can be restored later. I'd much rather have a nice, clean key/value store for managing which keys map to which files.
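In rough Go, the naming scheme described above looks something like this (a sketch of the behavior, not the actual implementation; `maxNameLen` and the choice of SHA-256 are assumptions):

```go
package cachefs

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"os"
	"path/filepath"
)

// maxNameLen is an assumed cap; most filesystems limit a single
// filename to 255 bytes.
const maxNameLen = 255

// toFilename sketches the scheme described above: base64-encode the key
// when the result still fits in a filename (so the key can be decoded
// back from the name); otherwise hash it and persist the original key
// in a "<hash>.key" sidecar file so it can be restored on Reload.
func toFilename(dir, key string) (string, error) {
	enc := base64.URLEncoding.EncodeToString([]byte(key))
	if len(enc) <= maxNameLen {
		return enc, nil
	}
	hash := fmt.Sprintf("%x", sha256.Sum256([]byte(key)))
	// Sidecar file mapping hash -> original key.
	if err := os.WriteFile(filepath.Join(dir, hash+".key"), []byte(key), 0o600); err != nil {
		return "", err
	}
	return hash, nil
}
```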
Did this go anywhere? I am using this in a prototype to build multi-topology caches.
Submitting a sample version (very rough), but boltdb is incredibly slow on my Windows box (boltdb/bolt#516), so I don't know if it's worth including.
Badger will get you better perf. It's often used instead of bolt.
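For comparison, here's a minimal sketch of the key db on Badger (v2-style API; `PutName`/`GetName` are illustrative helper names, not part of any existing package):

```go
package keydb

import (
	badger "github.com/dgraph-io/badger/v2"
)

// PutName records which cache file a key maps to.
func PutName(db *badger.DB, key, filename string) error {
	return db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte(key), []byte(filename))
	})
}

// GetName looks up the cache file for a key.
func GetName(db *badger.DB, key string) (string, error) {
	var name string
	err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte(key))
		if err != nil {
			return err
		}
		return item.Value(func(val []byte) error {
			name = string(val) // copy out; val is only valid inside the txn
			return nil
		})
	})
	return name, err
}
```

Opened with `badger.Open(badger.DefaultOptions("/path/to/keydb"))`, this would replace both the base64 names and the "${hash}.key" sidecar files.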
Badger seems like a neat idea to use instead of bolt for managing the key db (certainly open to pull requests to add new "FileSystems" etc., though I might consider adding a subpkg. or something for holding extra ones besides the basic set). I've certainly considered trying to build an FS which uses a distributed/cloud-based file system; the main issues I've run into are:
Actually, I'd like to build a new library that takes advantage of:
Hey @djherbis, I didn't get why you can't use an MD5 hash of the key as the filename and avoid having a key file. That would also help with a potential issue of having too many files in one folder. This way, every time you need to look up some key, you'd do something like this:
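(A rough sketch of the lookup being described; `hashedName` and `lookup` are illustrative helpers, and MD5-hex filenames are the assumption:)

```go
package cache

import (
	"crypto/md5"
	"fmt"
	"os"
	"path/filepath"
)

// hashedName derives the on-disk filename directly from the key,
// so no ".key" sidecar file is needed.
func hashedName(key string) string {
	return fmt.Sprintf("%x", md5.Sum([]byte(key)))
}

// lookup hashes the key and stats the corresponding file.
func lookup(dir, key string) (path string, ok bool) {
	path = filepath.Join(dir, hashedName(key))
	if _, err := os.Stat(path); err != nil {
		return "", false
	}
	return path, true
}
```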
Thoughts?
The main issue was with Filesystem.Reload, which needs to produce the key from the name. That being said, I might be able to drop that for a Filesystem impl, depending on which features that degrades. What's the number-of-files-in-a-folder limitation you're seeing? I thought the limit on that was pretty high.
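(For context, the FileSystem contract under discussion is roughly shaped like this; paraphrased from memory, so treat the exact signatures as approximate rather than authoritative:)

```go
package fs

import "io"

// File is a stand-in for the cache's file handle.
type File interface {
	io.Writer
	Name() string
}

// FileSystem paraphrases the interface being discussed. Reload is the
// crux: it walks existing cache files and must report each one's
// original key, which is what forces the key to be recoverable from,
// or stored alongside, the filename.
type FileSystem interface {
	Reload(add func(key, name string)) error
	Create(name string) (File, error)
	Remove(name string) error
}
```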
I haven't experienced any slowdowns based on the number of files (yet), but this is a notorious thing to avoid when storing data in the filesystem. A good example is Git, which implements a scheme similar to what I proposed. For my use case (the music server) I can get as many as 2-10 times the number of music files in a library (one file for each image size requested by the clients). For my personal library of 40K songs, this would be 400K files. I have a user with 150K+ songs, which would yield 1.5 million files! I was looking into Reload: can't the Filesystem implementation just return the hash identifiers for the files, instead of a key/name pair? That way, you convert the key to the hash before calling the Filesystem methods (Create, Delete, etc.).
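(The Git-style fan-out proposed here would look roughly like this sketch; the two-level /xx/xx split matches the variant that comes up later in this thread:)

```go
package cache

import (
	"crypto/md5"
	"fmt"
	"path/filepath"
)

// spreadPath fans files out across subdirectories the way Git's object
// store does, so no single directory accumulates hundreds of thousands
// of entries: e.g. "<root>/ab/cd/abcdef...".
func spreadPath(root, key string) string {
	sum := fmt.Sprintf("%x", md5.Sum([]byte(key)))
	return filepath.Join(root, sum[0:2], sum[2:4], sum)
}
```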
I think the main problem is that things like expiration run over all the keys, and we don't know what all the keys are if we can't reload them. But that may not be all that important. Good to know about files-per-dir and performance; I'll look into another Filesystem implementation.
Hey @djherbis, I took a stab at creating a FileSystem with the structure I proposed above, storing only the files and not the keys: https://github.com/deluan/navidrome/blob/spread-fscache/core/cache/spread_fs.go
Without making any changes to the
@deluan I took a quick look. I think the problem is that your Reload method returns different keys than were passed to Create. This means when you call cache.Get with a key that should already exist (from Reload), the cache won't think it has it (the map will contain the hashed paths instead of the passed-in key) and you'll get a cache miss on things that should have been reloaded.
Now I'm thinking of a simpler solution, though: I think I could lift the key-hashing logic into the cache map. Then the cache map keys and the filenames would match, instead of having this silly conversion logic. I might run some tests to verify backwards compatibility, or just introduce a feature to control the hashing and a boring filesystem.
You are right, it was late at night and I overlooked the key mapping... I was trying to do two things with this filesystem:
Well, looks like item 2 is unattainable without changes to the
Anyway, I added a key mapping, similar to the original FS, to my SpreadFS; it seems to be working now. Would really appreciate it if you could take another look: https://github.com/deluan/navidrome/blob/spread-fscache/core/cache/spread_fs.go
Again, thanks for your awesome library!
From a very brief look at your updates, I think your FS should work (though testing is of course the best way to verify). Key mapping does already happen in the filesystem.
Not sure what you mean about not needing to load all keys into memory; we do need an internal map of [key] => [fileStream] at least for running streams, because that's how we're able to internally look up the stream.Stream which lets you read while it is being written. I think signature-wise Reload/Traverse would actually be the same, as both iterate the FS and report all (key, filename) pairs. I considered allowing a cache miss in memory to check the filesystem and load it 'on demand'; I suppose we'd need to be able to do a spot-check on that load to verify it hasn't expired, if that's even possible (depends on how the expiration system works).
I'm still gonna give my other idea a shot, moving the key production up a level (optionally), since that would solve "getting rid of the key file".
Thanks for the appreciation of this library :) I'm really amazed how much value people get from it, because looking back at it 4-5 years later I'd probably have built it differently from the ground up, but I'm still happy it's useful even if the API is fairly messy imho. Someday I'll get around to building a v2 or new package redesigned from the ground up with a cleaner API, and hopefully support for more of the cool 'cloud' use cases that are more relevant today.
I pushed some changes which should allow this kind of control; there's a (dumb) example in the tests. I'll expand a bit here:
I think it would take a little more messing with EncodeKey/DecodeKey/SetKeyMapper to get a set of functions that will work correctly going forward but can also load existing cached files. But I think this is enough for a Saturday :)
^ Note: I think your new FS is just a special case of this. If you want to remove the key files, you can do exactly this, but set your KeyMapper function to the one that produces the /xx/xx/xxxxxxxx hashed filenames.
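Concretely, wiring that up might look like this (a sketch only: I'm assuming SetKeyMapper takes a func(string) string, and the FileSystem has to create the intermediate directories, as SpreadFS does; check the repo for the actual signature):

```go
package main

import (
	"crypto/md5"
	"fmt"
	"path/filepath"
	"time"

	"github.com/djherbis/fscache"
)

func main() {
	c, err := fscache.New("./cache", 0700, time.Hour)
	if err != nil {
		panic(err)
	}
	// Map every key to its /xx/xx/<hash> spread path, so the cache-map
	// keys and the on-disk names coincide and no ".key" files are needed.
	c.SetKeyMapper(func(key string) string {
		sum := fmt.Sprintf("%x", md5.Sum([]byte(key)))
		return filepath.Join(sum[0:2], sum[2:4], sum)
	})
	_ = c
}
```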
The default FileSystem has to do some ugly things to map the "key/name" into a filename. I should just use a Key/Value store like BoltDB.
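For reference, a minimal sketch of the proposed Bolt-backed name store (`PutName`/`GetName` are illustrative names, mirroring the Badger sketch above):

```go
package keydb

import (
	"github.com/boltdb/bolt"
)

var bucket = []byte("names")

// PutName records which cache file a key ("name") maps to.
func PutName(db *bolt.DB, key, filename string) error {
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists(bucket)
		if err != nil {
			return err
		}
		return b.Put([]byte(key), []byte(filename))
	})
}

// GetName looks up the cache file for a key; ok is false if absent.
func GetName(db *bolt.DB, key string) (name string, ok bool, err error) {
	err = db.View(func(tx *bolt.Tx) error {
		if b := tx.Bucket(bucket); b != nil {
			if v := b.Get([]byte(key)); v != nil {
				name, ok = string(v), true // copy out; v is only valid inside the txn
			}
		}
		return nil
	})
	return name, ok, err
}
```

Opened with something like `bolt.Open(filepath.Join(dir, "names.db"), 0600, nil)`, this would replace the base64/hash filename juggling entirely.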