should_update without internet connection #888

drodarie · 2024-09-24T13:51:07Z

I have a file (1.json from the Allen website) that is stored locally on my machine thanks to the cache system.
Yet, every time I need to fetch the file with bsb, it checks the meta of the file, which means it needs to fetch its equivalent from the internet to compare.

The question is: should we be able to bypass this test if the internet connection is down but the file exists locally?
This became an issue on clusters where the internet connection is closed once you launch a job.

Helveg · 2024-09-28T15:30:52Z

Could you post the relevant code for me? I'd like to check what you mean by "checks the meta of the file". The best way to deal with this is probably to set a shorter timeout on the connection attempt and to fall back as gracefully as possible. Indeed, if we have a cached file, we could continue with the cached version after falling back.
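To make the timeout-and-fallback idea concrete, here is a minimal sketch using only the Python standard library. It is not the bsb implementation: `fetch_remote` and `load_with_cache_fallback` are hypothetical names, the 3-second timeout is an arbitrary choice, and for simplicity the sketch always tries the remote first instead of comparing metadata.

```python
import urllib.error
import urllib.request
from pathlib import Path


def fetch_remote(url, timeout):
    """Download the raw bytes behind `url`, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_with_cache_fallback(url, cache_path, fetch=fetch_remote, timeout=3.0):
    """Hypothetical helper: try the remote first, fall back to the cached copy.

    `fetch` is injectable so the behaviour can be tested without a network.
    """
    cache_path = Path(cache_path)
    try:
        content = fetch(url, timeout)
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        if cache_path.exists():
            # Offline, but a cached copy exists: continue with it.
            return cache_path.read_bytes()
        # Offline and nothing cached: there is nothing to fall back to.
        raise
    # Online: refresh the cache and return the fresh content.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(content)
    return content
```

With this shape, a job on a node without internet access would only fail when the file was never downloaded beforehand.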

I'm not sure if the Allen partition can operate without access to such a file at all? So I'm not sure what we could do on a machine without an internet connection; we could let the user provide the file manually? We make an OfflineAllenPartition ;p

drodarie · 2024-10-03T17:41:53Z

> Could you post the relevant code for me?

The function that requests the JSON file is AllenStructure._dl_structure_ontology in bsb-core/bsb/topology/partition.py.
In short, this leverages a FileDependency which, when you need to load the file content, checks the meta of the file to see if it needs to be updated.

> I'm not sure if the Allen partition can operate without access to such a file at all?

Indeed no, it is a requirement. The precise situation is that we are launching a job on a cluster node which has no access to the internet. So we downloaded the file in advance with bsb (so that it is stored properly in the cache folder), and it should not need to be updated.

> we could let the user provide the file manually?

Yes, I think this is the safest option, but just out of curiosity, I was wondering why the code was failing despite having the file locally (in the bsb cache folder).

Helveg · 2024-10-04T21:04:13Z

I wouldn't know without taking a deeper look at the code, for which I don't have the time :( There might be a couple of causes. I'm assuming that the cached files get a hashed filename? If so, the hash might differ between machines, or even worse, between Python processes (which would mean the cache kind of sucks). Another cause might be that the whole file dependency code is probably far too branchy and complex, and that in some of the branches the cache isn't used?

In any case, the true solution to get out of the woods with complicated branchy stuff is to add unit tests that can spy whether the cached file is hit, and can assert that the code doesn't fetch the remote file again if it has it cached.
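As a rough illustration of what such a test could look like, here is a sketch built on `unittest.mock`. The `CachedFile` class is a hypothetical stand-in written for this example, not bsb's FileDependency, and the fixed `1.json` filename and the URL are assumptions; a real test would wrap or patch the actual download call instead.

```python
import tempfile
from pathlib import Path
from unittest import mock


class CachedFile:
    """Hypothetical stand-in for a cached file dependency (not bsb's FileDependency)."""

    def __init__(self, url, cache_dir, fetch):
        self.url = url
        self.path = Path(cache_dir) / "1.json"
        self._fetch = fetch  # callable that downloads the remote bytes

    def get_content(self):
        # Only hit the remote when there is no cached copy yet.
        if not self.path.exists():
            self.path.write_bytes(self._fetch(self.url))
        return self.path.read_bytes()


def test_cached_file_is_not_fetched_again():
    # The Mock acts as a spy: it records every call made to the "remote".
    fetch = mock.Mock(return_value=b"{}")
    with tempfile.TemporaryDirectory() as cache_dir:
        dep = CachedFile("https://example.invalid/1.json", cache_dir, fetch)
        assert dep.get_content() == b"{}"  # first access downloads and caches
        assert dep.get_content() == b"{}"  # second access must use the cache
    # The remote was contacted exactly once, despite two reads.
    fetch.assert_called_once_with("https://example.invalid/1.json")
```

The same spy pattern can be repeated per branch of the dependency code, so a branch that silently bypasses the cache shows up as a failing call-count assertion.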

You can add a config attr to provide the file

drodarie added the "question" label (Further information is requested) on Sep 24, 2024
