segfault when loading array #884

Closed
lgarrison opened this issue Nov 5, 2020 · 4 comments

@lgarrison
Contributor

The following code causes a segfault on my machine:

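import numpy as np
import asdf

asdf.AsdfFile({'data': np.arange(100)}).write_to('tmp.asdf')
with asdf.open('tmp.asdf') as f:
    d = f.tree['data'][:]  # [:] returns a view over the memory-mapped block
print(d)  # segfaults here, after the file has been closed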

I'm on Python 3.8, NumPy 1.19.4, and asdf 2.7.1 from pip (fresh conda env). It also fails on asdf master. The segfault happens on the last line, but only when I try to force a non-lazy load with [:].

I'm probably abusing asdf with this syntax! I was experimenting to see whether I could use [:] as shorthand for lazy_load=False, copy_arrays=True. But I think calling copy() achieves the same thing (possibly at the minor cost of a temporary extra copy?). Or I should stop being lazy and just use the proper arguments!
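To make the view/copy distinction concrete without depending on asdf internals, here is a minimal sketch that uses `numpy.memmap` as a stand-in for the memory-mapped block asdf hands back. That substitution is an assumption: asdf's own block handling differs in detail, but the difference between `[:]` and `copy()` is the same.

```python
import os
import tempfile

import numpy as np

# Stand-in for asdf's lazily loaded block (assumption): write 100 int64
# values to a scratch file and memory-map it read-only with numpy.memmap.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(100, dtype=np.int64).tofile(path)
mapped = np.memmap(path, dtype=np.int64, mode="r", shape=(100,))

# [:] is a view: it shares the mapped buffer, so it is only valid while
# the mapping itself is still open.
view = mapped[:]

# copy() allocates a new buffer in process memory, independent of the file.
copied = mapped.copy()

print(np.shares_memory(view, mapped))    # True
print(np.shares_memory(copied, mapped))  # False
```

Only `copied` is safe to keep after the file is closed; `view` still points into the mapping.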

@eslavich
Contributor

eslavich commented Nov 6, 2020

I think we can do better than a segfault here, but [:] just gives you a view over the same memory-mapped ndarray, so it also loses access to the data when the file is closed. copy() is probably your best option; I think it will only result in one copy of the ndarray in process memory, since the original ndarray lives in the page cache.

Let's keep this issue open so that we remember to replace the segfault with a reasonable exception.

@eslavich eslavich added the bug label Nov 6, 2020
@jdavies-st
Contributor

The case is handled when the array is not a view:

In [10]: import numpy as np
    ...: import asdf
    ...: 
    ...: asdf.AsdfFile({'data':np.arange(100)}).write_to('tmp.asdf')
    ...: with asdf.open('tmp.asdf') as f:
    ...:   d = f.tree['data']
    ...: 

In [11]: print(d)
<array (unloaded) shape: [100] dtype: int64>

In [13]: d += 1
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-13-2237cfe673cf> in <module>
----> 1 d += 1

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in __operation__(self, *args)
    517 def _make_operation(name):
    518     def __operation__(self, *args):
--> 519         return getattr(self._make_array(), name)(*args)
    520     return __operation__
    521 

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in _make_array(self)
    266 
    267             self._array = np.ndarray(
--> 268                 shape, dtype, block.data,
    269                 self._offset, self._strides, self._order)
    270             self._array = self._apply_mask(self._array, self._mask)

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/block.py in data(self)
   1167         if self._data is None:
   1168             if self._fd.is_closed():
-> 1169                 raise IOError(
   1170                     "ASDF file has already been closed. "
   1171                     "Can not get the data.")

OSError: ASDF file has already been closed. Can not get the data.

So it seems the issue is handling the case where the array is a view?
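A minimal sketch of why the guard in the traceback only protects the non-view path. `LazyBlock` is a hypothetical class, not asdf's; only the error message is taken from the traceback. The point is that the closed-file check runs when data is requested through the lazy wrapper, while a view taken earlier never goes back through that check.

```python
import numpy as np


class LazyBlock:
    """Hypothetical stand-in for a lazily loaded block; not asdf's class."""

    def __init__(self, values):
        self._values = values
        self.closed = False

    @property
    def data(self):
        # Mirrors the guard seen in the traceback: refuse to hand out the
        # buffer once the file is closed, instead of touching stale memory.
        if self.closed:
            raise OSError("ASDF file has already been closed. Can not get the data.")
        return self._values


block = LazyBlock(np.arange(100))
view = block.data[:]  # taken while "open"; never passes through the guard again
block.closed = True

error = None
try:
    block.data  # lazy path: the guard fires
except OSError as exc:
    error = str(exc)

# `view` still references the old buffer. Here that buffer is ordinary heap
# memory, so reading it is harmless; if it were a closed mmap, this read
# could crash the process instead of raising.
print(error)
print(view[:5])
```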

@eslavich
Contributor

eslavich commented May 7, 2021

One of the astropy maintainers pointed out a technique they use for safely closing mmaps:

https://github.com/astropy/astropy/blob/bb4c1973faffea88edc9068df6e95d4452e82928/astropy/io/fits/file.py#L405-L417

Maybe useful for asdf?
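The linked astropy code is not reproduced here. My understanding, unverified against that exact commit, is that the idea is to close an mmap only when no arrays still reference it. Below is a minimal sketch of that refcount-based idea. `MappedFile`, `array()` and `maybe_close()` are hypothetical names, not asdf's or astropy's API, and `sys.getrefcount` is CPython-specific.

```python
import mmap
import os
import sys
import tempfile

import numpy as np


class MappedFile:
    """Hypothetical owner of a read-only mmap; not asdf's or astropy's API."""

    def __init__(self, path):
        with open(path, "rb") as fh:
            # The mapping stays valid after the file object is closed.
            self._mmap = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        # Reference count with no arrays outstanding (includes the temporary
        # reference that getrefcount itself adds).
        self._baseline = sys.getrefcount(self._mmap)

    def array(self, dtype):
        # The returned array, and any view of it, keeps the mmap alive
        # through its base chain, which raises the mmap's refcount.
        return np.frombuffer(self._mmap, dtype=dtype)

    def maybe_close(self):
        """Close the mmap only if no arrays still reference it."""
        if self._mmap is None:
            return True
        if sys.getrefcount(self._mmap) > self._baseline:
            return False  # something (e.g. a view) still uses the buffer
        self._mmap.close()
        self._mmap = None
        return True


path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(100, dtype=np.int64).tofile(path)

mf = MappedFile(path)
arr = mf.array(np.int64)
view = arr[:]
del arr  # only the view remains, as in the original report
closed_with_view = mf.maybe_close()
del view
closed_after = mf.maybe_close()
print(closed_with_view, closed_after)
```

The design choice is to leave the mapping open rather than pull it out from under a live view. If `maybe_close()` returns False, the caller has to try again later, or drop the `MappedFile` so the mapping is released when the last view goes away.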

@braingram
Contributor

Fixed in 3.1.0 with #1668
