[profile:matplotlib/jbmc ERROR] hit command failed #112

Closed
certik opened this issue Sep 30, 2013 · 41 comments

@certik
Member

certik commented Sep 30, 2013

I just hit this error at cloud.sagemath.com:

$ ./update
Up to date: launcher
Up to date: m4
Up to date: autoconf
Up to date: automake
Up to date: libtool
Up to date: pkgconf
Up to date: patchelf
Up to date: bzip2
Up to date: cmake
Up to date: ncurses
Up to date: zlib
Up to date: openssl
Up to date: readline
Up to date: sqlite
Up to date: python
Up to date: cython
Up to date: distribute
Up to date: freetype
Up to date: szip
Up to date: hdf5
Up to date: jinja2
Up to date: python-readline
Up to date: pyzmq
Up to date: tornado
Up to date: ipython
Up to date: lapack
Up to date: numpy
Up to date: png
Up to date: matplotlib
Up to date: matplotlib-basemap
Up to date: netcdf4
Up to date: nose
Up to date: python-netcdf4
Up to date: scipy
Building profile
[profile] Building zk5n.., follow log with:
[profile]   tail -f /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-1/build.log
[profile:matplotlib/jbmc ERROR] hit command failed
[profile ERROR] hit command failed

It used to work previously. I set the priority to high, because this bug prevents usage of hashstack.

Part of the bug is that the error message is not helpful --- there should be some obvious way to debug this.

@ahmadia

ahmadia commented Sep 30, 2013

What sort of filesystem is /mnt/home? Have you tried running the script from an IPython debugger session?

@certik
Member Author

certik commented Sep 30, 2013

I think I know what the issue is: the /mnt/home/NBgQrbd5 part changes dynamically each time I log in. I have no idea what filesystem /mnt/home is.

@certik
Member Author

certik commented Sep 30, 2013

IPython debugging session gives:

~/repos/hashstack(packages)$ ipython
Python 2.7.5 (default, Aug 15 2013, 09:07:40)
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: run ./update
Up to date: launcher
Up to date: m4
Up to date: autoconf
Up to date: automake
Up to date: libtool
Up to date: pkgconf
Up to date: patchelf
Up to date: bzip2
Up to date: cmake
Up to date: ncurses
Up to date: zlib
Up to date: openssl
Up to date: readline
Up to date: sqlite
Up to date: python
Up to date: cython
Up to date: distribute
Up to date: freetype
Up to date: szip
Up to date: hdf5
Up to date: jinja2
Up to date: python-readline
Up to date: pyzmq
Up to date: tornado
Up to date: ipython
Up to date: lapack
Up to date: numpy
Up to date: png
Up to date: matplotlib
Up to date: matplotlib-basemap
Up to date: netcdf4
Up to date: nose
Up to date: python-netcdf4
Up to date: scipy
Building profile
[profile] Building zk5n.., follow log with:
[profile]   tail -f /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log
[profile:matplotlib/jbmc ERROR] hit command failed
[profile ERROR] hit command failed
An exception has occurred, use %tb to see the full traceback.

SystemExit: 127


In [2]: %tb
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
/usr/local/sage/sage-5.11/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    202             else:
    203                 filename = fname
--> 204             __builtin__.execfile(filename, *where)

/mnt/home/NBgQrbd5/repos/hashstack/update in <module>()
     18 # Rest of builder assume the python-hpcmp dir is the cwd
     19 os.chdir(root_dir)
---> 20 sys.exit(help_on_exceptions(logger, main, logger, get_hdist_config_filename()))

SystemExit: 127

@ahmadia

ahmadia commented Sep 30, 2013

And what's in /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log?

@certik
Member Author

certik commented Sep 30, 2013

It ends like this:

~/repos/hashstack(packages)$ tail /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/lib/pkgconfig/libpng16.pc', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/lib/pkgconfig/libpng16.pc')
silent_makedirs(u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3',)
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man3/libpng.3', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3/libpng.3')
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man3/libpngpf.3', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3/libpngpf.3')
silent_makedirs(u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man5',)
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man5/png.5', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man5/png.5')
Linking matplotlib/jbmc into /mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n
running ['hit', u'create-links', u'/tmp/hashdist-run-job-Vc2DXI/1_in0.json']
hit command failed
hit command failed

There don't seem to be any earlier errors in the log.

@ahmadia

ahmadia commented Sep 30, 2013

That's a file system error. You might try disabling symbolic links in
builder. Let me know if you want me to point out how I'm doing that in
Cygwin.
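
A minimal sketch of the "fall back from symlinks to copies" idea, for readers who want to see the shape of it. This is not hashdist's implementation of its `relative_symlink` or `copy` actions; `link_or_copy` and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Create a relative symlink at dst; copy the file if symlinking fails."""
    try:
        os.symlink(os.path.relpath(src, os.path.dirname(dst)), dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # For example, a filesystem or platform without symlink support.
        shutil.copyfile(src, dst)
        return "copy"


# Throwaway files standing in for an artifact file and its profile location.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "libpng.3")
dst = os.path.join(workdir, "profile-libpng.3")
with open(src, "w") as f:
    f.write("man page\n")

method = link_or_copy(src, dst)
with open(dst) as f:
    content = f.read()
```

Note that the fallback also triggers when `dst` already exists, so a real implementation would probably want to treat that case separately rather than silently overwriting.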


@certik
Member Author

certik commented Sep 30, 2013

If you can point me to that, that would be awesome. What sort of filesystem error is it?

@ahmadia

ahmadia commented Sep 30, 2013

Almost certainly:

running ['hit', u'create-links', u'/tmp/hashdist-run-job-Vc2DXI/1_in0.json']

Is raising an IOError. We should probably catch that Exception (or provide
an option to) instead of swallowing it, which might give you a little bit
of a hint about what went wrong.
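
One way to avoid losing the failure reason is to capture the child process's stderr and put it in the error that gets logged. The sketch below is an assumption about how that could look, not hashdist's code: `run_logged` is a hypothetical helper, and the failing child is a stand-in for `hit create-links`.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("profile")


def run_logged(cmd):
    """Run cmd; on failure, report its stderr instead of only 'command failed'."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        details = proc.stderr.strip()
        # Keep the child's own diagnostics in the logged message.
        logger.error("%s failed (exit %d):\n%s", cmd[0], proc.returncode, details)
        raise RuntimeError(
            "%s failed with exit code %d: %s" % (cmd[0], proc.returncode, details)
        )
    return proc.stdout


# Stand-in for a failing child process (hypothetical; not the real `hit`).
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('IOError: demo\\n'); sys.exit(127)",
]
try:
    run_logged(failing)
except RuntimeError as exc:
    message = str(exc)
```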

You can try disabling the links with the following modification:

9  builder/recipes.py
View file @ 13ba05f: https://github.com/hashdist/hashstack/blob/13ba05fc8db02adad0e56dcdef4a513830399ca3/builder/recipes.py

@@ -25,14 +25,7 @@ def add_profile_install(ctx, pkg_attrs, build_spec):

     rules += [
-        {"action": "relative_symlink",
-         "select": "$ARTIFACT/lib/python*/site-packages/*",
-         "prefix": "$ARTIFACT",
-         "target": "$PROFILE",
-         "dirs": True},
-        {"action": "exclude",
-         "select": "$ARTIFACT/lib/python*/site-packages/**/*"},
-        {"action": "relative_symlink",
+        {"action": "copy",
          "select": "$ARTIFACT/*/**/*",
          "prefix": "$ARTIFACT",
          "target": "$PROFILE"}

@ahmadia

ahmadia commented Oct 1, 2013

Weird, I think I may be seeing something similar if the destination already
exists. I don't think this is high priority as long as you can work around
it, but it's definitely something I'll look at this month.


@certik
Member Author

certik commented Oct 2, 2013

Ok, on my work computer I am now getting the same error:

Up to date: png
Up to date: matplotlib
Building profile
[profile] Building kpmj.., follow log with:
[profile]   tail -f /auto/netscratch/ondrej/bld/profile-n-kpmj-1/build.log
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed

and

ondrej@kittiwake:~/repos/python-hpcmp2(packages)$ tail -f /auto/netscratch/ondrej/bld/profile-n-kpmj-1/build.log
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/lib/pkgconfig/libpng16.pc', u'/auto/netscratch/ondrej/opt/profile/kpmj/lib/pkgconfig/libpng16.pc')
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3',)
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man3/libpng.3', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3/libpng.3')
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man3/libpngpf.3', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3/libpngpf.3')
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man5',)
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man5/png.5', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man5/png.5')
Linking matplotlib/om73 into /auto/netscratch/ondrej/opt/profile/kpmj
running ['hit', u'create-links', u'/tmp/hashdist-run-job-5r20OA/1_in0.json']
hit command failed
hit command failed

So it has nothing to do with the changing /mnt/home path.

@certik
Member Author

certik commented Oct 2, 2013

Ok, it's now rebuilding everything again with your patch. We'll see if it fixes it.

It only happens with matplotlib, not with other things...

@certik
Member Author

certik commented Oct 2, 2013

Ok, so the patch does not fix it:

[png] Unpacking sources files:tzkhasbvvydlpjkjd6plccbfv6pkcqoy
[png] Unpacking sources tar.gz:dj4va2fjpzsuvcl3usxe76jiywh6phjz
[png] Building cthf.., follow log with:
[png]   tail -f /auto/netscratch/ondrej/bld/png-n-cthf/build.log
Downloading sources for matplotlib
Building matplotlib
[matplotlib] Unpacking sources files:ywt35gj3h7ucyjgzisnqnzht64fjgx5m
[matplotlib] Unpacking sources tar.gz:klqys4vo3bptbmc455axpdwho2c56yas
[matplotlib] Building 46q2.., follow log with:
[matplotlib]   tail -f /auto/netscratch/ondrej/bld/matplotlib-n-46q2/build.log
Building profile
[profile] Building 67ki.., follow log with:
[profile]   tail -f /auto/netscratch/ondrej/bld/profile-n-67ki/build.log
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed

with:

ondrej@kittiwake:~/repos/python-hpcmp2(packages)$ tail -f /auto/netscratch/ondrej/bld/profile-n-67ki/build.log
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/testsuite/utils.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/testsuite/utils.pyc')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/utils.py', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/utils.py')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/utils.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/utils.pyc')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/visitor.py', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/visitor.py')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/visitor.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/visitor.pyc')
Linking python-readline/2b76 into /auto/netscratch/ondrej/opt/profile/67ki
running ['hit', u'create-links', u'/tmp/hashdist-run-job-9ZMhbL/1_in0.json']
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages',)
hit command failed
hit command failed

So we still need to figure out a proper patch. Until then it's a high-priority issue, since I can't work with hashdist until I find a workaround. Something must have changed in the most recent patches, since things worked perfectly for me before.

@ahmadia

ahmadia commented Oct 2, 2013

I will take a look.

A


@ghost ghost assigned ahmadia Oct 3, 2013
@ahmadia

ahmadia commented Oct 3, 2013

Just to clarify, is this breaking when 'create-links' gets called for either png or matplotlib? How reliably can you reproduce this? Which commits of hashdist and hashstack are you using?

I'll try to reproduce this on my local OS X box.

@certik
Member Author

certik commented Oct 3, 2013

This is 100% reproducible on my machine. I know it fails for matplotlib. It seems to work for some other packages.

Do you have any ideas how to debug it? I can do the debugging.

@ahmadia

ahmadia commented Oct 3, 2013

Have you tried running:

hit create-links /tmp/hashdist-run-job-5r20OA/1_in0.json?

You could even run that in gdb or IPython for a better trace.

@certik
Member Author

certik commented Oct 3, 2013

I think I know. This patch:

diff --git a/hashdist/core/links.py b/hashdist/core/links.py
index c7cc77b..6e751a5 100644
--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -254,19 +254,26 @@ def execute_links_dsl(rules, env={}, launcher_program=None
     logger : Logger

     """
+    print "I am here"
     actions = dry_run_links_dsl(rules, env)
     for action in actions:
         action_desc = "%s%r" % (action[0].__name__, action[1:])
         try:
+            print "1"
             if action[0] is make_launcher:
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
                 action[0](*action[1:])
+            print "2"
             logger.debug(action_desc)
+            print "3"
         except OSError, e:
             # improve error message to include operation attempted
+            print "exception 1"
             msg = str(e) + " in " + action_desc
             logger.error(msg)
             exc_type, exc_val, exc_tb = sys.exc_info()
+            print "exception 2"
             raise OSError, OSError(e.errno, msg), exc_tb
+    print "OK"

produces:

I am here
1
2
3
1
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed

So the problem is in the lines:

             if action[0] is make_launcher:
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
                 action[0](*action[1:])

I'll keep digging.
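
The loop in the patch above only catches `OSError`, and the trace stops after `1` without printing `exception 1`, which suggests the failure was some other exception type. A minimal Python 3 sketch of a loop that annotates every failure with the action being attempted is shown here; `run_actions` and the sample actions are hypothetical and are not hashdist's `execute_links_dsl`.

```python
import logging

logger = logging.getLogger("links")


def run_actions(actions):
    """Execute (func, *args) tuples, naming the failing action in any error."""
    for action in actions:
        func, args = action[0], action[1:]
        action_desc = "%s%r" % (func.__name__, args)
        try:
            func(*args)
        except Exception as exc:
            # Catch every exception type, not only OSError, and keep the cause.
            msg = "%s in %s" % (exc, action_desc)
            logger.error(msg)
            raise RuntimeError(msg) from exc
        logger.debug(action_desc)


def noop(path):
    # Hypothetical action that always succeeds.
    pass


def failing_copy(src, dst):
    # Hypothetical action that fails the way a copy onto an existing file might.
    raise IOError("destination exists")


try:
    run_actions([(noop, "site-packages"), (failing_copy, "a.pth", "b.pth")])
except RuntimeError as exc:
    error_text = str(exc)
```

With this shape, the error already says which action and which arguments failed, so the `print`-based bisection would not be needed.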

@dagss
Member

dagss commented Oct 3, 2013

If you print action you'll have nailed it..

@certik
Member Author

certik commented Oct 3, 2013

The /tmp/hashdist-run-job-5r20OA/1_in0.json file does not exist, so I can't easily run the suggested `hit create-links` command. But my "print" method of debugging will get me there.

@certik
Member Author

certik commented Oct 3, 2013

I think I've nailed it. This patch:

diff --git a/hashdist/core/links.py b/hashdist/core/links.py
index c7cc77b..0b970ce 100644
--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -254,19 +254,29 @@ def execute_links_dsl(rules, env={}, launcher_program=None
     logger : Logger

     """
+    print "I am here"
     actions = dry_run_links_dsl(rules, env)
     for action in actions:
         action_desc = "%s%r" % (action[0].__name__, action[1:])
         try:
+            print "1"
+            print action
             if action[0] is make_launcher:
+                print "1a"
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
+                print "1b"
                 action[0](*action[1:])
+            print "2"
             logger.debug(action_desc)
+            print "3"
         except OSError, e:
             # improve error message to include operation attempted
+            print "exception 1"
             msg = str(e) + " in " + action_desc
             logger.error(msg)
             exc_type, exc_val, exc_tb = sys.exc_info()
+            print "exception 2"
             raise OSError, OSError(e.errno, msg), exc_tb
+    print "OK"

produces

I am here
1
(<function silent_makedirs at 0x1dc48c0>, u'/auto/netscratch/ondrej/opt/profile/67ki3/lib/python2.7/site-packages')
1b
2
3
1
(<function silent_copy at 0x1dc4758>, '/auto/netscratch/ondrej/opt/python-readline/2b76/lib/python2.7/site-packages/easy-install.pth', u'/auto/netscratch/ondrej/opt/profile/67ki3/lib/python2.7/site-packages/easy-install.pth')
1b
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed

@ahmadia

ahmadia commented Oct 3, 2013

Because easy-install.pth already exists and hit is trying to link it in?


@ahmadia

ahmadia commented Oct 3, 2013

That would be a bug :)

Also, why is /tmp getting wiped out? We should definitely have an option to never delete anything that hashdist creates. Perhaps we should raise separate issues now that we've identified the problems?

@certik
Member Author

certik commented Oct 3, 2013

I think the egg for python-readline needs to be unpacked:

diff --git a/packages.yml.linux b/packages.yml.linux
index 5672fd0..fe859ca 100644
--- a/packages.yml.linux
+++ b/packages.yml.linux
@@ -20,6 +20,7 @@
   recipe: distutils
   url: https://pypi.python.org/packages/source/r/readline/readline-6.2.4.1.tar.
   key: tar.gz:4ahynyb57zjopukqftwfyzahbmzgehef
+  unpack_egg: true
   deps: [python, distribute]

 - package: pyzmq

Then it works!!!

@certik
Member Author

certik commented Oct 3, 2013

Ok. So the python-readline package was broken, because it provided duplicated files (easy-install.pth, site.py, ...):

$ ll /auto/netscratch/ondrej/opt/python-readline/2b76/lib/python2.7/site-packages/
total 24
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:22 ./
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:21 ../
-r--r--r-- 1 ondrej cnls  227 Oct  2 17:22 easy-install.pth
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:22 readline-6.2.4.1-py2.7-linux-x86_64.egg/
-r--r--r-- 1 ondrej cnls 2418 Oct  2 17:22 site.py
-r--r--r-- 1 ondrej cnls 1815 Oct  2 17:22 site.pyc

After the fix:

$ ll /netscratch/ondrej/opt/python-readline/tyko/lib/python2.7/site-packages/
total 728
dr-xr-xr-x 3 ondrej cnls   4096 Oct  3 12:44 ./
dr-xr-xr-x 3 ondrej cnls   4096 Oct  3 12:43 ../
dr-xr-xr-x 2 ondrej cnls   4096 Oct  3 12:44 readline-6.2.4.1-py2.7.egg-info/
-r-xr-xr-x 1 ondrej cnls 729511 Oct  3 12:44 readline.so*
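
A check like the following could flag this kind of clash before linking starts. It is only a sketch under the assumption that the failure comes from two artifacts providing the same relative path; `find_conflicts` and the `pkg-a`/`pkg-b` directories are hypothetical and not part of hashdist.

```python
import os
import tempfile
from collections import defaultdict


def find_conflicts(artifact_dirs):
    """Map each relative file path to the artifacts providing it; keep duplicates."""
    providers = defaultdict(list)
    for artifact in artifact_dirs:
        for dirpath, _dirnames, filenames in os.walk(artifact):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), artifact)
                providers[rel].append(artifact)
    return {rel: dirs for rel, dirs in providers.items() if len(dirs) > 1}


# Two throwaway artifacts that both ship site-packages/easy-install.pth.
root = tempfile.mkdtemp()
artifacts = []
for pkg in ("pkg-a", "pkg-b"):
    site_packages = os.path.join(root, pkg, "lib", "site-packages")
    os.makedirs(site_packages)
    with open(os.path.join(site_packages, "easy-install.pth"), "w") as f:
        f.write("./%s.egg\n" % pkg)
    artifacts.append(os.path.join(root, pkg))

conflicts = find_conflicts(artifacts)
```

Here `conflicts` has a single key, the relative path of `easy-install.pth`, mapped to both artifact directories.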

@ahmadia

ahmadia commented Oct 3, 2013

Why would the readline installer try to hijack easy-install.pth or site.py?
What are the contents of those files? I can only assume that it creates
both of them if they don't exist (probably unintentionally).

Anyway, thanks for chasing this one down @certik and sorry I wasn't more
help.


@certik
Member Author

certik commented Oct 3, 2013

Why would the readline installer try to hijack easy-install.pth or site.py?

That's done by setuptools (or rather distribute), so that the egg can be imported by Python automagically.

What are the contents of those files?

Just import hooks for this specific package. Each setuptools package has its own hook, so it only works if you install things into an existing profile using setup.py, because the contents get properly added to the single site.py file... But if you install the way we do, the only sane approach is to unpack the egg and use the old-style import.

I can only assume that it creates both of them if they don't exist (probably unintentionally).

No, it is very intentional.
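
For readers unfamiliar with the mechanism discussed above: Python's standard `site` module reads `.pth` files in a site directory and appends the paths they list to `sys.path`, which is how an egg directory becomes importable. The sketch below demonstrates only that standard-library behaviour with a throwaway directory; the `demo-1.0.egg` name is made up, and real files written by setuptools may also contain `import` lines.

```python
import os
import site
import sys
import tempfile

# A throwaway site directory containing one (empty) egg directory.
sitedir = tempfile.mkdtemp()
egg_dir = os.path.join(sitedir, "demo-1.0.egg")
os.mkdir(egg_dir)

# A .pth file lists paths, one per line, relative to the site directory.
with open(os.path.join(sitedir, "easy-install.pth"), "w") as f:
    f.write("./demo-1.0.egg\n")

# addsitedir() appends sitedir to sys.path and processes its .pth files.
site.addsitedir(sitedir)

egg_on_path = any(
    os.path.isdir(p) and os.path.samefile(p, egg_dir) for p in sys.path
)
```

Because every egg-installing package writes its own `easy-install.pth` into its own artifact, merging several artifacts into one profile means several files competing for the same name.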

But when I remove the "copy" hack, i.e. use symlinks, then it still fails with:

I am here
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed

@certik
Member Author

certik commented Oct 3, 2013

So for the last problem, we need to apply:

--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -219,16 +219,24 @@ def dry_run_links_dsl(rules, env={}):
         where `func` is one of `os.symlink`, :func:`silent_makedirs`,
         `shutil.copyfile`.
     """
+    print "X1"
     assert os.path.sep == '/'
+    print "X2"
     actions = []
     excluded = set()
     makedirs_cache = set()
+    print "X3"
     for rule in rules:
+        print "X4"
         if 'select' in rule:
+            print "X5a"
             _glob_actions(rule, excluded, makedirs_cache, env, actions)
         else:
+            print "X5b"
             _single_action(rule, excluded, makedirs_cache, env, actions)
+        print "X6"

+    print "X7"
     return actions

and we get

I am here
X1
X2
X3
X4
X5a
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed

So this line fails:

            _glob_actions(rule, excluded, makedirs_cache, env, actions)

@ahmadia

ahmadia commented Oct 3, 2013

Python Eggs were such a terrible idea...

I assume you are doing a similar egg-install with matplotlib?


@certik
Member Author

certik commented Oct 3, 2013

We unpack all eggs. That way one can install things like mayavi and so on. Matplotlib does not use eggs.

@ahmadia

ahmadia commented Oct 3, 2013

I tend to only install from source. I don't understand why Mayavi would be
an exception.


@certik
Member Author

certik commented Oct 3, 2013

Mayavi and related packages use eggs. You can't install eggs with hashdist. That is unless you use the --copy flag to ./update... ;)

@certik
Member Author

certik commented Oct 3, 2013

Ok, how do I enable exception printing in "hit"? It's a pain to debug it...

Now it fails on this line:

        selected.update(ant_iglob(pattern, '', include_dirs=rule.get('dirs', False)))

and pattern is /auto/netscratch/ondrej/opt/matplotlib/om73/lib/python*/site-packages/mpl_toolkits/**. But I don't know what exception it raises...

@certik
Member Author

certik commented Oct 3, 2013

It raises:

*** ValueError: ValueError('does not make sense with ** at end of pattern with glob_files',)

@ahmadia

ahmadia commented Oct 3, 2013

We can add egg support, but I don't consider it essential right now. I'm
not aware of packages that are distributed as eggs where you can't get the
source.


@dagss
Member

dagss commented Oct 3, 2013

You want ....mpl_toolkits/**/*.
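
A rule like this could be checked up front with a friendlier message. The validator below is a hypothetical sketch, not hashdist's `ant_iglob`; it only assumes, based on the `ValueError` reported above, that a file-selecting pattern must not end in `**`.

```python
def check_select_pattern(pattern):
    """Reject a file-selecting glob that ends in '**' with an actionable message."""
    trimmed = pattern.rstrip("/")
    last_component = trimmed.rsplit("/", 1)[-1]
    if last_component == "**":
        raise ValueError(
            "pattern %r ends with '**', which does not select files; "
            "did you mean %r?" % (pattern, trimmed + "/*")
        )
    return pattern


good = check_select_pattern("$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**/*")
try:
    check_select_pattern("$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**")
except ValueError as exc:
    hint = str(exc)
```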

@certik
Member Author

certik commented Oct 3, 2013

Yeah, I just realized:

diff --git a/builder/recipes.py b/builder/recipes.py
index 3140720..121abdf 100644
--- a/builder/recipes.py
+++ b/builder/recipes.py
@@ -26,11 +26,11 @@ def add_profile_install(ctx, pkg_attrs, build_spec):

     rules += [
         {"action": "relative_symlink",
-         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**",
+         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**/*",
          "prefix": "$ARTIFACT",
          "target": "$PROFILE"},
         {"action": "exclude",
-         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**"},
+         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**/*"},
         {"action": "relative_symlink",
          "select": "$ARTIFACT/lib/python*/site-packages/*",
          "prefix": "$ARTIFACT",

@dagss --- how do we enable proper exception printing? At least to a log file. This kind of debugging is madness.

@dagss
Member

dagss commented Oct 3, 2013

In some places there's more printing with DEBUG=1.

In general, adding patches to do more printing is fair game. There are many places where such printing could go, and I don't know if it makes sense for me to try to anticipate them all; it's much easier if you add the printing where you need it.

@certik
Member Author

certik commented Oct 3, 2013

When an exception occurs it should be printed to a log file, and that log file should stay around. Currently the exception gets swallowed.

@ahmadia

ahmadia commented Oct 3, 2013

Agreed on logging swallowed exceptions.
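
As a rough illustration of what "log the exception and keep the file" could mean, here is a sketch using only the standard `logging` module. The logger name, log file name, and `make_file_logger` helper are hypothetical; this is not the change tracked in #113.

```python
import logging
import os
import tempfile


def make_file_logger(log_path):
    """Return a logger that appends to log_path; nothing here deletes the file."""
    logger = logging.getLogger("hit-debug")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


log_path = os.path.join(tempfile.mkdtemp(), "hit-errors.log")
logger, handler = make_file_logger(log_path)

try:
    raise ValueError("does not make sense with ** at end of pattern")
except ValueError:
    # logger.exception() records the message plus the full traceback.
    logger.exception("create-links failed")

handler.close()
with open(log_path) as f:
    log_text = f.read()
```

The log then holds the message, the exception type, and the traceback, which is the information that had to be recovered by hand in this thread.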


@certik
Member Author

certik commented Oct 3, 2013

See #113 for the exceptions logging.

This issue has been fixed by ec04a41 and aebffc5.

@certik certik closed this as completed Oct 3, 2013