Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defining a recursive remote function inside of a regular function #307

Open
robertnishihara opened this issue Jul 27, 2016 · 1 comment
Open
Assignees
Labels

Comments

@robertnishihara
Copy link
Collaborator

I came across this problem while trying to add a test for recursive remote functions. I think the problem occurs because our testing paradigm requires defining remote functions inside of regular functions.

Defining a remote function inside a regular function works.

def f():
  @ray.remote([int], [int])
  def g(x):
    return x
  return ray.get(g(0))
f()  # prints 0

Defining a recursive remote function works.

@ray.remote([int], [int])
def factorial(n):
  if n == 0:
    return 1
  return n * ray.get(factorial(n - 1))
ray.get(factorial(2))  # prints 2

Calling a recursive remote function from another remote function works.

@ray.remote([int], [int])
def getfactorial(n):
  return ray.get(factorial(n))
ray.get(getfactorial(2)) # prints 2

However, defining a recursive remote function inside of a regular function seems to note work.

def getfact():
  @ray.remote([int], [int])
  def fact(n):
    if n == 0:
      return 1
    return n * ray.get(fact(n - 1))

Calling getfact() fails with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/rkn/Workspace/ray/scripts/shell.py in <module>()
----> 1 getfact()

/Users/rkn/Workspace/ray/scripts/shell.py in getfact()
      1 def getfact():
----> 2   @ray.remote([int], [int])
      3   def fact(n):
      4     if n == 0:
      5       return 1

/Users/rkn/Workspace/ray/lib/python/ray/worker.pyc in remote_decorator(func)
    840       func.__globals__[func.__name__] = func_call  # Allow the function to reference itself as a global variable
    841       try:
--> 842         to_export = pickling.dumps((func, arg_types, return_types, func.__module__))
    843       finally:
    844         # Undo our changes

/Users/rkn/Workspace/ray/lib/python/ray/pickling.pyc in dumps(obj)
     17 def dumps(obj):
     18   stringio = cloudpickle.StringIO()
---> 19   dump(obj, stringio)
     20   return stringio.getvalue()
     21 

/Users/rkn/Workspace/ray/lib/python/ray/pickling.pyc in dump(obj, file, protocol)
     13 
     14 def dump(obj, file, protocol=2):
---> 15   return BetterPickler(file, protocol).dump(obj)
     16 
     17 def dumps(obj):

/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dump(self, obj)
    105         self.inject_addons()
    106         try:
--> 107             return Pickler.dump(self, obj)
    108         except RuntimeError as e:
    109             if 'recursion' in e.args[0]:

/Users/rkn/anaconda/lib/python2.7/pickle.pyc in dump(self, obj)
    222         if self.proto >= 2:
    223             self.write(PROTO + chr(self.proto))
--> 224         self.save(obj)
    225         self.write(STOP)
    226 

/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288 

/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
    566         write(MARK)
    567         for element in obj:
--> 568             save(element)
    569 
    570         if id(obj) in memo:

/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288 

/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_function(self, obj, name)
    203                 or getattr(obj.__code__, 'co_filename', None) == '<stdin>'
    204                 or themodule is None):
--> 205             self.save_function_tuple(obj)
    206             return
    207         else:

/Users/rkn/Workspace/ray/lib/python/ray/pickling.pyc in save_function_tuple(self, func)
     44 class BetterPickler(CloudPickler):
     45   def save_function_tuple(self, func):
---> 46     code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
     47 
     48     self.save(_fill_function)

/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in extract_func_data(self, func)
    315 
    316         # process closure
--> 317         closure = [c.cell_contents for c in func.__closure__] if func.__closure__ else []
    318 
    319         # save the dict

ValueError: Cell is empty
@mehrdadn
Copy link
Collaborator

mehrdadn commented Jul 28, 2016

This is quite an interesting issue that in fact has nothing to do with pickling itself, but merely the timing of when we are trying to pickle the function.

The issue is that we need fact to be fully defined at the point where remote is called, but in fact, it turns out that this is impossible because Python has not yet finished constructing the closure of fact at that point! (i.e., while it does have a Cell object whose contents are supposed to be func, its contents have not yet been set at the point where we try to pickle it.)

Here's an illustration to show what I mean:

def annotated(func):
  global closure
  closure = func.__closure__[0]
  print closure  # <cell at 0x0000000004027108: empty>
  return func

def getfact():
  @annotated
  def fact(n): return n * fact(n - 1) if n else 1
  return fact
getfact()(3)

print closure    # <cell at 0x0000000004027108: function object at 0x00000000046E5438>

The only solution that i can see? You won't like it, but it's the same thing I've said multiple times in the past -- annotations should merely annotate; they shouldn't have side effects like sending the function to the scheduler. The actual remote definition should occur along with the call itself, and furthermore, the definition and the call should probably be packaged into the same message, not separate messages -- because a separate definition and call doesn't make sense when you consider that different calls might be using the "same" function with a different closure, and they need to work correctly.

I was originally going to close the issue as invalid since this isn't a pickling bug. However, it is a bug, so I'm leaving it open and assigning it back to you so you can finally make remote wait until the invocation. ;)

@mehrdadn mehrdadn assigned robertnishihara and unassigned mehrdadn Jul 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants