Improve pulp_smash.api.poll_task #565

elyezer · 2017-02-26T14:07:26Z

We've seen some cases where, for some unknown reason, poll_task never times out. To avoid that, we could change poll_task to use a threading.Timer, like this:

def poll_task(server_config, href):
    """Wait for a task and its children to complete. Yield response bodies.

    Poll the task at ``href``, waiting for the task to complete. When a
    response is received indicating that the task is complete, yield that
    response body and recursively poll each child task.

    :param server_config: A :class:`pulp_smash.config.ServerConfig` object.
    :param href: The path to a task you'd like to monitor recursively.
    :returns: A generator yielding response bodies.
    :raises pulp_smash.exceptions.TaskTimedOutError: If a task takes too
        long to complete.
    """
    timeout = 1800  # 1800s == 30m
    timer = threading.Timer(timeout, thread.interrupt_main)
    try:
        timer.start()
        while True:
            response = requests.get(
                urljoin(server_config.base_url, href),
                **server_config.get_requests_kwargs()
            )
            response.raise_for_status()
            attrs = response.json()
            if attrs['state'] in _TASK_END_STATES:
                # This task has completed. Yield its final state, then iterate
                # through each of its children and yield their final states.
                yield attrs
                for href in (task['_href'] for task in attrs['spawned_tasks']):
                    for final_task_state in poll_task(server_config, href):
                        yield final_task_state
                break
            sleep(5)
    except KeyboardInterrupt:
        raise exceptions.TaskTimedOutError(
            'Task {} is ongoing after {} seconds.'.format(href, timeout)
        )
    finally:
        timer.cancel()

The above approach works, but in Python 3 the thread module was renamed to _thread to encourage use of threading instead. The only use of thread here is to interrupt the main thread after the timeout.
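For reference, here is a minimal Python 3 sketch of the same timer pattern, separated from the polling logic. The helper name run_with_interrupt and the use of the built-in TimeoutError are illustrative assumptions, not part of pulp_smash. It only works when called from the main thread with the default SIGINT handling in place.

```python
import _thread
import threading


def run_with_interrupt(work, timeout):
    """Run ``work()`` and return its result.

    Hypothetical helper: if ``work()`` is still running after ``timeout``
    seconds, a timer thread asks the interpreter to raise KeyboardInterrupt
    in the main thread, which is translated into a TimeoutError here.
    """
    timer = threading.Timer(timeout, _thread.interrupt_main)
    timer.start()
    try:
        return work()
    except KeyboardInterrupt:
        # Note: a real Ctrl-C pressed by the user lands here as well.
        raise TimeoutError(
            'Work is ongoing after {} seconds.'.format(timeout)
        )
    finally:
        # Always stop the timer so it cannot fire after we return.
        timer.cancel()
```

One caveat with this pattern: the except clause cannot tell a timer-triggered interrupt from a user pressing Ctrl-C, so both are reported as timeouts.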

Another approach is to use a for loop:

def poll_task(server_config, href):
    """Wait for a task and its children to complete. Yield response bodies.

    Poll the task at ``href``, waiting for the task to complete. When a
    response is received indicating that the task is complete, yield that
    response body and recursively poll each child task.

    :param server_config: A :class:`pulp_smash.config.ServerConfig` object.
    :param href: The path to a task you'd like to monitor recursively.
    :returns: A generator yielding response bodies.
    :raises pulp_smash.exceptions.TaskTimedOutError: If a task takes too
        long to complete.
    """
    # 360 * 5s == 1800s == 30m
    # NOTE: The timeout counter is synchronous. We query Pulp, then count down,
    # then query pulp, then count down, etc. This is… dumb.
    poll_limit = 360
    for _ in range(poll_limit):
        response = requests.get(
            urljoin(server_config.base_url, href),
            **server_config.get_requests_kwargs()
        )
        response.raise_for_status()
        attrs = response.json()
        if attrs['state'] in _TASK_END_STATES:
            # This task has completed. Yield its final state, then iterate
            # through each of its children and yield their final states.
            yield attrs
            for href in (task['_href'] for task in attrs['spawned_tasks']):
                for final_task_state in poll_task(server_config, href):
                    yield final_task_state
            return
        sleep(5)
    raise exceptions.TaskTimedOutError(
        'Task {} is ongoing after {} polls.'.format(href, poll_limit)
    )

This has the same effect as the approach with the while loop, but avoids having to manage the end of the iteration.

Ichimonji10 · 2017-02-27T15:39:07Z

I think the first approach is great. The problem with the for loop approach is that we synchronously wait for Pulp to respond to our "get status" request before letting the timer run, and we repeatedly do that, meaning that the actual amount of time spent in the for loop can be much longer than 1800s. We don't know the exact reason that poll_task() is sometimes causing issues, but getting rid of unreliable synchronous code seems like a good start.
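One possible middle ground, sketched here under stated assumptions, is to keep a plain loop but bound it by wall-clock time instead of by a poll count, so slow responses eat into the same 1800s budget. The names poll_until_done, fetch_state and end_states are hypothetical, and TaskTimedOutError below is a local stand-in for pulp_smash.exceptions.TaskTimedOutError.

```python
import time


class TaskTimedOutError(Exception):
    """Local stand-in for pulp_smash.exceptions.TaskTimedOutError."""


def poll_until_done(fetch_state, end_states, timeout=1800, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Call ``fetch_state()`` until it reports an end state or time runs out.

    ``fetch_state`` is a hypothetical callable returning a dict with a
    ``'state'`` key. ``clock`` and ``sleep`` are injectable so the loop can
    be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        attrs = fetch_state()
        if attrs['state'] in end_states:
            return attrs
        # Time spent inside fetch_state() counts against the deadline too.
        if clock() >= deadline:
            raise TaskTimedOutError(
                'Task is ongoing after {} seconds.'.format(timeout)
            )
        sleep(interval)
```

This still cannot cut short a single request that never returns. Since requests applies no timeout by default, passing a timeout argument to requests.get would be needed to bound each individual call.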

nixocio added the Issue Type: Plan label Sep 22, 2017

nixocio mentioned this issue Feb 5, 2018

Poll asynchronous tasks more frequently #855

Merged
