This repository has been archived by the owner on Jun 25, 2022. It is now read-only.

greta is not working #18

Open
ignacio82 opened this issue Apr 19, 2019 · 8 comments

@ignacio82

I was trying to play with greta using this container but I'm getting an error. This is what I am doing:

nvidia-docker run -it rocker/ml-gpu:latest bash

root@7dc3309926d4:/# nvidia-smi
Fri Apr 19 12:25:12 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 45%   42C    P0    27W / 120W |   1382MiB /  6076MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

root@7dc3309926d4:/# R

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- iris$Petal.Length
> y <- iris$Sepal.Length
> library(greta)

Attaching package: 'greta'

The following objects are masked from 'package:stats':

    binomial, poisson

The following objects are masked from 'package:base':

    %*%, backsolve, beta, colMeans, colSums, diag, forwardsolve, gamma,
    rowMeans, rowSums, sweep

> int <- normal(0, 5)
> coef <- normal(0, 3)
> sd <- lognormal(0, 3)
> mean <- int + coef * x
> distribution(y) <- normal(mean, sd)
> m <- model(int, coef, sd)
> draws <- mcmc(m, n_samples = 1000)

/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)
[the same DeprecationWarning repeated 11 more times]
Error: greta hit a tensorflow error:

Error in py_call_impl(callable, dots$args, dots$keywords): NotFoundError: ./libdevice.compute_30.10.bc not found
	 [[{{node cluster_0_1/xla_compile}} = _XlaCompile[Nresources=0, Targs=[DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE, DT_DOUBLE], Tconstants=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], function=cluster_0[_XlaCompiledKernel=true, _XlaNumConstantArgs=6, _XlaNumResourceArgs=0], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Const, Tile_3/multiples/1, Reshape/shape, strided_slice_3/stack, strided_slice_3/stack_1, Sum_1/reduction_indices, _arg_Placeholder_0_0/_3, _arg_Placeholder_1_0_1/_5, _arg_Placeholder_2_0_2/_7, _arg_Placeholder_3_0_3/_9, _arg_Placeholder_4_0_4/_11, _arg_Placeholder_5_0_5/_13, _arg_Placeholder_6_0_6/_15, _arg_Placeholder_7_0_7/_17, _arg_Placeholder_8_0_8/_19)]]
	 [[{{node cluster_0_1/xla_run/_1}} = _Recv[client_terminated=false, recv_device="/job:localh


@cboettig
Member

thanks for the report, I'll take a look.

@cboettig
Member

cboettig commented Apr 20, 2019

hmm... we can solve the errors such as NotFoundError: ./libdevice.compute_30.10.bc not found by copying /usr/local/cuda-9.0 from the rocker/cuda-dev image, but then I seem to be running up against https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc#L485-L489 instead.

It's not exactly clear to me how to cherry-pick ptxas 9.2.88, though.

Bumping all of CUDA to 9.2.88 seems to break tensorflow, as it looks like the binaries installed by pip (for 0.12.0) are built only for CUDA 9.0.

A second error I encounter, e.g. via either the virtualenv install route or when building on tensorflow/tensorflow:1.13.1-gpu-py3, is ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32. Longer trace below.

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32: 'Tensor("Placeholder_13:0", dtype=int32)'

Detailed traceback: 
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_probability/python/mcmc/sample.py", line 216, in sample_chain
    name="num_steps_between_results")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 977, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))

still digging...
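
For anyone reproducing the `libdevice` error above, a quick way to check whether the CUDA bitcode files are present in the container at all is to search for them. This is only a sketch: the helper name `find_libdevice` and the default search root `/usr/local` are assumptions for illustration, not something the rocker images provide.

```shell
# Hypothetical helper: list libdevice bitcode files under one or more roots.
# Prints one path per line; returns non-zero if none are found.
find_libdevice() {
  # Assumed default search root when no argument is given.
  if [ "$#" -eq 0 ]; then
    set -- /usr/local
  fi
  _found=$(find "$@" -type f -name 'libdevice*.bc' 2>/dev/null | sort)
  if [ -z "$_found" ]; then
    echo "no libdevice*.bc found under: $*" >&2
    return 1
  fi
  printf '%s\n' "$_found"
}
```

Running `find_libdevice /usr/local` inside the container prints any matching paths. No output and a non-zero exit status would suggest the CUDA toolkit files are missing from the image, which would be consistent with the NotFoundError.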

@goldingn

I think that last error just means you have the CRAN release of greta, but need the current GitHub version.

Something changed in the most recent Tensorflow Probability release, and the greta-side patch hasn't yet made its way to CRAN.

@cboettig
Member

@goldingn thanks Nick, that's the ticket!

@ignacio82 Once rocker/tensorflow-gpu builds (probably by tomorrow, or just docker build locally), you should be able to run remotes::install_github("greta-dev/greta"), and then GPU-accelerated greta should work.

Thanks again for the bug report. I hadn't gotten around to testing greta; it's still somewhat early days for these ML images.

@ignacio82
Author

Thanks! A couple of questions:

  1. You said to use rocker/tensorflow-gpu, but I think I should use rocker/ml-gpu:latest. With the former I got a message saying that I needed to install TensorFlow Probability. Is that right, or should I use rocker/tensorflow-gpu?
  2. Although greta seems to be working, I am getting the following message:
/usr/local/lib/python3.5/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)

Is this a problem that the greta developers need to fix?

@cboettig
Member

@ignacio82 Right, I moved tensorflow-probability into the tensorflow image now since it seemed more logical to keep those together, but the latest rocker/tensorflow-gpu instance hasn't finished building. We're still figuring out the right organizational modularity.

Re the DeprecationWarning: yeah, I see that too. @goldingn can probably give us more insight on that, but I don't think it's much of a problem.

@pbhogale

pbhogale commented Nov 6, 2019

Not sure whether this ought to be a separate issue, but I get a strange error when trying greta with the ml-gpu container.

remotes::install_github("greta-dev/greta")
rm(list=ls())
library(reticulate)
py_discover_config()
use_python("/opt/virtualenvs/r-tensorflow/bin/python")
use_virtualenv("/opt/virtualenvs/r-tensorflow/", required=T)
library(greta)
library(DiagrammeR)
library(bayesplot)
library(tidyverse)

length_of_data <- 100
sd_eps <- pi^exp(1)
intercept <- -5.0
slope <- pi
x <- seq(-10*pi, 10*pi, length.out = length_of_data)
y <- intercept + slope*x + rnorm(n = length_of_data, mean = 0, sd = sd_eps)
data <- data_frame(y = y, x = x)

intercept_p <- uniform(-10, 10)
sd_eps_p <- uniform(0, 50)
slope_p <- uniform(0, 10)


mean_y <- intercept_p+slope_p*x
distribution(y) <- normal(mean_y, sd_eps_p)
our_model <- model(intercept_p, slope_p, sd_eps_p)

num_samples <- 1000
param_draws <- mcmc(our_model, n_samples = num_samples, warmup = num_samples / 10)

That gives the error:

Error in py_call_impl(callable, dots$args, dots$keywords) :
 ValueError: Tensor conversion requested dtype int64 for Tensor with dtype int32: 
'Tensor("Placeholder_13:0", dtype=int32)'

@cboettig
Member

cboettig commented Nov 6, 2019

So greta requires pretty careful coordination between versions of CUDA, tensorflow, and greta itself. I think this particular error is due to using the most recent dev version of greta with an older tensorflow (see greta-dev/greta#248).

We're still exploring the best way to help users triangulate these versions. (The current tensorflow-gpu image is, iirc, still on CUDA 9.0, which is too old for tensorflow > 1.13, which is required for greta > 0.3.0 or so? Don't quote me on those versions.)

Can you try testing on rocker/ml:cuda-10.0? (Note that it should already have greta installed).
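
Since the outcome depends on which versions of CUDA, tensorflow, and greta end up together in the container, it may help to print all three before reporting a problem. A rough sketch, assuming `nvcc`, a Python interpreter, and `Rscript` may or may not be on the PATH; `report_versions` and the `PYTHON_BIN` override are hypothetical names introduced here, not part of any rocker image.

```shell
# Sketch: print the CUDA toolkit, TensorFlow, and greta versions visible in the
# current environment. PYTHON_BIN is a hypothetical override for the interpreter
# that reticulate uses (for example, a virtualenv's python).
report_versions() {
  _py="${PYTHON_BIN:-python3}"

  # CUDA toolkit version, as reported by nvcc if it is installed.
  if command -v nvcc >/dev/null 2>&1; then
    echo "cuda: $(nvcc --version | grep -i 'release' | head -n 1)"
  else
    echo "cuda: nvcc not on PATH"
  fi

  # TensorFlow version; an empty result means the import failed.
  _tf=$("$_py" -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null | tail -n 1) || true
  if [ -n "$_tf" ]; then
    echo "tensorflow: $_tf"
  else
    echo "tensorflow: not importable from $_py"
  fi

  # greta version; an empty result means R or the package is missing.
  _gv=$(Rscript -e 'cat(as.character(packageVersion("greta")))' 2>/dev/null) || true
  if [ -n "$_gv" ]; then
    echo "greta: $_gv"
  else
    echo "greta: not installed or Rscript not on PATH"
  fi
}

report_versions
```

To check the virtualenv mentioned earlier in the thread, one could run `PYTHON_BIN=/opt/virtualenvs/r-tensorflow/bin/python report_versions`. Pasting the three lines into a report makes version mismatches easier to spot.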
