Connection fails when trying to run inside a Docker container on AWS/EMR #191

Closed
danfran opened this issue Jul 7, 2021 · 4 comments
Labels: bug (Something isn't working), Stale

Comments

danfran commented Jul 7, 2021

Describe the bug

I have a Docker image based on fishtownanalytics/dbt:0.19.2 with dbt-spark installed. When I try to connect to an EMR cluster on AWS through its Thrift server, the connection fails with an unspecified error.

Steps To Reproduce

  1. Create a Docker image with something like:
FROM fishtownanalytics/dbt:0.19.2

RUN apt-get update && \
 apt-get install -y libsasl2-dev && \
 pip install "dbt-spark[PyHive]==0.19.2"
  2. Log into the container with: docker run --rm -it --entrypoint bash <your_image>

  3. Create a sample dbt project and configure its profile to use the spark adapter with the thrift method, pointing at an EMR cluster that has the Thrift server enabled, for example:

my_spark_proj:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      schema: dbt
      host: <emr master node ip>
      port: 10001
      user: hadoop
  4. Run dbt debug against it.

At that point the connection test fails with an error.

Expected behavior

The connection test should succeed. Instead, it reports:

Connection:
  host: x.x.x.x
  port: 10001
  cluster: None
  endpoint: None
  schema: dbt
  organization: 0
  Connection test: ERROR
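"Connection test: ERROR" does not say whether the failure is at the network level or in the Thrift/SASL client libraries. One way to narrow it down is to check, from inside the container, that the Thrift port is reachable at all. This is a minimal Python sketch assuming port 10001 as in the profile above; thrift_port_reachable is a hypothetical helper name, and a True result only shows that a TCP connection opens, not that the Thrift/SASL handshake dbt-spark performs will succeed.

```python
import socket


def thrift_port_reachable(host: str, port: int = 10001, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only checks plain TCP reachability of the
    (assumed) Spark Thrift server port and does no Thrift/SASL handshake.
    """
    try:
        # create_connection resolves the host and tries each address it gets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and DNS failures (socket.gaierror).
        return False
```

For example, if thrift_port_reachable("<emr master node ip>") returns False inside the container but the same check passes elsewhere, the problem is more likely networking than missing libraries.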

Screenshots and log output

The output of dbt --version:

  - bigquery: 0.19.2
  - snowflake: 0.19.2
  - redshift: 0.19.2
  - postgres: 0.19.2
  - spark: 0.19.2

The operating system you're using:
macOS Catalina 10.15.7

The output of python --version:
Python 2.7.16
Python 3.8.10

Additional context

The only workaround I found is to build a new image based on Ubuntu 20.04 and install dbt and dbt-spark on top of it.

danfran added the bug and triage labels Jul 7, 2021
jtcohen6 (Contributor) commented Jul 7, 2021

@danfran I'm not sure what is causing this. I'm glad you were able to get it working in the meantime:

The only workaround I found is building a new image based on ubuntu 20.04 and installing dbt and dbt-spark after.

You should be able to simplify this by installing only dbt-spark, which pulls in the other packages it needs (namely dbt-core).

jtcohen6 removed the triage label Jul 7, 2021
github-actions (Contributor)

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

PrimOox commented Oct 5, 2023

I have the same issue running the ghcr.io/dbt-labs/dbt-spark:1.6.0 image. Is there any solution?

PrimOox commented Oct 6, 2023

I found a solution: installing all of these libraries:

libsasl2-dev
build-essential
sasl2-bin
postfix
cyrus-imapd
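To check whether missing system packages are the culprit before trying a connection, one can verify inside the image that the Python modules behind the PyHive connection import cleanly. This is a minimal Python sketch; missing_modules is a hypothetical helper, and the module list is an assumption based on PyHive's hive extra (compiled sasl, built against libsasl2-dev, plus thrift and thrift_sasl). Installs using PyHive's hive_pure_sasl extra import puresasl instead of sasl, so adjust the tuple to match what your dbt-spark version installs.

```python
import importlib

# Assumption: modules used by PyHive's Hive transport with the "hive" extra.
# Swap "sasl" for "puresasl" if your install uses the pure-Python SASL variant.
DEFAULT_MODULES = ("pyhive", "thrift", "thrift_sasl", "sasl")


def missing_modules(names=DEFAULT_MODULES):
    """Return (name, error message) pairs for modules that fail to import."""
    failures = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # ImportError also covers a compiled extension that cannot load
            # its shared library (e.g. libsasl2) at runtime.
            failures.append((name, str(exc)))
    return failures


for module_name, error in missing_modules():
    print(f"cannot import {module_name}: {error}")
```

An ImportError that names a shared library such as libsasl2 would point at missing system packages rather than a problem with the profile or the cluster.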
