[Bug] Cannot run unit tests against Spark/Hudi, receiving "NoneType is not iterable" error #1047

Open
2 tasks done
KLarrabee-Arcadia opened this issue May 22, 2024 · 4 comments
Labels: bug (Something isn't working), help_wanted (Extra attention is needed), unit tests (Issues related to built-in dbt unit testing functionality)

Comments

@KLarrabee-Arcadia

KLarrabee-Arcadia commented May 22, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We are attempting to run dbt unit tests against a local Dockerized Spark/Hudi container. While dbt run succeeds, we cannot for the life of us get dbt test to work, despite following the docs and searching through existing issues. Whatever we try (assuming the test spec is valid; otherwise we correctly get hints like "need given"), the result is a 'NoneType' object is not iterable error:

17:34:38    Runtime Error in unit_test test_characters (models/unit_tests/test_characters.yml)
  An error occurred during execution of unit test 'test_characters'. There may be an error in the unit test definition: check the data types.
   Compilation Error
    'NoneType' object is not iterable

One of my coworkers was successfully running the unit tests against DuckDB, so we assumed the test specs were fine.

Expected Behavior

The test should pass/fail rather than error out.

Steps To Reproduce

I created a very simple repository where (1) dbt run is successful and (2) dbt test results in the error above (see repo README for steps to reproduce success and failure).

Relevant log output

After locally adding import traceback; print(traceback.format_exc()) right before this line, I observed the following stack trace:

16:28:21  Traceback (most recent call last):
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 368, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 314, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 415, in run
    return self.execute(compiled_node, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/test.py", line 265, in execute
    unit_test_node, unit_test_result = self.execute_unit_test(test, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/test.py", line 225, in execute_unit_test
    macro_func()
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 61, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 33, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 52, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 350, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/sql/connections.py", line 159, in execute
    table = self.get_result_from_cursor(cursor, limit)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/sql/connections.py", line 141, in get_result_from_cursor
    rows = cursor.fetchall()
  File "/Users/kevinlarrabee/projects/pydeps/dbt-spark/dbt/adapters/spark/connections.py", line 251, in fetchall
    return self._cursor.fetchall()
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 142, in fetchall
    return list(iter(self.fetchone, None))
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 111, in fetchone
    self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 51, in _fetch_while
    self._fetch_more()
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/hive.py", line 507, in _fetch_more
    zip(response.results.columns, schema)]
TypeError: 'NoneType' object is not iterable
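The last frame of the trace fails on the expression zip(response.results.columns, schema) in PyHive's hive.py. The sketch below is a standalone illustration, not PyHive's code: the helper names are hypothetical and it only mimics that one expression. It shows that zip() raises TypeError as soon as either argument is None, which is the exception type at the bottom of the trace; the trace alone does not say whether the result columns or the schema were None. The guarded variant, which treats a missing column set as zero rows, is an assumed workaround and could hide genuine errors.

```python
from typing import List, Optional, Sequence, Tuple


def columns_to_rows(
    columns: Optional[Sequence[Sequence]], schema: Optional[Sequence[Tuple]]
) -> List[tuple]:
    """Pair each result column with its schema entry, then transpose to rows.

    Hypothetical helper mimicking zip(response.results.columns, schema):
    if either argument is None, zip() raises TypeError immediately.
    """
    paired = [col for col, _col_schema in zip(columns, schema)]
    return list(zip(*paired))


def columns_to_rows_guarded(
    columns: Optional[Sequence[Sequence]], schema: Optional[Sequence[Tuple]]
) -> List[tuple]:
    """Hypothetical defensive variant: a missing column set yields zero rows."""
    if columns is None or schema is None:
        return []
    return columns_to_rows(columns, schema)
```

For example, columns_to_rows([[1, 2], ["a", "b"]], [("id", "INT"), ("name", "STRING")]) returns [(1, "a"), (2, "b")], while columns_to_rows(None, [("id", "INT")]) raises TypeError.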

Is there some unit-test configuration I missed that makes this fail via PyHive, Thrift, etc.?

The compiled SQL for the unit tests in the target/ folder is valid: when I run it manually in beeline, it returns the expected results.
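One way to separate dbt from the PyHive/Thrift layer is to replay individual statements through a DB-API style cursor and see whether fetchall() is the step that fails. This is a minimal sketch assuming a PEP 249 style cursor; probe_statement is a hypothetical helper, and the commented PyHive usage assumes a Spark Thrift server reachable on localhost:10000. Statements could be copied from the compiled SQL under target/ or from logs/dbt.log; note that the error context later in this thread points at run_query inside materialization_unit_default, so the failing statement may not be the compiled SELECT itself.

```python
from typing import Any, Dict


def probe_statement(cursor: Any, sql: str) -> Dict[str, Any]:
    """Run `sql` on a DB-API style cursor and report which stage failed.

    Hypothetical diagnostic helper; it returns a dict instead of raising,
    so several statements can be probed in one session.
    """
    try:
        cursor.execute(sql)
    except Exception as exc:  # execute() itself failed
        return {"stage": "execute", "ok": False,
                "error": f"{type(exc).__name__}: {exc}", "rows": None}
    try:
        rows = cursor.fetchall()  # the call that fails in the reported traceback
    except Exception as exc:
        return {"stage": "fetchall", "ok": False,
                "error": f"{type(exc).__name__}: {exc}", "rows": None}
    return {"stage": "fetchall", "ok": True, "error": None, "rows": rows}


# Assumed usage against a local Spark Thrift server (host and port are assumptions):
# from pyhive import hive
# cursor = hive.connect(host="localhost", port=10000).cursor()
# print(probe_statement(cursor, "select 'world' as hello"))
```

If the same statement fails here with stage "fetchall" and a TypeError, the problem is reproducible without dbt in the loop.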

Environment

- OS: macOS 14.5
- Python: 3.10
- dbt: 1.8

Which database adapter are you using with dbt?

spark

Additional Context

This is specific to unit tests; data tests run perfectly fine.

KLarrabee-Arcadia added the bug and triage labels May 22, 2024
@dbeatty10
Contributor

Thanks for raising this issue @KLarrabee-Arcadia.

To help narrow down the issue, could you try out this "hello world" example to see if it works for you?

models/hello_world.sql

select 'world' as hello

models/_properties.yml

unit_tests:
  - name: test_hello_world
    model: hello_world
    given: []
    expect:
      rows:
        - {hello: world}

Run this command to execute the unit tests and then build the model if they pass:

dbt build --select hello_world

dbeatty10 transferred this issue from dbt-labs/dbt-core May 22, 2024
dbeatty10 added the awaiting_response and unit tests labels and removed the triage label May 22, 2024
@KLarrabee-Arcadia
Author

KLarrabee-Arcadia commented May 22, 2024

Thanks for responding @dbeatty10! I just added it and see the same error:

❯ docker compose run dbt dbt build --select hello_world

WARN[0000] /Users/kevinlarrabee/projects/dbt-unit-test-example/docker-compose.yml: `version` is obsolete
18:33:27  Running with dbt=1.8.0
18:33:27  Registered adapter: spark=1.8.0
18:33:27  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.int_customers_per_store
18:33:27  Found 3 models, 8 data tests, 453 macros, 3 unit tests
18:33:27
18:33:32  Concurrency: 1 threads (target='dev')
18:33:32
18:33:32  1 of 2 START unit_test hello_world::test_hello_world ........................... [RUN]
18:33:37  1 of 2 ERROR hello_world::test_hello_world ..................................... [ERROR in 5.11s]
18:33:37  2 of 2 SKIP relation hudi_dbt.hello_world ...................................... [SKIP]
18:33:43
18:33:43  Finished running 1 unit test, 1 view model in 0 hours 0 minutes and 15.35 seconds (15.35s).
18:33:43
18:33:43  Completed with 1 error and 0 warnings:
18:33:43
18:33:43    Runtime Error in unit_test test_hello_world (models/_properties.yml)
  An error occurred during execution of unit test 'test_hello_world'. There may be an error in the unit test definition: check the data types.
   Compilation Error
    'NoneType' object is not iterable

    > in macro run_query (macros/etc/statement.sql)
    > called by macro materialization_unit_default (macros/materializations/tests/unit.sql)
    > called by <Unknown>
18:33:43
18:33:43  Done. PASS=0 WARN=0 ERROR=1 SKIP=1 TOTAL=2

KLarrabee-Arcadia referenced this issue in KLarrabee-Arcadia/dbt-unit-test-example May 22, 2024
@KLarrabee-Arcadia
Author

KLarrabee-Arcadia commented May 23, 2024

@dbeatty10 I also set up a separate branch in that example repo with two Spark backends (one configured for Hudi and one with the default configuration) as well as a Postgres backend.

Running dbt test against either Spark backend shows the same NoneType error reported above, but (as a sanity check) it does work against the Postgres backend:

$ docker compose run --build dbt dbt test --target postgres

...
17:38:35  8 of 11 PASS unique_professors_name ............................................ [PASS in 5.05s]
17:38:35  9 of 11 START unit_test characters::test_characters ............................ [RUN]
17:38:35  9 of 11 FAIL 1 characters::test_characters ..................................... [FAIL 1 in 0.07s]
17:38:35  10 of 11 START unit_test hello_world::test_hello_world ......................... [RUN]
17:38:35  10 of 11 PASS hello_world::test_hello_world .................................... [PASS in 0.02s]
17:38:35  11 of 11 START unit_test professors::test_professors ........................... [RUN]
17:38:35  11 of 11 ERROR professors::test_professors ..................................... [ERROR in 0.02s]
17:38:35
17:38:35  Finished running 8 data tests, 3 unit tests in 0 hours 0 minutes and 10.44 seconds (10.44s).
17:38:35
17:38:35  Completed with 2 errors and 0 warnings:
17:38:35
17:38:35  Failure in unit_test test_characters (models/unit_tests/test_characters.yml)
17:38:35

actual differs from expected:

@@ ,id,name
→  ,1 ,kevin→Philip J. Fry
+++,2 ,Turanga Leela
+++,3 ,Bender Bending Rodríguez
+++,4 ,Prof. Hubert J. Farnsworth
+++,5 ,Professor Ogden Wernstrom
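In this diff format, a cell shown as old→new changed between the two tables, and a row prefixed with +++ appears only in the second one. Read that way, the expected row for id 1 had name kevin while the model produced Philip J. Fry, and ids 2 to 5 were returned but not expected. The sketch below recomputes that summary from the rows in the log; it is an illustration keyed on the id column, with a hypothetical helper name, and is not how dbt itself builds the diff.

```python
from typing import Any, Dict, List, Tuple


def diff_by_key(
    expected: List[Dict[str, Any]], actual: List[Dict[str, Any]], key: str = "id"
) -> Tuple[List[tuple], List[Any], List[Any]]:
    """Compare two row lists by `key` (hypothetical helper).

    Returns (changed, added, removed):
      changed: (key, column, expected_value, actual_value) for differing cells
      added:   keys present only in `actual`   (the "+++" rows above)
      removed: keys present only in `expected`
    """
    exp = {row[key]: row for row in expected}
    act = {row[key]: row for row in actual}
    changed = []
    for k in sorted(exp.keys() & act.keys()):
        for col, exp_val in exp[k].items():
            if col != key and exp_val != act[k].get(col):
                changed.append((k, col, exp_val, act[k].get(col)))
    added = sorted(act.keys() - exp.keys())
    removed = sorted(exp.keys() - act.keys())
    return changed, added, removed


expected_rows = [{"id": 1, "name": "kevin"}]
actual_rows = [
    {"id": 1, "name": "Philip J. Fry"},
    {"id": 2, "name": "Turanga Leela"},
    {"id": 3, "name": "Bender Bending Rodríguez"},
    {"id": 4, "name": "Prof. Hubert J. Farnsworth"},
    {"id": 5, "name": "Professor Ogden Wernstrom"},
]
```

Calling diff_by_key(expected_rows, actual_rows) returns one changed cell, (1, "name", "kevin", "Philip J. Fry"), added ids [2, 3, 4, 5], and no removed ids, matching the FAIL 1 result for test_characters.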

amychen1776 added the help_wanted label and removed the triage label Aug 27, 2024
@jausanca

Hello, did you manage to fix this @KLarrabee-Arcadia? I had unit tests working perfectly with Thrift + Docker in my local environment, but I started getting this same error after updating my image to use Spark 3.5.

I don't know if it's related in any way to this past issue, but I was able to make it work after replicating the change specified here in the PyHive code.
