Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]preprocess.sh 1 criteo failed with 'Schema' object has no attribute 'write' #435

Open
SeekPoint opened this issue Dec 9, 2023 · 1 comment
Assignees

Comments

@SeekPoint
Copy link

root@0da37f00b33c:/share/criteo_data# bash preprocess.sh 1 criteo_tmp nvt 0 0 0
Warning: existing criteo_tmp is erased
Preprocessing script: NVTabular
Getting the first few examples from the uncompressed dataset...
Counting the number of samples in day_1 dataset...
The first 45840617 examples will be used in day_1 dataset.
Shuffling dataset...
Preprocessing...
Splitting into 36672493-sample training, 4584062-sample val, and 4584062-sample test datasets...
/usr/local/lib/python3.10/dist-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'
warn(f"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}")
/usr/local/lib/python3.10/dist-packages/merlin/dtypes/mappings/torch.py:43: UserWarning: PyTorch dtype mappings did not load successfully due to an error: No module named 'torch'
warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
2023-12-08 17:36:24,676 NVTabular processing
2023-12-08 17:36:24,699 To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-12-08 17:36:24,702 State start
2023-12-08 17:36:24,712 Scheduler at: tcp://127.0.0.1:40579
2023-12-08 17:36:24,712 dashboard at: 127.0.0.1:8787
2023-12-08 17:36:24,781 Start Nanny at: 'tcp://127.0.0.1:40985'
/usr/local/lib/python3.10/dist-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'
warn(f"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}")
/usr/local/lib/python3.10/dist-packages/merlin/dtypes/mappings/torch.py:43: UserWarning: PyTorch dtype mappings did not load successfully due to an error: No module named 'torch'
warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")
2023-12-08 17:36:26,969 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-12-08 17:36:26,969 Creating preload: dask_cuda.initialize
2023-12-08 17:36:26,969 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-12-08 17:36:26,969 Import preload module: dask_cuda.initialize
2023-12-08 17:36:26,993 Run preload setup: dask_cuda.initialize
2023-12-08 17:36:27,086 Start worker at: tcp://127.0.0.1:46779
2023-12-08 17:36:27,086 Listening to: tcp://127.0.0.1:46779
2023-12-08 17:36:27,086 Worker name: 0
2023-12-08 17:36:27,086 dashboard at: 127.0.0.1:38661
2023-12-08 17:36:27,086 Waiting to connect to: tcp://127.0.0.1:40579
2023-12-08 17:36:27,086 -------------------------------------------------
2023-12-08 17:36:27,086 Threads: 1
2023-12-08 17:36:27,086 Memory: 251.56 GiB
2023-12-08 17:36:27,087 Local Directory: /tmp/dask-worker-space/worker-1fd2e7px
2023-12-08 17:36:27,087 Starting Worker plugin PreImport-047eaac1-e793-4dd5-b59e-bd5028f4f909
2023-12-08 17:36:27,087 Starting Worker plugin RMMSetup-c63a5516-1d21-4b3e-8187-a1f883e8fb5e
2023-12-08 17:36:27,087 Starting Worker plugin CPUAffinity-1188f5ac-00e8-4229-9bea-18511f88e776
2023-12-08 17:36:27,087 -------------------------------------------------
2023-12-08 17:36:27,097 Register worker <WorkerState 'tcp://127.0.0.1:46779', name: 0, status: init, memory: 0, processing: 0>
2023-12-08 17:36:27,103 Starting worker compute stream, tcp://127.0.0.1:46779
2023-12-08 17:36:27,103 Starting established connection to tcp://127.0.0.1:48730
2023-12-08 17:36:27,103 Registered to: tcp://127.0.0.1:40579
2023-12-08 17:36:27,103 -------------------------------------------------
2023-12-08 17:36:27,104 Starting established connection to tcp://127.0.0.1:40579
2023-12-08 17:36:27,142 Receive client connection: Client-508fa7e6-95f0-11ee-8237-0242ac110004
2023-12-08 17:36:27,143 Starting established connection to tcp://127.0.0.1:48732
2023-12-08 17:36:27,147 Run out-of-band function 'reinitialize'
/usr/local/lib/python3.10/dist-packages/merlin/core/utils.py:361: FutureWarning: The client argument is deprecated from DaskExecutor and will be removed in a future version of NVTabular. By default, a global client in the same python context will be detected automatically, and merlin.utils.set_dask_client (as well as Distributed and Serial) can be used for explicit control.
warnings.warn(
2023-12-08 17:36:45,511 Preprocessing
2023-12-08 17:36:45,863 Train Datasets Preprocessing.....
2023-12-08 17:36:46,095 Run out-of-band function 'clean_worker_cache'
2023-12-08 17:36:46,332 Run out-of-band function 'clean_worker_cache'
2023-12-08 17:37:07,465 Run out-of-band function 'clean_worker_cache'
2023-12-08 17:37:08,855 Run out-of-band function 'clean_worker_cache'
Traceback (most recent call last):
File "/share/criteo_data/criteo_script/preprocess_nvt.py", line 418, in
process_NVT(args)
File "/share/criteo_data/criteo_script/preprocess_nvt.py", line 216, in process_NVT
workflow.transform(train_ds_iterator).to_hugectr(
File "/usr/local/lib/python3.10/dist-packages/merlin/io/dataset.py", line 1070, in to_hugectr
self.schema.write(output_path)
AttributeError: 'Schema' object has no attribute 'write'

@jershi425
Copy link
Collaborator

Hi @SeekPoint , thanks for the feedback. We will fix this issue and let you know.

@jershi425 jershi425 self-assigned this Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants