Hyperparameter search error with Ray tune #27598
Comments
Hello, this PR #26499 might fix this issue. Could you please try it out and let us know?
I tried running the notebook with the PR; however, I found a different error now:
2023-11-20 16:02:53,411 INFO worker.py:1673 -- Started a local Ray instance.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py](https://localhost:8080/#) in put_object(self, value, object_ref, owner_address)
702 try:
--> 703 serialized_value = self.get_serialization_context().serialize(value)
704 except TypeError as e:
18 frames
[/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py](https://localhost:8080/#) in serialize(self, value)
493 else:
--> 494 return self._serialize_to_msgpack(value)
[/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py](https://localhost:8080/#) in _serialize_to_msgpack(self, value)
471 metadata = ray_constants.OBJECT_METADATA_TYPE_PYTHON
--> 472 pickle5_serialized_object = self._serialize_to_pickle5(
473 metadata, python_objects
[/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py](https://localhost:8080/#) in _serialize_to_pickle5(self, metadata, value)
424 self.get_and_clear_contained_object_refs()
--> 425 raise e
426 finally:
[/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py](https://localhost:8080/#) in _serialize_to_pickle5(self, metadata, value)
419 self.set_in_band_serialization()
--> 420 inband = pickle.dumps(
421 value, protocol=5, buffer_callback=writer.buffer_callback
[/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle_fast.py](https://localhost:8080/#) in dumps(obj, protocol, buffer_callback)
87 cp = CloudPickler(file, protocol=protocol, buffer_callback=buffer_callback)
---> 88 cp.dump(obj)
89 return file.getvalue()
[/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle_fast.py](https://localhost:8080/#) in dump(self, obj)
732 try:
--> 733 return Pickler.dump(self, obj)
734 except RuntimeError as e:
TypeError: cannot pickle '_thread.lock' object
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
[<ipython-input-38-12c3f54763db>](https://localhost:8080/#) in <cell line: 1>()
----> 1 best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")
[/content/transformers/src/transformers/trainer.py](https://localhost:8080/#) in hyperparameter_search(self, hp_space, compute_objective, n_trials, direction, backend, hp_name, **kwargs)
2548 self.compute_objective = default_compute_objective if compute_objective is None else compute_objective
2549
-> 2550 best_run = backend_obj.run(self, n_trials, direction, **kwargs)
2551
2552 self.hp_search_backend = None
[/content/transformers/src/transformers/hyperparameter_search.py](https://localhost:8080/#) in run(self, trainer, n_trials, direction, **kwargs)
85
86 def run(self, trainer, n_trials: int, direction: str, **kwargs):
---> 87 return run_hp_search_ray(trainer, n_trials, direction, **kwargs)
88
89 def default_hp_space(self, trial):
[/content/transformers/src/transformers/integrations/integration_utils.py](https://localhost:8080/#) in run_hp_search_ray(trainer, n_trials, direction, **kwargs)
352 dynamic_modules_import_trainable.__mixins__ = trainable.__mixins__
353
--> 354 analysis = ray.tune.run(
355 dynamic_modules_import_trainable,
356 config=trainer.hp_space(None),
[/usr/local/lib/python3.10/dist-packages/ray/tune/tune.py](https://localhost:8080/#) in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, storage_path, storage_filesystem, search_alg, scheduler, checkpoint_config, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, reuse_actors, raise_on_failed_trial, callbacks, max_concurrent_trials, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, chdir_to_trial_dir, local_dir, _experiment_checkpoint_dir, _remote, _remote_string_queue, _entrypoint)
509 }
510
--> 511 _ray_auto_init(entrypoint=error_message_map["entrypoint"])
512
513 if _remote is None:
[/usr/local/lib/python3.10/dist-packages/ray/tune/tune.py](https://localhost:8080/#) in _ray_auto_init(entrypoint)
217 logger.info("'TUNE_DISABLE_AUTO_INIT=1' detected.")
218 elif not ray.is_initialized():
--> 219 ray.init()
220 logger.info(
221 "Initializing Ray automatically. "
[/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py](https://localhost:8080/#) in wrapper(*args, **kwargs)
101 if func.__name__ != "init" or is_client_mode_enabled_by_default:
102 return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)
104
105 return wrapper
[/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py](https://localhost:8080/#) in init(address, num_cpus, num_gpus, resources, labels, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
1700
1701 for hook in _post_init_hooks:
-> 1702 hook()
1703
1704 node_id = global_worker.core_worker.get_current_node_id()
[/usr/local/lib/python3.10/dist-packages/ray/tune/registry.py](https://localhost:8080/#) in flush(self)
306 self.references[k] = v
307 else:
--> 308 self.references[k] = ray.put(v)
309 self.to_flush.clear()
[/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py](https://localhost:8080/#) in auto_init_wrapper(*args, **kwargs)
22 def auto_init_wrapper(*args, **kwargs):
23 auto_init_ray()
---> 24 return fn(*args, **kwargs)
25
26 return auto_init_wrapper
[/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py](https://localhost:8080/#) in wrapper(*args, **kwargs)
101 if func.__name__ != "init" or is_client_mode_enabled_by_default:
102 return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)
104
105 return wrapper
[/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py](https://localhost:8080/#) in put(value, _owner)
2634 with profiling.profile("ray.put"):
2635 try:
-> 2636 object_ref = worker.put_object(value, owner_address=serialize_owner_address)
2637 except ObjectStoreFullError:
2638 logger.info(
[/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py](https://localhost:8080/#) in put_object(self, value, object_ref, owner_address)
710 f"{sio.getvalue()}"
711 )
--> 712 raise TypeError(msg) from e
713 # This *must* be the first place that we construct this python
714 # ObjectRef because an entry with 0 local references is created when
TypeError: Could not serialize the put value <transformers.trainer.Trainer object at 0x7e90dd830340>:
================================================================================
Checking Serializability of <transformers.trainer.Trainer object at 0x7e90dd830340>
================================================================================
!!! FAIL serialization: cannot pickle '_thread.lock' object
Serializing 'compute_metrics' <function compute_metrics at 0x7e90dd9123b0>...
!!! FAIL serialization: cannot pickle '_thread.lock' object
Detected 3 global variables. Checking serializability...
Serializing 'task' cola...
Serializing 'np' <module 'numpy' from '/usr/local/lib/python3.10/dist-packages/numpy/__init__.py'>...
Serializing 'metric' Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
predictions: list of predictions to score.
Each translation should be tokenized into a list of tokens.
references: list of lists of references for each translation.
Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
"accuracy": Accuracy
"f1": F1 score
"pearson": Pearson Correlation
"spearmanr": Spearman Correlation
"matthews_correlation": Matthew Correlation
Examples:
>>> glue_metric = datasets.load_metric('glue', 'sst2') # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
>>> references = [0, 1]
>>> predictions = [0, 1]
>>> results = glue_metric.compute(predictions=predictions, references=references)
>>> print(results)
{'accuracy': 1.0}
>>> glue_metric = datasets.load_metric('glue', 'mrpc') # 'mrpc' or 'qqp'
>>> references = [0, 1]
>>> predictions = [0, 1]
>>> results = glue_metric.compute(predictions=predictions, references=references)
>>> print(results)
{'accuracy': 1.0, 'f1': 1.0}
>>> glue_metric = datasets.load_metric('glue', 'stsb')
>>> references = [0., 1., 2., 3., 4., 5.]
>>> predictions = [0., 1., 2., 3., 4., 5.]
>>> results = glue_metric.compute(predictions=predictions, references=references)
>>> print({"pearson": round(results["pearson"], 2), "spearmanr": round(results["spearmanr"], 2)})
{'pearson': 1.0, 'spearmanr': 1.0}
>>> glue_metric = datasets.load_metric('glue', 'cola')
>>> references = [0, 1]
>>> predictions = [0, 1]
>>> results = glue_metric.compute(predictions=predictions, references=references)
>>> print(results)
{'matthews_correlation': 1.0}
""", stored examples: 0)...
!!! FAIL serialization: cannot pickle '_thread.lock' object
Serializing '_build_data_dir' <bound method Metric._build_data_dir of Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """[GLUE metric docstring, identical to the one above]""", stored examples: 0)>...
!!! FAIL serialization: cannot pickle '_thread.lock' object
Serializing '_add_sm_patterns_to_gitignore' <bound method Trainer._add_sm_patterns_to_gitignore of <transformers.trainer.Trainer object at 0x7e90dd830340>>...
!!! FAIL serialization: cannot pickle '_thread.lock' object
Serializing '__func__' <function Trainer._add_sm_patterns_to_gitignore at 0x7e90dd95d7e0>...
WARNING: Did not find non-serializable object in <bound method Trainer._add_sm_patterns_to_gitignore of <transformers.trainer.Trainer object at 0x7e90dd830340>>. This may be an oversight.
================================================================================
Variable:
FailTuple(_build_data_dir [obj=<bound method Metric._build_data_dir of Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """[GLUE metric docstring, identical to the one above]""", stored examples: 0)>, parent=Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """[same docstring]""", stored examples: 0)])
was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
================================================================================
Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
================================================================================
The deprecation error has been fixed.
What's the above error related to?
Hey @Shamik-07, the Ray Tune integration serializes the HuggingFace Trainer along with your remote function. In this case, a non-serializable object is being captured: the module-level `metric` holds a `_thread.lock` and cannot be pickled. To fix it, create the metric inside `compute_metrics` rather than capturing it from the enclosing scope:

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    if task != "stsb":
        predictions = np.argmax(predictions, axis=1)
    else:
        predictions = predictions[:, 0]
+   metric = load_metric('glue', actual_task)  # load the metric inside the function, instead of implicitly pickling it
    return metric.compute(predictions=predictions, references=labels)
Thank you very much for the explanation @justinvyu :)
Closing this, as it has been fixed by #26499 thanks to @justinvyu.
System Info
Hello,
The version of Ray is 2.8.0 and the version of Transformers is 4.35.2.
I am trying to run the hyperparameter search for this notebook with Ray Tune (notebooks/examples/text_classification.ipynb in the huggingface/notebooks repository on GitHub)
and getting the following error:
Who can help?
@muellerzr / @pacman100
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Running the hyperparameter search with Ray Tune.
Expected behavior
Hyperparameter trials with ray tune