update annotation
v-chen_data committed Jul 15, 2024
1 parent acc2a70 commit 8950d63
Showing 1 changed file with 13 additions and 1 deletion.
llmfoundry/data_prep/convert_delta_to_json.py: 14 changes (13 additions, 1 deletion)
@@ -651,9 +651,21 @@ def convert_delta_to_json_from_args(
     cluster_id: Optional[str],
     use_serverless: bool,
     batch_size: int,
-    processes: int, # type: ignore
+    processes: int,
     json_output_filename: str,
 ) -> None:
+    """A wrapper for `convert_dataset_json` that parses arguments.
+
+    Args:
+        delta_table_name (str): UC table <catalog>.<schema>.<table name>
+        json_output_folder (str): Local path to save the converted json
+        http_path (Optional[str]): If set, dbsql method is used
+        batch_size (int): Row chunks to transmit a time to avoid OOM
+        processes (int): Number of processes allowed to use
+        cluster_id (Optional[str]): Cluster ID with runtime newer than 14.1.0 and access mode of either assigned or shared can use databricks-connect
+        use_serverless (bool): Use serverless or not. Make sure the workspace is entitled with serverless
+        json_output_filename (str): The name of the combined final jsonl that combines all partitioned jsonl
+    """
     w = WorkspaceClient()
     DATABRICKS_HOST = w.config.host
     DATABRICKS_TOKEN = w.config.token
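The new docstring documents `delta_table_name` as a three-level Unity Catalog name, `<catalog>.<schema>.<table name>`. As an illustration only, a hypothetical helper `parse_delta_table_name` (not part of `convert_delta_to_json.py`) could split and check that format. It assumes no identifier contains a literal dot, so backtick-quoted names are not handled.

```python
from typing import NamedTuple


class UCTableName(NamedTuple):
    """Hypothetical container for a three-level Unity Catalog identifier."""
    catalog: str
    schema: str
    table: str


def parse_delta_table_name(delta_table_name: str) -> UCTableName:
    """Split '<catalog>.<schema>.<table name>' into its three parts.

    Hypothetical helper. Assumption: no part contains a literal '.',
    so quoted identifiers such as `my.table` are not supported.
    """
    parts = delta_table_name.split('.')
    # Require exactly three non-empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f'Expected <catalog>.<schema>.<table name>, got {delta_table_name!r}',
        )
    return UCTableName(*parts)
```

For example, `parse_delta_table_name('main.default.my_table')` returns `UCTableName(catalog='main', schema='default', table='my_table')`, while a two-part name such as `'default.my_table'` raises `ValueError`.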
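The docstring ties the connection method to three arguments: setting `http_path` selects the dbsql method, and a `cluster_id` whose runtime is newer than 14.1.0 with an assigned or shared access mode can use databricks-connect. A minimal sketch of those rules follows. The helpers `runtime_supports_dbconnect` and `choose_method` are hypothetical, and several details are assumptions rather than the module's actual logic: `http_path` takes precedence, `use_serverless` is routed through databricks-connect, the runtime string starts with `<major>.<minor>` (as in `14.3.x-scala2.12`), and any 14.1.x runtime counts as qualifying.

```python
import re
from typing import Optional

# Access modes the docstring names as compatible with databricks-connect.
_DBCONNECT_ACCESS_MODES = {'assigned', 'shared'}


def runtime_supports_dbconnect(spark_version: str, access_mode: str) -> bool:
    """Return True if a cluster looks eligible for databricks-connect.

    Assumptions: `spark_version` starts with '<major>.<minor>' (for example
    '14.3.x-scala2.12'), any 14.1.x runtime qualifies, and `access_mode`
    is the UI label ('assigned' or 'shared').
    """
    match = re.match(r'(\d+)\.(\d+)', spark_version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (14, 1) and access_mode.lower() in _DBCONNECT_ACCESS_MODES


def choose_method(
    http_path: Optional[str],
    cluster_id: Optional[str],
    use_serverless: bool,
) -> str:
    """Hypothetical dispatch: 'dbsql' if http_path is set, else 'dbconnect'."""
    if http_path:
        return 'dbsql'
    # Assumption: both a cluster and serverless compute go through databricks-connect.
    if cluster_id or use_serverless:
        return 'dbconnect'
    raise ValueError('Provide http_path, cluster_id, or use_serverless=True.')
```

Keeping the eligibility check separate from the dispatch means a caller could validate the cluster only when `choose_method` returns `'dbconnect'`.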

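Two of the documented arguments describe a chunk-then-combine pattern: `batch_size` bounds how many rows move at a time to avoid OOM, and `json_output_filename` names one JSONL file that combines all partitioned outputs. The pattern can be sketched with the standard library alone, assuming rows arrive as an iterable of dicts. `batched`, `write_partitions`, `combine_partitions`, and the `part_00000.jsonl` naming scheme are hypothetical and do not reflect how the module actually fetches from Delta or names its partitions.

```python
import json
import os
import shutil
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `batch_size` rows, so only one chunk is in memory."""
    if batch_size < 1:
        raise ValueError('batch_size must be a positive integer')
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def write_partitions(
    rows: Iterable[Row],
    batch_size: int,
    json_output_folder: str,
) -> List[str]:
    """Write each chunk to its own JSONL partition (hypothetical naming scheme)."""
    os.makedirs(json_output_folder, exist_ok=True)
    paths = []
    for i, chunk in enumerate(batched(rows, batch_size)):
        path = os.path.join(json_output_folder, f'part_{i:05d}.jsonl')
        with open(path, 'w', encoding='utf-8') as f:
            for row in chunk:
                f.write(json.dumps(row) + '\n')
        paths.append(path)
    return paths


def combine_partitions(
    paths: List[str],
    json_output_folder: str,
    json_output_filename: str,
) -> str:
    """Concatenate partitions, in order, into one JSONL file."""
    out_path = os.path.join(json_output_folder, json_output_filename)
    with open(out_path, 'w', encoding='utf-8') as out:
        for path in paths:
            with open(path, encoding='utf-8') as f:
                # Each partition ends with '\n', so plain concatenation stays valid JSONL.
                # copyfileobj streams the file instead of reading it whole.
                shutil.copyfileobj(f, out)
    return out_path
```

With `batch_size=2` and five rows, `write_partitions` produces three partitions (2, 2, and 1 rows), and the combined file holds all five rows in their original order.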