Merge pull request #185 from camsellem/master
Doc fix for Python script task - {{ outputDir }} is deprecated
anna-geller authored Nov 6, 2024
2 parents 9f4b88c + 87ce79f commit 0f356b6
Showing 1 changed file with 12 additions and 8 deletions.
@@ -111,16 +111,18 @@ with open('{{ outputs.previousTaskId.uri }}', 'r') as f:
 tasks:
 - id: python
 type: io.kestra.plugin.scripts.python.Script
+outputFiles:
+- "myfile.txt"
 script: |
-f = open("{{ outputDir }}/myfile.txt", "a")
-f.write("Hello from a Kestra task!")
-f.close()
+f = open("myfile.txt", "a")
+f.write("Hello from a Kestra task!")
+f.close()
 """
 ),
 @Example(
 full = true,
 title = """
-If you want to generate files in your script to make them available for download and use in downstream tasks, you can leverage the `{{outputDir}}` expression. Files stored in that directory will be persisted in Kestra's internal storage. The first task in this example creates a file `'myfile.txt'` and the next task can access it by leveraging the syntax `{{outputs.yourTaskId.outputFiles['yourFileName.fileExtension']}}`.
+If you want to generate files in your script to make them available for download and use in downstream tasks, you can leverage the `outputFiles` property as shown in the example above. Files will be persisted in Kestra's internal storage. The first task in this example creates a file `'clean_dataset.csv'` and the next task can access it by leveraging the syntax `{{outputs.yourTaskId.outputFiles['yourFileName.fileExtension']}}`.
 """,
 code = """
 id: python_outputs
@@ -130,19 +132,21 @@ with open('{{ outputs.previousTaskId.uri }}', 'r') as f:
 - id: clean_dataset
 type: io.kestra.plugin.scripts.python.Script
 containerImage: ghcr.io/kestra-io/pydata:latest
+outputFiles:
+- "clean_dataset.csv"
 script: |
 import pandas as pd
 df = pd.read_csv("https://huggingface.co/datasets/kestra/datasets/raw/main/csv/messy_dataset.csv")
 # Replace non-numeric age values with NaN
 df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
 # mean imputation: fill NaN values with the mean age
 mean_age = int(df["Age"].mean())
 print(f"Filling NULL values with mean: {mean_age}")
 df["Age"] = df["Age"].fillna(mean_age)
-df.to_csv("{{ outputDir }}/clean_dataset.csv", index=False)
+df.to_csv("clean_dataset.csv", index=False)
 - id: readFileFromPython
 type: io.kestra.plugin.scripts.shell.Commands
 taskRunner:
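
The change above drops the deprecated `{{ outputDir }}` prefix: the script now writes to a plain relative path, and the same file name is declared under `outputFiles`. The following is a minimal local sketch of the script side only, assuming the task's process starts with its working directory as the current directory. The temporary directory and the `write_output` helper are hypothetical stand-ins for illustration and are not part of Kestra.

```python
import os
import tempfile


def write_output(filename: str, text: str) -> str:
    """Append text to a file addressed by a relative path, as the updated script does.

    Hypothetical helper for illustration; returns the resolved absolute path.
    """
    with open(filename, "a") as f:  # context manager closes the file for us
        f.write(text)
    return os.path.abspath(filename)


# Stand-in for the task's working directory (an assumption for this local demo).
workdir = tempfile.mkdtemp()
previous_cwd = os.getcwd()
os.chdir(workdir)
try:
    # No output-directory prefix: the relative name resolves against the cwd.
    path = write_output("myfile.txt", "Hello from a Kestra task!")
    with open("myfile.txt") as f:
        content = f.read()
finally:
    # Restore the original working directory so the demo has no side effects.
    os.chdir(previous_cwd)
```

In the flow itself, as the diff shows, the name passed to `open` matches the entry listed under `outputFiles` (`"myfile.txt"` in the first example, `"clean_dataset.csv"` in the second).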
