Feature/streamlit #576

Merged: 37 commits, Aug 21, 2024

Commits
96e6a31
add plan update logic
dahaipeng Jul 4, 2024
0373d07
adding update task logic
dahaipeng Jul 5, 2024
6c418a3
update datascience assistant logic to achieve better results
dahaipeng Jul 12, 2024
61d6953
Merge branch 'master' into feature/datascience_assistant
dahaipeng Jul 12, 2024
7c1c20b
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 15, 2024
70d844c
add ds tools
dahaipeng Jul 18, 2024
2e54983
add ds tools
dahaipeng Jul 19, 2024
0f42462
update prompt
dahaipeng Jul 22, 2024
dd7d05d
update utils
dahaipeng Jul 22, 2024
8b752a5
update init
dahaipeng Jul 22, 2024
800fc06
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 22, 2024
4f7bd6a
update log
dahaipeng Jul 23, 2024
62c2bed
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 23, 2024
cb05393
delete yml
dahaipeng Jul 23, 2024
5b70701
update ds_assistant
dahaipeng Jul 24, 2024
49c50d2
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 25, 2024
428401a
update ds_assistant
dahaipeng Jul 25, 2024
f6bd4e2
update ds_assistant
dahaipeng Jul 25, 2024
ee8857f
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 26, 2024
1f1be09
update ds_assistant
dahaipeng Jul 26, 2024
e2310b5
fix openapi tool
dahaipeng Jul 26, 2024
458d4d5
Merge branch 'refs/heads/feature/datascience_assistant'
dahaipeng Jul 30, 2024
c536e6d
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 1, 2024
292c5b9
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 1, 2024
67fa1ba
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 5, 2024
84bff9a
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 6, 2024
dd09e85
add streamlit for better visualization
dahaipeng Aug 11, 2024
2b0cb29
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 12, 2024
62162b1
add streamlit for better visualization
dahaipeng Aug 11, 2024
97c5d9f
add streamlit app for better visualization
dahaipeng Aug 12, 2024
6e287e3
Merge remote-tracking branch 'origin/feature/streamlit' into feature/…
dahaipeng Aug 12, 2024
44413fb
fix bug
dahaipeng Aug 12, 2024
8ac12a6
fix bug
dahaipeng Aug 12, 2024
124a4ea
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 13, 2024
8f0cee9
add upload file feature
dahaipeng Aug 14, 2024
dd351d3
add upload file feature
dahaipeng Aug 19, 2024
e87b6ac
Merge branch 'refs/heads/master' into feature/streamlit
dahaipeng Aug 21, 2024
32 changes: 32 additions & 0 deletions apps/datascience_assistant/README.md
@@ -0,0 +1,32 @@
# Data Science Assistant with Streamlit ⭐
Data Science Assistant (hereinafter referred to as DS Assistant) is a data science assistant built on the modelscope-agent framework. Based on user requirements, it fully automates the steps of a data science task: exploratory data analysis (EDA), data preprocessing, feature engineering, model training, and model evaluation.

Detailed information can be found in the [documentation](../../docs/source/agents/data_science_assistant.md).

## Quick Start
Streamlit is a Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.

To run the DS Assistant in Streamlit, you need some additional libraries, which you can install with pip:
```bash
pip install streamlit mistune matplotlib nbconvert
```

Then, you can run the DS Assistant using the following command:
```bash
cd ../../
streamlit run ./apps/datascience_assistant/app.py
```

After running the command, a new tab will open in your default web browser with the DS Assistant running.

You can upload your dataset and write your request.
![img_2.png](../../resources/data_science_assistant_streamlit_1.png)

After submitting your request, DS Assistant will automatically generate a plan for this request.
![img_2.png](../../resources/data_science_assistant_streamlit_4.png)

After that, DS Assistant will automatically execute every task; you can view all of the code and details in Streamlit.
![img_3.png](../../resources/data_science_assistant_streamlit_2.png)

After you have finished using the DS Assistant, you can directly convert the whole run to a PDF.
![img_5.png](../../resources/data_science_assistant_streamlit_3.png)
35 changes: 35 additions & 0 deletions apps/datascience_assistant/app.py
@@ -0,0 +1,35 @@
import os
import sys

import streamlit as st
from modelscope_agent.agents.data_science_assistant import DataScienceAssistant
from modelscope_agent.tools.metagpt_tools.tool_recommend import \
TypeMatchToolRecommender

current_dir = os.path.dirname(os.path.abspath(__file__))
project_root_path = os.path.abspath(os.path.join(current_dir, '../../'))
sys.path.append(project_root_path)
llm_config = {
'model': 'qwen2-72b-instruct',
'model_server': 'dashscope',
}
os.environ['DASHSCOPE_API_KEY'] = input(
'Please input your dashscope api key: ')
data_science_assistant = DataScienceAssistant(
llm=llm_config, tool_recommender=TypeMatchToolRecommender(tools=['<all>']))
st.title('Data Science Assistant')
st.write(
'This is a data science assistant that can help you with your data science tasks.'
)
st.write(
'Please input your request and upload files then click the submit button.')

files = st.file_uploader(
'Please upload files that you need. ', accept_multiple_files=True)
last_file_name = ''
user_request = st.text_area('User Request')
if st.button('submit'):
for file in files:
with open(file.name, 'wb') as f:
f.write(file.getbuffer())
data_science_assistant.run(user_request=user_request, streamlit=True)
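The upload handling in `app.py` writes each uploaded file into the working directory before invoking the assistant. That step can be sketched as a standalone helper, assuming the file objects expose `.name` and `.getbuffer()` as Streamlit's `UploadedFile` does (the helper name is illustrative, not part of the PR):

```python
import os


def save_uploaded_files(files, dest_dir='.'):
    """Persist uploaded file-like objects to dest_dir and return their paths.

    Assumes each object exposes `.name` and `.getbuffer()`, matching the
    interface of Streamlit's UploadedFile.
    """
    saved = []
    for file in files:
        # basename guards against path components smuggled into the name
        path = os.path.join(dest_dir, os.path.basename(file.name))
        with open(path, 'wb') as f:
            f.write(file.getbuffer())
        saved.append(path)
    return saved
```

Writing to a dedicated directory rather than the current one (as the PR does) would avoid name collisions between runs.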
97 changes: 77 additions & 20 deletions modelscope_agent/agents/data_science_assistant.py
@@ -1,5 +1,6 @@
# Implementation inspired by the paper "DATA INTERPRETER: AN LLM AGENT FOR DATA SCIENCE"
import asyncio
import copy
import os
import time
from datetime import datetime
@@ -18,6 +19,12 @@
from modelscope_agent.utils.logger import agent_logger as logger
from modelscope_agent.utils.utils import parse_code

try:
import streamlit as st # noqa
from nbconvert import HTMLExporter
from traitlets.config import Config
except Exception as e:
print(f'import error: {str(e)}, please install streamlit and nbconvert')
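The guarded import above lets the module load even when the optional visualization dependencies (streamlit, nbconvert) are missing. The general pattern, as a minimal sketch independent of this PR:

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Example: only enable the Streamlit front end when the dependency exists.
st = optional_import('streamlit')
STREAMLIT_AVAILABLE = st is not None
```

Catching `ImportError` specifically (rather than bare `Exception`, as the PR does) avoids masking unrelated errors raised during import.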
PLAN_TEMPLATE = """
# Context:
{context}
@@ -28,9 +35,9 @@
- **feature engineering**: Only for creating new columns for input data.
- **model train**: Only for training model.
- **model evaluate**: Only for evaluating model.
- **ocr**: Only for OCR tasks.
- **other**: Any tasks not in the defined categories


# Task:
Based on the context, write a simple plan or modify an existing plan of what you should do to achieve the goal. A plan \
consists of one to four tasks.
@@ -226,14 +233,12 @@
these are the previous code blocks, which have been executed successfully in the previous jupyter notebook code blocks \
{previous_code_blocks}

Attention: your response should be one of the following:
- [your step by step thought], correct
- [your step by step thought], incorrect

at the end of your thought, you need to give the final judgement on a new line (correct or incorrect).
don't generate code, just give the reason why the code is correct or incorrect.

## Attention
don't use the word 'incorrect' in your step by step thought.
your answer should be short and clear; it does not need to be long.
"""
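The template above asks the model to end its response with a verdict line of `correct` or `incorrect`, and `_judge_code` later inspects only the last line. That parsing step, sketched on its own (a simplified reading of the logic, not the exact implementation):

```python
def parse_verdict(judge_result: str) -> bool:
    """Return True if the judge's final line declares the code correct.

    Only the last line is inspected, which is why the prompt forbids the
    word 'incorrect' anywhere in the step-by-step reasoning.
    """
    last_line = judge_result.strip().split('\n')[-1]
    return 'incorrect' not in last_line
```

Note the substring check: because `'incorrect'` contains `'correct'`, the test must be for `'incorrect'`, not for `'correct'`.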

CHECK_DATA_PROMPT = """
@@ -311,6 +316,7 @@ def __init__(self,
self.code_interpreter = CodeInterpreter()
self.plan = None
self.total_token = 0
self.streamlit = False

def _update_plan(self, user_request: str, curr_plan: Plan = None) -> Plan:
call_llm_success = False
@@ -325,18 +331,26 @@ def _update_plan(self, user_request: str, curr_plan: Plan = None) -> Plan:
}]
while not call_llm_success and call_llm_count < 10:
resp = self._call_llm(prompt=None, messages=messages, stop=None)
resp_streamlit = resp
tasks_text = ''
for r in resp:
tasks_text += r
if self.streamlit:
st.write('#### Generate a plan based on the user request')
tasks_text = st.write_stream(resp_streamlit)
else:
for r in resp:
tasks_text += r
if 'Error code' in tasks_text:
call_llm_count += 1
time.sleep(10)
else:
call_llm_success = True
print('Tasks_text: ', tasks_text)
tasks_text = parse_code(text=tasks_text, lang='json')

logger.info(f'tasks: {tasks_text}')
tasks = json5.loads(tasks_text)
tasks = [Task(**task) for task in tasks]

if curr_plan is None:
new_plan = Plan(goal=user_request)
new_plan.add_tasks(tasks=tasks)
@@ -429,9 +443,8 @@ def _generate_code(self, code_counter: int, task: Task,
else:
# reflect the error and ask user to fix the code
if self.tool_recommender:
tool_info = asyncio.run(
self.tool_recommender.get_recommended_tool_info(
plan=self.plan))
tool_info = self.tool_recommender.get_recommended_tool_info(
plan=self.plan)
prompt = CODE_USING_TOOLS_REFLECTION_TEMPLATE.format(
instruction=task.instruction,
task_guidance=TaskType.get_type(task.task_type).guidance,
@@ -555,9 +568,6 @@ def _check_data(self):

def _judge_code(self, task, previous_code_blocks, code,
code_interpreter_resp):
success = True
failed_reason = ''

judge_prompt = JUDGE_TEMPLATE.format(
instruction=task.instruction,
previous_code_blocks=previous_code_blocks,
@@ -578,13 +588,12 @@ def _judge_code(self, task, previous_code_blocks, code,
self._get_total_tokens()
if 'Error code' in judge_result:
call_llm_count += 1
time.sleep(10)
time.sleep(5)
else:
call_llm_success = True
if not call_llm_success:
raise Exception('call llm failed')
logger.info(f'judge result for task{task.task_id}: \n {judge_result}')

if 'incorrect' in judge_result.split('\n')[-1]:
success = False
failed_reason = (
@@ -593,11 +602,17 @@ def _judge_code(self, task, previous_code_blocks, code,
return success, failed_reason

else:
return True, 'The code logic is correct'
return True, judge_result

def _run(self, user_request, save: bool = True, **kwargs):
before_time = time.time()
try:
self.streamlit = kwargs.get('streamlit', False)
if self.streamlit:
st.write("""# DataScience Assistant """)
st.write("""### The user request is: \n""")
st.write(user_request)
print('streamlit: ', self.streamlit)
self.plan = self._update_plan(user_request=user_request)
jupyter_file_path = ''
dir_name = ''
@@ -610,7 +625,9 @@ def _run(self, user_request, save: bool = True, **kwargs):

while self.plan.current_task_id:
task = self.plan.task_map.get(self.plan.current_task_id)
# write_and_execute_code(self)
if self.streamlit:
st.write(
f"""### Task {task.task_id}: {task.instruction}\n""")
logger.info(
f'new task starts: task_{task.task_id} , instruction: {task.instruction}'
)
@@ -622,7 +639,6 @@ def _run(self, user_request, save: bool = True, **kwargs):
code_execute_success = False
code_logic_success = False
temp_code_interpreter = CodeInterpreter()

temp_code_interpreter.call(
params=json.dumps({
'code':
@@ -633,26 +649,56 @@ def _run(self, user_request, save: bool = True, **kwargs):
# generate code
code = self._generate_code(code_counter, task,
user_request)
code = '%matplotlib inline \n' + code
code_execute_success, code_interpreter_resp = temp_code_interpreter.call(
params=json.dumps({'code': code}),
nb_mode=True,
silent_mode=True)
# remove the temporary Jupyter environment
temp_code_interpreter.terminate()
if self.streamlit:
st.divider()
st_notebook = nbformat.v4.new_notebook()
st_notebook.cells = [
temp_code_interpreter.nb.cells[-1]
]
c = Config()
c.HTMLExporter.preprocessors = [
'nbconvert.preprocessors.ConvertFiguresPreprocessor'
]
# create the new exporter using the custom config
html_exporter_with_figs = HTMLExporter(config=c)
(html, resources_with_fig
) = html_exporter_with_figs.from_notebook_node(
st_notebook)
st.write(
'We have generated the code for the current task')
st.html(html)
judge_resp = ''
if not code_execute_success:
logger.error(
f'code execution failed, task{task.task_id} code_counter{code_counter}:\n '
f'{code_interpreter_resp}')
if self.streamlit:
st.write(
'The code execution failed. Now we will take a reflection and regenerate the code.'
)
else:
logger.info(
f'code execution success, task{task.task_id} code_counter{code_counter}:\n '
f'{code_interpreter_resp}')
if self.streamlit:
st.write(
'The code execution is successful. Now we will ask the judge to check the code.'
)
code_logic_success, judge_resp = self._judge_code(
task=task,
previous_code_blocks=previous_code_blocks,
code=code,
code_interpreter_resp=code_interpreter_resp)
if self.streamlit:
st.write(
'The judge has checked the code, here is the result.'
)
st.write(judge_resp)
success = code_execute_success and code_logic_success
task.code_cells.append(
CodeCell(
@@ -663,6 +709,10 @@ def _run(self, user_request, save: bool = True, **kwargs):
if success:
self.code_interpreter.call(
params=json.dumps({'code': code}), nb_mode=True)
if self.streamlit:
st.write(
'The code is correct, we will move to the next task.'
)
task.code = code
task.result = code_interpreter_resp
code_counter += 1
@@ -699,6 +749,13 @@ def _run(self, user_request, save: bool = True, **kwargs):
json.dumps(plan_dict, indent=4, cls=TaskEncoder))
except Exception as e:
print(f'json write error: {str(e)}')
if self.streamlit:
st.divider()
st.write('### We have finished all the tasks! ')
st.balloons()
st.write(
f"""#### The total time cost is: {time_cost}\n #### The total token cost is: {total_token}"""
)

except Exception as e:
logger.error(f'error: {e}')
10 changes: 9 additions & 1 deletion modelscope_agent/tools/metagpt_tools/task_type.py
@@ -54,6 +54,13 @@
- Ensure that the evaluated data is same processed as the training data.
- Use trained model from previous task result directly, do not mock or reload model yourself.
"""
OCR_PROMPT = """
The current task is about OCR, please note the following:
- you can follow the following code to get the OCR result:
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('/path/to/the/pic', cls=True) # please replace the path with the real path
"""


class TaskTypeDef(BaseModel):
@@ -92,7 +99,8 @@ class TaskType(Enum):
desc='Only for evaluating model.',
guidance=MODEL_EVALUATE_PROMPT,
)

OCR = TaskTypeDef(
name='ocr', desc='For performing OCR tasks', guidance=OCR_PROMPT)
OTHER = TaskTypeDef(
name='other', desc='Any tasks not in the defined categories')
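The new OCR entry plugs into the existing `TaskType` registry, which pairs each task name with guidance text that is injected into the code-generation prompt. A minimal stdlib-only sketch of that registry pattern (using a dataclass in place of the project's pydantic `BaseModel`; the guidance string here is illustrative):

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class TaskTypeDef:
    name: str
    desc: str
    guidance: str = ''


class TaskType(Enum):
    OCR = TaskTypeDef(name='ocr',
                      desc='For performing OCR tasks',
                      guidance='Use an OCR library such as PaddleOCR.')
    OTHER = TaskTypeDef(name='other',
                        desc='Any tasks not in the defined categories')

    @classmethod
    def get_type(cls, type_name):
        """Look up a task-type definition by its registered name."""
        for member in cls:
            if member.value.name == type_name:
                return member.value
        return None
```

Adding a task type is then a one-line change: define a guidance prompt and register a new enum member, which is exactly what this diff does for OCR.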

Binary file added resources/data_science_assistant_streamlit_1.png
Binary file added resources/data_science_assistant_streamlit_2.png
Binary file added resources/data_science_assistant_streamlit_3.png
Binary file added resources/data_science_assistant_streamlit_4.png