Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test: Show current pipeline run issues (DO NOT MERGE) #8695

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mathislucka
Copy link
Member

Related Issues

Proposed Changes:

This PR's purpose is to highlight and discuss current issues with the Pipeline.run logic.
It adds behavioral tests that display different failures for the method (some of these have the same underlying reason).
This is a first step towards resolving these issues and getting to a robust execution logic for our pipelines.

Findings

  1. Insertion order of components and connections influences component execution order, number of times a component is executed and pipeline outputs
  2. Components with only optional inputs run in cases where they did not receive any input from connected components and already ran

(3. at least locally, some tests fail for different reasons with no changes to the code)

How did you test it?

  • behavioral tests

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a minor variation on the test above. The only change is swapping the insertion order of document creators. As a consequence, the order of documents in the DocumentJoiner's output will be swapped.

This behavior is consistent but AFAIK it is not documented anywhere and it is very unexpected for a user to have the output of their pipeline change, just because they changed the insertion order of their components.

"joiner",
"agent_llm",
"router",
"answer_builder"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, rag_prompt and rag_llm will run again although they did not receive any input from the cycle during this iteration. In turn, a PipelineMaxComponentRuns exception is raised because they are attempting to run a third time. The expected behavior with a proper run logic would be 2 iterations.

},
expected_run_order=[
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very similar to the one above but without user input to the second PromptBuilder. The test will fail because feedback_prompt and feedback_llm are executed a third time. feedback_prompt does not receive any inputs but still executes.

inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}},
expected_outputs={
"answer_builder": {
"answers": [GeneratedAnswer(data="valid code", query=task, documents=[])]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very similar to the one above but we changed the order of connections.

    + pp.connect("concatenator.output", "code_prompt.feedback")
    pp.connect("code_prompt.prompt", "code_llm.prompt")
    pp.connect("code_llm.replies", "feedback_prompt.code")
    pp.connect("feedback_llm.replies", "router.replies")
    pp.connect("router.fail", "concatenator.feedback")
    pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
    pp.connect("router.pass", "answer_builder.replies")
    pp.connect("code_llm.replies", "router.code")
    pp.connect("code_llm.replies", "concatenator.current_prompt")
    - pp.connect("concatenator.output", "code_prompt.feedback")

As a result, at least when I run the tests locally, the failure reason becomes indeterministic. For some test runs the expected run order differs from the actual run order and for some test runs, the output is an Answer with "invalid code" instead of "valid code".

Looking into both our code and networkx, I don't understand how these different outcomes can occur.

)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test intends to replicate: #8657

However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error:
AttributeError("'NoneType' object has no attribute 'keys'") but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.

@mathislucka mathislucka marked this pull request as ready for review January 9, 2025 14:49
@mathislucka mathislucka requested a review from a team as a code owner January 9, 2025 14:49
@mathislucka mathislucka requested review from vblagoje and removed request for a team January 9, 2025 14:49
@julian-risch julian-risch requested review from Amnah199 and julian-risch and removed request for vblagoje January 9, 2025 14:49
@mathislucka
Copy link
Member Author

Confirmed point 3 in the CI. Both tests yield different results on different runs.

See here for an example:
https://github.com/deepset-ai/haystack/actions/runs/12692547619/job/35379360449#step:8:605
https://github.com/deepset-ai/haystack/actions/runs/12692547619/job/35378636775#step:8:590

@coveralls
Copy link
Collaborator

coveralls commented Jan 9, 2025

Pull Request Test Coverage Report for Build 12693854002

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 9 unchanged lines in 3 files lost coverage.
  • Overall coverage remained the same at 91.094%

Files with Coverage Reduction New Missed Lines %
components/converters/openapi_functions.py 1 78.63%
marshal/yaml.py 1 95.24%
document_stores/in_memory/document_store.py 7 96.03%
Totals Coverage Status
Change from base Build 12689025274: 0.0%
Covered Lines: 8653
Relevant Lines: 9499

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants