[DOC]: DFP Starter Example README.md Needs to be Updated #1713

Closed
2 tasks done
oguzhancelik2425 opened this issue May 20, 2024 · 3 comments · Fixed by #1903
Labels
doc: Improvements or additions to documentation
external: This issue was filed by someone outside of the Morpheus team

Comments

@oguzhancelik2425

How would you describe the priority of this documentation request

High

Please provide a link or source to the relevant docs

https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/examples/digital_fingerprinting/starter/README.md

Describe the problems in the documentation

This file gives three examples for the DFP pipeline, covering the cloudtrail, azure aad, and duo message log use cases. However, when I dig into the documentation for running the examples, the explanation is rather thin. The use cases above come with some training and validation dataset samples. When I check the cloudtrail input validation sample dataset, it already contains the ae_anomaly_score and ts_anomaly values, but I assume values calculated by the model should not be in the training or validation dataset. Moreover, when I check the train ae model stage, I cannot see the ae_anomaly_score and ts_anomaly calculation; these are instead in the hammah-inference.py module in the /models/validation-scripts/dfp-models/ directory. From this information there is no easy way to understand the order in which to run the pipeline and scripts, which files are for training, validation, and inference, and how those anomaly scores are calculated.

I also noticed that some directory paths have not been updated, which causes errors in the CLI. For example, the CLI example here shows the option --columns_file=morpheus/data/columns_ae_cloudtrail.txt for reading the feature columns txt file, but that path does not exist under /morpheus/; it should be models/data/columns_ae_cloudtrail.txt. Similar problems exist in the rest of the README.md file.
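
One way to find stale paths like this before running a command is to check each file argument against the repository checkout. The following is only a minimal sketch: `missing_paths` is a hypothetical helper, not part of Morpheus, and it assumes the paths in the README are relative to the repository root.

```python
from pathlib import Path


def missing_paths(repo_root, relative_paths):
    """Return the entries of relative_paths that are not files under repo_root.

    Hypothetical helper for illustration; it is not part of Morpheus.
    """
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # The two candidate locations mentioned in this issue.
    candidates = [
        "morpheus/data/columns_ae_cloudtrail.txt",
        "models/data/columns_ae_cloudtrail.txt",
    ]
    # Assumes the script is run from the root of a local checkout.
    for path in missing_paths(".", candidates):
        print(f"not found: {path}")
```

Running it from the root of a checkout prints every listed path that does not resolve to a file, which makes it easy to spot which README arguments need updating.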

(Optional) Propose a correction

I think this starter example needs to be reviewed in detail and expanded to explain how the model works, which resources are needed for the given examples, and the order in which to run them. For the example I tried to explain above, my understanding is that the user should run the hammah-20211017.ipynb notebook first, right after feature selection with the training dataset, to train the model, and then run the hammah-inference.py module to get the ae_anomaly_score and ts_anomaly values. Then the user needs to run the dfp pipeline CLI with the use case's features on the file overwritten by the previous hammah-inference.py run, in order to get the reconstruction_loss, z_loss, scores values, etc. If I am wrong, it is unfortunately because of the poor documentation of the example.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open documentation issues and have found no duplicates for this bug report
@oguzhancelik2425 oguzhancelik2425 added the doc Improvements or additions to documentation label May 20, 2024
@efajardo-nv
Contributor

efajardo-nv commented May 21, 2024

@oguzhancelik2425 Thank you for submitting this issue. The Starter DFP implementation and documentation are somewhat stale as we decided to focus on a single DFP implementation, Production DFP. The two implementations have diverged significantly, so we now encourage everyone to start with Production DFP, which comes with a bit more complexity but incorporates new Morpheus features and is more scalable.

We plan on having Starter DFP removed for the next release (#1715).

@oguzhancelik2425
Author

Thanks for the quick response @efajardo-nv. I have a question about the part of hammah_inference.py that returns the anomaly scores: it uses index 3 to get some values, but if I use it, I get uniform values for the inference dataset. Could this module be outdated/stale as well, like the rest of the starter pipeline? When I removed the [3] from the line, I got different anomaly scores for my inference. Was this line written by mistake?
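
The symptom described here, every row receiving the same score, can be caught with a simple sanity check on the script's output. This is only a sketch under stated assumptions: `scores_are_uniform` is a hypothetical helper, and it assumes the anomaly scores are available as a plain sequence of floats. It says nothing about what index 3 selects inside hammah_inference.py.

```python
import math


def scores_are_uniform(scores, rel_tol=1e-9):
    """Return True if every score is numerically the same value.

    Hypothetical helper for illustration; an empty input counts as uniform.
    """
    scores = list(scores)
    if not scores:
        return True
    first = scores[0]
    # Compare each score to the first one within a relative tolerance.
    return all(math.isclose(s, first, rel_tol=rel_tol) for s in scores)
```

If the check returns True for a dataset with varied inputs, the scores are probably being read from the wrong element of the model output.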

@efajardo-nv
Contributor

@oguzhancelik2425 You are correct. That inference script, which is run independently of Morpheus, has not been kept in sync with the changes in the autoencoder and data file paths. We'll get that updated. Thanks!

@mdemoret-nv mdemoret-nv added the external This issue was filed by someone outside of the Morpheus team label Jun 11, 2024
@morpheus-bot-test morpheus-bot-test bot moved this from Todo to Review - Ready for Review in Morpheus Boards Sep 20, 2024
@github-project-automation github-project-automation bot moved this from Review - Ready for Review to Done in Morpheus Boards Oct 24, 2024