Sankey diagram to visualize Decision Tree results #29

Open
hski-github opened this issue Jan 10, 2022 · 5 comments

Comments

@hski-github
Owner

Hi there, great mod! I'm wondering if it could be used to visualize the decision tree outputs from Spotfire's classification/regression tree tool? The tool's outputs would probably need to be modified but the idea does not seem to be crazy: https://www.greenbook.org/mr/market-research-methodology/sankey-diagrams-a-better-way-to-visualize-decision-trees/ Cheers, Mark

See comment on TIBCO Community page https://community.tibco.com/wiki/sankey-diagram-mod-tibco-spotfirer

@Mark-iGit

I've played a little bit with the mod, but I cannot figure out a proper data format that would let me use it for decision tree visualizations. Is this me being stupid, or is this related to the other comment on the TIBCO Community page ("Would it be possible to have the input table set up just like the setup which is needed for example the NetworkD3 R package? So one column with the category, value, from and to. With this setup it is possible to have different values for the same category, to be used in for examples material flows where you have losses. ")?
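A minimal sketch of the one-row-per-link layout quoted above, assuming the mod could read a table with `from`, `to`, `value` and `category` columns (these column names are hypothetical, not the mod's documented input). Using node IDs for `from`/`to` keeps every Sankey node unique, while putting the split variable in `category` lets the same variable, such as Etch_Rate, appear on many links with different values. The counts are taken from the node table posted further down this thread.

```python
import csv
import io

# Hypothetical column names for a from/to/value/category link table.
FIELDS = ["from", "to", "value", "category"]

# A few links built from the node counts posted in this thread:
# "from"/"to" are node IDs, "value" is the number of parts flowing along the link,
# and "category" is the variable used for that split.
LINKS = [
    {"from": 1, "to": 2, "value": 152, "category": "Temperature"},
    {"from": 1, "to": 3, "value": 93, "category": "Temperature"},
    {"from": 2, "to": 4, "value": 55, "category": "Etch_Rate"},
    {"from": 3, "to": 16, "value": 7, "category": "Etch_Rate"},
]


def links_to_csv(links):
    """Serialize link rows as CSV text, one row per link."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(links)
    return buffer.getvalue()


print(links_to_csv(LINKS))
```

Each row is one flow, so Etch_Rate can show up under both Temperature branches with different thresholds without any extra columns.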

@hski-github
Owner Author

Could you provide some example data of what you get out of the decision tree?

@Mark-iGit

Since I couldn't figure out a suitable data format I'll attach an image instead. The data set is classifying products into one of three classes (above spec, in spec or below spec) based on processing parameters such as Temperature, Pressure or Etch_Rate. Each of the nodes has an ID and a count of parts falling into that node (N):
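[Image: decision tree classifying parts as above spec, in spec or below spec; each node shows an ID and a part count N]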

No need to try and capture all of it, but I think the important bit is that a given variable, e.g. Etch_Rate, can show up as a splitting factor with various values at various levels of the tree. I've tried to provide data in tabular form for some of the nodes but, again, couldn't figure out a suitable format:

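| ID | L0_Temperature | L0_Pressure | L0_Etch_Rate | L1_Temperature | L1_Pressure | L1_Etch_Rate | L2_Temperature | L2_Pressure | L2_Etch_Rate | L3_Temperature | L3_Pressure | L3_Etch_Rate | Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | all | all | all |  |  |  |  |  |  |  |  |  | 251 |
| 2 |  |  |  | <=68.8 |  |  |  |  |  |  |  |  | 152 |
| 3 |  |  |  | >68.8 |  |  |  |  |  |  |  |  | 93 |
| 4 |  |  |  | <=68.8 |  |  |  |  | <=20 |  |  |  | 55 |
| 5 |  |  |  | <=68.8 |  |  |  |  | >20 |  |  |  | 87 |
| 16 |  |  |  | >68.8 |  |  |  |  | <=22 |  |  |  | 7 |
| 17 |  |  |  | >68.8 |  |  |  |  | >22 |  |  |  | 81 |
| 6 |  |  |  | <=68.8 |  |  |  |  | >20 |  |  | <=20.8 | 42 |
| 7 |  |  |  | <=68.8 |  |  |  |  | >20 |  |  | >20.8 | 45 |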
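One way to turn a table like this into Sankey links, sketched under the assumption that each node can be described by its list of split conditions from the root (the `NODES` dictionary and `table_to_links` function are hypothetical names). A node's parent is the node whose path is one condition shorter, and the link value is the child's part count. Node IDs, not condition labels, identify the Sankey nodes, so a variable like Etch_Rate can recur at different levels and thresholds without two nodes collapsing into one.

```python
# Node ID -> (split conditions from the root, part count N), copied from the table.
# The root ("all") has an empty path.
NODES = {
    1: ([], 251),
    2: (["Temperature <=68.8"], 152),
    3: (["Temperature >68.8"], 93),
    4: (["Temperature <=68.8", "Etch_Rate <=20"], 55),
    5: (["Temperature <=68.8", "Etch_Rate >20"], 87),
    16: (["Temperature >68.8", "Etch_Rate <=22"], 7),
    17: (["Temperature >68.8", "Etch_Rate >22"], 81),
    6: (["Temperature <=68.8", "Etch_Rate >20", "Etch_Rate <=20.8"], 42),
    7: (["Temperature <=68.8", "Etch_Rate >20", "Etch_Rate >20.8"], 45),
}


def table_to_links(nodes):
    """Return (parent_id, child_id, value, label) tuples, one per Sankey link."""
    # Index nodes by their path so a parent is found by dropping the last condition.
    by_path = {tuple(path): node_id for node_id, (path, _) in nodes.items()}
    links = []
    for node_id, (path, count) in nodes.items():
        if not path:
            continue  # the root has no incoming flow
        parent_id = by_path[tuple(path[:-1])]
        links.append((parent_id, node_id, count, path[-1]))
    return links


for link in table_to_links(NODES):
    print(link)
```

In the posted table the child counts do not always add up to the parent count (152 + 93 = 245 versus 251 at the root, 55 + 87 = 142 versus 152 under node 2), so a Sankey built from these links would not balance at those nodes.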

@hski-github
Owner Author

Can you maybe describe what raw data you used and the steps to calculate the decision tree classification?

@Mark-iGit

Mark-iGit commented Jan 13, 2022

It is a synthetic data set in which I simulated the outcome of a critical dimension (CD) measurement on many parts based on the Temperature, Pressure and Etch_Rate of the manufacturing process.
The decision tree was then asked to find a model that explains the classification of the CD measurement result (too large = above spec, just right = in spec, too small = below spec). It does so by splitting all the measured parts based on Temperature, Pressure or Etch_Rate, always selecting the split in a way that improves the prediction (it was a Random Forest, and I have visualized the result from one of its trees). The question is less about the data or the model than about whether, in principle, the data can be turned into a format that shows how many parts are split off by, e.g., the first split (Temperature >68.8 or <=68.8), then how many of, say, the parts with >68.8 are split off again by Etch_Rate<22, and so on. Similar to what is described e.g. here: https://www.greenbook.org/mr/market-research-methodology/sankey-diagrams-a-better-way-to-visualize-decision-trees/
Does this make sense?
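If the raw part records are available, the flows can also be counted directly by routing each part through the split rules. A minimal sketch, assuming a binary tree where parts with a value `<=` the threshold go left (as in the Temperature <=68.8 / >68.8 split described above); the `Node` class, `count_flows` function and the four example records are hypothetical, and the tree only mirrors the nodes listed in the table.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A tree node; leaves have feature=None, split nodes send value <= threshold left."""
    node_id: int
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_flows(root, records):
    """Count how many records travel along each (parent_id, child_id) edge."""
    flows = Counter()
    for record in records:
        node = root
        while node.feature is not None:
            go_left = record[node.feature] <= node.threshold
            child = node.left if go_left else node.right
            flows[(node.node_id, child.node_id)] += 1
            node = child
    return flows


# Split rules mirroring the nodes in the table (IDs 1 to 7, 16 and 17).
TREE = Node(1, "Temperature", 68.8,
            left=Node(2, "Etch_Rate", 20.0,
                      left=Node(4),
                      right=Node(5, "Etch_Rate", 20.8, left=Node(6), right=Node(7))),
            right=Node(3, "Etch_Rate", 22.0, left=Node(16), right=Node(17)))

# Hypothetical part records (Pressure is present but unused by these splits).
RECORDS = [
    {"Temperature": 65.0, "Pressure": 1.2, "Etch_Rate": 19.0},
    {"Temperature": 65.0, "Pressure": 1.1, "Etch_Rate": 20.5},
    {"Temperature": 66.0, "Pressure": 1.3, "Etch_Rate": 21.0},
    {"Temperature": 70.0, "Pressure": 1.0, "Etch_Rate": 25.0},
]

for (parent, child), n in sorted(count_flows(TREE, RECORDS).items()):
    print(parent, "->", child, n)
```

Each counted edge becomes one Sankey link with its count as the value. Because each part follows exactly one path, the flow into every split node equals the flow out of it, so the resulting diagram always balances.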
