Sankey diagram to visualize Decision Tree results #29

hski-github · 2022-01-10T13:46:44Z

Hi there, great mod! I'm wondering if it could be used to visualize the decision tree outputs from Spotfire's classification/regression tree tool? The tool's outputs would probably need to be modified but the idea does not seem to be crazy: https://www.greenbook.org/mr/market-research-methodology/sankey-diagrams-a-better-way-to-visualize-decision-trees/ Cheers, Mark

See comment on TIBCO Community page https://community.tibco.com/wiki/sankey-diagram-mod-tibco-spotfirer

Mark-iGit · 2022-01-10T15:18:18Z

I've played around with the mod a bit, but I cannot figure out a data format that would let me use it for decision tree visualizations. Is this me being stupid, or is it related to the other comment on the TIBCO Community page ("Would it be possible to have the input table set up just like the setup which is needed for example the NetworkD3 R package? So one column with the category, value, from and to. With this setup it is possible to have different values for the same category, to be used in for examples material flows where you have losses. ")?

hski-github · 2022-01-10T17:22:07Z

Could you provide some example data showing what you get out of the decision tree?

Mark-iGit · 2022-01-11T19:23:34Z

Since I couldn't figure out a suitable data format, I'll attach an image instead. The data set classifies products into one of three classes (above spec, in spec or below spec) based on processing parameters such as Temperature, Pressure or Etch_Rate. Each of the nodes has an ID and a count of parts falling into that node (N):

No need to try and capture all of it, but I think the important bit is that a given variable, e.g. Etch_Rate, can show up as a splitting factor with various values at various levels of the tree. I've tried to provide data in tabular form for some of the nodes but, again, I couldn't figure out a suitable format:

ID	L0_Temperature	L0_Pressure	L0_Etch_Rate	L1_Temperature	L2_Etch_Rate	L3_Etch_Rate	Count
1	all	all	all				251
2				<=68.8			152
3				>68.8			93
4				<=68.8	<=20		55
5				<=68.8	>20		87
16				>68.8	<=22		7
17				>68.8	>22		81
6				<=68.8	>20	<=20.8	42
7				<=68.8	>20	>20.8	45

hski-github · 2022-01-11T20:31:14Z

Can you maybe describe what raw data you used and the steps to calculate the decision tree classification?

Mark-iGit · 2022-01-13T15:24:17Z

It is a synthetic data set in which I simulated the outcome of a critical dimension (CD) measurement on many parts based on the Temperature, Pressure and Etch_Rate of the manufacturing process.
The decision tree was then asked to find a model that explains the classification of the CD measurement result (too large = above spec, just right = in spec, or too small = below spec). It does so by splitting the measured parts on Temperature, Pressure or Etch_Rate, always selecting a split that improves the prediction (it was a Random Forest, and I have visualized the result from one of its trees). It is not so much about the data or the model; the question is whether, in principle, the data can be turned into a format that allows visualizing how many parts are split off, e.g. by the first split (Temperature >68.8 or <=68.8), and then how many of those with >68.8 are split off again by Etch_Rate<22, and so on. This would be similar to what is described here: https://www.greenbook.org/mr/market-research-methodology/sankey-diagrams-a-better-way-to-visualize-decision-trees/
Does this make sense?
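
For illustration, here is a minimal Python sketch of how the node table posted earlier in this thread could be reshaped into one row per parent-to-child flow, in the spirit of the from/to/value layout mentioned in the TIBCO Community comment. This is an assumption about a workable shape, not a format the mod is confirmed to accept. The column names `from`, `to`, `label` and `value`, the `Node N` naming and the function `tree_to_links` are all hypothetical. Parents are inferred by matching each node's split path against the path one level up, and each flow takes the child's count as posted.

```python
# Node table from the thread: (ID, split conditions from L1 to L3, Count).
# Empty L0 columns are dropped; the root (ID 1) has no split condition.
NODES = [
    (1, [], 251),
    (2, ["Temperature <=68.8"], 152),
    (3, ["Temperature >68.8"], 93),
    (4, ["Temperature <=68.8", "Etch_Rate <=20"], 55),
    (5, ["Temperature <=68.8", "Etch_Rate >20"], 87),
    (16, ["Temperature >68.8", "Etch_Rate <=22"], 7),
    (17, ["Temperature >68.8", "Etch_Rate >22"], 81),
    (6, ["Temperature <=68.8", "Etch_Rate >20", "Etch_Rate <=20.8"], 42),
    (7, ["Temperature <=68.8", "Etch_Rate >20", "Etch_Rate >20.8"], 45),
]


def tree_to_links(nodes):
    """Turn a list of tree nodes into parent -> child flow rows.

    A node's parent is the node whose split path equals this node's
    path with the last condition removed (hypothetical helper).
    """
    id_by_path = {tuple(path): node_id for node_id, path, _ in nodes}
    links = []
    for node_id, path, count in nodes:
        if not path:
            continue  # the root has no incoming flow
        parent_id = id_by_path[tuple(path[:-1])]
        links.append(
            {
                "from": f"Node {parent_id}",
                "to": f"Node {node_id}",
                "label": path[-1],  # the split condition leading to the child
                "value": count,     # number of parts flowing into the child
            }
        )
    return links


links = tree_to_links(NODES)
for link in links:
    print(link["from"], link["to"], link["label"], link["value"], sep="\t")
```

Because every row names its own source and target node, the same variable (e.g. Etch_Rate) can appear with different thresholds at different levels without colliding, which is the difficulty with the one-column-per-level table above.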
