Commit

changed introduction and training
elizaan committed Jul 17, 2024
1 parent 66b2987 commit 26b19a6
Showing 5 changed files with 150 additions and 93 deletions.
45 changes: 44 additions & 1 deletion public/Upset-Alttext-User-Survey/assets/help.md
@@ -1,3 +1,46 @@
# Help

This is a questionnaire. For each question, be sure to provide an answer and then click **Next** when you’re ready to move on to the next question.
<!-- This is a questionnaire. For each question, be sure to provide an answer and then click **Next** when you’re ready to move on to the next question. -->

# What Is UpSet?

The major challenge in understanding relationships between sets is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. The most common set visualization approach – Venn Diagrams – doesn't scale beyond three or four sets. **UpSet, in contrast, is well suited for the quantitative analysis of data with more than three sets.**

UpSet visualizes set intersections in a matrix layout. The matrix layout enables the effective representation of associated data, such as the number of elements in the intersections.


## UpSet Explained

UpSet plots the intersections of sets as a matrix, as shown in the following figure. Each column corresponds to a set, and the bar chart on top shows the size of each set. Each row corresponds to a possible intersection: the filled-in cells show which sets are part of that intersection. Also notice the lines connecting the filled-in cells; they show in which direction you should read the plot:

<img style="width: 350px; height: 400px" class="centered-image" src="./assets/concept_1_matrix.svg" alt="Explaining the matrix approach in UpSet.">

Here you can see examples of how these intersections correspond to the segments in a Venn diagram. The first row in the figure is completely empty – it corresponds to all the elements that are in none of the sets. The green (third) row corresponds to the elements that are only in set B (not in A or C). The orange (fifth) row represents elements that are shared by sets A and B, but not by C. Finally, the last (violet) row represents the elements shared by all sets.

<img style="height: 400px; width: 490.5px" class="centered-image" src="./assets/concept_2_intersections.svg" alt="Explaining the intersections in UpSet">

This layout is great because we can plot the size of the intersections (the “cardinality”) as bar charts right next to the matrix, as you can see in the following example:

<img style="height: 400px; width: 531.8px" class="centered-image" src="./assets/concept_3_cardinality.svg" alt="Plotting intersection sizes with bars in UpSet.">

This makes the size of intersections easy to compare.
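
As a rough illustration of how the rows and their cardinalities can be derived from raw set data, here is a minimal Python sketch. The function name `exclusive_intersections`, the optional `universe` parameter, and the toy data are assumptions made for this example; they are not part of UpSet or of the figures above.

```python
from collections import Counter


def exclusive_intersections(sets, universe=None):
    """Count elements for each exact combination of set memberships.

    Each key is a frozenset of set names and corresponds to one matrix row;
    each value is that row's cardinality.
    """
    if universe is None:
        # Without an explicit universe, no element can fall outside every
        # set, so the empty row never appears.
        universe = set().union(*sets.values())
    counts = Counter()
    for element in universe:
        members = frozenset(
            name for name, items in sets.items() if element in items
        )
        counts[members] += 1
    return counts


# Hypothetical toy data, not taken from the figures above.
toy_sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
rows = exclusive_intersections(toy_sets, universe=range(1, 8))

for members, size in rows.items():
    print(sorted(members), size)
```

Each element is counted in exactly one row, which is why the row sizes add up to the size of the universe and can be compared directly as bars.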

The matrix is also very useful because it can be sorted in various ways. A common way is to sort by the cardinality (size), as shown in the following figure, but it's also possible to sort by degree, or sets, or any other desired sorting.

<img style="height: 400px; width: 298.4px" class="centered-image" src="./assets/concept_4_sorting.svg" alt="Sorting by cardinality in UpSet">
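
The two sort orders mentioned above can be sketched in a few lines of Python. The row data here is hypothetical, and representing a row as a `(members, cardinality)` tuple is an assumption made for this example only.

```python
# Each row is (members, cardinality); the values are made up for illustration.
rows = [
    (("A",), 5),
    (("B",), 2),
    (("A", "B"), 3),
    (("A", "B", "C"), 4),
    ((), 1),
]

# Sort by cardinality: largest intersection first.
by_cardinality = sorted(rows, key=lambda row: row[1], reverse=True)

# Sort by degree: the number of sets taking part in the intersection,
# with ties broken alphabetically by member names.
by_degree = sorted(rows, key=lambda row: (len(row[0]), row[0]))

print([members for members, _ in by_cardinality])
print([members for members, _ in by_degree])
```

Sorting by cardinality puts the dominant intersections at the top, while sorting by degree groups single sets, pairs, and larger combinations together.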


These are the basics of UpSet! There's a lot more that you can do with UpSet plots, such as visualizing attributes of the intersections or grouping intersections. See [upset.multinet.app](https://upset.multinet.app/) for details.

## Interpreting UpSet Plots

UpSet Plots are generally easy to read. Look at the following UpSet plot that presents movie genres as intersecting sets.

<img src="./assets/upset.png" alt="A simple UpSet Example" width="500"/>


### Sets and Set Sizes:
In this UpSet plot, Drama, Comedy, Thriller, Crime, and Fantasy are the movie genres (sets). They are represented in five columns, and the bars correspond to their sizes. In this particular example, the sets are sorted by size in descending order, but in general they can appear in any order. The largest set is "Drama" and the smallest set is "Fantasy". Even by eye, the difference between the largest and the smallest set is clearly substantial. Based on this difference, we classify set sizes into three types: Roughly equal, Diverging a bit, and Diverging a lot. This example is "Diverging a lot".
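
The text does not state how the three labels are decided, so the following Python sketch is only one possible reading. It assumes the decision is based on the ratio between the largest and smallest set, and the thresholds `slight` and `large` are hypothetical values chosen for illustration, as are the example sizes.

```python
def classify_set_size_divergence(sizes, slight=1.5, large=3.0):
    """Label how much set sizes diverge, using a largest/smallest ratio.

    The ratio rule and both thresholds are assumptions for this sketch.
    """
    smallest, largest = min(sizes), max(sizes)
    if smallest <= 0:
        raise ValueError("set sizes must be positive")
    ratio = largest / smallest
    if ratio < slight:
        return "Roughly equal"
    if ratio < large:
        return "Diverging a bit"
    return "Diverging a lot"


# Made-up sizes, not read from the movie-genre plot.
print(classify_set_size_divergence([100, 95, 90]))
print(classify_set_size_divergence([100, 60, 50]))
print(classify_set_size_divergence([1000, 400, 50]))
```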

### Intersection and Intersection Sizes:
In this UpSet plot, there are 20 intersections, one per row. Based on the number of sets in each intersection, we classify intersections into three categories. If an intersection contains only one set, it is made of a single set. If it contains two or three sets, it is made of 2-3 sets. If it contains more than three sets, it is made of many sets. Sometimes an intersection contains no set at all; we call this "the empty intersection". Conversely, if the plot has five intersecting sets and an intersection contains all of them, we call it the "all-set intersection". In the example above, no intersection contains all five sets, and the third row is the empty intersection. The bars next to the rows show the intersection sizes. Intersections can be shown in any order; in the example above, they are sorted by size in descending order, and the largest intersection contains a single set (Drama). An example of an intersection containing two sets is row six (Drama and Thriller), and an example of an intersection containing three sets is row 20 (Comedy, Crime, and Fantasy).
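
The categories above can be expressed as a small Python function. This is a sketch under assumptions: the function name `classify_intersection` is hypothetical, and checking the empty and all-set cases before the size-based categories is a choice made for this example, since the text does not say which label takes precedence.

```python
def classify_intersection(members, total_sets):
    """Return the category of an intersection from the sets it contains."""
    degree = len(members)
    if degree == 0:
        return "the empty intersection"
    if degree == total_sets:
        # Assumed to take precedence over the size-based labels below.
        return "all-set intersection"
    if degree == 1:
        return "single set"
    if degree <= 3:
        return "2-3 sets"
    return "many sets"


GENRES = ("Drama", "Comedy", "Thriller", "Crime", "Fantasy")
examples = [
    (),
    ("Drama",),
    ("Drama", "Thriller"),
    ("Comedy", "Crime", "Fantasy"),
    ("Drama", "Comedy", "Thriller", "Crime"),
    GENRES,
]
labels = [classify_intersection(m, len(GENRES)) for m in examples]

for members, label in zip(examples, labels):
    print(members, "->", label)
```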
8 changes: 7 additions & 1 deletion public/Upset-Alttext-User-Survey/assets/introduction.md
@@ -1,3 +1,9 @@
# Introduction

Welcome to our study. In this study, we ask you questions about our generated text descriptions for UpSet plots. An UpSet plot is a data visualization technique. Data visualization techniques serve as a visual language that conveys intricate patterns, trends, and relationships within data. With their increasing popularity, it’s also necessary that the visualizations reach a wide range of people. Our research aims to make data visualizations more accessible, so that people with visual impairments, such as low vision, residual vision, or blindness, do not miss any valuable information from a visualization. To serve this purpose, we have generated rich natural-language text descriptions. Before the question-and-answer session, we will introduce what an UpSet plot is, why it is used, and how you can interpret one.
Welcome to our study!

In this study, we ask you questions about our generated text descriptions for UpSet plots. An UpSet plot is a data visualization technique. Data visualization techniques serve as a visual language that conveys intricate patterns, trends, and relationships within data. With their increasing popularity, it’s also necessary that the visualizations reach a wide range of people.

Our research aims to make data visualizations more accessible, so that people with visual impairments, such as low vision, residual vision, or blindness, do not miss any valuable information from a visualization. To serve this purpose, we have generated text descriptions for charts.

Before the question-and-answer session, we will introduce what an UpSet plot is, why it is used, and how you can interpret one.
2 changes: 1 addition & 1 deletion public/Upset-Alttext-User-Survey/assets/training3.md
@@ -21,7 +21,7 @@ Sometimes you will find this type of contents that we call "Visualization and Te

<div class="container">
<div class="column">
<img style="width: 460px;" src="./assets/VO2.png" alt="Description of Image">
<img style="width: 460px;" src="./assets/T.png" alt="Description of Image">
</div>
<div class="column">
<h2>Dataset Properties</h2>
78 changes: 7 additions & 71 deletions public/Upset-Alttext-User-Survey/assets/upset.md
@@ -2,23 +2,7 @@

The major challenge in understanding relationships between sets is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. The most common set visualization approach – Venn Diagrams – doesn't scale beyond three or four sets. **UpSet, in contrast, is well suited for the quantitative analysis of data with more than three sets.**


<!-- ![A simple UpSet Example](./assets/upset.png) -->
<img src="./assets/upset.png" alt="A simple UpSet Example" width="500"/>

UpSet visualizes set intersections in a matrix layout. The matrix layout enables the effective representation of associated data, such as the number of elements in the intersections.



<!-- ## When should you use UpSet?
**UpSet works best for set data with more than three and fewer than about 30 sets**.
**UpSet is well suited for analyzing distributions and properties of many items**. Items are abstracted away as “counts”, though attributes of the items can be visualized in integrated or adjacent plots. If you want to see individual items in your set, you should probably go with an [Euler Diagram](https://de.wikipedia.org/wiki/Datei:British_Isles_Euler_diagram_15.svg).
**UpSet shines when you want to look at all combinations of how sets intersect.** If you want to look at pairwise intersections between sets, some sort of co-occurrence matrix might be a better choice.
Also take a look at the [Nature Methods Points of View article](https://www.nature.com/articles/nmeth.3033) discussing these trade-offs. -->


## UpSet Explained
@@ -41,66 +25,18 @@ The matrix is also very useful because it can be sorted in various ways. A commo

<img style="height: 400px; width: 298.4px" class="centered-image" src="./assets/concept_4_sorting.svg" alt="Sorting by cardinality in UpSet">

Finally, UpSet works just as well horizontally or vertically. Vertical layouts are better for interactive UpSet plots that can be scrolled, while horizontal layouts are best for figures in papers.

<img style="height: 250px; width: 340.3px" class="centered-image" src="./assets/concept_5_horizontal.svg" alt="Horizontal layout in UpSet">


These are the basics of UpSet! There's a lot more that you can do with UpSet plots, such as visualizing attributes of the intersections or grouping intersections. See [upset.multinet.app](https://upset.multinet.app/) for details.

## Interpreting UpSet Plots

<!-- ## Interpreting UpSet Plots
UpSet Plots are generally easy to read. There is one important caveat though: **you should be careful about interpreting data where the size of the sets is very different.** Look at the following example:
![UpSet and unequal set sizes.](./assets/unequal_set_size.png)
Here we're looking at movie genres, and it looks like the combination of “Drama” and “Comedy” is the largest two-set intersection. While this is a correct observation, it seems odd: dramas and comedies don't seem to go together all that well. What we're seeing here is an effect of the large size of the “Drama” and “Comedy” sets. Compared to the “Children” and “Documentary” sets, those two sets are huge. To understand this, it's important to also look at the set sizes, and hence **no UpSet plot should omit the visualization of set sizes**. The above example shows another metric that can be used to interpret this: the “Deviation” (orange and blue bars), which indicates how much an intersection deviates from the expected size if we assumed that set membership were random. We see that the comedy-drama intersection is actually much smaller than it should be if the data were random.
## UpSet vs. Venn Diagrams
UpSet Plots are generally easy to read. Look at the following UpSet plot that presents movie genres as intersecting sets.

Venn diagrams are not suitable for visualizing intersections of more than three or four sets. The figure below shows an example of a six-set Venn diagram [published in Nature](https://www.nature.com/nature/journal/v488/n7410/full/nature11241.html) that shows the relationship between the banana's genome and the genomes of five other species by visualizing which genes are shared between the plant species.
![The six set banana venn diagram.](./assets/banana.png)
While this figure looks fun, it is not a useful visualization. Try to extract any information from it. It's really hard to trace which intersection involves which sets. It's not obvious which is the biggest intersection from the visualization – you have to read the labels one by one.
You might ask, how does the banana Venn diagram look in UpSet? Here you go:
![UpSet showing the banana data.](./assets/upsetr-banana.png)
It is a little hard to read because the figure is rather small. But we can simply remove the small intersections, and we get a nice plot that shows us the main features of the data:
![UpSet showing the banana data with small intersections removed.](./assets/upsetr-banana_clipped.png)
Notice how easy it is to see trends: the vast majority of genes is shared between all plants, as highlighted in the next figure:
![UpSet showing the banana data with highlight on largest intersection, which includes all sets.](./assets/upset_genome_top.png)
Similarly, the first three species (Oryza_sativa, Sorghum_bicolor, and Brachypodium_distachyon) seem to be highly related, as all of them are part of the top-three intersections. In contrast, the sixth species (Phoenix dactylifera) seems to be most different from the others, as it only appears again in the sixth-largest intersection.
![UpSet showing the banana data with highlight on the first three sets, and on the intersection of the date with the rest.](./assets/upset_genome_top-3.png)
Such an analysis is almost impossible with a Venn diagram! -->


<!-- ## Frequently Asked Questions
- _How can I create high-resolution UpSet plots for a paper or other publication?_
There are three options:
- If you prefer to use the interactive web-based version you can print an interactive UpSet plot to a PDF and edit the PDF with a vector editing software such as Adobe Illustrator.
- You can generate an exportable figure using a programming language such as R or Python.
- You can create a static figure using, e.g., the R-Shiny versions of UpSet.
To explore all of these options, please refer to the [implementations page](/implementations/).
- _Can I show attributes of the intersections?_
<img src="./assets/upset.png" alt="A simple UpSet Example" width="500"/>

Yes, [most implementations](/implementations/) support visualizing attributes in some way.

- _Can I export the elements in a particular intersection?_
### Sets and Set Sizes:
In this UpSet plot, Drama, Comedy, Thriller, Crime, and Fantasy are the movie genres (sets). They are represented in five columns, and the bars correspond to their sizes. In this particular example, the sets are sorted by size in descending order, but in general they can appear in any order. The largest set is "Drama" and the smallest set is "Fantasy". Even by eye, the difference between the largest and the smallest set is clearly substantial. Based on this difference, we classify set sizes into three types: Roughly equal, Diverging a bit, and Diverging a lot. This example is "Diverging a lot".

Yes, but to our knowledge, only the interactive [UpSet 2](/upset/#upset2) version supports this. -->
### Intersection and Intersection Sizes:
In this UpSet plot, there are 20 intersections, one per row. Based on the number of sets in each intersection, we classify intersections into three categories. If an intersection contains only one set, it is made of a single set. If it contains two or three sets, it is made of 2-3 sets. If it contains more than three sets, it is made of many sets. Sometimes an intersection contains no set at all; we call this "the empty intersection". Conversely, if the plot has five intersecting sets and an intersection contains all of them, we call it the "all-set intersection". In the example above, no intersection contains all five sets, and the third row is the empty intersection. The bars next to the rows show the intersection sizes. Intersections can be shown in any order; in the example above, they are sorted by size in descending order, and the largest intersection contains a single set (Drama). An example of an intersection containing two sets is row six (Drama and Thriller), and an example of an intersection containing three sets is row 20 (Comedy, Crime, and Fantasy).