Confidence and Prior? What do you mean by those? #1

Open
brucewlee opened this issue Mar 15, 2022 · 2 comments

Comments

@brucewlee

Hi :) Thanks for the great work. DiscSense deserves more recognition.
It reveals a lot of potential for discourse analysis, especially regarding the role of discourse markers in semantics.

As a researcher in a similar field, I see a great use case for DiscSense: understanding text semantics through simple token matching.
If the semantic labels presented in DiscSense are meaningful enough, such a semantic analysis system would be possible even without sophisticated BERT-like encoders, as sketched below.
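For concreteness, here is a rough sketch of the kind of system I have in mind (the marker table and labels are invented for illustration, not actual DiscSense output):

```python
# Hypothetical sketch: predict a semantic label by looking up discourse
# markers in a precomputed marker=>label table. The entries below are
# invented; a real table would be derived from DiscSense's associations.

MARKER_LABELS = {
    "sadly,": "negative",     # illustrative only
    "luckily,": "positive",   # illustrative only
}

def predict(text, default="unknown"):
    """Return the label of the first known marker found in the text."""
    lowered = text.lower()
    for marker, label in MARKER_LABELS.items():
        if marker in lowered:
            return label
    return default

print(predict("Sadly, the screen cracked."))  # -> negative
```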

However, you mention Confidence (Prior) calculations.
I read your paper, but I find it difficult to conceptually grasp what you mean by "Confidence" and "Prior".

How exactly are these values computed (it is unclear to me in your LREC paper)? And what do you qualitatively mean by "Confidence" and "Prior"?
I hope I can get some help here.

@sileod
Owner

sileod commented Mar 15, 2022

Hi, thank you for the kind words! I also think that it can reveal dataset biases and connotations of markers.

I relied heavily on association-rule terminology, which is a bit old-fashioned now. I mine marker=>label rules in specific datasets. But labels are unbalanced, and one label can be dominant. If a label y is dominant, any marker=>y rule will look accurate, so confidence alone can be misleading; the prior serves as the baseline to compare against. The prior is the probability of getting the label regardless of whether the discourse marker is present.

The confidence is the probability that the rule marker=>label holds in a dataset.
In the CR dataset, if you encounter "sadly," the review has a 95.2% chance of being negative; that is the confidence of the sadly=>negative association in CR.
In the CR dataset, a review has a 21.8% chance of being negative in general, which is the prior for negative in CR. See Table 2 of the paper.
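To make the two definitions concrete, here is a minimal sketch of how they can be computed (this is not the DiscSense code; the toy reviews below are invented for illustration):

```python
# Minimal sketch of the prior and confidence of a marker=>label rule,
# computed over a list of (text, label) pairs.

def prior(examples, label):
    """P(label) over the whole dataset, regardless of any marker."""
    return sum(1 for _, y in examples if y == label) / len(examples)

def confidence(examples, marker, label):
    """P(label | marker present), i.e. confidence of marker=>label."""
    with_marker = [(text, y) for text, y in examples if marker in text.lower()]
    if not with_marker:
        return 0.0
    return sum(1 for _, y in with_marker if y == label) / len(with_marker)

# Toy dataset (invented), in the spirit of the CR example above
examples = [
    ("sadly, the battery died after a week", "negative"),
    ("works great, highly recommend", "positive"),
    ("sadly, it broke on arrival", "negative"),
    ("decent value for the price", "positive"),
]
print(prior(examples, "negative"))                 # 0.5
print(confidence(examples, "sadly,", "negative"))  # 1.0
```

A rule is interesting when its confidence is much higher than the prior: in the CR example, 95.2% for sadly=>negative versus a 21.8% prior for negative.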

@brucewlee
Author

@sileod Thanks for the explanation!
