-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explorer revamp #428
base: master
Are you sure you want to change the base?
Explorer revamp #428
Conversation
…hodsinitiative/4cat into explorer-improvements # Conflicts: # common/lib/config_definition.py
…rer settings page
…s functionality, start Instagram template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked over the backend code and only noticed one real issue (in an edge case). I ran this version and tested out the Explorer on a number of datasets (instagram, custom, telegram, tumblr, tiktok, youtube, reddit). It looks good! Sort works well. Reddit was missing the "subject" field (it's probably the only dataset that uses subject anymore). Telegram has an issue which I will post separately.
I tested saving annotations and writing them to datasets. This worked for me (and broke one with my edge case 😬; see comment). I did notice that the new fields show up in the Dataset preview view, but the values saved to the database do not show up in preview. The values do show up after you have run "write annotations".
Changing deactivating/activating settings seem to work fine. There is an explorerflask
settings group that could probably be merged with the Explorer group.
If you want to merge now, I would deactivate Telegram as a default (till addressed) and consider how to address my comment re: field names for annotations.
datasources/ninegag/search_9gag.py
Outdated
@@ -8,6 +8,7 @@ | |||
|
|||
from backend.lib.search import Search | |||
from common.lib.item_mapping import MappedItem | |||
from common.lib.helpers import UserInput |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're importing UserInput in a few datasources unnecessarily. Probably an oversight and otherwise has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, we should do some cleanup..
@@ -101,7 +102,7 @@ def process(self): | |||
|
|||
# Write to top dataset | |||
for label, values in new_data.items(): | |||
self.add_field_to_parent("annotation_" + label, values, which_parent=self.source_dataset, update_existing=True) | |||
self.add_field_to_parent(label, values, which_parent=self.source_dataset, update_existing=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The add_field_to_parent
function does not check for existing fields. If a User creates a field called "username" they will overwrite an existing field with the same name. If I recall, I could not figure out how to check that an existing column had the name because the add_field_to_parent function needs to be able to update existing annotation fields. This is just a bit dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on a dataset by creating a field called "author", adding some values, and writing to dataset. I was able to overwrite the original "author" field (which in my case was actually a dictionary of author related data which caused map item to break). I recommend reverting this change. We could even add 4CAT_annotation_
or something so that it would be virtually impossible for raw data to contain that fieldname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed an oversight for now, though I would like to have the option for annotation fields to have a 'clean' name; long names are quickly unreadable in spreadsheet software. This can be resolved by initially checking whether an annotation field key already exists in the dataset columns or, when annotated datasets are filtered and create a new dataset, if it is not a field registered in the annotations
table for the parent dataset.
This is a sort-of edge case for now, but I'll try to resolve this next week!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this with a back-end and front-end check
And this is the issue with a Telegram dataset I ran into:
Looks like perhaps the emojis are killing the template. Telegram, I think, is the only datasource using them. |
…hodsinitiative/4cat into explorer-improvements
…orer-improvements
# Conflicts: # setup.py # webtool/__init__.py # webtool/lib/template_filters.py # webtool/templates/components/result-result-row.html
# Conflicts: # common/lib/dataset.py # webtool/views/api_explorer.py
@stijn-uva this should be mergeable! Perhaps we want the OpenAI processor on a different branch for now since I'd like it to have some more features. But it works in its current state so also doesn't hurt to have it on master.. |
This PR revamps how the Explorer works and looks. It specifically does the following:
OPTION_DATASOURCES_TABLE
user input that creates a table with dynamic columns for each enabled dataset. Input fields per row can be text, dropdown, and checkboxstatic/css/explorer/
) and Jinja2 templates (inwebtool/templates/explorer/datasource-templates/
) instead of CSS and JSON files in the data source folders that need to be verified and parsed.iterate_items
.Note that some unused code is still present for future updates with respect to 4CAT scrapers and database-accessible data sources generally.