From e68c40e966527def173587a236c411f9e0f44eca Mon Sep 17 00:00:00 2001 From: Matthew Bernhardt Date: Tue, 29 Oct 2024 14:02:59 -0400 Subject: [PATCH 1/5] Update documentation ** Why are these changes being introduced: We are adding a Confirmation class for users to provide ground truth for how different Terms should be categorized. The first step is to update our documentation to work out how this is going to work. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-8 ** How does this address that need: This updates the class diagram and workflow documentation to include the Confirmation class. * User and Confirmation are now grouped in the class diagram, with a common visual language. * The categorization workflow document is updated to use the new vocabulary "confirmation" rather than our earlier "validation". ** Document any side effects to this change: As this commit only changes documentation, there are no side effects yet. --- docs/explanation/categorization-workflow.md | 33 +++++++++++++----- docs/reference/classes.md | 38 +++++++++++++++++---- 2 files changed, 56 insertions(+), 15 deletions(-) diff --git a/docs/explanation/categorization-workflow.md b/docs/explanation/categorization-workflow.md index 35037ae..614cbe0 100644 --- a/docs/explanation/categorization-workflow.md +++ b/docs/explanation/categorization-workflow.md @@ -91,16 +91,33 @@ One detector in this application is associated with different categories on a re SuggestedResource detector. The `calculateCategory()` method includes a lookup for this detector to make sure that any detections are scored appropriately. -### Human validation of these operations +### Human review of these operations -There will be an ability for humans to inspect these operations, and to submit feedback about any actions which were -not correct. These validations will be used to further refine the confidence values associated with our `Detector` and -`DetectionCategory` records, as well as to refine the operation of the detectors, or the mappings between these -elements. +Staff will have multiple ways to participate in this process: -This validation workflow has not been defined yet, nor has the data model been expanded to support this feedback. We do -anticipate, however, that successful or unsuccessful validations would end up adjusting the relevant confidence values -via the `incrementConfidence()` or `decrementConfidence()` methods. +1. Placing terms into categories themselves, independent of what the application does. These classifications will be +consulted during adjustments to the application, acting as a confirmation of what the tool _should_ be doing. +2. Inspecting the actions taken by the application about any term (which detectors activate, and how those detections +are summarized into the final categorization). + +#### Confirmation + +The first of these actions, the Confirmation workflow, is the first part we are developing. Any user of the system will +be able to review a Term record, and place it into one of the three basic categories. If unsure, the user can place the +term in an Undefined category. The final option during Confirmation will be to flag the term for review and removal +(occasionally we see search terms that are meaningless as a search term, but appear to be user passwords or otherwise +contain information that should not be retained). + +#### Inspection + +Staff will have the ability to inspect the detections and categorizations taken by the application, and to submit +feedback about any actions which were not correct. This feedback will be used to further refine the confidence values +associated with our `Detector` and `DetectionCategory` records, as well as to refine the operation of the detectors, or +the mappings between these elements. + +This feedback workflow has not been defined yet, nor has the data model been expanded to support this feedback. We +do anticipate, however, adjusting the relevant confidence values via the `incrementConfidence()` or +`decrementConfidence()` methods. --- diff --git a/docs/reference/classes.md b/docs/reference/classes.md index 99071e7..ecf0edb 100644 --- a/docs/reference/classes.md +++ b/docs/reference/classes.md @@ -1,12 +1,17 @@ # Modeling categorization -The application includes the following entities, most of which an be broken into one of the following three areas: - -* Search activity, which flow in continuously with Terms and Search Events; -* A knowledge graph, which includes the categories, detectors, and relationships - between the two which TACOS defines and maintains, and which is consulted during categorization; and -* The linkages between these search terms and the graph, which record which signals are - detected in each term, and how those signals are interpreted to place the term into a category. +The application includes the following entities, most of which an be broken into one of the following four areas: + +* Search activity, which flow in continuously +with Terms and Search Events; +* A knowledge graph, which includes the +categories, detectors, and relationships + between the two which TACOS defines and maintains, and which is consulted during categorization; +* The linkages between these search terms and the +graph, which record which signals are + detected in each term, and how those signals are interpreted to place the term into a category; and +* User activity which is provided by staff +who review the application's decisions and provide ground truth for future improvements. ```mermaid classDiagram @@ -24,6 +29,10 @@ classDiagram Detector "1" --> "0..*" DetectorCategory + Confirmation --> Term + Confirmation --> Category + User --> Confirmation : provides many + class User User: +String uid User: +String email @@ -93,6 +102,12 @@ classDiagram DetectorSuggestedResource: record() DetectorSuggestedResource: update_fingerprint() + class Confirmation + Confirmation: +Integer id + Confirmation: +Integer user_id + Confirmation: +Integer term_id + Confirmation: +Integer category_id + Confirmation: +Boolean flag namespace SearchActivity{ class Term @@ -113,6 +128,12 @@ classDiagram class DetectorSuggestedResource["Detector::SuggestedResource"] } + namespace UserActivity { + class Confirmation + class User + + } + style SearchEvent fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px; style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px; @@ -126,4 +147,7 @@ classDiagram style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5; style Detection fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5; + + style Confirmation fill:#000,stroke:#ffd407,stroke-dasharray: 5 10; + style User fill:#000,stroke:#ffd407,stroke-dasharray: 5 10; ``` From 16a492c4d0473fd761be0ecbf7d0b9000e15b57f Mon Sep 17 00:00:00 2001 From: Matthew Bernhardt Date: Tue, 29 Oct 2024 14:41:21 -0400 Subject: [PATCH 2/5] Create UserCategorization model ** Why are these changes being introduced: With the documentation updated, we need to actually change the models. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-8 ** How does this address that need: * Adds the Confirmation model, with relationships to User, Term, and Category (optionally). The category relationship is optional because there is also the option for a user to flag the term for removal. This flag is stored on the Confirmation model so that we have a record of who flagged it, in case we want to follow up with them. The implication is that we will review these flags, not perform the removal immediately. There is a uniqueness constraint on the Confirmation model, to limit users from confirming the same term twice. * Two scopes are added to the Term model which filter records that have (and have not) been confirmed by a user. * A fourth category is added to the database seeds - "Undefined". The automated categorization does not interact with this category, but users have the option to place a term there in their workflow. The UI which is coming in the next commit then will provide five options for manual confirmation: one for each category, and then a "flag" option. * Tests and fixtures are added to support and confirm all of the above. ** Document any side effects to this change: There are a few side effects here: * While adding the 'flagged' field makes sense to me on the Confirmation model, it may also need to be added to the Term model (or at least add a scope to grab flagged records) * The existing scopes on the Term model do not pay attention to which user has confirmed the term - so at the moment getting multiple confirmations from multiple users may not be possible without further work. * Because we are changing the database seeds, we will need to take care when promoting this change ito production. --- app/models/category.rb | 1 + app/models/confirmation.rb | 19 ++++ app/models/term.rb | 4 + app/models/user.rb | 2 + .../20241029181747_create_confirmation.rb | 14 +++ db/schema.rb | 19 +++- db/seeds.rb | 4 + test/fixtures/categories.yml | 4 + test/fixtures/confirmations.yml | 30 +++++++ test/models/category_test.rb | 16 ++++ test/models/confirmation_test.rb | 87 +++++++++++++++++++ test/models/term_test.rb | 78 +++++++++++++++++ test/models/user_test.rb | 16 ++++ 13 files changed, 293 insertions(+), 1 deletion(-) create mode 100644 app/models/confirmation.rb create mode 100644 db/migrate/20241029181747_create_confirmation.rb create mode 100644 test/fixtures/confirmations.yml create mode 100644 test/models/confirmation_test.rb diff --git a/app/models/category.rb b/app/models/category.rb index 686c460..4b389f5 100644 --- a/app/models/category.rb +++ b/app/models/category.rb @@ -14,4 +14,5 @@ class Category < ApplicationRecord has_many :detector_categories, dependent: :destroy has_many :detectors, through: :detector_categories has_many :categorizations, dependent: :destroy + has_many :confirmations, dependent: :destroy end diff --git a/app/models/confirmation.rb b/app/models/confirmation.rb new file mode 100644 index 0000000..30c0f64 --- /dev/null +++ b/app/models/confirmation.rb @@ -0,0 +1,19 @@ +# frozen_string_literal: true + +# == Schema Information +# +# Table name: confirmations +# +# id :integer not null, primary key +# user_id :integer not null +# term_id :integer not null +# category_id :integer +# flag :boolean +# created_at :datetime not null +# updated_at :datetime not null +# +class Confirmation < ApplicationRecord + belongs_to :user + belongs_to :term + belongs_to :category, optional: true +end diff --git a/app/models/term.rb b/app/models/term.rb index 04dc351..3038721 100644 --- a/app/models/term.rb +++ b/app/models/term.rb @@ -16,6 +16,10 @@ class Term < ApplicationRecord has_many :search_events, dependent: :destroy has_many :detections, dependent: :destroy has_many :categorizations, dependent: :destroy + has_many :confirmations, dependent: :destroy + + scope :user_confirmed, -> { where.associated(:confirmations).distinct } + scope :user_unconfirmed, -> { where.missing(:confirmations).distinct } # The record_detections method is the one-stop method to call every Detector's record method that is defined within # the application. diff --git a/app/models/user.rb b/app/models/user.rb index 99d9d99..b453ed7 100644 --- a/app/models/user.rb +++ b/app/models/user.rb @@ -12,6 +12,8 @@ # admin :boolean default(FALSE) # class User < ApplicationRecord + has_many :confirmations, dependent: :destroy + include FakeAuthConfig if FakeAuthConfig.fake_auth_enabled? diff --git a/db/migrate/20241029181747_create_confirmation.rb b/db/migrate/20241029181747_create_confirmation.rb new file mode 100644 index 0000000..b1d0ed3 --- /dev/null +++ b/db/migrate/20241029181747_create_confirmation.rb @@ -0,0 +1,14 @@ +class CreateConfirmation < ActiveRecord::Migration[7.1] + def change + create_table :confirmations do |t| + t.references :user, null: false, foreign_key: true + t.references :term, null: false, foreign_key: true + t.references :category, null: true, foreign_key: true + t.boolean :flag + + t.timestamps + end + add_index :confirmations, [:term_id, :user_id], unique: true + add_index :confirmations, [:user_id, :term_id], unique: true + end +end diff --git a/db/schema.rb b/db/schema.rb index 3a9f440..fa589df 100644 --- a/db/schema.rb +++ b/db/schema.rb @@ -10,7 +10,7 @@ # # It's strongly recommended that you check this file into your version control system. -ActiveRecord::Schema[7.1].define(version: 2024_10_11_181823) do +ActiveRecord::Schema[7.1].define(version: 2024_10_29_181747) do create_table "categories", force: :cascade do |t| t.string "name" t.text "description" @@ -32,6 +32,20 @@ t.index ["term_id"], name: "index_categorizations_on_term_id" end + create_table "confirmations", force: :cascade do |t| + t.integer "user_id", null: false + t.integer "term_id", null: false + t.integer "category_id" + t.boolean "flag" + t.datetime "created_at", null: false + t.datetime "updated_at", null: false + t.index ["category_id"], name: "index_confirmations_on_category_id" + t.index ["term_id", "user_id"], name: "index_confirmations_on_term_id_and_user_id", unique: true + t.index ["term_id"], name: "index_confirmations_on_term_id" + t.index ["user_id", "term_id"], name: "index_confirmations_on_user_id_and_term_id", unique: true + t.index ["user_id"], name: "index_confirmations_on_user_id" + end + create_table "detections", force: :cascade do |t| t.integer "term_id", null: false t.integer "detector_id", null: false @@ -125,6 +139,9 @@ add_foreign_key "categorizations", "categories" add_foreign_key "categorizations", "terms" + add_foreign_key "confirmations", "categories" + add_foreign_key "confirmations", "terms" + add_foreign_key "confirmations", "users" add_foreign_key "detections", "detectors" add_foreign_key "detections", "terms" add_foreign_key "detector_categories", "categories" diff --git a/db/seeds.rb b/db/seeds.rb index 307c7ac..887816d 100644 --- a/db/seeds.rb +++ b/db/seeds.rb @@ -23,6 +23,10 @@ name: 'Transactional', description: 'A type of search where the user has an item in mind, and wants to get that item. Also known as "known-item".' ) +Category.find_or_create_by( + name: 'Undefined', + description: 'A search which has not been placed in one of the other categories.' +) # Detectors Detector.find_or_create_by(name: 'DOI') diff --git a/test/fixtures/categories.yml b/test/fixtures/categories.yml index 059d827..9c7e02c 100644 --- a/test/fixtures/categories.yml +++ b/test/fixtures/categories.yml @@ -19,3 +19,7 @@ navigational: transactional: name: 'Transactional' description: '...' + +undefined: + name: 'Undefined' + description: '...' diff --git a/test/fixtures/confirmations.yml b/test/fixtures/confirmations.yml new file mode 100644 index 0000000..8215340 --- /dev/null +++ b/test/fixtures/confirmations.yml @@ -0,0 +1,30 @@ +# == Schema Information +# +# Table name: confirmations +# +# id :integer not null, primary key +# user_id :integer not null +# term_id :integer not null +# category_id :integer +# flag :boolean +# created_at :datetime not null +# updated_at :datetime not null +# +minimal: + user: valid + term: hi + +unsure: + user: admin + term: hi + category: undefined + +informational: + user: valid + term: lcsh + category: informational + +transactional: + user: valid + term: multiple_detections + category: transactional diff --git a/test/models/category_test.rb b/test/models/category_test.rb index 1ad5033..110b387 100644 --- a/test/models/category_test.rb +++ b/test/models/category_test.rb @@ -62,4 +62,20 @@ class CategoryTest < ActiveSupport::TestCase assert_equal(category_count - 1, Category.count) assert_equal(categorization_count - relevant_links, Categorization.count) end + + test 'destroying a Category will delete associated Confirmations' do + category_count = Category.count + confirmation_count = Confirmation.count + + record = categories('transactional') + + relevant_links = record.confirmations.count + + assert_operator(0, :<, relevant_links) + + record.destroy + + assert_equal(category_count - 1, Category.count) + assert_equal(confirmation_count - relevant_links, Confirmation.count) + end end diff --git a/test/models/confirmation_test.rb b/test/models/confirmation_test.rb new file mode 100644 index 0000000..02cdedb --- /dev/null +++ b/test/models/confirmation_test.rb @@ -0,0 +1,87 @@ +# frozen_string_literal: true + +# == Schema Information +# +# Table name: confirmations +# +# id :integer not null, primary key +# user_id :integer not null +# term_id :integer not null +# category_id :integer +# flag :boolean +# created_at :datetime not null +# updated_at :datetime not null +# +require 'test_helper' + +class ConfirmationTest < ActiveSupport::TestCase + test 'confirmations must have a user' do + sample = confirmations('minimal') + + assert_predicate sample.user, :present? + assert_predicate sample, :valid? + + sample.user = nil + + assert_not_predicate sample, :valid? + end + + test 'destroying a Confirmation will not affect that User' do + confirmation_count = Confirmation.count + user_count = User.count + + record = confirmations('unsure') + + record.destroy + + assert_equal(confirmation_count - 1, Confirmation.count) + assert_equal(user_count, User.count) + end + + test 'users cannot confirm the same term twice' do + old_record = confirmations('minimal') + new_record = { + user: old_record.user, + term: old_record.term + } + + assert_raises(ActiveRecord::RecordNotUnique) do + Confirmation.create!(new_record) + end + end + + test 'confirmations must have a term' do + sample = confirmations('minimal') + + assert_predicate sample.term, :present? + assert_predicate sample, :valid? + + sample.term = nil + + assert_not_predicate sample, :valid? + end + + test 'destroying a Confirmation will not affect that Term' do + confirmation_count = Confirmation.count + term_count = Term.count + + record = confirmations('unsure') + + record.destroy + + assert_equal(confirmation_count - 1, Confirmation.count) + assert_equal(term_count, Term.count) + end + + test 'destroying a Confirmation will not affect that Category' do + confirmation_count = Confirmation.count + category_count = Category.count + + record = confirmations('unsure') + + record.destroy + + assert_equal(confirmation_count - 1, Confirmation.count) + assert_equal(category_count, Category.count) + end +end diff --git a/test/models/term_test.rb b/test/models/term_test.rb index da06bee..d04702e 100644 --- a/test/models/term_test.rb +++ b/test/models/term_test.rb @@ -62,6 +62,22 @@ class TermTest < ActiveSupport::TestCase assert_operator(Categorization.count, :<, categorization_pre_count) end + test 'destroying a Term will delete associated Confirmations' do + term_count = Term.count + confirmation_count = Confirmation.count + + term = terms('lcsh') + + relevant_links = term.confirmations.count + + assert_operator(0, :<, relevant_links) + + term.destroy + + assert_equal(term_count - 1, Term.count) + assert_equal(confirmation_count - relevant_links, Confirmation.count) + end + test 'destroying a SearchEvent does not delete the Term' do t = terms('hi') s = t.search_events.first @@ -183,4 +199,66 @@ class TermTest < ActiveSupport::TestCase assert_equal categorization_count + 1, Categorization.count end end + + test 'user_confirmed scope returns an active record relation' do + assert_kind_of ActiveRecord::Relation, Term.user_confirmed + end + + test 'user_unconfirmed scope returns an active record relation' do + assert_kind_of ActiveRecord::Relation, Term.user_unconfirmed + end + + test 'user_confirmed scope includes terms with manual confirmations' do + confirmed_count = Term.user_confirmed.count + t = terms('doi') + + # Make sure that this term isn't yet confirmed + assert_equal 0, t.confirmations.count + + new_record = { + term: t, + user: users('basic'), + category: categories('transactional') + } + Confirmation.create!(new_record) + + # The count should now be one more. + assert_equal confirmed_count + 1, Term.user_confirmed.count + end + + test 'user_confirmed scope accounts for terms with multiple manual confirmations' do + confirmed_count = Term.user_confirmed.count + t = terms('lcsh') + + # Confirm this term has already been categorized manually (see fixtures) + assert_operator 1, :<=, t.confirmations.count + + new_record = { + term: t, + user: users('basic'), + category: categories('transactional') + } + Confirmation.create!(new_record) + + # The count should be unchanged + assert_equal confirmed_count, Term.user_confirmed.count + end + + test 'user_unconfirmed scope accounts for new manual confirmations' do + unconfirmed_count = Term.user_unconfirmed.count + t = terms('doi') + + # Confirm this term is not yet categorized + assert_equal 0, t.confirmations.count + + new_record = { + term: t, + user: users('basic'), + category: categories('transactional') + } + Confirmation.create!(new_record) + + # The count should now be one less. + assert_equal unconfirmed_count - 1, Term.user_unconfirmed.count + end end diff --git a/test/models/user_test.rb b/test/models/user_test.rb index f554370..8172ab5 100644 --- a/test/models/user_test.rb +++ b/test/models/user_test.rb @@ -57,4 +57,20 @@ class UserTest < ActiveSupport::TestCase assert_not user.admin? assert_predicate user, :valid? end + + test 'removing a user will remove their Confirmations' do + user_count = User.count + confirmation_count = Confirmation.count + + user = users('valid') + + relevant_links = user.confirmations.count + + assert_operator(0, :<, relevant_links) + + user.destroy + + assert_equal(user_count - 1, User.count) + assert_equal(confirmation_count - relevant_links, Confirmation.count) + end end From 61cc08a3f9d106b3e156fd9d23612dd8eaeedff8 Mon Sep 17 00:00:00 2001 From: Matthew Bernhardt Date: Wed, 30 Oct 2024 14:15:39 -0400 Subject: [PATCH 3/5] Controller and Views for confirmations ** Why are these changes being introduced: * With the models updated, we now turn to the controller and view layer. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-8 ** How does this address that need: * Defines two new routes for the confirmation workflow - an index and a form. * We add the Pagy gem to the application, for use on the confirmation index. * Adds a Term controller, with two methods for an index and form view. The controller has tests for its use of the ability model. * The ability model adds the ability to manage confirmations for any authenticated user. * View templates are added for these two displays, and a link to the index is added to the navigation. The form is a stub, with no actual controls (and no controller method to process their submission). ** Document any side effects to this change: * I'm not sure whether the controller needs more tests. * Our theme gem appears to anticipate the use of Kaminari, not Pagy - not sure what the lift will be to polish the pagination into something more presentable. * As noted above, the form is faked - there is no actual input fields, nor a controller method to receive them. --- Gemfile | 3 + Gemfile.lock | 2 + app/controllers/term_controller.rb | 14 ++ app/helpers/application_helper.rb | 1 + app/models/ability.rb | 4 + app/views/layouts/_site_nav.html.erb | 3 + app/views/term/confirm_index.html.erb | 14 ++ app/views/term/confirm_term.html.erb | 19 ++ config/initializers/pagy.rb | 220 +++++++++++++++++++++++ config/routes.rb | 4 + test/controllers/term_controller_test.rb | 51 ++++++ 11 files changed, 335 insertions(+) create mode 100644 app/controllers/term_controller.rb create mode 100644 app/views/term/confirm_index.html.erb create mode 100644 app/views/term/confirm_term.html.erb create mode 100644 config/initializers/pagy.rb create mode 100644 test/controllers/term_controller_test.rb diff --git a/Gemfile b/Gemfile index 8904ff6..8eeffd0 100644 --- a/Gemfile +++ b/Gemfile @@ -37,6 +37,9 @@ gem 'omniauth' gem 'omniauth_openid_connect' gem 'omniauth-rails_csrf_protection' +# Pagy used for pagination on long lists of records +gem 'pagy', '~> 9.1' + # Parser added explicitly to allow for the namespace to be available for scout gem 'parser' diff --git a/Gemfile.lock b/Gemfile.lock index 646e7db..0cbd8e0 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -277,6 +277,7 @@ GEM validate_url webfinger (~> 2.0) orm_adapter (0.5.0) + pagy (9.1.1) parallel (1.26.2) parser (3.3.4.2) ast (~> 2.4.1) @@ -490,6 +491,7 @@ DEPENDENCIES omniauth omniauth-rails_csrf_protection omniauth_openid_connect + pagy (~> 9.1) parser pg puma (>= 5.0) diff --git a/app/controllers/term_controller.rb b/app/controllers/term_controller.rb new file mode 100644 index 0000000..fe9bd8e --- /dev/null +++ b/app/controllers/term_controller.rb @@ -0,0 +1,14 @@ +# frozen_string_literal: true + +class TermController < ApplicationController + include Pagy::Backend + + def confirm_index + @pagy, @records = pagy(Term.user_unconfirmed) + end + + def confirm_term + @term = Term.find(params[:id]) + @categories = Category.all + end +end diff --git a/app/helpers/application_helper.rb b/app/helpers/application_helper.rb index 15b06f0..0e0edb4 100644 --- a/app/helpers/application_helper.rb +++ b/app/helpers/application_helper.rb @@ -1,4 +1,5 @@ # frozen_string_literal: true module ApplicationHelper + include Pagy::Frontend end diff --git a/app/models/ability.rb b/app/models/ability.rb index 0451a9a..493bea5 100644 --- a/app/models/ability.rb +++ b/app/models/ability.rb @@ -22,6 +22,10 @@ def initialize(user) # Allow all authenticated users to view reports can :view, :report + + # Create manual confirmation + can :manage, :confirmations + can :manage, Confirmation # End of Rules for all authenticated user with no additional roles required # Rules for admins diff --git a/app/views/layouts/_site_nav.html.erb b/app/views/layouts/_site_nav.html.erb index a0686b8..c6978ee 100644 --- a/app/views/layouts/_site_nav.html.erb +++ b/app/views/layouts/_site_nav.html.erb @@ -10,6 +10,9 @@