Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Citation detector with Metrics #126

Merged
merged 1 commit into from
Oct 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions app/models/metrics/algorithms.rb
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,9 @@ def generate(month = nil)
else
count_matches(SearchEvent.includes(:term))
end
Metrics::Algorithms.create(month:, doi: matches[:doi], issn: matches[:issn], isbn: matches[:isbn],
lcsh: matches[:lcsh], pmid: matches[:pmid], journal_exact: matches[:journal_exact],
Metrics::Algorithms.create(month:, citation: matches[:citation], doi: matches[:doi], issn: matches[:issn],
isbn: matches[:isbn], lcsh: matches[:lcsh], pmid: matches[:pmid],
journal_exact: matches[:journal_exact],
suggested_resource_exact: matches[:suggested_resource_exact],
unmatched: matches[:unmatched])
end
Expand Down Expand Up @@ -78,11 +79,25 @@ def count_matches(events)
# @return does not return anything (the same matches Hash is passed in each loop but not explicitly sent back)
def event_matches(event, matches)
ids = match_standard_identifiers(event, matches)
citation = match_citations(event, matches)
journal_exact = process_journals(event, matches)
suggested_resource_exact = process_suggested_resources(event, matches)
lcshs = match_lcsh(event, matches)

matches[:unmatched] += 1 if ids.detections.blank? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero?
matches[:unmatched] += 1 if ids.detections.blank? && !citation.detection? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to normalize Citation a bit more after seeing how it behaves differently. Not in this PR, but just eventually. This could take the form of adding the .detection? to all of our Detectors and/or normalizing what .detection returns. No changes are should be made as part of this work, but this difference made me pause and have to understand why it was done this way. It makes sense based on how Citation currently works but feels like it may be confusing in the future as to why it is different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that having a more standardized set of methods across our detectors would be helpful. Some differences might be unavoidable (the citation detector doesn't extract specific patterns, while the standard identifier one does), but looking now I think the others (which all treat the phrase holistically) could benefit from a .detection? method.

end

# Checks for Citation matches
#
# @param event [SearchEvent] an individual search event to check for matches
# @param matches [Hash] a Hash that keeps track of how many of each algorithm we match, to which this method will
# add.
# @return [Boolean] an indication of whether a citation was detector or not.
def match_citations(event, matches)
result = Detector::Citation.new(event.term.phrase)

matches[:citation] += 1 if result.detection?
result
end

# Checks for LCSH matches
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
class AddCitationToMetricsAlgorithms < ActiveRecord::Migration[7.1]
def change
add_column :metrics_algorithms, :citation, :integer
end
end
3 changes: 2 additions & 1 deletion db/schema.rb

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions test/fixtures/search_events.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ old_month_pmid:
term: pmid_38908367
source: test
created_at: <%= 1.year.ago %>
current_month_citation:
term: citation
source: test
current_month_issn:
term: issn_1075_8623
source: test
Expand Down
2 changes: 1 addition & 1 deletion test/fixtures/terms.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ doi:
phrase: '10.1016/j.physio.2010.12.004'

isbn_9781319145446:
phrase: 'Sadava, D. E., D. M. Hillis, et al. Life: The Science of Biology. 11th ed. W. H. Freeman, 2016. ISBN: 9781319145446'
phrase: 'Sadava, D. E., D. M. Hillis, et al. Life The Science of Biology. 11th ed. W. H. Freeman, 2016. ISBN: 9781319145446'

journal_nature_medicine:
phrase: 'nature medicine'
Expand Down
24 changes: 24 additions & 0 deletions test/models/metrics/algorithms_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@

class Algorithms < ActiveSupport::TestCase
# Monthlies
test 'citation counts are included in monthly aggregation' do
aggregate = Metrics::Algorithms.new.generate(DateTime.now)

assert_equal 1, aggregate.citation
end

test 'dois counts are included in monthly aggregation' do
aggregate = Metrics::Algorithms.new.generate(DateTime.now)

Expand Down Expand Up @@ -85,6 +91,11 @@ class Algorithms < ActiveSupport::TestCase
# drop all searchevents to make math easier and minimize fragility over time as more fixtures are created
SearchEvent.delete_all

citation_expected_count = rand(1...100)
citation_expected_count.times do
SearchEvent.create(term: terms(:citation), source: 'test')
end

doi_expected_count = rand(1...100)
doi_expected_count.times do
SearchEvent.create(term: terms(:doi), source: 'test')
Expand Down Expand Up @@ -117,6 +128,7 @@ class Algorithms < ActiveSupport::TestCase

aggregate = Metrics::Algorithms.new.generate(DateTime.now)

assert_equal citation_expected_count, aggregate.citation
assert_equal doi_expected_count, aggregate.doi
assert_equal issn_expected_count, aggregate.issn
assert_equal isbn_expected_count, aggregate.isbn
Expand All @@ -126,6 +138,12 @@ class Algorithms < ActiveSupport::TestCase
end

# Total
test 'citation counts are included in total aggregation' do
aggregate = Metrics::Algorithms.new.generate

assert_equal 1, aggregate.citation
end

test 'dois counts are included in total aggregation' do
aggregate = Metrics::Algorithms.new.generate

Expand Down Expand Up @@ -178,6 +196,11 @@ class Algorithms < ActiveSupport::TestCase
# drop all searchevents to make math easier and minimize fragility over time as more fixtures are created
SearchEvent.delete_all

citation_expected_count = rand(1...100)
citation_expected_count.times do
SearchEvent.create(term: terms(:citation), source: 'test')
end

doi_expected_count = rand(1...100)
doi_expected_count.times do
SearchEvent.create(term: terms(:doi), source: 'test')
Expand Down Expand Up @@ -215,6 +238,7 @@ class Algorithms < ActiveSupport::TestCase

aggregate = Metrics::Algorithms.new.generate

assert_equal citation_expected_count, aggregate.citation
assert_equal doi_expected_count, aggregate.doi
assert_equal issn_expected_count, aggregate.issn
assert_equal isbn_expected_count, aggregate.isbn
Expand Down