iNaturalist API/Zooniverse Integration (#3983)
* Add webmock to Gemfile

* Add inat obs fixture

* Add client and spec

* Add Observation model and spec

* Add iNat API interface and spec

* Add webmock to spec_helper.rb

* Add SubjectImporter and spec

* Allow public access to update methods

* Expose total_results with an attr_reader

* Expose response full request_url with attr_reader

* Mine's funnier

* Use SubjectSetImport to track state

* Add InatImportWorker and spec

* a bit of cleanup

* liked mine better

* iNat import completion mailer and spec

* Worker for completion mailer

* remove line

* # frozen_string_literal: true

* Remove vestigial class constant

* Use instance vars to lookup ids

* Add specs for missing SubjectImporter params

* Use ss_importer method

* Sate the Hound

* cleanup

* Persist SSI in db so it's immediately updated

* Add some failure mode specs

* Split expects

* Remove unnecessary attr_reader

* Move no_change matcher def to spec_helper

* New route, controller, and spec

* Feed the Hound

* Hound

* more Hound

* Upsert correctly if subject already exists in set

* Spaces for the Hound

* hound

* Fix specs for 5.1 (and a typo)

* extra      space

* Don't duplicate media on subject upserts

* typo

* Clearer and more useful check

* Add spec: count page fetches

* Hound

* Worker needs mini_mime require
zwolf authored Nov 4, 2022
1 parent 17dada9 commit 2e1c8ec
Showing 23 changed files with 895 additions and 2 deletions.
1 change: 1 addition & 0 deletions Gemfile
@@ -90,4 +90,5 @@ group :test do
gem 'rspec-its'
gem 'rspec-rails'
gem 'spring-commands-rspec'
+gem 'webmock'
end
8 changes: 8 additions & 0 deletions Gemfile.lock
@@ -106,6 +106,8 @@ GEM
connection_pool (>= 2.0)
redis (>= 3.1)
connection_pool (2.2.5)
+crack (0.4.5)
+rexml
crass (1.0.6)
dalli (3.2.0)
database_cleaner (1.99.0)
@@ -178,6 +180,7 @@ GEM
guard (~> 2.1)
guard-compat (~> 1.1)
rspec (>= 2.99.0, < 4.0)
+hashdiff (1.0.1)
hashie (5.0.0)
honeybadger (5.0.1)
http-accept (1.7.0)
@@ -467,6 +470,10 @@ GEM
yard (~> 0.9.20)
warden (1.2.9)
rack (>= 2.0.9)
+webmock (3.18.1)
+addressable (>= 2.8.0)
+crack (>= 0.3.2)
+hashdiff (>= 0.4.0, < 2.0.0)
webrick (1.7.0)
websocket-driver (0.6.5)
websocket-extensions (>= 0.1.0)
@@ -551,6 +558,7 @@ DEPENDENCIES
ten_years_rails
uglifier (~> 4.2)
versionist (~> 2.0)
+webmock
zoo_stream (~> 1.0.1)

BUNDLED WITH
14 changes: 14 additions & 0 deletions app/controllers/api/v1/inaturalist_controller.rb
@@ -0,0 +1,14 @@
# frozen_string_literal: true

class Api::V1::InaturalistController < Api::ApiController
def import
subject_set = SubjectSet.find(params[:subject_set_id])

unless subject_set.project.owners_and_collaborators.include?(api_user.user)
raise Api::Unauthorized, 'Must be owner or collaborator to import'
end

InatImportWorker.perform_async(api_user.id, params[:taxon_id], params[:subject_set_id], params[:updated_since])
json_api_render(:ok, {})
end
end
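
A client could trigger this import with a single POST. The sketch below only builds the request and does not send it. The host, the `/api` path prefix, the bearer-token header, and the helper name `build_import_request` are assumptions for illustration; only the `/inaturalist/import` path and the three parameter names come from this commit.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical host and path prefix; only '/inaturalist/import' is from the route above.
IMPORT_URI = URI('https://panoptes.example.org/api/inaturalist/import')

# Hypothetical helper: builds (but does not send) the import request.
def build_import_request(token:, subject_set_id:, taxon_id:, updated_since: nil)
  request = Net::HTTP::Post.new(IMPORT_URI)
  request['Authorization'] = "Bearer #{token}" # assumed auth scheme
  request['Content-Type'] = 'application/json'

  # updated_since is optional, mirroring the worker's default of nil
  payload = { subject_set_id: subject_set_id, taxon_id: taxon_id }
  payload[:updated_since] = updated_since unless updated_since.nil?
  request.body = JSON.generate(payload)
  request
end

request = build_import_request(token: 'example-token', subject_set_id: 1234, taxon_id: 46017)
puts request.body
```

Because the controller enqueues `InatImportWorker` and returns immediately, a caller would presumably watch the associated `SubjectSetImport` record for progress rather than wait on this response.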
22 changes: 22 additions & 0 deletions app/mailers/inat_import_completed_mailer.rb
@@ -0,0 +1,22 @@
# frozen_string_literal: true

class InatImportCompletedMailer < ApplicationMailer
layout false

def inat_import_complete(ss_import)
@user = User.find(ss_import.user_id)
@email_to = @user.email
@imported_count = ss_import.imported_count
project_id = ss_import.subject_set.project_id

lab_url_prefix = "#{Panoptes.frontend_url}/lab/#{project_id}"
@subject_set_lab_url = "#{lab_url_prefix}/subject-sets/#{ss_import.subject_set_id}"
@subject_set_name = ss_import.subject_set.display_name

@no_errors = ss_import.failed_count.zero?
import_status = @no_errors ? 'was successful!' : 'completed with errors'
subject = "Your iNaturalist subject import #{import_status}"

mail(to: @email_to, subject: subject)
end
end
2 changes: 0 additions & 2 deletions app/models/subject_set_import.rb
@@ -37,8 +37,6 @@ def import!(manifest_row_count)
end
end

-private
-
def save_imported_row_count(imported_row_count)
self.imported_count = imported_row_count
save! # ensure we touch updated_at for busting any serializer cache
@@ -0,0 +1,23 @@
Hello,

Your iNaturalist subject import has finished processing.

<% if @no_errors %>
The iNaturalist observations have been imported successfully.
<% else %>
There were some errors when importing your iNaturalist observations.
<% end %>

<%= @imported_count %> subjects were imported into subject set '<%= @subject_set_name %>'.

To view them, visit: <%= @subject_set_lab_url %>

Cheers,
The Zooniverse Team

This is an automated email, please do not respond.

To manage your Zooniverse email subscription preferences visit https://zooniverse.org/settings

To unsubscribe from all Zooniverse messages please visit https://zooniverse.org/unsubscribe
Please be aware that the above link will unsubscribe you from ALL Zooniverse emails.
12 changes: 12 additions & 0 deletions app/workers/inat_import_completed_mailer_worker.rb
@@ -0,0 +1,12 @@
# frozen_string_literal: true

class InatImportCompletedMailerWorker
include Sidekiq::Worker

sidekiq_options queue: :data_high

def perform(ss_import_id)
ss_import = SubjectSetImport.find(ss_import_id)
InatImportCompletedMailer.inat_import_complete(ss_import).deliver
end
end
46 changes: 46 additions & 0 deletions app/workers/inat_import_worker.rb
@@ -0,0 +1,46 @@
# frozen_string_literal: true

class InatImportWorker
include Sidekiq::Worker

# skip retries for this job to avoid re-running imports with errors
sidekiq_options retry: 0, queue: :data_medium

def perform(user_id, taxon_id, subject_set_id, updated_since=nil)
inat = Inaturalist::ApiInterface.new(taxon_id: taxon_id, updated_since: updated_since)
importer = Inaturalist::SubjectImporter.new(user_id, subject_set_id)

# Use a SubjectSetImport instance to track progress & store data
ss_import = importer.subject_set_import

imported_row_count = 0
inat.observations.each do |obs|
begin
importer.import(obs)
rescue Inaturalist::SubjectImporter::FailedImport
ss_import.update_columns(
failed_count: ss_import.failed_count + 1,
failed_uuids: ss_import.failed_uuids | [obs.external_id]
)
end

imported_row_count += 1

# update the imported_count as we progress through the import so we can use
# this as a progress metric on API resource polling (see SubjectSetWorker)
ss_import.save_imported_row_count(imported_row_count) if (imported_row_count % update_progress_every_rows(inat.total_results)).zero?
end

ss_import.save_imported_row_count(imported_row_count)

# Count that subject set, like right now
SubjectSetSubjectCounterWorker.new.perform(subject_set_id)

# notify the user about the import success / failure
InatImportCompletedMailerWorker.perform_async(ss_import.id)
end

def update_progress_every_rows(total_results)
SubjectSetImport::ProgressUpdateCadence.calculate(total_results)
end
end
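
The worker saves `imported_count` only every N rows, plus once at the end, so that polling clients see progress without a database write per observation. The implementation of `SubjectSetImport::ProgressUpdateCadence.calculate` is not part of this diff, so the `update_every` function below is a hypothetical stand-in (roughly 20 updates per import), used only to illustrate the modulo-based save pattern.

```ruby
# Hypothetical cadence: aim for about `target_updates` progress writes,
# and never return less than 1 so the modulo below is always safe.
def update_every(total_results, target_updates: 20)
  [(total_results.to_f / target_updates).ceil, 1].max
end

# Mimics the worker loop and records the counts that would be persisted.
def simulate_import(total_results)
  cadence = update_every(total_results)
  saves = []
  imported = 0

  total_results.times do
    imported += 1
    saves << imported if (imported % cadence).zero?
  end

  saves << imported # final save after the loop, as in the worker
  saves
end

puts simulate_import(95).inspect
```

Note that the counter in the worker increments for failed observations too, so under this pattern `imported_count` tracks rows processed rather than rows successfully imported.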
2 changes: 2 additions & 0 deletions config/routes.rb
@@ -126,6 +126,8 @@

json_api_resources :subject_set_imports, links: [:subject_sets, :users], only: [:index, :show, :create]

+post '/inaturalist/import', to: 'inaturalist#import', format: false
+
json_api_resources :collections, links: [:subjects, :default_subject]

json_api_resources :tags, only: [:index, :show]
64 changes: 64 additions & 0 deletions lib/inaturalist/api_interface.rb
@@ -0,0 +1,64 @@
# frozen_string_literal: true

module Inaturalist
class ApiInterface
require 'faraday'
require 'faraday_middleware'
require 'json'

# max_observations (see #initialize) caps the number of observations fetched; -1 means no limit
attr_reader :taxon_id, :total_results, :observation_cache, :params

def initialize(taxon_id:, updated_since: nil, max_observations: -1)
@taxon_id = taxon_id
@max_observations = max_observations
@observation_cache = []
@id_above = 0
@params = { taxon_id: @taxon_id }
@params[:updated_since] = updated_since unless updated_since.nil?
@done = false
@total_results = nil
end

def observations
Enumerator.new do |yielder|
loop do
results = fetch_next_page
raise StopIteration if @done

results.each do |obs|
yielder.yield Observation.new(obs)
end
end
end
end

def fetch_next_page
page_params = @params.merge(id_above: @id_above)
response = client.get(page_params)
@total_results ||= response['total_results']
results = response['results']
# Stop if a) there are no more results
# b) the total number of desired subjects is hit
# c) the ID of the last seen observation is the same as the last result's id
@done = true if results.empty? || max_cache_hit? || @id_above == results.last['id']
return if @done

@observation_cache += results
@id_above = results.last['id']
@params[:id_above] = @id_above
results
end

def max_cache_hit?
# Short circuit to turn off limit
return false if @max_observations == -1

@observation_cache.size >= @max_observations
end

def client
@client ||= Client.new
end
end
end
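
The `observations` enumerator pages by ID rather than by page number: each request asks for records with an ID above the last one seen, and iteration stops when a page comes back empty. The following is a minimal, self-contained sketch of that pattern. `FakeClient` and `each_observation` are hypothetical stand-ins that serve canned data in place of `Inaturalist::Client`, and the sketch omits the `max_observations` and repeated-ID stop conditions.

```ruby
# Hypothetical stand-in for Inaturalist::Client: returns up to per_page
# records whose id is greater than params[:id_above], in ascending order.
class FakeClient
  attr_reader :requests

  def initialize(ids, per_page: 3)
    @ids = ids.sort
    @per_page = per_page
    @requests = 0
  end

  def get(params)
    @requests += 1
    page = @ids.select { |id| id > params[:id_above] }.first(@per_page)
    { 'total_results' => @ids.size, 'results' => page.map { |id| { 'id' => id } } }
  end
end

# Lazily yields every observation, fetching the next page only when needed.
def each_observation(client)
  Enumerator.new do |yielder|
    id_above = 0
    loop do
      results = client.get(id_above: id_above)['results']
      break if results.empty?

      results.each { |obs| yielder << obs }
      id_above = results.last['id'] # cursor for the next request
    end
  end
end

client = FakeClient.new([2, 3, 5, 8, 13, 21, 34])
ids = each_observation(client).map { |obs| obs['id'] }
puts ids.inspect
puts client.requests
```

Paging by ID keeps the result window stable if records are added during a long import, which offset-based paging does not guarantee.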
42 changes: 42 additions & 0 deletions lib/inaturalist/client.rb
@@ -0,0 +1,42 @@
# frozen_string_literal: true

module Inaturalist
class Client
attr_reader :url, :request_url, :headers, :default_params

def initialize
@url = 'https://api.inaturalist.org/v1/observations'
@request_url = nil
@headers = { 'User-Agent' => 'zooniverse-import' }
@default_params = {
verifiable: true,
order: 'asc',
order_by: 'id',
per_page: 200
}
end

def get(params)
request_params = @default_params.merge(params)
conn = Faraday.new(
url: @url,
headers: @headers,
params: request_params
) do |f|
f.request :url_encoded
f.request :retry
f.response :raise_error
f.response :json
f.adapter Faraday.default_adapter
end

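# Issue the request once and reuse the response for both the URL and the body.
# Errors raised by the :raise_error middleware propagate to the caller.
response = conn.get
@request_url = response.env.url.to_s
response.body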
end
end
end
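
To see what a request from this client looks like, the sketch below merges the default parameters with per-call parameters and encodes them into a URL using only the standard library. The constants and the `request_url` helper are illustrative names, not part of the commit, and Faraday may order or encode the query string differently.

```ruby
require 'uri'

BASE_URL = 'https://api.inaturalist.org/v1/observations'
DEFAULT_PARAMS = { verifiable: true, order: 'asc', order_by: 'id', per_page: 200 }.freeze

# Hypothetical helper: caller-supplied params override the defaults on key collision.
def request_url(params)
  uri = URI(BASE_URL)
  uri.query = URI.encode_www_form(DEFAULT_PARAMS.merge(params))
  uri.to_s
end

url = request_url(taxon_id: 46017, id_above: 0)
puts url
```

Ordering by ascending `id` is what makes the `id_above` cursor in `ApiInterface` work: the last record of each page always has the highest ID seen so far.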
56 changes: 56 additions & 0 deletions lib/inaturalist/observation.rb
@@ -0,0 +1,56 @@
# frozen_string_literal: true

module Inaturalist
class Observation
require 'mini_mime'

def initialize(obs)
@obs = obs
end

def external_id
@obs['id']
end

def metadata
@metadata ||= extract_metadata(@obs)
end

def extract_metadata(obs)
metadata = {}
metadata['id'] = obs['id']
metadata['change'] = 'No changes were made to this image.'
metadata['observed_on'] = obs['observed_on']
metadata['time_observed_at'] = obs['time_observed_at']
metadata['quality_grade'] = obs['quality_grade']
metadata['num_identification_agreements'] = obs['num_identification_agreements']
metadata['num_identification_disagreements'] = obs['num_identification_disagreements']
metadata['location'] = obs['location']
metadata['geoprivacy'] = obs['geoprivacy']
metadata['scientific_name'] = obs['taxon']['name']
metadata
end

def locations
@locations ||= extract_locations(@obs)
end

def extract_locations(obs)
locations = []
obs['photos'].each do |p|
url = p['url'].sub('square', 'original')
mimetype = mime_type_from_file_extension(url)
locations << { mimetype => url }
end
locations
end

def all_rights_reserved?
@obs['license_code'].nil?
end

def mime_type_from_file_extension(url)
MiniMime.lookup_by_filename(url).content_type
end
end
end
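
`extract_locations` turns each photo into a one-entry hash of MIME type to full-size URL. The sketch below reproduces that transformation on sample data without the `mini_mime` gem: the `MIME_BY_EXTENSION` table is a hypothetical stand-in for `MiniMime.lookup_by_filename`, and the photo URLs are invented.

```ruby
# Hypothetical stand-in for MiniMime: a tiny extension-to-MIME-type table.
MIME_BY_EXTENSION = {
  '.jpg' => 'image/jpeg',
  '.jpeg' => 'image/jpeg',
  '.png' => 'image/png',
  '.gif' => 'image/gif'
}.freeze

# Mirrors extract_locations: swap the thumbnail size for the original
# and key each URL by its MIME type.
def locations_for(obs)
  obs['photos'].map do |photo|
    url = photo['url'].sub('square', 'original')
    mimetype = MIME_BY_EXTENSION.fetch(File.extname(url).downcase)
    { mimetype => url }
  end
end

obs = {
  'photos' => [
    { 'url' => 'https://static.example.org/photos/1/square.jpeg' },
    { 'url' => 'https://static.example.org/photos/2/square.png' }
  ]
}
puts locations_for(obs).inspect
```

`String#sub` replaces only the first occurrence of `square`, so a URL that happens to contain that word earlier in its path would be rewritten in the wrong place; anchoring the substitution to the file name would be a stricter variant.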