Lesson: build a book model

hellbunnie edited this page Apr 8, 2013 · 25 revisions

This lesson is known to work with hydra-head version 6.0.0.
Please update this wiki to reflect any other versions that have been tested.

Goals

  • Define a simple OM Terminology for Book Metadata that we will track as XML Datastreams
  • Start the Rails console and run code interactively in the console
  • Create Datastream objects that use your OM Terminology
  • Define an ActiveFedora Model for Book objects
  • Declare a Datastream called descMetadata on your Book model and make it use your Book Metadata Terminology
  • Delegate methods from Book objects to their descMetadata Datastream
  • Create Book objects that use your Book Model
  • See how an object has been indexed into Solr
  • See how & where objects and metadata are stored in Fedora
  • Use OM to Manage how your Metadata is indexed in Solr
  • Re-index objects into Solr (update Solr based on any changes to an object, its Model, or the OM Terminologies it uses)

Explanation

In Fedora, an object can have many 'datastreams', which hold either content for the object or metadata about the object. We are going to create a 'book' object. This object will have a metadata datastream containing XML that describes the properties of the book. We'll call this datastream 'descMetadata'. You are free to call it whatever you like, but 'descMetadata' is a loose convention that stands for 'descriptive metadata'.

Once you've created an object and saved it in Fedora, you also want to be able to search for it in Solr. ActiveFedora and OM make it easy to get your metadata into Solr and manage if/when/how your metadata is indexed.

Steps

Step 1: Create an OM Terminology for Book Metadata

First we'll create a Ruby class that represents this descriptive metadata. Make a new directory for our datastreams by typing in

$> mkdir app/models/datastream

Now we'll create a file called app/models/datastream/book_metadata.rb

Paste the following code into that file:

class Datastream::BookMetadata < ActiveFedora::OmDatastream

  set_terminology do |t|
    t.root(path: "fields")
    t.title
    t.author
  end

  def self.xml_template
    Nokogiri::XML.parse("<fields/>")
  end
end

This class extends ActiveFedora::OmDatastream. OM is a gem that lets us describe the format of an XML file and access its properties. We use OM by calling the set_terminology method. The xml_template method tells OM how to create a new XML document for this class.
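
To make the idea of a terminology more concrete, here is a minimal plain-Ruby sketch of what a datastream like this does conceptually: it exposes one reader and one writer per term and serializes the values under a root element. TinyMetadata, ROOT, and TERMS are hypothetical names invented for this illustration. This is not how OM is implemented, and it does not use the OM or ActiveFedora APIs.

```ruby
# Simplified stand-in for an OM-backed datastream. NOT the OM API.
class TinyMetadata
  # Hypothetical "terminology": a root element name plus its child terms.
  ROOT  = "fields"
  TERMS = [:title, :author].freeze

  def initialize
    @values = {}
  end

  # Define a reader and a writer for each term. Values are always kept
  # as arrays, because a term may occur more than once in the XML.
  TERMS.each do |term|
    define_method(term) { @values.fetch(term, []) }
    define_method("#{term}=") { |value| @values[term] = Array(value) }
  end

  # Serialize the current values, or the empty template if nothing is set.
  def to_xml
    children = TERMS.flat_map do |term|
      public_send(term).map { |v| "  <#{term}>#{escape(v)}</#{term}>" }
    end
    return "<#{ROOT}/>" if children.empty?
    (["<#{ROOT}>"] + children + ["</#{ROOT}>"]).join("\n")
  end

  private

  # Escape the characters that would otherwise break the XML.
  def escape(text)
    text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end
end

d = TinyMetadata.new
d.title = "Anna Karenina"
puts d.to_xml
```

The real OmDatastream does considerably more (it parses existing XML and supports nested terms, for example), so treat this only as a mental model for the steps that follow.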

Step 2: Start the Rails console

Let's take a look at how this class works. We'll start the rails console by typing

$> rails console

You should see Loading development environment (Rails 3.2.11). Now you're in a "REPL", or interactive ruby console that has all of your Rails application's code and configuration loaded.

Step 3: In the console, create Datastream objects that use your OM Terminology

Let's create a new BookMetadata instance. I've shown the expected output after each command:

d = Datastream::BookMetadata.new
 => #<Datastream::BookMetadata @pid="" @dsid="" @controlGroup="X" changed="false" @mimeType="text/xml" > 
d.title = "ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know."
 => "ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know." 
d.author = "Horn, Zoia"
 => "Horn, Zoia" 
d.to_xml
 => "<fields>\n  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>\n  <author>Horn, Zoia</author>\n</fields>"

Once you're done, exit the console by typing exit

Step 4: Define a Book Model

Now let's create a model that uses this datastream. Create a new file at app/models/book.rb. We'll paste in this code:

class Book < ActiveFedora::Base
  has_metadata 'descMetadata', type: Datastream::BookMetadata

  delegate :title, to: 'descMetadata'
  delegate :author, to: 'descMetadata'

end

We've defined our Book model to use Datastream::BookMetadata for its datastream called 'descMetadata'. We're also telling the model to delegate the properties 'title' and 'author' to descMetadata.
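
If delegation is new to you, the following plain-Ruby sketch shows the general pattern using Forwardable from Ruby's standard library. FakeMetadata and PlainBook are hypothetical stand-ins invented for this illustration. They do not talk to Fedora, and ActiveFedora's delegate is a different mechanism that, among other things, returns values as arrays.

```ruby
require "forwardable"

# Hypothetical stand-in for the descMetadata datastream.
FakeMetadata = Struct.new(:title, :author)

# Hypothetical stand-in for the Book model; it stores nothing in Fedora.
class PlainBook
  extend Forwardable

  attr_reader :desc_metadata

  # Forward readers and writers to the metadata object, much as
  # delegate :title, to: 'descMetadata' does in the lesson's model.
  def_delegators :desc_metadata, :title, :title=, :author, :author=

  def initialize
    @desc_metadata = FakeMetadata.new
  end
end

book = PlainBook.new
book.title = "Anna Karenina"
puts book.desc_metadata.title  # the value lives on the metadata object
```

The point to take away is that the book itself holds no title or author; every call is passed through to the object that owns the metadata.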

Step 5: In the console, create Book objects that use your Book Model

Now we'll open the rails console again and see how to work with our book.

b = Book.create(title: 'Anna Karenina', author: 'Tolstoy, Leo')
 => #<Book pid:"changeme:1", title:["Anna Karenina"], author:["Tolstoy, Leo"]> 

We've created a new Book object in the repository. You can see that it has a pid of 'changeme:1'. Because you set title and author to delegate to the descMetadata datastream, they are stored in that datastream's XML and can be accessed either through the delegated methods on the Book or by going directly to the datastream.

b.descMetadata
 => #<Datastream::BookMetadata @pid="changeme:1" @dsid="descMetadata" @controlGroup="M" changed="false" @mimeType="text/xml" > 
b.title
 => ["Anna Karenina"] 
b.author
 => ["Tolstoy, Leo"] 
b.descMetadata.title
 => ["Anna Karenina"] 
b.descMetadata.author
 => ["Tolstoy, Leo"] 

Step 6: See what your Book objects look like in Fedora and Solr

If we go to http://localhost:8983/fedora/objects/changeme:1 we should see what the object looks like in Fedora. Note especially that the XML datastream has been ingested; its content is available at http://localhost:8983/fedora/objects/changeme:1/datastreams/descMetadata/content

Let's also check that this book has been added to the Solr search index by visiting http://localhost:8983/solr/select?q=changeme:1. However, at this point the title and author have not been stored in Solr. We need to modify our BookMetadata datastream to tell it how to index them.

Step 7: See how your Book metadata are indexed into Solr

The to_solr method is what generates the solr document for your objects and their datastreams. To see the full solr document for the book we created, call

b.to_solr

To see just the part of the solr document that comes from the descMetadata datastream (which has our book title and author), call

b.descMetadata.to_solr
 => {} 

As you can see, the descMetadata datastream is returning an empty Hash, meaning that it isn't indexing the author and title values.

Step 8: Change how your Book metadata are indexed into Solr

To make the BookMetadata Terminology index the author and title fields, you need to reopen app/models/datastream/book_metadata.rb and change the terminology section to look like this:

  set_terminology do |t|
    t.root(path: "fields")
    t.title(index_as: :stored_searchable)
    t.author(index_as: :stored_searchable)
  end

Note: Because we have made changes to our Ruby code that we want to use, we need to restart the Rails console so that it will reload all of the code, including our latest changes.

After restarting the Rails console, we can load the object we previously created:

b = Book.find('changeme:1')
 => #<Book pid:"changeme:1", title:["Anna Karenina"], author:["Tolstoy, Leo"]> 

Check and see that to_solr includes the title and author fields.

b.descMetadata.to_solr
 => {"fields_title_tesim"=>["Anna Karenina"], "fields_0_title_tesim"=>["Anna Karenina"], "fields_author_tesim"=>["Tolstoy, Leo"], "fields_0_author_tesim"=>["Tolstoy, Leo"], "title_tesim"=>["Anna Karenina"], "author_tesim"=>["Tolstoy, Leo"]} 

These field names probably look a little strange, but the main thing to observe is that now when you call .to_solr on a BookMetadata datastream it returns a solr document with fields named title_tesim and author_tesim that contain your title and author values. Those are the field names that we will add to Blacklight's queries in Lesson: Make Blacklight Return Search Results.
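
The difference between the empty Hash in Step 7 and the populated one above comes down to whether a term declares index_as. The sketch below models that rule in plain Ruby. The UNINDEXED and INDEXED tables, the SUFFIXES mapping, and sketch_to_solr are all hypothetical names invented for this illustration, and the sketch emits only the short field names (title_tesim, author_tesim), not the fields_ prefixed variants that the real to_solr also produces.

```ruby
# Hypothetical terminology tables: term name => indexing options.
# An empty :index_as list models a term declared without index_as.
UNINDEXED = { title: { index_as: [] }, author: { index_as: [] } }.freeze
INDEXED   = { title:  { index_as: [:stored_searchable] },
              author: { index_as: [:stored_searchable] } }.freeze

# Assumed mapping from indexer name to field suffix, taken from the
# _tesim suffix seen in this lesson's to_solr output.
SUFFIXES = { stored_searchable: "_tesim" }.freeze

# Build a Solr-style document: one field per (term, indexer) pair.
# Terms with no indexers contribute nothing to the document.
def sketch_to_solr(terminology, values)
  terminology.each_with_object({}) do |(term, options), doc|
    options[:index_as].each do |indexer|
      doc["#{term}#{SUFFIXES.fetch(indexer)}"] = Array(values[term])
    end
  end
end

values = { title: "Anna Karenina", author: "Tolstoy, Leo" }
p sketch_to_solr(UNINDEXED, values)  # no term is indexed, so the Hash is empty
p sketch_to_solr(INDEXED, values)    # has "title_tesim" and "author_tesim" keys
```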

Aside: If you are really curious about those strange suffixes on the field names, they are provided by solrizer, so you can read about them in the solrizer documentation. In short, the _tesim suffix tells Solr to treat the values as English-language text that should be stored, indexed, and allowed to be multivalued. This _tesim suffix is a useful catch-all that gets your searches working predictably with minimal fuss. As you encounter cases where you need to index your content in more nuanced ways, you can change these suffixes to achieve different results in Solr.
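
As a reading aid, here is a small plain-Ruby sketch that splits a field name such as title_tesim into its base name and the meanings described in the aside. SUFFIX_PARTS and describe_field are hypothetical names invented for this illustration. The table covers only the _tesim case and is not solrizer's full suffix scheme, so consult the solrizer documentation for the other suffixes.

```ruby
# Meanings of the parts of the _tesim suffix, as described in the aside.
# This covers only _tesim; it is NOT solrizer's complete scheme.
SUFFIX_PARTS = [
  ["te", "text, English"],
  ["s",  "stored"],
  ["i",  "indexed"],
  ["m",  "multivalued"]
].freeze

# Split a field name into its base name and a list of readable flags.
# Returns nil for any suffix other than the one this sketch knows about.
def describe_field(field_name)
  base, _separator, suffix = field_name.rpartition("_")
  return nil unless suffix == SUFFIX_PARTS.map(&:first).join
  { name: base, flags: SUFFIX_PARTS.map(&:last) }
end

p describe_field("title_tesim")
```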

Step 9: Re-index an object in Solr

Now we'll call the update_index method, which republishes the Solr document using the changes we've made.

b.update_index
 => {"responseHeader"=>{"status"=>0, "QTime"=>25}} 

If you refresh the Solr result page, you should see that these fields have been added to the Solr document:

<arr name="fields_title_tesim">
  <str>Anna Karenina</str>
</arr>
<arr name="fields_0_title_tesim">
  <str>Anna Karenina</str>
</arr>
<arr name="fields_author_tesim">
  <str>Tolstoy, Leo</str>
</arr>
<arr name="fields_0_author_tesim">
  <str>Tolstoy, Leo</str>
</arr>
<arr name="title_tesim">
  <str>Anna Karenina</str>
</arr>
<arr name="author_tesim">
  <str>Tolstoy, Leo</str>
</arr>

Step 10: Commit your changes

Now that we've got our model working, it's a great time to commit to git:

$> git add .
$> git commit -m "Created a book model and a datastream"

Next Step

Go on to Lesson: Turn Off Access Controls or return to the Dive into Hydra page.
