
Islandora and SOLR overview with troubleshooting tips September 3, 2013

Posted by ficial in islandora, techy.

We’ve been working on getting islandora (http://islandora.ca/) up and running for quite a while now. One of the challenges I’ve run into is that I didn’t have a general understanding of all the pieces involved. It turns out that one doesn’t just install islandora and call it done: the islandora piece is just one part (and a relatively small one at that) of a constellation of systems that must all work correctly together to make an ‘islandora site’.

The Players

These are all the systems / components involved in making an islandora-based repository work. If anything goes wrong with any of these systems, or with the communication between them, the site stops working.

  • drupal – technically works mostly out of the box, but is pretty much useless without extensive customization
    • the islandora module set (within drupal)
      • ? solution packs that support particular kinds of data objects
      • ? various other modules/libraries on which islandora depends: tuque, openseadragon, etc.
    • the theme or themes used
      • ? various modules/components on which the theme depends
    • ! multi-site organization
    • * technologies used: drupal, islandora, PHP, xml, xslt, SPARQL
  • MySQL – relational DB system that supports drupal
    • * technologies used: MySQL, SQL
  • fedora commons – object storage system
    • fedora gsearch – index-based search service that runs alongside fedora (the fedoragsearch webapp, deployed in fedora’s tomcat)
    • * technologies used: apache tomcat (and all the config fun that comes with it), java, xml, xslt
  • SOLR/Lucene – search server (built on the Lucene search library) that holds the index gsearch builds from fedora objects
    • * technologies used: tomcat, java, xml, xslt, Lucene/SOLR query syntax

We also have in our mix

  • LDAP
    • LDAP auth module for drupal
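To make the "anything breaks, everything breaks" point concrete, here’s a small sketch that writes the components above down as a dependency graph and asks what else is affected when one of them fails. The wiring in DEPENDS_ON is my own hypothetical reading of the lists above and the failure points below (e.g. treating LDAP as something drupal needs), not an official islandora architecture map.

```python
from collections import deque

# Hypothetical wiring: each component maps to the components it needs in order to work.
DEPENDS_ON = {
    "drupal": ["mysql", "ldap"],
    "islandora": ["drupal", "fedora", "solr"],
    "fedora": ["mysql"],
    "gsearch": ["fedora", "solr"],
    "solr": [],
    "mysql": [],
    "ldap": [],
}


def affected_by(failed: str) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the graph: component -> components that need it.
    needed_by: dict[str, list[str]] = {name: [] for name in DEPENDS_ON}
    for component, deps in DEPENDS_ON.items():
        for dep in deps:
            needed_by[dep].append(component)

    # Breadth-first walk outward from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in needed_by.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("mysql")))  # ['drupal', 'fedora', 'gsearch', 'islandora']
```

With this wiring a MySQL failure reaches drupal, fedora, gsearch, and islandora, which is why a database problem can easily look like a fedora or search problem from the web side.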

Here are some configuration-based communication failure points we’ve encountered and the diagnosis steps we took:

  • drupal – MySQL
    • check drupal’s settings.php to make sure the mysql server is indicated correctly and that the username and password are correct
      • this gets a bit more complicated for multi-site installs
    • check the firewall settings on your mysql machine and drupal machine to make sure they can talk with each other (e.g. each can be pinged from the other)
    • make sure mysql has been started with networking enabled (e.g. skip-networking is not set, and bind-address is not restricted to 127.0.0.1)
    • make sure you can use the mysql client to connect from the drupal machine to the mysql machine (mysql -h mysqlmachine.institution.edu -D databasename -u accountname -p)
    • make sure the mysql user has been granted appropriate privileges on all the databases being used
      • this gets a bit more complicated for multi-site installs
  • drupal – LDAP
    • make sure the network-level communication is working (e.g. the machine running your LDAP service can be pinged from your drupal machine)
    • make sure your settings in the drupal LDAP module are correct
  • drupal – islandora
    • make sure all the relevant islandora modules and their dependencies are installed and enabled
      • make sure the versions match
  • islandora – fedora
    • check the drupal settings (Islandora > Configuration)
      • check any namespace restrictions (Islandora > Configuration >> Namespaces)
    • make sure the network-level communication is working (e.g. each machine can be pinged from the other)
    • check the filter.xml file on fedora to make sure that its connection values and query are correct
  • fedora – mysql
    • make sure the network-level communication is working (e.g. each machine can be pinged from the other)
    • make sure you can use the mysql client to connect from the fedora machine to the mysql machine (mysql -h mysqlmachine.institution.edu -D databasename -u accountname -p)
    • on the fedora machine check in $FEDORA_HOME/server/config/fedora.fcfg to make sure the jdbcURL parameter refers to the MySQL server (by IP address)
  • gsearch – SOLR
    • check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt
      • and associated sub-XSLTs if your system is organized that way
  • islandora – SOLR
    • check the drupal settings (Islandora > Solr client)
      • should successfully connect to the SOLR server
      • should use the correct request handler (‘standard’ by default)
    • check the query defaults, including any namespace restrictions (Islandora > Solr client >> Query defaults)
    • check the SOLR config files – on the fedora machine:
      • $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml to make sure the indexing is set up correctly
      • $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml to make sure the request handler is defined correctly
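A ping only proves that ICMP gets through; a firewall can still block the port a service actually listens on. Here’s a sketch of a more direct check, assuming Python 3.9+ on the machine you’re testing from: try a TCP connection to the service port, and read the MySQL host/port out of the jdbcURL parameter in fedora.fcfg so you test exactly what fedora is configured to use. The ports mentioned (3306 for MySQL, 389 for LDAP) are the usual defaults and 8080 matches the URLs later in this post; yours may differ.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def jdbc_host_and_port(fcfg_xml: str, default_port: int = 3306) -> tuple[str, int]:
    """Pull the database host/port out of the jdbcURL param in fedora.fcfg text."""
    root = ET.fromstring(fcfg_xml)
    for el in root.iter():
        # Compare only the local tag name so any XML namespace on the file is ignored.
        if el.tag.rsplit("}", 1)[-1] == "param" and el.get("name") == "jdbcURL":
            # "jdbc:mysql://10.0.0.5:3306/fedora3?..." -> parse the part after "jdbc:"
            parts = urlsplit(el.get("value", "").removeprefix("jdbc:"))
            return parts.hostname or "", parts.port or default_port
    raise ValueError("no jdbcURL param found")
```

Run it from each machine in the pair you’re debugging, e.g. port_is_open("mysqlmachine.institution.edu", 3306) from both the drupal and fedora machines, or port_is_open("fedoramachine.institution.edu", 8080) from the drupal machine.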

Data Flow

How info gets into the systems and back out

DATA IN-

  1. something exists outside the repository
  2. a collection admin navigates to the ingest form and enters various metadata about the object and uploads a digitized version of the object (e.g. an image file)
    • the ingest form depends on the solution packs installed and the form associations made
  3. the form processor takes the metadata and constructs FOXML
  4. the form processor sends the FOXML and digital resource(s) to fedora
  5. fedora stores the object, which consists of metadata and one or more other resources / datastreams
    • at this point the object is accessible via the fedora admin interface and REST API
  6. gsearch indexes the newly stored object
    • at this point the object is accessible via the gsearch interface
    • the indexed fields are controlled by $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt
  7. SOLR receives the index document that gsearch built and adds it to the SOLR index
    • at this point the object is accessible via solr
      • able to be returned from queries
      • the object’s presence impacts the data shown in the SOLR schema browser
    • the indexing is controlled by $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml
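For a feel of what step 3 produces, here’s a simplified sketch that builds a minimal FOXML 1.1 document with Python’s standard xml.etree module. Real islandora ingests carry more (RELS-EXT, content-model relationships, MODS, managed binary datastreams); the PID, label, and Dublin Core values are made-up examples, and this is not the code islandora’s form processor actually uses.

```python
import xml.etree.ElementTree as ET

FOXML = "info:fedora/fedora-system:def/foxml#"
MODEL = "info:fedora/fedora-system:def/model#"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the output reads foxml:..., oai_dc:..., dc:... instead of ns0:...
ET.register_namespace("foxml", FOXML)
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_foxml(pid: str, label: str, title: str) -> str:
    """Return a FOXML string with one inline (CONTROL_GROUP="X") Dublin Core datastream."""
    obj = ET.Element(f"{{{FOXML}}}digitalObject", {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(obj, f"{{{FOXML}}}objectProperties")
    ET.SubElement(props, f"{{{FOXML}}}property", {"NAME": MODEL + "state", "VALUE": "A"})
    ET.SubElement(props, f"{{{FOXML}}}property", {"NAME": MODEL + "label", "VALUE": label})

    # CONTROL_GROUP="X": the XML lives inline inside the FOXML itself
    # (a managed "M" datastream would instead be stored separately by fedora).
    ds = ET.SubElement(obj, f"{{{FOXML}}}datastream",
                       {"ID": "DC", "STATE": "A", "CONTROL_GROUP": "X"})
    ver = ET.SubElement(ds, f"{{{FOXML}}}datastreamVersion",
                        {"ID": "DC.0", "MIMETYPE": "text/xml", "LABEL": "Dublin Core"})
    content = ET.SubElement(ver, f"{{{FOXML}}}xmlContent")
    dc = ET.SubElement(content, f"{{{OAI_DC}}}dc")
    ET.SubElement(dc, f"{{{DC}}}title").text = title
    ET.SubElement(dc, f"{{{DC}}}identifier").text = pid
    return ET.tostring(obj, encoding="unicode")
```

Comparing something like this against the objectXML of a real object is a quick way to see which parts islandora adds on top.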

DATA OUT-

NOTE: there are two primary paths by which data arrives from the object store: search results and direct object retrieval

DATA OUT: SEARCH-
  1. on the web site the user enters a search term and clicks submit
  2. the search handler builds a SOLR query (in SOLR’s Lucene-based query syntax, not SPARQL) from the search term
    • plus any constraints / defaults set in the SOLR client config (e.g. namespace limitation)
  3. the search handler sends the query to the SOLR server
  4. SOLR finds the matching item(s) and builds a chunk of XML containing what it knows about them – the values of its stored fields (which are in turn limited/defined by the fields that gsearch indexes)
  5. the search handler receives the XML back
  6. the XML is converted into an HTML block (via XSLT? explicitly in PHP code? by some other mechanism?)
    • that conversion controls which of the data sent back is actually displayed to the user
  7. the block of HTML is inserted into the content place on a drupal page and is displayed to the user
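SOLR itself is queried over HTTP with its own Lucene-based query syntax. The sketch below shows the shape of that round trip: build a /select URL (with a namespace restriction added as a filter query) and pull PIDs out of an XML response. The base URL, the qt=standard handler parameter, the fq=PID:<namespace>\:* form of the namespace filter, and wt=xml are assumptions for illustration, not what the islandora SOLR client literally sends.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Placeholder; use the SOLR URL from your Islandora > Solr client settings.
SOLR_BASE = "http://fedoramachine.institution.edu:8080/solr"

# Single characters with special meaning in Lucene/SOLR query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(term: str) -> str:
    """Backslash-escape Lucene special characters so `term` is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def build_select_url(user_query: str, namespace: Optional[str] = None, rows: int = 10) -> str:
    """Build a /select URL; `namespace` adds a filter query limiting results to that PID prefix."""
    params = [("q", user_query), ("qt", "standard"), ("rows", str(rows)), ("wt", "xml")]
    if namespace:
        # e.g. namespace "collectionname" -> fq=PID:collectionname\:*
        params.append(("fq", "PID:" + escape_lucene(namespace + ":") + "*"))
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def pids_from_response(xml_text: str) -> tuple[int, list[str]]:
    """Return (numFound, [PID, ...]) from a SOLR XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        return 0, []
    pids = [el.text or "" for el in result.iter("str") if el.get("name") == "PID"]
    return int(result.get("numFound", "0")), pids
```

Pasting a URL built this way into a browser is a handy way to see whether a problem sits in SOLR itself or in the islandora side of the search.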

DATA OUT: DIRECT RETRIEVAL-
  1. on the web site the user clicks on a link to an object
  2. that links to the object display handler with parameters to identify which object
  3. the object display handler constructs a URL to fetch the object data from fedora
    • via the REST API? some other system?
  4. the object display handler makes an HTTP request
  5. fedora gives back the FOXML it has for the object
  6. the object display handler processes the FOXML to produce an HTML block (via XSLT? explicitly in PHP code? by some other mechanism?)
    • in the course of processing it may make additional HTTP requests to fetch more info about, or associated with, the object (e.g. the object’s datastreams, such as its thumbnail image or MODS metadata)
    • some datastreams may be referenced directly in the HTML (or by specialized display tools, e.g. the openseadragon viewer) via REST API URLs
    • ! I have no idea how openseadragon actually works / gets its info
  7. the block of HTML is inserted into the content place on a drupal page and is displayed to the user
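Step 6’s datastream handling is easier to reason about once you can see which datastreams an object has and whether each is inline (X) or managed (M). Here’s a sketch that reads FOXML (e.g. what the objectXML URL under Useful URLs returns) and builds a content URL per datastream, assuming the Fedora 3 REST pattern /objects/{pid}/datastreams/{dsid}/content and a placeholder host.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

FOXML_NS = "{info:fedora/fedora-system:def/foxml#}"
FEDORA_BASE = "http://fedoramachine.institution.edu:8080/fedora"  # placeholder host


def list_datastreams(foxml_text: str) -> list[dict[str, str]]:
    """Return ID, control group ("X" inline, "M" managed, ...) and a content URL per datastream."""
    root = ET.fromstring(foxml_text)
    pid = root.get("PID", "")
    found = []
    for ds in root.iter(FOXML_NS + "datastream"):
        dsid = ds.get("ID", "")
        found.append({
            "id": dsid,
            "control_group": ds.get("CONTROL_GROUP", ""),
            # GET /objects/{pid}/datastreams/{dsid}/content returns the datastream's bytes.
            "url": f"{FEDORA_BASE}/objects/{quote(pid, safe=':')}/datastreams/{quote(dsid)}/content",
        })
    return found
```

Running this against a problem object also makes it quick to check the islandora 7.1 note further down about managed versus inline datastreams.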

Useful URLs

These URLs provide ways to interact directly with fedora, gsearch, and SOLR – these are EXTREMELY USEFUL for debugging & exploring. All of these URLs reference the machine on which you’ve installed fedora (and gsearch and SOLR). In these examples I’ll be using fedoramachine.institution.edu in the URL, but that will vary depending on the machine name used in your particular installation.

fedora admin panel/tool

NOTE: this is typically protected by a password (HTTP authentication) wall – you’ll need an account on fedoramachine.institution.edu to use this
NOTE: this is a flash-based tool
manage the repository (view/edit objects; ?maybe upload/ingest) directly (bypass drupal/islandora)
http://fedoramachine.institution.edu:8080/fedora/admin/

fedora REST API

NOTE: this is typically protected by a password (HTTP authentication) wall – you’ll need an account on fedoramachine.institution.edu to use this
read-only direct access to objects
http://fedoramachine.institution.edu:8080/fedora/objects/collectionname:23/
http://fedoramachine.institution.edu:8080/fedora/objects/collectionname:23/objectXML

where collectionname:23 is the PID of the object you’d like to access
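The same REST URLs can be fetched from a script, which is handy once clicking around gets tedious. Here’s a minimal sketch using only Python’s standard library, assuming the password wall is plain HTTP basic authentication (check your fedora setup) and using a placeholder host and credentials.

```python
import base64
import urllib.request
from urllib.parse import quote

FEDORA_BASE = "http://fedoramachine.institution.edu:8080/fedora"  # placeholder host


def fedora_request(pid: str, user: str, password: str, what: str = "objectXML") -> urllib.request.Request:
    """Build a GET request for /objects/{pid}/{what}; what="" asks for the object itself."""
    # Keep the ':' in the PID as-is, like the URLs above; escape anything else odd.
    url = f"{FEDORA_BASE}/objects/{quote(pid, safe=':')}"
    if what:
        url += "/" + what
    # HTTP basic auth: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Perform the request and return the raw response body (raises on HTTP errors)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, fetch(fedora_request("collectionname:23", "youraccount", "yourpassword")) returns the object’s FOXML as bytes; an HTTP 401 error usually means the credentials, not the object, are the problem.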

fedora basic search

search the repository using fedora’s built-in field search (bypasses gsearch, SOLR, and drupal/islandora)
http://fedoramachine.institution.edu:8080/fedora/search

gsearch REST API (fedoragsearch; drives SOLR indexing)

rebuild the SOLR index for one object or for all objects, browse indexed SOLR data (bypass drupal/islandora)
http://fedoramachine.institution.edu:8080/fedoragsearch/rest/
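When re-checking a single object, it’s quicker to hit the fedoragsearch REST interface at the URL above directly than to click through its forms. Here’s a sketch that builds the URLs, assuming gsearch’s updateIndex operation with action=fromPid (one object) or action=fromFoxmlFiles (everything); check these parameter names against your gsearch version before relying on them.

```python
from urllib.parse import urlencode

# Placeholder host; the fedoragsearch REST endpoint.
GSEARCH_REST = "http://fedoramachine.institution.edu:8080/fedoragsearch/rest"


def reindex_pid_url(pid: str) -> str:
    """URL asking gsearch to re-run its FOXML-to-SOLR transform for one object (assumed params)."""
    query = urlencode({"operation": "updateIndex", "action": "fromPid", "value": pid})
    return f"{GSEARCH_REST}?{query}"


def reindex_all_url() -> str:
    """URL asking gsearch to rebuild the index from every stored FOXML file (slow on big repos)."""
    query = urlencode({"operation": "updateIndex", "action": "fromFoxmlFiles"})
    return f"{GSEARCH_REST}?{query}"
```

The PID’s colon gets percent-encoded (%3A), which the server decodes; the URLs can be opened in a logged-in browser or fetched from a script.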

SOLR admin (not really admin management – admin read-only)

browse the schema, build / test queries (bypass drupal/islandora)
http://fedoramachine.institution.edu:8080/solr/admin/

SOLR search failure

If a SOLR search is not working there are many potential failure points / stages to check:

  1. is drupal/islandora communicating with fedora (check error logs, check islandora config screen; should be no errors and the config screen should say ‘Successfully connected to Fedora Server (Version 3.5).’)
    • on problems: make sure fedora is running, try a command-line ping from each machine to the other, ?????
  2. is the object making it into fedora (check fedora admin, fedora REST; should be able to navigate to the object or specify the URL directly, and then should see the object’s metadata)
    • on problems: check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
  3. does the object have all the right metadata (check fedora admin, fedora REST; verify the values match what was entered in the form)
    • on problems: check the form being used (islandora > form builder), check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
  4. is the metadata in the right places and correctly formatted (check fedora admin, fedora REST; verify that values are in the expected fields, and that datastreams are appropriately inline or managed)
    • on problems: check the form being used (islandora > form builder), check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
    • NOTE: as of islandora 7.1, SOLR has problems with managed (CONTROL_GROUP="M") datastreams, but should handle inline XML (CONTROL_GROUP="X") ones
  5. is gsearch correctly indexing the object (check gsearch directly; verify that the expected fields show up, and that the object can be found by searching on a known field-value combination)
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, ?????
  6. is SOLR correctly indexing the object (use the gsearch REST API to reindex the given PID (http://fedoramachine.institution.edu:8080/fedoragsearch/rest?operation=updateIndex – 4th action box down), then verify that the expected field names show up in the field name drop-down on the browse index page (http://fedoramachine.institution.edu:8080/fedoragsearch/rest?operation=browseIndex), and then that the expected object shows up in the results list (though clicking on the provided link may not work; there seems to be some basic issue with the gFindObjects page))
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, check $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml, check $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml, ?????
  7. is the object fetchable via a SOLR query (on the SOLR admin page (http://fedoramachine.institution.edu:8080/solr/admin/) build a simple query to get the object by PID, e.g. PID:"collectionname:23"), and then check whether it shows up in a multi-result list (e.g. format:"cambio")
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, check $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml, check $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml, ?????
  8. back on the drupal/islandora side, is the SOLR client correctly connecting to the SOLR server (check error logs, check islandora solr client screen; should say ‘Successfully connected to Solr server at fedoramachine.institution.edu:8080/solr’)
    • on problems: ?????
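Since data flows fedora → gsearch → SOLR → islandora, the list above is ordered so that the first failing stage is usually the place to dig. That idea as a small sketch: run check functions in order and report the first one that fails. The stage names mirror the list; the lambdas are hypothetical stand-ins you’d replace with real probes (HTTP requests, SOLR queries, log checks).

```python
from typing import Callable, Optional

# A check returns True when its stage looks healthy.
Check = Callable[[], bool]


def first_failing_stage(stages: list[tuple[str, Check]]) -> Optional[str]:
    """Return the name of the first stage whose check fails or raises, else None."""
    for name, check in stages:
        try:
            ok = check()
        except Exception:
            # A probe that blows up (timeout, bad XML, refused connection) counts as a failure here.
            ok = False
        if not ok:
            return name
    return None


# Hypothetical wiring: every lambda is a placeholder for a real probe.
STAGES: list[tuple[str, Check]] = [
    ("1. islandora connects to fedora", lambda: True),
    ("2. object is in fedora", lambda: True),
    ("3. object metadata is right", lambda: True),
    ("4. metadata is in the right places", lambda: True),
    ("5. gsearch indexes the object", lambda: True),
    ("6. SOLR indexes the object", lambda: True),
    ("7. object is found by a SOLR query", lambda: True),
    ("8. islandora SOLR client connects to SOLR", lambda: True),
]

if __name__ == "__main__":
    print(first_failing_stage(STAGES) or "all stages passed")
```

Treating an exception as a failure means a timeout in, say, the SOLR probe is reported at that stage instead of crashing the whole run.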