
Islandora 7 – SOLR Faceting by Collection Name (or Label) April 14, 2014

Posted by ficial in islandora, techy.

One of the basic features we need from our Islandora system is the ability to facet search results by collection name. Getting this working turns out to be a non-trivial project. The essential problem is that although objects in islandora know in which collection(s) they reside, that information is stored in the object only as a relationship identified by a computer-y PID. If one uses that relationship directly to do faceting one gets something that looks like ‘somenamespace:stuffcollection’, rather than the actual name of the collection ‘Our Collection of Stuff’. In brief, the solution I used was to alter the way the objects were processed by fedoragsearch to send actual collection names rather than just PID info. I did this by extending the RELS-EXT processing to load and use the relevant collection data when handling isMemberOfCollection fields.

The faceting options that are available are determined by what information is in the SOLR indexes – faceting is NOT driven directly by the fedora object store! To allow faceting by collection name we need to tell SOLR the names of the collection(s) of the object. This means that, similar to getting full text search working, we need to touch both the fedoragsearch system, to deliver the desired info to SOLR, and the SOLR config info, to make the desired fields available for searching (and faceting).

In our fedoragsearch set up we already had pieces in place to process the RELS-EXT info, which is where collection membership (among other things) resides. This part of the object’s FOXML looks something like this:

  <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="Fedora Object to Object Relationship Metadata." CREATED="2013-11-08T15:49:50.889Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="548">
      <foxml:xmlContent>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:islandora="http://islandora.ca/ontology/relsext#">
          <rdf:Description rdf:about="info:fedora/somenamespace:52">
            <fedora:isMemberOfCollection rdf:resource="info:fedora/somenamespace:projectid"/>
            <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp-audioCModel"/>
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>

where the object has a PID of ‘somenamespace:52’ and is a member of the collection with PID ‘somenamespace:projectid’.
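To make the shape of that relationship data concrete, here is a minimal Python sketch (not part of the gsearch pipeline, which does this work in XSLT) that pulls the collection PIDs out of a RELS-EXT fragment like the one above. The function name and the sample string are made up for illustration; the namespaces are the ones shown in the FOXML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FEDORA_REL_NS = "info:fedora/fedora-system:def/relations-external#"
URI_PREFIX = "info:fedora/"

# Trimmed copy of the RELS-EXT content shown above.
SAMPLE_RELS_EXT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/somenamespace:52">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/somenamespace:projectid"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp-audioCModel"/>
  </rdf:Description>
</rdf:RDF>
"""


def collection_pids(rels_ext_xml):
    """Return the PID of every collection the object is a member of."""
    root = ET.fromstring(rels_ext_xml)
    pids = []
    # Only isMemberOfCollection elements are visited, so hasModel is skipped.
    for rel in root.iter("{%s}isMemberOfCollection" % FEDORA_REL_NS):
        resource = rel.get("{%s}resource" % RDF_NS, "")
        # The attribute holds a URI, not a bare PID: strip 'info:fedora/'.
        if resource.startswith(URI_PREFIX):
            pids.append(resource[len(URI_PREFIX):])
    return pids


if __name__ == "__main__":
    print(collection_pids(SAMPLE_RELS_EXT))
```

Note that the value stored in rdf:resource is a URI with an ‘info:fedora/’ prefix, not the bare PID.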

In the main gsearch_solr folder we have a sub-folder called islandora_transforms, in which there is a file called RELS-EXT_to_solr.xslt. This file is used by demoFoxmlToSolr.xslt via a straightforward include:

  <xsl:include href="/usr/local/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/islandora_transforms/RELS-EXT_to_solr.xslt"/>

which initially was just this:

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- RELS-EXT -->
  <xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foxml="info:fedora/fedora-system:def/foxml#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    exclude-result-prefixes="rdf">
    <xsl:template match="foxml:datastream[@ID='RELS-EXT']/foxml:datastreamVersion[last()]" name='index_RELS-EXT'>
    <xsl:param name="content"/>
    <xsl:param name="prefix">RELS_EXT_</xsl:param>
    <xsl:param name="suffix">_ms</xsl:param>
    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_uri', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="@rdf:resource"/>
      </field>
    </xsl:for-each>
    <xsl:for-each select="$content//rdf:Description/*[not(@rdf:resource)][normalize-space(text())]">
    <field>
        <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_literal', $suffix)"/>
        </xsl:attribute>
      <xsl:value-of select="text()"/>
        </field>
    </xsl:for-each>
  </xsl:template>
  </xsl:stylesheet>

The initial version of this file just directly processes the contents of the RELS-EXT datastream of the object’s FOXML, eventually creating the SOLR fields RELS_EXT_isMemberOfCollection_uri_ms/mt and RELS_EXT_hasModel_uri_ms/mt (fedoragsearch created the _uri info, which SOLR extends to the _ms/_mt versions). We can facet directly on those to get the desired breakdowns by collection (and by model, for that matter), but the text presented to the user is basically meaningless. So, I added some code to load the actual collection data for each isMemberOfCollection relation, and then pulled the human-readable collection title from that.

From my perspective there were three particularly tricky parts to this (further complicated by my limited proficiency/understanding of XSLT and XPATH). First, how do I catch all the memberships and nothing else? Second, how do I get the actual collection PID? Third, how do I pull in and process additional content based on that PID? In bulling my way past these obstacles I ended up with code that I’m dead sure isn’t as pretty or efficient as it could be, but on the plus side it works for me.

In step one I re-used the looping example already in the file to look through all the description sub-fields that have a resource attribute, which processes this data:

  <rdf:Description rdf:about="info:fedora/somenamespace:52">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/somenamespace:projectid"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp-audioCModel"/>
  </rdf:Description>

and hits both the fedora:isMemberOfCollection and fedora-model:hasModel fields. To make sure I’m not accidentally processing models I added an if test that examines the name of the field and makes sure I’m only proceeding with further work on isMemberOfCollection fields (NOTE: I’ll probably be adding a branch for processing hasModel at some point as well – the code will be very similar).

Once I’ve ensured that I’m working with the right data I need to get the PID of the collection. This part baffled me for a while because I hadn’t noticed that the value of that field wasn’t just the collection PID; it had ‘info:fedora/’ prepended to it. Once I realized what was going on I used a simple substring-after to pull out only the PID part.

Lastly, I needed to pull and process the collection with that PID in order to get its human-readable title. Luckily I had an analogous example of that kind of thing in the external datastream processing that happens in demoFoxmlToSolr.xslt – I loaded the collection’s DC datastream into a local variable, then processed that to pull out the title. Finally, once I’d grabbed all the text I needed, I created the appropriate fields to send on to SOLR (the collection_membership. prefix I used is one I just made up on the spot – there’s nothing special about it and it’s entirely possible that there’s some other structure/naming scheme I should be using instead). The final, modified RELS-EXT_to_solr.xslt looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- RELS-EXT -->
  <xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foxml="info:fedora/fedora-system:def/foxml#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="rdf dc">

    <xsl:template match="foxml:datastream[@ID='RELS-EXT']/foxml:datastreamVersion[last()]" name='index_RELS-EXT'>
    <xsl:param name="content"/>
    <xsl:param name="prefix">RELS_EXT_</xsl:param>
    <xsl:param name="suffix">_ms</xsl:param>

    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_uri', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="@rdf:resource"/>
      </field>
    </xsl:for-each>

    <xsl:for-each select="$content//rdf:Description/*[not(@rdf:resource)][normalize-space(text())]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_literal', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="text()"/>
      </field>
    </xsl:for-each>

    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">

      <xsl:if test="local-name()='isMemberOfCollection'">
      <xsl:variable name="collectionPID" select="substring-after(@rdf:resource,'info:fedora/')"/>
      <xsl:variable name="collectionContent" select="document(concat($PROT, '://', $FEDORAUSERNAME, ':', $FEDORAPASSWORD, '@', $HOST, ':', $PORT,'/fedora/objects/', $collectionPID, '/datastreams/', 'DC', '/content'))"/>

      <field name="collection_membership.pid_ms">
        <xsl:value-of select="$collectionPID"/>
      </field>

      <xsl:for-each select="$collectionContent//dc:title">
        <xsl:if test="local-name()='title'">
        <field name="collection_membership.title_ms">
          <xsl:value-of select="text()"/>
        </field>
        <field name="collection_membership.title_mt">
          <xsl:value-of select="text()"/>
        </field>
        </xsl:if>
      </xsl:for-each>

      </xsl:if>
  <!--
      <xsl:if test="local-name()='hasModel'">
      <xsl:variable name="modelPID" select="substring-after(@rdf:resource,'info:fedora/')"/>
      <field name="CSW_test_if_model">
        <xsl:value-of select="$modelPID"/>
      </field>
      </xsl:if>
  -->
    </xsl:for-each>

    </xsl:template>

  </xsl:stylesheet>
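For anyone who finds the XSLT hard to follow, the logic of the new for-each can be sketched in Python. This is only an illustration of the three steps (filter, strip the prefix, look up the title), not something gsearch runs. The fetch_dc callable is a hypothetical stand-in for the document() call against the fedora REST API, and the sample DC record is invented.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
URI_PREFIX = "info:fedora/"

# Invented example of a collection's DC datastream.
SAMPLE_DC = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Our Collection of Stuff</dc:title>
  <dc:identifier>somenamespace:stuffcollection</dc:identifier>
</oai_dc:dc>
"""


def membership_fields(resource_uri, fetch_dc):
    """Build (field name, value) pairs for one isMemberOfCollection URI.

    fetch_dc is a hypothetical callable: it takes a PID and returns that
    collection's DC datastream as an XML string.
    """
    # Same job as substring-after(@rdf:resource, 'info:fedora/').
    pid = resource_uri.partition(URI_PREFIX)[2]
    fields = [("collection_membership.pid_ms", pid)]

    dc_root = ET.fromstring(fetch_dc(pid))
    for title in dc_root.iter("{%s}title" % DC_NS):
        text = title.text or ""
        # One string copy for faceting, one tokenized copy for searching.
        fields.append(("collection_membership.title_ms", text))
        fields.append(("collection_membership.title_mt", text))
    return fields


if __name__ == "__main__":
    uri = "info:fedora/somenamespace:stuffcollection"
    for name, value in membership_fields(uri, lambda pid: SAMPLE_DC):
        print(name, "=", value)
```

The title is emitted twice on purpose: the _ms copy is kept verbatim so the facet shows the full collection name, while the _mt copy is there for ordinary text search.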

Similar to the approach I used for the OCR text work, I was able to watch the fedoragsearch logs and verify that the output was as expected/needed. The changes on the SOLR side are pretty minor. I added a couple of lines to schema.xml to handle the new fields:

<field name="collection_membership.title_ms" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="collection_membership.title_mt" type="text_fgs" indexed="true" stored="true" multiValued="true"/>

and, though not necessary for the faceting aspect of this work, I added collection_membership.title_mt to the field list of the standard search request handler in solrconfig.xml:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*</str>
  <str name="q.alt">*:*</str>
  <str name="qf">
  PID
  dc.title
  ....
  ....
  collection_membership.title_mt
  </str>
  </lst>
</requestHandler>

The final step is to re-index everything, and add the collection_membership.title_ms field to the list of facet fields in the web-based islandora solr config tool (Islandora > Solr index > Solr settings > Facet settings; add collection_membership.title_ms to the Facet fields, and give it a label of Collection).
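Before touching the islandora settings it can help to ask SOLR directly whether the new field facets the way you expect. Here is a small Python sketch that builds such a request URL. The host name is the placeholder used elsewhere on this blog, the function name is made up, and I am assuming the stock /select handler is what your install exposes.

```python
from urllib.parse import urlencode


def facet_query_url(solr_base, query="*:*",
                    facet_field="collection_membership.title_ms"):
    """Build a SOLR URL that returns only facet counts for one field."""
    params = [
        ("q", query),
        ("rows", "0"),            # no documents needed, just the counts
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),  # hide collections with no matches
        ("wt", "json"),
    ]
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host; substitute your own fedora/SOLR machine.
    print(facet_query_url("http://fedoramachine.institution.edu:8080/solr"))
```

Pasting the printed URL into a browser should list each collection title with a document count, provided the re-index has finished.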

And that’s that. If anyone has any suggestions/thoughts about how to improve my XSLT I’d be thrilled to hear them.

Islandora 7 – Making OCR text searchable via SOLR February 21, 2014

Posted by ficial in islandora, techy.

We recently tackled the issue of making OCR-ed text searchable for Islandora 7. I had difficulty finding solid, targeted answers online, so here’s what we did in case anyone else needs to do this – hopefully you will be able to recoup some of the time I spent figuring this out. :)

First, a quick re-cap of OCR data-flow and how to inspect the parts of that data flow:

  1. We have an image that has some text (verify by visual examination of the image)
  2. That image is ingested and creates an object with a PID (verify by inspection of the fedora repository – e.g. http://fedorablahblahblah:8080/fedora/objects/PID, where PID looks like namespace:number (e.g. wonderfulcollection:43))
  3. As a part of the ingest process the OCR tool runs and creates a managed datastream on the object; that datastream references the actual text generated (verify by looking at the FOXML of the object and that datastream of the object – http://fedorablahblahblah:8080/fedora/objects/PID/objectXML, and http://fedorablahblahblah:8080/fedora/objects/PID/datastreams/OCR/content)
  4. The gsearch utility runs and pulls the FOXML of the newly created object and uses an xslt to generate from that an update request that’s sent to SOLR; that update request contains all the data that SOLR is to index (verify by looking at the fedora gsearch log $FEDORA_ROOT/server/logs/fedoragsearch.log to see the XML that’s being sent to SOLR)
  5. SOLR processes the request and puts the data in its indices (based on entries in its schema.xml file) (verify by looking at the solr admin tool schema browser – http://fedorablahblahblah:8080/solr/admin/schema.jsp and finding the OCR field (open the FIELDS navigation element on the left, then do a text search on the page for ocr) and making sure it has indexed at least one document)
  6. The indexed fields are available to be searched and returned based on the request handlers defined (in the solrconfig.xml file – verify by finding (or adding) the name of the ocr field in the default search handler)
  7. The islandora SOLR module is configured to use those fields (verify by searching for a string in the OCR-ed text and having the expected object in the search results; adjust the solr settings to have the ocr field available as an advanced search term and to adjust the default sort weights as desired)
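Since several of the checks above boil down to visiting a URL for a given PID, here is a small Python helper sketch that assembles them in one go. The URL patterns are the ones listed in the steps; the function name and dictionary keys are just made up for convenience.

```python
def inspection_urls(base, pid):
    """Return the URLs used to inspect one object at each stage.

    base is the scheme, host and port of the fedora machine,
    e.g. 'http://fedorablahblahblah:8080'.
    """
    base = base.rstrip("/")
    obj = base + "/fedora/objects/" + pid
    return {
        "object": obj,                                   # step 2
        "foxml": obj + "/objectXML",                     # step 3
        "ocr_text": obj + "/datastreams/OCR/content",    # step 3
        "solr_schema": base + "/solr/admin/schema.jsp",  # step 5
    }


if __name__ == "__main__":
    urls = inspection_urls("http://fedorablahblahblah:8080",
                           "wonderfulcollection:43")
    for label, url in urls.items():
        print(label + ":", url)
```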

To make this work there are 3 main parts that need to be resolved. First, the OCR process needs to work. Second, gsearch needs to send the resulting text to SOLR. Third, SOLR needs to process it and make it available for searching.

I. Making OCR work on ingest

This was something that I didn’t need to deal with because the excellent folks at Common Media did it for us – I’ll edit this to add a recap of the process/steps soon.

II. Making fedora gsearch handle the OCR-ed text

This is the part that gave us the most trouble.

The gsearch process works like this:

a. a new object is ingested into fedora
b. fedora sends a message to gsearch
c. gsearch runs an xslt that processes the object’s FOXML to create XML for an add request to SOLR

Parts a and b we don’t really have to worry about here since there are no special changes that need to be made to accommodate OCR-ed text. Part c is where the complications lie. Fedora gsearch uses a file called demoFoxmlToSolr.xslt in $FEDORA_ROOT/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/. In that file we had to add a new template to create the OCR field:

  <xsl:template match="foxml:datastream[@ID='OCR']/foxml:datastreamVersion[last()]" name="index_text_nodes_as_a_text_field">
    <xsl:param name="content"/>
    <xsl:param name="prefix">ocr.</xsl:param>
    <xsl:param name="suffix"></xsl:param>
    <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, ../@ID , $suffix)"/>
      </xsl:attribute>
      <xsl:variable name="text" select="normalize-space($content)"/>
      <!-- Only output non-empty text nodes (followed by a single space) -->
      <xsl:if test="$text">
        <xsl:value-of select="$text"/>
        <xsl:text> </xsl:text>
      </xsl:if>
    </field>
  </xsl:template>

and then we needed to call that template for the appropriate content and with the appropriate parameter:

  <xsl:when test="@CONTROL_GROUP='M' and @ID!='OCR'">
    <xsl:apply-templates select="foxml:datastreamVersion[last()]">
      <xsl:with-param name="content" select="document(concat($PROT, '://', $FEDORAUSERNAME, ':', $FEDORAPASSWORD, '@', $HOST, ':', $PORT, '/fedora/objects/', $PID, '/datastreams/', @ID, '/content'))"/>
    </xsl:apply-templates>
  </xsl:when>
  <!-- NOTE: OCR data is plain text rather than XML, so we can't use the document() function as above to get it -
       need exts:getDatastreamText() instead -->
  <xsl:when test="@CONTROL_GROUP='M' and @ID='OCR'">
    <xsl:apply-templates select="foxml:datastreamVersion[last()]">
      <xsl:with-param name="content" select="exts:getDatastreamText($PID, $REPOSITORYNAME,
      @ID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
    </xsl:apply-templates>
  </xsl:when>

In our initial attempts at this we ran into problems in both places. The template that generated the ocr field wasn’t working correctly, and we hadn’t realized that the document() function only works on XML docs and not on plain text. As a result we were getting the ocr.OCR field created, but it would have no content. After getting some help from others and an extended session of experimenting and debugging on our end we arrived at the above working code.
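The XML-versus-plain-text distinction is easy to see outside of XSLT. The Python sketch below is an analogy only (it is not what gsearch does internally): an XML parser accepts a well-formed document and rejects raw OCR output, which is why the OCR datastream has to be fetched as text instead. The second helper roughly imitates what normalize-space() does to the fetched text.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(payload):
    """True if payload parses as XML, False for plain text and the like."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim the ends and
    collapse internal runs of whitespace to single spaces."""
    return " ".join(text.split())


if __name__ == "__main__":
    ocr_output = "  Some   text read\n from a scanned page  "
    print(is_well_formed_xml("<dc><title>A title</title></dc>"))
    print(is_well_formed_xml(ocr_output))
    print(repr(normalize_space(ocr_output)))
```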

DEBUGGING/DEVELOPMENT NOTES: If you have to do work in this area keep in mind that any changes to demoFoxmlToSolr.xslt will require a restart of fedora to take effect. To run the file we used the gsearch REST API (http://fedorablahblahblah:8080/fedoragsearch/rest?operation=updateIndex) and repeatedly deleted and added a known PID from/to the index (bottom two forms on that page). While we did that we watched the gsearch log file (tail -f $FEDORA_ROOT/server/logs/fedoragsearch.log) to see the resulting XML. If demoFoxmlToSolr.xslt has errors then trying to get to the REST API pages will give various nasty and confusing error messages.
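If you get tired of clicking through the forms, the same delete-then-add cycle can be expressed as two URLs. This Python sketch only builds them. The action names (deletePid, fromPid) and the value parameter are my understanding of what the forms on that page submit, so treat them as assumptions and compare against the URLs your own gsearch produces before relying on them.

```python
from urllib.parse import urlencode


def update_index_url(base, action, pid):
    """Build a gsearch updateIndex URL.

    ASSUMPTION: 'action' and 'value' are the parameter names used by the
    updateIndex forms; check them against your own install.
    """
    params = [("operation", "updateIndex"),
              ("action", action),
              ("value", pid)]
    return base.rstrip("/") + "/fedoragsearch/rest?" + urlencode(params)


def reindex_cycle(base, pid):
    """URLs to remove a known PID from the index and then add it back."""
    return [update_index_url(base, "deletePid", pid),
            update_index_url(base, "fromPid", pid)]


if __name__ == "__main__":
    for url in reindex_cycle("http://fedorablahblahblah:8080",
                             "wonderfulcollection:43"):
        print(url)
```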

III. Making SOLR handle the OCR-ed text

This involves changes in the SOLR config files in $FEDORA_ROOT/gsearch_solr/solr/conf – schema.xml and solrconfig.xml. The former file essentially controls how data is organized / handled / indexed. The latter file controls which data is accessible to search. The changes here are actually pretty easy. First, add an entry in schema.xml to handle the ocr field that gsearch is sending to SOLR:

 <fields> 
    ....
    <dynamicField name="ocr*" type="text_fgs"    indexed="true"  stored="true" multiValued="true"/>
    ....    
 </fields>

To make sure this is working, restart fedora (needed to make the above addition/change take effect) then delete and add/update the index for a single PID as described above. After that you should be able to check the SOLR schema browser (http://fedorablahblahblah:8080/solr/admin/schema.jsp) and find your OCR field and see that it has one document.
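The ocr* pattern in that dynamicField entry is a simple wildcard: any field whose name starts with ocr is handled by it, which is how ocr.OCR gets picked up. A simplified Python sketch of that matching rule follows. It ignores the fact that explicitly declared fields take precedence over dynamic ones, and the function name is made up.

```python
def matches_dynamic_field(pattern, field_name):
    """Simplified model of SOLR dynamicField name matching.

    A pattern has a single '*' at its start or its end; anything else
    is treated here as an exact name.
    """
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    return field_name == pattern


if __name__ == "__main__":
    for name in ("ocr.OCR", "dc.title"):
        print(name, matches_dynamic_field("ocr*", name))
```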

Once you’ve made sure it’s working you’ll need to update/recreate your SOLR indexes:

  1. go to the gsearch REST API update index page http://fedorablahblahblah:8080/fedoragsearch/rest?operation=updateIndex
  2. click the updateIndex createEmpty button
  3. in a console session on your fedora server stop fedora (/etc/init.d/fedora stop)
  4. in that console remove (or set aside) the old solr index (rm -rf $FEDORA_ROOT/gsearch_solr/solr/data/index)
  5. in that console start fedora (/etc/init.d/fedora start)
  6. on the REST API update index page, click updateIndex fromFoxmlFiles
  7. wait for the index to finish updating (you can open another instance of the REST API update index page to see the progress in the ‘Resulting number of index documents’ table cell – refresh that page periodically to see how many are indexed)

After that you should be able to check the SOLR schema browser (http://fedorablahblahblah:8080/solr/admin/schema.jsp) and find your OCR field and see that it has the expected number of documents indexed.

Lastly, you need to edit solrconfig.xml to have searches actually check that field. In that file find the standard request handler and add the field name (ocr.OCR) to the list of default fields:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="fl">*</str>
      <str name="q.alt">*:*</str>
      <str name="qf">
      PID
      dc.title
      ...
      ocr.OCR
      ...
      </str>
    </lst>
  </requestHandler>

Restart fedora, and you should now be able to do basic searches in your islandora site for strings in the OCR-ed text and have the expected object be in the search results. I recommend also adjusting your islandora solr search settings to add the ocr field to your advanced search fields.

Islandora and SOLR overview with troubleshooting tips September 3, 2013

Posted by ficial in islandora, techy.

We’ve been working on getting islandora (http://islandora.ca/) up and running for quite a while now. One of the challenges I’ve run into is that I didn’t have a general understanding of all the pieces involved. It turns out that one doesn’t just install islandora and call it done – the islandora piece is just one part (and a relatively small one at that) of the constellation of systems that need to be working correctly together to make an ‘islandora site’.

The Players

All the systems / components involved in making an islandora-based repository work; if anything goes wrong with any of these systems and/or the communication between them then the site stops working

  • drupal – technically mostly works out of the box, but actually pretty much useless without extensive customization
    • the islandora module set (within drupal)
      • ? solution packs that support particular kinds of data objects
      • ? various other modules/libraries on which islandora depends: tuque, openseadragon, etc.
    • the theme or themes used
      • ? various modules/components on which the theme depends
    • ! multi-site organization
    • * technologies used: drupal, islandora, PHP, xml, xslt, SPARQL
  • MySQL – relational DB system that supports drupal
    • * technologies used: MySQL, SQL
  • fedora commons – object storage system
    • fedora gsearch – index-based search tool that’s built in to fedora
    • * technologies used: apache tomcat (and all the config fun that comes with it), java, xml, xslt
  • SOLR/Lucene – search and indexing server that fedora gsearch feeds
    • * technologies used: tomcat, java, xml, xslt, Lucene/SOLR query syntax

We also have in our mix

  • LDAP
    • LDAP auth module for drupal

Here are some configuration-based communication failure points we’ve encountered and the diagnosis steps we took:

  • drupal – MySQL
    • check drupal’s settings.php to make sure the mysql server is indicated correctly and that the username and password are correct
      • this gets a bit more complicated for multi-site installs
    • check the firewall settings on your mysql machine and drupal machine to make sure they can talk with each other (e.g. each can be ping-ed from the other)
    • make sure mysql has been started with networking on
    • make sure you can use the mysql client to connect from the drupal machine to the mysql machine (mysql -h mysqlmachine.institution.edu -D databasename -u accountname -p)
    • make sure the mysql user has been granted appropriate privileges to the/all databases being used
      • this gets a bit more complicated for multi-site installs
  • drupal – LDAP
    • make sure the network-level communication is working (e.g. the machine running your LDAP service can be ping-ed from your drupal machine)
    • make sure your settings in the drupal LDAP module are correct
  • drupal – islandora
    • make sure all the relevant islandora modules and their dependencies are installed and enabled
      • make sure the versions match
  • islandora – fedora
    • check the drupal settings (Islandora > Configuration)
      • check any namespace restrictions (Islandora > Configuration >> Namespaces)
    • make sure the network-level communication is working  (e.g. each machine can be ping-ed from the other)
    • check the filter.xml file on fedora to make sure that its connection values and query are correct
  • fedora – mysql
    • make sure the network-level communication is working  (e.g. each machine can be ping-ed from the other)
    • make sure you can use the mysql client to connect from the fedora machine to the mysql machine (mysql -h mysqlmachine.institution.edu -D databasename -u accountname -p)
    • on the fedora machine check in $FEDORA_HOME/server/config/fedora.fcfg to make sure the jdbcURL parameter refers to the MySQL server (by IP address)
  • gsearch – SOLR
    • check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt
      • and associated sub-xslt’s if your system is organized that way
  • islandora – SOLR
    • check the drupal settings (Islandora > Solr client)
      • should successfully connect to the SOLR server
      • should use the correct request handler (‘standard’ by default)
    • check the query defaults, including any namespace restrictions (Islandora > Solr client >> Query defaults)
    • check the SOLR config files – on the fedora machine:
      • $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml to make sure the indexing is set up correctly
      • $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml to make sure the request handler is defined correctly
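A successful ping only shows that a machine is reachable, not that the service on it is accepting connections. As a complement to the checks above, here is a minimal Python sketch that tries to open a TCP connection to a given host and port. The function name is made up, and the ports in the example (3306 for MySQL, 8080 for tomcat) are the usual defaults, which may not match your setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Placeholder host names from this post; default ports assumed.
    checks = [("mysqlmachine.institution.edu", 3306),
              ("fedoramachine.institution.edu", 8080)]
    for host, port in checks:
        print(host, port, "ok" if can_connect(host, port) else "FAILED")
```

Run it from the drupal machine to test the drupal side of each link, and from the fedora machine for the fedora – mysql link.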

Data Flow

How info gets into the systems and back out

DATA IN-

  1. something exists outside the repository
  2. a collection admin navigates to the ingest form and enters various metadata about the object and uploads a digitized version of the object (e.g. an image file)
    • the ingest form depends on the solution packs installed and the form associations made
  3. the form processor takes the metadata and constructs FOXML
  4. the form processor sends the FOXML and digital resource(s) to fedora
  5. fedora stores the object, which consists of metadata and one or more other resources / datastreams
    • at this point the object is accessible via the fedora admin interface and REST API
  6. gsearch indexes the newly stored object
    • at this point the object is accessible via the gsearch interface
    • the indexed fields are controlled by $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt
  7. SOLR processes the newly changed gsearch index
    • at this point the object is accessible via solr
      • able to be returned from queries
      • the object’s presence impacts the data shown in the SOLR schema browser
    • the indexing is controlled by $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml

DATA OUT-

NOTE: there are two primary paths by which data arrives from the object-store: search results and direct object retrieval

DATA OUT: SEARCH-
  1. on the web site the user enters a search term and clicks submit
  2. the search handler builds a SOLR query based on the search term
    • plus any constraints / defaults set in the SOLR client config (e.g. namespace limitation)
  3. the search handler sends the query to the SOLR server
  4. SOLR finds the item(s) and builds a chunk of XML that contains everything it knows about the item(s) – the data from all fields that it’s indexed (which is in turn limited/defined by the fields that gsearch indexes)
  5. the search handler receives the XML back
  6. the XML is converted into an HTML block (via XSLT? explicitly in PHP code? by some other mechanism?)
    • that conversion controls which of the data sent back is actually displayed to the user
  7. the block of HTML is inserted into the content place on a drupal page and is displayed to the user

DATA OUT: DIRECT RETRIEVAL-
  1. on the web site the user clicks on a link to an object
  2. that links to the object display handler with parameters to identify which object
  3. the object display handler constructs a URL to fetch the object data from fedora
    • - REST API? some other system?
  4. the object display handler makes an HTTP request
  5. fedora gives back the FOXML it has for the object
  6. the object display handler processes the FOXML to produce an HTML block (via XSLT? explicitly in PHP code? by some other mechanism?)
    • - in the course of processing it may make additional HTTP requests to fetch additional info about/associated with the object (i.e. the object’s datastreams – e.g. the object’s thumbnail image, or MODS metadata)
    • - some datastreams may be referenced directly in the HTML (or other specialized display tools – e.g. openseadragon viewer) via REST API URLs
    • ! I have no idea how openseadragon actually works / gets its info
  7. the block of HTML is inserted into the content place on a drupal page and is displayed to the user

Useful URLs

These URLs provide ways to interact directly with fedora, gsearch, and SOLR – these are EXTREMELY USEFUL for debugging & exploring. All of these URLs reference the machine on which you’ve installed fedora (and gsearch and SOLR). In these examples I’ll be using fedoramachine.institution.edu in the URL, but that will vary depending on the machine name used in your particular installation.

fedora admin panel/tool

NOTE: this is typically protected by an htaccess wall – you’ll need an account on fedoramachine.institution.edu to use this
NOTE: this is a flash-based tool
manage the repository (view/edit objects; ?maybe upload/ingest) directly (bypass drupal/islandora)

http://fedoramachine.institution.edu:8080/fedora/admin/

fedora REST API

NOTE: this is typically protected by an htaccess wall – you’ll need an account on fedoramachine.institution.edu to use this
read-only direct access to objects

http://fedoramachine.institution.edu:8080/fedora/objects/collectionname:23/

http://fedoramachine.institution.edu:8080/fedora/objects/collectionname:23/objectXML

where collectionname:23 is the PID of the object you’d like to access

gsearch

search the repository using gsearch (bypass SOLR (and drupal/islandora))

http://fedoramachine.institution.edu:8080/fedora/search

SOLR REST API

rebuild indexes, browse SOLR data (bypass drupal/islandora)

http://fedoramachine.institution.edu:8080/fedoragsearch/rest/

SOLR admin (not really admin management – admin read-only)

browse the schema, build / test queries (bypass drupal/islandora)

http://fedoramachine.institution.edu:8080/solr/admin/

SOLR search failure

If a SOLR search is not working there are many potential failure points / stages to check:

  1. is drupal/islandora communicating with fedora (check error logs, check islandora config screen; should be no errors and the config screen should say ‘Successfully connected to Fedora Server (Version 3.5).’)
    • on problems: make sure fedora is running, try a command-line ping from each machine to the other, ?????
  2. is the object making it into fedora (check fedora admin, fedora REST; should be able to navigate to the object or specify the URL directly, and then should see the object’s metadata)
    • on problems: check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
  3. does the object have all the right metadata (check fedora admin, fedora REST; verify values match what was entered in the form)
    • on problems: check the form being used (islandora > form builder), check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
  4. is the metadata in the right places and correctly formatted (check fedora admin, fedora REST; verify that values are in the expected fields, verify that datastreams are appropriately inline or managed)
    • on problems: check the form being used (islandora > form builder), check the catalina.out logs ($FEDORA_HOME/tomcat/logs/catalina.out) and drupal-machine logs (on your drupal machine, /var/log/httpd/error_log), ?????
    • NOTE: as of islandora 7.1 SOLR has problems with managed (type="M") datastreams, but should handle inlineXML (type="X")
  5. is gsearch correctly indexing the object (check gsearch directly; verify that the expected fields show up, and that the object can be found by searching on a known field-value combination)
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, ?????
  6. is SOLR correctly indexing the object (use the SOLR REST API to reindex the given PID (http://fedoramachine.institution.edu:8080/fedoragsearch/rest?operation=updateIndex – 4th action box down), then verify that the expected field names show up in the field name drop down in the browse index page (http://fedoramachine.institution.edu:8080/fedoragsearch/rest?operation=browseIndex), and then that the expected object shows up in the results list (though clicking on the provided link may not work – seems to be some basic issue with the gFindObjects page))
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, check $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml, check $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml, ?????
  7. is the object fetch-able via a SOLR query (check the SOLR admin page (http://fedoramachine.institution.edu:8080/solr/admin/) and build a simple query to get the object via PID (e.g. ‘PID:"collectionname:23"’), then check to see if it shows up in a multi-result list (e.g. ‘format:"cambio"’))
    • on problems: check $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/demoFoxmlToSolr.xslt ?, check $FEDORA_HOME/gsearch_solr/solr/conf/solrconfig.xml, check $FEDORA_HOME/gsearch_solr/solr/conf/schema.xml, ?????
  8. back on the drupal/islandora side, is the SOLR client correctly connecting to the SOLR server (check error logs, check islandora solr client screen; should say ‘Successfully connected to Solr server at fedoramachine.institution.edu:8080/solr’)
    • on problems: ?????
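
For step 7, the same kind of query can be built as a URL and pasted into a browser or fetched with curl. This is only a rough sketch: the host comes from the URLs above, but the /select path and the wt=json parameter are assumptions based on a stock SOLR install, and solr_select_url is a made-up helper name, not part of islandora or gsearch.

```python
from urllib.parse import urlencode

# Host taken from the URLs in this post; the /select handler path is an
# assumption (it is the usual default, but check your own solrconfig.xml).
SOLR_BASE = "http://fedoramachine.institution.edu:8080/solr"


def solr_select_url(field, value, rows=10, base=SOLR_BASE):
    """Build a SOLR select URL that matches one exact field value."""
    # Wrap the value in double quotes so the colon inside a PID is treated
    # as part of the value rather than as a field separator.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "rows": rows, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


# Single-object check by PID, then a multi-result check by format.
print(solr_select_url("PID", "collectionname:23"))
print(solr_select_url("format", "cambio"))
```

If the PID query comes back empty but the object shows up in the gsearch browse index page, the problem is more likely in the SOLR schema/config than in the gsearch XSLT.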

bootstrap confirm dialog and transient alert April 26, 2013

Posted by ficial in javascript, techy, webdev.

I’ve recently been doing some work on a web app that will use / is using the twitter bootstrap framework. There were two pieces of functionality that I wanted that weren’t readily available: transient alerts and confirmation dialogs. A transient alert is a message that appears and then fades away in a short period of time. A confirmation dialog is a modal dialog where some action happens when the user clicks a ‘Yes’ button, and nothing happens when they cancel/close/click ‘No’.

The confirm dialog was a fairly straightforward adaptation of the built-in Modal element in bootstrap. However, the basic one fell short in several ways:

  • The appearance was way too complicated for what I wanted, and simultaneously didn’t stand out enough
  • The keyboard interaction was a bit sub-par; I wanted users to be able to use Enter to confirm as well as Escape to cancel – bootstrap supports the latter but not the former
  • It was difficult to do something useful with a confirmation – the bootstrap Modal element doesn’t actually block execution, so you can’t easily wait for a response in code; you have to bind the relevant action to the click event of the Yes button

To build my confirm dialog I:

  • Tweaked the CSS to reduce the noise and complexity, and to make the element stand out more
  • Constructed a simple, stripped down DOM structure that works with the basic bootstrap Modal system. I store this in the JS file next to the launcher function just to keep relevant things close to each other.
  • Built a JS function that takes a message and a handler that’s called when the Yes button is clicked
  • Used a global to simplify passing data into the handler function

Then, to use the confirm dialog, create an action handler and specify a click event (or other event, I guess) handler that launches the confirm dialog with an appropriate message and the action handler. If needed, first store any relevant data in the global data-passing variable.

Here’s the code for the confirmation dialog (definition and example usage).

 

The transient alert was simpler in some ways in that I decided to build it from the ground up – leaning on bootstrap for some styling, but not for functionality. Essentially, it’s a simple div with replaceable text. When the transient alert is activated the text is changed as appropriate and a fade-out transition is initiated.

Here’s the code for the transient alert (definition and example usage).

 

We Design Our House April 13, 2013

Posted by ficial in house building.

We’ve been living in our house for about 2 1/4 years now (same age as our son) and have been very happy with the way it turned out. The design process was long and intensive, but in the end seems to have worked. We pretty much made up our process as we went along, though it was a bit less haphazard than that sounds. We’re both fairly analytically inclined, and have read a number of design books on a wide range of realms (pottery, urban planning, architecture, landscaping, forestry, woodworking,…). From our background and reading and discussions we concluded that good design processes at their core do two essential things. First, they determine the intersection of values, goals/features, functionality, and complexity (where complexity encompasses cost, resources, and difficulty of implementation). Second, they provide a relative metric for prioritizing development – that is, for any two aspects of the project there is a way of determining which one is more important. We created our design process based on those two points.

In brief summary, the process we used boils down to:

  1. Articulate – brainstorm; clearly list values (abstract ideals), features (specific elements, often measurable/concrete), and functions (things that need to be done or supported)
  2. Analyze – determine how the above relate to each other; use numbers; trim and/or add items as needed
  3. Arrange – group functions, then relate the function groups to each other; use that as the basis for layouts
  4. Adjust – starting with the basis as representing rooms/areas, push, pull, split, and join based on the values and features
  5. Appraise – determine whether a given design is ultimately acceptable; re-consider the numbers assigned to values and features; if a design does not work, either adjust it further or start again with the layout basis
  6. Accept – decide that a given design is good enough; make sure it’s actually doable (i.e. check that it won’t fall down and can be built within budget)

(okay, so I’m a sucker for alliteration)

In rather more detail….

Our first step was to create a list of the values we cared about. These are the intangibles, subjectives, and other fuzzy goals. We did this with an initial brainstorming session of about an hour plus additional ideas and refinements over the next week (though in a sense we did it in various conversations and day dreams over 5+ years, or in our heads over our lifetimes). A value would be a single word or short phrase, with 1-3 sentences detailing/expanding the idea. After a week we revisited our ideas and trimmed and combined to create a set of 15 values:

  • Beauty: It’s important that things be pretty; aesthetics counts, not for everything, but for something.
  • Being Outside: We like being outside, for reasons both physical and spiritual/mental.
  • Choice/Freedom: Within limits, do what we want. Not unduly tied down/constrained by our home/yard.
  • Comfort: Mostly physical comfort. Good food, comfy temperature, no bugs, etc.
  • Community: Family, friends, neighbors, fellow residents of the town, county, state, nation, world, watershed, and bioregions.
  • DIY: We derive joy and satisfaction from doing things ourselves, even when it’s not necessarily the most efficient/effective approach.
  • Eco-integration: The house and grounds and residents are a part of the local eco-system.
  • Environmental Stewardship: Taking care of the local and larger environment, making it better/healthier, preserving it for the future
  • Fun: Having fun is not being happy, but it IS a PART of being happy.
  • Health: Mental and physical health: exercise, nutrition, safety, peace of mind, anti-stress
  • Learning: We like both learning and knowing stuff.
  • Positive Legacy: For future generations and residents.
  • Preparedness: Future-aware, mindful of what will and may come. Preparing for possible and unavoidable life changes.
  • Privacy: Both from locals and from family/friends.
  • Self Sufficiency: But not to the point of cutting off the rest of the world.

We waited a bit to let our heads clear, then did a similar sort of brainstorming session for goals/features. These are specific aspects / elements we would like to be present in the finished house. Our features list ran to 52 items:

  • Annuals gardens
  • We have sufficient and appropriate space for our hobbies
  • We grow/raise/forage much of our own food
  • Long term timber planting
  • Fungiculture
  • We perform horticultural/fungicultural experiments and research
  • Design around solar more than other local energy sources
  • Maple syrup (harvesting sap, boiling to syrup)
  • Others’ waste is our resource
  • Local materials used preferentially
  • Wood lot large enough to provide all heating needs, with a buffer.
  • Both private and public inviting areas
  • We have a variety of decorative plants and hardscape
  • Divisions of wild and civilized – clear what is ours
  • Fun/silly (fantasy nature trail, art (e.g. giant chair))
  • Our land has a style/flavor that we enjoy
  • Streams are good
  • Aesthetically and practically not car dominated
  • Yard/land has large and small art
  • Expose infrastructure, hide services
  • Good views (peaceful, inspiring, alive)
  • We like outbuildings
  • We have comfortable outdoor space
  • A place to be outside while it’s raining but remain dry
  • Market gardening an option
  • Injury and old age friendly
  • Consider garden/yard maintenance: total time (more than a little, but not too much), total fussing (some for a few plants), consider peak times and other fluctuating time demands, what kinds of garden work do we prefer?dislike?
  • We can travel/go on vacations (a week or more, but less than a month)
  • Possible rental potential
  • Our land/home is adaptable and/or planned for changes in our life
  • Debt free
  • Non-wage income sources/options
  • Kid friendly (fun, educational, safe, not poisonous)
  • Variety of fruits and nuts for fresh eating and preserving
  • Varmint control (micro, macro and mega)
  • Comfortable year round (house and land/yard)
  • We have leisure time.
  • We want people to visit, but not for too long
  • Guests (short and long term) like visiting us
  • Potential for long term guests and relatives
  • Compatible with the camp up the road
  • Compatible with area population growth
  • Minimize waste output
  • Much diverse habitat
  • We use water wisely
  • Bats/bat habitat
  • Bird houses (fowl & wild birds)
  • Hedging bets on climate (drought, flood, fire, heat, cold)
  • Durable – build for the long term (stable design, the essential parts will work indefinitely with reasonable maintenance)
  • Diversity is very important, plant wise and other
  • Minimal imported energy needs/use
  • SCIENCE! Observe & record, plots, labels, sensors, place to do analysis

We then put our values and features in a big matrix in a spreadsheet – values as the column headings and features as the row headings. For each feature we read across and put a 1 in the column if that feature contributed significantly to that value. Sums across then indicate the breadth of support the feature has for our values and thus give a rough metric for comparing the importance of features, and sums down indicate the depth of support for a given value and thus give a rough metric for comparing the importance of values. After that we took another couple of weeks break to mull things over and tweak numbers and weightings a bit.
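
For the spreadsheet-averse, the row and column sums are easy to reproduce in a few lines of Python. This is just an illustrative sketch: the three values and three features are picked from our lists, but the 0/1 scores here are invented for the example and are not our real numbers, and the function names are made up.

```python
# Hypothetical excerpt of the matrix: the 0/1 scores are invented for
# illustration and are NOT the numbers from the real spreadsheet.
values = ["Beauty", "Being Outside", "Self Sufficiency"]
matrix = {
    "Annuals gardens": [1, 1, 1],
    "Good views (peaceful, inspiring, alive)": [1, 1, 0],
    "Debt free": [0, 0, 1],
}


def feature_breadth(matrix):
    """Sum across each row: how many values a feature supports."""
    return {feature: sum(row) for feature, row in matrix.items()}


def value_depth(values, matrix):
    """Sum down each column: how many features support a value."""
    return {
        value: sum(row[i] for row in matrix.values())
        for i, value in enumerate(values)
    }


# Rank features from broadest support to narrowest.
ranked = sorted(feature_breadth(matrix).items(), key=lambda kv: kv[1], reverse=True)
for feature, score in ranked:
    print(score, feature)
print(value_depth(values, matrix))
```

Swapping the 1s for weights (as in the later tweaking of numbers and weightings) only changes the cell contents; the sums work the same way.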

The last major brainstorming step was coming up with the set of functions. We kicked this off with an hour or so of jotting down ideas on note cards, and kept adding to the pile over the course of a week or so. These are very short descriptions of things that we wanted or needed to do. E.g. cook and bake at the same time and with multiple people working at once, store the trash between trips to the transfer station, support recycling, have a place for the cat litter, host a food-oriented gathering of 15+ people, store and present our library of 2000+ books, and so on. We came up with MANY functions, from necessary to frivolous. Then we spent a long evening doing a big sort. The functions got put into 3 piles. The first pile was the essential functions and included things like sleeping space, cooking, etc. The second pile was things that we cared about a lot but weren’t REALLY a necessity for having a house (e.g. a place to start seeds in the spring, a place for guests to sleep, etc.). The third pile was things that would be nice to have but we wouldn’t be too disappointed if they didn’t make it. Technically there was also a fourth pile of things we decided we didn’t care about, but we just tossed those in the recycling. Some functions were easy to place, but many were part-way between two piles. To decide which pile to put the liminal functions in we referred to our value-feature matrix to see the relative importance of the various values and features to which that function related.

The next step, a couple of nights later, was to collect the functions into related groupings – e.g. ‘a place to get dressed’ and ‘a place to store clothes’ would go together. First we did the groupings for the essential functions. Then we tried to fit the desired functions into those groupings (and did some reorganizing as we went). Then we sorted our nice-to-have functions into those groups, with minimal additional reorganizing. At the end of this we had our functional clusters, which began to give us a picture of what rooms / areas our house would have.

The next session we did a quick re-check of the groupings, did a little reorganizing, and created cluster cards. A cluster card had a title (e.g. ‘master bedroom’, ‘group C’, ‘laundry’, etc.) and a list of the functions – at this point we stopped working with the individual function cards. We then explored relationships among the clusters. E.g. the laundry is linked to the bedrooms (where the undressing functions reside) and the outdoors (where the air-drying function resides). Some links were based on the functions in the clusters, while others were informed by our features and values. We did this by laying out all the cluster cards and moving them around, using distance to indicate degree of linkage (farther apart meant less linked). We took snapshots and notes about our results, then put the cluster cards away for the night. We repeated the cluster relations exercise a few times before we settled on an arrangement that satisfied us. Once the clusters were arranged it was a straightforward step to drawing lines between nearest neighbors, which indicated how the rooms in our house would be arranged/connected, or at least what the arrangement priorities would be when placing the rooms.
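
We did the nearest-neighbor step by eye with cards on a table, but the idea can be sketched in code. Everything specific here is hypothetical: the card positions are invented, ‘kitchen’ is just a stand-in cluster name, and nearest_neighbor_links is a made-up function name.

```python
from math import dist

# Hypothetical table positions for a few cluster cards (arbitrary units);
# closer together means more strongly linked.
cards = {
    "laundry": (0.0, 0.0),
    "master bedroom": (1.0, 0.0),
    "kitchen": (4.0, 0.0),
    "outdoors": (0.0, 1.5),
}


def nearest_neighbor_links(cards):
    """Link each card to its single closest card; return unique pairs."""
    links = set()
    for name, pos in cards.items():
        nearest = min(
            (other for other in cards if other != name),
            key=lambda other: dist(pos, cards[other]),
        )
        # Store each pair in sorted order so A-B and B-A count once.
        links.add(tuple(sorted((name, nearest))))
    return sorted(links)


for a, b in nearest_neighbor_links(cards):
    print(f"{a} <-> {b}")
```

The resulting pairs are the lines drawn between cards, i.e. the candidate doorways and halls.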

The actual layout began with the cluster relations as rooms and the connections between them as doorways and halls. That starting layout was then pushed, pulled, and twisted as we applied constraints and principles of efficient space use, passive solar design, monetary resources, required services, and, for want of a better description/name, human experience design. The first four are fairly straightforward to understand and apply. For efficient space use we looked to minimize pure-transit space and to assess objectively the space actually needed for the functions assigned to each room/realm. This was a bit tricky on the technical side, but mostly just a physical puzzle. Passive solar design basically boils down to 1) good solar orientation, 2) lots of south-facing glazing (at least, for passive solar heating, which we needed) and correspondingly little north-facing glazing, 3) sun-accessible thermal mass to soak up the heat, and 4) good insulation and a tight shell. Monetary resources were simply savings + income + loans. Required services refers to things like making sure sewer, water, and electricity are available where needed, as well as things like a vent for the dryer, a chimney for the woodstove, etc.

Human experience design is bringing intention to the human experience of being in a space/structure. This is what gives a building its character / feeling. In our values and features we had a number of items pertaining to the experience factor – e.g. both private and public inviting areas, good views – but little initial idea of how to accomplish/realize them. In our readings and research related to this we also came upon several other ideas / concepts / design principles which we adopted. Since this post is mostly about the process of design I won’t go into detail on this topic. However, if you’re interested in it, here are some of the books we found especially useful/relevant/inspiring:

We also did an exercise where we listed all the houses and other structures that we’d been in and could clearly picture and discuss, and what about each of those houses felt good to us and in what way, and which elements of those houses contributed to those feelings and how/why. After a while, with the readings and the exercise, we felt we had some decent grasp of some ways to generate / evoke the experience we wanted to have in our home.

We came up with a number of theoretically viable designs before we settled on the one that led to our current home. Each time we came up with something that looked workable we held it up to our values and goals and double-checked both that it sufficiently met them, and that we felt those goals and values still applied. Some major decisions that led to new layouts:

  1. Small-to-medium house rather than tiny house – our hobbies require significant infrastructure, we have a good sized collection of books, and we wanted to have a kid
  2. No earth roof – after seriously considering one design we decided that the technical challenges of building an earth roof for a hoped-to-be-200+-year-structure were just a little too much. There are certainly ways to do that, but we felt that those ways would too heavily impact other values and features.
  3. No earth berming – we weighed the technical and design challenges of berming and eventually decided that we could get much the same benefit through other means (mainly serious insulation and serious thermal mass) and ditching the berming offered many benefits (e.g. exit-capable windows in every room, more light, less worry about water infiltration, simpler structural requirements)
  4. Willing to hire someone for some of the construction and to take on significant debt – this was a really tough choice / decision. Being willing to take on lots of debt meant that we could make a lot of capital-intensive investments which we expected to pay off in the long term (e.g. even more insulation, active solar home heating and DHW system, standing seam metal roof), and hiring a contractor meant a lot less stress. We were at a point where we could have quit our jobs and worked on the house full time, or hired someone else to deal with major parts. After doing the calculations we determined that we’d actually save/make more money by continuing in our current jobs. In retrospect, our calculations may have been off… but we’re still happy with the house we have.

It was very, very useful and important to make quick 3D mock-ups of our designs in SketchUp (http://www.sketchup.com/), a free and easy-to-learn-and-use lightweight CAD program. Having a 3D model both let us get a much better sense of what it would be like in a given space, and highlighted structural issues that weren’t as readily apparent on paper – one initially-promising design was discarded because the 3D mock-up revealed that a major beam would have to cross the stairway at about 4 feet over the steps. We also ran designs by many friends and family to get critical feedback. Having lots of other people looking at designs and raising questions and/or making suggestions was vital – we certainly didn’t incorporate all of their ideas, but some made it in after due consideration.

Eventually, we settled on a final design. Final-ish, anyway. At that point we brought in some professionals to do some sanity checks and maybe make additional suggestions. We talked first to a general contractor (whom we eventually hired) for general advice/ideas (a few of which we followed; in retrospect we should have ignored a couple of them (mainly ones related to design / human experience), but others were in fact quite useful (in the technical building realm) – lack of actual experience can be a real hindrance in figuring out which professional advice can be safely ignored/discarded), and then to a timber framing company to ensure our house would actually stand up (we hired that company to procure and cut the beams, which our GC then assembled (with both of us working as assistants / peons on his crew)).

From brainstorming values to the truly final design of our home took probably about 18-24 months (my memory is hazy on the timing details). The first part (articulate, analyze, and arrange) took probably 2 months. Most of the remaining time was all about looping through the adjust-appraise cycle. The final month or two dealt with incorporating feedback from the professionals (e.g. in our first final design we’d conservatively calculated maximum spans of 10 feet, but the timber frame maker was able to extend that to 16, which changed the post locations and thus some of our wall locations).

Overall, it was a great experience in which we learned a lot (e.g. never start your home building project in New England in October), and now we live in the house of which we hadn’t before realized we’d always dreamed.
