
Islandora 7 – permissions for multisite and XACML June 30, 2014

Posted by ficial in islandora, multisite, techy, XACML.

TLDR:

We used drupal multi-site to organize our repository. Permissions/access is a challenge. We used namespaces that are unique to each collection for site-level access, which required a bit of custom coding to support and has some limitations, but took much less coding and is more stable than the other main option we considered (implementing real per-collection access). Supporting XACML effectively across multiple sites then requires a separate user table for each site (backed by LDAP to unify login credentials), a separate entry in the fedora filter-drupal.xml file, and appropriate privileges on that site’s DB granted to the mysql user that filter-drupal specifies. Setting up a new site requires action in three separate areas:

  • drupal : create the site using standard multi-site approach (share ldap_authorization and ldap_servers tables)
  • mysql : grant access to the new database to the user that fedora uses to check credentials
  • fedora : add an entry for the new site/DB to …/server/config/filter-drupal.xml, then restart fedora

LONG:

In planning our repository a major challenge was presenting our objects, and controlling access to and management of them, in a way that more or less matches how our users want and need things to work. Conceptually, our system has a tree-like structure, with potential for cross-connections. At the root is our over-all site http://unbound.williams.edu, which provides a face for the program / system as a whole and a convenient place / way to search across all the (unrestricted) objects in our system. From there we have project sites, which correspond to a particular department (e.g. http://unbound.williams.edu/williamsarchives), institutional project (http://unbound.williams.edu/facultypublications), or individual project (http://unbound.williams.edu/mayamotuldesanjosearchaeology). Within a given project there might be a single collection or multiple collections. A person or department might in turn have a single project or multiple projects, or might be involved in different projects and/or collections in different ways (e.g. managing one project, contributing to another collection, and having read-only access to a third, protected collection).

Setting up the technical infrastructure and processes to support the above model was (and continues to be) challenging. We used a drupal multi-site system to organize the main site and project sites. We leveraged islandora’s built-in namespace restriction capabilities to limit given collections to given sites. We did this by associating each collection with a unique namespace. This allows us to very easily include a given collection in multiple projects (e.g. the faculty articles collection might be in both the faculty publications project and the archives project). Essentially, we wanted to be able to support object access on a per-collection basis, but the built-in support only worked with namespaces, so we made them (semi-) synonymous. There were a couple of technical challenges to making this work, and there are also some less-than-ideal limitations that go with this approach.

On the technical side, there are two places that namespace restrictions come into play: repository access and search access. On the back end there seems to be no limit to the number of namespaces that can be specified for these two areas, but the web form elements that are used for them limit the content to something too small for our purposes. We went through two levels of work-around here. First, we changed the form elements for those fields from basic inputs to text area / paragraph inputs. However, we still had the problem that there were two separate places where namespaces had to be managed, which could easily lead to problems that would greatly impact user experience. So, we created a custom module that provides a single interface that controls both areas – the namespace list that’s entered in that one field is used to set both the SOLR preferences and the site namespace config values. With this in place our namespace list for a given site might become pretty long, but it’s easy enough to manage and we never end up in a situation where there’s a mismatch between the search-based access and the repository/site-based access.
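
To make that concrete, here’s a stripped-down sketch of the kind of submit handler such a module might use. This is an illustration of the idea rather than our actual module code – the form/field names are made up, and the variable names (islandora_pids_allowed for the repository-side restriction, islandora_solr_namespace_restriction for the SOLR side) and value formats are what I’d expect from a stock Islandora 7 install, so verify them against your own site before borrowing anything.

<?php
// Sketch only: the field name is hypothetical, and the variable names and
// value formats are assumptions about a stock Islandora 7 install.
function mymodule_namespace_admin_form_submit($form, &$form_state) {
  // Single textarea, one namespace per line, with or without a trailing colon.
  $lines = explode("\n", $form_state['values']['namespace_list']);
  $namespaces = array();
  foreach ($lines as $line) {
    $ns = rtrim(trim($line), ':');
    if ($ns !== '') {
      $namespaces[] = $ns;
    }
  }

  // Repository / site-level restriction (assumed format: space-separated
  // namespaces, each with a trailing colon, e.g. "ns1: ns2:").
  $allowed = '';
  foreach ($namespaces as $ns) {
    $allowed .= $ns . ': ';
  }
  variable_set('islandora_pids_allowed', trim($allowed));

  // SOLR-side restriction (assumed format: comma-separated namespaces).
  variable_set('islandora_solr_namespace_restriction', implode(',', $namespaces));
}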

On the data structure side of things this approach creates some hard limits on what we can do. We’re trying to emulate collection-based access control, but this doesn’t do that exactly. It fails in two main ways. First, an object’s namespace isn’t necessarily the same as that of a collection that contains that object. When an object is in more than one collection we’re guaranteed that there’s a mismatch for at least one of the collections. To try to get around this we divide our object sets more finely than we otherwise might and use site-level grouping to bring them together rather than collection-level grouping. Second, we lose hierarchical object access control. In a pure collection-based approach we would be able to nest collections and specify access by the top-level collection, but since each collection is its own namespace we have to manually manage access to whole hierarchies as individual elements. Neither of those two limitations is a game-stopper, but they do need to be taken into consideration when ingesting a new set of objects and setting up new projects and collections.

In an ideal world we’d have used collection membership directly for access control, but doing so would have required rather a lot of custom coding to implement. Essentially we’d have had to create a whole new set of fields and corresponding web forms that paralleled the namespace ones. Additionally, to make hierarchical collection membership work appropriately we’d have had to get tangled in building and maintaining additional relationship fields in the RELS-EXT datastream. All certainly possible, but in our situation it would have required too much work and been too prone to implementation errors. We deliberately sacrificed functionality to gain stability and low technical investment and upkeep. So far it’s working OK for us.

Though we’re using namespaces as the primary way of associating given collections with given sites, we still have the challenge of restricting access to collections (and individual objects) within a site via XACML. There are some subtleties here due to how fedora checks permissions. Essentially, fedora has a component that checks in with the drupal database to verify that a user is authenticated and to see what roles the user has. This is explained briefly in the ‘Configure the Drupal Servlet Filter’ section at https://wiki.duraspace.org/pages/viewpage.action?pageId=34638844, with a very general directive to “use the Drupal LDAP module” to avoid granting too much access. Making all that actually work required a certain amount of further research and experimentation on our part.

We use LDAP for our central authentication system, and connect to it for our islandora system using LDAP for drupal 7. That package has a lot of sub-pieces, only three of which we found necessary to get things working: LDAP Authentication, LDAP Authorization, and LDAP Authorization – Drupal Roles (though one could probably get away with just the first). Once that’s set up for our main site we can simplify spinning up additional sites by sharing two key tables across the sites: ldap_authorization and ldap_servers. The modules still need to be enabled for new sites, but since the tables are shared no additional configuration is needed. Additionally, if our LDAP config needs to change then changing it once automatically covers all the sites. We do the table sharing by setting up one drupal install as the primary (in our case it’s our main site, using a database named main_drupal) and using the prefix attribute of the $databases settings variable in the individual site settings files (see below for an example).

We originally shared the user tables as well, but that caused serious problems when trying to use XACML to control object access by role. The fedora component that checks in with drupal about user validation and roles has an interesting behaviour: it combines all the roles that a given username-password combination has across all sites. So, with a single, shared user table a user effectively has the same username and password for all sites, which means they would get, on every site, any role they have on any site. In other words, making a user an admin on one site would give them admin access to all objects on all sites. So, we have separate user tables. However, because we’re using LDAP as our authentication system this doesn’t impact user management – all the user management happens external to drupal anyway.

However, since we’re using separate databases to hold all those different user tables (and other site-specific stuff, of course), the db user that fedora uses to check user authentication and roles needs to be given access to those databases. One could create a user and give them universal grants, but that seems suspect from a security standpoint. So, each time we create a new project site we need to make sure to grant that user select privileges on the new database. Also, simply granting the user those privileges isn’t enough in itself; the fedora component also needs to be configured to actually check the new database. This is done by adding an additional connection specification in the …/server/config/filter-drupal.xml file.

I think that fedora makes a separate DB connection for each entry in the file, so at some point one runs into issues of scalability, where for any kind of islandora access to fedora data the system is checking against N databases. Hopefully fedora uses some sort of connection pooling and caching to mitigate this somewhat, but I don’t really know.

In summary, to set up a new islandora-enabled instance of a drupal multi-site:

  1. have LDAP installed and configured for some primary site (which for the purposes of the example below uses a database called main_drupal)
  2. do all the usual multi-site set-up stuff
  3. in the new site’s settings.php file, specify the primary site db as the prefix for the ldap_authorization and ldap_servers tables

    $databases = array (
      'default' =>
      array (
        'default' =>
        array (
          'database' => 'sitedbname',
          'username' => 'a_db_user',
          'password' => ',jdFN3952oiU54h6n2o987ytglaKEn68Yu34',
          'host' => 'mysql-machine.institution.edu',
          'port' => '',
          'driver' => 'mysql',
          'prefix' => array(
            'default' => ''
            ,'ldap_authorization' => 'main_drupal.'
            ,'ldap_servers' => 'main_drupal.'
          ),
        ),
      ),
    );
  4. in the database that backs drupal, grant the fedora db user select access to the new database (it probably really only needs access to a few specific tables – users, users_roles, role – though that’s more work to specify and maintain; see the sketch after this list)

    GRANT SELECT ON SITEDBNAME.* TO 'fedora_mysql_user'@'fedora-machine.institution.edu';
  5. on the fedora host add an entry to …/server/config/filter-drupal.xml for the new database


    <connection server="mysql-machine.institution.edu" dbname="sitedbname" user="fedora_mysql_user" password="nRExw890zV34hl56N245AV078kk45" port="3306">
      <sql>
       SELECT DISTINCT u.uid AS userid, u.name AS Name, u.pass AS Pass, r.name AS Role FROM (users u LEFT JOIN users_roles ON u.uid=users_roles.uid) LEFT JOIN role r ON r.rid=users_roles.rid WHERE u.name=? AND u.pass=?;
      </sql>
    </connection>

  6. don’t forget to restart fedora so that the new filter-drupal stuff is used
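
As an aside to step 4, the narrower table-level grants mentioned there would look something like this (a sketch only – the three tables are the ones the filter query in step 5 actually reads):

    GRANT SELECT ON SITEDBNAME.users TO 'fedora_mysql_user'@'fedora-machine.institution.edu';
    GRANT SELECT ON SITEDBNAME.users_roles TO 'fedora_mysql_user'@'fedora-machine.institution.edu';
    GRANT SELECT ON SITEDBNAME.role TO 'fedora_mysql_user'@'fedora-machine.institution.edu';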

Islandora 7 – splitting CSV data on ingest June 24, 2014

Posted by ficial in code fixes, islandora, techy, xsl.

TLDR:
It’s tricky to tokenize CSV values on ingest using a MODS form. To do so, create an XSL that does the tokenizing in …/sites/all/modules/islandora_xml_forms/builder/self_transforms/ and set it as the self-transform for the relevant form. You’ll need to write your own CSV tokenizer since Islandora 7 uses an older version of XSL that has no built-in tokenize function. See below for example code.

LONG FORM:
In our Islandora install we’re using MODS as the main meta-data schema. That is, the ingest forms are set up for generating MODS XML. However, the way the form is set up is anti-helpful for some of the people that are doing our data loads. Specifically, the subject-topic, subject-geographic, and subject-temporal fields were not being processed as people expected.

Those three fields are multi-value ones, meaning they support a structure like:

...
<subject>
  <topic>cows</topic>
  <topic>bovines</topic>
  <topic>farm animals</topic>
  <geographic>field</geographic>
  <geographic>farm</geographic>
  <temporal>historic</temporal>
  <temporal>1800s</temporal>
</subject>
...

However, when using the form we want to be able to enter them as CSV values – e.g. ‘cows, bovines, farm animals’. Unfortunately, the default behavior is to treat such an entry as a single value, giving a result like:

...
<subject>
  <topic>cows, bovines, farm animals</topic>
  <geographic>field, farm</geographic>
  <temporal>historic, 1800s</temporal>
</subject>
...

The Islandora 7 ingest forms system does provide a place where this can be corrected, but it’s subtle and tricky. Specifically, one has to create an XSL to do the proper tokenizing and set that up as a ‘self transform’ for the form. Creating the tokenizing XSL is in turn made more difficult because Islandora 7 uses XSL earlier than 2.0, which means that there is no built-in tokenizing function. The place this needs to be done is in …/sites/all/modules/islandora_xml_forms/builder/self_transforms/, which took me a while to find because I was misled by the ‘builder’ folder – code in that folder relates not only to the building of forms, but also to the using/processing of forms.

Following some suggestions on various sites, I organized my tokenizing code in a separate file and included/imported it into the self-transform. Here’s where I ended up:

TOKENIZER (csv_tokenizer.xsl):
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:mods="http://www.loc.gov/mods/v3">
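    <!-- Recursively splits $commaStr on commas and wraps each piece (leading
         whitespace stripped) in an element named by $tagLabel, e.g. mods:topic. -->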
    <xsl:template name="csvtokenizer" >
      <xsl:param name="commaStr"/>
      <xsl:param name="tagLabel"/>
      <xsl:if test="normalize-space($commaStr) != ''">
        <xsl:choose>
          <xsl:when test="contains($commaStr, ',')">
            <xsl:call-template name="csvtokenizer">
              <xsl:with-param name="commaStr" select="substring-before($commaStr,',')"/>
              <xsl:with-param name="tagLabel" select="$tagLabel"/>
            </xsl:call-template>
            <xsl:call-template name="csvtokenizer">
              <xsl:with-param name="commaStr" select="substring-after($commaStr,',')"/>
              <xsl:with-param name="tagLabel" select="$tagLabel"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:if test="normalize-space($tagLabel) != ''">
              <xsl:element name="{$tagLabel}">
                <xsl:value-of select="substring($commaStr, string-length(substring-before($commaStr, substring(normalize-space($commaStr), 1, 1))) +   1)"/>
              </xsl:element>
            </xsl:if>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:if>
    </xsl:template>
</xsl:stylesheet>

SELF TRANSFORM (cleanup_mods.xsl - NOTE: this also removes empty fields):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mods="http://www.loc.gov/mods/v3">
<xsl:import href="csv_tokenizer.xsl"/>
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" media-type="text/xml"/>
<xsl:strip-space elements="*"/>
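<!-- Drop elements that have no content at all (this handles the empty-field removal noted above). -->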
<xsl:template match="*[not(node())]"/>
<xsl:template match="mods:subject/mods:topic">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:topic'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="mods:subject/mods:geographic">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:geographic'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="mods:subject/mods:temporal">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:temporal'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()[normalize-space()]|@*[normalize-space()]"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

I could have combined the three tokenizing templates into a single one with an or-ed match pattern and a dynamic tag label, but I find the code here much easier to read and the maintenance cost very low.
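
For illustration, that combined version might look something like this (a sketch, not what we actually run – it leans on local-name() to rebuild the tag label):

<xsl:template match="mods:subject/mods:topic | mods:subject/mods:geographic | mods:subject/mods:temporal">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <!-- Rebuild e.g. 'mods:topic' from the local name of the matched element. -->
    <xsl:with-param name="tagLabel" select="concat('mods:', local-name())"/>
  </xsl:call-template>
</xsl:template>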

The self-transform runs before any other transforms, so the splitting done here propagates downstream without any further work.

Expectations in Game Design June 18, 2014

Posted by ficial in game design, games.

A key part of good game design is matching the players’ expectations to the game play actually delivered. This is one of the key points where theme is relevant (though certainly not the only one). More broadly speaking, expectation management is the primary issue of importance where the mechanical aspects of a game (specific rules, general complexity, length of play, etc.) intersect the non-mechanical aspects (theme, graphics, physical pieces, etc.). Kinds of expectations can be divided into two broad categories: mechanics and experience.

Mechanical expectations have to do with how closely the rules of play match the assumptions / intuition of the player. Essentially, this is a process of drawing on the out-of-game assumed background that a player has (e.g. gold is worth more than silver, people need to be fed, wood burns, etc.) and using symbols and story to match that to the mechanical elements of play (i.e. the rules and game state). For example, if a game includes some bits labeled ‘coins’, then players naturally understand the idea of spending them to purchase a building. If those bits are instead labeled ‘cows’ then it requires more explanation for a player to understand that some number of them may be converted into a building – it becomes a less intuitive rule. Conversely, if there’s a rule that says that at certain times having two of those bits allows a player to get a third, then ‘cows’ makes sense in that two animals can be bred to produce another, whereas ‘coins’ requires something much more abstract to justify the addition. In this realm the theme of the game suggests the general kinds of actions that are and aren’t available and the kinds of outcomes that might be expected from those actions. The physical pieces both signify particular things (e.g. larger and heavier things are more important; given two kinds of markers, having one green and one brown is less meaningful than having one green and shaped like a leaf and the other brown and shaped like a horse), and suggest what those bits are used for (e.g. a gold colored disc to represent a valuable coin makes a lot more sense than a white cube; a token shaped like a bone might be fed to a dog, or used to build a skeleton; etc.). Graphics allow for illustrative suggestions of relevant rules, and also allow for easy reference to other parts of the game (via illustrations or more abstract icons / symbols).

When a designer has done a good job managing the mechanical expectations, the resulting game is much easier to learn, teach, and play. The actual play tends to be smooth, and faster than it otherwise might be. This is usually what people are talking about when they say a game is ‘well themed’ or ‘the theme is well matched’. When mechanical expectations are not managed well, players have a hard time learning the game, and even after they learn it play tends to be slower and players are more likely to miss and/or misinterpret rules. Criticisms tend to be things like ‘it didn’t make sense’ or ‘the theme was pasted on’. Overall, doing a good job with mechanical expectations turns a set of rules and abstract ideas into a good game. To turn a good game into a great game requires managing experiential expectations.

Player experience is the emotions and thoughts that a player has during the course of play — are players playing to have fun, or to compete? where/how does a player get a sense of accomplishment? when does the player feel the most tense, and why? how does player A feel about player B (in the context of game play)? does play feel deep and complex, or light? Experiential expectations are a much fuzzier concept than mechanical ones, in large part because player experience depends so much on the players themselves. The tools for setting expectations are the same – art, setting, iconography, language, story, etc. – but the goal in this case is not to draw parallels between exterior context and in-game elements, but instead to put players in a frame of mind where the experiences that the designer is attempting to create are easy to achieve and more intense when they happen. The art of the game can influence expectations via style, color scheme, size / prevalence, and subject. The setting can suggest particular feelings (e.g. when a player is told that a game is set in a dark cave then they’re much more prepared to feel limitation, enclosure, isolation, and fright than if they’re told the game is set in a sunny field). There are whole disciplines devoted to thinking about how iconography and typography affect a viewer’s feelings (https://www.google.com/?gws_rd=ssl#q=how+typography+affects+feelings). Language of course has a huge influence (compare ‘the triangle token follows the round token’ to ‘the tiger token stalks the farmer token’), and story or a less rigid narrative element allows an even more effective manipulation of player feelings. Video games can also borrow tricks from all the expertise the movie industry has developed – music, sound, motion, visual effects, background action, etc.

There’s also an interesting sub-set of experiences that can be thought of as having a target magnitude / degree – pretty much any aspect where a player might ask ‘how much’. (e.g. how much depth is there? how much cooperation? how much luck? etc.) For these, the target for managing expectations is actually slightly offset below (or ‘closer to neutral than’) the degree of effect the designer is trying to evoke during play. The relation of the actual experience to the expectation can greatly affect the intensity/excitement of the experience. Consider an experience for which the designer has established an expectation of level E, and a player’s actual experience at level A. When A is less than E then the player is bored / underwhelmed with that aspect of the game. When A is equal to E then the player is satisfied – the game delivered what it promised. When A is just a little bit more than E then the player is excited because the game has surpassed their expectations – this is the sweet spot of experiential expectation management. When A is a lot more than E then the player is overwhelmed and blocks out that part of the play experience or loses interest entirely.

Overall, setting the players’ expectations is a vital aspect of game design. A game that is mechanically good will be disliked if the players are expecting one thing but getting another, while a game that might be mechanically uninteresting or even quite flawed will be thoroughly enjoyed if it’s clear about the experience it delivers and that experience matches what the players want.

Kickstarter for Business Mogul Board Game May 15, 2014

Posted by ficial in games.

https://www.kickstarter.com/projects/747414366/mogul-boardgame-purple-valley-games-first-publicat

ABOUT THE GAME

In Business Mogul players are the leaders of rising corporate empires, competing with each other to amass the most power. It has gone through many, many rounds of revision, and local play testers really like it. They’ve noted the many interesting decisions it offers, the way the strategic importance of different elements shifts as the game progresses, the very minimal down time, and the fun process of play even when they’re not in the lead. Plus, while a player can feel good about winning through skillful play, it has just enough luck that players don’t feel bad about losing.

Game play is centered on building an economic engine and has a rich, balanced bidding mechanic. The game has a fixed number of turns – six seasons of five rounds each – and total play time tends to be about 2-2.5 hours for 4 people (the game is designed for 3-5, with a solid 2-player variant), on par with a feature movie. In every round there is one item per player up for grabs – every player gets an item every round. The items on which players bid are businesses that are used to build up their economic engine, extra actions to be saved until needed most (and they let a player do some things outside the normal rules of play), special agenda cards that give a score bonus at the end of the game (how much depends on how well the player fulfills the conditions for the given card), and events (which are to be avoided when possible). Players bid for pick/draft order rather than on a particular item. Each player starts each season with a set of nine bidding cards, which represent how the player, as the head of their businesses, is dividing their attention that round between trying to get the best of the available items and getting other stuff done. Each bid has a specific action that it allows (lower bids have more powerful actions), and bids saved for the end of the season (typically players have 4 bids left over each season) are worth money – the higher the bid the more it’s worth. These additional aspects associated with each bid make for exciting, meaningful choices every round. The key to winning is being tactically smart about balancing pick order vs. action vs. end-of-season-payout, paying attention to the state of other players’ empires and the bids they’ve already used in the current season, and executing a strategy that adapts to the flow of items up for bid.

Business Mogul is a fairly heavy-weight game, offering rich decision making, multiple viable strategies, high replay value, and lots of room for player improvement. There are some fairly common game mechanics/attributes that Mogul specifically does not have:

  • It has no player-to-player aggression – the bidding each round is the only point of direct competition.
  • It does not have random action outcomes (that is, whether or not a given action succeeds is not determined by dice, card draw, or anything like that).
  • It does not have any kind of territory control.
  • It has no resources / production beyond money and end-game score.
  • It does not require memorization of what’s been played; aside from a single hidden card each player gets at the start and the particular bid a player is making in a given round, the state of the game is fully revealed.

ABOUT THE PROJECT

Business Mogul was designed by Chris Warren over the course of several years of development and playtesting. Chris has been a serious amateur game designer for over 10 years, having co-founded a weekly game design group in 2001, and he recently founded a small publisher, Purple Valley Games. Business Mogul is the first game he is trying to get published. The kickstarter campaign has a primary backer level of $35 for the game & domestic shipping (international shipping is extra) and runs from now until June 8th. The game play design is finished and well tested, and the game assets will be revised/upgraded before publication – the pictures and demonstrations currently available show a final-stage prototype.

SUMMARY INFORMATION

Name: Business Mogul

Designer: Chris Warren

Number of Players: 3-5, with a 2-player variant

Recommended ages: 12+

Play time: 90-150 minutes, depending on number of players and player experience level

Purple Valley Games web site: http://purplevalleygames.com/?page_id=22

Board Game Geek page: http://boardgamegeek.com/boardgame/159240/business-mogul

Kickstarter campaign site: https://www.kickstarter.com/projects/747414366/mogul-boardgame-purple-valley-games-first-publicat

Twitter: https://twitter.com/csw11235

Google+: https://plus.google.com/u/0/+ChrisWarren_as_csw11235

Islandora 7 – SOLR Faceting by Collection Name (or Label) April 14, 2014

Posted by ficial in islandora, techy.

One of the basic features we need from our Islandora system is the ability to facet search results by collection name. Getting this working turns out to be a non-trivial project. The essential problem is that although objects in islandora know in which collection(s) they reside, that information is stored in the object only as a relationship identified by a computer-y PID. If one uses that relationship directly to do faceting one gets something that looks like ‘somenamespace:stuffcollection’, rather than the actual name of the collection ‘Our Collection of Stuff’. In brief, the solution I used was to alter the way the objects were processed by fedoragsearch to send actual collection names rather than just PID info. I did this by extending the RELS-EXT processing to load and use the relevant collection data when handling isMemberOfCollection fields.

The faceting options that are available are determined by what information is in the SOLR indexes – faceting is NOT driven directly by the fedora object store! To allow faceting by collection name we need to tell SOLR the names of the collection(s) of the object. This means that, similar to getting full text search working, we need to touch both the fedoragsearch system, to deliver the desired info to SOLR, and the SOLR config, to make the desired fields available for searching (and faceting).

In our fedoragsearch set up we already had pieces in place to process the RELS-EXT info, which is where collection membership (among other things) resides. This part of the object’s FOXML looks something like this:

  <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="Fedora Object to Object Relationship Metadata." CREATED="2013-11-08T15:49:50.889Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0" SIZE="548">
      <foxml:xmlContent>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:fedora-model="info:fedora/fedora-system:def/model#" xmlns:islandora="http://islandora.ca/ontology/relsext#">
          <rdf:Description rdf:about="info:fedora/somenamespace:52">
            <fedora:isMemberOfCollection rdf:resource="info:fedora/somenamespace:projectid"/>
            <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp-audioCModel"/>
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>

where the object has a PID of ‘somenamespace:52’ and is a member of the collection with PID ‘somenamespace:projectid’.

In the main gsearch_solr folder we have a sub-folder called islandora_transforms, in which there is a file called RELS-EXT_to_solr.xslt. This file is used by demoFoxmlToSolr.xslt via a straightforward include:

  <xsl:include href="/usr/local/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/gsearch_solr/islandora_transforms/RELS-EXT_to_solr.xslt"/>

which initially was just this:

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- RELS-EXT -->
  <xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foxml="info:fedora/fedora-system:def/foxml#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    exclude-result-prefixes="rdf">
    <xsl:template match="foxml:datastream[@ID='RELS-EXT']/foxml:datastreamVersion[last()]" name='index_RELS-EXT'>
    <xsl:param name="content"/>
    <xsl:param name="prefix">RELS_EXT_</xsl:param>
    <xsl:param name="suffix">_ms</xsl:param>
    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_uri', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="@rdf:resource"/>
      </field>
    </xsl:for-each>
    <xsl:for-each select="$content//rdf:Description/*[not(@rdf:resource)][normalize-space(text())]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_literal', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="text()"/>
      </field>
    </xsl:for-each>
  </xsl:template>
  </xsl:stylesheet>

The initial version of this file just directly processes the contents of the RELS-EXT datastream of the object’s FOXML, eventually creating the SOLR fields RELS_EXT_isMemberOfCollection_uri_ms/mt and RELS_EXT_hasModel_uri_ms/mt (fedoragsearch created the _uri info, which SOLR extends to the _ms/_mt versions). We can facet directly on those to get the desired breakdowns by collection (and by model, for that matter), but the text presented to the user is basically meaningless. So, I added some code to load the actual collection data for each isMemberOfCollection relation, and then pulled the human-readable collection title from that.

From my perspective there were three particularly tricky parts to this (further complicated by my limited proficiency/understanding of XSLT and XPath). First, how do I catch all the memberships and nothing else? Second, how do I get the actual collection PID? Third, how do I pull in and process additional content based on that PID? In bulling my way past these obstacles I ended up with code that I’m dead sure isn’t as pretty or efficient as it could be, but on the plus side it works for me.

In step one I re-used the looping example already in the file to loop through all the description sub-fields that have a resource attribute, which processes this data:

  <rdf:Description rdf:about="info:fedora/somenamespace:52">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/somenamespace:projectid"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp-audioCModel"/>
  </rdf:Description>

and hits both the fedora:isMemberOfCollection and fedora-model:hasModel fields. To make sure I’m not accidentally processing models I added an if test that examines the name of the field and makes sure I’m only proceeding with further work on isMemberOfCollection fields (NOTE: I’ll probably be adding a branch for processing hasModel at some point as well – the code will be very similar). Once I’ve ensured that I’m working with the right data I need to get the PID of the collection. This part baffled me for a while because I hadn’t noticed that the value of that field wasn’t just the collection PID – it had ‘info:fedora/’ prepended to it. Once I realized what was going on I used a simple substring to pull out only the PID part. Lastly, I needed to pull and process the collection with that PID in order to get its human-readable title. Luckily I had an analogous example of that kind of thing in the external datastream processing that happens in demoFoxmlToSolr.xslt – I loaded the collection’s DC datastream content into a local variable, then processed that to pull out the title. Finally, once I’d grabbed all the text I needed, I created the appropriate fields to send on to SOLR (the collection_membership. prefix I used is one I just made up on the spot – there’s nothing special about it and it’s entirely possible that there’s some other structure/naming scheme I should be using instead). The final, modified RELS-EXT_to_solr.xslt looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- RELS-EXT -->
  <xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:foxml="info:fedora/fedora-system:def/foxml#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="rdf dc">

    <xsl:template match="foxml:datastream[@ID='RELS-EXT']/foxml:datastreamVersion[last()]" name='index_RELS-EXT'>
    <xsl:param name="content"/>
    <xsl:param name="prefix">RELS_EXT_</xsl:param>
    <xsl:param name="suffix">_ms</xsl:param>

    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_uri', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="@rdf:resource"/>
      </field>
    </xsl:for-each>

    <xsl:for-each select="$content//rdf:Description/*[not(@rdf:resource)][normalize-space(text())]">
      <field>
      <xsl:attribute name="name">
        <xsl:value-of select="concat($prefix, local-name(), '_literal', $suffix)"/>
      </xsl:attribute>
      <xsl:value-of select="text()"/>
      </field>
    </xsl:for-each>

    <xsl:for-each select="$content//rdf:Description/*[@rdf:resource]">
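      <!-- For each isMemberOfCollection relation, fetch that collection's DC
           datastream and index its title(s) so facets can show human-readable
           collection names. -->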

      <xsl:if test="local-name()='isMemberOfCollection'">
      <xsl:variable name="collectionPID" select="substring-after(@rdf:resource,'info:fedora/')"/>
      <xsl:variable name="collectionContent" select="document(concat($PROT, '://', $FEDORAUSERNAME, ':', $FEDORAPASSWORD, '@', $HOST, ':', $PORT,'/fedora/objects/', $collectionPID, '/datastreams/', 'DC', '/content'))"/>

      <field name="collection_membership.pid_ms">
        <xsl:value-of select="$collectionPID"/>
      </field>

      <xsl:for-each select="$collectionContent//dc:title">
        <xsl:if test="local-name()='title'">
        <field name="collection_membership.title_ms">
          <xsl:value-of select="text()"/>
        </field>
        <field name="collection_membership.title_mt">
          <xsl:value-of select="text()"/>
        </field>
        </xsl:if>
      </xsl:for-each>

      </xsl:if>
  <!--
      <xsl:if test="local-name()='hasModel'">
      <xsl:variable name="modelPID" select="substring-after(@rdf:resource,'info:fedora/')"/>
      <field name="CSW_test_if_model">
        <xsl:value-of select="$modelPID"/>
      </field>
      </xsl:if>
  -->
    </xsl:for-each>

    </xsl:template>

  </xsl:stylesheet>
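
For the example object above, the extra output sent to SOLR would look something like this (illustrative values only – the title comes from the collection object’s DC datastream):

  <field name="collection_membership.pid_ms">somenamespace:projectid</field>
  <field name="collection_membership.title_ms">Our Collection of Stuff</field>
  <field name="collection_membership.title_mt">Our Collection of Stuff</field>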

Similar to the approach I used for the OCR text work, I was able to watch the fedoragsearch logs and verify that the output was as expected/needed. The changes on the SOLR side are pretty minor. I added a couple of lines to schema.xml to handle the new fields:

<field name="collection_membership.title_ms" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="collection_membership.title_mt" type="text_fgs" indexed="true" stored="true" multiValued="true"/>

and, though not necessary for the faceting aspect of this work, I added collection_membership.title_mt to the field list of the standard search request handler in solrconfig.xml:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*</str>
  <str name="q.alt">*:*</str>
  <str name="qf">
  PID
  dc.title
  ....
  ....
  collection_membership.title_mt
  </str>
  </lst>
</requestHandler>

The final step is to re-index everything, and add the collection_membership.title_ms field to the list of facet fields in the web-based islandora solr config tool (Islandora > Solr index > Solr settings > Facet settings; add collection_membership.title_ms to the Facet fields, and give it a label of Collection).

And that’s that. If anyone has any suggestions/thoughts about how to improve my XSLT I’d be thrilled to hear them.
