Islandora 7 – permissions for multisite and XACML
June 30, 2014. Posted by ficial in islandora, multisite, techy, XACML.
We used drupal-multisite to organize our repository. Permissions/access is a challenge. We used namespaces that are unique to each collection for site-level access, which required a bit of custom coding to support and has some limitations, but took much less coding and is more stable than the other main option we considered (implementing real per-collection access). Supporting XACML effectively across multiple sites then requires separate user tables for each site (backed by LDAP to unify login credentials), a separate entry in the fedora filter-drupal.xml file, and appropriate privileges granted on the DB for that site for the mysql user that filter-drupal specifies. Setting up a new site has three separate areas that require action:
- drupal : create the site using standard multi-site approach (share ldap_authorization and ldap_servers tables)
- mysql : grant access to the new database to the user that fedora uses to check credentials
- fedora : add an entry for the new site/DB to …/server/config/filter-drupal.xml, then restart fedora
In planning our repository, a major challenge was presenting our objects, and controlling access to and management of them, in a way that more-or-less matches how our users want and need things to work. Conceptually, our system has a tree-like structure, with potential for cross-connections. At the root is our overall site http://unbound.williams.edu, which provides a face for the program / system as a whole and a convenient place / way to search across all the (unrestricted) objects in our system. From there we have project sites, which correspond to a particular department (e.g. http://unbound.williams.edu/williamsarchives), institutional project (http://unbound.williams.edu/facultypublications), or individual project (http://unbound.williams.edu/mayamotuldesanjosearchaeology). Within a given project there might be a single collection or multiple collections. A person or department might in turn have a single project or multiple projects, or might be involved in different projects and/or collections in different ways (e.g. managing one project, contributing to another collection, with read-only access to a third, protected collection).
Setting up the technical infrastructure and processes to support the above model was (and continues to be) challenging. We used a drupal multi-site system to organize the main site and project sites. We leveraged islandora’s built-in namespace restriction capabilities to limit given collections to given sites. We did this by associating each collection with a unique namespace. This allows us to very easily include a given collection in multiple projects (e.g. the faculty articles collection might be in both the faculty publications project and the archives project). Essentially, we wanted to be able to support object access on a per-collection basis, but the built-in support only worked with namespaces, so we made them (semi-) synonymous. There were a couple of technical challenges to making this work, and there are also some less-than-ideal limitations that go with this approach.
On the technical side, there are two places that namespace restrictions come into play: repository access and search access. On the back end there seems to be no limit to the number of namespaces that can be specified for these two areas, but the web form elements that are used for them limit the content to something too small for our purposes. We went through two levels of work-around here. First, we changed the form elements for those fields from basic inputs to text area / paragraph inputs. However, we still had the problem that there were two separate places where namespaces had to be managed, which could easily lead to problems that would greatly impact user experience. So, we created a custom module that provides a single interface that controls both areas: the namespace list that’s entered in that one field is used to set both the SOLR preferences and the site namespace config values. With this in place our namespace list for a given site might become pretty long, but it’s easy enough to manage and we don’t ever end up in a situation where there’s a mis-match between the search-based access and the repository/site-based access.
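The single-field idea can be sketched roughly as follows. This is a Python illustration of the logic, not the actual Drupal module: the two setting keys (`site_allowed_namespaces`, `solr_namespace_filter`) are hypothetical stand-ins for the two places the namespaces get stored, and the accepted separators (whitespace and commas) are an assumption.

```python
import re


def parse_namespaces(raw):
    """Split a free-text field into a de-duplicated, order-preserving namespace list."""
    seen = []
    for token in re.split(r"[\s,]+", raw.strip()):
        ns = token.rstrip(":")  # accept "faculty:" as well as "faculty"
        if ns and ns not in seen:
            seen.append(ns)
    return seen


def apply_namespaces(raw, settings):
    """Write the same namespace list to both (hypothetical) settings keys."""
    namespaces = parse_namespaces(raw)
    settings["site_allowed_namespaces"] = list(namespaces)
    settings["solr_namespace_filter"] = list(namespaces)
    return settings


# One text area feeds both settings, so they cannot drift apart.
settings = apply_namespaces("archives: faculty,\nmaya faculty", {})
```

Because both values are derived from one parsed list in one call, there is no code path that updates the search restriction without also updating the repository restriction.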
On the data structure side of things this approach creates some hard limits in what we can do. We’re trying to emulate collection-based access control, but this doesn’t do that exactly. It fails in two main ways. First, an object’s namespace isn’t necessarily the same as that of a collection that contains that object. When an object is in more than one collection, we’re guaranteed that there’s a mis-match for at least one of the collections. To try to get around this we more finely divide our object sets than we otherwise might and use the site-level grouping to bring them together rather than collection-level grouping. Second, we lose hierarchical object access control. In a pure collection-based approach we would be able to nest collections and specify access by the top-level collection, but since each collection is its own namespace we have to manually manage access to whole hierarchies as individual elements. Neither of those two limitations is a game-stopper, but they do need to be taken into consideration when ingesting a new set of objects and setting up new projects and collections.
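To make the first limitation concrete, here is a small Python sketch of namespace-based visibility. The PIDs, namespaces, and collection names are invented for illustration; the only thing taken from fedora is that a PID has the form namespace:identifier.

```python
def pid_namespace(pid):
    """Return the namespace portion of a PID such as 'faculty:42'."""
    return pid.split(":", 1)[0]


def visible_on_site(pid, site_namespaces):
    """Namespace restriction looks only at the object's own PID, not its collections."""
    return pid_namespace(pid) in site_namespaces


# Hypothetical object that is a member of two collections in different namespaces.
article = {
    "pid": "faculty:42",
    "collections": ["faculty:collection", "archives:collection"],
}

archives_site = {"archives"}      # site that exposes only the archives namespace
publications_site = {"faculty"}   # site that exposes only the faculty namespace

shown_on_publications = visible_on_site(article["pid"], publications_site)
shown_on_archives = visible_on_site(article["pid"], archives_site)
```

In this made-up case the article is a member of the archives collection but does not appear on the archives site, because only its own namespace is checked. Adding the faculty namespace to that site's list fixes it, at the cost of exposing everything else in that namespace too.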
In an ideal world we’d have used collection membership directly for access control, but doing so would have required rather a lot of custom coding to implement. Essentially we’d have had to create a whole new set of fields and corresponding web forms that paralleled the namespace ones. Additionally, to make hierarchical collection membership work appropriately we’d have to get tangled in building and maintaining additional relationship fields in the RELS-EXT datastream. All certainly possible, but in our situation it required too much work and was too prone to implementation errors. We deliberately sacrificed functionality to gain stability and low technical investment and upkeep. So far it’s working OK for us.
Though we’re using namespaces as the primary way of associating given collections with given sites, we still have the challenge of restricting access to collections (and individual objects) within a site via XACML. There are some subtleties in this due to how fedora checks permissions. Essentially, fedora has a component that checks in with the drupal database to verify that a user is authenticated and to check what roles the user has. This is explained briefly in the ‘Configure the Drupal Servlet Filter’ section at https://wiki.duraspace.org/pages/viewpage.action?pageId=34638844, with a very general directive to “use the Drupal LDAP module” to avoid the problem of users ending up with too much access. Making all that actually work required a certain amount of further research and experimentation for us.
We use LDAP for our central authentication system, and connect to it for our islandora system using LDAP for drupal 7. That package has a lot of sub-pieces, only three of which we found necessary to get things working: LDAP Authentication, LDAP Authorization, and LDAP Authorization – Drupal Roles (though one could probably get away with just the first). Once that’s set up for our main site we can simplify spinning up additional sites by sharing two key tables across the sites: ldap_authorization and ldap_servers. The modules still need to be enabled for new sites, but since the tables are shared no additional configuration is needed. Additionally, if our LDAP config needs to be changed then doing it once automatically ensures it works for all the sites. We do the table sharing by setting up one drupal as the primary install (in our case it’s our main site, using a database named main_drupal) and using the prefix attribute of the databases settings variable in the individual site settings files. (see below for an example)
We originally shared the user tables as well, but that caused serious problems when trying to use XACML to control object access by role. The fedora component that checks in with drupal about user validation and roles has an interesting behaviour where it combines all roles that a given username-password combination has across all sites. So, with a single, shared user table a user effectively has the same username and password for all sites, which means that the user would get for all sites any role they have on any site. In other words, making a user admin on one site would give them admin access to all objects on all sites. So, we have separate user tables. However, because we’re using LDAP as our authentication system it doesn’t impact user management; all the user management happens external to drupal anyway.
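The role-combining behaviour can be modelled with a few lines of Python. This is a simplified simulation, not the filter's actual code: each site database is a plain dict, the hashes are placeholders, and it assumes that with separate user tables the stored password hash for the same username differs from site to site, which is what keeps the other sites from matching.

```python
def roles_seen_by_fedora(username, pass_hash, site_dbs):
    """Union the roles from every configured database where name and hash both match."""
    roles = set()
    for users in site_dbs.values():
        record = users.get(username)
        if record and record["pass"] == pass_hash:
            roles.update(record["roles"])
    return roles


# Shared user table: same name and hash everywhere, roles assigned per site.
shared_tables = {
    "archives": {"jdoe": {"pass": "hashA", "roles": {"administrator"}}},
    "faculty": {"jdoe": {"pass": "hashA", "roles": {"authenticated user"}}},
}

# Separate user tables: same name, but (by assumption) a different stored hash per site.
separate_tables = {
    "archives": {"jdoe": {"pass": "hashA", "roles": {"administrator"}}},
    "faculty": {"jdoe": {"pass": "hashB", "roles": {"authenticated user"}}},
}

# A request coming from the faculty site in each configuration.
roles_shared = roles_seen_by_fedora("jdoe", "hashA", shared_tables)
roles_separate = roles_seen_by_fedora("jdoe", "hashB", separate_tables)
```

In the shared case the faculty-site request picks up the administrator role from the archives site; in the separate case only the faculty-site roles come back.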
However, since we’re using separate databases to hold all those different user tables (and other site-specific stuff, of course), the db user that fedora uses to check user authentication and roles needs to be given access to those databases. One could create a user and give them universal grants, but that seems... suspect, from a security standpoint. So, each time we create a new project site we need to make sure to grant that user select privileges for the new database. Also, simply granting the user those privileges isn’t enough in itself; the fedora component also needs to be configured to actually check the new database. This is done by adding an additional connection specification in the …/server/config/filter-drupal.xml file.
I think that fedora makes a separate DB connection for each entry in the file, so at some point one runs into issues of scalability, where for any kind of islandora access to fedora data the system is checking against N databases. Hopefully fedora uses some sort of connection pooling and caching system to mitigate this somewhat, but I don’t really know.
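If database-wide grants feel too broad, the per-site grant can be narrowed to just the tables the filter's query reads (users, users_roles, role). Below is a minimal Python sketch that generates those statements; the helper name and the identifier check are illustrative, and it assumes the drupal tables are not created with a table-name prefix.

```python
import re

# Tables read by the role lookup query in filter-drupal.xml.
FILTER_TABLES = ("users", "users_roles", "role")
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def grant_statements(db_name, db_user, db_host, tables=FILTER_TABLES):
    """Build one table-level GRANT SELECT statement per table for the fedora db user."""
    for name in (db_name, *tables):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        f"GRANT SELECT ON `{db_name}`.`{table}` TO '{db_user}'@'{db_host}';"
        for table in tables
    ]


statements = grant_statements(
    "sitedbname", "fedora_mysql_user", "fedora-machine.institution.edu"
)
for statement in statements:
    print(statement)
```

The trade-off is the one already noted: three statements per site instead of one, and anything that changes which tables the query touches means revisiting the grants.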
In summary, to set up a new islandora-enabled instance of a drupal multi-site:
- have LDAP installed and configured for some primary site (which for the purposes of the example below uses a database called main_drupal)
- do all the usual multi-site set-up stuff
- in the new site’s settings.php file, specify the primary site db as the prefix for the ldap_authorization and ldap_servers tables
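$databases = array (
  'default' => array (
    'default' => array (
      'database' => 'sitedbname',
      'username' => 'a_db_user',
      'password' => ',jdFN3952oiU54h6n2o987ytglaKEn68Yu34',
      'host' => 'mysql-machine.institution.edu',
      'port' => '',
      'driver' => 'mysql',
      // tables not listed here get no prefix and live in sitedbname
      'prefix' => array(
        'default' => '',
        // shared LDAP config tables are read from the primary site's database
        'ldap_authorization' => 'main_drupal.',
        'ldap_servers' => 'main_drupal.',
      ),
    ),
  ),
);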
- on the mysql server that backs drupal, grant the fedora db user SELECT access to the new database (it probably only really needs access to a few specific tables (users, users_roles, role), though that’s more work to specify and maintain)
GRANT SELECT ON sitedbname.* TO 'fedora_mysql_user'@'fedora-machine.institution.edu';
- on the fedora host add an entry to …/server/config/filter-drupal.xml for the new database
<connection server="mysql-machine.institution.edu" dbname="sitedbname" user="fedora_mysql_user" password="nRExw890zV34hl56N245AV078kk45" port="3306">
<sql>SELECT DISTINCT u.uid AS userid, u.name AS Name, u.pass AS Pass, r.name AS Role FROM (users u LEFT JOIN users_roles ON u.uid=users_roles.uid) LEFT JOIN role r ON r.rid=users_roles.rid WHERE u.name=? AND u.pass=?;</sql>
</connection>
- don’t forget to restart fedora so that the new filter-drupal stuff is used
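Since the filter-drupal.xml step is the easiest one to get subtly wrong by hand, it could be scripted. The following is a minimal Python sketch using the standard library's ElementTree. It assumes the file's root element is FilterDrupal_Connection and that each connection wraps its query in an sql element, as in the stock file; check both against your own copy. The helper name and the placeholder password are hypothetical.

```python
import xml.etree.ElementTree as ET

# Role lookup query, the same for every site entry.
ROLE_QUERY = (
    "SELECT DISTINCT u.uid AS userid, u.name AS Name, u.pass AS Pass, r.name AS Role "
    "FROM (users u LEFT JOIN users_roles ON u.uid=users_roles.uid) "
    "LEFT JOIN role r ON r.rid=users_roles.rid WHERE u.name=? AND u.pass=?;"
)


def add_connection(root, server, dbname, user, password, port="3306"):
    """Append a <connection> entry unless one for this server/dbname already exists."""
    for existing in root.findall("connection"):
        if existing.get("server") == server and existing.get("dbname") == dbname:
            return existing
    conn = ET.SubElement(root, "connection", {
        "server": server,
        "dbname": dbname,
        "user": user,
        "password": password,
        "port": port,
    })
    ET.SubElement(conn, "sql").text = ROLE_QUERY
    return conn


# In real use the root would come from ET.parse(<path to filter-drupal.xml>).getroot().
root = ET.fromstring("<FilterDrupal_Connection></FilterDrupal_Connection>")
add_connection(root, "mysql-machine.institution.edu", "sitedbname",
               "fedora_mysql_user", "example-password")
# Running it again for the same site is a no-op rather than a duplicate entry.
add_connection(root, "mysql-machine.institution.edu", "sitedbname",
               "fedora_mysql_user", "example-password")
xml_text = ET.tostring(root, encoding="unicode")
```

The duplicate check matters because each entry appears to mean another database queried per request, so accidental repeats are worth avoiding. A fedora restart is still needed after the file is written.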