Google Web Forms Mail Merge(-ish) April 3, 2015

Posted by ficial in brain dump, javascript, software, techy.
1 comment so far

TLDR: Skip to the end of this post to get the final Apps Script function. Otherwise, onward….

I just finished up a fun project: getting a submission from a Google web form emailed to someone as a PDF built from a nicely formatted template (and archived in one or more Google folders). It turns out that I’m not the first to want this kind of thing, and I was lucky enough to find an excellent post on this at http://tjhouston.com/2012/03/merge-info-from-google-forms-to-pdf-document-and-send-via-email/. TJ outlines the general process/approach, but uses APIs that are now deprecated, misses a step that took me a bit to figure out (my guess is that Google has changed things a bit since that post was written), and implements a slightly different code model than the one I ended up using.

application process / design outline

At a high level, the components used are a form, the underlying/linked sheet, a doc that’s used as a template, and a folder in which to archive the results (though if you just want to delete the results that’s easy enough). Also, since we’ll be sending the resulting PDFs via email we’ll need a google account from which to send them. The end user fills out and submits the form. Google’s black box copies the form data into the underlying sheet, AND (this is key) fires the ‘onFormSubmit’ event for that sheet. The mail merge part of this is some custom code – it copies a template doc, fills in the template using data from the form, saves it in the archives folder, and sends out a PDF copy. I’ll walk through creating a basic system, and hopefully you can subsequently adapt that to whatever you need.
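Here’s the rough shape of the handler we’ll build up over the rest of this post – just a skeleton at this point, with the real code to follow:

function formSubmitHandler(evt) {
  // 1. copy the template doc into an archive folder, with a unique name
  // 2. fill in the copy's KEY_ placeholders using evt.namedValues from the form
  // 3. email the filled copy as a PDF
  // 4. add the copy to the appropriate archive sub-folders
}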

pieces (docs & folders in drive)

The first step is creating the form. Log in using whatever account you want to be sending out the emails, go to drive, and create a new form:

[screenshot: creating a new Google web form]

Once you’ve done that click the ‘View live form’ link (right-hand side of the secondary tool bar) and fill out the form and submit it, just to make sure things are working the way you expect.

[screenshot: the live form]

Then hop back to the form creation window and click ‘View responses’, choosing to create a new sheet. You should then be able to open that sheet and see your response.

[screenshot: the first response in the sheet]

(NOTE: responses are actually _copied_ into the sheet – the form itself also records the responses, but they’re not individually visible from the form; generally this distinction is irrelevant, but when you’re resetting a form to clear out test data after development you should both delete the rows in the spreadsheet and delete the responses in the form (‘Delete all responses’ under the ‘Responses’ menu when editing the form)).

Now create your mail merge template document

[screenshot: the mail merge template doc]

and your folder for holding/archiving the mail-merged responses, optionally with sub-folders (I’m doing this demo with sub-folders because it’s more complex – easier to go from a complex demo to a simpler implementation than the reverse).

[screenshot: the archive folders]

At this point all the pieces are in place and “all” that we have to do is code up the actual merge part. Luckily Google provides some tools that make this pretty easy. Hop back to the sheet view of the form responses and choose ‘Script editor’ from the ‘Tools’ menu. This will pop up a dialog for starting a new scripting project. Choose the first option – ‘Blank project’.

[screenshot: the Script editor menu item]

[screenshot: the new script project dialog]

[screenshot: the initial script project]

Be sure to save it at this point! You’ll need to pick a name as well, though it will be automatically tied to the sheet that is storing the form responses regardless of the name you choose.

pseudocode

In brief, here’s the plan:
1) create a very simple function
2) verify that it runs from the script editor
3) add code to make a copy of the template document and put it in the archives folder
4) verify that copying a document works, and grant permissions to do its thing
5) hook the function up to the form submit event
6) verify that the function runs when the form is submitted
7) add code for a simple search and replace, and verify it works
8) show all the form data in a basic string dump, and verify it works
9) fill out the template correctly using the form data, and verify it works
10) send out an email of a PDF of the filled template, and verify it works
11) put extra references or copies of the filled template in the appropriate archive sub-folders, and verify it works

Here goes…

1) create a very simple function

Google scripting is really just javascript, but the context in which it’s executed means that some things aren’t available (e.g. alert, console.log, etc.). I’ve had limited success in finding good introductions to it, but http://www.i-programmer.info/programming/other-languages/4600-getting-started-with-google-app-script.html provides a reasonable overview. You might also check out the official google pages at https://www.google.com/script/start/ and https://developers.google.com/apps-script/overview.

Our initial very simple function does two main things: establishes the name of the function, and demonstrates how to generate simple debugging output. Delete all the initial code that’s shown in the project and replace it with:

function formSubmitHandler() {
  Logger.log('my formSubmitHandler runs!');
}

[screenshot: the simple function in the editor]

and save it.

2) verify that it runs from the script editor

Then run it by clicking the run button or by selecting the function name from the ‘Run’ menu.

[screenshot: the Run menu]

or

[screenshot: the run button]

Now look at the logs to verify that it worked.

[screenshot: the Logs menu item]

[screenshot: the log output]

3) add code to make a copy of the template document and put it in the archives folder

Now we’re getting to the fun stuff :) To access the template document we’ll need its ID, and to put the copy in the archives folder we’ll need the ID of that folder as well. Actually, since this demo is using archive sub-folders we’ll need the ID of the ALL folder in there. You can get the relevant IDs by going to the document or folder in your browser and copying it from the URL bar:

[screenshot: the template document ID in the URL bar]

[screenshot: the folder ID in the URL bar]
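The exact URL format changes now and then, but the ID is the long random-looking segment – roughly:

https://docs.google.com/document/d/1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik/edit
                                   ^--------------- document ID ---------------^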

Store those in variables in the script:

var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

We’ll also need a name for the new file:

var newFileName = 'TEST_new_doc';

To actually create the new file we’ll use one of the services that come with Apps Script – DriveApp. This is a javascript object that “Allows scripts to create, find, and modify files and folders in Google Drive.” See https://developers.google.com/apps-script/reference/drive/drive-app for details. In general the reference docs are good, as long as you know what you’re looking for. Basically, we use the ID of the template to get that file, then make a copy with a new name (‘newFileName’) in a new location (the folder for ‘archiveFolderId_ALL’).

var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);

All together it looks like this:

function formSubmitHandler() {
  Logger.log('my formSubmitHandler runs!');

  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
}

4) verify that copying a document works, and grant permissions to do its thing

Run it now. It will tell you that it needs authorization to run – go ahead and grant that access.

[screenshot: the authorization-needed alert]

[screenshot: the authorization dialog]

After the script runs you should now be able to go to your archive folder and see the new document.

[screenshot: the new test file in the archive folder]

Now delete that file – the next step will create a new one, and having multiple files around with the same name gets confusing (later we’ll add unique timestamps to distinguish them).
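As an aside, you can also trash files from a script rather than by hand – a small sketch, assuming you have the test copy’s ID:

// move a file to the trash from a script
function trashTestFile(fileId) {
  DriveApp.getFileById(fileId).setTrashed(true);
}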

5) hook the function up to the form submit event

Hop back to the script editor. Then under the ‘Resources’ menu select the ‘All your triggers’ item.

[screenshot: the triggers menu item]

Delete anything that’s there (click the X box on the left of each item) until the triggers list is empty:

[screenshot: the empty triggers list]

then click on the link to add a new trigger. In the ‘Run’ column you should see the name of your function. Under the ‘Events’ column, select ‘From spreadsheet’, then set the next field to the right to ‘On form submit’.

[screenshot: the form-submit trigger settings]

Save it – you’ll need to give additional authorization.

6) verify that the function runs when the form is submitted

Now hop over to your form – the live view – fill it out and click the Submit button. In a few seconds a new TEST_new_doc should show up in your archives folder. Once you’ve verified that everything is working delete that test doc.

7) add code for a simple search and replace, and verify it works

To do this we need access to the actual stuff inside the copy. We have the copy as a file, but we need it opened as a document, and then need access to the body of the document.

var copyDoc = DocumentApp.openById(copyFile.getId());
var copyBody = copyDoc.getActiveSection();

Once we have the body we use the replaceText function to find one of the KEY_blahblahblah strings in the copy and replace it with whatever value we like. After that the document is saved and closed.

copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT');
copyDoc.saveAndClose();

We can also remove the logging line at this point. Altogether our function now looks like:

function formSubmitHandler() {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT');

  copyDoc.saveAndClose();
}

Now you MUST run this in the script editor before trying an actual form submission – this is necessary to grant authorization. If you just try to submit a form right away then nothing useful will happen.

After that go to the form again and submit another response. Open up the resulting document and see that ‘TEST_REPLACEMENT’ shows up (and go ahead and open your original template and verify that the KEY_ is still there – i.e. make sure we didn’t somehow run the replace on our template instead of our copy).

[screenshot: the first replacement in the copied doc]

8) show all the form data in a basic string dump, and verify it works

To get the actual form data we need a parameter added to our function to catch the info that the form is sending.

function formSubmitHandler(evt) {
...
}

Then we’ll use JSON.stringify to see what info is available to us – convert the event to a JSON string and put it in the copy so we can inspect it.

copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));

However, if you run this from the script editor you’ll get nothing useful, because the evt only has info if there’s an actual form submission. So we add a conditional to handle that case:

if (evt) {
  copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));
} else {
  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
}

The full function now looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  if (evt) {
    copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }

  copyDoc.saveAndClose();
}

Make another form submit and check out the resulting doc – you should see something like:

[screenshot: the raw event data in the doc]

That’s easier to understand when it’s formatted a little more cleanly:

[screenshot: the event data, formatted]

The key take-aways from this are that the form data is available in two ways (namedValues and values), and that there is a timestamp available even though the form doesn’t explicitly indicate that. Further, the namedValues data structure labels things by the text of the question in the form, and the values data structure shows things in the order that the form displays them. I prefer to use namedValues because I think it’s a little more robust and self-documenting in the long run.

However, the timestamp there doesn’t have a real label. To put one there, edit the responses sheet and add a label at the top of the column that has the timestamps. Now when you submit the form and look at the data you’ll see something like

[screenshot: the sheet without a timestamp label]

[screenshot: the sheet with a timestamp label]

[screenshot: the event data with the timestamp label]
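For reference, the event payload for my demo form looks roughly like this (the values are illustrative; note that each namedValues entry is an array, which matters later):

// approximate shape of the onFormSubmit event object
var exampleEvt = {
  values: ['4/3/2015 15:04:05', 'ficial', 'because', 'awesome, rad'],
  namedValues: {
    'TIMESTAMP': ['4/3/2015 15:04:05'],             // each entry is an array
    'Name': ['ficial'],
    'Why are you interested in this?': ['because'],
    'This is...': ['awesome, rad']                  // checkbox values joined with ', '
  }
  // ...plus range, source, etc. supplied by Google
};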

9) fill out the template correctly using the form data, and verify it works

Keep a copy of the JSON dump somewhere as a reference. Now we can do the actual filling out of the template, as well as make the file name for each submission unique. First, the file name. We could use the TIMESTAMP value from the form submission, but it’s actually easier just to do the date calculation right in the javascript (which also lets us have unique filenames even when the code is run from the script editor). However, we do want to get the user’s name from the form data, if it’s available.

var newFileName = 'TEST_new_doc';
if (evt) {
  newFileName = 'from ' + evt.namedValues['Name'];
}
newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');
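For the curious, Date() called without ‘new’ returns a string like ‘Fri Apr 03 2015 15:04:05 GMT-0400 (EDT)’ (the details vary with the server’s locale/timezone), so the expression above keeps elements 1 through 4:

Logger.log(Date().split(' ').splice(1,4).join(' ')); // e.g. 'Apr 03 2015 15:04:05'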

The next step is substituting in the form values. Rather than doing this individually I prefer building a more general structure. We’ll create a map that links each key in the template to the corresponding form question. Then we can loop through the map to do the substitutions – this allows future changes to the form and/or template to be handled in the data portion of our script rather than the execution portion, which greatly eases maintenance and reduces bugs. This map could be keyed using the form questions instead of the KEY_ strings, but using the KEY_ strings makes it easier to put appropriate placeholders in the copy.

// map KEY_ strings in the template to the corresponding form question text
var standard_keys_lookup = {
  'KEY_timestamp':'TIMESTAMP',
  'KEY_person':'Name',
  'KEY_reason':'Why are you interested in this?',
  'KEY_description':'This is...'
};
// handy access for looping
var standard_keys = Object.keys(standard_keys_lookup);

Then the actual substitution looks like:

if (evt) {
// loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
  for (var i=0; i<standard_keys.length; i++) {
    var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
    if (! v) {
      v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
    }
    copyBody.replaceText(standard_keys[i], v);
  }
} else {
  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
}

That loop might need a bit of explaining… The ‘standard_keys’ array contains the keys of the map – i.e. ‘KEY_timestamp’, ‘KEY_person’, etc., in no particular order. When executing the loop the current key is standard_keys[i]. The corresponding question text is found by using that key in the lookup table – the current question text is standard_keys_lookup[standard_keys[i]]. The actual form value is found by using the question text as a key in the evt.namedValues map – evt.namedValues[standard_keys_lookup[standard_keys[i]]]. The empty string is appended to that to force the type to be a string. Then if that is falsy (i.e. has no value), instead we use the placeholder ‘nothing given for ‘+standard_keys_lookup[standard_keys[i]] (which will evaluate to something like ‘nothing given for “Name”‘ when this is executed). That value is then used to do the replacement – copyBody.replaceText(standard_keys[i], v).
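To make that concrete, here’s one iteration traced out, assuming a submission where the ‘Name’ question was answered with ‘ficial’:

// suppose standard_keys[i] === 'KEY_person'
// standard_keys_lookup['KEY_person']  -> 'Name'
// evt.namedValues['Name']             -> ['ficial']  (namedValues entries are arrays)
// ['ficial'] + ""                     -> 'ficial'    (coerced to a string)
// copyBody.replaceText('KEY_person', 'ficial');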

The full function at this point looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');

  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  // map KEY_ strings in the template to the corresponding form question text
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };

  // handy access for looping
  var standard_keys = Object.keys(standard_keys_lookup);

  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }

  copyDoc.saveAndClose();
}

With this approach the form questions can be freely rearranged without breaking anything, and even new questions can be added without breaking anything. To add a new question to the merge process, just put a KEY_ string in an appropriate place in the template and add a ‘KEY_’:’Question text’ entry to standard_keys_lookup.
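For example, adding a hypothetical ‘Favorite color’ question would just mean putting KEY_color somewhere in the template doc and extending the map:

var standard_keys_lookup = {
  'KEY_timestamp':'TIMESTAMP',
  'KEY_person':'Name',
  'KEY_reason':'Why are you interested in this?',
  'KEY_description':'This is...',
  'KEY_color':'Favorite color'   // hypothetical new question
};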

Run a test from the script editor, then do a form submission. You should get unique file names, and the contents should be appropriately filled in.

[screenshot: timestamped file names in the archive]

[screenshot: the filled-in copy]

10) send out an email of a PDF of the filled template, and verify it works

If you don’t want PDFs emailed out then this can be skipped. Otherwise it goes after the document saveAndClose call. To create the PDF we again use the handy tools that Google provides – this is REALLY easy, especially compared to all the hoops one has to jump through to get other systems to do this kind of thing.

var pdfCopy = copyFile.getAs("application/pdf");

Building and sending the mail is similarly straightforward. I put the subject and body in separate variables for clarity.

var subject = "Mail Merge Demo Submission - "+newFileName;
var body = "Here is "+newFileName;
MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: [pdfCopy]});
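As a variation (not part of this demo): if your form collected the submitter’s address via a hypothetical ‘Email’ question, you could send the PDF to them instead, falling back to a fixed address when running from the script editor:

var recipient = (evt && evt.namedValues['Email'])
    ? evt.namedValues['Email'] + ''   // hypothetical 'Email' question
    : 'youremail@address.com';
MailApp.sendEmail(recipient, subject, body, {htmlBody: body, attachments: [pdfCopy]});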

The full function now looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';
  
  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');
  
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
  
  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();
  
  // map KEY_ strings in the template to the corresponding form question text
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };
  
  // handy access for looping
  var standard_keys = Object.keys(standard_keys_lookup);
  
  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }
  
  // don't forget to save all those changes!
  copyDoc.saveAndClose();
  
  // send out the PDF copy
  var pdfCopy = copyFile.getAs("application/pdf");
  var subject = "Mail Merge Demo Submission - "+newFileName;
  var body = "Here is "+newFileName;
  MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: [pdfCopy]});
}

At this point you MUST run it from the script editor again to grant it the authorization to send out emails. You should then get an email with an attached PDF!

[screenshot: the email with attached PDF]

[screenshot: the PDF version of the filled template]

Another form submission at this point should result in you getting an appropriately filled out PDF.

VICTORY!

11) put extra references or copies of the filled template in the appropriate archive sub-folders, and verify it works

This is another optional step. For many situations it’s unnecessary, but in the few where it’s needed it’s very, very nice. In my case I was building a system to support on-line job applications. One of the form questions let the user check off all the jobs for which they wanted to apply. Putting the copy in folders split by job saved a LOT of time and hassle for the people evaluating the applications.

An important note here is that Google Drive is not quite like the hard drive on your computer. The ‘folders’ that Google shows are really more like tags on the files. Unless you explicitly make a copy of a document, putting it in a folder just adds another tag to it, which means that when/if the document is deleted it disappears from ALL the folders at once. In some situations this may be exactly the desired behavior. In others you may want to put a real copy in each folder so that the actions in one folder don’t impact the contents of another folder.
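To make the distinction concrete, here’s a sketch of both behaviors (someFolder here is hypothetical):

// adds a parent reference ('tag') - the same file appears in both folders,
// and trashing it removes it everywhere
someFolder.addFile(copyFile);

// creates an independent file in the folder - deleting one copy
// doesn't affect the other
var independentCopy = copyFile.makeCopy(copyFile.getName(), someFolder);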

To specify in which folder the filled template will be placed we use another lookup:

// map checkbox values to archive sub-folders
var subfolder_values_lookups = {
  'awesome':DriveApp.getFolderById('0Bzufe4iO7j5HSWRDQmNJd3hoRkE'),
  'rad':DriveApp.getFolderById('0Bzufe4iO7j5HaXlKbGEzblVKTG8'),
  'cool':DriveApp.getFolderById('0Bzufe4iO7j5HZE8yZzdtNk9uLTA'),
  'other':DriveApp.getFolderById('0Bzufe4iO7j5HbWRYbnN0RUNsbnM')
};

// handy access for lookups
var subfolder_values = Object.keys(subfolder_values_lookups);

Then it’s just a matter of inspecting the appropriate form field values. In this example we’re using the value of a multi-select checkbox question. Looking back at the JSON output of the event, the values are joined together using commas. I’m not sure how it handles values that themselves contain commas – my guess is no differently, which means handling that situation would require a bit more work, probably via sub-string checking rather than the method I’m using here (see the sketch after the loop below). So, I’ll split the values of that field into an array and loop through it, doing the sub-folder placement as needed. There’s a bit of trickiness where I constrain values to ones that are available in my lookup table keys.

// create an array of the values chosen for the multi-checkbox question
var form_values = (evt.namedValues['This is...']+"").split(",");

for (var i=0; i<form_values.length; i++) {
  // trim the whitespace left by the ', ' that joins checkbox values
  var lookup_key = form_values[i].replace(/^\s+|\s+$/g, '');

  // constrain non-standard keys to 'other'
  if (subfolder_values.indexOf(lookup_key) < 0) {
    lookup_key = 'other';
  }

  subfolder_values_lookups[lookup_key].addFile(copyFile);
}
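As an aside, here’s a rough, untested sketch of the sub-string-checking alternative mentioned above, for forms whose checkbox options may themselves contain commas:

// test each known option against the raw joined string instead of splitting it
var raw = evt.namedValues['This is...'] + '';
for (var j = 0; j < subfolder_values.length; j++) {
  if (raw.indexOf(subfolder_values[j]) >= 0) {
    subfolder_values_lookups[subfolder_values[j]].addFile(copyFile);
  }
}
// (an 'other' fallback would need separate handling with this approach)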

All of this needs to be placed in a conditional that makes sure there’s an event. This code could be put inside the existing check, but I find the function as a whole easier to read/understand (and not too much more complicated to maintain / extend) if I put this stuff at the end and just wrap the section in its own, identical conditional.

The completed function

The function now copies the template and fills it out, sends out a PDF, and puts the filled copy in the appropriate sub-folders:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';
  
  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');
  
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
  
  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();
  
  // map KEY_ strings in the template to the corresponding form question text
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };
  
  // handy access for looping
  var standard_keys = Object.keys(standard_keys_lookup);
  
  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }
  
  // don't forget to save all those changes!
  copyDoc.saveAndClose();
  
  //--------------------------
  // send out the PDF copy
  var pdfCopy = copyFile.getAs("application/pdf");
  var subject = "Mail Merge Demo Submission - "+newFileName;
  var body = "Here is "+newFileName;
  MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: [pdfCopy]});
  
  //--------------------------
  // put in sub-folders
  if (evt) {
    var subfolder_values_lookups = {
      'awesome':DriveApp.getFolderById('0BziO7j5HSufe4WRDQmNJd3hoRkE'),
      'rad':DriveApp.getFolderById('0Bzufe4iO7lKbGEzblj5HaXVKTG8'),
      'cool':DriveApp.getFolderById('0Bzufe4iO7j5HZEtNk98yZzduLTA'),
      'other':DriveApp.getFolderById('0BzufebnN0R4iO7j5HbWRYUNsbnM')
    };
  
    // handy access for lookups
    var subfolder_values = Object.keys(subfolder_values_lookups);
    
    // create an array of the values chosen for the multi-checkbox question
    var form_values = (evt.namedValues['This is...']+"").split(",");
    
    for (var i=0; i<form_values.length; i++) {
      // trim the whitespace left by the ', ' that joins checkbox values
      var lookup_key = form_values[i].replace(/^\s+|\s+$/g, '');

      // constrain non-standard keys to 'other'
      if (subfolder_values.indexOf(lookup_key) < 0) {
        lookup_key = 'other';
      }

      subfolder_values_lookups[lookup_key].addFile(copyFile);
    }
  }
 
}

In and not in my head March 27, 2015

Posted by ficial in brain dump.
add a comment

One of the things I find especially challenging is pushing things from the world inside my mind to the world outside (the so-called ‘real world’). I grapple with understanding the factors that affect the difficulty of enacting that transition (I’m guessing most people are always dealing with that one way or another, if not always so explicitly). The, for want of a better word, size of something is sometimes an issue, but at other times seems to be irrelevant. In some instances it feels easy to build, enact, enable, do for long duration and at large scale to make whatever is in my mind exist outside of it. In other cases simply writing or speaking even a single sentence is insurmountable, and not due to any particular content of the sentence – it’s just that the energy, or whatever, required to breach reality’s barrier is beyond what I can achieve.

One of the reasons that I like working with / on computers is that (for me) at times they make that process at least a bit easier, though certainly not always nor for everything. I also find talking and working with people can be similar for me in this respect – in addition to new insights, viewpoints, skills, load-sharing, etc. that other people bring, the active presence of and interaction with other people makes it easier for me to make things ‘real’. The word ‘real’ is in quotes there because it’s not quite the word I want, but I can’t find any closer one. Whatever-it-is is already real in my mind. All that making it ‘really real’ does is allow others (people or things, sentient or otherwise) to experience/use/understand/interact-with it as well.

There are many directions this sort of thinking can go: communication is/as action/creation and vice versa, the observer-influenced/determined nature of reality, the limits and conditions of transitions, the relation between stories and art and math, energy gradients and tunneling, memes and the noosphere, mind-body duality as a limited view on a line that goes further in both directions, the characteristics and ‘cardinalities’ of multiverses, and so on …but hauling more of it out in any kind of detail is more than I can do right now. My questions / discussion points for the moment are: why is it sometimes so easy and sometimes so hard, and what can I (or, more to the point, anyone) do to change/control that?

Academic/research software, and rogue-like game stuff January 12, 2015

Posted by ficial in computer games, games, open source, software, techy.
add a comment

This blog is still live, just… sporadic :)

I’ve wrapped up a couple of projects relatively recently that might be of general interest. Both are java applications for pretty specific areas of study. The first is actually a work project, while the second is a personal one.

GeoShear (on GitHub and at Williams) is an application to aid structural geology research and teaching. It models shearing deformations (simple and pure), providing both an interactive visual interface and exportable quantitative data. It was created in collaboration with Charles L. MacMillan Professor of Natural Sciences Paul Karabinos as a part of NSF grant 0942313 – “Visualizing Strain in Rocks with Interactive Computer Programs”. In brief, it lets you mark a set of ellipses (representing cross-sections of pebbles) and then apply a shearing transformation to them. Charts of the pebble attributes are updated as shear is applied, so you can easily see the connection between them. The file format used is a simple tab-delimited one, so data can be entered in a spreadsheet if desired, and/or a spreadsheet can be used for further work post-deformation.

TideMiner is a smaller, simpler tool (also available on GitHub) that’s used to calculate the flooding frequency and duration of one or more given elevations, with tide levels from tide station data from NOAA. It works best with intervals of an hour or less. Basically, get your tide data from NOAA, load it into TideMiner, type or paste in the elevations of concern, and click the analyze button. The results can be saved in a tab-delimited format or copied and pasted directly into a spreadsheet for further work. This one is personal both in that it was not for work and in that I created it as a gift for my father (and the larger saltmarsh ecology community).

On a slightly different note, I have an ongoing fascination with rogue-like games, starting with Moria, then Angband (and a number of its variants) and ADOM. As a coder, game designer, and GM I’ve long toyed around with the idea of putting one together but somehow never quite gotten around to it. Over the recent holidays I decided that it was time to do more serious hacking around. I don’t necessarily expect to release any particular game (though I’ll put something up somewhere at some point – how’s that for a commitment :) ), but after a couple of weeks I have a solid core engine working and I’m getting to toy with fun things like AI code and ecosystem engineering. For my first real project in this realm I decided on Javascript because it’s so very adaptable/flexible (I may switch to Java for my next one, but JS has so far worked out pretty well so maybe not…) and requires so little in the way of infrastructure. It helps that one of the excellent tutorials I came across (I particularly recommend Trystan’s Blog for java and Coding Cookies for javascript) is based on rot.js, a really good rogue-like library for javascript.

Islandora 7 – permissions for multisite and XACML June 30, 2014

Posted by ficial in islandora, multisite, techy, XACML.
2 comments

TLDR:

We used drupal-multisite to organize our repository. Permissions/access is a challenge. We used namespaces that are unique to each collection for site-level access, which required a bit of custom coding to support and has some limitations, but took much less coding and is more stable than the other main option we considered (implementing real per-collection access). Supporting XACML effectively across multiple sites then requires separate user tables for each site (backed by LDAP to unify login credentials), a separate entry in the fedora filter-drupal.xml file, and appropriate privileges granted on the DB for that site for the mysql user that filter-drupal specifies. Setting up a new site has three separate areas that require action:

  • drupal : create the site using standard multi-site approach (share ldap_authorization and ldap_servers tables)
  • mysql : grant access to the new database to the user that fedora uses to check credentials
  • fedora : add an entry for the new site/DB to …/server/config/filter-drupal.xml, then restart fedora

LONG:

In planning our repository a major challenge for us was presenting and controlling access to and management of our objects in a way that more-or-less matches how our users want and need things to work. Conceptually, our system has a tree-like structure, with potential for cross-connections. At the root is our over-all site http://unbound.williams.edu, which provides a face for the program / system as a whole and a convenient place / way to search across all the (unrestricted) objects in our system. From there we have project sites, which correspond to a particular department (e.g. http://unbound.williams.edu/williamsarchives), institutional project (http://unbound.williams.edu/facultypublications), or individual project (http://unbound.williams.edu/mayamotuldesanjosearchaeology). Within a given project there might be a single collection or multiple collections. A person or department might in turn have a single project or multiple projects, or might be involved in different projects and/or collections in different ways (e.g. managing one project, contributing to another collection, and having read-only access to a third, protected collection).

Setting up the technical infrastructure and processes to support the above model was (and continues to be) challenging. We used a drupal multi-site system to organize the main site and project sites. We leveraged islandora’s built-in namespace restriction capabilities to limit given collections to given sites. We did this by associating each collection with a unique namespace. This allows us to very easily include a given collection in multiple projects (e.g. the faculty articles collection might be in both the faculty publications project and the archives project). Essentially, we wanted to be able to support object access on a per-collection basis, but the built-in support only worked with namespaces, so we made them (semi-) synonymous. There were a couple of technical challenges to making this work, and there are also some less-than-ideal limitations that go with this approach.

On the technical side, there are two places that namespace restrictions come into play: repository access and search access. On the back end there seems to be no limit to the number of namespaces that can be specified for these two areas, but the web form elements that are used for them limit the content to something too small for our purposes. We went through two levels of work-around here. First, we changed the form elements for those fields from basic inputs to text area / paragraph inputs. However, we still had the problem that there were two separate places where namespaces had to be managed, which could easily lead to problems that would greatly impact user experience. So, we created a custom module that provides a single interface that controls both areas – the namespace list that’s entered in that one field is used to set both the SOLR preferences and the site namespace config values. With this in place our namespace list for a given site might become pretty long, but it’s easy enough to manage and we don’t ever end up in a situation where there’s a mis-match between the search-based access and the repository/site-based access.

On the data structure side of things this approach creates some hard limits on what we can do. We’re trying to emulate collection-based access control, but this doesn’t do that exactly. It fails in two main ways. First, an object’s namespace isn’t necessarily the same as that of a collection that contains that object. In the case when an object is in more than one collection we’re guaranteed that there’s a mis-match for at least one of the collections. To try to get around this we more finely divide our object sets than we otherwise might and use the site-level grouping to bring them together rather than collection-level grouping. Second, we lose hierarchical object access control. In a pure collection-based approach we would be able to nest collections and specify access by the top-level collection, but since each collection is its own namespace we have to manually manage access to whole hierarchies as individual elements. Neither of those two limitations is a game-stopper, but they do need to be taken into consideration when ingesting a new set of objects and setting up new projects and collections.

In an ideal world we’d have used collection membership directly for access control, but doing so would have required rather a lot of custom coding to implement. Essentially we’d have had to create a whole new set of fields and corresponding web forms that paralleled the namespace ones. Additionally, to make hierarchical collection membership work appropriately we’d have to get tangled in building and maintaining additional relationship fields in the RELS_EXT datastream. All certainly possible, but in our situation it required too much work and was too prone to implementation errors. We deliberately sacrificed functionality to gain stability and low technical investment and upkeep. So far it’s working OK for us.

Though we’re using namespaces as the primary way of associating given collections with given sites we still have the challenge of restricting access to collections (and individual objects) within a site via XACML. There are some subtleties in this due to how fedora checks permissions. Essentially, fedora has a component that checks in with the drupal database to verify that a user is authenticated and to check what roles the user has. This is explained briefly in the ‘Configure the Drupal Servlet Filter’ section at https://wiki.duraspace.org/pages/viewpage.action?pageId=34638844, with a very general directive to “use the Drupal LDAP module” to avoid difficulties in too-much-access. Making all that actually work required a certain amount of further research and experimentation for us.

We use LDAP for our central authentication system, and connect to it for our islandora system using LDAP for drupal 7. That package has a lot of sub-pieces, only three of which we found necessary to get things working: LDAP Authentication, LDAP Authorization, and LDAP Authorization – Drupal Roles (though one could probably get away with just the first). Once that’s set up for our main site we can simplify spinning up additional sites by sharing two key tables across the sites: ldap_authorization and ldap_servers. The modules still need to be enabled for new sites, but since the tables are shared no additional configuration is needed. Additionally, if our LDAP config needs to be changed then doing it once automatically ensures it works for all the sites. We do the table sharing by setting up one drupal as the primary install (in our case it’s our main site, using a database named main_drupal) and using the prefix attribute of the databases settings variable in the individual site settings files. (see below for an example)

We originally shared the user tables as well, but that caused serious problems when trying to use XACML to control object access by role. The fedora component that checks in with drupal about user validation and roles has an interesting behaviour where it combines all roles that a given username-password combination has across all sites. So, with a single, shared user table a user effectively has the same username and password for all sites, which means that the user would get for all sites any role they have on any site. In other words, making a user admin on one site would give them admin access to all objects on all sites. So, we have separate user tables. However, because we’re using LDAP as our authentication system it doesn’t impact user management – all the user management happens external to drupal anyway.

However, since we’re using separate databases to hold all those different user tables (and other site-specific stuff, of course), the db user that fedora uses to check user authentication and roles needs to be given access to those databases. One could create a user and give them universal grants, but that seems…. suspect, from a security standpoint. So, each time we create a new project site we need to make sure to grant that user select privileges for the new database. Also, simply granting the user those privileges isn’t enough in itself; the fedora component also needs to be configured to actually check the new database. This is done by adding an additional connection specification in the …/server/config/filter-drupal.xml file.

I think that fedora makes a separate DB connection for each entry in the file, so at some point one runs into issues of scalability, where for any kind of islandora access to fedora data the system is checking against N databases. Hopefully fedora uses some sort of connection pooling and caching system to mitigate this somewhat, but I don’t really know.

In summary, to set up a new islandora-enabled instance of a drupal multi-site:

  1. have LDAP installed and configured for some primary site (which for the purposes of the example below uses a database called main_drupal)
  2. do all the usual multi-site set-up stuff
  3. in the new site’s settings.php file, specify the primary site db as the prefix for the ldap_authorization and ldap_servers tables

    $databases = array (
      'default' =>
      array (
        'default' =>
        array (
          'database' => 'sitedbname',
          'username' => 'a_db_user',
          'password' => ',jdFN3952oiU54h6n2o987ytglaKEn68Yu34',
          'host' => 'mysql-machine.institution.edu',
          'port' => '',
          'driver' => 'mysql',
          'prefix' => array(
            'default' => ''
            ,'ldap_authorization' => 'main_drupal.'
            ,'ldap_servers' => 'main_drupal.'
          ),
        ),
      ),
    );
  4. in the database that backs drupal, grant the fedora db user SELECT access to the new database (probably really only need access to a few specific tables (users, users_roles, role), though that’s more work to specify and maintain)

    GRANT SELECT ON SITEDBNAME.* TO 'fedora_mysql_user'@'fedora-machine.institution.edu';
  5. on the fedora host add an entry to …/server/config/filter-drupal.xml for the new database


    <connection server="mysql-machine.institution.edu" dbname="sitedbname" user="fedora_mysql_user" password="nRExw890zV34hl56N245AV078kk45" port="3306">
      <sql>
       SELECT DISTINCT u.uid AS userid, u.name AS Name, u.pass AS Pass, r.name AS Role FROM (users u LEFT JOIN users_roles ON u.uid=users_roles.uid) LEFT JOIN role r ON r.rid=users_roles.rid WHERE u.name=? AND u.pass=?;
      </sql>
    </connection>

  6. don’t forget to restart fedora so that the new filter-drupal stuff is used

Islandora 7 – splitting CSV data on ingest June 24, 2014

Posted by ficial in code fixes, islandora, techy, xsl.
add a comment

TLDR:
It’s tricky to tokenize CSV values on ingest using a MODS form. To do so, create an XSL that tokenizes the appropriate fields, put it in …/sites/all/modules/islandora_xml_forms/builder/self_transforms/, and set it as the self-transform for the relevant form. You’ll need to write your own CSV tokenizer since Islandora 7 uses a version of XSLT that predates 2.0. See below for example code.

LONG FORM:
In our Islandora install we’re using MODS as the main meta-data schema. That is, the ingest forms are set up for generating MODS XML. However, the way the form is set up is anti-helpful for some of the people that are doing our data loads. Specifically, the subject-topic, subject-geographic, and subject-temporal fields were not being processed as people expected.

Those three fields are multi-value ones, meaning they support a structure like:

...
<subject>
  <topic>cows</topic>
  <topic>bovines</topic>
  <topic>farm animals</topic>
  <geographic>field</geographic>
  <geographic>farm</geographic>
  <temporal>historic</temporal>
  <temporal>1800s</temporal>
</subject>
...

However, when using the form we want to be able to enter them as CSV values – e.g. ‘cows, bovines, farm animals’. Unfortunately, the default behavior is to treat such as a single value, giving a result like:

...
<subject>
  <topic>cows, bovines, farm animals</topic>
  <geographic>field, farm</geographic>
  <temporal>historic, 1800s</temporal>
</subject>
...

The Islandora 7 ingest forms system does provide a place where this can be corrected, but it’s subtle and tricky. Specifically, one has to create an XSL to do the proper tokenizing and set that up as a ‘self transform’ for the form. Creating the tokenizing XSL is in turn made more difficult because Islandora 7 uses a version of XSLT earlier than 2.0, which means that there is no built-in tokenizing function. The place this needs to be done is …/sites/all/modules/islandora_xml_forms/builder/self_transforms/, which took me a while to find because I was misled by the ‘builder’ folder – code in that folder relates not only to the building of forms, but also to the using/processing of forms.

Following some suggestions on various sites, I organized my tokenizing code in a separate file and included/imported it into the self-transform. Here’s where I ended up:

TOKENIZER (csv_tokenizer.xsl):
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:mods="http://www.loc.gov/mods/v3">
    <xsl:template name="csvtokenizer" >
      <xsl:param name="commaStr"/>
      <xsl:param name="tagLabel"/>
      <xsl:if test="normalize-space($commaStr) != ''">
        <xsl:choose>
          <xsl:when test="contains($commaStr, ',')">
            <xsl:call-template name="csvtokenizer">
              <xsl:with-param name="commaStr" select="substring-before($commaStr,',')"/>
              <xsl:with-param name="tagLabel" select="$tagLabel"/>
            </xsl:call-template>
            <xsl:call-template name="csvtokenizer">
              <xsl:with-param name="commaStr" select="substring-after($commaStr,',')"/>
              <xsl:with-param name="tagLabel" select="$tagLabel"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:if test="normalize-space($tagLabel) != ''">
              <xsl:element name="{$tagLabel}">
                <xsl:value-of select="substring($commaStr, string-length(substring-before($commaStr, substring(normalize-space($commaStr), 1, 1))) + 1)"/>
              </xsl:element>
            </xsl:if>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:if>
    </xsl:template>
</xsl:stylesheet>

SELF TRANSFORM (cleanup_mods.xsl - NOTE: this also removes empty fields):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mods="http://www.loc.gov/mods/v3">
<xsl:import href="csv_tokenizer.xsl"/>
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" media-type="text/xml"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[not(node())]"/>
<xsl:template match="mods:subject/mods:topic">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:topic'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="mods:subject/mods:geographic">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:geographic'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="mods:subject/mods:temporal">
  <xsl:call-template name="csvtokenizer">
    <xsl:with-param name="commaStr" select="normalize-space(.)"/>
    <xsl:with-param name="tagLabel" select="'mods:temporal'"/>
  </xsl:call-template>
</xsl:template>
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()[normalize-space()]|@*[normalize-space()]"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

I could have combined the three tokenizing template matches into a single one with or-ed parameters and dynamic tag label, but I find the code here much easier to read and the maintenance cost very low.

The self-transform runs before any other transforms, so the splitting done here propagates downstream without any further work.
