
Protecting Privacy / Anonymity of an Online Identity February 17, 2017

Posted by ficial in brain dump, techy.

TLDR:

Here are the easy steps-

1. Don’t say anything or give any information that’s in any way connected to your offline identity (including avoiding biometric security).
2. Keep each private identity as separate from all other identities as possible.
3. Don’t use tablets or phones to access your private identity.
4. Use a randomizing service for creating email addresses, security question answers, passwords, etc.
5. Ideally, use the TOR browser to access your private identity. If not that, at least disable javascript, all ads, and any other active content, plus use private mode to access your private identity. And always use https. Preferentially use services that don’t track you (e.g. duckduckgo for search).
6. Limit what each identity is used for.

As always with security, nothing is ever completely safe. All mitigation efforts should be approached not as guaranteeing privacy, but simply as making things a bit harder for an antagonist. Hopefully the antagonist’s resources and/or attention are limited to the point where the steps you take maintain your privacy.

——————————————————————————–

There are three main actions an antagonist can take to attack the privacy of an online identity:
– coalesce : use overlapping aspects to link one online identity to another (i.e. note that two separate online identities are controlled by the same person), unifying them into a single online identity
– clarify : discover information that links an online identity to an offline one
– cohere : discover information that makes it easier to link one online identity to another; make an online identity ‘stickier’
Each of these actions makes all subsequent actions easier, creating a positive feedback network. Maintaining privacy requires addressing each area of attack. There are two basic approaches to mitigating each action. First, to the extent possible, avoid providing information that supports that action. This works directly against the action. Second, provide misleading information to lead an antagonist to incorrect conclusions. This also works directly against the action, and has the secondary benefit of potentially disrupting the feedback network. Since the privacy of a known identity is, by definition, already compromised, the remainder of this discussion focuses on anonymous and pseudonymous identities.

All three kinds of actions essentially boil down to adding or marking aspects of those different identities that overlap, until one has enough overlap to safely consider them the same. Coalescing is matching non-known identities using various markers. Clarifying is matching an online identity with a (unique) offline one. Cohering is adding more markers, thus making the previous two actions possible / more likely. Here are five categories of markers that might be used, from easiest to hardest; interestingly, that’s both the difficulty for you to mitigate those kinds of markers and the difficulty for an antagonist to make use of them.

1. ANY offline identity information (not only name, mailing address, demographics, etc., but also any mention at all of local geography, employment, familial connections, social connections, political connections, any biometric info, answers to security questions, etc.). This is probably the easiest sort of marker to manage – simply don’t provide that information. Some of that is easy – e.g. not using your real name – while other parts might require more self-discipline – e.g. not mentioning the local weather, or avoiding passing along a link for some funny cat video. Also of note here is that a biometric security element can increase your data protection while decreasing your identity protection, so for any identity where privacy is paramount, look to security measures other than biometrics.

2. arbitrary information (usernames, passwords, email addresses, answers to security questions, expressed opinions, etc.). This is also fairly easy to manage, though it generally necessitates a bit more work. Where offline identity information can simply be excluded, arbitrary information is often required of you. The two keys here are avoiding any connection to your offline identity, and avoiding any connections to any other online identities. The most straightforward way I can think of doing this is by relying on randomizers. Fully arbitrary information can use random strings, while human-readable info (e.g. email addresses) is better generated from syllables or words. The main challenge here is just keeping track of that information for your own use when you need it. NOTE: just in case it wasn’t completely clear, each identity should have a separate email address, password, security answers, etc.
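As a minimal sketch of what such a randomizer might look like, in plain JavaScript (the syllable and character sets are arbitrary choices; for anything real, prefer a cryptographic random source such as crypto.getRandomValues over Math.random, and a proper password manager for storage):

// Illustrative only: per-identity credential randomizer.
function randomFrom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Pronounceable, human-readable handle built from syllables (e.g. an email prefix).
function randomHandle(syllableCount) {
  var consonants = 'bdfghklmnprstvz'.split('');
  var vowels = 'aeiou'.split('');
  var handle = '';
  for (var i = 0; i < syllableCount; i++) {
    handle += randomFrom(consonants) + randomFrom(vowels);
  }
  return handle;
}

// Fully arbitrary string for passwords and security-question answers.
function randomString(length) {
  var chars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789!@#$%&*'.split('');
  var out = '';
  for (var i = 0; i < length; i++) {
    out += randomFrom(chars);
  }
  return out;
}

var emailPrefix = randomHandle(4);   // something like 'tavoriku'
var password = randomString(20);     // unrelated to anything else you use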

3. passive technical fingerprints (basic browser/computer data, browser cookies, IP address, MAC address, timing of activity, etc.). These are things that your browser/computer sends out either by default or as a necessary part of being online. There are technical and legal barriers to an antagonist getting and/or using some of this information, but it’s also harder for you to manage. As a very simple, minimally disruptive step, get and use the TOR browser, use services that don’t track your info (such as duckduckgo for search), and always use https. Further mitigation can require significant changes in behavior and/or technical skill, though there are simple things that can help here too. A person with a lot of technical skill/knowledge can get caught up in dealing with these kinds of things and forget about the other kinds of markers that would let an antagonist simply bypass all the measures they put in place – if you have that skill and inclination, be careful not to neglect the other categories.

Passive technical information can be broken into two main sub-sections: the device you’re using, and how you’re accessing the internet. For your device itself, that means the settings of the device, and the browser or other software you’re using online. The device and software should both be configured to send out the minimum information required to function, and/or to send out deliberately misleading information. Turn off bluetooth, cell modems, and other such things. Use private / anonymized, independent browser sessions (keeping in mind that your ISP can still see what data your computer sends out and receives). Save as little data as possible in your browser (no passwords, no form auto-fill, maybe even no bookmarks).

You have much less control over the information related to accessing the internet, as a lot of that information has to do with the ISP you’re using. The easiest way to deal with that is to add variety. At home, cycling your modem or router may get you new IP addresses (it depends on how exactly your ISP works) – do some research to find out before starting to do that on a regular basis, and keep in mind that your ISP will still have records that track your previous information. More broadly, be active in different places, especially in places/ways that are hard to correlate with other things (e.g. pay cash at the cafe where you’re using the wifi). Anywhere there’s free open wifi (or closed wifi where you can use non-identifying info/credentials) is potentially usable: restaurants (fast food, cafe, bar, etc.), transportation hubs, libraries, schools/colleges, museums, etc. If possible, vary your activity times as well.

For all of this kind of thing, app-based devices (tablets, phones, etc.) are harder to secure because of the nature of the software acquisition/installation, the limits to user control of the devices, and the typical way they connect to the internet using a cell network.
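To make the passive category concrete, here is a rough, illustrative sketch (plain Node.js; nothing here is specific to any particular site) of the kind of record any server or wifi operator you connect to can log for every request, with no cookies or scripts involved:

// Illustrative only: passive data available to any server for every request.
// Run with: node passive-log.js   then visit http://localhost:8080
var http = require('http');

http.createServer(function (req, res) {
  var record = {
    time: new Date().toISOString(),            // timing of activity
    ip: req.socket.remoteAddress,              // network address (or your VPN/Tor exit)
    userAgent: req.headers['user-agent'],      // browser + OS details
    language: req.headers['accept-language'],  // locale hints
    referer: req.headers['referer']            // where you came from, if the browser sends it
  };
  console.log(JSON.stringify(record));
  res.end('logged');
}).listen(8080);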

4. behavioral fingerprints (word choices, idioms, topics covered, writing style, media subjects, realm of activity, browsing choices, social connections, etc.). This kind of marker is quite hard to mitigate because it’s tied to the heart of who you are and what you do. On the plus side, checking these kinds of markers is relatively hard. These kinds of things can also be a little bit harder for an antagonist to use passively, which means you’re less likely to be compromised by them by accident, but on the flip side if they are being used against an identity it means that identity has already somehow caught the attention of a strong antagonist (and you likely don’t even know, since checking these things generally won’t send any kinds of alerts to you). These markers tend to give probabilistic results rather than direct matches, and even getting those requires quite a bit of knowledge, skill, and work. The simplest mitigation here is compartmentalization – keep an identity focused on one particular thing/area and keep it away from all others, especially ones in which you have a different identity active. Related to that, keep any social connections for this identity separate and different from that of other identities. The second relatively easy thing to do here is to limit what is available to use. Making these kinds of fingerprints requires content, and the more content available the better quality the fingerprint. So, don’t write much / provide much media, and use a plain/standard style as much as you can. Using ephemeral identities can also help a bit for this realm.
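As a toy illustration of why writing style leaks, here is a sketch of the kind of word-frequency comparison an antagonist might run across text from two identities (illustrative only, and far cruder than real stylometry):

// Illustrative only: compare word-choice distributions of two text samples.
function wordCounts(text) {
  var counts = {};
  (text.toLowerCase().match(/[a-z']+/g) || []).forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  return counts;
}

// Cosine similarity between two word-frequency vectors; closer to 1 = more similar style markers.
function similarity(a, b) {
  var dot = 0, magA = 0, magB = 0, w;
  for (w in a) { magA += a[w] * a[w]; if (b[w]) { dot += a[w] * b[w]; } }
  for (w in b) { magB += b[w] * b[w]; }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

var identityA = wordCounts('honestly the weather up here has been brutal, honestly');
var identityB = wordCounts('honestly this brutal commute is, honestly, the worst');
console.log(similarity(identityA, identityB)); // higher score = more likely the same writer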

5. active technical fingerprints (non-cookie tracking, script-based browser/computer analysis, detailed timestamp matching, malware, etc.). This kind of marker is tied to the deeper aspects of the technology you use, from the browser/device you use to the services/sites you access. Dealing with this category of marker requires an antagonist to have a lot of technical skill / resources. These kinds of things are also often limited to particular sites/services. Probably the easiest way to deal with these (though still not actually easy) is to use ephemeral virtual machines along with highly compartmentalized identities. That being said, turning off javascript and any other active page elements, using an ad-blocker, and avoiding suspicious files / links can also help a lot with this kind of thing, and those are actions that are easy for a non-technical user to do with only a little research/work and discipline.
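For a sense of what a fingerprinting script can do once it runs, here is a small illustrative sketch of a few of the browser properties page JavaScript can read (real trackers combine many more signals, such as canvas and font rendering, which is why disabling scripts cuts off this whole category):

// Illustrative only: a handful of properties a page script can read to help build a device fingerprint.
var fingerprint = {
  userAgent: navigator.userAgent,
  language: navigator.language,
  platform: navigator.platform,
  screen: screen.width + 'x' + screen.height + 'x' + screen.colorDepth,
  timezoneOffset: new Date().getTimezoneOffset(),
  cpuCores: navigator.hardwareConcurrency,
  touchPoints: navigator.maxTouchPoints
};
// Individually these values are common; combined they can narrow you down to very few devices.
console.log(JSON.stringify(fingerprint));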

Online Identities: Transparency and Separation February 14, 2017

Posted by ficial in brain dump, techy.

There are two main axes to consider when thinking about the privacy of an online identity: transparency and separation. Transparency is a measure of how closely linked an online identity is to an offline one. This ranges from recognized to pseudonymous to anonymous. Separation is a measure of how connected an online identity is to another online identity of the same person. Separation ranges from associated (least separate) to isolated (most separate).

At the recognized transparency level there is a clear link between your online and offline life – your online identity uses your real name, notes your physical address, has your main phone number, references other aspects of your life (work, family, etc.), and so on. This is typical of social networks that leverage offline relationships (e.g. facebook, linkedin, etc.). A person only ever has at most one recognized identity – all other identities that get linked to it become just another aspect of that single recognized identity.

At the pseudonymous transparency level you have a persistent online identity, but there is a clear break between it and your offline life – no real name or other contact or demographic info, personally identifying information is deliberately minimized, etc. You can participate in social networks, have a reputation, build relationships, and so on; you just can’t build anything using your offline resources. A person can have any number of pseudonymous identities.

At the anonymous transparency level not only is there no connection to your offline life, there isn’t even a persistent online identity. Actions and interactions are stateless – you have no reputation or connections to give you weight, nor to weigh you down, and each anonymous action/interaction is distinct from the others. (Note: if it’s not distinct then the transparency level is actually pseudonymous, not anonymous.) Anonymous actions either are not identified (e.g. a comment that does not require a username, or that is made from a general, public account), or have a one-use identity (e.g. a comment using a throw-away account).

These transparency levels aren’t general states, but instead describe the knowledge-of-identity relationship between one of your online identities and someone or something else. For example, one could be recognized to a given group, have multiple pseudonymous identities in another community, and be totally anonymous in all other online actions. An antagonist is something that tries to shift the transparency level of its relationship with you towards recognized.

An antagonist has three operations at its disposal: coalescing, clarifying, and cohering. Coalescing is linking two or more online identities into a single one. In coalescing, transparency is dominated by recognition – any online identities that are linked become a single identity which has the transparency level of the most recognized of the pre-linked identities. Clarifying is increasing the transparency of an identity via research and analysis (e.g. checking metadata, searching posted info for clues, etc.). Cohering is increasing the association of an online identity via research and analysis (e.g. device fingerprinting, social network comparisons, word-use patterns, etc.).

These operations form a fully connected positive feedback network – the result of any operation makes all subsequent operations easier. Three key consequences are that the transition from anonymous or pseudonymous to recognized can happen very quickly, that small footholds can easily lead an antagonist to recognition, and that protecting privacy means dealing with all three operations. This in turn means that building and using a truly, robustly private/not-recognized online identity requires extreme discipline.

What does an LMS offer? March 8, 2016

Posted by ficial in brain dump, Instructional Technology, LMS.

The spectrum of LMS use can be divided into three general realms: administrative, organizational, and pedagogical. Administrative functions fulfill needs relating to the non-academic overhead of giving and receiving an education: managing course enrollments, providing communication channels, collecting assignments, scheduling, etc. Organizational functions improve the management of information and resources related to instruction: providing access to information whenever it’s needed, placing information into useful, usable groups, searching available information, handling various formats, etc. Pedagogical functions expand and refine the tools that instructors and students can use in the teaching and learning process: asynchronous discussion forums, auto-graded quizzes with immediate feedback, limited-audience authoring experiences, collaborative creation projects, etc.

In the administrative realm, an LMS integrated with other campus information systems offers a clear improvement over doing things piecemeal and by hand. This gives a better experience for students and significantly helps instructors through greater reliability (and auditability), increased consistency of experience, reduced non-academic workload, fewer distractions, some automation, and enhanced versions of traditional tools. Making use of the administrative aspects of an LMS requires little to no work or input on the part of instructors or students; an LMS provides a large net gain in the administrative realm.

In the organizational realm, an LMS allows an instructor better control over how students receive their information for the course, more flexibility in adding, removing or rearranging information, a wider array of information that can be offered, the ability to front-load information management work (allowing more efficient use of time), and preservation of the information associated with the course. For students, an LMS offers a single place where course information can be found and processed, access to that information whenever it’s needed, and the presentation / arrangement of that information that the instructor has determined is most effective. For an instructor to make use of the organizational aspects of an LMS requires him or her to find or to provide information in an electronic format and to organize and to present that information using the tools in the LMS; for instructors an LMS typically provides somewhere from a small loss to a moderate gain in the organizational realm, depending on the information being managed and the instructor’s comfort with the tools. For a student, taking advantage of the organizational aspects of an LMS requires little to no work or input; for students an LMS provides a large net gain in the organizational realm.

In the pedagogical realm an LMS can offer tools and techniques that would otherwise be impossible or impractical. These can be subdivided into enabling technologies (such as the ability to deliver video, electronic slideshows, etc.), which allow an instructor or student to extrapolate traditional pedagogical methods into the digital world, and alternate technologies (such as asynchronous forums, virtual environments, instant assessment, etc.), which allow exploration of new pedagogical models. Making effective use of the pedagogical aspects of an LMS requires lots of hard work on the part of both instructors and students, but teaching and learning take a lot of hard work without an LMS as well; the net gain or loss in the pedagogical realm is highly situation-dependent, though an instructor would not invest the time in pursuing this use of an LMS in a particular situation unless he or she had a reasonable expectation of a net gain.

Specific functionality could fall into more than one realm, depending on how students and instructors apply it.  While a given application could be assigned to a given realm, this realms concept might be more useful in a comparative role; e.g. posting a course reading is more pedagogical than posting a syllabus and more organizational than discussing the reading in a forum.

Due to its set of functionality in the administrative and organizational realms and to its integration with other campus infrastructure, an LMS is very useful for campus organizations and groups in addition to course-oriented academic purposes – ideally an LMS works well for those non-course groups too.

Mastery levels of programming December 16, 2015

Posted by ficial in brain dump, techy.

Programming language and/or framework levels/stages of understanding/mastery:

  1. black box – little to no understanding
  2. code comprehension – can look at source code and understand what a given piece of code is doing
  3. code adjustment – can make small changes to source code to adjust existing behaviors
  4. program comprehension – can look at a collection of code and understand (roughly) what the program does and how it works
  5. program adjustment – can alter the behavior of the program as a whole
  6. program extension / code creation – can add new behavior to an existing program
  7. program creation – can write a new program from scratch
  8. code evaluation – can distinguish between good and bad code (and why)
  9. program evaluation – can distinguish between good and bad program design choices (and why)
  10. meta evaluation – can discern the areas/ways in which the language/framework is good and bad (and why)
  11. meta adjustment – can make changes to the language / framework
  12. meta creation – create new language / framework

Also, this is worth a read: https://www.reddit.com/r/math/comments/1mtian/mathematicians_of_reddit_is_there_some_point/cccitg2

Google Web Forms Mail Merge(-ish) April 3, 2015

Posted by ficial in brain dump, javascript, software, techy.

TLDR: Skip to the end of this post to get the final app script function. Otherwise, onward….

I just finished up a fun project: getting a submission from a google web form emailed to someone as a PDF in a nicely formatted template (and archived in one or more google folders). It turns out that I’m not the first to want this kind of thing, and I was lucky enough to find an excellent post on this at http://tjhouston.com/2012/03/merge-info-from-google-forms-to-pdf-document-and-send-via-email/. TJ outlines the general process/approach, but uses APIs that are now deprecated, is missing a step that took me a bit to figure out (my guess is that google has changed things a bit since that post was written), and implements a slightly different code model than I ended up using.

application process / design outline

At a high level, the components used are a form, the underlying/linked sheet, a doc that’s used as a template, and a folder in which to archive the results (though if you just want to delete the results that’s easy enough). Also, since we’ll be sending the resulting PDFs via email we’ll need a google account from which to send them. The end user fills out and submits the form. Google’s black box copies the form data into the underlying sheet, AND (this is key) fires the ‘onFormSubmit’ event for that sheet. The mail merge part of this is some custom code – it copies the template doc, fills in the copy using data from the form, saves it in the archives folder, and sends out a PDF copy. I’ll walk through creating a basic system, and hopefully you can subsequently adapt that to whatever you need.

pieces (docs & folders in drive)

The first step is creating the form. Log in using whatever account you want to be sending out the emails, go to drive, and create a new form:

gdoc_mailmerge_01_google_web_form

Once you’ve done that click the ‘View live form’ link (right-hand side of the secondary tool bar) and fill out the form and submit it, just to make sure things are working the way you expect.

gdoc_mailmerge_02_google_web_form_live

Then hop back to the form creation window and click ‘View responses’, choosing to create a new sheet. You should then be able to open that sheet and see your response.

gdoc_mailmerge_03_google_web_form_first_response

(NOTE: responses are actually _copied_ into the sheet – the form itself also records the responses, but they’re not individually visible from the form; generally this distinction is irrelevant, but when you’re resetting a form to clear out test data after development you should both delete the rows in the spreadsheet and delete the responses in the form (‘Delete all responses’ under the ‘Responses’ menu when editing the form)).

Now create your mail merge template document

gdoc_mailmerge_04_mail_merge_template

and your folder for holding/archiving the mail-merged responses, optionally with sub-folders (I’m doing this demo with sub-folders because it’s more complex – easier to go from a complex demo to a simpler implementation than the reverse).

gdoc_mailmerge_05_archive_folders

At this point all the pieces are in place and “all” that we have to do is code up the actual merge part. Luckily Google provides some tools that make this pretty easy. Hop back to the sheet view of the form responses and choose ‘Script editor’ from the ‘Tools’ menu. This will pop up a dialog for starting a new scripting project. Choose the first option – ‘Blank project’.

gdoc_mailmerge_06_script_editor_menu_action

gdoc_mailmerge_07_new_script_project

gdoc_mailmerge_08_new_script_project_initial

Be sure to save it at this point! You’ll need to pick a name as well, though it will be automatically tied to the sheet that is storing the form responses regardless of the name you choose.

pseudocode

In brief, here’s the plan
1) create a very simple function
2) verify that it runs from the script editor
3) add code to make a copy of the template document and put it in the archives folder
4) verify that copying a document works, and grant permissions to do its thing
5) hook the function up to the form submit event
6) verify that the function runs when the form is submitted
7) add code for a simple search and replace, and verify it works
8) show all the form data in a basic string dump, and verify it works
9) fill out the template correctly using the form data, and verify it works
10) send out an email of a PDF of the filled template, and verify it works
11) put extra references or copies of the filled template in the appropriate archive sub-folders, and verify it works

Here goes…

1) create a very simple function

Google scripting is really just javascript, but the context in which it’s executed means that some things aren’t available (e.g. alert, console.log, etc.). I’ve had limited success in finding good introductions to it, but http://www.i-programmer.info/programming/other-languages/4600-getting-started-with-google-app-script.html provides a reasonable overview. You might also check out the official google pages at https://www.google.com/script/start/ and https://developers.google.com/apps-script/overview.

Our initial very simple function does two main things: establishes the name of the function, and demonstrates how to generate simple debugging output. Delete all the initial code that’s shown in the project and replace it with:

function formSubmitHandler() {
  Logger.log('my formSubmitHandler runs!');
}

gdoc_mailmerge_09_script_simple_function

and save it.

2) verify that it runs from the script editor

Then run it by clicking the run button or by selecting the function name from the ‘Run’ menu.

gdoc_mailmerge_10_script_run_menu

 

or

gdoc_mailmerge_11_script_run_button

Now look at the logs to verify that it worked.

gdoc_mailmerge_12_script_logs_menu

gdoc_mailmerge_13_script_logs_shown

3) add code to make a copy of the template document and put it in the archives folder

Now we’re getting to the fun stuff :) To access the template document we’ll need its ID, and to put the copy in the archives folder we’ll need the ID of that folder as well. Actually, since this demo is using archive sub-folders we’ll need the ID of the ALL folder in there. You can get the relevant IDs by going to the document or folder in your browser and copying it from the URL bar:

gdoc_mailmerge_14_doc_id_template

gdoc_mailmerge_15_id_folder

Store those in variables in the script:

var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

We’ll also need a name for the new file:

var newFileName = 'TEST_new_doc';

To actually create the new file we’ll use one of the services that come with App Script – DriveApp. This is a javascript object that “Allows scripts to create, find, and modify files and folders in Google Drive.” See https://developers.google.com/apps-script/reference/drive/drive-app for details. In general the reference docs are good, as long as you know what you’re looking for. Basically, we use the ID of the template to get that file, then make a copy with a new name (‘newFileName’) in a new location (the folder for ‘archiveFolderId_ALL’).

var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);

All together it looks like this:

function formSubmitHandler() {
  Logger.log('my formSubmitHandler runs!');

  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
}

4) verify that copying a document works, and grant permissions to do its thing

Run it now. It will tell you that it needs authorization to run – go ahead and grant that access.

gdoc_mailmerge_17_permision_needed_alert

gdoc_mailmerge_18_permision_needed_dialog

After the script runs you should now be able to go to your archive folder and see the new document.

 

gdoc_mailmerge_19_new_test_file_in_folder

Now delete that file – the next step will be creating a new one and having multiple around with the same name gets confusing (later we’ll add unique tags to distinguish them).

5) hook the function up to the form submit event

Hop back to the script editor. Then under the ‘Resources’ menu select the ‘All your triggers’ item.

gdoc_mailmerge_20_triggers_menu

Delete anything that’s there (click the X box on the left of each item) until the triggers list is empty:

gdoc_mailmerge_21_triggers_empty_list

then click on the link to add a new trigger. In the ‘Run’ column you should see the name of your function. Under the ‘Events’ column, select ‘From spreadsheet’, then set the next field to the right to ‘On form submit’.

gdoc_mailmerge_22_triggers_form_submit

Save it – you’ll need to give additional authorization.

6) verify that the function runs when the form is submitted

Now hop over to your form – the live view – fill it out and click the Submit button. In a few seconds a new TEST_new_doc should show up in your archives folder. Once you’ve verified that everything is working delete that test doc.

7) add code for a simple search and replace, and verify it works

To do this we need access to the actual stuff inside the copy. We have the copy as a file, but we need it opened as a document, and then need access to the body of the document.

var copyDoc = DocumentApp.openById(copyFile.getId());
var copyBody = copyDoc.getActiveSection();

Once we have the body we use the replaceText function to find one of the KEY_blahblahblah strings in the copy and replace it with whatever value we like. After that the document is saved and closed.

copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT');
copyDoc.saveAndClose();

We can also remove the logging line at this point. Altogether our function now looks like:

function formSubmitHandler() {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT');

  copyDoc.saveAndClose();
}

Now you MUST run this in the script editor before trying an actual form submission – this is necessary to grant authorization. If you just try to submit a form right away then nothing useful will happen.

After that go to the form again and submit another response. Open up the resulting document and see that ‘TEST_REPLACEMENT’ shows up (and go ahead and open your original template and verify that the KEY_ is still there – i.e. make sure we didn’t somehow run the replace on our template instead of our copy).

gdoc_mailmerge_23_first_replacement

8) show all the form data in a basic string dump, and verify it works

To get the actual form data we need a parameter added to our function to catch the info that the form is sending.

function formSubmitHandler(evt) {
...
}

Then we’ll use another service to see what info is available to us – convert the event to a JSON string and put it in the copy so we can inspect it.

copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));

However, if you run this from the script editor you’ll get nothing useful, because the evt only has info if there’s an actual form submission. So we add a conditional to handle that case:

if (evt) {
  copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));
} else {
  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
}

The full function now looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  var newFileName = 'TEST_new_doc';
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  if (evt) {
    copyBody.replaceText('KEY_timestamp', JSON.stringify(evt));
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }

  copyDoc.saveAndClose();
}

Make another form submit and check out the resulting doc – you should see something like:

gdoc_mailmerge_24_event_data_replacement

That’s easier to understand when it’s formatted a little more cleanly:

gdoc_mailmerge_25_event_data_formatted

The key take-aways from this are that the form data is available in two ways (namedValues and values), and that there is a timestamp available even though the form doesn’t explicitly include one. Further, the namedValues data structure labels things by the text of the question in the form, and the values data structure shows things in the order that the form displays them. I prefer to use namedValues because I think it’s a little more robust and self-documenting in the long run. However, the timestamp there doesn’t have a real label. To put one there, edit the responses sheet and add a label at the top of the column that has the timestamps. Now when you submit the form and look at the data you’ll see something like

gdoc_mailmerge_26_sheet_no_timestamp_label

gdoc_mailmerge_27_sheet_with_timestamp_label

gdoc_mailmerge_28_event_data_with_timestamp_label
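If it helps to see it as text, here is roughly the shape of the event object for this demo form after adding the TIMESTAMP label (the question names are this demo’s; yours will differ, and note that every namedValues entry is an array of strings, even for single-answer questions):

// Rough sketch of the 'on form submit' event for the demo form (values are examples).
var exampleEvt = {
  values: ['3/30/2015 14:55:21', 'Pat Example', 'Because it looked fun', 'awesome, rad'],
  namedValues: {
    'TIMESTAMP': ['3/30/2015 14:55:21'],
    'Name': ['Pat Example'],
    'Why are you interested in this?': ['Because it looked fun'],
    'This is...': ['awesome, rad']
  }
};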

9) fill out the template correctly using the form data, and verify it works

Keep a copy of the JSON dump somewhere as a reference. Now we can do the actual filling out of the template, as well as make the file name for each submission unique. First, the file name. We could use the TIMESTAMP value from the form submission, but it’s actually easier just to do the date calculation right in the javascript (which also lets us have unique filenames even when the code is run from the script editor). However, we do want to get the user’s name from the form data, if it’s available.

var newFileName = 'TEST_new_doc';
if (evt) {
  newFileName = 'from ' + evt.namedValues['Name'];
}
newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');

The next step is the substitution in of the form values. Rather than doing this individually I prefer building a more general structure. We’ll create a map that links each key in the template to the corresponding form question. Then we can loop through the map to do the substitutions – this allows future changes to the form and/or template to be accounted for in the data portion of our script rather than execution portion, which greatly eases maintenance and reduces bugs. This map could be keyed using the form questions instead of the KEY_ strings, but using the KEY_ strings makes it easier to put appropriate placeholders in the copy.

// map KEY_ strings in the template to the corresponding form questions
var standard_keys_lookup = {
  'KEY_timestamp':'TIMESTAMP',
  'KEY_person':'Name',
  'KEY_reason':'Why are you interested in this?',
  'KEY_description':'This is...'
};
// handy access for looping-
var standard_keys = Object.keys(standard_keys_lookup);

Then the actual substitution looks like:

if (evt) {
// loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
  for (var i=0; i<standard_keys.length; i++) {
    var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
    if (! v) {
      v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
    }
    copyBody.replaceText(standard_keys[i], v);
  }
} else {
  copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
}

That loop might need a bit of explaining… The ‘standard_keys’ array contains the keys of the map – i.e. ‘KEY_timestamp’, ‘KEY_person’, etc., in no particular order. When executing the loop the current key is ‘standard_keys[i]’. The corresponding question text is found by using that key in the lookup table – the current question text is standard_keys_lookup[standard_keys[i]]. The actual form value is found by using the question text as a key in the evt.namedValues map – evt.namedValues[standard_keys_lookup[standard_keys[i]]]. The empty string is appended to that to force the type to be a string. Then if that is falsy (i.e. has no value), instead we use the placeholder ‘nothing given for ‘+standard_keys_lookup[standard_keys[i]] (which will evaluate to something like ‘nothing given for “Name”‘ when this is executed). Whatever that value is then gets used to do the replacement – copyBody.replaceText(standard_keys[i], v).

The full function at this point looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';

  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');

  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);

  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();

  // map KEY_ strings in the template to the corresponding form questions
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };

  // handy access for looping-
  var standard_keys = Object.keys(standard_keys_lookup);

  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }

  copyDoc.saveAndClose();
}

With this approach the form questions can be freely rearranged without breaking anything, and even new questions can be added without breaking anything. To add a new question to the merge process, just put a KEY_ string in an appropriate place in the template and add a ‘KEY_’:’Question text’ entry to standard_keys_lookup.
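For example, assuming a hypothetical new question added to the form, the only code change is one new entry in the map (plus the matching KEY_ string placed in the template doc):

// hypothetical extra question - the value must match the form's question text exactly
var standard_keys_lookup = {
  'KEY_timestamp':'TIMESTAMP',
  'KEY_person':'Name',
  'KEY_reason':'Why are you interested in this?',
  'KEY_description':'This is...',
  'KEY_phone':'What is the best number to reach you at?'
};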

Run a test from the script editor, then do a form submission. You should get unique file names, and the contents should be appropriately filled in.

gdoc_mailmerge_29_timestamped_file_names

gdoc_mailmerge_30_copy_file_with_replacements

10) send out an email of a PDF of the filled template, and verify it works

If you don’t want PDFs emailed out then this step can be skipped. Otherwise, this code goes after the document’s saveAndClose call. To create the PDF we again use the handy tools that Google provides – this is REALLY easy, especially compared to all the hoops one has to jump through to get other systems to do this kind of thing.

var pdfCopy = copyFile.getAs("application/pdf");

Building and sending the mail is similarly straightforward. I put the subject and body in separate variables for clarity.

var subject = "Mail Merge Demo Submission - "+newFileName;
var body = "Here is "+newFileName;
MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: pdfCopy});

The full function now looks like:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';
  
  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');
  
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
  
  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();
  
  // map KEY_ strings in the template to the corresponding form questions
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };
  
  // handy access for looping-
  var standard_keys = Object.keys(standard_keys_lookup);
  
  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }
  
  // don't forget to save all those changes!
  copyDoc.saveAndClose();
  
  // send out the PDF copy
  var pdfCopy = copyFile.getAs("application/pdf");
  var subject = "Mail Merge Demo Submission - "+newFileName;
  var body = "Here is "+newFileName;
  MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: pdfCopy});
}

At this point you MUST run it from the script editor again to grant it the authorization to send out emails. You should then get an email with an attached PDF!

gdoc_mailmerge_31_email_received

gdoc_mailmerge_32_pdf_version

Another form submission at this point should result in you getting an appropriately filled out PDF.

VICTORY!

11) put extra references or copies of the filled template in the appropriate archive sub-folders, and verify it works

This is another optional step. For many situations it’s unnecessary, but in the few where it’s needed it’s very, very nice. In my case I was building a system to support on-line job applications. One of the form questions let the user check off all the jobs for which they wanted to apply. Putting the copy in folders split by job saved a LOT of time and hassle for the people evaluating the applications. An important note here is that Google Drive is not quite like the hard drive on your computer. The ‘folders’ that Google shows are really more like tags on the files. Unless you explicitly make a copy of a document, putting it in a folder just adds another tag to it, which means that when/if the document is deleted it disappears from ALL the folders at once. In some situations this may be exactly the desired behavior. In others you may want to put a real copy in each folder so that the actions in one folder don’t impact the contents of another folder.
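Here is a small sketch of the difference, using the copyFile variable from the script above and a placeholder folder ID:

// Sketch only - 'SOME_FOLDER_ID' is a placeholder for one of your sub-folder IDs.
var jobFolder = DriveApp.getFolderById('SOME_FOLDER_ID');

// Option 1: shared reference - the one underlying file now appears in this folder too,
// and deleting the file removes it from every folder at once.
jobFolder.addFile(copyFile);

// Option 2: true copy - this folder gets its own independent document.
copyFile.makeCopy(copyFile.getName(), jobFolder);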

To specify in which folder the filled template will be placed we use another lookup:

var subfolder_values_lookups = {
  'awesome':DriveApp.getFolderById('0Bzufe4iO7j5HSWRDQmNJd3hoRkE'),
  'rad':DriveApp.getFolderById('0Bzufe4iO7j5HaXlKbGEzblVKTG8'),
  'cool':DriveApp.getFolderById('0Bzufe4iO7j5HZE8yZzdtNk9uLTA'),
  'other':DriveApp.getFolderById('0Bzufe4iO7j5HbWRYbnN0RUNsbnM')
};

// handy access for lookups
var subfolder_values = Object.keys(subfolder_values_lookups);

Then it’s just a matter of inspecting the appropriate form field(s) value(s). In this example we’re using the value of a multi-select checkbox question. Looking back at the JSON output of the event it appears that the values are joined together using commas (not sure how it handles values that have commas in them, though I’m guessing no differently, which means processing that situation would require a bit more work (probably via sub-string checking rather than the method I’m using here)). So, I’ll split the values of that field into an array and loop through it, doing the sub-folder placement as needed. There’s a bit of trickiness where I constrain values to ones that are available in my lookup table keys.

// create an array of the values chosen for the multi-checkbox question
// (split on commas plus any surrounding whitespace, since the values arrive joined as 'a, b, c')
var form_values = (evt.namedValues['This is...']+"").split(/\s*,\s*/);

for (var i=0; i<form_values.length; i++) {
  var lookup_key = form_values[i];

  // constrain non-standard keys to 'other'
  if (subfolder_values.indexOf(lookup_key) < 0) {
    lookup_key = 'other';
  }

  subfolder_values_lookups[lookup_key].addFile(copyFile);
}

All of this needs to be placed in a conditional that makes sure there’s an event. This code could be put inside the existing check, but I find the function as a whole easier to read/understand (and not too much more complicated to maintain / extend) if I put this stuff at the end and just wrap the section in its own, identical conditional.

The completed function

It copies the template and fills it out, sends out a PDF, and puts the filled copy in the appropriate sub-folders:

function formSubmitHandler(evt) {
  var templateDocId = '1-cllBTvj2qYiaii69VgO7X37mSo12KbOoBDPACVlsik';
  var archiveFolderId_ALL = '0Bzufe4iO7j5HRU0wSFljNHN3RHM';
  
  // set up the new file name, using the form data if it's available, and adding a timestamp to ensure uniqueness
  var newFileName = 'TEST_new_doc';
  if (evt) {
    newFileName = 'from ' + evt.namedValues['Name'];
  }
  newFileName += ' ' + Date().split(' ').splice(1,4).join(' ');
  
  var archiveFolder = DriveApp.getFolderById(archiveFolderId_ALL);
  
  // make a copy of the template in the archive folder and set up the variables that let us manipulate it
  var copyFile = DriveApp.getFileById(templateDocId).makeCopy(newFileName,archiveFolder);
  var copyDoc = DocumentApp.openById(copyFile.getId());
  var copyBody = copyDoc.getActiveSection();
  
  // map KEY_ strings in the template to the corresponding form questions
  var standard_keys_lookup = {
    'KEY_timestamp':'TIMESTAMP',
    'KEY_person':'Name',
    'KEY_reason':'Why are you interested in this?',
    'KEY_description':'This is...'
  };
  
  // handy access for looping
  var standard_keys = Object.keys(standard_keys_lookup);
  
  if (evt) {
    // loop through the keys, in each case subbing in the form value or a placeholder indicating that the form value is missing
    for (var i=0; i<standard_keys.length; i++) {
      var v = evt.namedValues[standard_keys_lookup[standard_keys[i]]] + "";
      if (! v) {
        v = 'nothing given for "'+standard_keys_lookup[standard_keys[i]]+'"';
      }
      copyBody.replaceText(standard_keys[i], v);
    }
  } else {
    copyBody.replaceText('KEY_timestamp', 'TEST_REPLACEMENT no form submit');
  }
  
  // don't forget to save all those changes!
  copyDoc.saveAndClose();
  
  //--------------------------
  // send out the PDF copy
  var pdfCopy = copyFile.getAs("application/pdf");
  var subject = "Mail Merge Demo Submission - "+newFileName;
  var body = "Here is "+newFileName;
  MailApp.sendEmail("youremail@address.com", subject, body, {htmlBody: body, attachments: pdfCopy});
  
  //--------------------------
  // put in sub-folders
  if (evt) {
    var subfolder_values_lookups = {
      'awesome':DriveApp.getFolderById('0BziO7j5HSufe4WRDQmNJd3hoRkE'),
      'rad':DriveApp.getFolderById('0Bzufe4iO7lKbGEzblj5HaXVKTG8'),
      'cool':DriveApp.getFolderById('0Bzufe4iO7j5HZEtNk98yZzduLTA'),
      'other':DriveApp.getFolderById('0BzufebnN0R4iO7j5HbWRYUNsbnM')
    };
  
    // handy access for lookups
    var subfolder_values = Object.keys(subfolder_values_lookups);
    
    // create an array of the values chosen for the multi-checkbox question
    // (split on commas plus any surrounding whitespace, since the values arrive joined as 'a, b, c')
    var form_values = (evt.namedValues['This is...']+"").split(/\s*,\s*/);
    
    for (var i=0; i<form_values.length; i++) {
      var lookup_key = form_values[i];
    
      // constrain non-standard keys to 'other'
      if (subfolder_values.indexOf(lookup_key) < 0) {
        lookup_key = 'other';
      }
   
      subfolder_values_lookups[lookup_key].addFile(copyFile);
    }
  }
 
}