On git and separation of function
May 31, 2011. Posted by ficial in brain dump, techy.
As I’ve been getting (back) up to speed on Ruby on Rails, I’m also forcing myself to learn and use a number of the tools common in the Rails community. One of these is the version control system git. Git is a relative newcomer to the version control universe, and it approaches things a bit differently than the older systems that I’ve used or know a bit about (CVS, Subversion, and Perforce). Essentially, all these older systems have a central store of information: you put information in (usually code, though it could be any textual data), retrieve it as needed, make further developments, and put it back, while the system tracks changes over time and manages multiple people working on the same data. An interesting and useful side effect of this approach is that all the important code is in one place, even though many different people may be working on it. This means there’s a single place/system that can be backed up, and so, in the interests of efficiency, the source control system doubles as a data protection system. I’d been so used to thinking in that framework that the two functions, source control and backup, had become nearly synonymous in my mind.
Git operates on a different model. I don’t understand it well enough to be able to provide a good description. However, the relevant aspect here is that there is no built-in central repository; a given project may have tens, hundreds, thousands, or more separate repositories, and anyone with the right permissions can move data into or out of them. Essentially, each person working on a project that uses git has their own individual repository; data can be copied from your repository to a different one, but none is any more central/primary than another (at least, not by built-in constraint; in practice there’s usually a recognized master repository, but it’s a separate repository from the one in your own development environment). One side effect of this is a clear separation of the functional needs of source control and data backup.
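To make the "none is any more central than another" idea a bit more concrete, here is a toy Python sketch. It is not how git is actually implemented (real git stores a graph of commit objects, not a flat set), and the `Repo` class, its methods, and the names `alice`, `bob`, and `shared` are all invented for illustration.

```python
# Toy model only: each repository is just a named set of commit messages.
class Repo:
    def __init__(self, name):
        self.name = name
        self.commits = set()  # every repository holds its own full history

    def commit(self, message):
        # Record a change in this repository only.
        self.commits.add(message)

    def pull_from(self, other):
        # Copy whatever the other repository has that this one lacks.
        self.commits |= other.commits


alice = Repo("alice")
bob = Repo("bob")
shared = Repo("shared")  # "central" by team convention only; same class as the rest

alice.commit("add login page")
bob.commit("fix typo in README")

# Data can move between any pair of repositories, in any direction.
bob.pull_from(alice)
shared.pull_from(bob)
alice.pull_from(shared)

print(sorted(alice.commits))
print(alice.commits == bob.commits == shared.commits)
```

Nothing in the code distinguishes `shared` from the other two; it only plays the "master" role because everyone agrees to pull from it.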
When I use git to work on a project, I don’t retrieve code from somewhere and then return it with the changes I’ve made. I COPY (‘clone’) the code, history and all, from somewhere and continue on in my own little world (a ‘branch’ or ‘fork’) until I explicitly merge whatever I’ve done back into the master (or some other branch). While I’m working with my copy I have all the features of source control (mostly a lot of metadata about when and how the data changes, along with the ability to get back to earlier versions), but I don’t “automatically” get data backup. Git provides source control and source control alone. If you need your data backed up as well as controlled, you need to do so explicitly (and there are a number of options; e.g. it’s one of the features of using a service like GitHub).
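The clone, work locally, back up explicitly sequence can be sketched with another toy model in Python. Again, this is an illustration under simplifying assumptions rather than git's real behaviour: the `Repo` class, `push_to`, and the `hosted_backup` repository are hypothetical names, and history is reduced to a plain list of messages. Very roughly, `clone`, `commit`, and `push_to` stand in for `git clone`, `git commit`, and `git push`.

```python
class Repo:
    """Toy stand-in for a repository: an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def clone(self):
        # A clone is a full, independent copy of the history.
        return Repo(self.history)

    def commit(self, message):
        # Recorded locally only; no other repository changes.
        self.history.append(message)

    def push_to(self, other):
        # Explicit copy to another repository. Only allowed when the other
        # history is a prefix of ours (a simplified "fast-forward").
        n = len(other.history)
        if self.history[:n] != other.history:
            raise ValueError("histories diverged; merge first")
        other.history.extend(self.history[n:])


upstream = Repo(["initial commit"])
mine = upstream.clone()
mine.commit("add feature")
mine.commit("fix bug")

# Source control works locally: I can see and revisit every version...
print(mine.history)
# ...but nothing has left my machine, so there is no backup yet.
print(upstream.history)

hosted_backup = Repo()      # hypothetical repository on some hosting service
mine.push_to(hosted_backup)  # the backup happens only because I asked for it
print(hosted_backup.history == mine.history)
```

Until `push_to` is called, the two new commits exist in exactly one place, which is the separation of source control from backup described above.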
This separation of function is not necessarily better on a technical level (you gain flexibility via decoupling, but you lose a bit on the convenience side), but it feels… cleaner conceptually. Overall, it’s been a good reminder that habits of thought don’t necessarily match what actually is or could be.