
Categories: Version Control Systems, ClearCase, Perforce, Subversion

Using Version Control for Automatic Deployment

Introduction

About half the places I have worked were financial institutions that do what is known as internal development. That is, instead of producing software for customers, they produce software for their own use. However, this article also applies to any organization that runs its own website or uses scripts written in Perl, Visual Basic, Python, etc. for system administration.

The Power of Version Control Systems

Version Control Systems are great ways of tracking the history of files. You can easily see the changes that have taken place from one version to another. When combined with good defect tracking practices (or when tightly integrated with a defect tracking system which is used in a correct manner), you can even track why changes took place. Most version control systems can help you see who wrote what line and when (see Perforce's p4 annotate command or Subversion's svn blame command).

Yes, with a good version control system, you can track any change, and answer the questions of who made a change, what change was made, when the change was made, and why the change was made. Too bad this great tracking falls completely apart when you deploy the change. At that point, especially for non-compiled code like scripts and websites, it becomes extremely difficult to know what was changed and why on deployed systems. Fortunately, you can use your version control system to do automatic deployment for you. By using your version control system for deployment, you can extend the powerful tracking mechanisms offered by your version control systems to your deployment. This includes deployment to your QA systems, production systems, and to all your various environments.

Perforce

The examples given here use Perforce, which is probably one of the best tools for this particular task. Perforce's sync mechanism is fast and has very little overhead, so it will not slow down your system. Perforce allows you to use all of the power of the version control system inside the production directory, so you can do diffs, examine the history of files, and answer any questions without first creating a separate workspace.

Perforce also allows for complex remapping of the source archive directory structure to the production directory structure. After all, there are a lot of times when the file layout in development might not match your file layout in production. Perforce even allows you to merge two different source directories into a single production directory. For example, I have a development directory with my webpage source code. I also have separate directories for files that differ between each environment. These can be configuration files, but might also include webpages that differ slightly between different environments.

Another advantage of Perforce is that all of your files are automatically set to read-only. This removes the temptation to edit the files that are in production without going through version control. Plus, Perforce's permissioning scheme can even prevent someone from marking a file as editable and, therefore, writable.

And, whether you use Unix or Windows, you can use the P4V Perforce Visual Client GUI.

Subversion and CVS

Both Subversion and CVS can be used for this task, but there are a few caveats. Subversion and CVS add tracking directories to their checkouts (the CVS and .svn directories). You could avoid these by doing an export instead of a checkout. Unfortunately, that means you lose much of the version control tracking power in these directories. You still have the automatic deployment, but you lose most of the other functionality that makes using automatic version control deployment so nice. The simpler approach is to keep the checkout and configure your application to ignore these tracking directories.

Subversion and CVS also don't allow you to easily remap your source directory layout to your production layout. You can do it, but it might take multiple checkout commands. And, you do not have the ability to merge two separate source directories into a single production directory.

Another problem is that the standard checkout in Subversion and in CVS sets the files as read/write. This means someone could manually edit a file without going through the version control system. The best thing to do is to set the files to read only once the checkout is complete.
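As a minimal sketch of that last step, the function below strips write permission from every checked-out file while skipping the .svn and CVS tracking directories, so that later updates can still write their bookkeeping data. The function name make_read_only and the example URL are hypothetical. For CVS, the global -r option (cvs -r checkout ...) is another way to get read-only working files.

```shell
#!/bin/sh
# Make every checked-out file read-only, but leave the .svn and CVS
# tracking directories untouched so updates keep working.
make_read_only() {
    # $1: root of the checked-out tree
    find "$1" \( -name .svn -o -name CVS \) -prune -o -type f -exec chmod a-w {} +
}

# Example (hypothetical repository URL):
#   svn checkout http://svn.example.com/repos/foo/branches/QA /usr/local/apache/htdocs
#   make_read_only /usr/local/apache/htdocs
```

Run it again after each update, because newly added files arrive writable.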

Again, nothing prevents you from using Subversion or CVS. It'll take a bit more work, but it is possible, and if you're doing this on a Windows system, TortoiseCVS or TortoiseSVN gives you a nice, powerful GUI front end.

ClearCase

ClearCase is the only version control system that actually has the built-in ability to handle this type of work. ClearCase dynamic views can be treated as read-only NFS mounts. When the ClearCase view changes, the NFS mount changes. The synchronization between the two is complete, automatic, and instantaneous. There are just a few problems.

First of all, ClearCase doesn't allow the remapping of the ClearCase directory layout. You can set up symbolic links inside the ClearCase view to replicate the production structure, but that can be tedious. Instead, you'll probably have to settle for having your development structure match your deployment structure.

Also, ClearCase NFS mounts are truly read-only. Your program can't create temporary files inside the mount, so you may have to modify your program to get around this issue. And, although the NFS mount is live, it isn't a version control view. In order to answer questions about what is in the directory, you have to start up a separate ClearCase instance with the mapped view. Finally, this is an NFS mount, which means that if you are using Windows, you have to get an NFS package like Hummingbird Maestro in order to get everything to work. And, forget about using the C:\Inetpub\wwwroot directory.

The Configuration

You should have a separate Perforce user for each instance and each environment. For example, you have a web page for your New York and London offices. You also have a QA group that tests this page. For this one web page, it is best to have three different users. One for QA production, one for New York production, and one for London production. These users will have read access only to the directories and branches that they have permission to see. They should not be able to do a p4 edit on the files in their directory. And, these users should not have access to any other files in Perforce. You don't want to open up any security holes.
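As a rough sketch, the protections table entries (edited with p4 protect) for the QA deployment user might look like the lines below. The user name qa-web is hypothetical, and the paths assume the Project Foo depot layout used in this article. Order matters: later lines override earlier ones, so the exclusionary line comes first to remove all access, and the read lines then grant back only the deployment branches. Place these lines after any general entries that would otherwise give this user wider access.

```text
Protections:
    list  user  qa-web  *  -//...
    read  user  qa-web  *  //Depot/Foo/QA/...
    read  user  qa-web  *  //Depot/Foo/Special/QA/...
```

Because the user only has read access, a p4 edit or p4 submit from the deployment directory is refused by the server.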

In fact, you might even consider setting up a separate Perforce server for these users, and using remote depots to export the files you want these users to see. That will prevent the user from perusing changelists, user lists, and other information that you want to keep only in your development environment.

In your development environment, you want to set up separate branches for each environment. For example, I might have this directory layout for Project Foo:

//Depot/Foo/MAIN/... #Main Development Environment
//Depot/Foo/QA/... #QA's Testing Environment
//Depot/Foo/NEW_YORK/... #New York's Production Environment
//Depot/Foo/LONDON/... #London's Production Environment

When I want to deploy something to QA, I can run this command:

$ p4 integrate //Depot/Foo/MAIN/... //Depot/Foo/QA/...
$ p4 resolve -at //...
$ p4 submit

The p4 integrate copies all the developer's files to the QA branch. The -at flag on the resolve command tells Perforce to accept whatever is on the source branch and not to do an actual merge. This is usually what you want. The p4 submit command checks in the changes onto the QA branch. Your QA website will automatically update itself with the latest changes.

Setting Up the Perforce Client

You want to set up a Perforce client in the root of the directory tree that you want to use for your production files. This client should be owned by the Perforce user who has access to this directory and branch. This does not necessarily have to be the same user as the owner of the directory where the files will actually be deployed. In fact, there is probably an advantage to not having these as the same user.

For example, I want to set up my webpage in the /usr/local/apache/htdocs directory. I need to put the files from //Depot/Foo/QA/... into this directory, plus the special environment files located in //Depot/Foo/Special/QA/... My Perforce client would look something like this:

Client: qa-web-client
Root: /usr/local/apache/htdocs
Options: noallwrite clobber nocompress unlocked nomodtime rmdir
View:
    //Depot/Foo/QA/... //qa-web-client/...
    +//Depot/Foo/Special/QA/... //qa-web-client/...

Now, a simple p4 sync will automatically update my installation to the latest version on QA.

The Sync Mechanism

Now that I have my client and my Perforce depot set up, I need a way of automatically synchronizing my production directory with the Perforce depot. How does this work?

The first problem is logging into Perforce. You could pass the client, user, server, and password as options on every Perforce command that does the sync. However, Perforce provides a simpler mechanism for this: the P4CONFIG environment variable. Set this variable to .p4rc. If you have a Windows system, you can do this in your System Control Panel. In Unix, you can do this either in /etc/profile or in the user's .profile file.

Now create a file named .p4rc in the root of your production directory and use it to set P4USER, P4CLIENT, and P4PORT. Whenever you're in this directory or below it, your Perforce user, client, and port will be set automatically. The only other question is how to send your password. You could set your password using P4PASSWD in the P4CONFIG file, but it would be visible to anyone who can read that file.
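For illustration, a .p4rc file for the QA web directory might contain something like the following. The client name matches the client spec shown earlier, while the server address and the user name qa-web are placeholders for this sketch.

```text
P4PORT=perforce.example.com:1666
P4USER=qa-web
P4CLIENT=qa-web-client
```

Running p4 info from inside the directory is a quick way to confirm that the user, client, and server shown are the ones you expect.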

I recommend that you create a Perforce group and put all of your autosync users in this group. You won't use this group in your protections table, but you can set its Timeout value to zero. This means that once you've logged into Perforce, you will have a Perforce login ticket that never expires. This ticket can be set so that it is only valid on a particular machine. This is the most secure way of setting up this user. You can use the P4TICKETS environment variable in your P4CONFIG file to point to this ticket. The ticket file should be read-only, and only visible to the user who will be running the p4 sync command on that machine.
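A sketch of such a group spec, as edited with p4 group, is shown below. The group name autosync and the user names are hypothetical, and only the relevant fields are shown. The Timeout of 0 follows the description above; newer Perforce releases document the value unlimited for a ticket that never expires, so check p4 help group on your own server version before relying on either.

```text
Group:    autosync
Timeout:  0
Users:
    qa-web
    newyork-web
    london-web
```

After saving the group, run p4 login once as each deployment user on its own machine to create the ticket.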

Okay, the client is set up, the Perforce ticket is in place, and you've got the Perforce depot and the permissions set correctly. Now, what is the Perforce mechanism that automatically does the syncing?

Well, there isn't one built into Perforce itself. Fortunately, it isn't too difficult to create one. Here are a few options:

  1. The easiest way is to set up a Unix cron job or a Windows Scheduled Task that runs every minute. This task changes into the directory in question, makes sure the P4CONFIG environment variable is set, and does a p4 sync.

    It isn't the most elegant of systems, but it is fairly efficient. Perforce syncs have very low overhead, and that is especially true if there is nothing to sync. Although the synchronization isn't instantaneous, it happens so soon after the commit that you probably won't notice the delay. Plus, if you want changes to go out only after business hours, you can set your cron job to run only after business hours.

  2. A more elegant solution, which works under Apache's HTTP server, is to use Perforce's WebKeeper. This Apache module will automatically synchronize your webpage whenever the directory in your view changes. In fact, it is a fairly simple mechanism to use even if you're not serving a website. Just set up an Apache httpd process that doesn't serve any webpages. Then you can use this same mechanism for SQL and server maintenance scripts.
  3. Another choice is to use a product like ICmanage. ICmanage is an entire Perforce server setup. It includes hardware, hot backups, failover, and autosynchronization software. It may be a bit of overkill for simple auto synchronization, but if you really don't have a strong Perforce setup ("Our Perforce server is Bob's desktop system. I think Bob backs it up to CDs every once in a while."), it might be worth considering.
  4. Finally, there's the roll-your-own approach. I wrote a simple test synchronization manager in Perl in a few hours. The heart of the system was a server that accepted two kinds of clients. Client #1 was a Perforce trigger that fired whenever someone did a submit. It simply sent a one-line message that said "Hey! Someone submitted a change." Client #2 was each of the machines that needed to synchronize with the Perforce server. All this sync server did was relay the message to the various clients, telling them to do a sync. After playing around with it for a few minutes, I decided to go with the cron job approach.
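The cron job approach from option 1 can be sketched as a small wrapper script. The script path, the lock directory name, and the P4 override variable are all assumptions made for this illustration, and the script expects the .p4rc file and login ticket described earlier to be in place.

```shell
#!/bin/sh
# autosync: run "p4 sync" inside a deployment directory.
# Suggested crontab entry (every minute; the script path is hypothetical):
#   * * * * * /usr/local/bin/autosync.sh /usr/local/apache/htdocs

autosync() {
    # $1: deployment directory that holds the .p4rc file
    dir=$1
    lock="$dir/.autosync.lock"

    # mkdir is atomic, so it doubles as a simple lock: skip this run
    # if the previous minute's sync is still going.
    mkdir "$lock" 2>/dev/null || return 0

    (
        cd "$dir" || exit 1
        P4CONFIG=.p4rc
        export P4CONFIG
        # P4 can be overridden for testing; it defaults to the real client.
        "${P4:-p4}" sync
    )
    status=$?
    rmdir "$lock"
    return "$status"
}

# Run only when a directory argument is supplied.
if [ $# -gt 0 ]; then
    autosync "$1"
fi
```

The lock keeps a slow sync from overlapping with the next minute's run. If the script is ever killed mid-sync, the stale lock directory has to be removed by hand. To deploy only after business hours, restrict the hour field of the crontab entry instead of changing the script.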

So, What Does This Buy You?

Imagine you have this entire automatic synchronization system set up. The next time a new release is ready, it will automatically be put on your server. No more worrying about whether the release was installed, or whether it was installed correctly. You can also see who did the install by looking at the integration history in Perforce.

Imagine that the new release has a bug in it. You go to your web page directory and ask Perforce what changed, for example with p4 changes and p4 describe, or with p4 diff2 between the previous release and the current one. Now you see which files were changed since the last release. You can also see who authorized the change, and why the change took place. Once you track down the file with the defect, you can find out which developer changed it, and why.

Is this a quick fix? If it is, you can log into Perforce as yourself, make the fix on the development branch, and integrate it into your environment branch. Within a minute, it is on your system.

Can't be fixed quickly, and you want to revert to your previous release? There are several things you can do. One is to change your Perforce client to use a new branch, and then integrate your previous release into this new branch. Once the defect is straightened out, you can switch back to your original environment branch. Meanwhile, other changes from QA can still be integrated into your original environment branch.
