Versioning Builds
There are two schools of thought on this subject:
- Why save something when you can just rebuild it? According to this school of thought, you should never version the binary results of a build. It simply wastes space.
Version control systems save space by storing only the differences between versions. With a line-oriented text file, that is pretty easy to do. Imagine you have a 100-line program, and you modified three lines, deleted two lines, and added four new lines. If I have the original 100-line version, all I need to store are the instructions on how to get from the older version to the newer version: just nine changes. If I store two versions of a binary file, I must store a complete copy of each version. If I produce 500 megabytes of binaries with every build, and I am doing five builds per week, I am adding 2 1/2 gigabytes of storage per week, or about 125 gigabytes of storage per year. I can quickly overwhelm any version control system if I don't have a policy that helps me reclaim obsolete binaries. This becomes another management headache.
Plus, saving build binaries can lead to bad build practices or cover up bad build practices. If I know I have to rebuild my compiled code each and every time, I make sure that my build practices allow me to do just that.
The Subversion development team does not believe in storing products of the build process, and this shows up in the design philosophy behind Subversion: Subversion has no command or easy mechanism for removing versions of files.
In fact, much of the open source community is against storing products of the build process, which is why most open source software is distributed strictly as source.
- Why build something when you can save it under version control? I personally lean toward this camp, and the reason may be that I work mainly in non-open source environments with very large teams of developers and multiple products. Our developers may depend upon the pre-built libraries created by their fellow developers. As part of my build process, I compile these libraries and allow other developers to use them. Yes, these developers could build their own libraries, but why should I assume that the developer will select the right code and version of the files to build?
By distributing the binaries of the libraries, I can ensure that each developer is using the same set of files. Another advantage is that storing the output of the build process means everyone knows where the official copy of the release is stored. Plus, I can use the power of my version control system to keep up with the binaries.
This is one of the areas where ClearCase excels. In ClearCase, I can take a built binary, and ClearCase will give me not only the names of the files used in the build, but also the versions, build scripts, and environmental settings. But that is only true if the file never leaves ClearCase's storage area.
Another advantage is that I am pretty sure my version control area is being backed up. A storage area for the built files may not be backed up.
The big disadvantage is that you have to keep cleaning out obsolete and unimportant versions of your builds. For example, you might not be interested in built files older than two weeks as long as those files haven't been sent to clients or to QA. This means determining which versions you want to keep and which to throw out. Again, ClearCase makes this very easy. Under ClearCase, the rmver (remove version) command won't by default delete a version of a file if it is labeled. If you're using ClearCase, all you have to do is delete the labels that are no longer important to you, and run the rmver command.
Perforce allows you to remove obsolete versions of a file via the obliterate command. The problem is that Perforce will delete interesting and uninteresting versions of files alike. The best way to handle this is to use branches for environments. Any version that is QA'd should be moved onto the QA branch. Any built version that is ready for customer usage should be placed on the distribution branch. This way, you can remove old builds from the development versions without worrying that you might delete a version that is sitting on 353 customer sites.
Another, more philosophical question is how you can send something out to production when a different version of the file is the one QA tested. If you store your build output, and QA tests what is stored and likes it, you know exactly what was tested. If you have to rebuild your output for production, how can you be sure that is what QA really tested?
In an era of cheap disk space, storing the results of a build is not a terrible waste. Yes, you have to have one more management headache -- finding the obsolete revisions and removing them on a regular basis. But, it isn't that difficult to implement such a task.
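The cleanup task can be sketched in a few lines. This is only a minimal Python sketch of the policy described above, not a ClearCase or Perforce feature: the Build record, its labels field, and the two-week cutoff are assumptions made for illustration (keep anything labeled for QA or clients, flag unlabeled builds older than two weeks).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Build:
    """Hypothetical record for one stored build; not a real VCS object."""
    name: str
    built_at: datetime
    labels: set = field(default_factory=set)  # e.g. {"QA"} or {"CLIENT"}


def obsolete_builds(builds, now, max_age=timedelta(weeks=2)):
    """Return the builds that are unlabeled and older than max_age."""
    cutoff = now - max_age
    return [b for b in builds if not b.labels and b.built_at < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    stored = [
        Build("nightly-001", now - timedelta(days=30)),
        Build("nightly-002", now - timedelta(days=30), {"QA"}),
        Build("nightly-003", now - timedelta(days=3)),
    ]
    # Only report candidates; the actual removal is left to the VCS command.
    for build in obsolete_builds(stored, now):
        print("candidate for removal:", build.name)
```

The sketch only reports candidates. Whether you then feed that list to rmver, obliterate, or something else depends on your version control system.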
Hire me already: I'm bursting with talent
More fun with Cascading Style Sheets...
Internet Explorer and Opera don't quite agree with the way Netscape, Firefox, and Safari display CSS commands. So, I converted my resume from index.html to index.php and separated the style sheet from the webpage. I now have two style sheets: one for IE and Opera, and another for the rest of the browser community. The tiny PHP script determines which type of browser you have, and loads the correct CSS file. The result is that the Resume now looks good whether you use Firefox or IE.
So, just to post my resume online, I've worked with HTML, CSS, and PHP. And, of course, I didn't use one of those Insta-Webpage layout programs that make more of a mess than they're worth. Nope, I used a plain text editor and hewed out the code myself.
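The browser check itself is tiny. The real script is PHP and isn't reproduced here, so what follows is just a Python sketch of the same idea. The style sheet file names and the substring test on the User-Agent header are assumptions, not the actual code behind the resume.

```python
def pick_stylesheet(user_agent: str) -> str:
    """Return the style sheet to load for a given User-Agent string."""
    ua = user_agent.lower()
    # Hypothetical file names; one sheet for IE and Opera, one for everyone else.
    if "msie" in ua or "opera" in ua:
        return "resume-ie-opera.css"
    return "resume-standard.css"


if __name__ == "__main__":
    sample = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    print(pick_stylesheet(sample))
```

Sniffing the User-Agent string is crude, but for a choice between two style sheets it is usually enough.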
I'm a regular Swiss Army Knife of the Geek Patrol.
Lookin' Good!
I've been getting a lot of calls about new positions, so I decided it is time to freshen up my online resume. I decided to use CSS to do the heavy formatting and to hew the whole thing in VIM.
I've been familiar with HTML for quite a while, and you can do a bit of formatting in HTML, but not too much. HTML originally started as a Markup Language, which means tags were simply a way to mark important stuff on a page. For example, the H1 tag was a major heading, so an indexing system should think that this might be important information about how the page was arranged. In fact, there were a whole bunch of tags that were used in writing out definitions, etc.
The original intent of HTML was that the author didn't format the page for looks as much as for content. The browser would decide how they wanted to see the page, and what was important to them. This lasted for about 34 seconds before most people realized that there really wasn't much in the way of content on the Internet. Very quickly, web pages started being formatted for looks. Netscape was the first culprit with the center tag. This was the first tag that stated specifically how a particular line should be formatted.
Soon, specialized tags for setting fonts, alignment, etc. bloomed into HTML and pretty much rendered it useless. Each browser had its own set of proprietary tags, and even different versions of the same browser would render the same webpage differently.
Cascading Style Sheets, or CSS, was supposed to take care of this. In CSS, HTML tags are stripped of formatting information. Creating your basic HTML content is once again fairly straightforward.
Formatting is now done by specifying how you'd like each tag displayed in a separate style sheet (which can be included in the webpage). Changing the style sheet changes the look of all webpages based upon the style sheet. It's why you can choose different skins for most blogs.
Take a look at the source code of my Resume. First ignore the stuff between the <style> and </style> tags and find where the body of the webpage begins. It's all simple HTML codes (except for the class specifier in a few tags). If you set your browser to ignore the style sheet, you'd still see the resume in a very basic layout.
There is an excellent online introductory tutorial on CSS that will take you through much of the basics. Once you get the basics, you'll need a pocket reference like O'Reilly's CSS Pocket Reference (part of the Nutshell series of books). Or, if you need more help, take a look at O'Reilly's entire CSS collection.
Using Version Control for Automatic Deployment
Introduction
About half the places I have worked were financial institutions that do what is known as internal development. That is, instead of producing software for customers, they are producing software for their own use. However, this article also applies to any organization that runs its own website or uses scripts written in Perl, Visual Basic, Python, etc. for system administration.
The Power of Version Control Systems
Version Control Systems are great ways of tracking the history of files. You can easily see the changes that have taken place from one version to another. When combined with good defect tracking practices (or when tightly integrated with a defect tracking system which is used in a correct manner), you can even track why changes took place. Most version control systems can help you see who wrote what line and when (see Perforce's p4 annotate command or Subversion's svn blame command).
Yes, with a good version control system, you can track any change, and answer the questions of who made a change, what change was made, when the change was made, and why the change was made. Too bad this great tracking falls completely apart when you deploy the change. At that point, especially for non-compiled code like scripts and websites, it becomes extremely difficult to know what was changed and why on deployed systems. Fortunately, you can use your version control system to do automatic deployment for you. By using your version control system for deployment, you can extend the powerful tracking mechanisms offered by your version control systems to your deployment. This includes deployment to your QA systems, production systems, and to all your various environments.
Perforce
The examples given are done via Perforce, and Perforce is probably one of the best tools for this particular task. Perforce's sync mechanism is rapid and takes very little overhead, so it will not slow down your system. Perforce allows you to use all of the power of the version control system inside the production directory, so you can do diffs, examine the history of files, and answer any questions without first creating a separate workspace.
Perforce also allows for complex remapping of the source archive directory structure to the production directory structure. After all, there are a lot of times when the file layout in development might not match your file layout in production. Perforce even allows you to merge two different source directories into a single production directory. For example, I have a development directory with my webpage source code. I also have separate directories for files that differ between each environment. These can be configuration files, but might also include webpages that differ slightly between different environments.
Another advantage of Perforce is that all of your files are set automatically to read-only. This removes the temptation to edit the files that are in production without going through version control. Plus, Perforce's permissioning scheme will even prevent someone from marking the file as editable and, therefore, writeable.
And, whether you use Unix or Windows, you can use the P4V Perforce Visual Client GUI.
Subversion and CVS
Both Subversion and CVS can be used for this task, but there are a few caveats. Subversion and CVS add tracking directories to their checkouts (the CVS and .svn directories). This could be solved by doing an export instead of a checkout. Unfortunately, that means you lose much of the version control tracking power in these directories. You still have the automatic deployment, but you lose most of the other functionality that makes using automatic version control deployment so nice. A better option is to keep the checkout and just configure your application to ignore these tracking directories.
Subversion and CVS also don't allow you to easily remap your source directory layout to your production layout. You can do it, but it might take multiple checkout commands. And, you do not have the ability to merge two separate source directories into a single production directory.
Another problem is that the standard checkout in Subversion and in CVS sets the files as read/write. This means someone could manually edit a file without going through the version control system. The best thing to do is to set the files to read only once the checkout is complete.
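Setting the files to read-only after a checkout is easy to script. Here is a minimal Python sketch, assuming a Unix-style permission model. The function name is made up for illustration, and it leaves the .svn and CVS tracking directories alone so the version control client can still update its own bookkeeping.

```python
import os
import stat

# Tracking directories created by Subversion and CVS checkouts
TRACKING_DIRS = {".svn", "CVS"}


def make_read_only(root):
    """Clear the write bits on every deployed file under root.

    Returns the number of files whose mode was set.
    """
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune tracking directories in place so os.walk skips them
        dirnames[:] = [d for d in dirnames if d not in TRACKING_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~write_bits)
            changed += 1
    return changed
```

You would run something like this right after each checkout or update. It is worth testing that your client still updates cleanly over read-only files before relying on it.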
Again, nothing prevents you from using Subversion or CVS. It'll take a bit more work, but it is possible, and if you're doing this on a Windows system, TortoiseCVS or TortoiseSVN gives you a nice, powerful GUI front end.
ClearCase
ClearCase is the only version control system that actually has the built-in ability to handle this type of work. ClearCase dynamic views can be treated as read-only NFS mounts. When the ClearCase view changes, the NFS mount changes. The synchronization between the two is complete, automatic, and instantaneous. There are just a few problems.
First of all, ClearCase doesn't allow the remapping of the ClearCase directory layout. You can set up symbolic links inside the ClearCase view to replicate the production structure, but that can be tedious. Instead, you'll probably have to settle for having your development structure match your deployment structure.
Also, ClearCase NFS mounts are truly read-only. Your program can't create temporary files inside of the mount, so you may have to modify your program to get around this issue. And, although the NFS mount is live, it isn't a version control view. In order to answer questions about what is in the directory, you have to start up a separate ClearCase instance with the mapped view. Finally, this is an NFS mount, which means that if you are using Windows, you have to get an NFS package like Hummingbird Maestro in order to get everything to work. And, forget about using the C:\Inetpub\wwwroot directory.
The Configuration
You should have a separate Perforce user for each instance and each environment. For example, say you have a web page for your New York and London offices, and a QA group that tests this page. For this one web page, it is best to have three different users: one for QA, one for New York production, and one for London production. These users will have read access only to the directories and branches that they have permission to see. They should not be able to do a p4 edit on the files in their directory. And, these users should not have access to any other files in Perforce. You don't want to open up any security holes.
In fact, you might even consider setting up a separate Perforce server for these users, and using remote depots to export the files you want these users to see. That will prevent the user from perusing changelists, user lists, and other information that you want to keep only in your development environment.
In your development environment, you want to set up separate branches for each environment. For example, I might have this directory layout for Project Foo:
//Depot/Foo/MAIN/... #Main Development Environment
//Depot/Foo/QA/... #QA's Testing Environment
//Depot/Foo/NEW_YORK/... #New York's Production Environment
//Depot/Foo/LONDON/... #London's Production Environment
When I want to deploy something to QA, I can run this command:
$ p4 integrate //Depot/Foo/MAIN/... //Depot/Foo/QA/...
$ p4 resolve -at //...
$ p4 submit
The p4 integrate copies all the developer's files to the QA branch. The -at flag on the resolve command tells Perforce to accept whatever is on the source branch and not to do an actual merge. This is usually what you want. The p4 submit command checks in the changes onto the QA branch. Your QA website will automatically update itself with the latest changes.
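If you promote builds often, you may want to script the three commands. The following is only a sketch in Python: it builds the same integrate, resolve, and submit sequence for the //Depot/Foo layout shown above and prints it as a dry run. The function name and the submit description are made up for illustration. The -d flag on p4 submit supplies the changelist description so no editor pops up.

```python
def promotion_commands(project, source, target, description):
    """Build the p4 commands that promote one branch onto another.

    Assumes the //Depot/<project>/<branch>/... layout used in this article.
    """
    src = f"//Depot/{project}/{source}/..."
    dst = f"//Depot/{project}/{target}/..."
    return [
        ["p4", "integrate", src, dst],        # open target files for integration
        ["p4", "resolve", "-at", dst],        # accept the source branch's version
        ["p4", "submit", "-d", description],  # check the result in
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them
    for cmd in promotion_commands("Foo", "MAIN", "QA", "Promote MAIN to QA"):
        print(" ".join(cmd))
```

To actually run the commands, you could pass each list to subprocess.run with check=True from a workspace that maps both branches.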
Setting Up the Perforce Client
You want to set up a Perforce client in the root of the directory tree that you want to use for your production files. This client should be owned by the Perforce user who has access to this directory and branch. This does not necessarily have to be the same user as the owner of the directory where the files will actually be deployed. In fact, there is probably an advantage to not having these as the same user.
For example, I want to set up my webpage in the /usr/local/apache/htdocs directory. I need to put the files from //Depot/Foo/QA/... into this directory, plus the special environment files located in //Depot/Foo/Special/QA/... My Perforce client would look something like this:
Client: qa-web-client
Root: /usr/local/apache/htdocs
Options: noallwrite clobber nocompress unlocked nomodtime rmdir
View:
//Depot/Foo/QA/... //qa-web-client/...
+//Depot/Foo/Special/QA/... //qa-web-client/...
Now, a simple p4 sync will automatically update my installation to the latest version on QA.
The Sync Mechanism
Now that I have my client set up, and my Perforce depot set up, I need a way of automatically synchronizing my Perforce depot. How does this work?
First problem is logging into Perforce. You could pass options on every Perforce command to tell Perforce which client, user, server, and password to use. However, Perforce provides a simpler mechanism for this: the P4CONFIG environment variable. Set this variable to .p4rc. If you have a Windows system, you can do this in your System Control Panel. In Unix, you can do this either in /etc/profile, or in the user's .profile file.
Now create a file named .p4rc in the root of your client directory and use it to set P4USER, P4CLIENT, and P4PORT. Whenever you're in this directory or one of its subdirectories, your Perforce user, client, and port will be automatically set. The only other question is how to send your password. You could set your password using P4PASSWD in the P4CONFIG file, but it will be visible to everyone.
I recommend that you create a Perforce group and put all of your autosync users in this group. You won't use this group in your Perforce protections table, but you can set the group's Timeout value to zero. This means that once you're logged into Perforce, you will create a Perforce login ticket that never expires. This ticket can be set so that it is only valid on a particular machine. This is the most secure way of setting up this user. You can use the P4TICKETS environment variable in your P4CONFIG file to point to this ticket. This ticket should be read-only, and only visible to the user who will be running the p4 sync command on that machine.
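The P4CONFIG file is just a list of NAME=value lines. Here is a minimal Python sketch that writes one and locks down its permissions. The user name, server address, and ticket path in the example are hypothetical placeholders, the client name is the one from the client spec above, and the owner-only permission is my own precaution rather than anything Perforce requires.

```python
import os


def render_p4config(user, client, port, tickets_path):
    """Return the text of a P4CONFIG file as NAME=value lines."""
    lines = [
        f"P4USER={user}",
        f"P4CLIENT={client}",
        f"P4PORT={port}",
        f"P4TICKETS={tickets_path}",
    ]
    return "\n".join(lines) + "\n"


def write_p4config(directory, text, name=".p4rc"):
    """Write the config into directory, readable and writable by its owner only."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write(text)
    os.chmod(path, 0o600)
    return path


if __name__ == "__main__":
    # All values except the client name are made up for illustration
    print(render_p4config("qa-web", "qa-web-client", "perforce:1666",
                          "/home/qa-web/.p4tickets"), end="")
```

Note that the file name must match whatever you set P4CONFIG to, which is .p4rc in this article.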
Okay, client is setup, the Perforce ticket is set, you've got the Perforce depot set correctly and the permissioning is set. Now, what is the Perforce mechanism that automatically does my sync'ing?
- Well, there isn't one built into Perforce itself. Fortunately, it isn't too difficult to create one. The easiest way is setting up a Unix cronjob or a Windows Scheduled Task that runs every minute. This task will automatically change into the directory in question, have the P4CONFIG environment variable set, and do a p4 sync. It isn't the most elegant of systems, but it is fairly efficient. Perforce syncs have very low overhead, and that is especially true if there is nothing to sync. Although the synchronization isn't instantaneous, it happens so soon after the commit that you probably won't notice. Plus, if you only want changes to go out after business hours, you can set your cronjob to only run after business hours.
- A more elegant solution that works under Apache's HTTP server is to use Perforce's WebKeeper. This Apache module will automatically synchronize your webpage whenever the directory in your view changes. In fact, it is a fairly simple mechanism to use even if you're not serving a webpage. Just set up an Apache httpd process that doesn't serve any webpages. Then you can use this same mechanism for SQL and server maintenance scripts.
- Another choice is to use a product like ICmanage. ICmanage is an entire Perforce server setup. It includes hardware, hot backups, failover, and autosynchronization software. It may be a bit of overkill for simple auto synchronization, but if you really don't have a strong Perforce setup (Our Perforce server is Bob's desktop system. I think Bob backs it up to CDs every once in a while.), it might be worth considering.
- Finally, there's the roll-your-own approach. I wrote a simple Perl test synchronization manager in a few hours. The heart of the system was a server that took two different kinds of clients. Client #1 was a Perforce trigger that fired whenever someone did a submit. It simply sent a one-line message that said "Hey! Someone submitted a change." Client #2 was each of the machines that needed to synchronize to the Perforce server. All this sync-server did was relay the message to the various clients that they should do a sync. After playing around with it for a few minutes, I decided to go with the cronjob approach.
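The cronjob approach from the first option can be sketched as a small script. This is a minimal Python sketch, not the script I actually run: the script path in the crontab comment is hypothetical, and the runner parameter exists only so the function can be exercised without a Perforce server.

```python
import os
import subprocess
import sys

# Hypothetical crontab entry that runs the sync every minute:
# * * * * * python3 /usr/local/bin/p4_autosync.py /usr/local/apache/htdocs


def sync_workspace(directory, config_name=".p4rc", runner=subprocess.run):
    """Run `p4 sync` inside directory with P4CONFIG pointing at the config file."""
    # Copy the current environment and add P4CONFIG so p4 finds its settings
    env = dict(os.environ, P4CONFIG=config_name)
    return runner(["p4", "sync"], cwd=directory, env=env, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    sync_workspace(sys.argv[1])
```

Because check=True is passed, a failed sync raises an exception and the script exits with an error, which cron can then mail to you.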
So, What Does This Buy You?
Imagine you have this entire automatic synchronization system set up. Next time a new release is ready, it will automatically be put on your server. No more worrying whether the release was installed, or whether it was installed correctly. You can also see who did the install by looking at the integration history in Perforce.
Imagine that the new release has a bug in it. You go to your web page directory and do a p4 diff. Now you see what files were changed from the last release. You can also see who authorized the change, and why this change took place. Once you track down the file with the defect, you can find out which developer changed it, and why.
Is this a quick fix? If it is, you can log into Perforce as yourself, make the fix on the development branch, and integrate it onto your environment branch. Within a minute, it is on your system.
Can't be fixed, and want to revert to your previous release? There are several things you can do. One is to change your Perforce client to use a new branch, then integrate your previous release to this new branch. Once the defect is straightened out, you can switch back to your original environment branch. Meanwhile, other changes from QA can still be integrated into your new release branch.
Is Your Passover Matzoh ISO 9000 Certified?
Is your organization ISO 9000 certified? I bet it is, and I bet management is pretty darn proud of that fact. Every few months, your whole organization is thrown into a full panic because the "certification people" are mucking around making sure all of your I's are dotted and T's crossed. You've probably been to a couple of meetings and you're given a review of what to say to these "certification people". It's a great achievement and certainly you must have an excellent CM process. But, if that's true, why does our software stink?
Next Passover, go down the aisle with all the Passover products. Go ahead, I won't tell anyone. Find the white box of Yehuda brand Matzoh, and take a look at the side of the box. There on the side of the box is their ISO 9000 certification. That's right, your fancy shmancy up-to-date-and-modernly-with-it company is following the same certification process they do when they bake this cardboard-like substance. The only difference is that Yehuda successfully got their product to market and it does what it is supposed to do. Chances are you can't make the same claim.
My sons go to a private school, and every year, they take something called the TerraNova test. The school loves to brag about their students' performance. My kids always score in the top 5% in all of their subjects. Both the school and I as a parent should be proud. I hate the whole thing.
I hate it because I am paying the school for two weeks teaching the kids how to do well on this test. I am paying the school, so that their curriculum matches what is taught on this test. I am not against the test itself. Achievement tests can help the school understand whether or not they are teaching their students. It can help them evaluate their methods and their teachers. But, this is not what happens.
The purpose of the school is to teach the students a wide variety of subjects. The students should be trained to think and solve problems. They should learn to communicate and write. They need to learn to learn and to love learning. And, achievement tests like the TerraNova can do this. Unfortunately, instead of the TerraNova test helping the school become better, it has subverted the school, so the main purpose of the school is to get their students to do well on the TerraNova test.
There is nothing intrinsically wrong with following a process. After all, a lot of CM is establishing well-known processes, so people know what they are supposed to be doing and how to communicate with each other. And, there is nothing wrong with adopting an outside process for your needs. It might take you years to figure out how to make sure that a change in your code gets tracked. Or, you may believe you have a repeatable process for doing development, then discover you don't when something goes wrong. A well-tested and well-developed process like ISO 9000 or CMMI can really help there.
The problem is when our business becomes following the process and not producing software. Sure, gather requirements and track them. Take metrics. Use a $100,000 defect tracking system to squash your bugs. But, keep your eyes on the prize: You're in the software business. If your software doesn't do what your users need, all the certifications in the world won't help.
The best CM process I was ever involved in was at a company called ExecuFlow Systems. They produced medical software. We had no defect tracking system. We didn't even have a version control system. Instead, we logged all of our changes and defects in a book. What made the process so good? The president of the company really believed in producing a quality product, and this trickled down into the management and the workers.
Nothing got out the door until it was thoroughly tested. We had user groups to find out what people liked and didn't like about our software. We tracked each change on paper, and anyone who messed up would hear from the President of the company himself. We did six to eight releases per year (and this was back at a time when a company may have made only one revision every two years).
We would have never been ISO 9000 certified. We probably didn't even make CMMI Level 1. But, we did produce excellent software. Because of that, we had an excellent reputation, and were one of the biggest medical billing systems in the Northeastern United States.
I left right before the company was sold to a software conglomerate. This company was going to blend our software into their development methodology. After all, they were ISO 9000 certified and we weren't. I have no idea what happened after that. The local office was closed, and all of the people, the developers, the trainers, the QA people, the engineers, and all of the people who dedicated their lives to producing the best software they could moved onto other companies.
The doctors who used to use our product moved on too. All that's left is a piece of paper gathering dust in some forgotten filing cabinet stating that we were once ISO 9000 certified.