7.2. Version Control with Subversion
The purpose of a version control system is to bring a project's files under control.
This is very important for source code. When source code files are stored in a shared folder, for example, it is easy for changes to be lost when more than one programmer is working on the software. Even a single programmer, working on source code that resides in a folder on his computer, can run into problems. He might make a large change, only to realize that he needs to roll it back. Or he may find that he's got several copies of the code on his hard drive and laptop, and that each of them contains a different set of changes that all need to be integrated.
Subversion is a free and open source version control system. It is available from http://subversion.tigris.org for many operating systems and platforms, including Linux, BSD, Solaris, BeOS, OS/2, Mac OS X, and Windows. Subversion provides many advanced features for bringing source code under control, but it takes only a few basic commands for a programming team to use a simple version control repository effectively.
The examples below are based on Version 1.2, the latest version available at the time of this writing. The remainder of this chapter assumes that this version (or later) is installed on the programmer's machine. The Subversion installation procedure does not install a graphical user interface; all functions are accessed through command-line programs. There are graphical utilities and web interfaces that work with Subversion, including TortoiseSVN (http://tortoisesvn.tigris.org), ViewCVS (http://viewcvs.sourceforge.net), and SmartSVN (http://www.smartcvs.com/smartsvn).
This is the only section in this book in which a specific software package is recommended. However, Subversion is not the only option for version control. Other popular version control systems include:
7.2.1. Multiple People Can Work on One File
Programming teams that do not use a modern version control system usually have a rule that only one person at a time can work on any given file. Sometimes this rule is enforced by using an antiquated version control system, which allows only one person to check out a file. Some teams use a master folder to store all of the source code, and individual programmers must rename files that they are using or move them temporarily to a "checkout" folder.
There are version control systems that essentially function as an automated checkout folder. A system like this requires that a programmer check out a file from the repository before modifying it, and keeps track of the checkout status of each file in the repository. The system prevents anyone else from modifying a checked-out file until it is checked back in. This is known as the "lock-modify-unlock" model, and it is very intuitive to many programmers. However, it is a very restrictive way to manage code, and can cause delays and problems for the team.
One worst-case scenario that some teams encounter is a schedule delay caused by having many programming tasks that must occur on a single piece of code, when that code can be updated by only a single person at a time. For example, programmers might run into trouble when they must add behavior to a complex window in the software that contains many tabs and controls, all of which reside in a single file. If they are only using a folder to store the code, only one person can work on the file at a time. Even if the modifications themselves are relatively straightforward, this process could take a lot of time, and may even require a senior developer to be pulled in to perform the work as quickly as possible (which could lead to extra delays because he can't be assigned to any other task until this one is done).
When a team has to work with large files that contain a lot of code that is not under control, or that is checked into a lock-modify-unlock version control system, it runs into "unavoidable" delays because only one person can edit each file at a time. Adopting a modern version control system like Subversion is one effective way to fix this problem.
Subversion allows multiple people to work on a single file at the same time, using an alternative model called "copy-modify-merge." In this model, a file can be checked out any number of times. When a programmer wants to update the repository with his changes, he retrieves all changes that have occurred to the checked out files and reconciles any of them that conflict with changes he made before updating the repository. (See below for details about how copy-modify-merge works.)
Many programmers who have never worked with a copy-modify-merge version control system expect merging to be difficult and conflicts to be common. In practice, the opposite is true: it turns out that it is usually easy to merge changes, and that very few changes conflict. Code is almost always built in such a way that the functionality is highly encapsulated in functions. And even within individual functions, there are small, independent blocks of code. Even if two neighboring blocks are altered at the same time, it is rare for a conflict to be introduced.
Copy-modify-merge is very efficient. Teams lose much more time from schedule bottlenecks caused by waiting for files that are checked out than they do from trying to figure out how to merge the changes. Giving multiple people the ability to merge changes into a shared repository is an important way to eliminate those bottlenecks. Version control systems based on the copy-modify-merge model have been used for years on many projects. This is especially true for large teams or teams in which the members are distributed over a large geographical area; their work would grind to a halt waiting for people to check code back in.
7.2.2. Understanding Subversion
The Subversion repository contains a set of files laid out in a tree structure (similar to the file systems in most operating systems), with folders that contain files and other folders, as well as links or shortcuts between files. The main difference is that the Subversion file system tracks every change that is made to each file stored in it. There are multiple versions of each file saved in the repository. The files in the repository are stored on disk in a database, and can only be accessed using the Subversion software.
A Subversion repository has a revision number. This revision number gets incremented every time a change is made to the repository; this way, no revision number is used more than once. The standard convention for writing revision numbers is a lowercase "r" followed by the number of the revision; all output from Subversion will adhere to this convention.
The Subversion repository keeps track of every change that is made to every file stored in it. Even if a file is modified by adding or removing lines, or if that file is removed entirely, all previous revisions can still be accessed by date or number. Subversion can also generate a list of differences between any two revisions of a file. For example, consider a repository that contains Revision 23 of the file cookies.txt in Table 7-1, which was checked in on March 4.
Subversion uses global revision numbers, which means that the revision number applies to the entire tree in the repository. Any time that a change is made to any file in the repository, the global revision number of the repository is incremented. Table 7-1 shows a file as it looked on March 4, when the repository was at Revision 23. Some people may refer to this as "Revision 23 of cookies.txt," but what they really mean is "cookies.txt as it appeared in Revision 23 of the repository." If someone updates a different file in the repository on March 5, the repository's revision number will increment to 24. When that happens, there is now an r24 of cookies.txt; it just happens to be identical to r23.
Now, let's say that time has passed since the file was checked in. It's May 13, and other people have added several other files to this repository, so the repository is now at r46. If someone (let's call her Alice) decides to check out cookies.txt and add the line that calls for chopped nuts, when she commits her change the repository increments to r47, and this revision, r47, will contain the new version of cookies.txt in Table 7-2.
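The effect of a global revision number can be illustrated with a toy model. The following Python sketch is not how Subversion stores data; the class TinyRepository, its methods, the placeholder file contents, and the file brownies.txt are hypothetical names invented for this illustration.

```python
class TinyRepository:
    """Toy in-memory model of global revision numbers (hypothetical, not Subversion's storage)."""

    def __init__(self, start_revision=0):
        self.revision = start_revision   # one number for the whole tree
        self._history = {}               # path -> list of (revision, content)

    def commit(self, path, content):
        """Record a change to one file and increment the repository-wide revision."""
        self.revision += 1
        self._history.setdefault(path, []).append((self.revision, content))
        return self.revision

    def cat(self, path, revision=None):
        """Return the file as it appeared in the given revision (default: latest)."""
        wanted = self.revision if revision is None else revision
        found = None
        for committed_in, content in self._history.get(path, []):
            if committed_in <= wanted:
                found = content          # newest change at or before the wanted revision
        if found is None:
            raise KeyError(f"{path} does not exist in r{wanted}")
        return found


repo = TinyRepository(start_revision=22)               # assume 22 earlier commits
repo.commit("cookies.txt", "recipe without nuts")      # repository is now at r23
repo.commit("brownies.txt", "some other file")         # repository is now at r24
same = repo.cat("cookies.txt", 23) == repo.cat("cookies.txt", 24)
print(f"r{repo.revision}: cookies.txt unchanged since r23? {same}")
```

Committing a different file moves the whole repository to r24, yet asking for cookies.txt at r24 returns the same content as at r23.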
The next time someone checks the tree that contains cookies.txt out of the repository, they will get r47 of the file. However, they can specifically ask for a previous version in one of several ways:
Any of these requests will yield the version of the recipe without the nuts in Table 7-1. Even if the file is later deleted from the Subversion repository, previous revisions are still available and can be retrieved by revision number or date. (This means that the file is never permanently deleted from the repository. When a file is "deleted," the repository simply no longer lists it in the current revision. If a user checks out an old revision from before the file was removed, the version of the file associated with that revision is still available.)
7.2.3. Check Out Code into a Working Copy
Before a Subversion repository can be accessed, it must be checked out. Checking out a repository simply means retrieving a snapshot of the repository and copying it to the user's local machine. The repository is not altered in any way when files are checked out. The local copy of the repository is called a working copy. This is the core of the copy-modify-merge model: any changes that the user needs to make are done in the working copy, which must be brought up to date before it can be checked back in.
Everybody gets their own working copy; one person can have any number of working copies. When files are checked out from a repository, the Subversion client creates a new working copy for those files. Usually a programmer will check out only one copy of a given directory or tree at a time, but there are occasions when a programmer will have several working copies that contain different snapshots of different parts of the repository. (Only directories or trees can be checked out; Subversion does not allow a programmer to check out a single file in a directory.)
There can be many working copies of the same code, even on the same machine. Each working copy keeps track of both who checked out the files and when they were checked out. The user can retrieve the differences between the working copy and the revision that he had originally checked out at any time, even if the user's machine does not currently have access to the server from which the working copy was checked out.
Additionally, a programmer can check out all of a repository, or just part of it. The checkout can include a single folder or an entire branch including all subfolders. (It can also include the entire repository, but in Subversion this is almost never done, because it can cause an enormous amount of data to be retrieved.) When the files are checked out, the latest revision of each file is copied to the working copy. When the programmer looks in the working copy, she sees all of the files and folders that she checked out, plus another hidden folder called .svn. This folder contains all of the data that Subversion uses to check for differences and to keep track of the state of the working copy.
The programmer edits the files in the working copy, and, when she is satisfied that the files in the working copy are ready, she can commit them back to the repository. It is only during the commit that the repository is updated to look like the working copy.
The commit is the step that the programmer takes to finalize the changes and integrate them back into the source code stored in the repository. When someone commits changes to the repository, Subversion updates only those files that have changed since the working copy was checked out. Once changes have been committed, they will be accessible to anyone looking at the repository. Subversion performs an atomic commit: either all of the changes in a commit are applied to the repository, or none of them are.
Since multiple people can each check out their own working copies, they can all work on the same file simultaneously. There is no limit to the number of times a file can be checked out at once; any part of the repository can be checked out into multiple working copies on multiple machines at the same time. If several people have files checked out of a repository, it is likely that there are earlier versions of some files in some of their working copies, and later versions in other ones.
If the repository has not changed since the working copy was checked out, then the user can simply commit the change, and Subversion will update the repository. However, if the repository has changed, Subversion will not allow the user to commit the changes just yet; it will tell the user that the working copy is out of date. At this point, it is up to the user to bring the working copy up to date by telling Subversion to update the working copy. Subversion will then retrieve any changes that have been committed to the repository since the working copy was checked out.
Often, most of the changes that have been committed to the repository since the working copy was checked out are mergeable, meaning that none of the lines that changed in the repository have changed in the working copy. When this happens, Subversion will update all of the files in the working copy to reflect all of the changes that have been committed to the repository. However, if a conflict occurs, it is up to the user to resolve it.
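The distinction between mergeable changes and conflicts can be sketched in a few lines of Python. This is a deliberately simplified model and not Subversion's merge algorithm: it assumes lines are only edited in place, so all three versions have the same number of lines. The function name merge_lines and the sample ingredient lines are made up for the illustration.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of equally long line lists.

    base   -- the revision that was originally checked out
    mine   -- the working copy
    theirs -- the latest revision in the repository
    Returns (merged_lines, conflicting_line_numbers); conflicting lines are None.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles lines edited in place")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:              # both sides agree, or neither touched the line
            merged.append(m)
        elif m == b:            # only the repository changed this line
            merged.append(t)
        elif t == b:            # only the working copy changed this line
            merged.append(m)
        else:                   # both changed it differently: a conflict
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["flour", "sugar", "butter", "chocolate chips"]
mine = ["flour", "brown sugar", "butter", "chocolate chips"]      # edits line 2
theirs = ["flour", "sugar", "salted butter", "chocolate chips"]   # edits line 3
clean = merge_lines(base, mine, theirs)

nuts = ["flour", "sugar", "butter", "chips and chopped nuts"]     # edits line 4
candy = ["flour", "sugar", "butter", "chips and candy"]           # also edits line 4
clash = merge_lines(base, nuts, candy)
print(clean)
print(clash)
```

The first call merges cleanly because the two edits touch different lines; the second reports line 4 as a conflict that a person has to resolve.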
A conflict occurs when two people (let's call the second person Bob) introduce different changes to the same line in the same file in their separate working copies. Suppose Alice and Bob both check out the file and make different changes to the same lines, and Alice commits her version back to the repository first. When Bob attempts to commit his change to the repository, Subversion discovers that there is a conflict. That conflict must be resolved in Bob's working copy before it can be committed to the repository.
To illustrate this idea, consider what would happen to the previous example (cookies.txt r47 in Table 7-2) if Alice committed her changes under different circumstances. Suppose she checked out r23 (see Table 7-1), but Bob checked in r38 (shown in Table 7-3) before she could commit her changes.
This version contains a change: Bob added chopped M&Ms to the recipe. This change occurred on April 26, a few weeks before Alice checked in r47. So when it was committed, Subversion did not complain; at the time, there were no conflicts. It simply allowed the change and was able to update the repository without requiring any input from the user.
When May 13 rolls around and Alice tries to commit r47 in Table 7-2, Subversion discovers that the working copy is out of date (because the revision number of the repository is greater than it was when the working copy was checked out). She updates the working copy to incorporate any changes. When she does this, Subversion detects the conflict in line 4: the version in her working copy contains chopped nuts, while the version in the repository contains M&Ms. Alice will then be required to decide which version of that line should be kept; once she makes that decision, the file can be committed to the repository. (See below for more details about how she specifies this.)
This is how the working copy allows two people to check out the same file, and even alter the same lines in that file. They did not have to coordinate with each other, and the first person was never even aware that a conflict occurred. It's up to each person to resolve the individual conflicts that arise. What's more, a full audit trail is available, so if it turns out that someone made the wrong choice, then that choice can be undone later and it will be possible to figure out who made the mistake.
7.2.4. Access the Subversion Repository Using a URL
A Subversion repository exists on a computer as a folder. It's easy to recognize a Subversion repository: it usually contains subfolders named conf, dav, db, hooks, and locks.
These folders contain different important elements of the repository: for example, the conf folder contains configuration files, the db folder contains a database that stores all of the files and revision history, and the hooks folder contains scripts that can be triggered automatically when Subversion performs certain actions.
There are two commands that are installed with Subversion that programmers will typically use to access and maintain the repository. The repository administration tool svnadmin performs actions directly on a repository. It can create a new repository, verify its contents, make a copy of an entire repository, dump the repository to a portable format, recover from a database error, and perform other administrative tasks. svnadmin always works on a local directory; when a user executes it, the repository is always passed as a path to a local folder, in whatever format the operating system expects.
The command-line client tool svn is the main interface that most programmers use to check out and update files in the repository. The svn command-line client can access a repository using any of several different methods: accessing the folder directly on the hard drive or a shared drive, connecting to the svnserve server, tunneling svnserve over SSH, or using Apache to serve the repository via HTTP or HTTPS. The svn client uses a URL instead of a simple path, in order to differentiate between the different kinds of access methods. This way, the programmer using it does not need to change the way she works in order to accommodate different kinds of access methods. Table 7-4 shows the scheme for each access method.
For the most part, Subversion's URLs use the standard syntax, allowing server names and port numbers to be specified as part of the URL. One difference is that the file:/// access method can be used only for accessing a local repository; if a server name is given, it should always be localhost, as in file://localhost/. The scheme should always be followed by the full path of the repository:
If the repository contains folders, those folders can be appended to the repository URL. (These are the virtual folders contained within the repository, not the physical folder that contains the repository database.) For example, to access /recipes/cookies.txt in the repository /usr/local/svn/repos, the client must be passed the following URL:
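The rule described above (scheme, then the full path of the repository, then the virtual path inside it) can be sketched as a small Python helper. The function name repository_url is hypothetical; only the two paths come from the text.

```python
from urllib.parse import quote


def repository_url(repository_dir, path_in_repository=""):
    """Build a file:/// URL from a local repository directory and a virtual path."""
    if not repository_dir.startswith("/"):
        raise ValueError("the repository directory must be an absolute path")
    # An absolute path already starts with "/", so "file://" + path gives file:///...
    url = "file://" + quote(repository_dir.rstrip("/"))
    inner = path_in_repository.strip("/")
    return f"{url}/{quote(inner)}" if inner else url


print(repository_url("/usr/local/svn/repos"))
print(repository_url("/usr/local/svn/repos", "/recipes/cookies.txt"))
```

The second call produces file:///usr/local/svn/repos/recipes/cookies.txt, which is the form of URL the svn client expects for this access method.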
The file:/// URL scheme on Windows platforms uses the unofficially "standard" syntax for specifying a drive letter by using either X: or X|. Note that a URL uses ordinary slashes, even though the native (non-URL) form of a path on Windows uses backslashes. It is important to use quotes so that Windows does not interpret the vertical bar character as a pipe:
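As a rough illustration of the X: form, the Python sketch below converts a native Windows path into a file:/// URL with ordinary slashes. The helper name windows_repository_url is an assumption made for this example, and it does not cover the X| variant.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_repository_url(native_path):
    r"""Turn a native path such as c:\svn\repos into a file:/// URL."""
    path = PureWindowsPath(native_path)
    if path.drive[1:] != ":":
        raise ValueError("expected a path that starts with a drive letter")
    # as_posix() swaps backslashes for the forward slashes that URLs use
    return "file:///" + quote(path.as_posix(), safe="/:")


print(windows_repository_url(r"c:\svn\repos"))
```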
Most of the Subversion examples in this chapter will use UNIX-style command syntax and pathnames.
7.2.5. Create a Repository
The Subversion installation procedure does not create a repository; that must be done using the svnadmin command. Use svnadmin create to create an empty repository:
The svnadmin command requires that the repository path be specified in the operating system's native format. On a UNIX system, the following commands will create a repository in the /usr/local/svn/repos/ directory:
On a Windows system, svnadmin uses a Windows-style path to create the repository. The following commands will create a repository in the c:\svn\repos\ directory. Note that this is a raw path, and not a file:/// URL:
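For teams that script their setup, the call can be wrapped as shown in this minimal Python sketch. The wrapper name svnadmin_create and its run parameter are hypothetical; the only Subversion command it relies on is svnadmin create followed by a native path, as described above.

```python
import shutil
import subprocess


def svnadmin_create(repository_path, run=False):
    """Build, and optionally run, the command that creates an empty repository."""
    # Shell equivalent: svnadmin create <repository_path>
    command = ["svnadmin", "create", repository_path]   # native path, not a URL
    if run:
        if shutil.which("svnadmin") is None:
            raise RuntimeError("svnadmin is not installed or not on PATH")
        subprocess.run(command, check=True)
    return command


print(svnadmin_create("/usr/local/svn/repos"))   # UNIX-style path
print(svnadmin_create(r"c:\svn\repos"))          # Windows-style path
```

With run left at False the function only returns the argument list, which makes it safe to inspect before anything is created on disk.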
A shared folder on a network is all it takes to set up a repository that multiple programmers can use. A single programmer can do the same on her own machine to store her own work in a local repository. Once the empty repository is created, an initial set of files can be added to it using the svn import command, which allows files to be added to a repository without checking it out:
There are several ways that the directories in a repository can be structured. One easy way to do it is to have a root-level folder for each project. Each project folder contains three subfolders: trunk, tags, and branches. The working version of the source code is stored in the trunk folder. (The other two folders are reserved for more advanced version control activities, which are beyond the scope of this book.) For example, to import a project called hello-world that contains two files, hello-world.c and hello-world.h, they should be put into a folder:
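The layout described above can be sketched in Python as follows. The helper build_project_layout and the placeholder file contents are hypothetical, and the repository URL in the import command is an assumption based on the repository path used earlier in this section.

```python
import tempfile
from pathlib import Path


def build_project_layout(parent, project, source_files):
    """Create project/trunk, project/tags and project/branches under parent."""
    root = Path(parent) / project
    for name in ("trunk", "tags", "branches"):
        (root / name).mkdir(parents=True)
    for filename, text in source_files.items():
        (root / "trunk" / filename).write_text(text)   # working code lives in trunk
    return root


sources = {"hello-world.c": "/* placeholder */\n", "hello-world.h": "/* placeholder */\n"}
with tempfile.TemporaryDirectory() as scratch:
    root = build_project_layout(scratch, "hello-world", sources)
    layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    # Shell equivalent (URL is an assumption):
    #   svn import hello-world file:///usr/local/svn/repos/hello-world -m "Initial import"
    import_command = ["svn", "import", str(root),
                      "file:///usr/local/svn/repos/hello-world", "-m", "Initial import"]

print(layout)
```

Importing the hello-world folder into a URL that ends in /hello-world gives the project its own root-level folder in the repository, with trunk, tags, and branches beneath it.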
Note that a log message reading "Initial import" was passed to svn using the -m flag. If this is omitted, Subversion will launch the default editor to edit the log message. The environment variable SVN_EDITOR can be used to change the default editor. (Many Windows users prefer to set this variable to notepad.)
The Subversion repository is now ready for use. Additional files can be added either by using the svn import command or by checking out a branch of the repository and using the svn add command.
7.2.6. Share the Repository
A Subversion repository can be accessed by programmers using any of the access methods listed in Table 7-4. The simplest way to do this is to use a shared folder and the file:/// URL scheme. However, this is not secure and may be problematic to share on an intranet or over the Internet.
Running a server is very straightforward using the svnserve program. Many people think that the word "server" is synonymous with "administrative headache." In fact, setting up and running a Subversion server is simple, and can be done on any machine where a repository is installed. The following command will run a read-only server:
It works equally well on a Windows platform:
If this is running on a machine called mybox.example.com, the repository can be accessed using the URL svn://mybox.example.com. (Note: At the time of this writing, svnserve does not work with Windows 95/98/ME.) By default, the server is read-only, which means that it only allows users to retrieve files, not to commit changes. However, it does not take much work to turn on password authentication. First a password file must be created in the conf folder in the Subversion repository. In this example, the file passwords.txt contains the following lines:
Then the following lines must be added to the end of svnserve.conf in the same folder:
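As a sketch of what the two files might contain, the Python below writes a password file and appends authentication settings to svnserve.conf. The option names anon-access, auth-access, and password-db are standard svnserve settings; the helper name write_auth_config and the password value are placeholders invented here, and the access levels chosen are one reasonable assumption rather than the original listing.

```python
import configparser
import tempfile
from pathlib import Path


def write_auth_config(conf_dir, users):
    """Write passwords.txt and append auth settings to svnserve.conf in conf_dir."""
    conf_dir = Path(conf_dir)

    passwords = configparser.ConfigParser()
    passwords["users"] = users                   # one "name = password" line per user
    with open(conf_dir / "passwords.txt", "w") as handle:
        passwords.write(handle)

    settings = configparser.ConfigParser()
    settings["general"] = {
        "anon-access": "read",                   # anonymous users may only read
        "auth-access": "write",                  # authenticated users may commit
        "password-db": "passwords.txt",          # relative to the conf folder
    }
    with open(conf_dir / "svnserve.conf", "a") as handle:   # append to the end
        settings.write(handle)


with tempfile.TemporaryDirectory() as conf_dir:
    write_auth_config(conf_dir, {"andrew": "example-password"})
    passwords_text = Path(conf_dir, "passwords.txt").read_text()
    settings_text = Path(conf_dir, "svnserve.conf").read_text()

print(passwords_text)
print(settings_text)
```

If svnserve.conf already has an active [general] section, the settings belong inside that section instead of in a second one appended at the end.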
Now when the svnserve command listed above is executed, it will run a server that supports authentication. The username and password can be passed to svn on the command line with the --username and --password flags. When the repository is checked out, the working copy remembers the authentication information, so the username and password only need to be supplied when files are checked out.
Note that in this example, since no destination folder name is given, it checks out the working copy into a folder named trunk, which corresponds to the path given in the URL. A third way to share files is using svnserve tunneled over SSH. All that is required here is that Subversion be installed on a computer that is running an SSH server. A programmer just needs to pass the correct URL scheme to svn. Subversion establishes an SSH connection and executes svnserve on the server to connect to the specified repository. Note that this example uses the abbreviation co on the command line:
The http:// and https:// URL schemes require that an Apache server be set up to work with Subversion using WebDAV and mod_dav_svn. (Configuring an Apache server to support access to a Subversion repository is beyond the scope of this book.)
7.2.7. The Subversion Basic Work Cycle
The authors of Version Control with Subversion (O'Reilly) recommend a basic work cycle, whose five steps are described in the sections that follow.
7.2.7.1. Update the working copy of the code
When a programmer needs to modify the code for a project, the first thing to do is to bring his working copy up to date so that it reflects all the changes that are in the latest revision. If he does not yet have a working copy, he can check one out of the repository with the svn checkout command. The programmer supplies the URL to the repository and, optionally, a path to check it out into. If no path is specified, Subversion makes a new subdirectory in the current directory and checks it out there. The following action will check out the "trunk" branch from the example above:
Since the username andrew was given during the checkout, the working copy will remember that username. Before any changes are committed to the repository, it will ask the user for a password. To avoid this, the --password flag can be used to specify the password. (It is important for the URL to contain the trunk folder; otherwise, Subversion will grab a copy of every branch and tag in the repository, which could be an enormous amount of data.) The third parameter passed to the svn command is the checkout folder; in this case, the user specified hello-world in order to check out the contents of the trunk into a folder called hello-world. Without this parameter, Subversion would instead check out the working copy into a folder called trunk.
The svn update command brings the working copy up to date. If someone has committed the file hello-world.c since it was checked out and that file has not been altered in the working copy, the following command will update it in the working copy:
The svn update command can also be called from anywhere inside the working copy. In that case, the path should be omitted from the command line:
The letter next to hello-world.c is a code to indicate what action Subversion performed for that file:
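The most common of these codes can be captured in a small lookup table. The Python sketch below parses the per-file lines that svn update prints; the sample output string is hypothetical, and the parser is simplified in that it looks only at the first column and ignores property changes.

```python
UPDATE_CODES = {
    "A": "added",
    "D": "deleted",
    "U": "updated",
    "G": "merged",      # repository changes were merged into local changes
    "C": "conflict",    # changes overlap and must be resolved by hand
}


def parse_update_output(output):
    """Return (action, path) pairs for the per-file lines printed by svn update."""
    results = []
    for line in output.splitlines():
        code, _, path = line.partition(" ")
        if code in UPDATE_CODES and path.strip():
            results.append((UPDATE_CODES[code], path.strip()))
    return results


sample = "U  hello-world.c\nUpdated to revision 24.\n"   # hypothetical output
print(parse_update_output(sample))
```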
7.2.7.2. Make changes to the code
Once the working copy is up to date with the latest revision, the programmer can make changes. Generally, these changes will be done using whatever editor or IDE the programming team has always used. Subversion does not require that the working copy is up to date in order for the user to make changes. The repository will generally change while the user is making changes; the programmer will merge these changes into the working copy before committing it back to the repository.
Sometimes files need to be added. For example, the programmer might add a file called Makefile to the hello-world project. Subversion won't recognize the file if it is simply added, so the programmer must also let Subversion know that the file is there:
This tells Subversion that the file Makefile has been added to the working copy. The programmer must commit the working copy in order to add the file to the repository. The delete, copy, and move commands all work in a similar manner (see Table 7-6).
7.2.7.3. Examine all changes
When the programmer is ready to commit all of his changes to the repository, he should use the svn status command to ensure that his working copy contains only the changes he intends.
The svn status command generates a list of all of the files in the working copy that have changed since it was checked out. It does not connect to the repository to do this. Instead, it uses the reference copy of the checked-out revision that was stored as part of the working copy. Each file listed by svn status has a one-letter code next to it. Table 7-7 contains a list of codes that svn status will return. (If the programmer passes a specific path to svn status, it only returns the status of that file.)
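One way to use this output before a commit is to look for entries that usually deserve a second look. The Python sketch below is an illustration only: it assumes simple status lines in which only the first column is set, the function name items_needing_attention is made up, and the sample listing is hypothetical.

```python
STATUS_CODES = {
    "A": "scheduled for addition",
    "D": "scheduled for deletion",
    "M": "modified",
    "C": "in conflict",
    "?": "not under version control",
    "!": "missing",
}


def items_needing_attention(status_output):
    """List paths that are conflicted, unversioned, or missing."""
    flagged = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        code, path = line[0], line[1:].strip()   # first column is the item status
        if code in ("C", "?", "!"):
            flagged.append((path, STATUS_CODES[code]))
    return flagged


sample = "M      hello-world.c\nA      Makefile\n?      scratch.txt\n"   # hypothetical
print(items_needing_attention(sample))
```

An unversioned file does not stop a commit, but it often means that an svn add was forgotten.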
In addition to svn status, the programmer can use svn diff to see the specific changes that have been made to each file. The svn diff command can take a file parameter to generate a list of differences for that file; if no file is given, then it generates a list of differences for every file that was changed. The differences are displayed using the unified diff format.
The svn revert command can be used to roll back changes. If the programmer issues that command and gives it the filename of a file in the working copy, then that file is overwritten with a "pristine" copy that is identical to one in the revision that was checked out of the repository.
7.2.7.4. Merge any changes made since the working copy was checked out
Before the changes can be checked in, the user should update the working copy. This causes any changes made to the repository since the working copy was checked out to be applied to the working copy as well. The programmer does this by using the svn update command.
Most of the time, svn update will automatically merge any changes that were made. But occasionally a programmer will have a change in his working copy that overlaps with another change that was made to the repository since he checked out. In the example above, Alice wanted to update cookies.txt and checked out r23 of the recipe (which contained neither nuts nor M&Ms; see Table 7-1). She planned on adding nuts to the recipe but, before she could commit that change to the repository, Bob added M&Ms and committed r38 (see Table 7-3). When Alice attempted to commit her new revision (see Table 7-2), Subversion gave her the following message:
This told Alice that she needed to update her working copy in order to integrate any changes by issuing the svn update command:
When Subversion detects a conflict, it updates the file and marks the conflict using a string of greater-than and less-than signs. Table 7-8 shows the conflicts marked in cookies.txt after the update.
The text between <<<<<<< .mine and ======= indicates the changes that were found in the working copy. The text between ======= and >>>>>>> .r38 indicates the conflicting changes that were found in r38 (see Table 7-3). It is up to the user to choose one or the other of these changes. The user can also come up with a way to use both of them, for example by indicating that the chopped M&Ms are optional. Once the user has resolved the changes, the svn resolved command is used to indicate that the conflict has been resolved:
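Before marking a file as resolved, it can help to confirm that no conflict markers remain in it. The following Python sketch pulls the two sides out of each marked conflict; the function name split_conflicts and the recipe lines are hypothetical stand-ins, since the exact contents of cookies.txt are not reproduced here.

```python
def split_conflicts(text):
    """Return a list of (mine, theirs) line lists, one per marked conflict."""
    conflicts, mine, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            mine, theirs, state = [], [], "mine"      # start of the working-copy side
        elif line.startswith("=======") and state == "mine":
            state = "theirs"                          # start of the repository side
        elif line.startswith(">>>>>>>") and state == "theirs":
            conflicts.append((mine, theirs))
            state = None
        elif state == "mine":
            mine.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


marked = """flour
<<<<<<< .mine
chopped nuts
=======
chopped M&Ms
>>>>>>> .r38
sugar
"""
edited = "flour\nchopped nuts\nchopped M&Ms (optional)\nsugar\n"

print(split_conflicts(marked))
print(split_conflicts(edited))
# Once the list is empty, the file is ready for: svn resolved cookies.txt
```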
Now Alice can commit the changes to the repository.
7.2.7.5. Commit the changes
Once the working copy has been updated and all of the conflicts have been resolved, it is time to commit the changes by using svn commit. In the example above, Alice would issue the following command:
Note: Additional information on version control and working with Subversion (including branches, tags, and setting up Apache to work with Subversion) can be found in Version Control with Subversion by Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato (O'Reilly, 2004). It can be downloaded free of charge or browsed online at http://svnbook.red-bean.com/.
Friday, November 13, 2009