Wednesday, November 11, 2009

Hack 42. Speed Up Compiles














While compiling, make full use of all of your computers with a distributed compiling daemon.


Users of other distributions like to make fun of the Gentoo fanboys, because Gentoo users have to spend a lot of time compiling all of their code. Even though these compiles can take hours or days to complete, Gentooists still tout their distribution as one of the fastest available. Because of their constant need to compile, Gentoo users have picked up a few tricks for making the process go faster, including using distcc to create a cluster of computers for compiling.

distcc is a distributed compiling daemon that lets you combine the processing power of other Linux computers on your network to compile code. It is very simple to set up and use, and it should produce results identical to those of a completely local compile. Having three machines with similar speeds should make compiling 2.6 times faster. The distcc home page at http://distcc.samba.org has testimonials describing real users' experiences with the program. Using this hack, you can get distcc to work with any Linux distribution, which will make compiling KDE and GNOME from scratch quick and easy.



distcc does not require the machines in your compile farm to have shared filesystems, synchronized clocks, or even the same libraries and headers. However, it is a good idea to make sure every machine runs the same major version of the compiler itself.




Before getting started with distcc, you must first know how to perform a parallel make when building code. To perform a parallel make, use the -j option in your make command:



dbrick@rivendell:$ make -j3; make -j3 modules



Each command runs up to three jobs at once, which makes maximum use of your processor power by ensuring that there is always something in the queue to be compiled. A general rule of thumb for how many parallel jobs to run is to double the number of processors and then add one. So a single-processor system will use -j3 and a dual-processor system -j5. When you start using distcc, you should base the -j value on the total number of processors in your compile farm. If you have eight processors available, then use -j17.
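The rule of thumb above is simple enough to put in a small shell function. This is only a sketch: jobs_for_cpus is a made-up helper name, not something make or distcc provides, and you still have to supply the processor count yourself.

```shell
# Rule of thumb from the text: jobs = 2 * processors + 1.
# jobs_for_cpus is a hypothetical helper, not part of make or distcc.
jobs_for_cpus() {
    echo $(( 2 * $1 + 1 ))
}

jobs_for_cpus 1   # single-processor system
jobs_for_cpus 2   # dual-processor system
jobs_for_cpus 8   # eight processors across the farm

# Possible use in a build (not run here):
#   make -j"$(jobs_for_cpus 8)"
```

The three calls print 3, 5, and 17, matching the values worked out in the text.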



4.15.1. Using distcc


You can obtain the latest version of distcc from http://distcc.samba.org/download.html. Just download the archive, uncompress it, and run the standard build commands:



dbrick@rivendell:$ tar -jxvf distcc-2.18.3.tar.bz2
dbrick@rivendell:$ cd distcc-2.18.3
dbrick@rivendell:$ ./configure && make && sudo make install



You must install the program on each machine you want included in your compile farm. On each of the compiling machines, you need to start the distccd daemon:



root@bree:# distccd --daemon -N15
root@moria:# distccd --daemon -N15



These daemons will listen on TCP port 3632 for instructions and code from the local machine (the one on which you start the build). The -N value sets a niceness level so the distributed compiles won't interfere too much with local operations. Read the distccd manpage for further options.


On the client side, you need to tell distcc which computers to use for distributed compiles. You can do this by creating an environment variable:



dbrick@rivendell:$ export DISTCC_HOSTS='localhost bree moria'



Specify localhost to make sure your local machine is included in the compiles. If your local machine is exceptionally slow, or if you have a lot of processors to distribute the load to, you should consider not including it at all. You can use machine IP addresses in place of names. If you don't want to set an environment variable, then create a distcc hosts file in your home directory to contain the values:



dbrick@rivendell:$ mkdir ~/.distcc
dbrick@rivendell:$ echo "localhost bree moria" > ~/.distcc/hosts



To run a distributed compile, simply pass a CC=distcc option to the make command:



dbrick@rivendell:$ make -j7 CC=distcc
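The 7 in that command follows from the rule of thumb given earlier, if you assume each of the three hosts has a single processor (the text does not say how many each one has). As a rough sketch, with host_count being a made-up helper name, you could derive the value from the host list like this:

```shell
# Same host list as in the text.
DISTCC_HOSTS='localhost bree moria'

# host_count is a hypothetical helper: it counts the
# whitespace-separated entries in a host list.
host_count() {
    # Unquoted on purpose so the shell splits the list into words.
    set -- $1
    echo $#
}

# Assumption: one processor per host. Adjust if your machines differ.
hosts=$(host_count "$DISTCC_HOSTS")
jobs=$(( 2 * hosts + 1 ))

# Print the command instead of running it.
echo "make -j$jobs CC=distcc"
```

With the three hosts above, this prints make -j7 CC=distcc.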



It's that simple to distribute your compiles. Read the manpages for distcc and distccd to learn more about the program, including how to limit the number of parallel makes a particular computer in your farm will perform.
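As one example of such a limit, the distcc manpage describes a host/LIMIT form in the host list that caps how many jobs the client sends to a given machine. The sketch below uses made-up limits, and total_limit is a hypothetical helper; treating the sum of the limits as the -j value is just one plausible choice, not a recommendation from the distcc documentation.

```shell
# host/LIMIT caps the number of jobs this client sends to that host.
# The limits here are invented values for illustration only.
DISTCC_HOSTS='localhost/2 bree/4 moria/4'

# total_limit is a hypothetical helper: it adds up the /LIMIT suffixes.
# Assumption: every entry in the list carries an explicit /LIMIT.
total_limit() {
    total=0
    for spec in $1; do
        # ${spec##*/} strips everything up to the last slash.
        total=$(( total + ${spec##*/} ))
    done
    echo "$total"
}

# Print the command instead of running it.
echo "make -j$(total_limit "$DISTCC_HOSTS") CC=distcc"
```

Here the limits add up to 10, so the printed command is make -j10 CC=distcc.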




4.15.2. Distribute Compiles to Windows Machines


Though some clever people have come up with very interesting ways to distribute compiles to a Windows machine using Cygwin, there is an easier way to perform the same task using a live CD distribution known as distccKnoppix, which you can download from http://opendoorsoftware.com/cgi/http.pl?p=distccKNOPPIX. Be sure to download the version that has the same major version number of gcc as your local machine.


To use distccKnoppix, simply boot the computer using the CD, note its IP address, and then enter that address in your distcc hosts file or environment variable as instructed earlier. Happy compiling!


David Brickner












