Friday, December 25, 2009

10.1 Basics of Scripting





The world of programming languages can be divided into two major categories: compiled languages and interpreted languages.[1] The source for a compiled program is stored in one or more text files. A special program called a compiler translates those files into instructions for the processor, and those instructions are stored in a binary file. The source files can be thrown away if you wish, because the binary file is all that is required to execute the program. The program has to be compiled only once, at what's called "compile time." After that, the binary executable can be run repeatedly.

[1] Actually, whether a program is compiled or interpreted depends on the way it is run, not on anything inherent in the language itself. Some languages can be either compiled or interpreted, and others, like Java, fall somewhere in between. Typically, though, a language is considered compiled or interpreted based on how it is most commonly used.


Most of the programs you run on a Unix system are compiled programs. You can check with the file command:





Solaris% file /usr/bin/date
/usr/bin/date: ELF 32-bit MSB executable SPARC Version 1, ...


The strings "ELF" (the binary executable format) and "SPARC" (the processor architecture) are tip-offs that the program is compiled.


An interpreted program, in contrast, is stored in one or more files that are fed to a program called an interpreter. The interpreter decodes the program lines and executes the appropriate instructions on the fly. The program is interpreted every time it is run, so both the interpreter and the script file must be present each time you wish to run it. The terms "script" and "shell script" almost always refer to a program written in an interpreted language. The Perl and Bourne shell languages described here are both interpreted languages.


If you run the file program on a script, it will produce output like this:





Solaris% file test.pl
test.pl: executable /usr/bin/perl script
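The file command identifies files by comparing them against a database of "magic" signatures. For scripts, the telling signature is a first line that begins with #! and names the interpreter. The minimal sketch below reads only that first line. script_interpreter is a hypothetical helper name, not a standard command, and the sketch makes none of the other checks that file performs. It assumes a writable temporary directory ($TMPDIR, or /tmp if that is unset).

```sh
#!/bin/sh
# script_interpreter FILE
# Hypothetical helper (not a standard command): prints the interpreter named
# on FILE's #! line and succeeds, or returns status 1 if there is no #! line.
script_interpreter() {
    # Look only at line 1. If it starts with "#!", print the first word after
    # it (the interpreter path) and discard any options such as -w or -n.
    interp=$(sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1") || return 1
    [ -n "$interp" ] || return 1
    printf '%s\n' "$interp"
}

# Demonstration on two throwaway files in a scratch directory.
dir=${TMPDIR:-/tmp}/interp-demo.$$
mkdir "$dir" || exit 1
printf '#!/usr/bin/perl -w\nprint "hi\\n";\n' > "$dir/test.pl"
printf 'echo "Hello world"\n' > "$dir/testscript"

script_interpreter "$dir/test.pl"            # prints /usr/bin/perl
script_interpreter "$dir/testscript" || echo "testscript: no #! line"

rm -rf "$dir"
```

Because the helper looks at content rather than at the file name, it works the same whether or not a script has a .sh or .pl suffix.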


What are the practical differences between compiled and interpreted languages? Compiled programs tend to run faster, while interpreted programs are usually slower. A compiled binary runs only on machines with the same processor architecture and operating system, whereas a script can run on any machine that has the appropriate interpreter installed. If the tool you need requires a good deal of performance (speed, memory, or anything else), a script is probably not the best way to create it. But if you need a program that can be written quickly, runs almost anywhere, and has no demanding performance requirements, a script is just the right thing.



10.1.1 Running a Script


On Unix operating systems, there are two ways to cause a script to be interpreted. One is to invoke the interpreter program, passing it the filename of the script to be interpreted. For example, if you have a file called testscript containing this single line:





echo "Hello world"


it can be run as:





Solaris% sh testscript
Hello world


Note that sh is the interpreter program for the Bourne shell. In this example, the sh program reads the file testscript and takes action based on its contents. It would be nice if a script could be run like a normal program, though, without requiring you to specify the name of the interpreter first. That way, people using the script do not need to remember which language it was written in, and they have less to type. The operating system lets you do this if you place a special line at the beginning of the file. If testscript contains the two lines:





#!/bin/sh
echo "Hello world"


and you give it execute permissions:





Solaris% chmod u+x testscript


it can now be run directly:





Solaris% ./testscript
Hello world
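To see what happens when a #! script is run directly, a script can report the arguments its interpreter received. The sketch below assumes a writable temporary directory ($TMPDIR, or /tmp if that is unset) on a filesystem that allows programs to be executed. showargs is a hypothetical name used only for illustration.

```sh
#!/bin/sh
# Build a throwaway #! script in a scratch directory.
dir=${TMPDIR:-/tmp}/shebang-demo.$$
mkdir "$dir" || exit 1
cat > "$dir/showargs" <<'EOF'
#!/bin/sh
# $0 is the script path the interpreter was given; $* are the user's arguments.
echo "script path: $0"
echo "arguments: $*"
EOF
chmod u+x "$dir/showargs"

# Run it directly, with no "sh" on the command line. The kernel reads the
# #! line and starts /bin/sh with the script's path as its first argument.
"$dir/showargs" one two

rm -rf "$dir"
```

The first line of output shows the path that was typed. This indicates that the interpreter was handed the script's filename, and it then read the commands from that file itself.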


When the kernel tries to execute testscript, it notices #! as the first two characters of the file. It then runs the interpreter named on the rest of that line and passes it the script's filename as an argument. The interpreter opens the file and executes its contents. It is also perfectly acceptable to give arguments to the interpreter on that special line:





#!/bin/sh -n
echo "Hello World"


The -n option is used simply for illustration; it instructs the Bourne shell to read the commands but not execute any of them. This is useful for checking the syntax of a script.
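The -n option is handy for checking a script you have just edited before anyone runs it. The sketch below checks one valid script and one that is missing the fi that closes its if. The file names are hypothetical, and the sketch assumes a writable temporary directory ($TMPDIR, or /tmp if that is unset).

```sh
#!/bin/sh
# Check two scripts with "sh -n": the shell parses each file but runs none
# of its commands, so only syntax errors can be reported.
dir=${TMPDIR:-/tmp}/syntax-demo.$$
mkdir "$dir" || exit 1

# good.sh is valid; bad.sh lacks the "fi" that closes its "if".
printf 'if true; then\n  echo "Hello world"\nfi\n' > "$dir/good.sh"
printf 'if true; then\n  echo "Hello world"\n' > "$dir/bad.sh"

for f in "$dir/good.sh" "$dir/bad.sh"; do
    if sh -n "$f" 2>/dev/null; then
        echo "$f: syntax OK (nothing ran, so nothing was printed)"
    else
        echo "$f: syntax error"
    fi
done

rm -rf "$dir"
```

The echo inside good.sh never prints "Hello world", which confirms that -n only reads the commands without executing them.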


In many languages, including the Bourne shell and Perl, any line beginning with # is a comment and is ignored by the interpreter, so the interpreter does not trip over the #! line. The operating system, however, treats a #! first line as a special case. Be aware that if you create an executable script file but do not include an initial #! line, it will typically be run as a Bourne shell script. The kernel does not recognize the file as an executable, and the shell (or library routine) that tried to launch it falls back to interpreting it as a shell script. You should not engage in this practice yourself, but you should at least recognize it when you come across it. For example, administrators sometimes give execute permission to system startup scripts so that they can be invoked without typing sh on the command line.
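You can watch the fallback for a script with no #! line happen. The sketch below makes such a file executable and runs it from a shell script. It assumes a writable temporary directory that allows execution and a shell that follows POSIX, which requires the shell to interpret the file itself when the kernel rejects it as an unknown executable format.

```sh
#!/bin/sh
# Create an executable file with no #! line.
dir=${TMPDIR:-/tmp}/noshebang-demo.$$
mkdir "$dir" || exit 1
printf '# no #! line here; this is just a comment\necho "Hello world"\n' > "$dir/testscript"
chmod u+x "$dir/testscript"

# The kernel cannot execute this file (execve fails with ENOEXEC), so the
# shell running this demo interprets it as a shell script instead.
"$dir/testscript"

rm -rf "$dir"
```

The file still prints Hello world. Nothing in the file itself says which interpreter it needs, which is why the #! line is the better practice.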




10.1.2 Naming Conventions


Often you will find that scripts are named with a suffix that reflects the language the script is written in. Bourne shell scripts commonly end in .sh and Perl scripts commonly end in .pl. This is not a requirement, and just as often, you may find scripts named without a suffix.




10.1.3 Local and Environment Variables


Every running program has an "environment" associated with it. The environment is a list of variables and their values. The command env will print the current environment:





Linux% env
PWD=/var/tmp/
XUSERFILESEARCHPATH=/usr/athena/lib/X11/app-defaults/%N
PAGER=less
VERBOSELOGIN=1
...


Scripting languages typically have a way to modify environment variables, but it is important to understand the distinction between modifying an environment variable and modifying a variable local to the program. If you modify an environment variable, the value will be passed on in the environment of other programs run from your script. If you modify a local variable, it has no impact on other programs.
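In the Bourne shell, the export command turns a local variable into an environment variable. The sketch below uses a hypothetical variable, GREETING, and starts child shells with sh -c to show which value each child sees. It also shows that the relationship runs one way only: a child cannot change its parent's environment.

```sh
#!/bin/sh
# Start clean in case GREETING was inherited from the environment.
unset GREETING

# A plain assignment creates a variable local to this script.
GREETING="Hello world"
sh -c 'echo "child sees: ${GREETING:-<unset>}"'    # child sees: <unset>

# After export, GREETING is passed in the environment of every child.
export GREETING
sh -c 'echo "child sees: ${GREETING:-<unset>}"'    # child sees: Hello world

# A child changing its own copy has no effect on the parent.
sh -c 'GREETING=changed; export GREETING'
echo "parent still has: $GREETING"                 # parent still has: Hello world
```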






