Monday, October 26, 2009

sort








 

 













You're familiar with the basic operation of sort:





$ sort names

Charlie

Emanuel

Fred

Lucy

Ralph

Tony

Tony

$



By default, sort reads the lines of the specified input file and arranges them in ascending order. Special characters are sorted according to the internal encoding of the characters. For example, on a machine that encodes characters in ASCII, the space character is represented internally as the number 32 and the double quote as the number 34, so the former sorts before the latter. Note that the sorting order is implementation dependent (on current systems it also depends on the locale settings), so although you are generally assured that sort will perform as expected on alphabetic input, the ordering of numbers, punctuation, and special characters is not always guaranteed. We will assume we're working with the ASCII character set in all our examples here.
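To see the encoding-based ordering for yourself, here is a small sketch. It relies on an assumption that goes beyond the text above: setting the LC_ALL environment variable to C, which on POSIX systems asks for plain byte-value comparison. The sample lines are made up for illustration.

```shell
# Force byte-value (ASCII) ordering for this one command.
# The four input lines are illustrative only.
sorted=$(printf '%s\n' 'apple' '"quoted"' 'Zebra' ' leading space' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In this ordering the line that starts with a space (32) comes before the one that starts with a double quote (34), and uppercase letters come before lowercase ones.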



sort has many options that provide more flexibility in performing your sort. We'll just describe a few of the options here.



The -u Option



The -u option tells sort to eliminate duplicate lines from the output.





$ sort -u names

Charlie

Emanuel

Fred

Lucy

Ralph

Tony

$



Here you see that the duplicate line that contained Tony was eliminated from the output.
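As a rough check, the sketch below compares -u with the common alternative of piping sort into uniq. The original line order of names is not shown in the text, so the order used here is an assumption, and the scratch directory from mktemp is only there to keep the example self-contained.

```shell
# Build a copy of the sample file in a scratch directory.
# The unsorted line order is assumed; the text only shows sorted output.
tmp=$(mktemp -d)
printf '%s\n' Tony Charlie Lucy Fred Tony Ralph Emanuel > "$tmp/names"

with_u=$(sort -u "$tmp/names")          # duplicates dropped by sort itself
with_uniq=$(sort "$tmp/names" | uniq)   # duplicates dropped by a second command

printf '%s\n' "$with_u"
[ "$with_u" = "$with_uniq" ] && echo "same result"

rm -rf "$tmp"
```

For whole-line comparisons like this one, both forms should give the same six names; -u simply saves the extra process.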



The -r Option



Use the -r option to reverse the order of the sort:





$ sort -r names                  # Reverse sort

Tony

Tony

Ralph

Lucy

Fred

Emanuel

Charlie

$



The -o Option



By default, sort writes the sorted data to standard output. To have it go into a file, you can use output redirection:





$ sort names > sorted_names

$



Alternatively, you can use the -o option to specify the output file. Simply list the name of the output file right after the -o:





$ sort names -o sorted_names

$



This sorts names and writes the results to sorted_names.



Frequently, you want to sort the lines in a file and have the sorted data replace the original. Typing





$ sort names > names

$



won't work: the shell empties names before sort gets a chance to read it, so you end up wiping out the file. However, with the -o option, it is okay to specify the same name for the output file as the input file:





$ sort names -o names

$ cat names

Charlie

Emanuel

Fred

Lucy

Ralph

Tony

Tony

$
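The difference between the two approaches can be tried safely on throwaway copies. This is a minimal sketch; the file names clobbered and in_place and the three sample names are placeholders chosen for the demonstration.

```shell
# Work on throwaway copies so nothing real is lost.
tmp=$(mktemp -d)
printf '%s\n' Tony Charlie Lucy > "$tmp/clobbered"
cp "$tmp/clobbered" "$tmp/in_place"

# Redirection: the shell truncates the file before sort starts reading it.
sort "$tmp/clobbered" > "$tmp/clobbered"
clobbered_lines=$(wc -l < "$tmp/clobbered" | tr -d ' ')

# -o: sort is allowed to use one of its input files as the output file.
sort -o "$tmp/in_place" "$tmp/in_place"
in_place=$(cat "$tmp/in_place")

echo "lines left after redirection: $clobbered_lines"
printf '%s\n' "$in_place"

rm -rf "$tmp"
```

The redirected copy ends up empty, while the copy sorted with -o keeps all three names in sorted order.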



The -n Option



Suppose that you have a file containing pairs of (x, y) data points as shown:





$ cat data

5 27

2 12

3 33

23 2

-5 11

15 6

14 -9

$



Suppose that you want to feed this data into a plotting program called plotdata, but that the program requires that the incoming data pairs be sorted in increasing value of x (the first value on each line).



The -n option to sort specifies that the first field on the line is to be considered a number, and the data is to be sorted arithmetically. Compare the output of sort used first without the -n option and then with it:





$ sort data

-5 11

14 -9

15 6

2 12

23 2

3 33

5 27

$ sort -n data                   # Sort arithmetically

-5 11

2 12

3 33

5 27

14 -9

15 6

23 2

$
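Before handing data to a program such as plotdata, it can be useful to confirm that the file really is in numeric order. The sketch below uses the -c option, which is not covered in the text above: on POSIX systems it only checks the order and reports the result through the exit status. Sorting the file in place with -o is a choice made for this example.

```shell
# Recreate the sample data file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '5 27' '2 12' '3 33' '23 2' '-5 11' '15 6' '14 -9' > "$tmp/data"

# -c checks the order without printing sorted output;
# the exit status is nonzero when the input is out of order.
if sort -n -c "$tmp/data" 2>/dev/null; then before=sorted; else before=unsorted; fi

sort -n -o "$tmp/data" "$tmp/data"      # sort numerically, in place

if sort -n -c "$tmp/data" 2>/dev/null; then after=sorted; else after=unsorted; fi

first_x=$(head -n 1 "$tmp/data" | cut -d' ' -f1)
echo "before: $before, after: $after, smallest x: $first_x"

rm -rf "$tmp"
```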



Skipping Fields



If you had to sort your data file by the y value (that is, the second number in each line), you could tell sort to skip past the first number on the line by using the option





+1n



instead of -n. The +1 says to skip the first field. Similarly, +5n would mean to skip the first five fields on each line and then sort the data numerically. Fields are delimited by space or tab characters by default. If a different delimiter is to be used, the -t option must be used.





$ sort +1n data                  # Skip the first field in the sort

14 -9

23 2

15 6

-5 11

2 12

5 27

3 33

$
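Many current versions of sort no longer accept the +1n form and instead treat +1n as a file name. The sketch below assumes a POSIX-conforming sort and uses the -k option, which counts fields starting from 1, so skipping one field becomes a key on field 2.

```shell
# Recreate the sample data file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '5 27' '2 12' '3 33' '23 2' '-5 11' '15 6' '14 -9' > "$tmp/data"

# -k 2,2n: the sort key starts and ends at field 2 and is compared numerically.
by_y=$(sort -k 2,2n "$tmp/data")
printf '%s\n' "$by_y"

rm -rf "$tmp"
```

Strictly speaking, +1n corresponds to -k 2n, whose key runs from field 2 to the end of the line; limiting the key to field 2 with 2,2 gives the same order on this data.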



The -t Option



As mentioned, if you skip over fields, sort assumes that the fields being skipped are delimited by space or tab characters. The -t option says otherwise. In this case, the character that follows the -t is taken as the delimiter character.



Look at our sample password file again:





$ cat /etc/passwd

root:*:0:0:The Super User:/:/usr/bin/ksh

steve:*:203:100::/users/steve:/usr/bin/ksh

bin:*:3:3:The owner of system files:/:

cron:*:1:1:Cron Daemon for periodic tasks:/:

george:*:75:75::/users/george:/usr/lib/rsh

pat:*:300:300::/users/pat:/usr/bin/ksh

uucp:*:5:5::/usr/spool/uucppublic:/usr/lib/uucp/uucico

asg:*:6:6:The Owner of Assignable Devices:/:

sysinfo:*:10:10:Access to System Information:/:/usr/bin/sh

mail:*:301:301::/usr/mail:

$



If you wanted to sort this file by username (the first field on each line), you could just issue the command





sort /etc/passwd



To sort the file instead by the third colon-delimited field (which contains what is known as your user id), you would want an arithmetic sort, skipping the first two fields (+2n), specifying the colon character as the field delimiter (-t:):





$ sort +2n -t: /etc/passwd       # Sort by user id

root:*:0:0:The Super User:/:/usr/bin/ksh

cron:*:1:1:Cron Daemon for periodic tasks:/:

bin:*:3:3:The owner of system files:/:

uucp:*:5:5::/usr/spool/uucppublic:/usr/lib/uucp/uucico

asg:*:6:6:The Owner of Assignable Devices:/:

sysinfo:*:10:10:Access to System Information:/:/usr/bin/sh

george:*:75:75::/users/george:/usr/lib/rsh

steve:*:203:100::/users/steve:/usr/bin/ksh

pat:*:300:300::/users/pat:/usr/bin/ksh

mail:*:301:301::/usr/mail:

$



Reading down the third field of each line (0, 1, 3, 5, and so on up to 301), you can verify that the file was sorted correctly by user id.
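The same sort can be tried without touching the real /etc/passwd. This sketch works on a shortened copy of the sample file and assumes a POSIX-conforming sort, so it uses -k 3,3n in place of +2n; the cut command is added only to keep the output short.

```shell
# A shortened copy of the sample password file, kept in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/passwd" <<'EOF'
root:*:0:0:The Super User:/:/usr/bin/ksh
steve:*:203:100::/users/steve:/usr/bin/ksh
bin:*:3:3:The owner of system files:/:
cron:*:1:1:Cron Daemon for periodic tasks:/:
george:*:75:75::/users/george:/usr/lib/rsh
pat:*:300:300::/users/pat:/usr/bin/ksh
mail:*:301:301::/usr/mail:
EOF

# -t: makes the colon the field delimiter;
# -k 3,3n sorts numerically on the third field only.
users=$(sort -t: -k 3,3n "$tmp/passwd" | cut -d: -f1,3)
printf '%s\n' "$users"

rm -rf "$tmp"
```

Each output line shows a username and its user id, with the ids in increasing order from root:0 to mail:301.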



Other Options



Other options to sort enable you to skip characters within a field, specify the field to end the sort on, merge sorted input files, and sort in "dictionary order" (only letters, numbers, and spaces are used for the comparison). For more details on these options, look under sort in your Unix User's Manual.
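As one illustration of these, here is a minimal sketch of merging. It assumes the POSIX -m option, and the two input files a and b are invented for the example; each must already be sorted on its own.

```shell
# Two small files, each already in sorted order.
tmp=$(mktemp -d)
printf '%s\n' Charlie Lucy Tony > "$tmp/a"
printf '%s\n' Emanuel Fred Ralph > "$tmp/b"

# -m merges inputs that are already sorted instead of sorting from scratch.
merged=$(LC_ALL=C sort -m "$tmp/a" "$tmp/b")
printf '%s\n' "$merged"

rm -rf "$tmp"
```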












     

     

