Tuesday, October 27, 2009

Recipe 6.5. Listing a Directory













Problem


You want to list or process the files or subdirectories within a directory.




Solution




If you're starting from a directory name, you can use Dir.entries to get an array of the items in the directory, or Dir.foreach to iterate over the items. Here's an example of each, run on a sample directory:



# See the chapter intro to get the create_tree library
require 'create_tree'
create_tree 'mydir' =>
[ {'subdirectory' => [['file_in_subdirectory', 'Just a simple file.']] },
'.hidden_file', 'ruby_script.rb', 'text_file' ]


Dir.entries('mydir')

# => [".", "..", ".hidden_file", "ruby_script.rb", "subdirectory",
# "text_file"]


Dir.foreach('mydir') { |x| puts x if x != "." && x != ".."}
# .hidden_file
# ruby_script.rb
# subdirectory
# text_file
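
If you'd rather not filter out "." and ".." by hand, newer Rubies offer a shortcut. The sketch below assumes Ruby 2.5 or later, where Dir.children and Dir.each_child are available. It is not part of the original recipe, and make_sample_tree is a hypothetical helper that rebuilds the sample directory in a temporary location instead of using create_tree.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

children = nil
visited = []
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  # Like Dir.entries, but without "." and ".." (assumes Ruby 2.5+).
  children = Dir.children(mydir).sort
  # Like Dir.foreach, but without "." and ".." (assumes Ruby 2.5+).
  Dir.each_child(mydir) { |name| visited << name }
end

p children
# => [".hidden_file", "ruby_script.rb", "subdirectory", "text_file"]
```

The listing is sorted here only to make the output predictable; the order in which the operating system returns entries is not guaranteed.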



You can also use Dir[] to pick up all files matching a certain pattern, using a format similar to the bash shell's glob format (and somewhat less similar to the wildcard format used by the Windows command-line shell):



# Find all the "regular" files and subdirectories in mydir. This excludes
# hidden files, and the special directories . and ..
Dir["mydir/*"]
# => ["mydir/ruby_script.rb", "mydir/subdirectory", "mydir/text_file"]

# Find all the .rb files in mydir
Dir["mydir/*.rb"] # => ["mydir/ruby_script.rb"]
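
Because a glob returns plain path strings, a common next step is to sort the matches into files and directories. The following is a minimal sketch of that idea, not code from the original recipe; make_sample_tree is a hypothetical helper that recreates the sample directory in a temporary location.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

files = nil
dirs = nil
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  matches = Dir[File.join(mydir, '*')]
  # Split the matched paths into regular files and everything else.
  file_paths, other_paths = matches.partition { |path| File.file?(path) }
  files = file_paths.map { |path| File.basename(path) }.sort
  dirs = other_paths.select { |path| File.directory?(path) }
                    .map { |path| File.basename(path) }.sort
end

p files # => ["ruby_script.rb", "text_file"]
p dirs  # => ["subdirectory"]
```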



You can also open a directory handle with Dir.open and treat it like any other Enumerable. Methods like each, each_with_index, grep, and reject will all work (but see below if you want to call them more than once). As with File.open, you should do your directory processing in a code block so that the directory handle gets closed once you're done with it.



Dir.open('mydir') { |d| d.grep /file/ }
# => [".hidden_file", "text_file"]

Dir.open('mydir') { |d| d.each { |x| puts x } }
# .
# ..
# .hidden_file
# ruby_script.rb
# subdirectory
# text_file
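
To see what the block form buys you, the sketch below keeps a reference to the handle and tries to use it after the block has finished. It is an illustration rather than part of the original recipe; make_sample_tree is a hypothetical helper, and the variable names are made up for the example.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

matches = nil
leaked_handle = nil
error_class = nil
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  # Dir.open returns the value of the block and closes the handle afterward.
  matches = Dir.open(mydir) do |d|
    leaked_handle = d
    d.grep(/file/).sort
  end
  begin
    leaked_handle.read # the handle is already closed at this point
  rescue IOError => e
    error_class = e.class
  end
end

p matches     # => [".hidden_file", "text_file"]
p error_class # => IOError
```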





Discussion




Reading entries from a Dir object is more like reading data from a file than iterating over an array. If you call one of the Dir instance methods and then want to call another one on the same Dir object, you'll need to call Dir#rewind first to go back to the beginning of the directory listing:



# Get all contents other than ".", "..", and hidden files.

d = Dir.open('mydir')
d.reject { |f| f.start_with?('.') }
# => ["subdirectory", "ruby_script.rb", "text_file"]
# Now the Dir object is useless until we call Dir#rewind.
d.entries.size # => 0
d.rewind
d.entries.size # => 6

# Get the names of all files in the directory.
d.rewind
d.reject { |f| !File.file? File.join(d.path, f) }
# => [".hidden_file", "ruby_script.rb", "text_file"]

d.close
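
The file-like cursor is easiest to see with Dir#read, which returns one entry per call and nil once the listing is exhausted. The sketch below is an illustration under that assumption, not code from the original recipe; make_sample_tree is a hypothetical helper that recreates the sample directory.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

first_pass = []
after_end = :not_read_yet
after_rewind = nil
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  d = Dir.open(mydir)
  begin
    # Read entries one at a time until the cursor reaches the end.
    while (name = d.read)
      first_pass << name
    end
    after_end = d.read    # still at the end, so this is nil
    d.rewind              # move the cursor back to the start
    after_rewind = d.read # an entry is available again
  ensure
    d.close
  end
end

p first_pass.include?('text_file') # => true
p after_end                        # => nil
p after_rewind.nil?                # => false
```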




Methods for listing directories and looking for files return string pathnames instead of File and Dir objects. This is partly for efficiency, and partly because creating a File or Dir actually opens up a filehandle on that file or directory.


Even so, it's annoying to have to take the output of these methods and patch together real File or Dir objects on which you can operate. Here's a simple method that will build a File or Dir, given a filename and the name or Dir of the parent directory:



def File.from_dir(dir, name)
  dir = dir.path if dir.is_a? Dir
  path = File.join(dir, name)
  (File.directory?(path) ? Dir : File).open(path) { |f| yield f }
end



As with File.open and Dir.open, the actual processing happens within a code block:



File.from_dir("mydir", "subdirectory") do |subdir|
  File.from_dir(subdir, "file_in_subdirectory") do |file|
    puts %{My path is #{file.path} and my contents are "#{file.read}".}
  end
end
# My path is mydir/subdirectory/file_in_subdirectory and my contents are
# "Just a simple file.".
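
If you only need path objects rather than open handles, the standard library's Pathname class may be an alternative worth considering. The sketch below is not part of the original recipe. It assumes the pathname library is acceptable for your use, and make_sample_tree is a hypothetical helper that recreates the sample directory.

```ruby
require 'tmpdir'
require 'fileutils'
require 'pathname'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

report = nil
contents = nil
Dir.mktmpdir do |root|
  mydir = Pathname.new(make_sample_tree(root))
  # Pathname#children returns Pathname objects and skips "." and "..".
  report = mydir.children.sort_by { |child| child.basename.to_s }.map do |child|
    kind = child.directory? ? 'directory' : 'file'
    "#{child.basename}: #{kind}"
  end
  # Pathname#+ joins path components; Pathname#read returns the file's contents.
  contents = (mydir + 'subdirectory' + 'file_in_subdirectory').read
end

puts report
# .hidden_file: file
# ruby_script.rb: file
# subdirectory: directory
# text_file: file
puts contents
# Just a simple file.
```

Unlike File.from_dir, this approach opens no handle until you actually read from a path.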



Globs make excellent shortcuts for finding files in a directory or a directory tree. Especially useful is the ** glob, which matches any number of directories. A glob is the easiest and fastest way to recursively process every file in a directory tree, although it loads all the filenames into an array in memory. For a less memory-intensive solution, see the find library, described in Recipe 6.12.



Dir["mydir/**/*"]
# => ["mydir/ruby_script.rb", "mydir/subdirectory", "mydir/text_file",
# "mydir/subdirectory/file_in_subdirectory"]

Dir["mydir/**/*file*"]
# => ["mydir/text_file", "mydir/subdirectory/file_in_subdirectory"]
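
A recursive glob returns directories as well as files, so you will often want to filter the result. The following is a minimal sketch, not code from the original recipe, that keeps only regular files; make_sample_tree is a hypothetical helper that recreates the sample directory in a temporary location.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

all_files = nil
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  all_files = Dir[File.join(mydir, '**', '*')]
              .select { |path| File.file?(path) }          # drop directories
              .map { |path| path[(root.length + 1)..-1] }  # strip the temp prefix
              .sort
end

p all_files
# => ["mydir/ruby_script.rb", "mydir/subdirectory/file_in_subdirectory",
#     "mydir/text_file"]
```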



A brief tour of the other features of globs:



# Regex-style character classes
Dir["mydir/[rs]*"] # => ["mydir/ruby_script.rb", "mydir/subdirectory"]
Dir["mydir/[^s]*"] # => ["mydir/ruby_script.rb", "mydir/text_file"]

# Match any of the given strings
Dir["mydir/{text,ruby}*"] # => ["mydir/text_file", "mydir/ruby_script.rb"]

# Single-character wildcards
Dir["mydir/?ub*"] # => ["mydir/ruby_script.rb", "mydir/subdirectory"]
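
You can experiment with glob patterns without touching the filesystem by using File.fnmatch, which tests a single name against a pattern. The sketch below is an illustration, not part of the original recipe. Note one difference from Dir[]: File.fnmatch only understands brace alternatives when given File::FNM_EXTGLOB (assumed available, Ruby 2.0 or later).

```ruby
# The names in the recipe's sample directory.
names = %w[.hidden_file ruby_script.rb subdirectory text_file]

# Character class: names starting with "r" or "s".
char_class = names.select { |name| File.fnmatch('[rs]*', name) }

# Single-character wildcard followed by "ub".
single_char = names.select { |name| File.fnmatch('?ub*', name) }

# Braces are plain text to File.fnmatch unless FNM_EXTGLOB is given.
braces_plain = names.select { |name| File.fnmatch('{text,ruby}*', name) }
braces_ext = names.select do |name|
  File.fnmatch('{text,ruby}*', name, File::FNM_EXTGLOB)
end

p char_class   # => ["ruby_script.rb", "subdirectory"]
p single_char  # => ["ruby_script.rb", "subdirectory"]
p braces_plain # => []
p braces_ext   # => ["ruby_script.rb", "text_file"]
```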



Globs will not pick up files or directories whose names start with periods, unless you match them explicitly:



Dir["mydir/.*"] # => ["mydir/.", "mydir/..", "mydir/.hidden_file"]
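
Another way to include hidden entries is to pass File::FNM_DOTMATCH to Dir.glob, so that * also matches names starting with a period. The sketch below is not from the original recipe. It filters out "." and ".." explicitly because whether they appear can depend on the Ruby version, and make_sample_tree is a hypothetical helper that recreates the sample directory.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not from the recipe): rebuilds the sample tree under root.
def make_sample_tree(root)
  mydir = File.join(root, 'mydir')
  FileUtils.mkdir_p(File.join(mydir, 'subdirectory'))
  File.write(File.join(mydir, 'subdirectory', 'file_in_subdirectory'),
             'Just a simple file.')
  %w[.hidden_file ruby_script.rb text_file].each do |name|
    FileUtils.touch(File.join(mydir, name))
  end
  mydir
end

everything = nil
Dir.mktmpdir do |root|
  mydir = make_sample_tree(root)
  everything = Dir.glob(File.join(mydir, '*'), File::FNM_DOTMATCH)
                  .map { |path| File.basename(path) }
                  .reject { |name| name == '.' || name == '..' }
                  .sort
end

p everything
# => [".hidden_file", "ruby_script.rb", "subdirectory", "text_file"]
```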





See Also


  • Recipe 6.12, "Walking a Directory Tree"

  • Recipe 6.20, "Finding the Files You Want"












