Wednesday, October 28, 2009

Recipe 6.11. Performing Random Access on "Read-Once" Input Streams





Problem


You have an IO object, probably a socket, that doesn't support random-access methods like seek, pos=, and rewind. You want to treat this object like a file on disk, where you can jump around and reread parts of the file.




Solution


The simplest solution is to read the entire contents of the socket (or as much as you're going to need) and put it into a StringIO object. You can then treat the StringIO object exactly like a file:



require 'socket'
require 'stringio'

sock = TCPSocket.open("www.example.com", 80)
sock.write("GET /\n")

file = StringIO.new(sock.read)
file.read(10) # => "<HTML>\r\n<H"
file.rewind
file.read(10) # => "<HTML>\r\n<H"
file.pos = 90
file.read(15) # => " this web page "
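If you want to try this approach without a network connection, a pipe makes a convenient stand-in for a socket, since it too can only be read once. This is a minimal sketch under that assumption; the page text written into the pipe is made up for the example:

```ruby
require 'stringio'

# A pipe stands in for the socket: like a socket, it can only be read once.
reader, writer = IO.pipe
writer.write("<HTML>\r\n<HEAD>\r\n<TITLE>Example</TITLE>")  # made-up page text
writer.close

file = StringIO.new(reader.read)  # slurp everything the source will send
reader.close

first = file.read(6)   # => "<HTML>"
file.rewind
again = file.read(6)   # => "<HTML>"
file.pos = 16          # jump straight to the <TITLE> tag
title = file.read(7)   # => "<TITLE>"
```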





Discussion


A socket is supposed to work just like a file, but sometimes the illusion breaks down. Since the data is coming from another computer over which you have no control, you can't just go back and reread data you've already read. That data has already been sent over the pipe, and the server doesn't care if you lost it or need to process it again.
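You can watch the illusion break down locally with a pipe, which behaves like a socket in this respect. The sketch below assumes a POSIX system, where trying to rewind a pipe fails with Errno::ESPIPE ("Illegal seek"):

```ruby
reader, writer = IO.pipe
writer.write("hello, world")
writer.close

greeting = reader.read(5)   # => "hello"
begin
  reader.rewind             # a pipe has no position to go back to
  rewound = true
rescue Errno::ESPIPE
  rewound = false           # the first five bytes are gone for good
end
rest = reader.read          # => ", world"
reader.close
```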


If you have enough memory to read the entire contents of a socket, it's easy to put the results into a form that more closely simulates a file on disk. But you might not want to read the entire socket, or the socket may be one that keeps sending data until you close it. In that case you'll need to buffer the data as you read it. Instead of using memory for the entire contents of the socket (which may be infinite), you'll only use memory for the data you've actually read.
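Stripped to its core, buffering as you go means copying each chunk you pull from the read-once source into a StringIO, so that chunk can be reread from memory while the unread data stays in the source. A minimal sketch, again using a pipe as an assumed stand-in for a socket:

```ruby
require 'stringio'

reader, writer = IO.pipe
writer.write("abcdefghij")
writer.close

buffer = StringIO.new
buffer << reader.read(4)   # pull only 4 bytes from the read-once source
buffer.rewind
reread = buffer.read(4)    # => "abcd", served from memory
unread = reader.read       # => "efghij", never copied into the buffer
reader.close
```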


This code defines a BufferedIO class that adds data to an internal StringIO as it's read from its source:



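require 'stringio'
require 'forwardable'

class BufferedIO
  extend Forwardable

  def initialize(io)
    @buff = StringIO.new  # everything read from the source so far
    @source = io          # the read-once stream (socket, pipe, ...)
  end

  # Reads x bytes (or everything, if x is nil), pulling from the
  # source only the bytes that aren't already in the buffer.
  def read(x=nil)
    to_read = x ? x + @buff.pos - @buff.size : nil
    _append(@source.read(to_read)) if !to_read or to_read > 0
    @buff.read(x)
  end

  def pos=(x)
    if x > @buff.size
      @buff.pos = @buff.size   # read forward from the end of the buffer
      read(x - @buff.size)
    end
    @buff.pos = x
  end

  # Returns the new position (IO#seek itself returns 0).
  def seek(x, whence=IO::SEEK_SET)
    case whence
    when IO::SEEK_SET then self.pos = x
    when IO::SEEK_CUR then self.pos = @buff.pos + x
    when IO::SEEK_END then read; self.pos = @buff.size + x
    # Note: SEEK_END reads all the socket data.
    end
    pos
  end

  # Some methods can simply be delegated to the buffer.
  def_delegators :@buff, :pos, :rewind, :tell

  private

  # Adds newly read data to the end of the buffer without moving
  # the current read position.
  def _append(s)
    return if s.nil? or s.empty?   # the source is at end of input
    oldpos = @buff.pos
    @buff.seek(0, IO::SEEK_END)
    @buff << s
    @buff.pos = oldpos
  end
end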
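When writing the SEEK_END branch, keep in mind how the built-in classes treat that offset: IO#seek and StringIO#seek add it to the end position, so a negative offset counts backward from the end. A quick check with a plain StringIO:

```ruby
require 'stringio'

io = StringIO.new("0123456789")
io.seek(-3, IO::SEEK_END)   # offset is added to the end: 10 + (-3)
pos_after = io.pos          # => 7
tail = io.read              # => "789"
```

A BufferedIO#seek meant as a drop-in replacement for IO#seek should follow the same convention.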



Now you can seek, rewind, and generally move around in an input socket as if it were a disk file. You only have to read as much data as you need:



sock = TCPSocket.open("www.example.com", 80)
sock.write("GET /\n")
file = BufferedIO.new(sock)

file.read(10) # => "<HTML>\r\n<H"
file.rewind # => 0
file.read(10) # => "<HTML>\r\n<H"
file.pos = 90 # => 90
file.read(15) # => " this web page "
file.seek(-10, IO::SEEK_CUR) # => 95
file.read(10) # => " web page "



BufferedIO doesn't implement all the methods of IO, only the random-access ones that socket-type IO objects lack. If you need other methods, you should be able to implement them using the existing methods as a guide. For instance, you could implement readline like this:



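class BufferedIO
  # Reads one line, pulling the rest of it from the source if the
  # buffer ends partway through the line.
  def readline
    oldpos = @buff.pos
    line = @buff.eof? ? nil : @buff.readline
    unless line and line.end_with?("\n")
      rest = @source.gets          # finish the line (nil at end of input)
      _append(rest) if rest
      @buff.pos = oldpos           # go back to where we were
      raise EOFError, "end of file reached" if @buff.eof?
      line = @buff.readline        # read the whole line again
    end
    line
  end
end

file.readline # => "by typing \"example.com\",\r\n"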





See Also


  • Recipe 6.17, "Processing a Binary File," for more information on IO#seek












