Tuesday, April 17, 2007

Random Access with StreamReader - Seeking and File Position with ReadLine

I spent some time researching how to get random access with StreamReader while using ReadLine, and the solution turned out to be more complicated than I would expect for .NET at this stage of its development.

My problem is that I have a text file I am reading with a StreamReader object and I am using ReadLine to get one line of text at a time. Nothing special here, but periodically I need to go back to a previous position in the file and read that part again.

If you use the BaseStream property of the StreamReader object, you can use its Position property and Seek method to get and set the current position in the file. These two members would appear to be all you need for random access; however, when you read Position after a ReadLine, you don’t get the position just past the last line read. You get the position of the end of the buffer (usually 1024 bytes in size). So when you use Seek to go back to that position, you are unlikely to land where you want unless the buffer boundary just happens to fall in the right place.
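To see the effect for yourself, a small console sketch along these lines should do. The module name, the scratch file, and its contents are made up for illustration, and the file is written as plain ASCII so that one character is one byte.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module PositionDemo

    Sub Main()
        ' Hypothetical scratch file: 500 lines of 10 bytes each ("line 0001" plus a line feed).
        Dim fileName As String = Path.GetTempFileName()
        Using writer As New StreamWriter(fileName, False, Encoding.ASCII)
            For i As Integer = 1 To 500
                writer.Write("line " & i.ToString("0000") & vbLf)
            Next
        End Using

        Using reader As New StreamReader(fileName, Encoding.ASCII)
            Dim firstLine As String = reader.ReadLine()

            ' Bytes that ReadLine logically consumed: the text plus one line feed.
            Dim consumed As Integer = firstLine.Length + 1

            ' Bytes the reader has actually pulled from the file so far.
            Dim reported As Long = reader.BaseStream.Position

            Console.WriteLine("Consumed by ReadLine: {0}", consumed)
            Console.WriteLine("BaseStream.Position:  {0}", reported)
        End Using

        File.Delete(fileName)
    End Sub

End Module
```

The first number is 10. The second should come out much larger, because it reflects how far the reader's buffer has read ahead rather than where the first line ended.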

I searched around for a simple solution to this problem (there may be one that I haven’t found yet). There were lots of postings about using FileStream, but it doesn’t have ReadLine to read a line of text, so they suggested implementing your own version of ReadLine for FileStream. I also found information on DiscardBufferedData, which can be used with StreamReader, but it doesn’t help you get the correct offset after using ReadLine.

There were several suggestions on writing your own version of StreamReader:

http://bytes.com/forum/thread238508.html

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=6720&SiteID=1
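For reference, here is a minimal sketch of what Seek plus DiscardBufferedData does give you when you already know the byte offset you want. The module name, scratch file, and the hard-coded offset are assumptions for illustration; the offset is only known here because the lines have a fixed width.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module SeekDemo

    Sub Main()
        ' Hypothetical scratch file with fixed-width ASCII lines, so byte offsets are easy to compute.
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllText(fileName, "alpha" & vbLf & "bravo" & vbLf & "delta" & vbLf, Encoding.ASCII)

        Using reader As New StreamReader(fileName, Encoding.ASCII)
            reader.ReadLine() ' "alpha"
            reader.ReadLine() ' "bravo"

            ' Each line is 5 characters plus a line feed, so "bravo" starts at byte 6.
            Dim knownOffset As Long = 6

            reader.BaseStream.Seek(knownOffset, SeekOrigin.Begin)

            ' Without this call the reader would keep serving characters from its old buffer.
            reader.DiscardBufferedData()

            Console.WriteLine(reader.ReadLine()) ' bravo
        End Using

        File.Delete(fileName)
    End Sub

End Module
```

So the rewind itself is easy. The missing piece is a reliable way to obtain the offset in the first place, which is what the class below keeps track of.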

I finally bit the bullet and created my own class to accomplish what I need (and only what I need). The previous two posts didn’t provide me with a working solution, so I am going to post what I have come up with. It may need some more refinement, but it appears to work okay. This is in VB.NET, but should be easily translatable into C# if necessary.

The first code snippet is an example of using FileClass to read a file. In this example, I am looking for the string “*****StartOfData*****”. When this string is found, I save the current position and then call my PreprocessData function to read the rest of the data. I then go back to the saved position and run ProcessData from the same point:



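Dim list() As String
' The separator literal was lost from the original listing; a tab is assumed here.
Dim sep() As Char = {ControlChars.Tab}

Try
    Dim s As New FileClass
    s.Open(TextBoxDataFile.Text)

    Dim buffer As String = ""

    Do
        ' GetNextLine returns False when there is nothing left to read.
        If Not s.GetNextLine(buffer) Then
            Exit Do
        End If

        list = buffer.Split(sep)

        If buffer = "*****StartOfData*****" Then
            ' Remember where the data starts so it can be read twice.
            Dim startOfData As Integer = s.GetCurrentOffset()

            PreprocessData(s)

            ' Rewind to the saved offset and read the same data again.
            s.SetCurrentOffset(startOfData)

            ProcessData(s)
        End If

    Loop Until s.EOF()

    s.Close()

Catch ex As Exception
    Return False
End Try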



The FileClass is shown below:



Imports System.IO
Imports System.Text

Public Class FileClass

    Const BUFFER_SIZE As Integer = 1024

    Private g_file As StreamReader = Nothing
    Private g_line As Integer = 0
    Private g_position As Integer = 0              ' index of the next character in g_buffer
    Private g_buffer(BUFFER_SIZE - 1) As Char      ' VB array bounds are inclusive
    Private g_bufferSize As Integer = 0            ' number of valid characters in g_buffer
    Private g_offset As Integer = 0                ' file offset just past the last line returned
    Private g_eofFlag As Boolean = True
    Private g_lineBuffer As New StringBuilder(BUFFER_SIZE)
    Private g_bufferOffset As Integer = 0          ' file offset of g_buffer(0)

    Public Function Open(ByVal filename As String) As Boolean

        If Not g_file Is Nothing Then Close()

        g_file = New StreamReader(filename)
        g_line = 0
        g_position = 0
        g_offset = 0
        g_eofFlag = False
        g_bufferSize = 0
        g_bufferOffset = 0

        LoadBuffer()

        Return True

    End Function

    Public Function Close() As Boolean

        If Not g_file Is Nothing Then g_file.Close()
        g_file = Nothing
        g_line = 0
        g_position = 0
        g_eofFlag = True
        g_bufferSize = 0

        Return True

    End Function

    Public Function GetCurrentOffset() As Integer
        Return g_offset
    End Function

    Public Function SetCurrentOffset(ByVal offset As Integer) As Boolean

        Dim pos As Long = g_file.BaseStream.Seek(offset, SeekOrigin.Begin)

        ' Drop whatever the StreamReader had buffered before the seek.
        g_file.DiscardBufferedData()

        ' Clear any earlier end-of-file state; LoadBuffer sets it again if needed.
        g_offset = offset
        g_eofFlag = False
        LoadBuffer()

        Return offset = pos

    End Function

    Public Function GetNextLine(ByRef data As String) As Boolean

        g_lineBuffer.Length = 0

        ' Nothing buffered means the end of the file has already been reached.
        If g_eofFlag OrElse g_bufferSize = 0 Then Return False

        Dim ch As Char
        Dim flag As Boolean = False

        While Not flag

            ch = g_buffer(g_position)

            If ch = vbCr Then
                ' do nothing - skip cr
            ElseIf ch = vbLf Then
                flag = True
            Else
                g_lineBuffer.Append(ch)
            End If

            g_position = g_position + 1

            ' Refill when the buffer is used up; stop if the file is exhausted.
            If g_position = g_bufferSize Then
                If Not LoadBuffer() Then
                    Exit While
                End If
            End If

        End While

        ' Return a completed line, or a final line that has no trailing linefeed.
        If flag OrElse g_lineBuffer.Length > 0 Then
            g_offset = g_bufferOffset + g_position
            data = g_lineBuffer.ToString()
            Return True
        End If

        Return False

    End Function

    Private Function LoadBuffer() As Boolean

        ' BaseStream.Position counts bytes while g_position counts characters, so the
        ' offsets are only reliable when each character is one byte (for example,
        ' ASCII text with no byte order mark).
        g_bufferOffset = Convert.ToInt32(g_file.BaseStream.Position)
        g_position = 0
        g_bufferSize = g_file.Read(g_buffer, 0, BUFFER_SIZE)

        If g_bufferSize = 0 Then
            g_eofFlag = True
            Return False
        End If

        Return True

    End Function

    Public Function EOF() As Boolean
        Return g_eofFlag
    End Function

End Class



The FileClass is pretty simple: it just has to fill a buffer and look for the carriage returns and linefeeds itself to generate each line it reads.
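A quick way to check the round trip is a sketch like the one below. It assumes the FileClass listing above is compiled into the same project; the module name and the scratch file are made up for illustration, and the file is kept small, ASCII-only, and LF-terminated so the offsets are easy to reason about.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FileClassCheck

    Sub Main()
        ' Hypothetical scratch file: plain ASCII, LF line endings, well under one buffer in size.
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllText(fileName, "one" & vbLf & "two" & vbLf & "three" & vbLf & "four" & vbLf, Encoding.ASCII)

        Dim f As New FileClass
        f.Open(fileName)

        Dim line As String = ""
        f.GetNextLine(line)                         ' reads "one"
        Dim mark As Integer = f.GetCurrentOffset()  ' offset just past "one"

        f.GetNextLine(line)                         ' reads "two"
        Dim firstPass As String = line
        f.GetNextLine(line)                         ' reads "three"

        f.SetCurrentOffset(mark)                    ' rewind to the saved offset
        f.GetNextLine(line)                         ' should read "two" again

        Console.WriteLine("Offset after first line: {0}", mark)
        Console.WriteLine("Same line on both passes: {0}", firstPass = line)

        f.Close()
        File.Delete(fileName)
    End Sub

End Module
```

If the saved offset is right, the second line printed should report True. A file with multi-byte characters or a byte order mark is a different matter, since the class treats one character as one byte.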