Programmer's Cookbook

Recipes for the practical programmer

Thursday, February 09, 2006

IsText and IsBinary Functions for VBScript

The following functions determine whether a file is text or binary, returning a true or false value. The result is based on the first 512 characters of the file: if no more than 1/3 of them seem binary, the file is flagged as a text document. The algorithm comes from Perl's implementation of this test (its -T and -B file tests). A null character always indicates a binary file. Any character outside the ASCII range 32-127 (except 8 (backspace), 9 (tab), 10 (lf), 12 (form feed), 13 (cr), and 27 (escape)) is considered binary-like. This function will not properly handle UTF-8 encoded files.

WScript.Echo(CStr(IsText("foo.xls")))
WScript.Echo(CStr(IsBinary("foo.xls")))

Function IsBinary(strCheckFileName)
    IsBinary = Not IsText(strCheckFileName)
End Function

Function IsText(strCheckFileName)
    ' numBytes and code are used instead of len and char so that the
    ' built-in Len function is not shadowed inside this function
    Dim fso, testFile, fileSpec, numBytes, i, buf, code, odd

    odd = 0

    Set fso = CreateObject("Scripting.FileSystemObject")
    On Error Resume Next
    ' 1 = ForReading, False = do not create the file, 0 = open as ASCII
    Set testFile = fso.OpenTextFile(strCheckFileName, 1, False, 0)

    If Err.Number <> 0 Then
        WScript.Echo "Unable to open file. Error: " & Err.Description
        Err.Clear
        IsText = False
        Exit Function
    End If

    Set fileSpec = fso.GetFile(strCheckFileName)
    numBytes = fileSpec.Size

    ' read a max of 512 bytes
    If numBytes > 512 Then
        numBytes = 512
    End If

    buf = testFile.Read(numBytes)

    If Err.Number <> 0 Then
        WScript.Echo "Unable to read file. Error: " & Err.Description
        Err.Clear
        testFile.Close
        IsText = False
        Exit Function
    End If

    ' the file is no longer needed; stop suppressing errors from here on
    testFile.Close
    On Error GoTo 0

    ' count only what was actually read
    numBytes = Len(buf)

    For i = 1 To numBytes
        code = Asc(Mid(buf, i, 1))

        If code = 0 Then
            ' text can't contain nulls
            odd = numBytes
            Exit For
        ElseIf code > 127 Then
            odd = odd + 1
        ElseIf code < 32 _
            And code <> 8 And code <> 9 And code <> 10 _
            And code <> 12 And code <> 13 And code <> 27 Then
            odd = odd + 1
        End If
    Next

    ' allow for up to 1/3 odd
    If (odd * 3) > numBytes Then
        IsText = False
    Else
        IsText = True
    End If

    Set fileSpec = Nothing
    Set fso = Nothing
End Function
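
The counting rule can also be tried on a string that is already in memory, which makes it easier to experiment with the 1/3 threshold without creating test files. The following is only a sketch and not part of the functions above: IsOddCode and LooksLikeText are hypothetical helper names introduced here, and the sketch assumes a single-byte ANSI code page so that Chr and Asc work with values from 0 to 255.

```vbscript
Option Explicit

' Hypothetical helper: True when a character code counts as binary-like
' under the rule described in this post.
Function IsOddCode(code)
    Select Case code
        Case 8, 9, 10, 12, 13, 27
            ' allowed control characters
            IsOddCode = False
        Case Else
            IsOddCode = (code < 32 Or code > 127)
    End Select
End Function

' Hypothetical helper: applies the null check and the 1/3 rule to a
' string that is already in memory.
Function LooksLikeText(buf)
    Dim i, code, odd, total

    total = Len(buf)
    odd = 0

    For i = 1 To total
        code = Asc(Mid(buf, i, 1))

        If code = 0 Then
            ' a null always means binary
            LooksLikeText = False
            Exit Function
        End If

        If IsOddCode(code) Then
            odd = odd + 1
        End If
    Next

    ' text as long as no more than 1/3 of the characters are odd
    LooksLikeText = (odd * 3 <= total)
End Function

' ordinary characters plus CR and LF: nothing odd, so text
WScript.Echo CStr(LooksLikeText("Hello, world" & vbCrLf))

' a single null is enough to flag binary
WScript.Echo CStr(LooksLikeText("abc" & Chr(0) & "def"))

' 1 odd character out of 3 is still within the limit, so text
WScript.Echo CStr(LooksLikeText(Chr(1) & "ab"))

' 2 odd characters out of 3 exceeds the limit, so binary
WScript.Echo CStr(LooksLikeText(Chr(1) & Chr(2) & "a"))

' the bytes C3 A9 74 C3 A9 (a short accented word in UTF-8): 4 of the
' 5 bytes are above 127, so it is flagged binary
WScript.Echo CStr(LooksLikeText(Chr(195) & Chr(169) & "t" & Chr(195) & Chr(169)))
```

The last call shows one way the UTF-8 limitation can show up: every non-ASCII character is stored as two or more bytes above 127, so text that is mostly non-ASCII can exceed the 1/3 limit even though it is perfectly readable.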
