Thursday, February 09, 2006
IsText and IsBinary Functions for VBScript
The following functions determine whether a file is text or binary, returning True or False. The result comes from reading up to the first 512 bytes of the file: if no more than one third of those characters look binary, the file is flagged as text. The algorithm comes from Perl's implementation of this test. A null character always indicates a binary file. Any character outside the ASCII range 32-127, except 8 (backspace), 9 (tab), 10 (lf), 12 (form feed), 13 (cr), and 27 (escape), is considered binary-like. These functions do not handle UTF-8 encoded files properly, because every byte of a multi-byte UTF-8 sequence is above 127 and is counted as binary-like.
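To see the one-third rule in isolation, here is a minimal sketch that applies it to a string already in memory rather than to a file. The name LooksLikeText is hypothetical and used only for illustration; the sketch also assumes a single-byte code page, so that Asc returns values from 0 to 255.

```vbscript
' Hypothetical helper (not one of the functions in this post): applies the
' same classification rule to an in-memory string.
Function LooksLikeText(buf)
    Dim i, code, odd, n
    n = Len(buf)
    odd = 0
    For i = 1 To n
        code = Asc(Mid(buf, i, 1))
        If code = 0 Then
            ' a single null is enough to call the data binary
            LooksLikeText = False
            Exit Function
        ElseIf code > 127 Then
            odd = odd + 1
        ElseIf code < 32 Then
            Select Case code
                Case 8, 9, 10, 12, 13, 27
                    ' allowed control characters, not counted
                Case Else
                    odd = odd + 1
            End Select
        End If
    Next
    ' text when at most one third of the characters are odd
    LooksLikeText = (odd * 3 <= n)
End Function

WScript.Echo CStr(LooksLikeText("name" & vbTab & "value" & vbCrLf)) ' no odd characters: text
WScript.Echo CStr(LooksLikeText("ab" & Chr(1)))   ' 1 of 3 odd: still text
WScript.Echo CStr(LooksLikeText("a" & Chr(1)))    ' 1 of 2 odd: binary
WScript.Echo CStr(LooksLikeText("abc" & Chr(0)))  ' contains a null: binary
```

An empty string has no odd characters, so this sketch reports it as text.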
WScript.Echo(CStr(IsText("foo.xls")))
WScript.Echo(CStr(IsBinary("foo.xls")))
Function IsBinary(strCheckFileName)
    IsBinary = Not IsText(strCheckFileName)
End Function
Function IsText(strCheckFileName)
    Dim fso, testFile, fileSpec, readLen, i, buf, char, odd
    odd = 0
    Set fso = CreateObject("Scripting.FileSystemObject")
    On Error Resume Next
    ' 1 = ForReading, False = do not create the file, 0 = open as ASCII
    Set testFile = fso.OpenTextFile(strCheckFileName, 1, False, 0)
    If Err.Number <> 0 Then
        WScript.Echo "Unable to open file. Error: " & Err.Description
        Err.Clear
        IsText = False
        Exit Function
    End If
    Set fileSpec = fso.GetFile(strCheckFileName)
    ' named readLen so the built-in Len function is not shadowed
    readLen = fileSpec.Size
    ' read a max of 512 bytes
    If (readLen > 512) Then
        readLen = 512
    End If
    buf = testFile.Read(readLen)
    If Err.Number <> 0 Then
        WScript.Echo "Unable to read file. Error: " & Err.Description
        Err.Clear
        ' close the file before bailing out
        testFile.Close
        IsText = False
        Exit Function
    End If
    For i = 1 To readLen
        char = Asc(Mid(buf, i, 1))
        If char = 0 Then
            ' text can't contain nulls
            odd = readLen
            Exit For
        ElseIf char > 127 Then
            odd = odd + 1
        ElseIf char < 32 _
            And char <> 8 And char <> 9 And char <> 10 _
            And char <> 12 And char <> 13 And char <> 27 Then
            odd = odd + 1
        End If
    Next
    ' allow for up to 1/3 odd
    If (odd * 3) > readLen Then
        IsText = False
    Else
        IsText = True
    End If
    testFile.Close
    Set fso = Nothing
End Function
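The step that reads at most 512 bytes can also be looked at on its own. The sketch below is one possible way to isolate it; ReadHead is a hypothetical name, and the code assumes that TextStream.Read returns fewer characters than requested when the file is shorter, which is how it appears to behave but is not something this post verifies. It checks AtEndOfStream first so that an empty file yields an empty string instead of a read attempt at end of file.

```vbscript
' Hypothetical helper: returns up to maxBytes characters from the start of a file.
' Errors (missing file, access denied) are not trapped here and reach the caller.
Function ReadHead(path, maxBytes)
    Dim headFso, stream
    Set headFso = CreateObject("Scripting.FileSystemObject")
    ' 1 = ForReading, False = do not create the file, 0 = open as ASCII
    Set stream = headFso.OpenTextFile(path, 1, False, 0)
    If stream.AtEndOfStream Then
        ' empty file: nothing to read
        ReadHead = ""
    Else
        ' assumption: Read stops at end of file if fewer than maxBytes remain
        ReadHead = stream.Read(maxBytes)
    End If
    stream.Close
End Function

' Usage: read the head of the running script itself and report how much came back.
WScript.Echo "Read " & Len(ReadHead(WScript.ScriptFullName, 512)) & " characters"
```

Using the length of the returned string, rather than the file size, as the loop bound keeps the count of examined characters in step with what was actually read.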