TextStream, OpenText, and reading UTF-8 in Excel

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

Greetings,

I've got an external device that Emails me .csv files with UTF-8 encoding.
The file contains numbers which may have leading zeroes and strings which may contain special characters.

Reading them seems to be problematic. What I get depends on how I read them.
For example, with Notepad or Word I get "Don’t change me!", if I read it as a TextStream with TextFileRead, I get "Don’t change me!", If I read it into a new sheet with OpenText, I get "Donât change me!"

The first line of the file defines the separator. This makes it difficult to use OpenText without opening the file twice. In any case, OpenText seems to have its own problems.

Code: Select all

Sub getData()

Dim vColumns(1 To 7) As Variant
Dim iOrigin As Integer
Dim iIndex As Integer

    iOrigin = 65001         ' MS codepage for UTF-8
    For iIndex = 1 To 7
        vColumns(iIndex) = Array(iIndex, xlTextFormat)
    Next iIndex

    Workbooks.OpenText Filename:= _
            ".\TestCases\TC7c1.csv", _
            Origin:=65001, DataType:=xlDelimited, Semicolon:=True, _
            TextVisualLayout:=vColumns
End Sub ' getData
causes the sub to get a runtime error. Sometimes Excel falls over!
The created worksheet does not use the TextVisualLayout info as documented at http://msdn.microsoft.com/en-us/library ... 11%29.aspx" onclick="window.open(this.href);return false;. The first line in the file is ignored. Dates and times are converted, numbers lose leading zeroes, etc.

Short of building my own UTF-8 datastream decoder, is there a quick fix?

TIA for any insight
ZK

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

How about

Code: Select all

Sub getData()
    Workbooks.OpenText Filename:=".\TestCases\TC7c1.csv", _
        Origin:=65001, StartRow:=2, DataType:=xlDelimited, _
        Semicolon:=True, FieldInfo:=Array(Array(1, xlTextFormat), _
        Array(1, xlTextFormat), Array(1, xlTextFormat), _
        Array(1, xlTextFormat), Array(1, xlTextFormat), _
        Array(1, xlTextFormat), Array(1, xlTextFormat))
End Sub ' getData
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

Thanks Hans, but the results are the same.
I've hung a little test file here, in case you want to play with it some.
ZK
You do not have the required permissions to view the files attached to this post.

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

Change the first line of the text file to

content-type:application/csv;charset=UTF-8

If necessary, you can do this programmatically. Code similar to what I posted should work then.
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

Almost, but not quite...
The extended characters are read OK, but the TextVisualLayout array still seems to be dysfunctional.
The first column comes in as Text, which is OK, cols 2-6 are Standard, and col 7 is Custom ("hh:mm").
My leading zeroes are still AWOL. Leading blanks come out OK.
I can't tell if I'm winning! :scratch:
ZK

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

My code was rather sloppy, I put it together too hastily. If you want all columns as text:

Code: Select all

Sub getData()
    Workbooks.OpenText Filename:="TCUTF_8.csv", _
        Origin:=65001, StartRow:=2, DataType:=xlDelimited, _
        Semicolon:=True, FieldInfo:=Array(Array(1, xlTextFormat), _
        Array(2, xlTextFormat), Array(3, xlTextFormat), _
        Array(4, xlTextFormat), Array(5, xlTextFormat), _
        Array(6, xlTextFormat), Array(7, xlTextFormat))
End Sub ' getData
(I had forgotten to change 1 to 2, 3, ..., 7)
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

In that case, the original loop also works (since I don't know in advance how many columns there will be.
I noticed in my original code I usedTextVisualLayout instead of FieldInfo, which surely didn't help.

I'll admit that I don't understand why it works, but I'll take it!

Thanks
:cheers:
ZK

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

TextVisualLayout is a mysterious argument - it's intended to specify left-to-right (Western) or right-to-left (Hebrew, Arabic) text, but at least in Excel 2002 it didn't work - see The TextVisualLayout Parameter Is Not Available for the OpenText Method in Excel 2002.
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

Final outcome: writing the UTF-8 decoder was easier than fiddling around with editing the files, so that's what I did.
I'll shove it in my toolbox until I need it again.
Thanks for the help; at least I learned something.
ZK

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

Thanks for the update.
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

...and at the end of the day, it turns out that the documentation was wrong!
They're really using ANSI 16-bit encoding.
At least now I can restore the code to improve the performance. :hairout:
ZK

User avatar
HansV
Administrator
Posts: 78488
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by HansV »

Don't you :love: documentation!
Best wishes,
Hans

Zauberkind
2StarLounger
Posts: 141
Joined: 21 Oct 2011, 10:08

Re: TextStream, OpenText, and reading UTF-8 in Excel

Post by Zauberkind »

You know what they say (or, at least said in my time), "that and 50 cents will buy you a 25 cent cup of coffee!"

:cheers: - who needs coffee?

ZK