Convert text file to excel in tabular format

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Convert text file to excel in tabular format

Post by YasserKhalil »

Hello everyone
How can I be able to convert a text file to excel as tabular format ..? I tried to open with excel directly but the output is not as expected
I will attach sample text file
You do not have the required permissions to view the files attached to this post.

User avatar
HansV
Administrator
Posts: 78236
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Convert text file to excel in tabular format

Post by HansV »

That is a totally unsuitable format. Try to get the data in a more usable format.
Best wishes,
Hans

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

This is in fact a pdf that I need to convert to excel ... I will attach the PDF file
I have tried so many applications but none is perfect. I could convert it by Able2Extract Professional and it is OK except one point which is that I am sure there is hidden content in some columns that the converter doesn't display
You do not have the required permissions to view the files attached to this post.

User avatar
HansV
Administrator
Posts: 78236
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Convert text file to excel in tabular format

Post by HansV »

Try opening the PDF file in Word, that should produce a better result. You'll still have to correct things manually.
Best wishes,
Hans

User avatar
StuartR
Administrator
Posts: 12577
Joined: 16 Jan 2010, 15:49
Location: London, Europe

Re: Convert text file to excel in tabular format

Post by StuartR »

I opened your PDF file in Foxit PhantomPDF and saved it as an Excel file. It has done a reasonable job, but some rows have been combined with in cell newlines.
You do not have the required permissions to view the files attached to this post.
StuartR


YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

Thanks a lot. The problem is not in conversion in fact
I can get a suitable output using Able2Extract like the attachment but the problem with the hidden content
For example: the content in the first row the third column should be like that
RAP, ROCK, HIP HOP, POST-GRUNGE, EDM, POP, HARD ROCK, ELECTRONIC, PROGRESSIVE HOUSE, INDIETRONICA, METAL, SOUNDTRACK, PUNK, BROSTEP, HOUSE

What is displayed is only
RAP, ROCK, HIP HOP, POST-GRUNGE,
You do not have the required permissions to view the files attached to this post.

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

Can I get the text file attached in the first post to excel to have each content in a row regardless it is related or not ..?
If you opened the text file with Notepad++ you would see the contents well ..

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

After trying a lot of applications, I have found PDF Shaper but this application converts to Word document .. The result is satisfying in the word document. But I couldn't convert the word document into excel. I think the contents are stored in shapes in word and I have no great idea about how to deal with such contents in that case. Can you help me please?

This is my try till now (It seems those frames are not under control)

Code: Select all

Sub MyTest()
    Dim wrdApp As Object, wrdDoc As Object, blnStart As Boolean, strFile As String, strContent As String
    Dim cCtl As Word.ContentControl, i As Long
    
    strFile = ThisWorkbook.Path & "\oo.docx"
    On Error Resume Next
    Set wrdApp = GetObject(Class:="Word.Application")
    If wrdApp Is Nothing Then
        Set wrdApp = CreateObject(Class:="Word.Application")
        blnStart = True
    End If
    On Error GoTo ErrHandler
    Set wrdDoc = wrdApp.Documents.Open(strFile)
    For i = 1 To wrdDoc.Frames.Count
        Cells(i, 1).Value = wrdDoc.Frames.Item(i).Range.Text
    Next i
    Stop
    'strContent = wrdDoc.Content
    
    'Dim x
    'x = Split(strContent, vbCr)
    'Debug.Print strContent
ExitHandler:
    On Error Resume Next
    wrdDoc.Close SaveChanges:=False
    If blnStart Then wrdApp.Quit SaveChanges:=False
    Exit Sub
ErrHandler:
    MsgBox Err.Description, vbExclamation
    Resume ExitHandler
End Sub
Can I control so as to get the text of each frame row by row?
You do not have the required permissions to view the files attached to this post.

User avatar
Doc.AElstein
BronzeLounger
Posts: 1499
Joined: 28 Feb 2015, 13:11
Location: Hof, Bayern, Germany

Re: Convert text file to excel in tabular format

Post by Doc.AElstein »

Hello Yasser,
I am only considering your original text file in this post
YasserKhalil wrote:
23 Nov 2020, 16:51
Can I get the text file attached in the first post to excel to have each content in a row regardless it is related or not ..?
If you opened the text file with Notepad++ you would see the contents well ..
I am not sure exactly what you are asking for there, but you can certainly get something similar in Excel to what you see in the text file……. But you will have all the text in the first column…….

I cant solve your problem, but here is some info that might be useful for you to know, if you have not already considered the things I am talking about in this post….
I am only considering the contents of your text file in this post.

I examined your text file, tt.txt , in a similar way to which we have done many times in the past with your text files.
https://excelfox.com/forum/showthread.p ... #post15136
I have only looked initially at the first bit with a reduced size file, ttFirstBit.txt

I use a macro like I have done with you many times before:

Code: Select all

 '    Lets have a look at a bit of the text file
Sub LookInFirstBitOfTextString()
Dim FileNum As Long: Let FileNum = FreeFile(1)                                                                               ' https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/freefile-function
Dim PathAndFileName As String, TotalFile As String, FlNme As String
 Let FlNme = "ttFirstBit.txt"
 Let PathAndFileName = ThisWorkbook.Path & "\" & FlNme '
Open PathAndFileName For Binary As #FileNum 'Open Route to data. Binary is a fundemental type data input...
TotalFile = Space(LOF(FileNum)) '....and wot recives it hs to be a string of exactly the right length
Get #FileNum, , TotalFile
Close #FileNum
' What is in this string?
Call WtchaGot_Unic_NotMuchIfYaChoppedItOff(TotalFile, FlNme)
End Sub
The function you need, you have had from me a few times before, and its also in this file:
tt_ExtraminationsRock.xlsm : https://app.box.com/s/z3nr7ecnj540rond1437bo48wmaxsbch


You can see that your text file seems to have no recognisable column separator. There is just a lot of spaces.
A new line is represented by the conventional vbCr & vbLf.
https://excelfox.com/forum/showthread.p ... #post15137
So there is no problem in interpreting the new line. The problem is the columns: There is no obvious way to determine where they are.
( There does not appear to be any “hidden” characters in your text file, ( other than the vbCr & vbLf ) , at least not in the portion that I have examined***. )

If you import the text file in a conventional way , (http://www.eileenslounge.com/viewtopic. ... 67#p274367
http://www.eileenslounge.com/viewtopic. ... 70#p274370
http://www.eileenslounge.com/viewtopic. ... 21#p274721
https://excelfox.com/forum/showthread.p ... #post15138
) then everything will appear in the first column. You can see that in the worksheet named “ConventionalTextImport” in the file, tt_ExtraminationsRock.xlsm . That worksheet is looking similar to your original text file

If you follow the same approach and look at different portions of your text file, you might or might not find hidden characters or possibly extra unwanted characters for a new line.
*** I have not examined the entire text file, tt.txt



I can’t help any more at this stage , because I have not much experience looking at pdf files. But I think it is always good to know what is in any text files you are looking at.
Maybe that is of some small help.

I suppose the bottom line of what I am saying in this post is the same as Hans, that the format of the text file is pretty useless if you want to get that data in tabular form: There are no characters, hidden or otherwise, that you can find to use as the column separator.


Alan

_._______________________________________

Share ‘tt_ExtraminationsRock.xlsm’ : https://app.box.com/s/z3nr7ecnj540rond1437bo48wmaxsbch
Share ‘ttFirstBit.txt’ : https://app.box.com/s/zzeqis8qhdfbzj68fzyficdfszh2tjoo
You do not have the required permissions to view the files attached to this post.
I am having difficulty logging in with this account just now.
You can find me at DocAElstein also

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

Thank you very much Mr. Alan for your great support all the time.
Can you please have a look at the word output that has the contents in frames ..? Maybe you have a better idea ..

User avatar
Doc.AElstein
BronzeLounger
Posts: 1499
Joined: 28 Feb 2015, 13:11
Location: Hof, Bayern, Germany

Re: Convert text file to excel in tabular format

Post by Doc.AElstein »

I did have a quick look at your word file already, but I don’t see initially anything I can do with it.
It does not seem to be a simple word file. It has things in it which I am unfamiliar with.


I will try again when I have more time to look at the word doco in detail, but I do not think I can help further.
I expect your pdf file may be using some advanced image type stuff. I have encountered such problems before in my own work occasionally, when trying to put table data from pdf files into Excel, and so far I have always had to resort finally to making a lot of final manual adjustments. I could never completely automate the process for some pdf files.

I think that sometimes pdf files are in a format that can only be consistently converted to text or Excel using proprietary ( what you must pay for ) software. Sometimes the free software works, or comes close, but in my , ( limited ) , experience , not always.

If I have any more ideas I will post again , but for now I think I am out of ideas.

Alan

Edit P.S. I think in fact, my initial post has suggesd the data is more organised than it actually is. In fact the simple text import gives us about 150 rows. But you actually have a wanted final result of around approx 50 rows. So clearly we do have some rogue extra new lines as well as some mixed up order and randomly split over extra line data. So the text file is a real mess, and unlkely to be able to be able to be sorted out, other than humanly...
Last edited by Doc.AElstein on 24 Nov 2020, 16:02, edited 2 times in total.
I am having difficulty logging in with this account just now.
You can find me at DocAElstein also

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

Thanks a lot. I already tried the paid software for converting the PDF to excel and word with the assistance of one of my friends and none of those applications succeed with 100 % to get the desired output.

User avatar
Doc.AElstein
BronzeLounger
Posts: 1499
Joined: 28 Feb 2015, 13:11
Location: Hof, Bayern, Germany

Re: Convert text file to excel in tabular format

Post by Doc.AElstein »

Hello Yasser
I took another look at your text file data
( Once again I am only concerned with the Text file data. I don’t think I can help with anything to do with your Word file attempts)
The text file data is much more messed up than I initially thought.


But I do have a macro solution. But I think it’s a very bad one. At first glance it seems to get the correct results, or very close to the correct results, based on your test data.
But there may well be some errors in the results that I have not noticed, and with other data it might give extra errors
I am passing the code on to you , as I spent some time doing it. But, I don’t think it is a reliable or good solution. It is mostly using very simple coding techniques in a very long boring tedious way. I don’t think it is a reliable solution that I would want to use for anything important.
The only thing remotely “clever” in the macro is another use of that interesting “Index on a unjagged jagged 1Ds arrays technique” that we first noticed here:
https://eileenslounge.com/viewtopic.php ... 91#p266691
https://eileenslounge.com/viewtopic.php ... 27#p266727
https://www.ozgrid.com/forum/index.php? ... ost1239241

That novel technique does save a lot of looping. But the improved performance from that is wasted in this case because of all the other complicated Ifs and Looping

The output produced by the macro ( shown in worksheet “TextToTabular” ) of the uploaded file,
“tt_ExtraminationsRock.xls” ,
is very similar to the
“Sample.pdf”
But I can see some discrepancies in the column for Followers. This is because two numbers are completely missing from the text file ( 958 and 17145 ) –
Missing_958.jpg : https://i.imgur.com/q9fFtW0.jpg https://imgur.com/q9fFtW0
https://excelfox.com/forum/showthread.p ... #post15141
Its obviously totally impossible for any macro to extract certain data from a text file if that particular data is not contained anywhere in the text file in any form what so ever!!. Sometimes the macro I have written will deal with missing data correctly and leave an empty cell. Sometimes missing data will introduced errors.
I have indicated this problem in the results in worksheet “TextToTabular”
Initially I don’t see any other differences between the pdf file and what the macro produces, but I have not thoroughly checked every result in detail. Obviously I leave that work to you.
_.____________________

The macro, Sub TextFileToTabular() , is based approximately on the following assumptions/ criteria

_ (A) Mostly there are 6 element(column) entries per line, that is to say , mostly in the text file there are 6 element(column) entries which belong to a finally wanted row outoput.
_(A)(i) The macro will deal with some cases of Missing data for a row, but it wont deal with all such occurrences
_(B) The first 3 element(column) entries are separated by at least two spaces on a row in the text file that has at least those three entries.
_(B)(i) The macro will deal with some cases of Curator and Genres only being separated by one space, but it wont deal with all such occurrences
_(C) the element(column) entry for Best Way To Contact will always contain one of the following in the text string
@
https://www.instagram.com
https://www.facebook.com
http://reddit.com

_(D) the element(column) entry for Spotify Link will always contain the string
https://open.spotify.com/

Here is the macro , Sub TextFileToTabular() :
https://excelfox.com/forum/showthread.p ... #post15143
The macro is also in the file, “tt_ExtraminationsRock.xls”


Alan

_._____________________________________________________________

Share ‘tt_ExtraminationsRock.xls’ https://app.box.com/s/o5ka0fckmdp573tfyz9swwwir73hcnow
Last edited by Doc.AElstein on 26 Nov 2020, 15:07, edited 8 times in total.
I am having difficulty logging in with this account just now.
You can find me at DocAElstein also

YasserKhalil
PlatinumLounger
Posts: 4911
Joined: 31 Aug 2016, 09:02

Re: Convert text file to excel in tabular format

Post by YasserKhalil »

Thanks a lot Mr. Alan for your support all the time
Best and Kind Regards

User avatar
Doc.AElstein
BronzeLounger
Posts: 1499
Joined: 28 Feb 2015, 13:11
Location: Hof, Bayern, Germany

Re: Convert text file to excel in tabular format

Post by Doc.AElstein »

Hello
This is just a short follow up that might be useful for future reference.

I have used my “WotchaGot” function in this and some similar Threads recently. The main purpose of that function is to get a list , or some similar output to show clearly all the characters present in a particular text string, or text file.

That Function adds a worksheet or two, to the active workbook giving a detailed list of all characters present , and also one of the main outputs of the function is to produce a single text string, held in a variable “WotchaGot” which is in the form that you would use in coding to produce that String.

For example, if this was my actual text, as seen , for example in Notepad .._

Happy
Advent Sunday


_.. then the text inside Wotchagot, as seen in the Immediate Window, or pasted into a cell would have the code line form like

”Happy” & vbCr & vbLf & “Advent Sunday”

So far so good, and mostly that is useful.
You hit a couple of problems if you are wanting to look at the contents of a large text string or large Text file.

The first problem is that in a cell or the Immediate Window you hit a character limit.

To overcome that problem I have added a section '3c)(i) that puts that string into a text file, which , as far as I know has a much larger character limit than a cell or the Immediate Window
However that can, for a very large text string or text file, be a bit difficult to navigate through, since you end up with a few extremely long lines that you have to scroll left and right through to look at the string. It is difficult for example to relate the string to the original text file.
Here for example , is how the text file produced looks for the original text file example given at the start of this thread.
WotchaGot_in_tt.txt : https://app.box.com/s/3hqrkgity8945tx70izjhj9e6wpaewg7 , http://i.imgur.com/8MPUFm9.jpg
tt__txt.JPG
To make the string a bit more easy to navigate through, I have added another section , '3c)(ii)
This section adds a “real” , that is to say a “hidden” vbCr & vbLf pair after every text "vbCr & vbLf & "pair within the string produced by the function, WotchGot
A text file is then made of that modified string. In that text file it’s a bit easier to relate to the lines seen in the original file if viewed in Notepad or something similar, or in Excel
Here the text file output from section , '3c)(ii) for the same example file
WotchaGot_in_tt_inLines.txt : https://app.box.com/s/8impk67jxe9m1jxe61bv3a39sp3sqgzm , http://i.imgur.com/V0S3x6u.jpg
ttInLines__txt.JPG
I also added one last section, '3c(iii) , which adds a line number to the text as displayed in the text file output from the function.
If , for example, you were importing the original text file using VBA in the typical way, then those line number would relate to the element number of the 1 Dimensional array of rows, that you would typically make towards the start of the macro . ( That array would be the one usually made by splitting the complete original text string of the file by the typical line separator, vbCr & vbLf )
WotchaGot_in_tt_inNumberedLines.txt : https://app.box.com/s/smjurb3oqpv1vz9hy6cdoxldcg0lhs5r , http://i.imgur.com/1gxYpE7.jpg
ttInNumberedLines__txt.JPG
_._______________

One thing to remember is …._
WotchaGot_in_tt.txt represents the string accurately in a code form that you could use to copy and paste into the VB editor to make the string in a code line
The other two files have some extra characters in so as to make it easier for you to see and visualise in a text editor such as Notepad. You could still copy and paste parts of a line from the second two text file outputs, to use in VBA coding, but if you copy multiple lines from the text editor, then you will copy extra characters that don’t belong to the original text, ( Those are the invisible vbCr and vbLf and the added text for the line numbers) .




Alan
”Happy” & vbCr & vbLf & “Advent Sunday”
Or
”Happy” & vbCr & vbLf
“Advent Sunday”

Or
0 ”Happy” & vbCr & vbLf
1 “Advent Sunday”

Code: Select all

 '3c) Output  WotchaGot  string to a text file
'3c)(i) Simple string
Dim FileNum2 As Long: Let FileNum2 = FreeFile(0)                                  ' https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/freefile-function
Dim PathAndFileName2 As String
 Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WotchaGot_in_" & Replace(FlNme, ".txt", "", 1, 1, vbBinaryCompare) & ".txt" ' CHANGE path TO SUIT
 Open PathAndFileName2 For Output As #FileNum2 ' CHANGE TO SUIT  ' Will be made if not there
 Print #FileNum2, WotchaGot ' write out entire text file
 Close #FileNum2
'3c)(ii) Introduce an  "invisible"  vbCr & vbLf  pair after each  seen pair within  the string. this will give actual lines in the text file
 Let WotchaGot = Replace(WotchaGot, "vbCr & vbLf & ", "vbCr & vbLf" & vbCr & vbLf, 1, -1, vbBinaryCompare)
 Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WotchaGot_in_" & Replace(FlNme, ".txt", "", 1, 1, vbBinaryCompare) & "_inLines.txt" ' CHANGE path TO SUIT
 Open PathAndFileName2 For Output As #FileNum2 ' CHANGE TO SUIT  ' Will be made if not there
 Print #FileNum2, WotchaGot ' write out entire text file
 Close #FileNum2
 '3c(iii) Number the new introduced actual Liners in the text file
Dim arrIt() As String: Let arrIt() = Split(WotchaGot, vbCr & vbLf, -1, vbBinaryCompare)
 Let WotchaGot = ""
    For Cnt = 0 To UBound(arrIt())
     Let WotchaGot = WotchaGot & Cnt & " " & arrIt(Cnt) & vbCr & vbLf
    Next Cnt
 Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WotchaGot_in_" & Replace(FlNme, ".txt", "", 1, 1, vbBinaryCompare) & "_inNumberedLines.txt" ' CHANGE path TO SUIT
 Open PathAndFileName2 For Output As #FileNum2 ' CHANGE TO SUIT  ' Will be made if not there
 Print #FileNum2, WotchaGot ' write out entire text file
 Close #FileNum2

End Sub
_.______________
‘Function_WotchaGot_Code.txt’ : https://app.box.com/s/e8hi5k2bli361ytntmnj4jy3c0yn96m9
You do not have the required permissions to view the files attached to this post.
I am having difficulty logging in with this account just now.
You can find me at DocAElstein also