Fixing Illegal Characters in Filenames

ChrisGreaves
PlutoniumLounger
Posts: 15498
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky


Post by ChrisGreaves »

This thread is a brief alert on bad filenames.
Please see also "Puzzle for a rainy day"

This Google Search returns a zillion threads telling me that illegal characters are illegal in Windows file names. (Thanks Guys!)
This thread seems to offer a Python solution, but setting aside time to learn and install Python is beyond me right now.
This thread suggests I educate Mac users, but I don’t fancy educating millions of YouTube folks who upload music tracks as MP4 files.
This thread tells me how to strip illegal characters, but that is not my problem.

My problem is RENAMEing the files once found.

The files are found quite easily; my VBA code points to a folder T:\Music\, grabs the 18,000 FullNames of MP3 files and stores them in an array.
Sadly, the SHELL function (FI.Path) or the DIR statement returns a full name in which the illegal characters have already been replaced (for example, Unicode ChrW(8208), a hyphen, is replaced by a legal hyphen, Chr(045)). Only when I go to make use of the full name am I told “File does not exist”.
I can loop through my string array(18000), test the existence of every file in it, and locate the 161 full names that fail. The trouble is that (a) I am not told which characters have been replaced (-, é, ö, and so on) and (b) I cannot issue a rename command (Name xxx AS yyy) because the original name with illegal characters is, well, illegal.
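Point (a) can at least be answered outside VBA. The Python sketch below rests on an assumption that the thread does not confirm: the damage happens when a Unicode name is converted to the Windows "ANSI" code page, taken here to be cp1252 (the usual Western European one). For every character in a name that cp1252 cannot hold, it reports the position, code point and official Unicode name. `ansi_offenders` is a hypothetical helper, not part of any library.

```python
import unicodedata


def ansi_offenders(name, codepage="cp1252"):
    """List the characters in `name` that `codepage` cannot represent.

    Each entry is (position, character, code point, Unicode name).
    The cp1252 default is an assumption about the machine's ANSI code page.
    """
    report = []
    for pos, ch in enumerate(name):
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            report.append((pos, ch, f"U+{ord(ch):04X}",
                           unicodedata.name(ch, "<unnamed>")))
    return report


if __name__ == "__main__":
    sample = "Bach \u2010 BWV 289 Wir bitten dich, du ewger Sohn.mp3"
    for entry in ansi_offenders(sample):
        print(entry)  # (5, '‐', 'U+2010', 'HYPHEN')
```

Precomposed é and ö both exist in cp1252, so this sketch would not flag them. If names containing those letters still fail, the cause lies elsewhere.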

I can accept that Windows designers could not have known that Apple was pursuing a different course, nor could they have known that Asia existed.

What I can’t accept is that Windows quite clearly maintains the fiction that these files exist (I can manually visit the folder, locate the file “Bach ‐ BWV 289 Wir bitten dich, du ewger Sohn.mp3” and manually F2-edit the file name to switch from the illegal hyphen (or whatever) to a legal hyphen) but there appears to be no mechanism provided to correct these names programmatically.
Doing it all manually in a pre-processor pass (LocateIllegallyNamedFiles) to my mind defeats the whole purpose of computers – that they are good at doing Boring And Repetitive Tasks.

Does anyone know of a package or technique that lets me obtain the real (and sometimes illegal) filename of a file, and then use that real, illegal name to refer to the file?

Cheers
Chris
Last edited by ChrisGreaves on 10 Apr 2021, 13:37, edited 1 time in total.
An expensive day out: Wallet and Grimace

JoeP
SilverLounger
Posts: 2051
Joined: 25 Jan 2010, 02:12

Re: Fixing Illegal Characters in Filenames

Post by JoeP »

You can probably do it with PowerShell. I don't know the correct syntax but I'm sure there are many examples if you search.
Joe

HansV
Administrator
Posts: 78236
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Fixing Illegal Characters in Filenames

Post by HansV »

See if this old discussion helps: Renaming long file name with illegal characters.
Suggestions found there:

- Use \\?\C:\Test instead of C:\Test
- Enclose the 'wrong' filename in quotes.
- And in a linked discussion: add the files to a 7-Zip archive, then extract them again.
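The first suggestion is sketched below in Python. It shows how the \\?\ prefix is built, including the \\?\UNC\ form used for network shares. The prefix tells the Win32 Unicode file functions to skip their usual path normalisation (which, among other things, strips trailing dots and spaces) and lifts the MAX_PATH limit. It does not restore characters that an ANSI caller has already lost, so whether it helps from VBA is still an open question. `to_extended_path` is a hypothetical helper.

```python
def to_extended_path(path):
    r"""Prefix an absolute Windows path with \\?\ (or \\?\UNC\ for shares).

    Assumes `path` is already absolute and uses backslashes, because the
    prefix switches off the normalisation that would otherwise fix it up.
    """
    if path.startswith("\\\\?\\"):
        return path                           # already in extended form
    if path.startswith("\\\\"):
        return "\\\\?\\UNC\\" + path[2:]      # \\server\share -> \\?\UNC\server\share
    return "\\\\?\\" + path                   # C:\Test -> \\?\C:\Test


if __name__ == "__main__":
    print(to_extended_path("C:\\Test"))       # \\?\C:\Test
```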
Best wishes,
Hans

StuartR
Administrator
Posts: 12577
Joined: 16 Jan 2010, 15:49
Location: London, Europe

Re: Fixing Illegal Characters in Filenames

Post by StuartR »

You can make changes to a file that has an illegal name by using its old DOS 8.3 format name.
Parse the output of DIR /X at a command prompt to find the 8.3 name to use.
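Parsing that output might look like the Python sketch below. The layout of DIR /X depends on locale (the date and time formats in particular), so the sketch ignores the leading columns. Instead it looks for the "~digits" that Windows puts in generated short names. The sample line is made up for illustration. Two caveats apply. First, a file has no short name at all if 8.3 name generation is disabled on its volume; `fsutil 8dot3name query` reports that setting. Second, if a line has an empty short-name column and the long name itself contains something like "~1", the sketch would misread it.

```python
import re

# A generated short name looks like BACHBW~1.MP3: some characters, "~", digits,
# an optional extension. Whatever follows it on the line is the long name.
SHORT_THEN_LONG = re.compile(r"\s(\S+~\d+(?:\.\S{1,3})?)\s+(.+)$")


def parse_dir_x(lines):
    """Yield (short_name, long_name) pairs from lines of `dir /x` output."""
    for line in lines:
        match = SHORT_THEN_LONG.search(line.rstrip("\r\n"))
        if match:
            yield match.group(1), match.group(2)


if __name__ == "__main__":
    sample = [
        " Directory of T:\\Music",
        "01/08/2020  03:03 PM         4,123,456 BACHBW~1.MP3 Bach \u2010 BWV 289.mp3",
    ]
    for short, long_name in parse_dir_x(sample):
        print(short, "->", long_name)
```

With the short name in hand, a VBA statement such as Name "T:\Music\BACHBW~1.MP3" As "T:\Music\Bach - BWV 289.mp3" is, in principle, the kind of rename Stuart describes.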
StuartR


jonwallace
5StarLounger
Posts: 1118
Joined: 26 Jan 2010, 11:32
Location: "What a mighty long bridge to such a mighty little old town"

Re: Fixing Illegal Characters in Filenames

Post by jonwallace »

Hi Chris
To take a lateral view, if you're renaming MP3s, then you should take a look at mp3tag. I've used it for years to rename the tracks I've ripped from my CD collection to a standardised format. The filename-from-tag function is particularly handy, as is the tag-from-filename function to go the other way. It does much more than this, of course (but not so much more that it has grown out of its socks), and I suggest that you check out the webpage. It is free, but you can donate if you find it useful and want to support the author.
John

“Always trust a microbiologist because they have the best chance of predicting when the world will end”
― Teddie O. Rahube

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

StuartR wrote:You can make changes to a file that has an illegal name by using the old DOS 8.3 format name
Parse the output of DIR /X at a command prompt to find the 8.3 name to use.
Stuart this is BRILLiant!
I have begun adding "DIR /s/w/x" as a pre-processor to my task and will report back with findings and code.
Cheers
Chris

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

jonwallace wrote:To take a lateral view, if you're renaming MP3s, then you should take a look at mp3tag. I've used it for years to rename the tracks I've ripped from my CD collection to a standardised format. The filename-from-tag function is particularly handy, as is the tag-from-filename function to go the other way. It does much more than this, of course (but not so much more that it has grown out of its socks), and I suggest that you check out the webpage. It is free, but you can donate if you find it useful and want to support the author.
Hi Jon, and thanks for the tip.
I shall take a look at the package.

The web site caused a few hairs to trigger on the back of my neck:
"Replace characters or words Replace strings in tags and filenames (with support for Regular Expressions)."
This sounds good on the surface but my experience with Win7HP/Word2003/VBA tells me that renaming a file is good only if you can refer to the file.

Names with illegal characters are sanitised by Win/Word, so by the time I have built an array of filenames and started working through them, a test "blnFileExists" will return FALSE (or a program will fail). The reason is that the file on the hard drive still has an illegal character, and the sanitised name does not refer to an existing file.
Or worse: the sanitised file name refers to a different file that you have been hanging on to for twenty years ...
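That last risk can be checked before anything is renamed. The Python sketch below groups real names by the sanitised form a caller would see, then reports every sanitised form that two or more real files share. `BEST_FIT` is a small hypothetical stand-in for whatever substitution Windows actually performs. Chris observed U+2010 coming back as an ordinary hyphen, but the real table is much larger, so the output should be read as a lower bound.

```python
from collections import defaultdict

# Hypothetical stand-in for Windows' Unicode-to-ANSI "best fit" substitution.
BEST_FIT = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u0106": "C",  # C with acute
    "\u0107": "c",  # c with acute
}


def sanitised(name, table=BEST_FIT):
    """Return the name as a lossy caller might see it."""
    return "".join(table.get(ch, ch) for ch in name)


def find_collisions(names, table=BEST_FIT):
    """Map each sanitised name shared by several real names to those names."""
    groups = defaultdict(list)
    for name in names:
        groups[sanitised(name, table)].append(name)
    return {clean: real for clean, real in groups.items() if len(real) > 1}


if __name__ == "__main__":
    folder = ["Bach \u2010 BWV 289.mp3", "Bach - BWV 289.mp3", "Liszt.mp3"]
    for clean, real in find_collisions(folder).items():
        print(clean, "<-", real)
```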
Cheers
Chris

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

ChrisGreaves wrote:
08 Jan 2020, 15:03
... My problem is RENAMEing the files once found.
Well, it's been a year ...
This morning I finally got rid of all illegal characters in filenames in my 18,000 track T:\Music\ folder.
I found no easy way to automate the process, and I grew resigned to this because: since one part of my computer (hardware plus operating system plus VBA machine) allows storage of files with illegal names, and another part of the same computer refuses to let me address the files programmatically, there is not likely to be a way in which I can fully automate the complete process of detection and correction.

I thought that DIR /X was a way around the problem, but it did not work, and I have forgotten why I dropped the idea.

In the end I assembled the 18,000 names in an array and tested each name using the ancient Function blnFileExists (attached), then built a two-column array of Full Names. The left hand column holds the illegal names. The right-hand column holds my best attempt at an automated renaming.

The automated renaming is based on a mapping of characters ("Function LoadMappingArray" attached).
I manually copy a left-hand cell to the clipboard and paste that into Everything. This locates ALL copies of this illegally named file.
I then Copy the right-hand cell filename into the clipboard and in Everything <F2 edit> and Paste the correct name over the illegal name.
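Chris's attachment is not visible here, so what follows is only a Python sketch of the same idea. A character map is applied to the last component of each full name, producing two columns: the illegal name and the proposed name. The entries in `CHAR_MAP` are illustrative guesses, not the contents of LoadMappingArray. The sketch uses `ntpath` so that Windows-style paths split correctly on any platform.

```python
import ntpath

# Illustrative entries only; extend as new offenders turn up.
CHAR_MAP = {
    "\u2010": "-",  # HYPHEN -> hyphen-minus (Chr 45)
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2212": "-",  # MINUS SIGN
    "\u0106": "C",  # C with acute
    "\u0107": "c",  # c with acute
}


def propose_renames(full_names, char_map=CHAR_MAP):
    """Return (illegal_full_name, proposed_full_name) rows.

    Only the final path component is changed. If a parent folder also needs
    fixing, rename the files inside it first and the folder last, so that
    earlier rows still point at real paths.
    """
    rows = []
    for full in full_names:
        folder, base = ntpath.split(full)
        fixed = "".join(char_map.get(ch, ch) for ch in base)
        if fixed != base:
            rows.append((full, ntpath.join(folder, fixed)))
    return rows


if __name__ == "__main__":
    for illegal, proposed in propose_renames(["T:\\Music\\Bach \u2010 BWV 289.mp3"]):
        print(illegal, "|", proposed)
```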

You might think that the list in Everything shrinks to null as the set of the illegally-named files is reduced, but Everything seems to do its own mapping. For example the 8208-hyphen is illegal, but shows up in Everything (thank heavens!), but when I replace an 8208-hyphen with a 045-hyphen, Everything shrugs and says "It's the same thing"!

Repeated runs of the search for illegal file names (each run across 18,000 files takes about one minute) bring up illegal names that have not been sufficiently cleansed, and in this way I painfully add yet-another-symbol translation to my Function LoadMappingArray. But at least that is one more character that should not plague me again.
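Rather than discovering the next missing symbol one run at a time, each run could print every character that is still neither mapped nor representable, most frequent first. That would make the whole backlog for LoadMappingArray visible at once. The Python sketch below again assumes cp1252 as the ANSI code page, and `unmapped_tally` is hypothetical.

```python
import unicodedata
from collections import Counter


def unmapped_tally(names, char_map, codepage="cp1252"):
    """Count characters that `char_map` does not cover and `codepage` cannot hold."""
    tally = Counter()
    for name in names:
        for ch in name:
            if ch in char_map:
                continue
            try:
                ch.encode(codepage)
            except UnicodeEncodeError:
                tally[ch] += 1
    return tally


def print_backlog(tally):
    """Print one line per unmapped character, most frequent first."""
    for ch, count in tally.most_common():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}: {count}")


if __name__ == "__main__":
    names = ["Bach \u2010 BWV 289.mp3", "Iv\u0107i\u0107.mp3", "\u266b Intro.mp3"]
    print_backlog(unmapped_tally(names, {"\u2010": "-"}))
```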

Renaming files from the two-column array is a good exercise while waiting for a meeting in the cafe. I can rename manually at the rate of about 60 per hour.

There is much more to it: for example, there are valid reasons for "illegal" file names elsewhere on my hard drive, just not in my T:\Music\ or T:\Images\ folders.

I plan to run this utility on a monthly basis.

Thanks again to all those who provided avenues of research.
Cheers
Chris

DaveA
GoldLounger
Posts: 2599
Joined: 24 Jan 2010, 15:26
Location: Olympia, WA

Re: Fixing Illegal Characters in Filenames

Post by DaveA »

What names are you having that are "illegal"?
How are they being saved to your hard drive to start with?
I am so far behind, I think I am First :evilgrin:
Genealogy....confusing the dead and annoying the living

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

DaveA wrote:
04 Feb 2021, 16:35
What names are you having that are "illegal"? How are they being saved to your hard drive to start with?
Hi Dave.
The illegal file names for MP3 files commonly arise from TubeMate downloads on my Android smartphone. Here is an example.
I attend a free lunchtime concert (at least I used to in Toronto!)
One of the performed pieces pleases me greatly.
On my smartphone (Android) I use TubeMate to locate and then download a performance of the same piece.
The file comes down as an MP4 video from YouTube.

I then drag the folder "Video" with its files from the smartphone to my Windows Laptop.
No problem is reported by Windows at this stage, and later investigation shows that the original MP4 file with Windows-illegal characters has been copied (by the Explorer drag mechanism) onto the laptop with the illegal character(s) still in place.

On the laptop I use "Free M4a to MP3 Converter" to convert the MP4 video files on the laptop to MP3 files on the laptop.
The MP3 tracks maintain the illegal characters.

I now have what I call "illegal file names" on my laptop.
No problems as yet.

I can play these files in WinAmp, Media Player, what have you. The files are backed up by RoboCopy nightly to one USB external hard drive, and weekly to another.
No problems as yet.

I have managed to drag the files (Mouse in Explorer), process the files with two 3rd-party applications, and make copies with Windows RoboCopy.
Win7 and Win10, since you ask.
No problems as yet.

But when I use VBA in Word2003 to process a folder that contains illegal file names, things go wrong. I can use VBA to build a string array of file names (complete with illegal characters), but when I try to use such a filename in VBA (Rename, Kill or Open), I get an error telling me that the file does not exist. I think, too, that the files are declared not to exist when I try to pass them to WinAmp via a SHELL from VBA.

I call these illegal filenames because they raise an error when I try to use them.

And yet in the one VBA program, Windows/VBA was happy to deliver these illegal names to me, it just won't let me use what it has given to me!
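A commonly given explanation, not confirmed anywhere in this thread, is that classic VBA's file statements (Dir, Name, Kill, Open, FileLen) pass names through the system ANSI code page. Explorer, RoboCopy and most modern programs use the Unicode file API instead. The Python sketch below simulates that round trip with cp1252, which is an assumption about the code page, and shows why a file that really exists is then reported missing. Where Python's codec substitutes "?", Windows might use a "best fit" look-alike, so the exact lossy name may differ.

```python
import os
import tempfile


def ansi_round_trip(name, codepage="cp1252"):
    """Simulate passing a Unicode name through an ANSI code page and back."""
    return name.encode(codepage, errors="replace").decode(codepage)


def demo(real_name):
    """Create a file under its real name, then look for the lossy name."""
    with tempfile.TemporaryDirectory() as folder:
        open(os.path.join(folder, real_name), "w").close()  # the file exists
        seen = ansi_round_trip(real_name)                    # what an ANSI caller gets
        return seen, os.path.exists(os.path.join(folder, seen))


if __name__ == "__main__":
    print(demo("Bach \u2010 BWV 289.mp3"))  # ('Bach ? BWV 289.mp3', False)
```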

Close inspection shows that the original YouTube MP4 files were uploaded with (usually) European diacritical marks or Asiatic character sets.

On your smart phone try TubeMate and search for "la Campanella".
[Attached image: Untitled2.png]
Here is a clumsy montage of three screenshots; the third shows the piece sitting in my B: drive on the laptop with Asiatic characters in the file name. When I try to process this file in VBA I will be told that the file does not exist.
Once I delete the foreign characters from the file name, all runs well.
A common problem with Bach pieces is the use of characters from the German character set, and the use of code 8208 (I think) where a regular hyphen (045) would suffice.
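The German letters themselves (ä, ö, ü, ß) all exist in cp1252 as precomposed characters. So if names containing them still fail, one possible cause (only a guess here) is that the accents arrive decomposed: a plain letter followed by a separate combining mark. That is the form Apple's older HFS+ file system stores, and the combining mark has no cp1252 equivalent. The Python sketch below recomposes a name with Unicode NFC normalisation and reports whether the result then fits the code page.

```python
import unicodedata


def fits_codepage(text, codepage="cp1252"):
    """True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def recompose(name, codepage="cp1252"):
    """Return the NFC form of `name` and whether that form fits `codepage`."""
    composed = unicodedata.normalize("NFC", name)
    return composed, fits_codepage(composed, codepage)


if __name__ == "__main__":
    decomposed = "Schu\u0308bler Chora\u0308le.mp3"  # letters + COMBINING DIAERESIS
    print(fits_codepage(decomposed))                # False
    print(recompose(decomposed))                    # ('Schübler Choräle.mp3', True)
```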

If you hurry over to http://chrisgreaves.com/Downloads/IllegalFileNames.doc you can download a document (which is in draft) describing the situation in more detail.

Cheers
Chris

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

DaveA wrote:
04 Feb 2021, 16:35
What names are you having that are "illegal"? How are they being saved to your hard drive to start with?
Hi Dave. As I expand my search folder-tree away from T:\Music I find more illegal file names.
[Attached image: Untitled.png]
In this case the culprits are the email folders (and the saved full names) of messages that arrived via Gmail. These messages were saved directly from Firefox, because at that time I had not taken up Thunderbird again.
I know that my sister used an iPad, so I suspect that her husband too had such a device. These are emails from Geoff, not from my sister.
My guess is that the illegal characters in the subject lines are "Christmas tree" and "laughing" emoji.
Any lounger with both an iPad and a Windows machine could verify my suppositions.
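Most emoji lie outside Unicode's Basic Multilingual Plane. Examples are CHRISTMAS TREE (U+1F384) and FACE WITH TEARS OF JOY (U+1F602). Windows stores each such character in a file name as a pair of UTF-16 code units, and no ANSI code page contains them. That those are the characters in Geoff's subject lines is Chris's guess; the short Python sketch below would confirm or refute it for a given name. Some emoji-like symbols, such as U+263A WHITE SMILING FACE, sit inside the BMP and would not be flagged here, although cp1252 lacks them too.

```python
def astral_chars(name):
    """Code points beyond U+FFFF (where most emoji live)."""
    return [f"U+{ord(ch):X}" for ch in name if ord(ch) > 0xFFFF]


def utf16_units(name):
    """Length in UTF-16 code units, the unit Windows counts in file names."""
    return len(name.encode("utf-16-le")) // 2


if __name__ == "__main__":
    subject = "Merry Christmas \U0001F384\U0001F602.eml"
    print(astral_chars(subject))              # ['U+1F384', 'U+1F602']
    print(len(subject), utf16_units(subject))
```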

If this is so, then files stored with illegal characters are not a whimsical byproduct of my sitting in free lunchtime concerts in Toronto (grin), but a common occurrence for people who save emails that originate on iPads.

Cheers
Chris

DaveA
GoldLounger

Re: Fixing Illegal Characters in Filenames

Post by DaveA »

Thanks, I am sure others out there are having this same issue.
I do not use any music or videos, so I have not run into this issue.

Since you have isolated this issue to an iPad, I wonder whether these are legal characters on a Mac?

Any Mac users out there?

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

DaveA wrote:
09 Feb 2021, 17:00
Thanks, I am sure others out there are having this same issue.
Dave, I wrote the "illegal filenames" checker as a by-product of checking for duplicated MP3 tracks in the folder-tree T:\Music\. The duplicate-checker would amass a string array of about 18,000 MP3 files from T:\Music\ and almost immediately report "cannot find this file". What? "I built this array not two seconds ago and already 200 files are missing?!!???"

I reran the "illegal filenames" checker two nights ago on the contents of my data drive T:\ (313,324 files in 31,404 folders), and it turned up 403 illegal filenames (about 0.129%).
It was from this list that I spotted the emails which I knew came from someone who (once) had access to an iPad.

However, some of the 403 illegal filenames are Windows Shortcuts and other garden-variety MSWindows files, so at a guess about thirteen of every ten thousand files on your system could well be "illegal filenames" in the sense that if you build a string array of full names in VBA and then try to access each of the files, 13 out of 10,000 will be reported as AWOL.
I am not suggesting that you do this, but I would bet a coffee with cream and sugar on the outcome (grin)
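For anyone who does want to take the bet, the Python sketch below is a rough stand-in for such a scan. It is not Chris's checker. It walks a folder tree, assumes cp1252 as the ANSI code page, and counts files whose full names (folder part included, since Geoff's email folders were affected) the code page cannot represent. On Windows, Python's os functions use the Unicode API, so the names the scan sees are the real ones.

```python
import os


def fits_codepage(text, codepage="cp1252"):
    """True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def scan(root, codepage="cp1252"):
    """Return (number of files, full names that `codepage` cannot represent)."""
    total, suspects = 0, []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            total += 1
            full = os.path.join(folder, name)
            if not fits_codepage(full, codepage):
                suspects.append(full)
    return total, suspects


if __name__ == "__main__":
    total, suspects = scan("T:\\")
    if total:
        print(f"{len(suspects)} of {total} files ({len(suspects) / total:.4%})")
```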

I do not use any music or videos, so I have not run into this issue.
Please see above: I identified the problem while manipulating files with an MP3 extension, but have since found the problem in other types of files.

Since you have isolated this issue to an iPad, I wonder whether these are legal characters on a Mac?
Please see above: my original thoughts a year ago were "uploaded to YouTube by Apple-users", and then two days ago "also emailed to me by an iPad user"

I have attached a partial file of illegal file names, some of which may well exist (as file names!) on your system.

If you (or anyone) wants a copy of the illegal-filenames checker I can make it available.
Cheers
Chris

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

ChrisGreaves wrote:
08 Jan 2020, 15:03
... My problem is RENAMEing the files once found....
A year on, I am now detecting illegal file names in applications that use my FileR application. FileR is a general-purpose application for developers that produces a string array of file names according to various sampling criteria. This morning it dawned on me (finally!) that illegal file names should be detected as far upstream as possible, so today I am working in FileR.
FileR throws up a GUI form, the user sets various criteria, and FileR gets to work, delivering an array of filenames to the developer's application which is being used by the user.
[Attached image: Illeg003.png]
This image is of my Debug Window. I debug.print the strFilename, and then I debug.print the UCASE(strFilename).
Just the one illegal character, right?
[Attached image: Illeg004.png]
This image is from Windows File Explorer.
Two illegal characters, right?

It seems to me that VBA is doing its bit to confuse the issue and hiding some aspects of the characters that make up file names.

This file is of particular local interest. Last night I received an email from a member of The Bedford Trio with two links to "https://drive.google.com/drive/folders". I, of course, quickly asked Google Drive to download the 24 MP4 videos, used "Free M4a to MP3 Converter" to convert to 24 MP3 tracks - and here I am.

The files are from Canada, from a Canadian professional musician who knows her stuff. I must assume that the names are typed correctly. These files were NOT uploaded by the traditional teenager somewhere else looking to gain brownie points by volume uploads.

When I wrote FileR I would not have believed how many false paths there are in processing files. And I am still only on audio files (MP3).

Cheers
Chris

HansV
Administrator

Re: Fixing Illegal Characters in Filenames

Post by HansV »

Recent versions of Windows support Unicode, so I wouldn't think ć and š are "illegal characters".
Best wishes,
Hans

ChrisGreaves
PlutoniumLounger

Re: Fixing Illegal Characters in Filenames

Post by ChrisGreaves »

HansV wrote:
03 Mar 2021, 12:50
Recent versions of Windows support Unicode, so I wouldn't think ć and š are "illegal characters".
Hi Hans. I would think so too, but I have not yet worked out why one of the characters drops out when UCase is applied in VBA, or why the file is reported as not existing when tested with FileLen (again, in VBA).
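NTFS itself stores both letters without complaint. One plausible explanation for the difference rests on two assumptions: that VBA and its Immediate window work through the ANSI code page, and that Chris's machine uses cp1252, as most Western European and North American systems do. On that basis, cp1252 happens to include š but not ć, whereas a Central European system using cp1250 has both. The Python sketch below shows, for each character, which of the two code pages can hold it.

```python
def codepage_verdicts(text, codepages=("cp1252", "cp1250")):
    """For each character, list the code pages that can represent it."""
    rows = []
    for ch in text:
        holders = []
        for cp in codepages:
            try:
                ch.encode(cp)
                holders.append(cp)
            except UnicodeEncodeError:
                pass
        rows.append((ch, f"U+{ord(ch):04X}", holders))
    return rows


if __name__ == "__main__":
    for row in codepage_verdicts("\u0107\u0161"):  # c-acute, s-caron
        print(row)
```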

Of course, I am still using Office2003/VBA ... but then this problem must have been around almost twenty years ago, No?

My main message here is that there are a great many Traps For Young Players in this game.

Cheers
Chris