Rename pdf with internal content
-
- 4StarLounger
- Posts: 432
- Joined: 23 Mar 2017, 19:51
Rename pdf with internal content
Hello Friends, I have a PDF document which has a name such as 4437748. I need a script that will open the PDF file and rename it to C853, a value taken from inside the document. PS: I have 1000 PDF files, so this process needs to be done in batch. Is there some way?
-
- Panoramic Lounger
- Posts: 8544
- Joined: 25 Jan 2010, 09:09
- Location: retirement
-
- 4StarLounger
- Posts: 432
- Joined: 23 Mar 2017, 19:51
Re: Rename pdf with internal content
No, as I need to rename files based on internal content not just rename...
-
- Panoramic Lounger
- Posts: 8544
- Joined: 25 Jan 2010, 09:09
- Location: retirement
Re: Rename pdf with internal content
In which case, have you not tried a simple search on:
rename file based on file content
When I do that, the first hit is something from stackoverflow that might help you.
Ken
-
- 5StarLounger
- Posts: 666
- Joined: 27 Jun 2021, 10:46
Re: Rename pdf with internal content
>have you not tried a simple search
Thing is, this is not as simple as you seem to think. The OP is asking about the contents of a PDF, not a simple text file (for which the solution is indeed easy).
I've got an idea I want to test ...
<fx: later> Nah, doesn't work ...
-
- Lounger
- Posts: 43
- Joined: 07 Jun 2023, 15:34
Re: Rename pdf with internal content
I would try to extract the text from the PDF with some library, for example this:
http://www.xpdfreader.com/about.html (see pdftotext: converts PDF to text)
I made a test with another library called sejda-console:
https://github.com/torakiki/sejda/releases/tag/v2.10.4
Note that the latest versions are not open source.
Below is an example script in AutoIt. It is not finished, because sejda-console fails to extract text from your PDF due to some issue with fonts.
Code: Select all
#include <AutoItConstants.au3> ; defines $STDERR_CHILD and $STDOUT_CHILD
#include <File.au3>
#include <WinAPIFiles.au3>
Opt("MustDeclareVars", 1)
Opt("TrayIconDebug", 1)
;===== CONFIG =====
Global $iPID, $sOutput = ""
Global $sFileName, $sFileShort, $sFileNameTxt, $aFileList, $aFileList2, $aArray, $bFileOpen
Global $sFolder = @ScriptDir
$aFileList = _FileListToArray($sFolder, "*.pdf", 1, True)
If Not @error Then
    For $i = 1 To $aFileList[0]
        $sFileName = $aFileList[$i]
        ;Sejda doesn't like spaces in file names
        $sFileShort = FileGetShortName($sFileName)
        $iPID = Run(@ComSpec & " /C """ & @ScriptDir & "\sejda-console-2.10.4\bin\sejda-console.bat"" extracttext -f " & $sFileShort & " -o " & $sFolder & " -j overwrite", "", @SW_HIDE, $STDERR_CHILD + $STDOUT_CHILD)
        ProcessWaitClose($iPID)
        $sOutput = StdoutRead($iPID)
        ConsoleWrite($sOutput)
        ;===== TEXT FILE =====
        ;$sFileNameTxt = StringReplace($sFileShort, ".pdf", ".txt")
        $aFileList2 = _FileListToArray($sFolder, "*.txt", 1, True)
        If Not @error Then
            $sFileNameTxt = $aFileList2[1]
        Else
            MsgBox(16, "Error", "No txt files were found in the folder")
            Exit
        EndIf
        $aArray = FileReadToArray($sFileNameTxt)
        If @error Then
            MsgBox(16, "", "Error reading txt file. @error: " & @error) ; An error occurred reading the extracted txt file.
            Exit
        Else
            ;File must be closed before renaming
            $bFileOpen = _WinAPI_FileInUse($sFileName)
            If $bFileOpen = 0 Then
                ;helpfile: AutoIt does not have a 'FileRename' function as you can use FileMove function to rename a file using "Full_Path\Old_Name" and "Full_Path\New_Name" as the "source" and "dest" parameters.
                ;FileMove($sFileName, ...ToDo..., $i_Flag)
            Else
                ConsoleWrite("Locked file: " & $aFileList[$i] & @CRLF)
            EndIf
            ;FileRecycle($sFileNameTxt)
        EndIf
    Next
Else
    MsgBox(16, "Error", "No pdf files were found in the folder")
    Exit
EndIf
-
- Lounger
- Posts: 43
- Joined: 07 Jun 2023, 15:34
Re: Rename pdf with internal content
An example of VBA code to extract text from pdf using pdftotext.exe
tested, ok:
Code: Select all
Sub test_1_ExecAndCapture()
    Dim sFolder As String, sFile As String
    sFile = "4437748.pdf"
    sFolder = ThisWorkbook.Path & Application.PathSeparator
    Dim objShell As Object, objCmdExec As Object, cmd As String, CommandOutput As String
    'cmd = sFolder & "pdftotext.exe -layout " & Chr(34) & sFolder & sFile & Chr(34) & " " & Replace(sFile, ".pdf", ".txt")
    'the hyphen as the last parameter directs output to stdout, which we will capture
    cmd = Chr(34) & sFolder & "pdftotext.exe" & Chr(34) & " -layout " & Chr(34) & sFolder & sFile & Chr(34) & " -"
    Set objShell = CreateObject("WScript.Shell")
    Set objCmdExec = objShell.exec(cmd)
    CommandOutput = objCmdExec.StdOut.readAll
    Debug.Print CommandOutput
End Sub
Code: Select all
Sub test_2_ExecAndCapture()
    On Error GoTo ExceptionHandling
    Dim sFolder As String, sFile As String, tFile As String, cmd As String, strOutput As String
    sFolder = ThisWorkbook.Path & Application.PathSeparator
    sFile = "4437748.pdf"
    tFile = Replace(sFile, ".pdf", ".txt")
    'cmd = sFolder & "pdftotext.exe -layout " & sFolder & sFile & " - > " & sFolder & tFile
    'note: the paths are not quoted here, so this assumes sFolder contains no spaces
    cmd = sFolder & "pdftotext.exe -layout " & sFolder & sFile & " " & sFolder & tFile
    'http://www.vbforums.com/showthread.php?589966-VBA-Run-Application-Capture-Output&p=4891449&viewfull=1#post4891449
    Dim objShell As Object
    Set objShell = CreateObject("WScript.Shell")
    'Keywords: Wscript exec hidden, Shellscripting
    'https://stackoverflow.com/questions/32297699/hide-command-prompt-window-when-using-exec
    'You're always going to get a window flash with Exec().
    'You can use Run() instead to execute the command in a hidden window.
    'But you can't directly capture the command's output with Run().
    'You'd have to redirect the output to a temporary file that your VBScript could then open, read, and delete.
    'tFile = objShell.ExpandEnvironmentStrings("%Temp%") & "\t.txt"
    'Pass 0 as the second parameter to hide the window
    'objShell.Run "cmd.exe /c start /b tasklist.exe > " & tFile, 0, False
    objShell.Run "cmd.exe /c " & cmd, 0, False
    'https://stackoverflow.com/questions/10279404/vbscript-how-to-make-program-wait-until-process-has-finished#
    'wait until the text file appears, then give pdftotext a moment to finish writing it
    Do Until Not Dir(sFolder & tFile) = ""
        DoEvents
    Loop
    Application.Wait (Now + TimeValue("0:00:01"))
    With CreateObject("Scripting.FileSystemObject")
        strOutput = .openTextFile(sFolder & tFile).readAll()
        '.DeleteFile tFile
    End With
    Debug.Print strOutput
CleanUp:
    On Error Resume Next
    Exit Sub
ExceptionHandling:
    MsgBox "Error: " & Err.Description
    Resume CleanUp
    Resume 'for debugging
End Sub
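The two tests above handle a single file. For the batch of 1000 the same idea could be wrapped in a loop, roughly as sketched here. This is only a sketch under assumptions: PdfText, NameFromText and BatchRenameWithPdfToText are made-up names, and LINE_NO (which non-empty line of the pdftotext output holds the reference) is a guess that has to be checked against the Debug.Print output of test_1. NAME_LEN is 4 only because the example name C853 has four characters.

```vba
' Capture the text of one PDF via pdftotext.exe (same command as test_1).
Function PdfText(sFolder As String, sFile As String) As String
    Dim cmd As String
    cmd = Chr(34) & sFolder & "pdftotext.exe" & Chr(34) & " -layout " & _
          Chr(34) & sFolder & sFile & Chr(34) & " -"
    PdfText = CreateObject("WScript.Shell").Exec(cmd).StdOut.ReadAll
End Function

' Return the first nChars characters of the lineNo-th non-empty line of sText,
' or an empty string if there are fewer non-empty lines than that.
Function NameFromText(sText As String, lineNo As Long, nChars As Long) As String
    Dim v As Variant, n As Long, s As String
    For Each v In Split(Replace(sText, vbCr, ""), vbLf)
        s = Trim$(v)
        If Len(s) > 0 Then
            n = n + 1
            If n = lineNo Then
                NameFromText = Left$(s, nChars)
                Exit Function
            End If
        End If
    Next v
End Function

Sub BatchRenameWithPdfToText()
    Const LINE_NO As Long = 8    ' ASSUMPTION: adjust after inspecting the extracted text
    Const NAME_LEN As Long = 4   ' ASSUMPTION: length of the new name, as in "C853"
    Dim sFolder As String, sFile As String, sNew As String
    Dim colFiles As New Collection, v As Variant
    sFolder = ThisWorkbook.Path & Application.PathSeparator
    ' Collect the names first so that renaming cannot disturb the Dir enumeration
    sFile = Dir(sFolder & "*.pdf")
    Do While Len(sFile) > 0
        colFiles.Add sFile
        sFile = Dir()
    Loop
    For Each v In colFiles
        sNew = NameFromText(PdfText(sFolder, CStr(v)), LINE_NO, NAME_LEN)
        If Len(sNew) = 0 Then
            Debug.Print "No name found, skipped: " & v
        ElseIf Len(Dir(sFolder & sNew & ".pdf")) > 0 Then
            Debug.Print "Target already exists, skipped: " & v
        Else
            Name sFolder & v As sFolder & sNew & ".pdf"
        End If
    Next v
End Sub
```

Files that cannot be renamed are only reported in the Immediate window, so nothing is overwritten. It is probably wise to try it on a copy of the folder first.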
-
- 5StarLounger
- Posts: 619
- Joined: 14 Nov 2012, 16:06
Re: Rename pdf with internal content
What would be the benefit of this renaming operation?
-
- 5StarLounger
- Posts: 666
- Joined: 27 Jun 2021, 10:46
Re: Rename pdf with internal content
><fx: later> Nah, doesn't work ...
<even later> Actually, it does. It just isn't very fast ...
So, this is VBA expected to be hosted in Word (although it could of course be modified to run in any VBA host, or as VBScript)
This code makes a couple of assumptions:
1) That all the PDFs are the same format
2) That the new name is intended to be the first 4 characters of the customs reference
The function comes first, followed by an example of how it could be called:
Code: Select all
Public Function RenamePDF(strFile As String) As String
    Dim wrdDoc As Document
    Set wrdDoc = Application.Documents.Open(filename:=strFile, ConfirmConversions:=False, AddToRecentFiles:=False) ' sadly this is slow
    RenamePDF = Left(wrdDoc.Paragraphs(8).Range.Text, 4)
    wrdDoc.Close False
End Function
Code: Select all
Public Sub exampledoit()
    Dim filename As String
    filename = "d:\downloads\4437748.pdf"
    Application.DisplayAlerts = False
    MsgBox "New name is: " & RenamePDF(filename) & ".pdf"
    Application.DisplayAlerts = True
End Sub
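The example only shows the new name in a message box. To actually rename, the result could be cleaned up and passed to the Name statement, along these lines. A sketch only: SafeFileName and RenameOne are hypothetical helpers, and the cleaning step is there because text read from a document may contain characters Windows does not allow in file names (\ / : * ? " < > |).

```vba
' Replace characters that are not allowed in Windows file names with "_"
' and strip line-break and tab characters that can come along with paragraph text.
Function SafeFileName(ByVal sName As String) As String
    Const BAD_CHARS As String = "\/:*?""<>|"
    Dim i As Long
    For i = 1 To Len(BAD_CHARS)
        sName = Replace(sName, Mid$(BAD_CHARS, i, 1), "_")
    Next i
    sName = Replace(Replace(Replace(sName, vbCr, ""), vbLf, ""), vbTab, "")
    SafeFileName = Trim$(sName)
End Function

' Rename one PDF in place, using RenamePDF from the post above to get the new name.
Sub RenameOne(sPath As String)
    Dim sFolder As String, sNew As String
    sFolder = Left$(sPath, InStrRev(sPath, "\"))   ' folder part, including the trailing backslash
    sNew = SafeFileName(RenamePDF(sPath))
    If Len(sNew) > 0 Then Name sPath As sFolder & sNew & ".pdf"
End Sub
```

It could be called as RenameOne "d:\downloads\4437748.pdf". Note that Name raises a run-time error if the target file already exists.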
-
- 5StarLounger
- Posts: 619
- Joined: 14 Nov 2012, 16:06
Re: Rename pdf with internal content
If this works:
Code: Select all
Public Function RenamePDF(strFile As String) As String
    Dim wrdDoc As Document
    Set wrdDoc = Application.Documents.Open(filename:=strFile, ConfirmConversions:=False, AddToRecentFiles:=False) ' sadly this is slow
    RenamePDF = Left(wrdDoc.Paragraphs(8).Range.Text, 4)
    wrdDoc.Close False
End Function
This should be faster:
Code: Select all
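Function F_RnPDF(c00)
    ' c00 is the full path of the PDF; this assumes GetObject hands the file to Word
    With GetObject(c00)
        c01 = Left(.Paragraphs(8).Range.Text, 4)
        .Close 0
    End With
    ' Name needs a full target path: keep the original folder and the .pdf extension
    Name c00 As Left(c00, InStrRev(c00, "\")) & c01 & ".pdf"
End Function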
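With 1000 files and only four characters per name, two PDFs may well produce the same new name, and Name stops with a run-time error when the target already exists. One way around that, as a sketch, is to append a counter. UniquePdfPath is a made-up helper name, and the "_2", "_3" suffix scheme is just one possible choice.

```vba
' Return sFolder & sBase & ".pdf" if that file does not exist yet,
' otherwise the first free name of the form sBase_2.pdf, sBase_3.pdf, ...
' sFolder is expected to end with a backslash.
' Note: this calls Dir, so it should not be used in the middle of a Dir() enumeration.
Function UniquePdfPath(sFolder As String, sBase As String) As String
    Dim n As Long, sTry As String
    sTry = sFolder & sBase & ".pdf"
    n = 1
    Do While Len(Dir(sTry)) > 0
        n = n + 1
        sTry = sFolder & sBase & "_" & n & ".pdf"
    Loop
    UniquePdfPath = sTry
End Function
```

In the function above, the rename could then read: Name c00 As UniquePdfPath(Left(c00, InStrRev(c00, "\")), c01)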