Rename pdf with internal content

vaxo
4StarLounger
Posts: 432
Joined: 23 Mar 2017, 19:51

Rename pdf with internal content

Post by vaxo »

Hello friends, I have a PDF document whose name is, for example, 4437748. I need a script that will open the PDF file and rename it to C853. P.S. I have 1000 PDF files, so this process needs to be done in batch. Is there some way?

stuck
Panoramic Lounger
Posts: 8176
Joined: 25 Jan 2010, 09:09
Location: retirement

Re: Rename pdf with internal content

Post by stuck »


vaxo
4StarLounger
Posts: 432
Joined: 23 Mar 2017, 19:51

Re: Rename pdf with internal content

Post by vaxo »

No, as I need to rename the files based on their internal content, not just rename them...

stuck
Panoramic Lounger
Posts: 8176
Joined: 25 Jan 2010, 09:09
Location: retirement

Re: Rename pdf with internal content

Post by stuck »

In which case, have you not tried a simple search on:
    rename file based on file content

When I do that, the first hit is something from stackoverflow that might help you.

Ken

SpeakEasy
4StarLounger
Posts: 550
Joined: 27 Jun 2021, 10:46

Re: Rename pdf with internal content

Post by SpeakEasy »

>have you not tried a simple search

Thing is, this is not as simple as you seem to think. The OP is asking about the contents of a PDF, not a simple text file (for which the solution is indeed easy).

I've got an idea I want to test ...

<fx: later> Nah, doesn't work ...

robertocm
Lounger
Posts: 43
Joined: 07 Jun 2023, 15:34

Re: Rename pdf with internal content

Post by robertocm »

I would try to extract the text from the PDF with some library, for example this one:
http://www.xpdfreader.com/about.html (see pdftotext: converts PDF to text)

I made a test with another library called sejda-console:
https://github.com/torakiki/sejda/releases/tag/v2.10.4

Note that the latest versions are not open source.

Below is an example AutoIt script. It is not finished, because sejda-console fails to extract the text from your PDF due to some issue with fonts.

Code: Select all

#include <AutoItConstants.au3> ; $STDOUT_CHILD, $STDERR_CHILD
#include <File.au3>
#include <WinAPIFiles.au3>

Opt("MustDeclareVars", 1)
Opt("TrayIconDebug", 1)

; ===== CONFIG =====
Global $iPID, $sOutput = ""
Global $sFileName, $sFileShort, $sFileNameTxt, $aFileList, $aFileList2, $aArray, $bFileOpen
Global $sFolder = @ScriptDir

$aFileList = _FileListToArray($sFolder, "*.pdf", 1, True)
If Not @error Then
	For $i = 1 To $aFileList[0]
		$sFileName = $aFileList[$i]
		;sejda doesn't like spaces in file names
		$sFileShort = FileGetShortName($sFileName)

		$iPID = Run(@ComSpec & " /C """ & @ScriptDir & "\sejda-console-2.10.4\bin\sejda-console.bat"" extracttext -f " & $sFileShort & " -o " & $sFolder & " -j overwrite", "", @SW_HIDE, $STDERR_CHILD + $STDOUT_CHILD)
		ProcessWaitClose($iPID)
		$sOutput = StdoutRead($iPID)
		ConsoleWrite($sOutput)

		; ===== TEXT FILE =====
		;$sFileNameTxt = StringReplace($sFileShort, ".pdf", ".txt")
		$aFileList2 = _FileListToArray($sFolder, "*.txt", 1, True)
		If Not @error Then
			$sFileNameTxt = $aFileList2[1]
		Else
			MsgBox(16, "Error", "No txt files were found in the folder")
			Exit
		EndIf

		$aArray = FileReadToArray($sFileNameTxt)
		If @error Then
			MsgBox(16, "", "Error reading txt file. @error: " & @error) ; An error occurred reading the current script file.
			Exit
		Else
			;File must be closed before renaming
			$bFileOpen = _WinAPI_FileInUse($sFileName)
			If $bFileOpen = 0 Then
				;helpfile: AutoIt does not have a 'FileRename' function, as you can use the FileMove function to rename a file using "Full_Path\Old_Name" and "Full_Path\New_Name" as the "source" and "dest" parameters.
				;FileMove($sFileName, ...ToDo..., $i_Flag)
			Else
				ConsoleWrite("Locked file: " & $aFileList[$i] & @CRLF)
			EndIf
			;FileRecycle($sFileNameTxt)
		EndIf
	Next
Else
	MsgBox(16, "Error", "No pdf files were found in the folder")
	Exit
EndIf
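
The rename step left as ToDo in the script above could look something like the following. This is a minimal Python sketch rather than AutoIt, and `rename_keeping_folder` is a hypothetical helper name. It assumes the new name has already been extracted from the PDF text. It keeps the file in its folder, keeps the extension, and appends a numeric suffix if two PDFs yield the same name, which seems a real possibility with 1000 files.

```python
from pathlib import Path


def rename_keeping_folder(pdf_path, new_stem):
    """Rename pdf_path to <new_stem>.pdf in the same folder.

    Hypothetical helper: if the target name is already taken, a numeric
    suffix (_2, _3, ...) is appended instead of overwriting that file.
    """
    source = Path(pdf_path)
    target = source.with_name(new_stem + source.suffix)
    counter = 2
    # Look for a free name, unless the file already has the wanted name.
    while target.exists() and target != source:
        target = source.with_name(f"{new_stem}_{counter}{source.suffix}")
        counter += 1
    if target != source:
        source.rename(target)
    return target
```

Returning the final path lets the caller log what each file ended up being called.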

robertocm
Lounger
Posts: 43
Joined: 07 Jun 2023, 15:34

Re: Rename pdf with internal content

Post by robertocm »

An example of VBA code to extract text from pdf using pdftotext.exe

Tested, OK:

Code: Select all

Sub test_1_ExecAndCapture()
Dim sFolder As String, sFile As String
sFile = "4437748.pdf"
sFolder = ThisWorkbook.Path & Application.PathSeparator
Dim objShell As Object, objCmdExec As Object, cmd As String, CommandOutput As String
'cmd = sFolder & "pdftotext.exe -layout " & Chr(34) & sFolder & sFile & Chr(34) & " " & Replace(sFile, ".pdf", ".txt")
'the hyphen as the last parameter directs output to stdout, which we will capture
cmd = Chr(34) & sFolder & "pdftotext.exe" & Chr(34) & " -layout " & Chr(34) & sFolder & sFile & Chr(34) & " -"
Set objShell = CreateObject("WScript.Shell")
Set objCmdExec = objShell.exec(cmd)
CommandOutput = objCmdExec.StdOut.readAll
Debug.Print CommandOutput
End Sub

Tested, OK:

Code: Select all

Sub test_2_ExecAndCapture()
On Error GoTo ExceptionHandling

Dim sFolder As String, sFile As String, tFile As String, cmd As String, strOutput As String
sFolder = ThisWorkbook.Path & Application.PathSeparator
sFile = "4437748.pdf"
tFile = Replace(sFile, ".pdf", ".txt")
'cmd = sFolder & "pdftotext.exe -layout " & sFolder & sFile & " - > " & sFolder & tFile
cmd = sFolder & "pdftotext.exe -layout " & sFolder & sFile & " " & sFolder & tFile

'http://www.vbforums.com/showthread.php?589966-VBA-Run-Application-Capture-Output&p=4891449&viewfull=1#post4891449
Dim objShell As Object
Set objShell = CreateObject("WScript.Shell")

'Keywords: Wscript exec hidden, Shellscripting
'https://stackoverflow.com/questions/32297699/hide-command-prompt-window-when-using-exec
'You're always going to get a window flash with Exec().
'You can use Run() instead to execute the command in a hidden window.
'But you can't directly capture the command's output with Run().
'You'd have to redirect the output to a temporary file that your VBScript could then open, read, and delete.
'tFile = objShell.ExpandEnvironmentStrings("%Temp%") & "\t.txt"
'Pass 0 as the second parameter to hide the window
'objShell.Run "cmd.exe /c start /b tasklist.exe > " & tFile, 0, False
objShell.Run "cmd.exe /c " & cmd, 0, False

'https://stackoverflow.com/questions/10279404/vbscript-how-to-make-program-wait-until-process-has-finished#
Do Until Not Dir(sFolder & tFile) = ""
    DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:01"))

With CreateObject("Scripting.FileSystemObject")
    strOutput = .openTextFile(sFolder & tFile).readAll()
    '.DeleteFile tFile
End With

Debug.Print strOutput

CleanUp:
    On Error Resume Next
    Exit Sub
ExceptionHandling:
    MsgBox "Error: " & Err.Description
    Resume CleanUp
    Resume 'for debugging
End Sub
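
For the batch of 1000 files, the same pdftotext call can be driven from a loop. The sketch below is Python rather than VBA and rests on several assumptions: `pdftotext` is on the PATH, and `pick_name` is a hypothetical caller-supplied function that turns the extracted text into the new file name (the thread does not say where in the text the name sits). It only builds a list of proposed renames, so the result can be checked before any file is touched.

```python
import subprocess
from pathlib import Path


def pdf_text(pdf_path):
    """Return the text of a PDF via pdftotext; '-' sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def plan_renames(folder, pick_name, extract=pdf_text):
    """Return (old_path, new_path) pairs for every PDF in folder.

    pick_name(text) is a hypothetical callback returning the new stem,
    or None/"" when no name can be found (such files are skipped).
    """
    plan = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        stem = pick_name(extract(pdf))
        if stem:
            plan.append((pdf, pdf.with_name(stem + ".pdf")))
    return plan
```

Passing the extractor as a parameter means the planning logic can be tried out with canned text before pdftotext is involved at all.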

snb
4StarLounger
Posts: 575
Joined: 14 Nov 2012, 16:06

Re: Rename pdf with internal content

Post by snb »

What would be the benefit of this renaming operation?

SpeakEasy
4StarLounger
Posts: 550
Joined: 27 Jun 2021, 10:46

Re: Rename pdf with internal content

Post by SpeakEasy »

><fx: later> Nah, doesn't work ...

<even later> Actually, it does. It just isn't very fast ...

So, this is VBA expected to be hosted in Word (although it could of course be modified to run in any VBA host, or as VBScript)

Code: Select all

Public Function RenamePDF(strFile As String) As String
    Dim wrdDoc As Document

    Set wrdDoc = Application.Documents.Open(filename:=strFile, ConfirmConversions:=False, AddToRecentFiles:=False) ' sadly this is slow
    RenamePDF = Left(wrdDoc.Paragraphs(8).Range.Text, 4)
    
    wrdDoc.Close False
End Function

This code makes a couple of assumptions:

1) That all the PDFs are the same format
2) That the new name is intended to be the first 4 characters of the customs reference

It could be called as follows:

Code: Select all

Public Sub exampledoit()
    Dim filename As String
    filename = "d:\downloads\4437748.pdf"
    
    Application.DisplayAlerts = False
    MsgBox "New name is: " & RenamePDF(filename) & ".pdf"
    Application.DisplayAlerts = True
End Sub
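
The two assumptions above amount to a fixed rule: take one paragraph and keep its first four characters. Below is a small Python sketch of that rule applied to text that has already been extracted. `name_from_text` is a hypothetical name, and treating each non-empty line as a paragraph is an assumption. Word's paragraph numbering for a converted PDF will not necessarily match the lines produced by another extractor, so the index would need checking against a real file.

```python
def name_from_text(text, paragraph=8, length=4):
    """Return the first `length` characters of the given paragraph (1-based).

    Assumption: a 'paragraph' is a non-empty line of the extracted text.
    Returns None when the text has too few paragraphs or the paragraph is
    shorter than `length`, so the caller can skip that file.
    """
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    if len(paragraphs) < paragraph:
        return None
    candidate = paragraphs[paragraph - 1][:length]
    return candidate if len(candidate) == length else None
```

Returning None instead of a short or empty string makes it harder to rename a file to something meaningless when one PDF does not follow the common layout.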

snb
4StarLounger
Posts: 575
Joined: 14 Nov 2012, 16:06

Re: Rename pdf with internal content

Post by snb »

If this works:

Code: Select all

Public Function RenamePDF(strFile As String) As String
    Dim wrdDoc As Document

    Set wrdDoc = Application.Documents.Open(filename:=strFile, ConfirmConversions:=False, AddToRecentFiles:=False) ' sadly this is slow
    RenamePDF = Left(wrdDoc.Paragraphs(8).Range.Text, 4)
    
    wrdDoc.Close False
End Function

This should be faster:

Code: Select all

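Function F_RnPDF(c00)
  With GetObject(c00)
    c01 = Left(.Paragraphs(8).Range.Text, 4)
    .Close 0
  End With

  ' Keep the original folder and the .pdf extension when renaming
  Name c00 As Left(c00, InStrRev(c00, "\")) & c01 & ".pdf"
End Function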