Win32com: saving a PDF as XML using Acrobat Pro X > com_error "-2147467263, 'Not implemented'"

  • Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
  • Windows XP SP3
  • Python 2.7 pywin32-218
  • Adobe Acrobat X 10.0.0

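I want to use Python to automate Acrobat Pro exporting a PDF to XML. I have already done it by hand with the "Save As" dialog in the running program, and now want to do the same from a Python script. I have read many pages, including the Adobe SDK, the SDK forums, and parts of VB forums, with no luck.

I read Blish's question here: "Not implemented" exception when using pywin32 to control Adobe Acrobat

And this web page: timgolden python / win32_how_do_i / generate-a-static-com-proxy.html

I am missing something. My code is: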

    import win32com.client
    import win32com.client.makepy

    win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')
    adobe = win32com.client.DispatchEx('AcroExch.App')
    avDoc = win32com.client.DispatchEx('AcroExch.AVDoc')
    avDoc.Open('C:\Documents and Settings\PC\Desktop\a_PDF.pdf', 'C:\Documents and Settings\PC\Desktop')
    pdDoc = avDoc.GetPDDoc()
    jObject = pdDoc.GetJSObject()
    jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")

The full error is:

    Traceback (most recent call last):
      File "<pyshell#31>", line 1, in <module>
        jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")
      File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 511, in __getattr__
        ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
    com_error: (-2147467263, 'Not implemented', None, None)

I guess it has something to do with makepy, but I don't understand how to use it in my code.

I removed this line from my code and got the same error when running it:

    win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')

Then I changed these two lines from "DispatchEx" to "Dispatch" and got the same error:

    adobe = win32com.client.Dispatch('AcroExch.App')
    avDoc = win32com.client.Dispatch('AcroExch.AVDoc')

When I run the Dispatch calls on their own and then echo the objects back, I get:

    >>> adobe = win32com.client.DispatchEx('AcroExch.App')
    >>> adobe
    <win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroApp instance at 0x18787784>
    >>> avDoc = win32com.client.Dispatch('AcroExch.AVDoc')
    >>> avDoc
    <win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroAVDoc instance at 0x20365224>

Does this mean I should only make one call to Dispatch? I removed:

    adobe = win32com.client.Dispatch('AcroExch.App')

and got the same error.

This Adobe page says:

    AVDoc
    Product availability: Acrobat, Reader
    Platform availability: Macintosh, Windows, UNIX
    Syntax
    typedef struct _t_AVDoc* AVDoc;
    A view of a PDF document in a window. There is one AVDoc per displayed document. Unlike a PDDoc, an AVDoc has a window associated with it.
  • acrobat_sdk / 9.1 / Acrobat9_1_HTMLHelp / API_References / Acrobat_API_Reference / AV_Layer / AVDoc.html#AVDocSaveParams

The PDDoc page says:

 A PDDoc object represents a PDF document. There is a correspondence between a PDDoc and an ASFile. Also, every AVDoc has an associated PDDoc, although a PDDoc may not be associated with an AVDoc. 
  • /9.1/Acrobat9_1_HTMLHelp/API_References/Acrobat_API_Reference/PD_Layer/PDDoc.html

I tried the code below and got the same error again:

    import win32com.client
    import win32com.client.makepy

    pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')
    pdDoc.Open('C:\Documents and Settings\PC\Desktop\a_PDF.pdf')
    jObject = pdDoc.GetJSObject()
    jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")

I get the same error if I change:

    pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')

to:

    pdDoc = win32com.client.gencache.EnsureDispatch('AcroExch.PDDoc')

as in this question: win32com.client.Dispatch works but not win32com.client.gencache.EnsureDispatch

user2993272, you were almost there: with just one more line, the code you have should work flawlessly.

I will try to answer in the same spirit as your question, with as much detail as I can.

This thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html

I admit this post is not the easiest to find (perhaps Google ranks it low because of the age of the content).

Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
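As a quick sanity check that the traceback in the question really is the error this advice targets, the signed code can be decoded into its usual HRESULT form. This is a minimal sketch in plain Python (no pywin32 needed); `to_hresult` is a hypothetical helper name, not part of win32com:

```python
def to_hresult(code):
    """Format a signed 32-bit COM error code as an unsigned HRESULT string."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return "0x{:08X}".format(code & 0xFFFFFFFF)


# The code reported in the question's com_error tuple.
print(to_hresult(-2147467263))  # prints 0x80004001
```

0x80004001 is the standard value of E_NOTIMPL, which pywin32 exposes as `winerror.E_NOTIMPL`.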

For completeness, this code should do the job without manually patching dynamic.py (the snippet should run pretty much out of the box):

    # gets all files under ROOT_INPUT_PATH with FILE_EXTENSION and tries to extract text from them
    # into ROOT_OUTPUT_PATH with same filename as the input file but with INPUT_FILE_EXTENSION
    # replaced by OUTPUT_FILE_EXTENSION
    from win32com.client import Dispatch
    from win32com.client.dynamic import ERRORS_BAD_CONTEXT

    import winerror

    # try importing scandir and if found, use it as it's a few magnitudes of an order faster than stock os.walk
    try:
        from scandir import walk
    except ImportError:
        from os import walk

    import fnmatch
    import sys
    import os

    ROOT_INPUT_PATH = None
    ROOT_OUTPUT_PATH = None
    INPUT_FILE_EXTENSION = "*.pdf"
    OUTPUT_FILE_EXTENSION = ".txt"

    def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
        avDoc = Dispatch("AcroExch.AVDoc")  # Connect to Adobe Acrobat

        # Open the input file (as a pdf)
        ret = avDoc.Open(f_path, f_path)
        assert(ret)  # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practise?

        pdDoc = avDoc.GetPDDoc()

        dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

        # Adobe documentation says "For that reason, you must rely on the documentation to know what
        # functionality is available through the JSObject interface. For details, see the JavaScript
        # for Acrobat API Reference"
        jsObject = pdDoc.GetJSObject()

        # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
        jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")

        pdDoc.Close()
        # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further
        # files after a certain threshold of open files are reached (for example 50 PDFs)
        avDoc.Close(True)
        del pdDoc

    if __name__ == "__main__":
        assert(5 == len(sys.argv)), sys.argv  # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>

        #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'

        ROOT_INPUT_PATH = sys.argv[1]
        INPUT_FILE_EXTENSION = sys.argv[2]
        ROOT_OUTPUT_PATH = sys.argv[3]
        OUTPUT_FILE_EXTENSION = sys.argv[4]

        # tuples are of schema (path_to_file, filename)
        matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0])
                          for _root, _dirs, _files in walk(ROOT_INPUT_PATH)
                          for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

        # Magic piece of code that should get everything working for you!
        # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
        global ERRORS_BAD_CONTEXT
        ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

        for filename_with_path, filename_without_extension in matching_files:
            print "Processing '{}'".format(filename_without_extension)
            acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)

I have tested this on WinPython x64 2.7.6.3 with Acrobat X Pro.
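Applied to the XML export from the question, the same one-line patch might look like the sketch below. I have not run this variant against Acrobat. The names `allow_not_implemented` and `export_pdf_to_xml` are hypothetical helpers, and the conversion ID is simply the one used in the question. The win32com imports sit inside the function so that the small helper can be tried on a machine without pywin32.

```python
# Same value as winerror.E_NOTIMPL (HRESULT 0x80004001), spelled out so this
# module can be imported even where pywin32 is not installed.
E_NOTIMPL = -2147467263


def allow_not_implemented(bad_context_errors, code=E_NOTIMPL):
    """Append `code` to the list unless it is already there; return the list."""
    if code not in bad_context_errors:
        bad_context_errors.append(code)
    return bad_context_errors


def export_pdf_to_xml(pdf_path, xml_path):
    """Open pdf_path in Acrobat and save it as XML.

    Assumes Windows with pywin32 and Acrobat Pro installed.
    """
    # Imported lazily so the helper above stays usable without pywin32.
    from win32com.client import Dispatch
    from win32com.client import dynamic

    # The patch from the mailing-list post, applied at most once.
    allow_not_implemented(dynamic.ERRORS_BAD_CONTEXT)

    av_doc = Dispatch("AcroExch.AVDoc")
    if not av_doc.Open(pdf_path, pdf_path):
        raise IOError("Acrobat could not open {!r}".format(pdf_path))
    pd_doc = av_doc.GetPDDoc()
    try:
        js_object = pd_doc.GetJSObject()
        # Conversion ID taken from the question.
        js_object.SaveAs(xml_path, "com.adobe.acrobat.xml-1-00")
    finally:
        pd_doc.Close()
        av_doc.Close(True)


# Example call, commented out because it needs Acrobat:
# export_pdf_to_xml(r"C:\Documents and Settings\PC\Desktop\a_PDF.pdf",
#                   r"C:\Documents and Settings\PC\Desktop\a_PDF.xml")
```

The example call uses raw strings on purpose. In an ordinary Python string literal `\a` is the bell character, so a path written as `'...\Desktop\a_PDF.pdf'`, as in the question, does not contain the backslash it appears to. That is a separate issue from the E_NOTIMPL error but may be worth ruling out.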