Python dotx Conversion to docx for Automated Documents

While not exactly security related, I’ve recently had to convert .dotx templates to .docx files with Python.

Python dotx Conversion – Introduction

I’ve been working on a tool (coming soon?) for automating the organization of my pentest engagements.

Part of this tool needs to copy over a .dotx template and save it as a .docx file. Our report templates are .dotx by default, but I wanted to start each engagement with a blank .docx.

Installing python-docx

First, I installed python-docx.

Rays-MBP:tools doyler$ pip install python-docx
Collecting python-docx
  Downloading python-docx-0.8.6.tar.gz (5.3MB)
    100% |████████████████████████████████| 5.3MB 26kB/s
Requirement already satisfied: lxml>=2.3.2 in /usr/local/lib/python2.7/site-packages (from python-docx)
Building wheels for collected packages: python-docx
  Running setup.py bdist_wheel for python-docx ... done
  Stored in directory: /Users/doyler/Library/Caches/pip/wheels/cc/74/10/42b00d7d6a64cf21f194bfef9b94150009ada880f13c5b2ad3
Successfully built python-docx
Installing collected packages: python-docx
Successfully installed python-docx-0.8.6

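With this installed, I figured I’d be able to go about implementing it in my script. Unfortunately, python-docx (0.8.6 at the time) does not support .dotx files out of the box: Document() raises a ValueError saying the file is not a Word file, because a template's main part does not have the regular document content type.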

Adding support for dotx files

Based on a GitHub issue in the python-docx repository, I needed to make a few simple changes to the installed library to support dotx files.

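First, I changed the content-type check in the Document() function in api.py so that it also accepts the .dotx template content type. I added the macro-enabled document (.docm) content type as well, just in case.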

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/api.py

...

def Document(docx=None):
    """
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a ``.docx`` file (a string) or a file-like object. If
    *docx* is missing or ``None``, the built-in default document "template"
    is loaded.
    """
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    # Modified: this replaces the original "!= CT.WML_DOCUMENT_MAIN" check so
    # that .dotx templates and macro-enabled (.docm) documents are accepted too.
    # The body of the if statement is unchanged.
    if document_part.content_type not in [
        CT.WML_DOCUMENT_MAIN,
        'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml',
        'application/vnd.ms-word.document.macroEnabled.main+xml',
    ]:
        tmpl = "file '%s' is not a Word file, content type is '%s'"
        raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document
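The check in Document() compares against the content type that the package's [Content_Types].xml declares for the main document part. To see what a given file declares (for example, when diagnosing the ValueError reported in the comments), here is a small standard-library sketch. The function name main_part_content_type and the ACCEPTED_MAIN_CTS set are illustrative, not part of python-docx, and the code assumes the main part lives at /word/document.xml, which is where Word normally puts it (strictly, the main part is whichever part the officeDocument relationship in _rels/.rels points to).

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by [Content_Types].xml in OOXML packages
CT_NS = '{http://schemas.openxmlformats.org/package/2006/content-types}'

# Main-part content types accepted by the patched Document() check
ACCEPTED_MAIN_CTS = {
    # .docx (the value of CT.WML_DOCUMENT_MAIN)
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml',
    # .dotx
    'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml',
    # .docm
    'application/vnd.ms-word.document.macroEnabled.main+xml',
}


def main_part_content_type(path, part_name='/word/document.xml'):
    """Return the content type [Content_Types].xml declares for part_name, or None.

    Assumes the main document part is /word/document.xml, Word's usual location.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read('[Content_Types].xml'))
    # Parts with an explicit content type are listed as <Override> elements
    for override in root.iter(CT_NS + 'Override'):
        if override.get('PartName') == part_name:
            return override.get('ContentType')
    return None
```

On a .dotx file this should return the template.main+xml type, which the original check rejects and the patched check finds in its list.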

Next, I registered DocumentPart as the part class for these content types in the PartFactory, in __init__.py.

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/__init__.py

...

PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart
# Added: load .dotx template and macro-enabled (.docm) main parts as DocumentParts too
PartFactory.part_type_for['application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml'] = DocumentPart
PartFactory.part_type_for['application/vnd.ms-word.document.macroEnabled.main+xml'] = DocumentPart
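Why this second edit matters: python-docx picks the Python class for each package part by looking up the part's content type in PartFactory.part_type_for, falling back to a generic Part when there is no entry. The sketch below is a simplified, hypothetical re-creation of that dispatch (the class bodies and the load() classmethod are illustrative; the real factory takes more arguments). It shows that without the registration, a template's main part would load as a plain Part with no .document attribute, so relaxing the api.py check alone would not be enough.

```python
DOCUMENT_MAIN_CT = ('application/vnd.openxmlformats-officedocument'
                    '.wordprocessingml.document.main+xml')
TEMPLATE_MAIN_CT = ('application/vnd.openxmlformats-officedocument'
                    '.wordprocessingml.template.main+xml')


class Part:
    """Generic fallback part: knows its content type, nothing Word-specific."""

    def __init__(self, content_type):
        self.content_type = content_type


class DocumentPart(Part):
    """Stand-in for python-docx's DocumentPart, which exposes .document."""

    @property
    def document(self):
        return 'Document wrapper for ' + self.content_type


class PartFactory:
    """Simplified, hypothetical version of content-type based part dispatch."""

    part_type_for = {}        # content type -> Part subclass
    default_part_type = Part  # used when a content type is not registered

    @classmethod
    def load(cls, content_type):
        part_cls = cls.part_type_for.get(content_type, cls.default_part_type)
        return part_cls(content_type)


# The registration the library already performs for regular documents
PartFactory.part_type_for[DOCUMENT_MAIN_CT] = DocumentPart

before = PartFactory.load(TEMPLATE_MAIN_CT)
print(type(before).__name__)  # Part (no .document attribute)

# The equivalent of the line added to __init__.py for .dotx files
PartFactory.part_type_for[TEMPLATE_MAIN_CT] = DocumentPart
after = PartFactory.load(TEMPLATE_MAIN_CT)
print(type(after).__name__)   # DocumentPart
```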

Document Creation

With these patches to the installed library in place, it was time to write the script to create my document.

This is a very simple excerpt, but it opens my template, changes the main document part's content type from the template type to the regular document type, and saves the file with the new .docx extension.

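from docx import Document
from docx.opc.constants import CONTENT_TYPE as CT

# Open the .dotx template (requires the patched api.py and __init__.py)
document = Document('appsec/web_application_assessment_report.dotx')

# Relabel the main part as a regular document; _content_type is a private
# attribute, and it determines what [Content_Types].xml says on save
document_part = document.part
document_part._content_type = CT.WML_DOCUMENT_MAIN

# Save under the new .docx extension
document.save('/Users/doyler/Documents/__ENGAGEMENTS/__DEMO.docx')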

Rays-MBP:__ENGAGEMENTS doyler$ file __DEMO.docx
__DEMO.docx: Microsoft OOXML
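If editing files under site-packages is undesirable, the same relabeling can likely be done without python-docx at all, since a .docx or .dotx file is a zip package whose [Content_Types].xml declares the main part's content type. The sketch below is an alternative approach, not the one used in this post: it copies every member of the package and rewrites only that declaration. The function name swap_main_content_type is hypothetical, and the plain byte substitution assumes the old content type string appears only in the main part's Override entry, which is what Word normally writes. Swapping the last two arguments gives the .docx to .dotx direction asked about in the comments.

```python
import zipfile

TEMPLATE_MAIN_CT = ('application/vnd.openxmlformats-officedocument'
                    '.wordprocessingml.template.main+xml')
DOCUMENT_MAIN_CT = ('application/vnd.openxmlformats-officedocument'
                    '.wordprocessingml.document.main+xml')


def swap_main_content_type(src_path, dst_path, old_ct, new_ct):
    """Copy the OOXML package at src_path to dst_path, replacing old_ct with
    new_ct in [Content_Types].xml. Every other zip member is copied unchanged."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, 'w', zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item)
            if item.filename == '[Content_Types].xml':
                # Byte substitution on UTF-8 XML: assumes old_ct appears only
                # in the main document part's Override entry.
                data = data.replace(old_ct.encode('utf-8'),
                                    new_ct.encode('utf-8'))
            # Reusing the ZipInfo keeps each member's name, timestamp and
            # compression method.
            dst.writestr(item, data)
```

For example, swap_main_content_type('appsec/web_application_assessment_report.dotx', '__DEMO.docx', TEMPLATE_MAIN_CT, DOCUMENT_MAIN_CT) should produce the same kind of file as the python-docx script above.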

Python dotx Conversion – Conclusion

While it isn’t a huge deal to convert from dotx to docx, that code snippet is making my life easier for now.

It is still not quite ready for release, but here is a screenshot of the output for my newEngagement script.

Let me know if you have any ideas or suggestions before I release it! Note that it is still geared towards my specific uses, but is easily modifiable.

Ray Doyle

Ray Doyle is an avid pentester/security enthusiast/beer connoisseur who has worked in IT for almost 16 years now. From building machines and the software on them, to breaking into them and tearing it all down; he’s done it all. To show for it, he has obtained an OSCE, OSCP, eCPPT, GXPN, eWPT, eWPTX, SLAE, eMAPT, Security+, ICAgile CP, ITIL v3 Foundation, and even a sabermetrics certification!

He currently serves as a Senior Staff Adversarial Engineer for Avalara, and his previous position was a Principal Penetration Testing Consultant for Secureworks.


11 Comments

Hunter

July 9, 2018 / 2:26 pm

I tried this except I tried to convert a .docx to a .dotx. Not working for me (corrupts file) – do you know if there’s a way I can get it to work?
- doyler
  
  July 10, 2018 / 12:40 pm
  
  Hi Hunter,
  
  As far as the other direction, you’d have to reverse the content type change, setting it from the main document type (CT.WML_DOCUMENT_MAIN) to application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml
Hunter

July 10, 2018 / 12:01 pm

Has this update been released? Trying to convert .dotx and getting the error

ValueError: file ‘test-outpufft.dotx’ is not a Word file, content type is ‘application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml’
- doyler
  
  July 10, 2018 / 12:41 pm
  
  No, there have been no updates released for the library itself.
  
  You’ll have to manually update the two files yourself like I mentioned in the post. Let me know if that makes sense or if you have any questions/issues.
Hunter

July 11, 2018 / 12:01 pm

I’m getting an error in api.py:

def _default_docx_path():
^
IndentationError: expected an indented block

I’m thinking it’s because of the if statement at the bottom of your first block of code. Here’s what I have in my api.py file:

def Document(docx=None):
    """
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a ``.docx`` file (a string) or a file-like object. If
    *docx* is missing or ``None``, the built-in default document "template"
    is loaded.
    """
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    if document_part.content_type != CT.WML_DOCUMENT_MAIN:
        tmpl = "file '%s' is not a Word file, content type is '%s'"
        raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document
    if document_part.content_type not in [CT.WML_DOCUMENT_MAIN, 'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml', 'application/vnd.ms-word.document.macroEnabled.main+xml']:

def _default_docx_path():
    """
    Return the path to the built-in default .docx package.
    """
    _thisdir = os.path.split(__file__)[0]
    return os.path.join(_thisdir, 'templates', 'default.docx')

What am I doing wrong here? Thanks
- doyler
  
  July 11, 2018 / 4:09 pm
  
  If you are copying and pasting directly from my post, then make sure the spacing/lines are ending up correct.
  
  That line that begins with if should encompass EVERYTHING until the colon. It looks like you have a spacing/indentation issue somewhere in your code.
Hunter

July 13, 2018 / 1:00 pm

It does. The whole if statement up until and including the colon is on one line. The issue is that there’s nothing inside the if statement, so when it gets to def _default_docx_path():, it sees that it’s not indented (it expects an indented block since we just did if […]:). Is there something that’s supposed to be in the if statement?
- doyler
  
  July 13, 2018 / 3:01 pm
  
  The if statement is the line that should be modified: it replaces the original if line, and the body of that statement (the tmpl and raise lines) should stay the same as in the original file.
  
  I’ve only posted my modifications, not the file in its entirety.
  - Hunter
    
    July 18, 2018 / 11:41 am
    
    Hey I just wanted to let you know that I got it to work and it does in fact convert .docx to .dotx. Thanks so much for your help!
    - doyler
      
      July 18, 2018 / 12:37 pm
      
      Awesome, great to hear and glad to help!
      - Hunter
        
        July 18, 2018 / 1:59 pm
        
        Have you considered trying to submit a pull request to github.com/python-openxml/python-docx ? I tried to but I honestly don’t know what I’m doing, and it didn’t get accepted.