Python dotx Conversion to docx for Automated Documents

While not exactly security related, I’ve had to do some Python dotx conversion to docx files recently.

Python dotx Conversion – Introduction

I’ve been working on a tool (coming soon?) for automating my pentest engagement organization.

Part of this tool required me to copy over a .dotx template and save it as a .docx file. The reason for this is that our report templates are .dotx by default, but I wanted to start with a blank .docx for each engagement.

Installing python-docx

First, I installed python-docx.

Rays-MBP:tools doyler$ pip install python-docx
Collecting python-docx
  Downloading python-docx-0.8.6.tar.gz (5.3MB)
    100% |████████████████████████████████| 5.3MB 26kB/s
Requirement already satisfied: lxml>=2.3.2 in /usr/local/lib/python2.7/site-packages (from python-docx)
Building wheels for collected packages: python-docx
  Running setup.py bdist_wheel for python-docx ... done
  Stored in directory: /Users/doyler/Library/Caches/pip/wheels/cc/74/10/42b00d7d6a64cf21f194bfef9b94150009ada880f13c5b2ad3
Successfully built python-docx
Installing collected packages: python-docx
Successfully installed python-docx-0.8.6

With this installed, I figured I’d be able to go about implementing it in my script. Unfortunately, python-docx does not yet support dotx files out of the box.

Adding support for dotx files

Based on the above GitHub issue, I needed to make a few simple changes to support dotx files.

First, I added the proper content types to the check in api.py. I added the macro-enabled document content type (the one .docm files use) as well, just in case.

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/api.py

...

def Document(docx=None):
    """
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a ``.docx`` file (a string) or a file-like object. If
    *docx* is missing or ``None``, the built-in default document "template"
    is loaded.
    """
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    # Modified check: also accept the template and macro-enabled content types.
    if document_part.content_type not in [CT.WML_DOCUMENT_MAIN, 'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml', 'application/vnd.ms-word.document.macroEnabled.main+xml']:
        tmpl = "file '%s' is not a Word file, content type is '%s'"
        raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document
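
To see which content type a given file actually declares (the value the check above compares against), the package's [Content_Types].xml can be read with the standard library alone. This is a minimal sketch and not part of python-docx: the function name main_document_content_type is made up for illustration, and it assumes the main part lives at /word/document.xml, which is where Word normally puts it.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in OPC packages (.docx, .dotx, ...).
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def main_document_content_type(path_or_file, partname="/word/document.xml"):
    """Return the content type declared for the main document part, or None.

    Assumption: the main part is stored at `partname`. Strictly speaking the
    package relationships decide this, but Word uses /word/document.xml.
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    # Each <Override> element maps one part name to its content type.
    for override in root.iter(CT_NS + "Override"):
        if override.get("PartName") == partname:
            return override.get("ContentType")
    return None
```

Calling it on a .dotx should return the template.main+xml string that shows up in the ValueError message, while a regular .docx should return the document.main+xml one.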

Next, I registered DocumentPart for these content types with the PartFactory in __init__.py.

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/__init__.py

...

PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart
PartFactory.part_type_for['application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml'] = DocumentPart
PartFactory.part_type_for['application/vnd.ms-word.document.macroEnabled.main+xml'] = DocumentPart
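
An alternative that avoids editing files under site-packages is to do the same registration at runtime from the calling script. The following is only a sketch under assumptions, not something taken from the GitHub issue: register_extra_types and open_template are hypothetical helper names, and open_template goes through Package.open directly (the same call Document() makes in api.py) so the content type check there never runs.

```python
TEMPLATE_CT = "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"
MACRO_DOC_CT = "application/vnd.ms-word.document.macroEnabled.main+xml"


def register_extra_types():
    """Map the extra content types to DocumentPart for this process only."""
    # Imported lazily so this snippet still loads where python-docx is absent.
    from docx.opc.part import PartFactory
    from docx.parts.document import DocumentPart

    for content_type in (TEMPLATE_CT, MACRO_DOC_CT):
        PartFactory.part_type_for[content_type] = DocumentPart


def open_template(path):
    """Hypothetical helper: open a .dotx without touching docx/api.py."""
    from docx.package import Package

    register_extra_types()
    # Mirrors the two lines Document() uses, minus the content type check.
    return Package.open(path).main_document_part.document
```

The trade-off is that every script has to call the helper itself, but a pip upgrade of python-docx will not silently undo the change.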

Document Creation

With the monkey-patches in place, it was time to write the script to create my document.

This is a very simple excerpt, but it opens my template, sets the content type, and saves the file with the new extension.

from docx import Document
from docx.opc.constants import CONTENT_TYPE as CT
document = Document('appsec/web_application_assessment_report.dotx')
document_part = document.part
document_part._content_type = CT.WML_DOCUMENT_MAIN
document.save('/Users/doyler/Documents/__ENGAGEMENTS/__DEMO.docx')

Rays-MBP:__ENGAGEMENTS doyler$ file __DEMO.docx
__DEMO.docx: Microsoft OOXML
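
Since a .dotx is just a zip package, the content type swap can also be sketched without python-docx at all, by rewriting [Content_Types].xml while copying the archive. This is a rough stdlib-only illustration of what "sets the content type" amounts to, not the approach used in the script above. The name dotx_to_docx is hypothetical, and the sketch assumes the template content type string appears only in the main part's declaration.

```python
import zipfile

TEMPLATE_CT = b"application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"
DOCUMENT_CT = b"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"


def dotx_to_docx(src, dst):
    """Copy the package at src to dst, relabelling the main part as a document.

    src and dst may be paths or file-like objects. Every member is copied
    unchanged except [Content_Types].xml, where the template content type
    is replaced with the regular document content type.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "[Content_Types].xml":
                data = data.replace(TEMPLATE_CT, DOCUMENT_CT)
            # Passing the ZipInfo keeps each member's name and metadata.
            zout.writestr(item, data)
```

Under those assumptions, dotx_to_docx('appsec/web_application_assessment_report.dotx', '__DEMO.docx') would produce a file that an unpatched python-docx can open. Swapping the two constants in the replace call gives the docx to dotx direction.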

Python dotx Conversion – Conclusion

While it isn’t a huge deal to convert from dotx to docx, that code snippet is making my life easier for now.

It is still not quite ready for release, but here is a screenshot of the output for my newEngagement script.

[Screenshot: Python Dotx Conversion - Demo Script]

Let me know if you have any ideas or suggestions before I release it! Note that it is still geared towards my specific uses, but is easily modifiable.

11 Comments

  1. I tried this except I tried to convert a .docx to a .dotx. Not working for me (corrupts file) – do you know if there’s a way I can get it to work?

    • Hi Hunter,

      For the other direction, you’d have to reverse the content type change: instead of setting it to CT.WML_DOCUMENT_MAIN, set it to application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml

  2. Has this update been released? Trying to convert .dotx and getting the error

    ValueError: file ‘test-outpufft.dotx’ is not a Word file, content type is ‘application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml’

    • No, there have been no updates released for the library itself.

      You’ll have to manually update the two files yourself like I mentioned in the post. Let me know if that makes sense or if you have any questions/issues.

  3. I’m getting an error in api.py:

    def _default_docx_path():
    ^
    IndentationError: expected an indented block

    I’m thinking it’s because of the if statement at the bottom of your first block of code. Here’s what I have in my api.py file:

    def Document(docx=None):
    “””
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a “.docx“ file (a string) or a file-like object. If
    *docx* is missing or “None“, the built-in default document “template”
    is loaded.
    “””
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    if document_part.content_type != CT.WML_DOCUMENT_MAIN:
    tmpl = “file ‘%s’ is not a Word file, content type is ‘%s'”
    raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document
    if document_part.content_type not in [CT.WML_DOCUMENT_MAIN, ‘application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml’, ‘application/vnd.ms-word.document.macroEnabled.main+xml’]:

    def _default_docx_path():
    “””
    Return the path to the built-in default .docx package.
    “””
    _thisdir = os.path.split(__file__)[0]
    return os.path.join(_thisdir, ‘templates’, ‘default.docx’)

    What am I doing wrong here? Thanks

    • If you are copying and pasting directly from my post, then make sure the spacing/lines are ending up correct.

      That line that begins with if should encompass EVERYTHING until the colon. It looks like you have a spacing/indentation issue somewhere in your code.

  4. It does. The whole if statement up until and including the colon is on one line. The issue is that there’s nothing inside the if statement, so when it gets to def _default_docx_path():, it sees that it’s not indented (it expects an indentation since we just did if […]:). Is there something that’s supposed to be in the if statement?

    • The if statement should be what was modified, and the body of that statement should stay the same in the original file.

      I’ve only posted my modifications, not the file in its entirety.

      • Hey I just wanted to let you know that I got it to work and it does in fact convert .docx to .dotx. Thanks so much for your help!

          • Have you considered trying to submit a pull request to github.com/python-openxml/python-docx ? I tried to but I honestly don’t know what I’m doing, and it didn’t get accepted.
