Python dotx Conversion to docx for Automated Documents

While not exactly security related, I recently had to convert some dotx files to docx with Python.

Python dotx Conversion – Introduction

I’ve been working on a tool (coming soon?) for automating my pentest engagement organization.

Part of this tool required me to copy over a .dotx template and save it as a .docx file. The reason for this is that our report templates are .dotx by default, but I wanted to start with a blank .docx for each engagement.

Installing python-docx

First, I installed python-docx.

Rays-MBP:tools doyler$ pip install python-docx
Collecting python-docx
  Downloading python-docx-0.8.6.tar.gz (5.3MB)
    100% |████████████████████████████████| 5.3MB 26kB/s
Requirement already satisfied: lxml>=2.3.2 in /usr/local/lib/python2.7/site-packages (from python-docx)
Building wheels for collected packages: python-docx
  Running setup.py bdist_wheel for python-docx ... done
  Stored in directory: /Users/doyler/Library/Caches/pip/wheels/cc/74/10/42b00d7d6a64cf21f194bfef9b94150009ada880f13c5b2ad3
Successfully built python-docx
Installing collected packages: python-docx
Successfully installed python-docx-0.8.6

With this installed, I figured I’d be able to go about implementing it in my script. Unfortunately, python-docx does not yet support dotx files out of the box.
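The underlying difference is small: a .dotx package declares a different content type for its main document part than a .docx does, and python-docx checks that type when it opens a file. As a rough illustration using only the standard library (this is not part of python-docx, and the helper name `main_content_type` is made up for this sketch), the declared type can be read straight out of the package's `[Content_Types].xml`:

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by [Content_Types].xml in OOXML packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def main_content_type(path, part_name="/word/document.xml"):
    """Return the content type declared for the main document part.

    Assumption: the main part lives at /word/document.xml, which is typical
    for files written by Word but is not guaranteed by the format.
    Returns None if no override is declared for that part.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    for override in root.iter(CT_NS + "Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    return None
```

Run against a template, this should report the `...wordprocessingml.template.main+xml` type rather than the `...document.main+xml` type that python-docx expects.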

Adding support for dotx files

Based on the above GitHub issue, I needed to make a few simple changes to support dotx files.

First, I added the proper content types to api.py. I added the macro-enabled document type as well, just in case.

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/api.py

...

def Document(docx=None):
    """
    Return a |Document| object loaded from *docx*, where *docx* can be
    either a path to a ``.docx`` file (a string) or a file-like object. If
    *docx* is missing or ``None``, the built-in default document "template"
    is loaded.
    """
    docx = _default_docx_path() if docx is None else docx
    document_part = Package.open(docx).main_document_part
    # Accept templates and macro-enabled documents in addition to plain .docx.
    if document_part.content_type not in [
        CT.WML_DOCUMENT_MAIN,
        'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml',
        'application/vnd.ms-word.document.macroEnabled.main+xml',
    ]:
        tmpl = "file '%s' is not a Word file, content type is '%s'"
        raise ValueError(tmpl % (docx, document_part.content_type))
    return document_part.document

Next, I registered DocumentPart for these content types with the PartFactory in __init__.py.

Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/__init__.py

...

PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart
PartFactory.part_type_for['application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml'] = DocumentPart
PartFactory.part_type_for['application/vnd.ms-word.document.macroEnabled.main+xml'] = DocumentPart

Document Creation

With these patches to the installed package in place, it was time to write the script to create my document.

This is a very simple excerpt, but it opens my template, sets the content type, and saves the file with the new extension.

from docx import Document
from docx.opc.constants import CONTENT_TYPE as CT

# Open the template (this only works with the patches above applied).
document = Document('appsec/web_application_assessment_report.dotx')

# Relabel the main part as a regular document before saving.
document_part = document.part
document_part._content_type = CT.WML_DOCUMENT_MAIN

document.save('/Users/doyler/Documents/__ENGAGEMENTS/__DEMO.docx')

Rays-MBP:__ENGAGEMENTS doyler$ file __DEMO.docx
__DEMO.docx: Microsoft OOXML
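For comparison, the same relabelling can be done without python-docx, since a .dotx is a zip archive whose `[Content_Types].xml` names the main part's type. This is an alternative to the python-docx approach above, sketched with only the standard library. The function name `dotx_to_docx` is made up for illustration, and the sketch assumes `[Content_Types].xml` is UTF-8 encoded so that a plain byte replacement finds the type string.

```python
import zipfile

# Content types for the main part of a template and of a regular document.
TEMPLATE_MAIN = (
    b"application/vnd.openxmlformats-officedocument."
    b"wordprocessingml.template.main+xml"
)
DOCUMENT_MAIN = (
    b"application/vnd.openxmlformats-officedocument."
    b"wordprocessingml.document.main+xml"
)


def dotx_to_docx(src, dst):
    """Copy the package at src to dst, relabelling the main part as a document.

    Every archive member is copied unchanged (keeping its original
    compression setting) except [Content_Types].xml, where the template
    content type is swapped for the document content type.
    """
    with zipfile.ZipFile(src) as template, zipfile.ZipFile(dst, "w") as document:
        for item in template.infolist():
            data = template.read(item.filename)
            if item.filename == "[Content_Types].xml":
                data = data.replace(TEMPLATE_MAIN, DOCUMENT_MAIN)
            document.writestr(item, data)
```

A call would look something like `dotx_to_docx('template.dotx', 'report.docx')`, with both file names being placeholders. Unlike the python-docx route, this gives no document object to edit afterwards, so it only fits the case where a straight copy is all that is needed.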

Python dotx Conversion – Conclusion

While it isn’t a huge deal to convert from dotx to docx, that code snippet is making my life easier for now.

It is still not quite ready for release, but here is a screenshot of the output for my newEngagement script.

[Screenshot: Python Dotx Conversion - Demo Script]

Let me know if you have any ideas or suggestions before I release it! Note that it is still geared towards my specific uses, but is easily modifiable.

doyler
Ray Doyle is an avid pentester/security enthusiast/beer connoisseur who has worked in IT for almost 16 years now. From building machines and the software on them, to breaking into them and tearing it all down; he's done it all. To show for it, he has obtained an OSCP, eCPPT, eWPT, eWPTX, eMAPT, Security+, ICAgile CP, ITIL v3 Foundation, and even a sabermetrics certification!

He currently serves as a Senior Penetration Testing Consultant for Secureworks. His previous position was a Senior Penetration Tester for a major financial institution.

When he's not figuring out what cert to get next (currently GXPN) or side project to work on, he enjoys playing video games, traveling, and watching sports.
