While not exactly security related, I’ve had to do some Python dotx conversion to docx files recently.
Python dotx Conversion – Introduction
I’ve been working on a tool (coming soon?) for automating the my pentest engagement organization.
Part of this tool required me to copy over a .dotx template and save it as a .docx file. The reason for this is that our report templates are .dotx by default, but I wanted to start with a blank .docx for each engagement.
First, I installed python-docx.
Rays-MBP:tools doyler$ pip install python-docx Collecting python-docx Downloading python-docx-0.8.6.tar.gz (5.3MB) 100% |████████████████████████████████| 5.3MB 26kB/s Requirement already satisfied: lxml>=2.3.2 in /usr/local/lib/python2.7/site-packages (from python-docx) Building wheels for collected packages: python-docx Running setup.py bdist_wheel for python-docx ... done Stored in directory: /Users/doyler/Library/Caches/pip/wheels/cc/74/10/42b00d7d6a64cf21f194bfef9b94150009ada880f13c5b2ad3 Successfully built python-docx Installing collected packages: python-docx Successfully installed python-docx-0.8.6
With this installed, I figured I’d be able to go about implementing it in my script. Unfortunately, python-docx does not yet support dotx files out of the box.
Adding support for dotx files
Based on the above GitHub issue, I needed to make a few simple changes to support dotx files.
First, I added the proper content types to api.py. I added macro enabled templates as well, just in case.
Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/api.py ... def Document(docx=None): """ Return a |Document| object loaded from *docx*, where *docx* can be either a path to a ``.docx`` file (a string) or a file-like object. If *docx* is missing or ``None``, the built-in default document "template" is loaded. """ docx = _default_docx_path() if docx is None else docx document_part = Package.open(docx).main_document_part if document_part.content_type != CT.WML_DOCUMENT_MAIN: tmpl = "file '%s' is not a Word file, content type is '%s'" raise ValueError(tmpl % (docx, document_part.content_type)) return document_part.document if document_part.content_type not in [CT.WML_DOCUMENT_MAIN, 'application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml', 'application/vnd.ms-word.document.macroEnabled.main+xml']:
Next, I added the DocumentPart for these content types to the PartFactory in init.
Rays-MBP:__ENGAGEMENTS doyler$ vi /usr/local/lib/python2.7/site-packages/docx/__init__.py ... PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart PartFactory.part_type_for[CT.WML_DOCUMENT_MAIN] = DocumentPart PartFactory.part_type_for['application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml'] = DocumentPart PartFactory.part_type_for['application/vnd.ms-word.document.macroEnabled.main+xml'] = DocumentPart
With the monkey-patches in place, it was time to write the script to create my document.
This is a very simple excerpt, but it opens my template, sets the content type, and saves the file with the new extension.
from docx import Document from docx.opc.constants import CONTENT_TYPE as CT document = Document('appsec/web_application_assessment_report.dotx') document_part = document.part document_part._content_type = CT.WML_DOCUMENT_MAIN document.save('/Users/doyler/Documents/__ENGAGEMENTS/__DEMO.docx')
Rays-MBP:__ENGAGEMENTS doyler$ file __DEMO.docx __DEMO.docx: Microsoft OOXML
Python dotx Conversion – Conclusion
While it isn’t a huge deal to convert from dotx to docx, that code snippet is making my life easier for now.
It is still not quite ready for release, but here is a screenshot of the output for my newEngagement script.
Let me know if you have any ideas or suggestions before I release it! Note that it is still geared towards my specific uses, but is easily modifiable.