Unzipping, editing and zipping ODT documents in Python

This is a Python script written for a single purpose: to test unzipping an OpenOffice.org (LibreOffice) Writer .odt document, searching its contents for a certain text and replacing it with a substitute, and finally zipping everything back together into a new .odt document.
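The same unzip, edit and zip cycle can also be sketched without a temporary directory. The hypothetical helper `replace_in_odt` below is a swapped-in variant, not the approach used by the script in this post: it copies the archive entry by entry in memory and edits only `content.xml`. It assumes the text lives in `content.xml` as UTF-8 and that the token appears there as one contiguous run of text (Writer may split text across XML elements when formatting changes inside it).

```python
import zipfile


def replace_in_odt(src_path, dst_path, token, replacement):
    """Copy an ODT archive entry by entry, replacing token in content.xml.

    Sketch only: nothing is extracted to disk. Entries are written in the
    original order and each reuses its original ZipInfo, so name, timestamp
    and compression type are kept (a stored "mimetype" entry stays first
    and uncompressed if it was so in the source).
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                # content.xml is UTF-8 encoded XML
                text = data.decode("utf-8")
                data = text.replace(token, replacement).encode("utf-8")
            # writestr() with a ZipInfo uses that entry's compress_type
            dst.writestr(info, data)
```

Usage, with the same file names as the script: `replace_in_odt("/tmp/in.odt", "/tmp/out.odt", "TOKEN01", "TEST SUCCESSFUL")`.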

If you copy and paste this text, remember that HTML formatting may spoil the code, so please check it for any introduced mistakes, especially since Python is sensitive to indentation.

#!/usr/bin/env python3
#
# Just a test script
# Demonstrates unzipping, editing and zipping of ODT documents in Python
# Source ODT file "in.odt" shall exist in "/tmp"
# If the ODT file contains the string token, it will be replaced with the string replacement
#

import os
import zipfile

rootDirURL = "/tmp"
tmpDir = "~odt_contents"
tmpDirURL = os.path.join(rootDirURL, tmpDir)
zipSourceFile = "in.odt"
zipSourceFileURL = os.path.join(rootDirURL, zipSourceFile)
zipOutFile = "out.odt"
zipOutFileURL = os.path.join(rootDirURL, zipOutFile)
xmlFile = "content.xml"
xmlFileURL = os.path.join(tmpDirURL, xmlFile)
token = "TOKEN01"
replacement = "TEST SUCCESSFUL"

#
# Unzip ODT
#
print(" -- Extracting ---------------------")
print("%s -> %s" % (zipSourceFileURL, tmpDirURL))
with zipfile.ZipFile(zipSourceFileURL) as zipdata:
    # Remember the original entry list (order, names, compression types)
    zipinfos = zipdata.infolist()
    zipdata.extractall(tmpDirURL)

#
# Find and replace tokens
#
print(" -- Replacing -------------")
print(xmlFileURL)
# content.xml is UTF-8 encoded XML: read it whole, replace, write it back.
# (Printing each line from fileinput would add an extra newline per line.)
with open(xmlFileURL, encoding="utf-8") as f:
    xmlText = f.read()
with open(xmlFileURL, "w", encoding="utf-8") as f:
    f.write(xmlText.replace(token, replacement))

#
# Zip contents of the temporary directory to ODT
# Use the file list from the original archive:
# this preserves the file order and structure in the new Zip file.
# Most importantly, "mimetype" must be the first file in the archive
# and must be stored uncompressed, so every entry keeps its original
# compression type.
#
print(" -- Compressing --------------------")
print("%s -> %s" % (tmpDirURL, zipOutFileURL))
with zipfile.ZipFile(zipOutFileURL, "w") as outzip:
    for zipinfo in zipinfos:
        fileName = zipinfo.filename  # The name and path as stored in the archive
        fileURL = os.path.join(tmpDirURL, fileName)  # The actual name and path on disk
        outzip.write(fileURL, fileName, compress_type=zipinfo.compress_type)
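To check that the rebuilt archive still follows the packaging rule the script relies on, a small check can be run on the output. The helper `check_odt_container` is hypothetical and only a sketch: it verifies that "mimetype" is the first entry, that it is stored uncompressed, that it holds the ODT media type, and that all CRCs are valid. It does not validate the XML itself.

```python
import zipfile

ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"


def check_odt_container(path):
    """Return a list of packaging problems in an ODT file (empty if none)."""
    problems = []
    with zipfile.ZipFile(path) as z:
        infos = z.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append('"mimetype" is not the first entry')
        else:
            first = infos[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append('"mimetype" is compressed')
            if z.read(first) != ODT_MIMETYPE:
                problems.append('"mimetype" does not contain the ODT media type')
        # testzip() returns the name of the first corrupt entry, or None
        bad = z.testzip()
        if bad is not None:
            problems.append("CRC error in %s" % bad)
    return problems
```

For example, `print(check_odt_container("/tmp/out.odt"))` should print an empty list if the script produced a well-formed container.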