Short course: "Open XML II: Editing Documents in the XML"

diegol
StarLounger
Posts: 94
Joined: 28 Jan 2010, 04:15
Location: Buenos Aires, Argentina

Post by diegol »

Hi there,

I have read this short course from Microsoft and found it useful for understanding the basics of editing Open XML documents:

http://office.microsoft.com/en-us/train ... rp10357027

The post-course, quick-reference card:

http://office.microsoft.com/en-us/train ... mode=print

The subject is still somewhat tricky for me, so hopefully this post will help me get back on track if I ever need to resort to this in the future.

Those who have a little bit more time may want to look at part I first (which kind of makes sense :grin:):

http://office.microsoft.com/en-us/train ... 43529.aspx

I know the container format used by MS is ZIP (as opposed to RAR or others), but after reading part II and quickly going through part I, I am still unsure which compression setting (normal, fast, best, etc.) is the correct one for files to work with Office applications. In other words, which compression settings do Office applications themselves use? Any light on this would be appreciated.

Hope you find the courses useful too.

:argentina: Diegol

HansV
Administrator
Posts: 78475
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: Short course: "Open XML II: Editing Documents in the XML"

Post by HansV »

As far as I can tell from Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012 and ZIP File Format, Version 6.2.0 (PKWARE), the Open XML file formats use the DEFLATE compression algorithm.
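
For anyone who wants to check this on a particular file, the ZIP directory records a compression method for every entry. Below is a minimal sketch using Python's standard zipfile module. The helper name list_methods and the tiny in-memory package are illustrative assumptions, not a real Word document; with a real file you would pass its path instead.

```python
import io
import zipfile

# Names for the ZIP compression method IDs that Python's zipfile module defines.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "Store",
    zipfile.ZIP_DEFLATED: "Deflate",
    zipfile.ZIP_BZIP2: "BZip2",
    zipfile.ZIP_LZMA: "LZMA",
}

def list_methods(package):
    """Return {entry name: compression method name} for a ZIP-based package."""
    with zipfile.ZipFile(package) as zf:
        return {
            info.filename: METHOD_NAMES.get(
                info.compress_type, f"method {info.compress_type}"
            )
            for info in zf.infolist()
        }

# Hypothetical stand-in for a .docx: a small in-memory ZIP with two parts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>",
                compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("word/document.xml", "<w:document/>",
                compress_type=zipfile.ZIP_STORED)

methods = list_methods(buf)
for name, method in methods.items():
    print(f"{name}: {method}")
```

The method only says which algorithm was used. The compression level is not stored as such, so it cannot be read back this way.
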
Best wishes,
Hans

Re: Short course: "Open XML II: Editing Documents in the XML"

Post by diegol »

Thank you Hans.

After seeing this list of implementations of the DEFLATE algorithm, I first tried downloading PKZIP, but it's not free.

My second choice: 7-Zip. Two things about 7-Zip:

1. Indeed, the application includes an implementation of the DEFLATE algorithm.
However, there are several compression settings (see attached image): Store, Fastest, Fast, Normal, Maximum, Ultra. Does anyone know whether any of these should be OK?

2. A curious thing - if you right click on a file with an XLSB extension, or DOCX extension, etc., 7-Zip's integrated shell menu includes an option which is not available for other, "normal", file types, namely, "Open archive". So there's a chance 7-Zip's programming team might have undertaken this kind of investigation and provided a friendly user interface.

About to leave for holidays :cool:, so I guess I'll keep looking when I return.

:argentina: Diegol

Re: Short course: "Open XML II: Editing Documents in the XML"

Post by HansV »

I haven't been able to find information about the compression level, sorry.
Best wishes,
Hans

Re: Short course: "Open XML II: Editing Documents in the XML"

Post by diegol »

I could not help myself.

I did a simple test with a Word file. All the compression settings I mentioned in my last post seemed to work OK (Word did not complain on file open), provided the algorithm is "DEFLATE". Even "Store" worked, which applies no compression at all. So the list of good compression settings consists of DEFLATE with any of: Fastest, Fast, Normal, Maximum, Ultra; plus Store. I always left 7-Zip's "Word size" setting at its default value (i.e., 32).

Then I did the same test for an XLSB file using DEFLATE - Normal. Worked OK.

Then I went on to test the other algorithms, always using Normal compression. None of them worked, meaning Word complained on file open. These were: Deflate64, BZip2, LZMA, PPMd.

Something curious: when I did the first test for the Word file, the sequence was:
1. Take a small Word file I have in use
2. Extract all content to a folder
3. Edit document.xml (within the "word" folder) to change a phrase
4. Repackage and rename to .docx
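
The same four steps can also be scripted rather than done by hand. This is only a rough sketch with Python's standard zipfile module, under some assumptions: the helper name replace_in_part is made up for illustration, document.xml is taken to be UTF-8, and the phrase is assumed to sit in one contiguous piece of text (Word often splits a sentence across several runs, in which case a plain string replace will not find it). The in-memory package is a stand-in, not a real document.

```python
import io
import zipfile

def replace_in_part(src, dst, part, old, new):
    """Copy a ZIP package from src to dst, replacing text inside one part."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == part:
                # Assumes the part is UTF-8 encoded XML.
                text = data.decode("utf-8")
                data = text.replace(old, new).encode("utf-8")
            # Every entry is rewritten with Deflate, the archive default above.
            zout.writestr(info.filename, data)

# Hypothetical stand-in for a small .docx.
original = io.BytesIO()
with zipfile.ZipFile(original, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:t>Hello world</w:t>")

edited = io.BytesIO()
replace_in_part(original, edited, "word/document.xml",
                "Hello world", "Hello there")

with zipfile.ZipFile(edited) as zf:
    new_xml = zf.read("word/document.xml").decode("utf-8")
    names = zf.namelist()
print(new_xml)
```

With real files, src and dst would be two different paths, for example the original .docx and a new output name.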

All of the good compression settings mentioned above yielded file sizes of 16 KB (Fastest, Fast, Normal) or 15 KB (Maximum, Ultra), with the exception of Store, which generated a 77 KB file.
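
The size pattern can be reproduced outside 7-Zip. A small sketch with Python's zipfile module follows; note that its Deflate levels 1 to 9 are zlib levels and are only assumed here to be roughly comparable to 7-Zip's Fastest to Ultra presets, not identical to them. The payload and the helper name package_size are made up for the example.

```python
import io
import zipfile

# Made-up, highly repetitive XML-like payload standing in for document.xml.
payload = b"<w:p><w:r><w:t>Some repeated text</w:t></w:r></w:p>" * 2000

def package_size(compression, level=None):
    """Build a one-part package in memory and return its size in bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression, compresslevel=level) as zf:
        zf.writestr("word/document.xml", payload)
    return len(buf.getvalue())

sizes = {"Store": package_size(zipfile.ZIP_STORED)}
for level in (1, 6, 9):
    sizes[f"Deflate level {level}"] = package_size(zipfile.ZIP_DEFLATED, level)

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

All of these outputs decompress to exactly the same bytes. The level only changes how hard the encoder works, not what a reader has to support.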

However, when I open any of these output documents with Word and save, the resulting filesize is 18 KB. So it seems that Word uses none of these compression settings exactly.

If the filesize (or a file hash) had been the same, it would have given some additional peace of mind.
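
A hash of the whole .docx is unlikely to match in any case: the compressed bytes depend on the encoder and its settings, and the ZIP structure also stores metadata such as timestamps. A more forgiving check is to compare the parts themselves. The sketch below reads the CRC-32 and uncompressed size that the ZIP directory keeps for each entry; the helper names and the sample packages are assumptions for illustration.

```python
import io
import zipfile

def part_checksums(package):
    """Return {entry name: (CRC-32, uncompressed size)} from the ZIP directory."""
    with zipfile.ZipFile(package) as zf:
        return {i.filename: (i.CRC, i.file_size) for i in zf.infolist()}

def build(compression, level=None):
    """Build a hypothetical one-part package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression, compresslevel=level) as zf:
        zf.writestr("word/document.xml", b"<w:t>Hello world</w:t>" * 500)
    return buf

fast = build(zipfile.ZIP_DEFLATED, 1)
stored = build(zipfile.ZIP_STORED)

# Same uncompressed content, even though the archives differ byte for byte.
same_content = part_checksums(fast) == part_checksums(stored)
different_bytes = fast.getvalue() != stored.getvalue()
print(same_content, different_bytes)
```

If the per-part checksums agree between the repackaged file and the one Word saved, the content is the same and only the packaging differs. Word may of course rewrite some parts on save, so a mismatch there would not by itself point to a compression problem.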

So I did the ultimate test: I took the 18 KB file Word generated (with the DOCX extension), right-clicked on it, and selected "Open archive" from 7-Zip's shell menu. I edited one word in document.xml to make it two characters shorter (so there was no way that filesize would stay at 18 KB if 7-Zip performed compression with any of the settings I mentioned above) and saved document.xml. 7-Zip asked if I wanted to replace document.xml, and I said yes. I closed the file. It was still at 18 KB!
Word opened it all right. So this gave me hope that 7-Zip understood it was a Word document and used the same compression setting as Office.

But then I repeated the test, taking the 16 KB file compressed by 7-Zip using DEFLATE-Normal. I would have expected that when I closed the file, its filesize would go up to 18 KB. But it stayed at 16 KB. At this point I became dispirited.

In conclusion, I'll use any of the compression settings that seem to work OK (with DEFLATE-Normal being the favorite, as normality soothes even the most restless souls), and may also make use of 7-Zip's right-click menu for quick edits. It would be nice to know the exact compression setting (and the software implementing it) used by MS in case a compatibility issue arises, but for the time being this kind of issue does not seem very likely.

Hans, many thanks for your help.

:argentina: Diegol

Re: Short course: "Open XML II: Editing Documents in the XML"

Post by HansV »

Thanks for testing - it does look like the Office Open XML file formats support the DEFLATE algorithm with any of its compression levels, but no other compression algorithms.
Best wishes,
Hans