Monday, August 31, 2015

Inspecting ODF round trips for attribute retention

Given an office application, one might like to know which attributes are preserved properly across a load and save cycle. For example, is the background color or margin size mutated just by loading and saving an ODF file with OfficeAppFoo version 0.1?

The odfautotests project includes many tests on simple ODF documents to see how well each office application preserves the information in the document. Testing ODF attribute preservation is not as simple as one might first imagine, though. Consider the document below with a single paragraph using a custom style:

<office:text>
  <text:p text:style-name="style">hello world</text:p>
</office:text>

In the styles.xml file one might see something like the following:

<style:style
  style:display-name="TestStyle"
  style:family="paragraph"
  style:name="style"
  style:parent-style-name="standard">
    <style:text-properties fo:background-color="transparent"/>
</style:style>

This input is obviously designed to see how well the fo:background-color style information is preserved by office applications. One thing to notice is that the style:family attribute in the above is paragraph.

If one loads and saves a document containing the above fragments using LibreOffice 4.3.x, one might see something like the following in the output ODF file. In content.xml:

<text:p text:style-name="TestStyle">hello world</text:p>

And in the styles.xml file the background-color attribute is preserved:

<style:style style:name="TestStyle"
     style:family="paragraph"
     style:parent-style-name="standard">
      <style:text-properties fo:background-color="transparent"/>
</style:style>

One can test whether the attribute has been preserved using XPath: select on the @style-name of the text:p, then check that the matching style:style has the desired fo:background-color attribute on its style:text-properties child.

The XPath might look something like the below, which has been formatted for display:

//s:style[
  @s:display-name='TestStyle' 
  or (not(@s:display-name) and @s:name='TestStyle')]
/s:text-properties/@fo:background-color
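As a rough illustration of what that XPath checks, here is a minimal Python sketch using only the standard library. It is not code from odfautotests. The STYLES_XML fragment is a hypothetical, cut-down styles.xml modelled on the LibreOffice output above, and direct_background is a made-up helper name. ElementTree's XPath subset has no or / not(), so the condition is spelled out in Python instead.

```python
import xml.etree.ElementTree as ET

NS = {
    "s": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hypothetical, cut-down styles.xml modelled on the LibreOffice output.
STYLES_XML = """
<office:document-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:styles>
    <style:style style:name="TestStyle" style:family="paragraph"
                 style:parent-style-name="standard">
      <style:text-properties fo:background-color="transparent"/>
    </style:style>
  </office:styles>
</office:document-styles>
"""


def qname(prefix, local):
    """Build the '{namespace}local' form ElementTree uses for names."""
    return "{%s}%s" % (NS[prefix], local)


def direct_background(root, style_name):
    """Match on display-name, or on name when no display-name is set,
    then read fo:background-color from the style's text-properties."""
    for style in root.iter(qname("s", "style")):
        display = style.get(qname("s", "display-name"))
        name = style.get(qname("s", "name"))
        if display == style_name or (display is None and name == style_name):
            props = style.find("s:text-properties", NS)
            if props is not None:
                return props.get(qname("fo", "background-color"))
    return None


root = ET.fromstring(STYLES_XML)
result = direct_background(root, "TestStyle")
print(result)
```

A test would then compare result against the value written into the input document, 'transparent' in this case.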

Performing the load and save using Word 2016 is quite interesting. The resulting content.xml file might have:

<style:style style:name="P1"
   style:parent-style-name="TestStyle"
   style:master-page-name="MP0"
   style:family="paragraph">
     <style:paragraph-properties fo:break-before="page"/>
</style:style>
...
<office:text text:use-soft-page-breaks="true">
  <text:p text:style-name="P1">hello world</text:p>
</office:text>

In styles.xml the background-color setting is pushed up to the default style for the paragraph family:

<style:style style:name="TestStyle"
   style:display-name="TestStyle"
   style:family="paragraph">
      <style:text-properties fo:hyphenate="false"/>
</style:style>

<style:default-style style:family="paragraph">
...
<style:text-properties ... fo:background-color="transparent"

So to see if the output ODF has the fo:background-color setting one has to consider not just the directly used style "P1" but also parent style elements which might contain the attribute instead. In this case it was pushed right up to the paragraph style.

For the Word output the above XPath doesn't necessarily work. If the attribute we are looking for has been pushed up to the paragraph level then we should look for it there instead. When looking at the paragraph level we also need to be sure that there is no attribute set directly at the lower, TestStyle, level. It also helps to ensure in the selection that the paragraph level really is a parent of TestStyle, or of P1 in the above.
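The lookup order described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how odfautotests does it. The DOC fragment is a hypothetical merged view of the Word output (in a real package P1 lives in content.xml and the rest in styles.xml), and effective_text_property is a made-up helper name. It walks the style:parent-style-name chain first and only then falls back to the style:default-style of the same family.

```python
import xml.etree.ElementTree as ET

S = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical merged view of the Word output shown above.
DOC = """
<root xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
      xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:default-style style:family="paragraph">
    <style:text-properties fo:background-color="transparent"/>
  </style:default-style>
  <style:style style:name="TestStyle" style:display-name="TestStyle"
               style:family="paragraph">
    <style:text-properties fo:hyphenate="false"/>
  </style:style>
  <style:style style:name="P1" style:parent-style-name="TestStyle"
               style:family="paragraph">
    <style:paragraph-properties fo:break-before="page"/>
  </style:style>
</root>
"""


def text_property(element, attr):
    """Return the fo:<attr> value on the element's text-properties, or None."""
    props = element.find("{%s}text-properties" % S)
    return None if props is None else props.get("{%s}%s" % (FO, attr))


def effective_text_property(root, style_name, attr):
    """Return (value, where_found), checking the style, then its parents,
    then the default-style of the same family."""
    styles = {s.get("{%s}name" % S): s for s in root.iter("{%s}style" % S)}
    family = None
    seen = set()
    name = style_name
    while name in styles and name not in seen:  # 'seen' guards against cycles
        seen.add(name)
        style = styles[name]
        family = family or style.get("{%s}family" % S)
        value = text_property(style, attr)
        if value is not None:
            return value, name
        name = style.get("{%s}parent-style-name" % S)
    if family is not None:
        for default in root.iter("{%s}default-style" % S):
            if default.get("{%s}family" % S) == family:
                value = text_property(default, attr)
                if value is not None:
                    return value, "default-style"
    return None, None


root = ET.fromstring(DOC)
background = effective_text_property(root, "P1", "background-color")
hyphenate = effective_text_property(root, "P1", "hyphenate")
print(background)
print(hyphenate)
```

Returning where the value was found makes it easy for a test to report that an application moved the setting rather than dropped it.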

After a bit of pondering I found an interesting solution that can be evaluated using plain XPath 1.0. To test the value I pick off the fo:background-color from both the TestStyle and the paragraph level. If those values are passed to concat() then, if the attribute is only at the TestStyle or only at the paragraph level, we get something that can be used to test the value. If the attribute appears at both levels we are in trouble.

For example:

<style:style style:name="TestStyle">
  <style:text-properties ... fo:background-color="transparent"/>
</style:style>
<style:default-style style:family="paragraph">
  <style:text-properties ... fo:background-color="#FF0000"/>
</style:default-style>

Considering the pseudo-XPath query concat( TestStyle/@fo:background-color, paragraph/@fo:background-color ), the result would be 'transparent#FF0000', which would not match a string comparison with 'transparent'.

The trick is to use a predicate (the square-bracket selector) on the second item in the concat() call. If we only return the paragraph/@fo:background-color value when there is no value associated with the TestStyle, then the concat() will effectively return only one or the other: the value set directly on TestStyle, or, when nothing is set on TestStyle, the attribute from paragraph.
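To make the guard concrete, here is a small Python sketch that emulates it with the standard library. The XPath in the comment is one possible spelling of the guarded concat(), written for this illustration rather than taken from odfautotests. ElementTree cannot evaluate concat() or not(), so guarded_concat (a made-up helper) reproduces the same logic by hand, and make_styles builds hypothetical input fragments.

```python
import xml.etree.ElementTree as ET

S = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# One possible XPath 1.0 form of the guarded query (an assumption, shown
# for orientation only; it is not evaluated by this script):
#
#   concat(
#     //s:style[@s:name='TestStyle']/s:text-properties/@fo:background-color,
#     //s:default-style[@s:family='paragraph']
#       [not(//s:style[@s:name='TestStyle']
#              /s:text-properties/@fo:background-color)]
#       /s:text-properties/@fo:background-color)


def make_styles(test_style_color, default_color):
    """Build a hypothetical styles fragment; pass None to omit the attribute."""
    def props(color):
        if color is None:
            return "<style:text-properties/>"
        return '<style:text-properties fo:background-color="%s"/>' % color
    return ET.fromstring(
        '<styles xmlns:style="%s" xmlns:fo="%s">'
        '<style:style style:name="TestStyle" style:family="paragraph">'
        '%s</style:style>'
        '<style:default-style style:family="paragraph">'
        '%s</style:default-style>'
        '</styles>' % (S, FO, props(test_style_color), props(default_color))
    )


def text_property(element, attr):
    """Return the fo:<attr> value on the element's text-properties, or None."""
    props = element.find("{%s}text-properties" % S)
    return None if props is None else props.get("{%s}%s" % (FO, attr))


def guarded_concat(root, style_name, family, attr):
    """Concatenate the direct value with the default-style value, but only
    consult the default-style when the style itself sets nothing."""
    direct = None
    for style in root.iter("{%s}style" % S):
        if style.get("{%s}name" % S) == style_name:
            direct = text_property(style, attr)
            break
    fallback = None
    if direct is None:  # plays the role of the not(...) predicate
        for default in root.iter("{%s}default-style" % S):
            if default.get("{%s}family" % S) == family:
                fallback = text_property(default, attr)
                break
    return (direct or "") + (fallback or "")


both = guarded_concat(make_styles("transparent", "#FF0000"),
                      "TestStyle", "paragraph", "background-color")
only_default = guarded_concat(make_styles(None, "transparent"),
                              "TestStyle", "paragraph", "background-color")
print(both)
print(only_default)
```

With the guard in place, the case where the attribute sits at both levels yields just the TestStyle value instead of the two values run together.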

With this the query can allow the office application to move the information to a parent of the style and still match for a test.