
MarkupCompatibilityProcessSettings corrupts documents that set ligatures in the document text #2086

@fsberuk


Describe the bug
Opening a DOCX document with OpenSettings that specify MarkupCompatibilityProcessSettings corrupts the document on save when:

  • the document has a ligature setting (w14:ligatures) in the document text
  • the FileFormatVersions of the MarkupCompatibilityProcessSettings is set to a version that supports ligatures (Office2013 and above)
  • the MarkupCompatibilityProcessMode is set to ProcessAllParts

No other operation on the document is needed; opening and saving it is enough.

In the original document, document.xml contains a ligature setting (e.g. <w14:ligatures w14:val="standardContextual"/>).
In the saved document's document.xml, the ligature setting has been changed to <w14:ligatures/>, which has lost its required w14:val attribute and is therefore malformed.
See Open Specifications, section 2.6.1.17 ligatures.

To Reproduce
LigaturesHistorical.docx

using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

namespace BreakLigatures
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var inputFullFilePath = args.Length > 0 ? args[0] : throw new ArgumentException("Input file path is required as the first argument.");
            var outputFullFilePath = args.Length > 1 ? args[1] : throw new ArgumentException("Output file path is required as the second argument.");

            // Copy the input into an expandable MemoryStream (new MemoryStream(byte[]) cannot grow)
            using var documentStream = new MemoryStream();
            documentStream.Write(File.ReadAllBytes(inputFullFilePath));
            documentStream.Position = 0;

            // Process markup compatibility in all parts, targeting Office 2013
            var openSettings = new OpenSettings()
            {
                MarkupCompatibilityProcessSettings = new MarkupCompatibilityProcessSettings(MarkupCompatibilityProcessMode.ProcessAllParts, FileFormatVersions.Office2013),
            };

            // Dispose the document before reading the stream so the package is fully written
            using (var document = WordprocessingDocument.Open(documentStream, true, openSettings))
            {
                // Do nothing; just save
                document.Save();
            }

            File.WriteAllBytes(outputFullFilePath, documentStream.ToArray());
        }
    }
}

Steps to reproduce the behavior:

  1. Use the attached LigaturesHistorical.docx and continue at step 8, or proceed to generate your own document
  2. Open MS Word and create a new blank document
  3. Insert some text
  4. Select some text
  5. Right-click the selected text and select "Font..." from the menu
  6. On the "Advanced" tab, set the "Ligatures" option to a different value
  7. Save the document
  8. Run the program above with the document
  9. Open the resulting document in MS Word
  10. Observe the error "Word found unreadable content..."
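As a scriptable alternative to steps 9 and 10, the saved document.xml can be scanned for <w14:ligatures> elements that lack w14:val. This is a minimal sketch, not part of the original report: it assumes the main document part is stored at word/document.xml (the usual location; a robust check would resolve it through the package relationships), and the helper name CountLigaturesMissingVal is hypothetical. It uses only System.IO.Compression and System.Xml.Linq, so it does not depend on the Open XML SDK behavior under test.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Namespace bound to the w14 prefix (Word 2010 WordprocessingML extensions).
XNamespace w14 = "http://schemas.microsoft.com/office/word/2010/wordml";

// Hypothetical helper: counts <w14:ligatures> elements in word/document.xml
// that lack the required w14:val attribute.
// Assumption: the main document part lives at word/document.xml.
int CountLigaturesMissingVal(Stream docxStream)
{
    using var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true);
    var entry = archive.GetEntry("word/document.xml")
        ?? throw new InvalidDataException("word/document.xml not found in package.");
    using var partStream = entry.Open();
    var xml = XDocument.Load(partStream);
    return xml.Descendants(w14 + "ligatures")
              .Count(el => el.Attribute(w14 + "val") is null);
}

// Usage: pass the path of the saved .docx as the first argument.
if (args.Length > 0)
{
    using var file = File.OpenRead(args[0]);
    Console.WriteLine($"w14:ligatures without w14:val: {CountLigaturesMissingVal(file)}");
}
```

A non-zero count for the saved document indicates the malformed <w14:ligatures/> elements described in the bug report; the original document should report zero.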

Observed behavior
Opening a document that contains a ligature setting with MarkupCompatibilityProcessSettings and saving it produces an invalid document.

Expected behavior
Opening a document that contains a ligature setting with MarkupCompatibilityProcessSettings and saving it should produce a valid document.
If the file format version supports the feature, the ligature setting should be retained as is.

Desktop:

  • OS: Windows 10/11
  • Office version 16.0.19929.20136
  • .NET target: .NET 8
  • DocumentFormat.OpenXml version: 3.3.0 to 3.5.1
