Word z HTML v Jave

Tento krátky článok ukazuje ako pomocou knižnice docx4j vytvoriť dokument typu docx (Word).

Potrebné knižnice môžeme nájsť na webe Docx4Java.org alebo v centrálnom repozitári mavenu (tuto cestu som si zvolil aj ja).

Maven dependency:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>6.0.1</version>
    <type>jar</type>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>6.0.1</version>
</dependency>

Vzorové použitie

Vytvorenie docx dokumentu s jednou tabuľkou:

package net.adamjak.tomas.htmltodocx;

import java.io.File;
import javax.xml.bind.JAXBException;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

public class Main {

    private static final String HTML_CODE = "<html><head></head><body><h2>HTML Table</h2><table><tr><th>Company</th><th>Contact</th><th>Country</th></tr><tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr><tr><td>Centro comercial Moctezuma</td><td>Francisco Chang</td><td>Mexico</td></tr><tr><td>Ernst Handel</td><td>Roland Mendel</td><td>Austria</td></tr><tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr><tr><td>Laughing Bacchus Winecellars</td><td>Yoshi Tannamuri</td><td>Canada</td></tr><tr><td>Magazzini Alimentari Riuniti</td><td>Giovanni Rovelli</td><td>Italy</td></tr></table></body></html>";
    private static final String USER_DIR = System.getProperty("user.dir");
    private static final File OUTPUT_FILE = new File(USER_DIR + "/output.docx");

    public static void main(String[] args) throws Docx4JException, JAXBException {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();

        XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);

        wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(HTML_CODE, null));
        wordMLPackage.save(OUTPUT_FILE);
    }
}

Zdroje:

[1] https://www.docx4java.org/trac/docx4j
[2] https://github.com/plutext/docx4j-ImportXHTML/blob/master/src/samples/java/org/docx4j/samples/ConvertInXHTMLDocument.java