...
==== Document properties ====
Microsoft Office HTML Example
* **size**: Page size.
* **margin** : Document margins.
* **mso-page-orientation**: portrait or landscape.
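These three properties live in an ''@page'' rule of the document's stylesheet. Here is a minimal sketch in php, assuming an A4 portrait page with 1 cm margins (the values found in the MIME example further down this page). The function name ''pageCss()'' is made up for illustration.

```php
<?php
// Hypothetical helper: builds the @page rule that applies to the whole document.
function pageCss($size = '21cm 29.7cm', $margin = '1cm 1cm 1cm 1cm', $orientation = 'portrait')
{
    return "@page\n{\n"
         . "    size: $size;\n"                          // page size (default: A4)
         . "    margin: $margin;\n"                      // top, right, bottom, left
         . "    mso-page-orientation: $orientation;\n"   // portrait or landscape
         . "}\n";
}

// Emit the rule inside a stylesheet block.
echo "<style>\n" . pageCss() . "</style>\n";
```

Calling ''pageCss('29.7cm 21cm', '2cm 2cm 2cm 2cm', 'landscape')'' would describe a landscape A4 page instead.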
==== Page declaration ====
Pages (or groups of pages) should be placed in a "section" (a ''DIV''), like this:
Microsoft Office HTML Example
I'm page 1.
I'm page 2.
* **@page** is used to set properties of the whole document.
* Each **@page SectionX** can be used to change the properties of a group of pages.
* Each page or group of pages must be put in a ''DIV'' bound to one of these sections (such as ''Section1'' in our example). Feel free to create as many "sections" as needed.
Caveat: Changing things like page orientation for a group or a specific page does not seem to work. FIXME
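To make the relation between ''@page'', ''@page Section1'' and the ''DIV'' concrete, here is a sketch modelled on the HTML embedded in the MIME example further down this page. The helper name ''sectionDocument()'' is hypothetical, and the page values are only an assumption.

```php
<?php
// Hypothetical helper: wraps body HTML in a single Word "section".
function sectionDocument($body, $section = 'Section1')
{
    return "<html xmlns:o='urn:schemas-microsoft-com:office:office' "
         . "xmlns:w='urn:schemas-microsoft-com:office:word' "
         . "xmlns='http://www.w3.org/TR/REC-html40'>\n"
         . "<head><title>Microsoft Office HTML Example</title>\n"
         . "<style><!--\n"
         . "@page { size: 21cm 29.7cm; margin: 1cm 1cm 1cm 1cm; }\n"  // whole document
         . "@page $section { }\n"                                     // properties of this group of pages
         . "div.$section { page: $section; }\n"                       // ties the DIV to that @page rule
         . "--></style>\n</head>\n<body>\n"
         . "<div class=$section>\n$body\n</div>\n"
         . "</body>\n</html>\n";
}

echo sectionDocument("I'm page 1.");
```

The body can hold several pages, as long as they are separated by page breaks.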
==== Standard HTML/CSS elements ====
Word will accept most standard HTML and CSS features, such as headings (h1,h2,h3...), lists (ul,li), tables, colors... Go experiment yourself. Here are some examples:
Microsoft Office HTML Example
Title level 1
Title level 2
Title level 3
Text in level 3
2nd title level 2
Another level 3 title
List:
- element 1
- element 2
- element 3
- element 4
- element 5
- element 6
- element 7
- element 8
- element 9
- element 10
Column A Column B Column C
A1 B1 C1
A2 B2 Test with looooong text: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla sed sapien
ac tortor porttitor lobortis. Donec velit urna, vulputate eu egestas eu, lacinia non dolor. Cras lacus diam, tempus
sed ullamcorper a, euismod id nunc. Fusce egestas velit sed est fermentum tempus. Duis sapien dui, consectetur eu
accumsan id, tristique sit amet ante. C2
A3 B3 C3
Rendering in Word:
{{ :wordgen:wordgen_basic_html.png }}
Note that these elements can later be styled, either using inline CSS (the ''style'' attribute) or using CSS stylesheets (e.g. you can style all ''h1'' elements).
==== Forcing display mode on opening ====
Microsoft Office HTML Example
...
This will force the "Page" display mode when the file is opened. This section **must** be put just after the ''title'', otherwise it will not work.
You can use 80 or 90 for Zoom if you want two pages to fit on screen.
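The HTML embedded in the MIME example further down this page contains such a block right after the ''title''. The sketch below rebuilds it in php; ''wordViewHead()'' is a hypothetical name, and the caller is expected to append the stylesheet and the closing ''head'' tag.

```php
<?php
// Hypothetical helper: opens the <head> with the title, immediately followed
// by Word's view settings ("Print" view, i.e. page display mode, and zoom).
function wordViewHead($title, $zoom = 100)
{
    $zoom = (int) $zoom; // zoom percentage, e.g. 80, 90 or 100
    return "<head><title>" . htmlspecialchars($title) . "</title>\n"
         . "<!--[if gte mso 9]>\n"
         . "<xml><w:WordDocument><w:View>Print</w:View><w:Zoom>$zoom</w:Zoom>"
         . "<w:DoNotOptimizeForBrowser/></w:WordDocument></xml>\n"
         . "<![endif]-->\n";
}

echo wordViewHead('Microsoft Office HTML Example', 90);
```

The ''w:'' prefix assumes the ''html'' element declares the ''xmlns:w'' namespace, as the document in the MIME example does.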
==== Page break ====
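The document embedded in the MIME example further down this page separates its two pages with a ''br'' element carrying Word-specific styles. Here is a small sketch reusing that markup; the constant and function names are made up for illustration.

```php
<?php
// Word-specific page break, copied from the MIME example on this page.
const WORD_PAGE_BREAK = "<br clear=all style='mso-special-character:line-break;page-break-before:always'>";

// Hypothetical helper: joins page bodies, inserting a page break between them.
function joinPages(array $pages)
{
    return implode("\n" . WORD_PAGE_BREAK . "\n", $pages);
}

echo joinPages(array("I'm page 1.", "I'm page 2."));
```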
==== Tables and pagebreaks ====
=== Prevent a table cell from spanning over multiple pages ===
Put in your stylesheet:
td { page-break-inside:avoid; }
or apply to only specific cells:
...
=== Prevent tables from spanning over multiple pages ===
Put in your stylesheet:
tr { page-break-after:avoid; }
or apply to all TR of a table:
...
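As an illustration of the inline variant, the sketch below renders a table whose rows and cells carry the two properties in their ''style'' attribute instead of relying on the stylesheet. The helper name ''unbreakableTable()'' and the ''border'' attribute are assumptions of mine.

```php
<?php
// Hypothetical helper: renders rows (arrays of cell texts) as an HTML table
// with the page-break hints set inline on every TR and TD.
function unbreakableTable(array $rows)
{
    $html = "<table border=1>\n";
    foreach ($rows as $cells) {
        $html .= "<tr style='page-break-after:avoid;'>";
        foreach ($cells as $cell) {
            $html .= "<td style='page-break-inside:avoid;'>" . htmlspecialchars($cell) . "</td>";
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

echo unbreakableTable(array(
    array('Column A', 'Column B', 'Column C'),
    array('A1', 'B1', 'C1'),
));
```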
==== A note about computed fields ====
Computed fields include the TOC (table of contents), page references and so on.
With Word, when you open a document, computed fields are **not** updated by default. This has to be done manually by pressing CTRL+A (to select the whole document), then F9.
Thus, the computed fields you insert in your document will not show up unless the user manually updates them. This is a problem related to Word itself, and there is no simple solution to it.
==== TOC (Table of contents) ====
Table of content - Please right-clic and choose "Update fields".
As you can't predict the page numbers, the TOC needs to be manually updated by the user (Not a heavy burden, and I guess it's possible to automate this by including a script in the file. I'll investigate that later.)
TOC before update (upon file opening):
{{ :wordgen:wordgen_toc1.png }}
TOC after update: It reflects the different heading levels (h1,h2,h3...).
{{ :wordgen:wordgen_toc2.png }}
If you want to customize the TOC, have a look at the [[http://office.microsoft.com/en-us/word-help/field-codes-toc-table-of-contents-field-HP005186201.aspx|Microsoft documentation]] about this dynamic field.
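The footer of the MIME example further down this page shows how Word encodes a field in HTML (field-begin, field code, field-separator, placeholder, field-end). Assuming the TOC field follows the same pattern, it could be generated as sketched below. The helper name ''wordField()'' is hypothetical and the switches ''\o "1-3" \u'' are only an assumption; check them against the Microsoft documentation.

```php
<?php
// Hypothetical helper: wraps a Word field code in the field markup used by
// the footer of the MIME example (field-begin / field-separator / field-end).
// $placeholder is what the reader sees until the fields are updated.
function wordField($code, $placeholder)
{
    return "<!--[if supportFields]><span style='mso-element:field-begin'></span>"
         . " $code <span style='mso-element:field-separator'></span><![endif]-->"
         . $placeholder
         . "<!--[if supportFields]><span style='mso-element:field-end'></span><![endif]-->";
}

// Assumed switches: build the table from heading levels 1 to 3.
$toc = wordField('TOC \o "1-3" \u', 'Table of content - Please right-click and choose "Update fields".');
echo '<p>' . $toc . "</p>\n";
```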
==== Bookmarks and references ====
You can reference another chapter or page rather easily. Here is an example: Set a bookmark in a document, and display the page where this bookmark is located.
Bookmarks: Simply put an HTML anchor:
Appendix
Then the reference:
For more information, see appendix at page
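A possible way to generate both pieces is sketched below. It assumes the reference is a Word ''PAGEREF'' field written with the same field markup as the page number in the footer of the MIME example further down this page; the helper names and the bookmark name ''appendix'' are made up.

```php
<?php
// Bookmark: a plain HTML anchor around the target text.
function bookmark($name, $text)
{
    return '<a name="' . $name . '">' . $text . '</a>';
}

// Hypothetical helper: a PAGEREF field showing the page of the given bookmark.
// The "?" placeholder stays visible until the user updates the fields (CTRL+A, F9).
function pageRef($name)
{
    return "<!--[if supportFields]><span style='mso-element:field-begin'></span>"
         . " PAGEREF $name <span style='mso-element:field-separator'></span><![endif]-->"
         . "?"
         . "<!--[if supportFields]><span style='mso-element:field-end'></span><![endif]-->";
}

echo '<h1>' . bookmark('appendix', 'Appendix') . "</h1>\n";
echo '<p>For more information, see appendix at page ' . pageRef('appendix') . "</p>\n";
```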
==== Header and footer ====
Headers and footers must be put in a separate file, in a subdirectory. Example:
* **''mydocument.htm''** : The main document
* **''mydocument_files\headerfooter.htm''** : The header and footer.
__Note__: It is important that the subdirectory name starts with the main document name (**mydocument**.htm -> **mydocument**_files), otherwise Word will display a warning.
Microsoft Office HTML Example
I'm page 1.
I'm page 2.
Note that the file ''filelist.xml'' does //not// need to be present, but its declaration in the main document is mandatory.
Header
After opening the document, don't forget to switch to "Page" display mode to see headers/footers.
This is what you get:
{{ :wordgen:wordgen_headerfooter.png }}
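The two files can be sketched as follows, based on what is embedded in the MIME example further down this page: the header/footer file holds two ''DIV'' elements with ids ''h1'' and ''f1'', and the ''@page'' rule of the main document points to them. The function names are hypothetical, and the markup is simplified (no page number fields).

```php
<?php
// Hypothetical helper: contents of mydocument_files/headerfooter.htm.
function headerFooterHtml($header, $footer)
{
    return "<html xmlns:o=\"urn:schemas-microsoft-com:office:office\" "
         . "xmlns:w=\"urn:schemas-microsoft-com:office:word\" "
         . "xmlns=\"http://www.w3.org/TR/REC-html40\">\n<body>\n"
         . "<div style=\"mso-element:header\" id=\"h1\">\n<p class=MsoHeader>$header</p>\n</div>\n"
         . "<div style=\"mso-element:footer\" id=\"f1\">\n<p class=MsoFooter>$footer</p>\n</div>\n"
         . "</body>\n</html>\n";
}

// Hypothetical helper: @page properties of the main document that reference
// the header (id h1) and footer (id f1) of the separate file.
function headerFooterPageCss($file = 'mydocument_files/headerfooter.htm')
{
    return "@page\n{\n"
         . "    mso-header: url(\"$file\") h1;\n"
         . "    mso-footer: url(\"$file\") f1;\n"
         . "}\n";
}

// Declaration required in the <head> of the main document
// (the filelist.xml file itself does not need to exist).
$fileListLink = '<link rel=File-List href="mydocument_files/filelist.xml">';

echo headerFooterHtml('Header', 'Footer');
```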
==== Images ====
As with the header/footer, images must be put in a subdirectory. Then you just use the standard ''img'' HTML tag. Example:
* ''mydocument.htm'' : The main document
* ''mydocument_files/logo_google.png'' : The image to include.
Microsoft Office HTML Example
Here is an image:
Result in Word:
{{ :wordgen:wordgen_image.png }}
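A tiny sketch of how the tag could be generated so that the path always follows the ''mydocument_files'' naming rule; ''imageTag()'' is a hypothetical helper.

```php
<?php
// Hypothetical helper: <img> tag pointing into the document's "_files" subdirectory.
function imageTag($imageFile, $document = 'mydocument.htm')
{
    // mydocument.htm -> mydocument_files
    $dir = pathinfo($document, PATHINFO_FILENAME) . '_files';
    return '<img src="' . $dir . '/' . basename($imageFile) . '">';
}

echo 'Here is an image: ' . imageTag('logo_google.png') . "\n";
```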
==== Styling ====
You can include a CSS stylesheet in the main html file: Word will use it.
You can style standard HTML elements (''h1,h2,h3...''), but you can also apply styles with the ''class'' attribute.
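For instance, a stylesheet could be assembled from a list of rules as in the sketch below. The helper ''styleSheet()'', the class name ''warning'' and the chosen colors are all made up for illustration.

```php
<?php
// Hypothetical helper: builds a <style> block from selector => declarations pairs.
function styleSheet(array $rules)
{
    $css = "<style><!--\n";
    foreach ($rules as $selector => $declarations) {
        $css .= "$selector { $declarations }\n";
    }
    return $css . "--></style>\n";
}

echo styleSheet(array(
    'h1'        => 'color: navy; font-size: 18pt;',     // styles every h1 element
    'p.warning' => 'color: red; font-weight: bold;',    // styles <p class=warning> only
));
echo "<h1>Title</h1>\n<p class=warning>Styled through the class attribute.</p>\n";
```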
==== Exploring other Word documents features ====
If you are trying to find the HTML code corresponding to a Word feature, here's my advice:
* Create a blank document in Word.
* Type a few words and use the feature you need.
* Save as HTML page (Save as => Other format => Web page (*.htm,*.html) (//**NOT**// filtered))
Then open the file with your favorite text editor. You are most likely to find the relevant HTML/CSS code.
===== Creating a MIME (mhtml) file =====
Once you have created your html document with its associated files (headers/footer, images...), you need to pack them in a single mhtml (MIME) file.
Let's take an example: A document with header/footer and an image. The files contained in this zip ({{:wordgen:mime_example.zip}}) are:
* ''mydocument.htm'' : The main document
* ''mydocument_files/headerfooter.htm'' : The header and footer
* ''mydocument_files/smiley.gif'' : An image.
Building a MIME file only requires encoding these files in base64 and adding a header for each one:
MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_ZROIIZO.ZCZYUACXV.ZARTUI"

------=_NextPart_ZROIIZO.ZCZYUACXV.ZARTUI
Content-Location: file:///C:/mydocument.htm
Content-Transfer-Encoding: base64
Content-Type: text/html; charset="utf-8"

PGh0bWwgeG1sbnM6bz0ndXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTpvZmZpY2U6b2ZmaWNlJyB4
bWxuczp3PSd1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTp3b3JkJyB4bWxucz0naHR0
cDovL3d3dy53My5vcmcvVFIvUkVDLWh0bWw0MCc+DQo8aGVhZD48dGl0bGU+TWljcm9zb2Z0IE9m
ZmljZSBIVE1MIEV4YW1wbGU8L3RpdGxlPg0KPCEtLVtpZiBndGUgbXNvIDldPg0KPHhtbD48dzpX
b3JkRG9jdW1lbnQ+PHc6Vmlldz5QcmludDwvdzpWaWV3Pjx3Olpvb20+MTAwPC93Olpvb20+PHc6
RG9Ob3RPcHRpbWl6ZUZvckJyb3dzZXIvPjwvdzpXb3JkRG9jdW1lbnQ+PC94bWw+DQo8IVtlbmRp
Zl0tLT4NCjxsaW5rIHJlbD1GaWxlLUxpc3QgaHJlZj0ibXlkb2N1bWVudF9maWxlcy9maWxlbGlz
dC54bWwiPg0KPHN0eWxlPjwhLS0gDQpAcGFnZQ0Kew0KICAgIHNpemU6MjFjbSAyOS43Y210OyAg
LyogQTQgKi8NCiAgICBtYXJnaW46MWNtIDFjbSAxY20gMWNtOyAvKiBNYXJnaW5zOiAyLjUgY20g
b24gZWFjaCBzaWRlICovDQogICAgbXNvLXBhZ2Utb3JpZW50YXRpb246IHBvcnRyYWl0OyAgDQoJ
bXNvLWhlYWRlcjogdXJsKCJteWRvY3VtZW50X2ZpbGVzL2hlYWRlcmZvb3Rlci5odG0iKSBoMTsN
Cgltc28tZm9vdGVyOiB1cmwoIm15ZG9jdW1lbnRfZmlsZXMvaGVhZGVyZm9vdGVyLmh0bSIpIGYx
OwkNCn0NCkBwYWdlIFNlY3Rpb24xIHsgfQ0KZGl2LlNlY3Rpb24xIHsgcGFnZTpTZWN0aW9uMTsg
fQ0KcC5Nc29IZWFkZXIsIHAuTXNvRm9vdGVyIHsgYm9yZGVyOiAxcHggc29saWQgYmxhY2s7IH0N
Ci0tPjwvc3R5bGU+DQo8L2hlYWQ+DQo8Ym9keT4NCjxkaXYgY2xhc3M9U2VjdGlvbjE+DQpJJ20g
cGFnZSAxIDxpbWcgc3JjPSJteWRvY3VtZW50X2ZpbGVzL3NtaWxleS5naWYiPg0KPGJyIGNsZWFy
PWFsbCBzdHlsZT0nbXNvLXNwZWNpYWwtY2hhcmFjdGVyOmxpbmUtYnJlYWs7cGFnZS1icmVhay1i
ZWZvcmU6YWx3YXlzJz4NCkknbSBwYWdlIDIuDQo8L2Rpdj4NCjwvYm9keT4NCjwvaHRtbD4NCg0K
DQo=

------=_NextPart_ZROIIZO.ZCZYUACXV.ZARTUI
Content-Location: file:///C:/mydocument_files/headerfooter.htm
Content-Transfer-Encoding: base64
Content-Type: text/html; charset="utf-8"

PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiIHhtbG5zOm89InVy
bjpzY2hlbWFzLW1pY3Jvc29mdC1jb206b2ZmaWNlOm9mZmljZSIgeG1sbnM6dz0idXJuOnNjaGVt
YXMtbWljcm9zb2Z0LWNvbTpvZmZpY2U6d29yZCIgeG1sbnM6bT0iaHR0cDovL3NjaGVtYXMubWlj
cm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIj0geG1sbnM9Imh0dHA6Ly93d3cudzMub3Jn
L1RSL1JFQy1odG1sNDAiPg0KPGJvZHk+DQoNCjxkaXYgc3R5bGU9Im1zby1lbGVtZW50OmhlYWRl
cjsiIGlkPSJoMSI+DQo8cCBjbGFzcz1Nc29IZWFkZXI+SGVhZGVyPC9wPg0KPC9kaXY+DQoNCjxk
aXYgc3R5bGU9J21zby1lbGVtZW50OmZvb3RlcicgaWQ9ZjE+DQo8cCBjbGFzcz1Nc29Gb290ZXI+
PHNwYW4gY2xhc3M9U3BlbGxFPkZvb3Rlcjwvc3Bhbj4gcGFnZSA8IS0tW2lmIHN1cHBvcnRGaWVs
ZHNdPjxzcGFuDQpjbGFzcz1Nc29QYWdlTnVtYmVyPjxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpm
aWVsZC1iZWdpbic+PC9zcGFuPjxzcGFuDQpzdHlsZT0nbXNvLXNwYWNlcnVuOnllcyc+oDwvc3Bh
bj5QQUdFIDxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpmaWVsZC1zZXBhcmF0b3InPjwvc3Bhbj48
L3NwYW4+PCFbZW5kaWZdLS0+PHNwYW4NCmNsYXNzPU1zb1BhZ2VOdW1iZXI+PHNwYW4gc3R5bGU9
J21zby1uby1wcm9vZjp5ZXMnPjE8L3NwYW4+PC9zcGFuPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+
PHNwYW4NCmNsYXNzPU1zb1BhZ2VOdW1iZXI+PHNwYW4gc3R5bGU9J21zby1lbGVtZW50OmZpZWxk
LWVuZCc+PC9zcGFuPjwvc3Bhbj48IVtlbmRpZl0tLT48c3Bhbg0KY2xhc3M9TXNvUGFnZU51bWJl
cj4vPC9zcGFuPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4gY2xhc3M9TXNvUGFnZU51bWJl
cj48c3Bhbg0Kc3R5bGU9J21zby1lbGVtZW50OmZpZWxkLWJlZ2luJz48L3NwYW4+IE5VTVBBR0VT
IDxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpmaWVsZC1zZXBhcmF0b3InPjwvc3Bhbj48L3NwYW4+
PCFbZW5kaWZdLS0+PHNwYW4NCmNsYXNzPU1zb1BhZ2VOdW1iZXI+PHNwYW4gc3R5bGU9J21zby1u
by1wcm9vZjp5ZXMnPjE8L3NwYW4+PC9zcGFuPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4N
CmNsYXNzPU1zb1BhZ2VOdW1iZXI+PHNwYW4gc3R5bGU9J21zby1lbGVtZW50OmZpZWxkLWVuZCc+
PC9zcGFuPjwvc3Bhbj48IVtlbmRpZl0tLT4NCjwvcD4NCjwvZGl2Pg0KDQo8L2JvZHk+DQo8L2h0
bWw+

------=_NextPart_ZROIIZO.ZCZYUACXV.ZARTUI
Content-Location: file:///C:/mydocument_files/smiley.gif
Content-Transfer-Encoding: base64
Content-Type: image/gif

R0lGODlhEgASAOeQAEM0EP/lIf/mIv/jH//mI//kIP/lIvC2AN3e3f/kH9LQynBeMPvYD/zZEens
7/fLAPvXDffMAM7MxJ2Sd//iHeyrAOecAGBLF//kIfrTCOfq7dfX0//iHv7gG3dmO3hnPPC3AGRP
Hf3fGfjMAJCEY+miAMzKwfvXDl9JFPK9AKZxBKCXfP7hHaGXfffKAPbJAP3dF++yAO+xAPbIAJCD
YtfX0mhSHW9TB52TeG1NB8KLAmlTHvTEAK1+A//nI/7iHpySdt6uAWdQHK+KBPrSB+jr7vPBANeu
AtSmAvzbE/7iHXhmPG5OB8ukAuemAJ6UePvZEG5QB/bKAHFTB5hrBKZsA9+0AfjOAsqEAvzbFM6G
AZViBKyBBPPCAPPGAOyqAG9NB/TCAN+XAP7fGfG8AJKGZv7jH/7fGrt3Ap+VesKMAvTMBnZlOvTD
AO25ALGMA/3cFmBKFuaZAM+NAcvJwHVjN/LEAKRrBJpwBPjPBu2uAJ99BPK/AGxLB6+EA+qpALGN
A6h2BP7gGqVqA/XHAP3gGuOVAOCTAN+VAHNiNc3Lw3NXB8yLAvO/AGhRHfC4AP//////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////yH5BAEAAP8ALAAAAAASABIA
AAj+AP8JHNjiw4ULH1oMXDhwwo0hRwjxQMIlygSGAhfsWQOhQYMTRFy4wbOAoY0mEOCcoUChAwwG
I1Ko2TGQBCAIYzgMGFCgwABBDB6ACFRG4KI8MHQCKIAhAIABIjKE+cPkH443EDrwBBAggAGuLKC8
kKEiTR0rDShsNSBAAAADCZI8ODBnyQUvaZkGEECgr4AEWSIcEBMijp0TS73y9UFAgBkAUmIgCpEo
SIanTfn2DVAIQJsKWtgA8eOCwVPFAgIoARDhUYkqT/5NITPCNIsECX6IAHClUQVDYATS6JHiwWUA
yFnz0WNhEImBQnQcMDLjQYQXXUB8sYDGEcMFVJwiyDhwIEaFEoe2lMQ4IYcKRhbkYLnT5yLGgSs8
oEDhYQXGgAA7
------=_NextPart_ZROIIZO.ZCZYUACXV.ZARTUI--
Please note:
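* The boundary can be any string you want, as long as it does not appear in the data. (Including a dot is a safe bet because base64 data can't contain dots.)
* Pay close attention to the dashes around the boundary marker: each part starts with two dashes followed by the boundary, and the final marker additionally ends with two dashes.
* Pay close attention to empty lines: an empty line must separate each block of headers from the data that follows. They are important.
* You can use encodings other than base64 (such as quoted-printable), but base64 is a safe bet and works on all kinds of files.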
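The first point can be automated. The sketch below generates a random boundary containing a dot and double-checks that it does not occur in any of the encoded parts; ''makeBoundary()'' is a hypothetical helper, and the ''----=_NextPart_'' prefix simply mimics the example above.

```php
<?php
// Hypothetical helper: returns a boundary string that appears in none of the
// given (already encoded) parts.
function makeBoundary(array $encodedParts)
{
    do {
        $hash = strtoupper(md5(uniqid('', true)));
        // The dot guarantees the boundary can never be valid base64 data.
        $boundary = '----=_NextPart_' . substr($hash, 0, 8) . '.' . substr($hash, 8, 8);
        $clash = false;
        foreach ($encodedParts as $part) {
            if (strpos($part, $boundary) !== false) {
                $clash = true; // extremely unlikely, but try again if it happens
                break;
            }
        }
    } while ($clash);
    return $boundary;
}

echo makeBoundary(array(base64_encode('Hello, world !'))) . "\n";
```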
This file can be renamed to .doc and opened in Word:
{{ :wordgen:wordgen_mime.png }}
**Amusing fact:** mhtml (MIME) documents are usually smaller than their true .doc counterparts. For example, the previous ''mime_example.doc'' (which is a MIME/mhtml file) is **5216 bytes** long. Resaved in true .doc format, it's **24064 bytes**.
==== A basic MIME 1.0 class helper ====
Here is a basic MIME 1.0 class which will help you generate the mhtml files:
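class mime10class
{
    private $data;
    const boundary = '----=_NextPart_ERTUP.EFETZ.FTYIIBVZR.EYUUREZ';

    // Start the document with the global MIME headers.
    function __construct()
    {
        $this->data = "MIME-Version: 1.0\nContent-Type: multipart/related; boundary=\"" . self::boundary . "\"\n\n";
    }

    // Append one file as a base64-encoded MIME part.
    public function addFile($filepath, $contenttype, $data)
    {
        // Content-Location uses forward slashes, so convert Windows-style separators.
        $location = 'file:///C:/' . str_replace('\\', '/', $filepath);
        $this->data .= '--' . self::boundary . "\nContent-Location: " . $location . "\nContent-Transfer-Encoding: base64\nContent-Type: " . $contenttype . "\n\n";
        // Wrap the base64 data at 76 characters per line, as in the example above.
        $this->data .= chunk_split(base64_encode($data), 76, "\n") . "\n";
    }

    // Return the complete document, terminated by the closing boundary.
    public function getFile() { return $this->data . '--' . self::boundary . '--'; }
}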
It's rather simple: Add files with ''addFile()'', then get the final mhtml/MIME1.0 document with ''getFile()''.
Example:
// HTTP headers must be sent before any output.
header('Content-Type: application/msword');
header('Content-disposition: filename=mydocument.doc');
$doc = new mime10class();
// Backslashes in paths are converted to forward slashes by addFile().
$doc->addFile('mydocument.htm','text/html; charset="utf-8"','Hello, world !');
$doc->addFile('subdir\anotherfile.htm','text/html; charset="utf-8"','Hi there.');
echo $doc->getFile();
===== Sending the file to the client =====
The .mht file must be served to the client with the following HTTP headers:
Content-Type: application/msword
Content-disposition: attachment; filename=myfile.doc
Yes, we use ".doc" in order not to confuse the final user. Word will recognize this is a .mht file and will open it accordingly.
Note that:
Content-disposition: attachment; filename=myfile.doc
will force download, but:
Content-disposition: filename=myfile.doc
will allow the user to choose between "open" or "save".
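Both variants can be wrapped in a small function, as sketched below. ''wordDownloadHeaders()'' is a hypothetical helper; it only builds the two header lines described above and leaves the actual sending to ''header()''.

```php
<?php
// Hypothetical helper: returns the HTTP header lines for serving the document.
// $forceDownload = true  -> "attachment": the browser downloads the file.
// $forceDownload = false -> the user may choose between "open" and "save".
function wordDownloadHeaders($filename, $forceDownload = true)
{
    $disposition = ($forceDownload ? 'attachment; ' : '') . 'filename=' . $filename;
    return array(
        'Content-Type: application/msword',
        'Content-disposition: ' . $disposition,
    );
}

// Send the headers before any output; the MIME document is echoed afterwards.
foreach (wordDownloadHeaders('myfile.doc') as $line) {
    header($line);
}
```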
===== Examples =====
Full, working HTML documents which load correctly in Microsoft Word, using many Word features: styling, tables, headers & footers, page format, table of contents, images...
FIXME
===== php code examples =====
FIXME
===== Performance =====
I have implemented this system in a professional environment, and we are able to generate a **70-page**, database-driven, dynamically filled document with lots of tables and references in less than **5 seconds**. (The document contains no images; the server runs Apache+php+Oracle. Document templates are written in Smarty.)
Performance is //excellent//, much better than what I expected.
===== Ideas worth pondering =====
* Using a [[http://en.wikipedia.org/wiki/Lightweight_markup_language|markup language]] (Markdown ? Textile ?...) which generates HTML may ease the creation and maintenance of templates.
* With php, the use of a templating engine like [[http://www.smarty.net/about_smarty|Smarty]] should help to easily create and maintain document templates without touching php code too much. Template inclusion can help to create - for example - standard headers for all documents.
* Serving files using gzip compression may improve user experience (our ''mime_example.doc'' above goes from 5216 bytes to 2569 bytes with default gzip compression).
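The gzip idea could look like the sketch below, assuming php's zlib extension is available and that the web server does not already compress responses. ''maybeGzip()'' is a hypothetical helper, and the document string is a stand-in for the real MIME output.

```php
<?php
// Hypothetical helper: gzip-compresses the document when the client accepts it.
// Returns array(body, wasCompressed).
function maybeGzip($document, $acceptEncoding)
{
    if (function_exists('gzencode') && stripos($acceptEncoding, 'gzip') !== false) {
        return array(gzencode($document), true);
    }
    return array($document, false);
}

// Stand-in for the output of the MIME class above.
$document = "MIME-Version: 1.0\n...";
$accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
list($body, $compressed) = maybeGzip($document, $accept);
if ($compressed) {
    header('Content-Encoding: gzip'); // tell the browser to decompress
}
echo $body;
```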
~~DISCUSSION:closed~~