CO33: 18 December 2002
Copyright © 2002 by Kevin Sharpe. All rights reserved. Submitted for publication.
WORD TO WEB
by
Kevin Sharpe
The Graduate College, Union Institute and University, Cincinnati, Ohio, USA
Harris Manchester College, Oxford University, Oxford, UK
10 Shirelake Close, Oxford OX1 1SN, United Kingdom
kevin.sharpe@tui.edu
www.ksharpe.com
ABSTRACT. I describe my method for converting work written with Microsoft Word into a form suitable for my web site.
Like many others, I write articles, papers, and books using Microsoft’s Word program. I also maintain my web site, www.ksharpe.com, which holds many of my writings in a database. My task is this: to write in Word and then quickly and easily convert the resulting work into an html form that the web database accepts. Most writers – including many Union learners, alums, and faculty – will soon want or need to carry out a procedure like this, if they don’t already. Adapting to and using online publishing technology offers an exciting challenge.
This essay shares what I have learned. Hopefully, readers can answer my remaining questions.
KEY WORDS. Computer software; World Wide Web; writing method.
Like many others, I write articles, papers, and books using Microsoft’s Word program. I also maintain my web site, www.ksharpe.com, which holds many of my writings in a database. So I write in Word and copy the resulting work into the web database.
Most writers will soon want or need to carry out a procedure like this, if they don’t already. Adapting to and using online publishing technology offers an exciting challenge. It also offers opportunities for interacting with the community at large. For instance, yesterday I received two emails from scholars who had visited my site and browsed the database. One came from a friend of David Bohm (about whom I have written extensively) and took issue with the word ‘religion’ as I apply it to Bohm. The second came from a friend whom I knew at Auckland University 20 years ago but with whom I had lost contact. He used a typo on the site as an excuse to get in touch.
The site offers a search engine so that scholars can search my writings for my thoughts on specific topics. They can then enter into dialogue with me if they wish. Because of the way I present them, my writings on many of my pet topics rank high in search-engine results, with links back to my site.
How can I write in Word and quickly and easily convert the resulting work into an html form that the web database accepts?
We ideally require a simple procedure that works automatically or nearly so. I originally thought I would quickly find or develop such a routine but, like my experience with most computer software and hardware, filling my need became complex and time-consuming.
This essay shares what I have learned.
Resources
I approach the task well prepared. Installed on my up-to-the-minute laptop are the Word program I mentioned above, Microsoft FrontPage (a web site program I purchased to help with this task), and EditPad Pro (a handy text-editing program I downloaded from the web). I’m also a mathematician and feel confident about solving simple programming problems.
My web site uses the program Cold Fusion and holds my writing files in a database built on Microsoft Access. The site’s programming prescribes the basic formatting (fonts, colors, and point size). I supply my writing without these specifications but with the remaining formatting in html code. Further, the database limits the size of a file. I want to minimize the amount of html formatting I supply, so that:
· each article or chapter or paper is as small as possible (each format command takes up space, even if the final product hides the codes, and longer files take longer to download);
· each piece of writing fits in the database as one file rather than as a series of linked files (I believe readers prefer to download a piece all at once).
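To see how much of a file those format commands occupy, one can compare the number of characters inside tags with the file’s total length. The following Python sketch is a rough measure only, assuming that anything between ‘<’ and ‘>’ counts as markup; it ignores entities and does not handle a ‘>’ inside an attribute value. The function name markup_share is illustrative.

```python
import re

# Rough rule (an assumption): anything between '<' and '>' counts as markup.
# Entities such as &nbsp; are counted as text, and a '>' inside an attribute
# value would end the match early.
TAG = re.compile(r"<[^>]*>")


def markup_share(html: str) -> float:
    """Return the fraction of characters in `html` that belong to tags."""
    if not html:
        return 0.0
    tag_chars = sum(len(m.group(0)) for m in TAG.finditer(html))
    return tag_chars / len(html)


if __name__ == "__main__":
    lean = "<p>Some text.</p>"
    heavy = '<p class=MsoNormal style="margin:0in">Some text.</p>'
    # The same sentence costs far more bytes when each tag carries formatting.
    print(f"lean:  {markup_share(lean):.0%}")
    print(f"heavy: {markup_share(heavy):.0%}")
```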
History
Easy, I thought. Take a Word file, save it in html format (‘File,’ ‘Save as,’ or ‘File,’ ‘Save as a web page’), and copy it to the site’s database. (Actually, I need to strip the file’s header and ending routines – keeping only what lies between <body> and </body> – before I save it in the database.) Not so. The Word html file starts with a large, hidden style sheet that specifies the file’s typeface, size, color, and other format features. The code within the text itself mostly refers to this style sheet. Removing the style sheet, to save space and to avoid specifying such things as font, can disrupt the file’s appearance on the web: some of the code shows, indents go haywire, endnotes disappear, and so on. The file also retains a lot of unwanted formatting code, hidden but still present. I prefer that the body of the file specify all the code and do so in as little space as possible.
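The stripping step in parentheses above is mechanical, so it lends itself to a small script. A minimal Python sketch, assuming the saved html file has a single <body ...> ... </body> section (Word adds attributes such as lang to the opening tag); the function name extract_body and the sample input are illustrative, not part of any tool mentioned here. It does nothing about the references to the discarded style sheet that remain inside the body.

```python
import re

# The opening <body ...> tag (attributes allowed), then everything up to
# </body>, matched case-insensitively and across line breaks.
BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return only what lies between <body> and </body>, without surrounding whitespace."""
    match = BODY.search(html)
    if match is None:
        raise ValueError("no <body>...</body> section found")
    return match.group(1).strip()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p.MsoNormal {margin:0in}</style></head>\n"
        "<body lang=EN-US>\n<p class=MsoNormal>Text</p>\n</body></html>"
    )
    print(extract_body(sample))
```

To apply it to a saved file, read the file using the character encoding named in its meta tag (for example, open(path, encoding="cp1252") for a file marked windows-1252) and pass the text to extract_body.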
My programmer friends said I should write in all the html formatting codes from scratch – longhand – with a text-editing program. I followed their advice and began by removing excess code from Word html files. In the process, I learned more about html than I ever wanted to, and the thought of writing complex code for dozens of endnotes per paper overwhelmed me. It took far too long. Surely an automatic, simple, and quick program or process for converting the files must exist.
Maybe, I thought, the Microsoft FrontPage program would import a Word document and convert it to html without the excess code of the equivalent Word html file. FrontPage also places the html code in the body of the text. The FrontPage manual describes such an import function: go to ‘open file,’ select the desired Word document, and from the options choose ‘convert.’ I still can’t find the ‘convert’ command. I did find a similar one: open a new file in FrontPage (‘Control’ + ‘n’) and import the Word document into it with ‘Insert,’ ‘File,’ and then select the file. This converts the file to RTF format and from that to html. However, it loses some of the formatting – including all the endnotes – in the process.
I surfed the web for programs to convert Word to html and found two. One, a freebie, loses sections of the document. The other, HotMetal, seems wonderful in the downloaded trial version, but it fails to convert; I must purchase the program to enjoy conversion. The manufacturer’s assistants informed me by email that it wouldn’t convert files from my version of Word anyway; they hadn’t yet developed an update to cover so new a version (it may exist by the time this essay goes to press). A friend told me that Microsoft offers an add-on for Word that does just what I want. I downloaded it from the Microsoft web site at http://office.microsoft.com/downloads/2000/Msohtmf2.aspx and installed it on my computer. (Word XP includes this in its ‘Save as’ option, as ‘Web Page, Filtered.’) It works, but the resulting files still require a style sheet in the head and lots of formatting code in the body. I want to end the process with a smaller file than this Word add-on achieves. I also tried DreamWeaver, but the version I used also leaves an excess of code.
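The leftover code in the body of such filtered files follows recognizable patterns, so a post-processing pass can remove some of it. A minimal Python sketch, assuming the clutter takes forms common in Word html of this period: class attributes whose values begin with ‘Mso,’ inline style attributes, and <o:p> tags. These patterns are assumptions to check against one’s own files. Regular expressions are a blunt tool here: they would also alter matching text that appears in the prose itself, and removing every style attribute also discards any indents or spacing it carried, so an html parser would be the more robust choice. The name strip_word_clutter is illustrative.

```python
import re

# Assumed patterns of Word-generated clutter; verify them against real files.
# class=MsoNormal, class="MsoBodyText", class='MsoQuote', and so on.
MSO_CLASS = re.compile(r"""\s+class=(?:"Mso\w*"|'Mso\w*'|Mso\w*)""", re.IGNORECASE)
# style="..." or style='...' on any tag.
STYLE_ATTR = re.compile(r"""\s+style=(?:"[^"]*"|'[^']*')""", re.IGNORECASE)
# Office namespace paragraph tags, usually empty: <o:p></o:p>.
OFFICE_P = re.compile(r"</?o:p>", re.IGNORECASE)


def strip_word_clutter(body: str) -> str:
    """Remove Mso* classes, inline style attributes, and <o:p> tags from an html fragment."""
    for pattern in (MSO_CLASS, STYLE_ATTR, OFFICE_P):
        body = pattern.sub("", body)
    return body


if __name__ == "__main__":
    print(strip_word_clutter("<p class=MsoNormal style='margin:0in'>Text<o:p></o:p></p>"))
```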
So I turned back to FrontPage. As Microsoft’s companion product to Word, it should do what I want. This seemed a small request.
Results
New Works
I experimented. I discovered that I could save my Word document in RTF format (in Word, choose ‘File,’ then ‘Save as’) and then insert it into a new FrontPage file (‘Insert,’ ‘File,’ and select the file). (I need to close the RTF file in Word for FrontPage to convert it.) Most of my formatting carries over in this procedure without the mass of Word’s style sheet and accompanying codes. I then adjusted the formatting of my Word documents so that the formatting I want carries over automatically, exactly as I want it, into FrontPage.
The following lists the procedure I developed:
1. Write everything in Word as an RTF file. This shows more or less what the final web product will look like, and it can be saved as a ‘.doc’ file if needed. It will also print out as though written as a regular document. The Word default styles need setting up in specific ways for the formats to flow through to the web pages (format a document with the styles and then, through ‘Format,’ ‘Style,’ ‘Organizer,’ copy the styles to the ‘Normal.dot’ template). Downloading the file http://www.ksharpe.com/writer/KSNormalStyle.doc will save doing most of what follows:
· Set your favorite font and point size (for me, Times New Roman at 12 point). This doesn’t affect the font and point size of the final web appearance.
· Set the normal paragraph as not indented. (I abandoned my previous style of indenting the first line of a paragraph by 0.5 inches. Html places a blank line between paragraphs, and I decided that this suffices. In any case, a style that indents the paragraph’s first line may appear unindented when converted to the web form.)
· Define H3 and H4 as headers (‘Format,’ ‘Style,’ ‘New’). Set H3 as ‘Body text’ plus ‘Bold’ plus ‘All capitals,’ and H4 as ‘Body text’ plus ‘Bold.’ I use only these two headers, either centered or flush left: H3 centered for the title of the work, H4 centered for the next level of headers, and H4 flush left for the level below that. H3 and H4 convert automatically to the headers Heading 3 and Heading 4 in the web product. The predefined browser style sets how they appear, but they retain – which is what I want – the same point size as the body text. (You can use the other header styles from H1 to H6 if you wish, but they may convert to different point sizes.)
· Use ‘ – ’ (space + En dash + space) as the dash. My prior preference, the longer Em dash, doesn’t carry through automatically. (I created a macro for ‘ – ’ that runs when I type ‘Control’ + ‘-’.)
· Set bulleted and numbered lists as flush left, with text starting at 0.5 inches and continuing lines at 0.25 inches. These convert to bulleted and numbered lists indented by one blockquote in the web product. A greater indentation in the RTF file may convert to a double or deeper blockquote indentation.
· Set indented quotes at 0.25 inches. This translates to a single blockquote in the web version.
· Set the style for references as normal. (I used to have it normal plus a hanging indent of 0.5 inches, but this offset the whole of the reference section when it reached the web stage.)
· Set hyperlinks as default paragraph font, blue color, and no underline. (I don’t underline them because they stand out too much in the printed version.)
· Set followed hyperlinks as default paragraph font, violet color, and no underline.
· Decide how you want to handle notes. I only use endnotes and I set the endnote text style as ‘Normal.’ (Like the references, I used to have it normal plus a hanging indent of 0.5 inches, but again I didn’t like the way this offset the whole notes section when it reached the web stage.) I also place at the end of the text a centered subtitle, ‘Endnotes,’ in the H4 heading style.
· Set endnote references as ‘Hyperlink,’ ‘No underline,’ ‘Not superscript/subscript,’ and font color blue. I like to enclose note references in square brackets, but I don’t know how to automate this and in the meantime need to insert them manually.
· I like the Word feature of linking an endnote reference in the text to the endnote itself, and from the endnote back to the original reference in the text. My procedure can also do this but I have yet to automate it:
a. Copy the endnotes to the end of the text.
b. Create a caption (‘Insert,’ ‘Caption,’ ‘New Label’) as ‘[N’ (without the quotation marks).
c. To create an endnote, invoke the label ‘[N’ (‘Insert,’ ‘Caption,’ select the label ‘[N’); this way, the endnote number counter automatically adjusts itself.
d. Close off the label with a ‘]’ when the endnote reference number appears.
e. Bookmark the endnote reference number x as ‘ny’ (select ‘[N x]’ in the text, then ‘Insert,’ and ‘Bookmark’ as ‘ny’), where y is a number not used before in the bookmarks and hyperlinks. (Note that, because you will sometimes insert new endnotes between existing endnotes, the number y may not coincide with x.)
f. Go to the copied endnote at the end of the file.
g. Create a caption (‘Insert,’ ‘Caption,’ ‘New Label’) as ‘[’ (without the quotation marks).
h. Invoke the label ‘[’ (‘Insert,’ ‘Caption,’ select the label ‘[’).
i. Close the caption with ‘]’, add a space for good looks, and remove Word’s extra space after the ‘[’.
j. Bookmark the endnote number x as ‘my’ (select ‘[x]’ in the endnote, then ‘Insert,’ and ‘Bookmark’ as ‘my’), where y is the same number as in the endnote reference bookmark.
k. Hyperlink ‘[x]’ (‘Control’ + ‘k’) to the bookmark ‘ny.’
l. Click on the now hyperlinked ‘[x].’ This takes the cursor to the endnote reference in the text.
m. Hyperlink ‘[N x]’ (‘Control’ + ‘k’) to the bookmark ‘my.’
· Single curly quote marks (‘smart quotes,’ as Word calls them) carry through as single curly quotes in the web version. I have yet to figure out how to carry through double smart quotes.
2. Having written the work with the above formatting, save it (as an RTF file) and close it.
3. Open a new file in FrontPage and import into it the RTF file saved in Word (‘Insert,’ ‘File,’ and select the file). This converts it to html with the formatting intact.
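Steps a to m of the endnote routine build, by hand, a pair of bookmarks (‘n’ and ‘m’) and two hyperlinks for each note. Once the text is in html, the same structure could be generated mechanically. The following Python sketch assumes a hypothetical convention that is not part of the procedure above: each note is typed inline in the text between double braces, {{like this}}. It numbers the notes in order, puts square brackets around the numbers (using ‘[k]’ for both the reference and the note), gives each in-text reference the id n<k> linking to m<k> and each endnote the id m<k> linking back to n<k>, and appends the notes under a centered ‘Endnotes’ heading in H4. The function name link_endnotes is illustrative.

```python
import re

# Hypothetical marker convention: a note is typed inline as {{note text}}.
NOTE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def link_endnotes(body: str) -> str:
    """Replace inline {{...}} notes with bracketed, two-way linked endnotes.

    Each in-text reference gets id "n<k>" and links to "#m<k>"; each endnote
    gets id "m<k>" and links back to "#n<k>", mirroring the n/m bookmark pairs
    of the manual routine.
    """
    notes: list[str] = []

    def replace(match: re.Match) -> str:
        notes.append(match.group(1).strip())
        k = len(notes)
        return f'<a id="n{k}" href="#m{k}">[{k}]</a>'

    text = NOTE.sub(replace, body)
    if not notes:
        return text
    # Notes section: centered H4 heading, then one paragraph per note.
    parts = [text, '<h4 align="center">Endnotes</h4>']
    for k, note in enumerate(notes, start=1):
        parts.append(f'<p><a id="m{k}" href="#n{k}">[{k}]</a> {note}</p>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(link_endnotes("Bohm disagreed.{{See the 1980 book.}} So did others.{{Personal letter.}}"))
```

Because the numbers come from the order of the markers, inserting a new note between existing ones renumbers everything automatically, which is the job the caption counters do in the manual routine.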
Old Works
Creating a web version of an existing Word document can present its own problems. Convert such documents to RTF files (‘File,’ ‘Save as’) and then work on them (in Word) to mold them to the format described in step 1 above. A hint:
· Save the file as RTF and convert its styles to the default ones (‘Format,’ ‘Style,’ ‘Organizer,’ then copy all the style items from ‘Normal.dot’ to the style list for the file; close the style organizer).
Experiments
I would like (end)notes to appear on a separate web page that opens as a child window when the reader clicks on a note reference in the text. A hyperlink should take the reader from the note back to its reference. I would also like a references or bibliography page that opens as another child window when the reader clicks on a bibliographic reference in the notes. I have yet to make all of this work easily.
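One way to approach the notes window is to move the notes to their own page and give each note reference a target attribute naming a window: a browser opens a link with target="notes" in a window (or, in newer browsers, a tab) called ‘notes,’ creating it on the first click and reusing it afterwards. The Python sketch below assumes a hypothetical convention in which each note is typed inline in the text between double braces, {{like this}}; the file names passed in are placeholders, and split_endnotes is an illustrative name. Its back links simply load the main page at the reference inside the notes window; sending the reader back to the original window would need a script, which the sketch leaves out. A bibliography page could be handled the same way with a second window name.

```python
import re

# Hypothetical marker convention: a note is typed inline as {{note text}}.
NOTE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def split_endnotes(body: str, page: str, notes_page: str,
                   window: str = "notes") -> tuple[str, str]:
    """Return (main_html, notes_html) with the endnotes moved to a separate page.

    Clicking a reference opens notes_page in the browser window named `window`
    (created on first use, reused afterwards). Each note links back to its
    reference in `page`; that link loads in whichever window it is clicked in.
    """
    notes: list[str] = []

    def replace(match: re.Match) -> str:
        notes.append(match.group(1).strip())
        k = len(notes)
        return f'<a id="n{k}" href="{notes_page}#m{k}" target="{window}">[{k}]</a>'

    main_html = NOTE.sub(replace, body)
    notes_html = "\n".join(
        f'<p><a id="m{k}" href="{page}#n{k}">[{k}]</a> {note}</p>'
        for k, note in enumerate(notes, start=1)
    )
    return main_html, notes_html


if __name__ == "__main__":
    main, notes = split_endnotes("Text.{{A note.}}", "paper.htm", "paper_notes.htm")
    print(main)
    print(notes)
```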
Conclusion
Programs keep changing, as do writers’ needs. An ongoing version of this article on my web site, http://www.ksharpe.com/articles/articledetail.cfm?article_id=182, keeps track of my learning and what I hear from others.
I’m anxious to:
· conquer the translation of the double smart quotes,
· automate the endnote routine including the addition of square brackets around endnote numbers,
· create an automatic notes-reference system in child windows, and
· stop the procedure from leaving the code ‘align=left’ at the beginning of each regular paragraph.
I would welcome any ideas.
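Two items on this list, the stray ‘align=left’ and the double quotes, are text substitutions that a short post-processing pass over FrontPage’s html could handle. The following Python sketch assumes, without having been tested against FrontPage itself, that the paragraph code appears as align=left (quoted or not) inside <p> tags and that the double quotes arrive as straight quotes (") rather than disappearing. It leaves everything inside tags alone and treats a quote at the start of a text run, or after a space or opening bracket, as opening and any other as closing, writing the html entities &ldquo; and &rdquo;. That rule is a heuristic: a quote that directly follows a tag, such as one after italicized text, may come out facing the wrong way. The name tidy is illustrative.

```python
import re

# align=left, align="left" or align='left' inside a <p ...> tag.
ALIGN_LEFT = re.compile(r"""(<p\b[^>]*?)\s+align=(?:"left"|'left'|left)(?=[\s>])""", re.IGNORECASE)
# Splitting on tags (kept via the capture group) separates markup from text.
TAG_OR_TEXT = re.compile(r"(<[^>]*>)")


def curl_double_quotes(text: str) -> str:
    """Turn straight double quotes in plain text into &ldquo; / &rdquo; entities."""
    # Heuristic: a quote at the start, or after whitespace or an opening bracket, opens.
    text = re.sub(r'(^|[\s(\[])"', r"\1&ldquo;", text)
    # Every remaining straight quote is treated as closing.
    return text.replace('"', "&rdquo;")


def tidy(html: str) -> str:
    """Drop align=left from <p> tags and curl double quotes outside tags."""
    html = ALIGN_LEFT.sub(r"\1", html)
    pieces = TAG_OR_TEXT.split(html)
    return "".join(p if p.startswith("<") else curl_double_quotes(p) for p in pieces)


if __name__ == "__main__":
    print(tidy('<p align=left>He called it "religion".</p>'))
```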