Help:WordToWiki

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by John of Reading (talk | contribs) at 15:22, 12 February 2019 (Reverted 1 edit by 130.93.32.118 (talk): This is the Emglish-language Wikipedia. (TW)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Microsoft Word

VisualEditor

VisualEditor, the WYSIWYG editor deployed on multiple Wikipedias, allows content to be copied and pasted from Word documents into a wiki page. Most formatting is kept intact, including tables. However, images and advanced formatting will need to be cleaned up after import.

Word2MediaWikiPlus

The following extension from 2007, unmaintained as of 2017, may still work: Word2MediaWikiPlus. When tested with Word from Office 365, the conversion worked despite showing a warning several times.

Download it from: http://sourceforge.net/projects/word2mediawikip/files/word2MediaWikiPlus/1.0.0/Word2MediaWikiPlus-1.0.0.zip/download

Alternative Solution

Microsoft released an add-in that allows you to save documents from Microsoft Office Word 2007 or later directly as MediaWiki markup.

  1. Download the "Microsoft Office Word Add-in For MediaWiki" from Microsoft Download Center, and install it.
  2. Save the document as "MediaWiki (*.txt)" file type.
  3. Copy the text from the (*.txt) file into your wiki page.

Note that this add-in does not work with Word 2013 by default; however, it can be made to work with a registry change. See this page.

Possible issues with alternative solution

  • This add-in requires Windows as an operating system; it won't work with Mac OS X.
  • This Microsoft add-in does not handle images. A placeholder is emitted.
  • End notes and footnotes can't be converted. Including them in a document will throw an error.
  • If you attempt to work around the previous issue by inserting <ref> tags, Word will replace the angle brackets with the HTML entities &lt; and &gt; upon conversion.
  • Some text will be enclosed by <nowiki> and </nowiki> tags.
  • Not supported for Office/Word 2013; see "Word Add-in For MediaWiki not supported in Word 2013?"

Nevertheless, for those who are unfamiliar with MediaWiki Markup Language and who are working on simple articles, the Microsoft Office Word Add-in For MediaWiki can be a useful tool.
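
The escaped <ref> tags mentioned in the list above can usually be repaired after the conversion. The following is only a minimal sketch, assuming that the add-in's output has been saved as a text file and that the angle brackets were written out as the HTML entities &lt; and &gt;. The function name restore_ref_tags is hypothetical and not part of the add-in.

```shell
#!/bin/bash
# restore_ref_tags (hypothetical helper): read wiki text on standard input
# and turn escaped reference tags back into real tags. Only tags whose name
# starts with "ref" (<ref>, <ref name=...>, </ref>, <references/>) are
# restored, so any other escaped text is left untouched.
restore_ref_tags() {
    sed -E 's#&lt;(/?ref[^&]*)&gt;#<\1>#g'
}
```

It could be used as restore_ref_tags < article.txt > article-fixed.txt, where both file names are placeholders.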

Two-stage conversion from Word to MediaWiki

Both of the following methods convert in two stages: Word → HTML → MediaWiki.

Quick

  1. Open your document in Word, and "save as" an HTML file.
  2. Open the HTML file in a text editor and copy the HTML source code to the clipboard.
  3. Paste the HTML source into the large text box labeled "HTML markup:" on the html to wiki page.
  4. Click the blue Convert button at the bottom of the page.
  5. Select the text in the "Wiki markup:" text box and copy it to the clipboard.
  6. Paste the text to a Wikipedia article.

Automated scripts

The conversion can also be done using a combination of two scripts and two software packages.

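  1. The following two software packages must be installed: wvWare (which provides the wvHtml command) and the Perl module HTML::WikiConverter with its MediaWiki dialect.
  2. Create the bash script "doc2mw" and the Perl script "html2mw", both shown below.
  3. Call doc2mw, passing the Word document as its parameter, e.g.: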
> doc2mw my_word.doc
doc2mw
a bash script taking a single parameter, which calls wvHtml followed by html2mw.
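 #!/bin/bash
 #       doc2mw - Word to MediaWiki converter
 
 # Require exactly one argument: the Word document to convert
 if [ $# -ne 1 ]; then
         echo "Usage: $0 file.doc" >&2
         exit 1
 fi
 
 FILE=$1
 # Use only the base name, so that a path containing "/" still gives a valid temp file name
 TMP="$$-$(basename "${FILE}")"
 
 # Prefer html2mw in the current directory, otherwise look it up on the PATH
 if [ -x "./html2mw" ]; then
         HTML2MW='./html2mw'
 else
         HTML2MW='html2mw'
 fi
 
 # Convert Word to HTML; the result is written to /tmp/${TMP}
 wvHtml --targetdir=/tmp "${FILE}" "${TMP}"
 # but see also AbiWord: http://www.abisource.com/help/en-US/howto/howtoexporthtml.html
 
 # Remove extra divs
 perl -pi -e 's/<div[^>]+.>//gi;' "/tmp/${TMP}"
 
 # Convert the HTML to MediaWiki markup on standard output, then clean up
 "${HTML2MW}" "/tmp/${TMP}"
 rm "/tmp/${TMP}"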
html2mw
a perl script called by doc2mw, which uses HTML::WikiConverter to convert html -> mediawiki.
 #!/usr/bin/perl
 #       html2mw - HTML to MediaWiki converter
 
 use strict;
 use warnings;
 use HTML::WikiConverter;
 
 # Read the whole HTML input (files named on the command line, or standard input)
 my $html = '';
 while (<>) { $html .= $_; }
 
 my $converter = HTML::WikiConverter->new( dialect => 'MediaWiki' );
 
 my $wiki = $converter->html2wiki($html);
 
 # Substitutions to get rid of nasty things we don't need
 $wiki =~ s/<br \/>//g;
 $wiki =~ s/&nbsp;//g;
 print $wiki;

Disclaimer: These scripts are probably not the best way to do this, only a possible way to do this. Please feel free to improve them.
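
When several documents need converting, doc2mw can be run in a loop. The following is a minimal sketch, assuming doc2mw is on the PATH and prints the wiki markup to standard output as described above. The function name convert_all and the .wiki output extension are hypothetical choices, not part of the scripts.

```shell
#!/bin/bash
# convert_all (hypothetical helper): run a converter on every .doc file in a
# directory and save each result next to the source as a .wiki file.
# Usage: convert_all DIRECTORY [CONVERTER]   (CONVERTER defaults to doc2mw)
# The body runs in a subshell, so the caller's working directory is unchanged.
convert_all() (
    dir=$1
    converter=${2:-doc2mw}
    # Work inside the directory so the converter only sees bare file names
    cd "$dir" || exit 1
    for doc in *.doc; do
        # Skip the unexpanded pattern when there are no .doc files
        [ -e "$doc" ] || continue
        "$converter" "$doc" > "${doc%.doc}.wiki" || echo "failed: $doc" >&2
    done
)
```

For example, convert_all ~/reports would leave a .wiki file beside each .doc file in that (placeholder) directory.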

OpenOffice or LibreOffice

LibreOffice (LO) Writer can export Word documents directly: go to File → Export and choose MediaWiki as the file type. (Linux users may need to install the libreoffice-wiki-publisher package.)

OpenOffice versions 3.3 and later can send documents in any format it supports (including Microsoft Word) directly to a MediaWiki server, but this does not seem to work under Windows 7. (At least for the German version of OpenOffice 3.3.0, you need to install the "Sun Wiki Publisher" extension first. Server URL: http://en.wikipedia.org/w/ ) Once you have added the MediaWiki server of your choice, future submissions can happen automatically.

  1. Open the document in OpenOffice or LibreOffice Writer.
  2. Go to File → Send-To → To MediaWiki, or File → Export → Save file as: MediaWiki.
  3. Select your MediaWiki server (or click on the button "Add..." to add a new site).
  4. Select a title and summary for your article, and check the box if it's a minor revision.
  5. Click the send button.

Alternatively, the manual export function can be used: File → Export → choose the "MediaWiki (.txt)" format. LibreOffice Writer 5 can export a MediaWiki .txt file under Windows 10 if the appropriate 32- or 64-bit Java Runtime Environment (JRE) has been installed and enabled in LO. The document to be converted has to use styles; for example, headings must be in the Heading 2 style to be bracketed by "==" when converted.
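
One quick way to check that the heading styles came through is to list the lines of the exported file that are bracketed by "==". The following is a minimal sketch, assuming the export was saved as a plain .txt file. The function name list_level2_headings is hypothetical.

```shell
#!/bin/bash
# list_level2_headings (hypothetical helper): print every line of an exported
# MediaWiki text file that is bracketed by exactly two "=" signs on each side,
# for example "== Introduction ==". Deeper levels such as "=== Sub ===" are
# not listed.
list_level2_headings() {
    grep -E '^==[^=](.*[^=])?==[[:space:]]*$' "$1"
}
```

If a heading from the Word document is missing from the output, it was probably formatted by hand rather than with the Heading 2 style.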

Pandoc

Pandoc is a command-line utility that can convert between many document formats. Once it is installed, converting from Word to MediaWiki looks like this:

$ pandoc -t mediawiki mydocument.docx > mydocument.wiki
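
The same conversion can be wrapped in a small function that derives the output name from the input name. This is only a sketch, assuming pandoc is installed and on the PATH. The function name docx_to_wiki is hypothetical, while -f, -t and -o are pandoc's standard options for the input format, output format and output file.

```shell
#!/bin/bash
# docx_to_wiki (hypothetical helper): convert one .docx file to MediaWiki
# markup with pandoc, writing NAME.wiki next to NAME.docx and printing the
# name of the file that was written.
docx_to_wiki() {
    local src=$1
    local out="${src%.docx}.wiki"
    # Name both formats explicitly instead of relying on the file extension
    pandoc -f docx -t mediawiki -o "$out" "$src" && echo "$out"
}
```

Calling docx_to_wiki mydocument.docx would then write mydocument.wiki.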

See also