Copy from MS Word, Paste into a Rich Text WYSIWYG editor

This title will send chills down the spine of web developers & content authors everywhere. Web developers fear the bloated markup caused by this action. Content authors fear the difficulty of combining their favorite authoring environment with their CMS’s editor.

Why is copy & paste a problem on the web?

The problem isn’t copy & paste.  The problem is WHAT is being copied & pasted.

Plain text (content without any styling) is completely safe to paste into a Rich Text editor.  However, rich text content consists of 1) text and 2) styling.

This sentence has a bolded word.

In this example, my Rich Text editor added hidden markup around the word “bolded”. This markup instructs the web browser to apply special styling. If this content is copied & pasted into another program, the hidden styling comes along with it.

And despite what you think, this isn’t what you want…

MS Word is not good at creating web sites

There are plenty of choices for accessing the web (PC, Mac, phones, iPad, IE, Chrome, Firefox, Opera, etc.). Ideally, a web site should function reliably in all of these environments.

To address this challenge, web developers establish styling for the entire web site.  This styling, in addition to creating a consistent visual experience, enables the web site to be adapted for each device or browser.

By importing styling from MS Word, authors circumvent their web site’s styling.

Insidious hidden styling that accompanies copy & paste actions from MS Word

As a result, styling that works wonderfully in one environment (MS Word) will behave very poorly in another environment (your web site). Even if it looks okay during publishing, this imported styling will create insidious long-term issues for the web site.

What’s the solution to copy & paste?

As described above, the embedded styling (found in copy & pasted content) is the problem.  Consequently, the solution is simple and obvious:

Copy the text, but remove the styling.
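For illustration, here is a minimal sketch of “copy the text, but remove the styling” in Python. It assumes the pasted content arrives as an HTML string (for example, the text/html flavor of the clipboard). It uses only the standard library’s html.parser. The names strip_styling, PlainTextExtractor, SKIP_TAGS and BLOCK_TAGS are hypothetical and are not any editor’s API. Every tag, attribute and <style> block is discarded, while paragraph breaks survive as newlines.

```python
import re
from html.parser import HTMLParser

# Elements whose text is not visible content (pasted Word HTML can include
# a <style> block, for instance).
SKIP_TAGS = {"title", "style", "script"}
# Elements that start a new line of plain text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class PlainTextExtractor(HTMLParser):
    """Keeps only text nodes; every tag and attribute (the styling) is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &nbsp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            # Newlines in the HTML source are just whitespace, not line breaks.
            self.parts.append(re.sub(r"\s+", " ", data))


def strip_styling(html):
    """Return pasted HTML as plain text, one paragraph per line."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)


pasted = '<p class="MsoNormal">This sentence has a <b style="mso-bidi-font-weight:normal">bolded</b> word.</p>'
print(strip_styling(pasted))  # This sentence has a bolded word.
```

The hidden markup around “bolded” disappears and only the words remain. That is also why the result looks so plain once it lands in the editor.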

Towards this end, special ‘paste’ buttons are popular with many Rich Text editors:

8 icons devoted to copying & pasting in a Rich Text WYSIWYG editor

However, this is a ridiculous waste of toolbar real estate.  The Rich Text editor should automatically clean pasted content.  The alternative is educating end-users regarding which of these 8 buttons they should click.

All major Rich Text solutions (TinyMCE, CKEditor, RadEditor) have options for automatically cleaning pasted rich text content.
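One way to automate this cleaning is a whitelist: keep a small set of structural tags that the site’s own stylesheet already knows how to style, and drop everything else. The sketch below assumes that approach. The whitelist (ALLOWED_TAGS, ALLOWED_ATTRS), the b→strong and i→em renaming, and the name clean_paste are all hypothetical choices. They do not describe what TinyMCE, CKEditor or RadEditor actually do. It illustrates the idea and is not a hardened security sanitizer; for example, it does not vet href values.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags the site's own stylesheet already styles.
ALLOWED_TAGS = {"p", "br", "strong", "em", "a", "ul", "ol", "li", "h2", "h3", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}      # every other attribute (style, class, lang...) is dropped
RENAME = {"b": "strong", "i": "em"}  # presentational tags mapped to semantic ones
DROP_WITH_CONTENT = {"title", "style", "script", "xml"}
VOID_TAGS = {"br"}


class PasteCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # whitelisted tags emitted but not yet closed
        self._drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth += 1
            return
        tag = RENAME.get(tag, tag)
        if self._drop_depth or tag not in ALLOWED_TAGS:
            return  # unwrap: lose the tag, keep whatever text it contains
        kept = ""
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, set()):
                kept += ' %s="%s"' % (name, escape(value or "", quote=True))
        self.out.append("<%s%s>" % (tag, kept))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth = max(0, self._drop_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if self._drop_depth or tag not in self.open_tags:
            return
        # Close the matching tag, plus any whitelisted tags left open inside it.
        while self.open_tags:
            last = self.open_tags.pop()
            self.out.append("</%s>" % last)
            if last == tag:
                break

    def handle_data(self, data):
        if not self._drop_depth:
            self.out.append(escape(data, quote=False))


def clean_paste(html):
    """Return pasted HTML reduced to the whitelisted tags and attributes."""
    cleaner = PasteCleaner()
    cleaner.feed(html)
    cleaner.close()
    closing = "".join("</%s>" % t for t in reversed(cleaner.open_tags))
    return "".join(cleaner.out) + closing


word_html = ('<p class="MsoNormal" style="margin:0in;font-family:Calibri">This sentence has a '
             '<b><span style="font-size:14pt">bolded</span></b> word.<o:p></o:p></p>')
print(clean_paste(word_html))
# <p>This sentence has a <strong>bolded</strong> word.</p>
```

The fonts, margins and sizes are gone, but the bold survives as a <strong> tag that the site’s stylesheet controls.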

This solution has a downside though:

When styling is removed, the content will look radically different. This requires content authors to reapply the missing styling within the Rich Text editor. By doing this, authors are replacing MS Word styling with web-friendly styling.
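Part of that reapplication could, in principle, be automated when the formatting arrives as inline CSS on <span> elements. Bold and italic declarations can be swapped for web-friendly <strong> and <em> tags, while everything else on the span is dropped. The following is a minimal sketch that assumes exactly that kind of input. The mapping in semantic_tags and the name spans_to_semantic are hypothetical. Other tags pass through untouched, so on its own this is only one step of a fuller cleanup.

```python
from html import escape
from html.parser import HTMLParser


def semantic_tags(style):
    """Map a few inline CSS declarations to semantic tags (hypothetical mapping)."""
    declarations = {}
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        if sep:
            declarations[prop.strip().lower()] = value.strip().lower()
    tags = []
    if declarations.get("font-weight") in {"bold", "bolder", "700", "800", "900"}:
        tags.append("strong")
    if declarations.get("font-style") in {"italic", "oblique"}:
        tags.append("em")
    return tags  # fonts, sizes, colors, margins... are simply not carried over


class SpanToSemantic(HTMLParser):
    """Replaces each <span style="..."> with <strong>/<em> (or with nothing)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._span_stack = []  # semantic tags opened for each currently open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            tags = semantic_tags(dict(attrs).get("style") or "")
            self._span_stack.append(tags)
            self.out.extend("<%s>" % t for t in tags)
        else:
            self.out.append(self.get_starttag_text())  # other tags pass through verbatim

    def handle_startendtag(self, tag, attrs):
        if tag != "span":  # an empty <span/> carries nothing worth keeping
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span":
            if self._span_stack:
                self.out.extend("</%s>" % t for t in reversed(self._span_stack.pop()))
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def spans_to_semantic(html):
    parser = SpanToSemantic()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(spans_to_semantic('<p><span style="font-family:Calibri;font-weight:bold">Hello</span> '
                        '<span style="font-style:italic">world</span></p>'))
# <p><strong>Hello</strong> <em>world</em></p>
```

The font family is discarded while the emphasis survives. This is the trade the article asks authors to make by hand.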

This solution is unrealistic: content authors will revolt

Everything I’ve written is well known to developers.  Furthermore, features for automatically detecting and cleaning dirty content are widely available.

However, these features are often disabled in the face of user revolt.

Content authors will revolt if you strip away their MS Word styling (or remove font colors)

It’s normal for content authors to react negatively when their nicely formatted MS Word document turns to garbage in the CMS. These reactions gain credibility when the same actions worked fine in another CMS or Rich Text editor.

So…just disable the feature that strips MS Word styling and make them happy…

This will eventually ruin the web site, but the customer is always right.  Right?

Is Clippy the solution to our problems?

This post has now come full circle, and we’re no closer to a real-world solution:

  1. Developers remove pasted styling to protect the web site.
  2. Authors create content in their preferred writing environment.
  3. Authors want to move this content to the web site.
  4. Copy & paste is a logical choice.
  5. Authors are confused when everything goes to hell.
  6. Authors complain to developers.
  7. Developers allow pasted styling to make authors stop complaining.

However, as I look over this cascade of events, I see an opportunity for intervention at stage #5.  Education (as much as technology) is the problem.

To address this, here is what I propose:

Rich Text dialog window when pasting from MS Word.  No, it's not like Clippy.

I was chatting with a colleague about this dilemma and showed him this mockup.  He replied with “you want Clippy” and then smiled.  This reply severely shook my faith in my proposal.  I certainly have no desire to interact with Clippy…

Microsoft Clippy - Alive, well and now in your CMS

However, there is a lot I like about this proposal:

  • It doesn’t involve an animated character
  • It empowers authors to make their own choice
  • It educates authors about the consequences
  • It only displays when relevant
  • It contains useful information
  • It will go away

None of these things could be said about Clippy.
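The technical heart of the proposal is deciding when the dialog should appear at all. The sketch below assumes the editor receives the pasted HTML. It also assumes that Word-originated HTML can be recognized by markers such as the Office XML namespace, Mso* class names, mso- CSS properties and <o:p> tags. These markers are heuristics assumed here, not a documented contract. The outcomes “clean”, “leave” and a remembered choice are hypothetical labels for the dialog’s options.

```python
import re

# Heuristic fingerprints of HTML copied out of MS Word (assumed, not a spec).
WORD_MARKERS = [
    re.compile(r"urn:schemas-microsoft-com:office", re.IGNORECASE),  # Office XML namespace
    re.compile(r"""class=["']?Mso""", re.IGNORECASE),                 # MsoNormal, MsoListParagraph...
    re.compile(r"mso-[a-z-]+\s*:", re.IGNORECASE),                     # mso-* CSS properties
    re.compile(r"<o:p\b", re.IGNORECASE),                              # Word's <o:p> elements
]


def looks_like_word(html):
    return any(marker.search(html) for marker in WORD_MARKERS)


def paste_action(html, remembered=None):
    """Decide how the editor reacts to a paste.

    Returns "paste" (not from Word: insert normally, no dialog),
    the author's remembered choice ("clean" or "leave"), or
    "ask" (show the dialog and let the author decide).
    """
    if not looks_like_word(html):
        return "paste"      # the dialog only appears when relevant
    if remembered in ("clean", "leave"):
        return remembered   # the dialog goes away once a choice is remembered
    return "ask"


print(paste_action("<p>Typed on the web</p>"))                      # paste
print(paste_action('<p class=MsoNormal>From Word<o:p></o:p></p>'))  # ask
print(paste_action('<p class=MsoNormal>From Word</p>', "clean"))    # clean
```

Keeping detection separate from the author’s choice means the dialog stays silent for ordinary pastes and never nags once a preference is stored.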

If you build it, they will come!

Everything described happens because authors avoid writing content in their CMS’s Rich Text editor.  The hacky style stripping & modal windows are completely unnecessary if authors simply type the content in the CMS.

Towards that end, I’m very interested in creating an attractive web-based authoring experience.  Why are authors avoiding web-based authoring tools in favor of off-line tools? How can we change this behavior?

There are some big players (Google Documents, Word Live) that are also wrestling with this challenge.  This topic is covered in another post.

  • FredCK

    Interesting article, Gabe. Actually, we have been following your blog with great pleasure recently.

    CKEditor already does the Word cleanup automatically, either on CTRL+V or through the toolbar “Paste” button. There are people who still want the “Paste from Word” option, though, so we have the dedicated button there.

    We’re considering merging all paste buttons into a single one, which defaults to “Paste” but can also do “Paste from Word” and “Paste as Plain Text”. This looks like a good solution for the mess, don’t you think?

  • Arif

    The problem I ran into with CKEditor (actually, I saw it in every editor I’ve tried that does the cleaning) is that they remove my real formatting and styles and only keep a few of them. For example, I have some centered, bold, slightly larger text in a .docx file. After I copy & paste it into CKEditor, it loses the actual font size; I’m not sure whether there is anything special about my doc. I can send the doc if you are interested…

    Note: this does not happen when the editor does not clean up the content pasted from Word.

    Any ideas or help?

    • ben miedema

      I see that it has been a while since anyone commented. Gabe, I agree with you 100%. C&P from MS Word does leave a lot to be desired. I have created an MS Word marketing campaign that gets messed up when I transfer the text to my e-mailer. It could still work, but a better solution would be to convert my MS Word document to plain text via Notepad and then use a text editor like TextPad to add HTML to create rich text. I wish there were a shortcut. Any advice?

  • Kevin White

    The classic perfect-world solution vs. real-world results. Unfortunately, I think any solution that relies on the user making the correct, best choice is working on a flawed premise. People are lazy and are generally going to choose the path of least resistance.

    My gut feeling is that 90% of our users will choose “Leave it Alone”. 50% of those people will then end up contacting us to fix the mysterious problems that are occurring on their website (the other 50% are the ones who don’t even bother to look at their content on the actual page after they hit “Save”). I also anticipate a need to provide an interface for undoing the “Remember” selection, as they don’t realize what is really going to happen to their content after they make their selection (and will not wait to try it once to find out).

    That said, I still think this is a step in the right direction and is a better solution/start than the current “State of the Paste”.

    The better solution (and I am certainly not volunteering to build it, just to use it after the smart minds make it), would seem to be to automatically detect the pasting of Word content and fix it, keeping a vast majority of the styling in place (just replacing it with the correct HTML/CSS styling). To the user, nothing has happened. They hit paste and the stuff appears in the editor with no visible difference from the way it looked in Word.

    • Alex Dovey

      I agree; clients would never check the ‘clean’ button. The better solution is almost foolproof now with WordPress (TinyMCE): loading copy from a new version of MS Word into the content editor works pretty well.

      With any older version, all hell breaks loose.

  • Eric Crawford

    Experienced content folks I work with tend to ignore all the special copy/paste options and do this for consistency:

    1. Copy/paste into Notepad to remove the style junk.
    2. Copy/paste from Notepad into the HTML window.
    3. Reapply the necessary tags/styles.

    If you want to create a fancy “paste styled text” button, it should be customizable or just bring in basic tags like <p>, <a>, <strong>, etc. and nothing more (especially without the tags everywhere to apply and reapply font-size, font-weight, font-face, etc.)

    • Eric Crawford

      Oops, my post lost the tags I wanted to include: <a> <p> <strong>

      • Gabe Sumner

        Hey Eric, thanks for the comment. I fixed your post. I love that (even here) we’re struggling with the editor. :)

  • Mike Johnston

    Hi Gabe,

    I think you’ve made some excellent points here and really hit the mark on this topic. This is definitely a frustration point for a lot of users, including myself. I would suggest one addition, however. You mentioned “Even if it looks okay during publishing, this imported styling will create insidious long-term issues for the web site,” but didn’t actually highlight what those issues may be.

    That information could be valuable to people unaccustomed to these issues and I’d suggest you add some examples. Great article, thanks for letting me know about it.


    • Gabe Sumner

      Hey Mike, thanks for the comment.

      Regarding the long-term issues of copy & paste, I had about 2 paragraphs of content typed that I removed because it felt like a large tangent in the midst of the article. I was planning to later turn “insidious long-term issues” into a link to another article.

      Here is one example though:

      Web site redesigns are very challenging when the web site is littered with in-line styling. The task involves messy site-wide search & replace operations, whereas external styling makes it a very quick site-wide change.

      I’ll try to touch on this subject in a future post. Thank you so much for commenting.

  • Renaudgarnier

    Thanks for these thoughts, everybody. I’m working with CKEditor these days to allow visitors to my site to copy and paste their movie scripts onto the site. These documents are not so complicated; I just need to keep a margin-left for dialogue. Other settings could change; that’s not my primary concern.

    What surprises me today is the difference between web browsers. I usually check my site’s development and changes with Safari 5.1 on a Mac, and since this morning I have been able to copy from Pages or MS Word and paste into CKEditor without losing any of my document’s styling… So I kept coding the site for hours before I realized that this is an exception. Chrome doesn’t keep the styling and Mozilla doesn’t paste at all!! Even with the PasteFromWord option!!! So… I tried this on Windows with Safari, Chrome, Internet Explorer and Mozilla… Same thing: Mozilla doesn’t accept the paste and the other 3 lose my page’s styling.

    So I first tried to find a browser on each OS that accepts pasting with styles so I could recommend it, but I can’t tell users to download a browser just for a paste option. Judging from Google searches, uploading MS Word documents to convert them to HTML looks like a nightmare.

    Now I’m just very confused. I’m not experienced enough to know which solution is easier (if either is!): writing a server-side script to convert .doc to .html, OR trying to understand why paste doesn’t work the same way in different browsers!

  • Alex Dovey

    I remember reading about a New York newspaper that actually switched its entire workflow: editors write their articles in the website CMS first, and then a stylist reworks the same content for print.

    Makes so much sense to me. (I’m a web developer!)

  • Dianne

    Has anyone experimented to see whether any particular font works better than others when you copy & paste from Word into an RTE? I post content for several of our authors at work, and at times I get spurious changes, a switch to h1 being the most common oops. I would like to standardize the font/size they use if one works better.

  • Danbax

    Hmm, lazy authors spending hours laboring over Microsoft Word, with years of material stacked up ready to share on the new web and mobile readers? I don’t think lazy really fits. If you are looking at this from a developer’s point of view, well, you are writing code, and not many authors of “words” care about that stuff. Somewhere, somehow, the two need to meet.

    I have found a web builder, Ewisoft Web Builder, that has an internal editor but also an option to default to your Microsoft Word 10 (or earlier) word processor. You go into your web page, choose Word, up pops MS Word, and when you save, what you have written is saved to the web page. The web page has the appearance, fonts, and editing as seen in the original Word doc. This is a great feature for anyone writing more than 11 words at a time.

    I do cheat, however: I write articles in MS Word independently of my web page and then copy & paste them, and it seems to work, but it adds some extra spacing and has trouble transposing some special fonts. I work in MS Word because I am writing a lot, I am in different locations, and it is not convenient to continually have to go into my web page to access my MS Word editor.

    Solution? Create a web editor that will handle copy & paste from Word into a web page with unlimited pages. I have copied and pasted into a web page and then had to go back in and rework everything, taking out extra spacing, etc. In my case I was trying to upgrade to the slicker, modern websites with the ability to play YouTube and audio, but the base material is primarily technical studies of several pages, with varying fonts and sizing, etc. Lazy? I just hate doing things twice after already having edited once. I do not see what the problem is: is no one building web software that is compatible with an external word processor? Is Ewisoft the only choice? They seem to be doing it; maybe they will update and catch up with the mobile society we are living in.

    Peace, Dan

  • Nanashi

    I love that this article talks about text formatting problems and yet has so many problems with special characters–to the point I have problems reading it.

  • Frank

    Really good point. As a CMS admin, I struggle with authors’ requests for Word formatting vs. good content publishable to all channels. I would like to stress that Word formats for print only, while a CMS wants text formatted according to its structure, as in HTML. But this deep difference is very difficult to sell. Maybe the need to serve mobile devices well could convince authors; I’ll give that a try.

  • Frank

    I see the buttons I allow in an RTE as a filter. Roughly, every button in an RTE defines a tag: allowing a button means allowing a tag.
    A paste from Word or elsewhere has to strip every tag that is not allowed by the toolbar, with no exceptions.
    That is my wish for all RTE products, but I am often disappointed.
    If I needed to implement such a paste, I would use this toolbar-button whitelist approach. Unfortunately, the behavior is often a blacklist instead.

  • Ryan

    This article has broken UTF-8 characters throughout it when viewed from Chrome.
    Guess you can’t even win the text-cleanup battle on a blog post :)

  • Green

    not working?