HTML to FlowDocument Converter

I have an existing application that stores “notes” in HTML format. I used the HTML editor embedded in Internet Explorer to let the user create formatted text, much as a RichTextBox control would be used, but producing HTML rather than (the even more horrid) RTF.

I’m now in the process of upgrading that application to WPF, so it’s only natural that I want to display these notes using one of the WPF FlowDocument viewer controls. The problem was how to convert my HTML into something that could be displayed nicely in a FlowDocument.

Step 1 – Converting HTML to XAML

The solution was to download Microsoft’s sample HtmlToXaml Converter (which actually allows conversion in both directions). It’s apparently not foolproof, but it’s certainly more than enough to convert my very simple HTML to the corresponding FlowDocument markup.

Using the HtmlToXamlConverter class’s static ConvertHtmlToXaml method, we can take an HTML string and convert it to a XAML string, e.g. from:

    <p>The <b>Markup</b> that is to be converted.</p>

to:

    <FlowDocument>
        <Paragraph>The <Run FontWeight="bold">Markup</Run> that is to be converted.</Paragraph>
    </FlowDocument>

Step 2 – Converting XAML markup into a FlowDocument instance

This was certainly a great start, but it converts HTML text into XAML text, not actual objects. So the next hurdle was how to load that XAML markup at runtime into a FlowDocument.

The solution was fairly easy to find thanks to Google and Ronald Clifford, although I can’t say I think it’s obvious: the XAML is wrapped in a stream and loaded through a TextRange that spans the whole FlowDocument.

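    using System.IO;
    using System.Text;
    using System.Windows;
    using System.Windows.Documents;

    FlowDocument flowDocument = new FlowDocument();
    // The stream must contain XAML (not HTML) with the WPF presentation namespace declared.
    string xaml = "<Section xmlns=\"http://schemas.microsoft.com/winfx/2006/xaml/presentation\">"
                + "<Paragraph>The <Bold>Markup</Bold> that is to be converted.</Paragraph></Section>";
    using (MemoryStream msDocument = new MemoryStream(Encoding.UTF8.GetBytes(xaml)))
    {
        TextRange textRange = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
        textRange.Load(msDocument, DataFormats.Xaml);
    }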
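If ConvertHtmlToXaml is instead asked for a FlowDocument root (true as its second argument), the TextRange step can be skipped and XamlReader.Parse can build the FlowDocument object directly. This is a minimal sketch of that alternative, not what my application does; the FlowDocumentParser name is hypothetical, and it assumes the markup declares the WPF presentation namespace, as the sample’s output does.

```csharp
using System;
using System.Windows.Documents;
using System.Windows.Markup;

// Hypothetical helper: turns XAML whose root element is <FlowDocument> into a live object.
public static class FlowDocumentParser
{
    public static FlowDocument Parse(string flowDocumentXaml)
    {
        // XamlReader.Parse instantiates the whole element tree and throws
        // if the markup is malformed or uses an unknown namespace.
        object root = XamlReader.Parse(flowDocumentXaml);

        // Guard against XAML whose root is something else (e.g. a Section).
        if (root is FlowDocument document)
            return document;

        throw new ArgumentException("Expected a <FlowDocument> root element.", nameof(flowDocumentXaml));
    }
}
```

Combined with the sample, the call would be FlowDocumentParser.Parse(HtmlToXamlConverter.ConvertHtmlToXaml(html, true)). Bear in mind that XamlReader.Parse creates whatever objects the markup describes, so I would only feed it markup produced by the converter, never raw user input.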

Step 3 – Using DataBinding

So things were almost coming together. The last step was to perform this conversion as easily as possible, which meant using data binding. I had a list of items displayed, each with a string field containing HTML markup (populated from a database using LINQ). I wanted to bind the FlowDocumentScrollViewer directly to that field and have the conversion happen automatically. This required yet another IValueConverter class on the data binding.

    using System;
    using System.Globalization;
    using System.IO;
    using System.Text;
    using System.Windows;
    using System.Windows.Data;
    using System.Windows.Documents;

    public class HtmlToFlowDocumentConverter : IValueConverter
    {
        public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
        {
            if (value == null)
                return null;

            FlowDocument flowDocument = new FlowDocument();

            // asFlowDocument = false asks the sample for a Section (or Span) root,
            // the form that TextRange.Load accepts.
            string xaml = HtmlToXamlConverter.ConvertHtmlToXaml(value.ToString(), false);

            // UTF-8 rather than ASCII, so non-ASCII characters in the notes survive.
            using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xaml)))
            {
                TextRange text = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
                text.Load(stream, DataFormats.Xaml);
            }
            return flowDocument;
        }

        // One-way conversion only: the viewer never writes HTML back.
        public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
        {
            throw new NotImplementedException();
        }
    }
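When TextRange.Load rejects a string, it helps to compare it with the XAML that TextRange itself writes. The sketch below, using a hypothetical FlowDocumentXaml.Save helper, serializes a FlowDocument’s content through TextRange.Save with DataFormats.Xaml. As far as I know the output has a Section root, which fits with the converter above passing false to ConvertHtmlToXaml.

```csharp
using System.IO;
using System.Windows;
using System.Windows.Documents;

// Hypothetical helper for inspecting the XAML format that TextRange reads and writes.
public static class FlowDocumentXaml
{
    public static string Save(FlowDocument document)
    {
        // Cover the whole document so everything is serialized.
        TextRange range = new TextRange(document.ContentStart, document.ContentEnd);
        using (MemoryStream stream = new MemoryStream())
        {
            range.Save(stream, DataFormats.Xaml);

            // Rewind and let StreamReader detect the encoding (it defaults to UTF-8).
            stream.Position = 0;
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For example, FlowDocumentXaml.Save(new FlowDocument(new Paragraph(new Run("Markup")))) returns the XAML for a single paragraph, which can then be compared side by side with what the converter produced.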

Note: the HtmlToXamlConverter class used above is simply the Microsoft sample.

The Result

This means I can now use this converter anywhere in my XAML where I want to display HTML content in a FlowDocument viewer control.

    <Page.Resources>
        <conv:HtmlToFlowDocumentConverter x:Key="htmlToXamlConverter"/>
    </Page.Resources>
    …
    <FlowDocumentScrollViewer Document="{Binding Path=Events/Diaries/Comment,
                              Converter={StaticResource htmlToXamlConverter}}"/>
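The same binding can also be built in code-behind, for instance when viewers are created dynamically. This is a hedged sketch: the ViewerFactory name is hypothetical, and the converter is passed in as a parameter so the helper does not depend on the sample class.

```csharp
using System.Windows.Controls;
using System.Windows.Data;

// Hypothetical helper: the code-behind counterpart of the XAML binding above.
public static class ViewerFactory
{
    public static FlowDocumentScrollViewer CreateBoundViewer(string htmlPath, IValueConverter converter)
    {
        FlowDocumentScrollViewer viewer = new FlowDocumentScrollViewer();

        // The path is resolved against the viewer's inherited DataContext.
        Binding binding = new Binding(htmlPath)
        {
            Converter = converter,
            // One-way, because the converter's ConvertBack throws.
            Mode = BindingMode.OneWay
        };

        viewer.SetBinding(FlowDocumentScrollViewer.DocumentProperty, binding);
        return viewer;
    }
}
```

Calling ViewerFactory.CreateBoundViewer("Events/Diaries/Comment", new HtmlToFlowDocumentConverter()) should behave like the markup above once the viewer sits under the same DataContext.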

12 thoughts on “HTML to FlowDocument Converter”

  1. I want to consume an RSS feed, but I cannot display the images.
    How can I do this? (I couldn’t figure out what I need to modify in the HtmlToXamlConverter class.)

    Thanks!!

  2. Hi, I’m trying step 2 but get the error “The tag ‘p’ does not exist in XML namespace ‘’”. I guess the namespace is null? Thanks.

  3. Well, now I understand that the xaml (<p>The <b>Markup</b>…) was just an example and is not the actual xaml to be converted; boy do I feel stupid for my previous post. Anyway, I’m trying to set the text of a RichTextBox this way and it just remains blank. I’m using the RichTextBox’s Document (which is a FlowDocument) instead of declaring a FlowDocument variable as you have above. But, again, the RichTextBox just stays blank. Any help appreciated. Thanks.

  4. Sorry, I tried to paste the Xaml string and it hosed up my post. Please disregard my first two posts/questions. I realize that the xaml in step 2 is just an example and is not the actual xaml.
    What I really need help on is I’m trying to set the text of a RichTextBox in the manner of your code above and it just remains blank. I’m using the RichTextBox’s Document (which is a FlowDocument) instead of declaring a FlowDocument variable as you have above. But, again, the RichTextBox just stays blank. Thanks much.

  5. Wow Phil, I feel like I missed a whole conversation here with your four comments. I sometimes call them “cardboard cut-out” conversations: a developer comes to you seeking advice on some technical problem in an area you dealt with a while back. By the time they’ve finished explaining the problem and you’re just grasping where they are at, they suddenly have a Eureka moment, often caused simply by voicing the question and being forced to explain their approach. They offer thanks and then leave without you having had any input to the conversation whatsoever (you might as well have been a cardboard cut-out).

    Anyhow – I’m certainly pleased if anything I wrote in the post was of any use to you whatsoever, and I certainly appreciate you taking the time to leave some comments – especially this one which hopefully will be of use to others.

  6. There’s a lot of custom code here that will have bugs and will inevitably fall out of date. RichTextBox supports copy/paste from HTML; could you not leverage that functionality to do the conversions?

  8. Hi,

    You might want to have a look at Chris Lovett’s SgmlReader class, as it is a slot-in replacement for an XmlReader AND will read an HTML file. Its output can then be piped into an XSL stylesheet and transformed directly into XAML.
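For the blank RichTextBox problem raised in the comments: the Step 2 technique works against RichTextBox.Document unchanged, as long as the string being loaded is XAML with the presentation namespace rather than raw HTML. A minimal sketch, with a hypothetical RichTextBoxXaml helper that takes already-converted XAML:

```csharp
using System.IO;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;

// Hypothetical helper: replaces a RichTextBox's content with Section-rooted XAML.
public static class RichTextBoxXaml
{
    public static void Load(RichTextBox box, string sectionXaml)
    {
        // Span the box's existing document end to end so Load replaces all of it.
        TextRange range = new TextRange(box.Document.ContentStart, box.Document.ContentEnd);
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(sectionXaml)))
        {
            range.Load(stream, DataFormats.Xaml);
        }
    }
}
```

With the Microsoft sample, the call would be RichTextBoxXaml.Load(richTextBox, HtmlToXamlConverter.ConvertHtmlToXaml(html, false)).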
