HTML to FlowDocument Converter

I have an existing application that I wrote that stores “notes” in HTML format. I actually used the embedded HTML editor in IE to allow the user to create text – similar to what a RichText control would be used for but outputting to HTML rather than (the even more horrid) RTF.

I’m now in the process of upgrading that application to WPF, so it’s only natural that I want to display these notes using one of the WPF FlowDocument viewer controls. The problem I encountered was how to convert my HTML into something that could be displayed nicely in a FlowDocument.

Step 1 – Converting HTML to XAML

The solution was to download Microsoft’s sample HtmlToXaml Converter (which actually allows conversion in both directions). It’s apparently not foolproof, but it’s certainly more than enough to convert my very simple HTML to the corresponding FlowDocument.

Using the HtmlToXamlConverter class’s ConvertHtmlToXaml method we can take an HTML string and convert it to a XAML string, e.g. from:

    <p>The <b>Markup</b> that is to be converted.</p>

to:

    <FlowDocument>
        <Paragraph>The <Run FontWeight="bold">Markup</Run> that is to be converted.</Paragraph>
    </FlowDocument>

Step 2 – Converting XAML markup into a FlowDocument instance

This was certainly a great start, but it converts HTML text to XAML text, not actual objects. So the next hurdle was how to load that XAML markup at runtime and insert it into a FlowDocument.

The solution was fairly easy to find thanks to Google and Ronald Clifford – although I can’t say I think it’s obvious.

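    FlowDocument flowDocument = new FlowDocument();
    // The string that gets loaded must be XAML, so convert the HTML first (see Step 1).
    string xaml = HtmlToXamlConverter.ConvertHtmlToXaml(
        "<p>The <b>Markup</b> that is to be converted.</p>", false);
    // UTF-8 keeps non-ASCII characters that ASCIIEncoding would replace with '?'.
    using (MemoryStream msDocument = new MemoryStream(Encoding.UTF8.GetBytes(xaml)))
    {
        TextRange textRange = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
        textRange.Load(msDocument, DataFormats.Xaml);
    }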
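
If the XAML string has a FlowDocument root element (as in the Step 1 example) and declares the WPF presentation namespace, XamlReader.Parse looks like a shorter alternative to the TextRange approach. The sketch below is only an illustration of that idea: the FlowDocumentLoader class and FromXaml method are hypothetical names, and it assumes the markup contains nothing beyond standard flow content.

```csharp
using System.Windows.Documents;
using System.Windows.Markup;

// Hypothetical helper, not part of the Microsoft sample.
public static class FlowDocumentLoader
{
    // Parses XAML whose root element is <FlowDocument>.
    // Assumption: the markup declares the WPF presentation xmlns,
    // otherwise the parser cannot resolve the element names.
    // Returns null if the root element is some other type.
    public static FlowDocument FromXaml(string xaml)
    {
        return XamlReader.Parse(xaml) as FlowDocument;
    }
}
```

Calling FlowDocumentLoader.FromXaml with the Step 1 output (plus the xmlns attribute on the root element) should hand back a FlowDocument containing a single Paragraph. For markup rooted in a Section, which is what ConvertHtmlToXaml produces when its second argument is false, the TextRange.Load approach above is still the one to use.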

Step 3 – Using DataBinding

So things were almost coming together. The last step was to perform this conversion as easily as possible, which meant using DataBinding. I had a list of items displayed, each with a string field containing the HTML markup (populated from a database using LINQ). I wanted to be able to bind the FlowDocumentScrollViewer to the field containing the HTML and have things work themselves out automatically. This required yet another IValueConverter class to use on the data binding.

    public class HtmlToFlowDocumentConverter : IValueConverter
    {
        public object Convert(object value, Type targetType, object parameter,
            System.Globalization.CultureInfo culture)
        {
            if (value != null)
            {
                FlowDocument flowDocument = new FlowDocument();
                string xaml = HtmlToXamlConverter.ConvertHtmlToXaml(value.ToString(), false);
                // UTF-8 keeps non-ASCII characters that ASCIIEncoding would replace with '?'.
                using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xaml)))
                {
                    TextRange text = new TextRange(flowDocument.ContentStart, flowDocument.ContentEnd);
                    text.Load(stream, DataFormats.Xaml);
                }
                return flowDocument;
            }
            return value;
        }

        public object ConvertBack(object value, Type targetType, object parameter,
            System.Globalization.CultureInfo culture)
        {
            throw new NotImplementedException();
        }
    }

Note: The HtmlToXamlConverter class above is simply the Microsoft sample.

The Result

This means I can now use this converter anywhere in my XAML that I wish to display HTML content within a FlowDocument control.

    <Page.Resources>
        <conv:HtmlToFlowDocumentConverter x:Key="htmlToXamlConverter"/>
    </Page.Resources>
    …
    <FlowDocumentScrollViewer Document="{Binding Path=Events/Diaries/Comment,
                              Converter={StaticResource htmlToXamlConverter}}"/>

Blast from the Past – TI-99/4A

A few weeks ago my parents were packing away their Christmas decorations in their loft and whilst they were up there they brought down a whole bunch of old paraphernalia that we’d stored up there when we were kids.

Amongst the assorted goodies was an amazing collection of Lego, a racing car set that is compatible with the one my son got for Christmas and, to my utter delight, one Texas Instruments TI-99/4A home computer in perfect working order, packed away complete with a tape recorder (persistent storage) and PAL UHF converter (sound/video output).

For those that aren’t familiar with this amazing piece of technology – it has some fairly modest specs – 16k RAM, 32k ROM, 32×24 character output with each character formed from the standard 8 by 8 grid (i.e. 256 x 192 pixels). The ROM comes complete with TI-Basic, although things only really got interesting when you had the TI-Extended Basic module installed. Now I remember programming this little beast for many, many hours – back in the good old days when BASIC required line numbers, allowed multiple statements per line (as you can still do today with the colon) and only let you edit one line at a time. That’s right, editing was performed by using LIST {line number range}, then entering the line number you wanted to edit and pressing the down arrow to enter line edit mode. Amazing considering I had programs with several thousand lines of code – each averaging 3 or 4 statements.

So I power it up, tune the TV to UHF channel 37 and hey presto, the screen is displayed in all its 16 colour glory. Hmm… this can’t be too hard – so I write a quick "program" – obviously starting with "Hello World" via the PRINT statement. Next I try a FOR loop with a PRINT and I’m greeted by the following error message:

* CAN’T DO THAT

Priceless. I try a couple of other attempts and am rewarded with:

* INCORRECT STATEMENT
* STRING-NUMBER MISMATCH

What an awesome machine… and great looking to boot. So much better looking than the Vic-20 and Commodore 64 that were very cheap looking.

——————-

Of course, having a very strong fan base, this machine also has an emulator that is alive and well. I dare you to give Parsec a go!

Thumbnail Solutions

So, based on last night’s performance troubles loading thumbnail images, I spent a couple of minutes “googling” and discovered the following useful links:

http://blogs.msdn.com/dditweb/archive/2007/08/22/speeding-up-image-loading-in-wpf-using-thumbnails.aspx

http://msdn2.microsoft.com/en-us/library/system.windows.media.imaging.bitmapframe_members.aspx

I tried the solution in the first link, which involves creating a converter so that binding to a URI results in a BitmapSource that is decoded using the DecodePixelWidth/Height properties.

    public class UriToBitmapConverter : IValueConverter
    {
        public UriToBitmapConverter()
        {
            DecodeResolution = 100;
        }

        public int DecodeResolution { get; set; }

        public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
        {
            BitmapImage bi = new BitmapImage();
            bi.BeginInit();
            bi.DecodePixelWidth = DecodeResolution;
            bi.CacheOption = BitmapCacheOption.OnLoad;
            bi.UriSource = new Uri( value.ToString() );
            bi.EndInit();
            return bi;
        }

        public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
        {
            throw new Exception("The method or operation is not implemented.");
        }
    }

This converter is then hooked up in the XAML roughly as follows.

    <Window x:Class="ThumbnailLoading.Window1"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:local="clr-namespace:ThumbnailLoading"
        Title="Window1" Height="300" Width="300">
        <Window.Resources>
            <local:PhotoCollection x:Key="photos" x:Name="photos"/>
            <local:UriToBitmapConverter x:Key="uriToBitmapConverter" DecodeResolution="60"/>
        </Window.Resources>
        <StackPanel>
            <ListView ItemsSource="{StaticResource photos}">
                <ListView.ItemTemplate>
                    <DataTemplate>
                        <Image Width="60"
                               Source="{Binding Path=Uri, Converter={StaticResource uriToBitmapConverter}}"/>
                    </DataTemplate>
                </ListView.ItemTemplate>
            </ListView>
        </StackPanel>
    </Window>

Whilst this solution from the folks at DDITDev is an improvement, it is still too slow, as I suggested in the previous post.

So I spent some time reading the MSDN documentation. The BitmapFrame class seemed to have a lot of cool functionality – including the ability to access image metadata. They even have properties for the commonly used properties such as Camera Model and Rating – very nice! However, of most interest was the Thumbnail property which suggested that it would return the thumbnail stored with the image. I used the Binding converter idea and created my own converter as follows.

    public class UriToThumbnailConverter : IValueConverter
    {
        public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
        {
            BitmapFrame bi = BitmapFrame.Create(new Uri(value.ToString()),
                BitmapCreateOptions.DelayCreation,
                BitmapCacheOption.OnDemand);
            return bi.Thumbnail;
        }

        public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
        {
            throw new Exception("The method or operation is not implemented.");
        }
    }

Then made the following changes to the XAML to declare and use the new binding converter.

    <local:UriToThumbnailConverter x:Key="uriToThumbnailConverter"/>

    <Image Width="60" Source="{Binding Path=Uri, Converter={StaticResource uriToThumbnailConverter}}"/>

This actually seemed to do the trick. I’ve provided some statistics below which show the performance benefits. Of course in reality – as with PhotoPlay – any expensive image loading would normally happen on a background thread, so maybe the binding converter isn’t really the best way to manage this.

|   | Bind direct to Uri | Bind to Uri using DecodePixelWidth | Bind to Uri using embedded Thumbnail |
|---|---|---|---|
| Loading 301 approx 280 KB (1.1 megapixel) images | 1 minute, 1.2 GB | 30 seconds, 120 MB | 4 seconds, 180 MB |
| Loading 1102 approx 2 MB (5 megapixel) images | Yeah right! | 4 minutes 35 seconds, 201 MB | 14 seconds, 310 MB |

Note that the DecodePixelWidth method using the UriToBitmapConverter actually uses less memory because the embedded thumbnails are bigger than the decode width of 60 pixels that I specified.
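
One gap in the thumbnail converter is an image that has no embedded thumbnail at all. A possible way round that is to combine the two converters: try the embedded thumbnail first and fall back to a DecodePixelWidth decode. The following is only a sketch of that idea. The class name is hypothetical, and it assumes BitmapFrame.Thumbnail is null when the file carries no thumbnail, which I haven't verified for every image format.

```csharp
using System;
using System.Globalization;
using System.Windows.Data;
using System.Windows.Media.Imaging;

// Hypothetical converter combining the two approaches shown above.
public class UriToThumbnailWithFallbackConverter : IValueConverter
{
    public UriToThumbnailWithFallbackConverter()
    {
        DecodeResolution = 60;
    }

    // Width in pixels used only when no embedded thumbnail is found.
    public int DecodeResolution { get; set; }

    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        if (value == null)
            return null;

        Uri uri = new Uri(value.ToString());

        // Cheap path: ask for the thumbnail stored in the image metadata.
        BitmapFrame frame = BitmapFrame.Create(uri,
            BitmapCreateOptions.DelayCreation,
            BitmapCacheOption.OnDemand);

        // Assumption: Thumbnail is null when the image has no embedded thumbnail.
        if (frame.Thumbnail != null)
            return frame.Thumbnail;

        // Slow path: decode the full image down to a small width.
        BitmapImage bi = new BitmapImage();
        bi.BeginInit();
        bi.DecodePixelWidth = DecodeResolution;
        bi.CacheOption = BitmapCacheOption.OnLoad;
        bi.UriSource = uri;
        bi.EndInit();
        return bi;
    }

    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        throw new NotSupportedException();
    }
}
```

Images that take the slow path will still cost roughly what the DecodePixelWidth column in the table shows, so this only helps when most of the photos do come from a camera.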

I also found this Microsoft sample application, WPF Photo Viewer Demo, which shows how thumbnails and metadata can be extracted from digital photos.
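
To give a flavour of the metadata side, here is a small sketch that reads the camera model and rating through BitmapFrame.Metadata. It is not taken from the Photo Viewer Demo: the PhotoInfo class and Describe method are hypothetical, and it assumes a format such as JPEG whose decoder exposes BitmapMetadata (some formats may not support these queries).

```csharp
using System;
using System.Windows.Media.Imaging;

// Hypothetical helper for illustration only.
public static class PhotoInfo
{
    // Returns a short description built from the image metadata.
    // Assumption: path is an absolute path to a JPEG-style image.
    public static string Describe(string path)
    {
        BitmapFrame frame = BitmapFrame.Create(new Uri(path),
            BitmapCreateOptions.DelayCreation,
            BitmapCacheOption.OnDemand);

        // Metadata is typed as ImageMetadata; the camera properties live on BitmapMetadata.
        BitmapMetadata metadata = frame.Metadata as BitmapMetadata;
        if (metadata == null)
            return "no metadata";

        return string.Format("Camera: {0}, Rating: {1}", metadata.CameraModel, metadata.Rating);
    }
}
```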

Thumbnail Images

When I wrote my little PhotoPlay applet I quickly realised that loading large numbers of images into memory was slow and enormously expensive in terms of memory use. I remember the first time I pointed PhotoPlay at my year 2004 photo folder – I think I managed to kill the task at around 3Gb of virtual memory use.

So I did a quick search and found a great bit of code from Kourosh Derakshan that allows you to use the thumbnail images that are embedded into photos by almost all digital cameras. The images are actually stored as metadata against the real image along with all the other EXIF metadata tags such as camera model, flash type, focal length etc.

Using this code I was able to load 100s of photos in several seconds, as opposed to minutes, and the memory usage was amazingly small. Of course the gotcha is that if the image doesn’t have a thumbnail this method won’t help you. If the image didn’t come from a camera then chances are it won’t have a thumbnail – so really this is for digital photos only. [The main reason the thumbnail image is stored is because this is the image that the cameras use themselves when you browse photos on the camera.]

The key to this code is the System.Drawing.Image.FromStream() method. The overload used has a parameter “validateImageData” that, if set to false, means the image isn’t loaded into memory. Not obvious and certainly not apparent from reading the documentation! You end up with an Image instance that still allows you to access the metadata properties without the overhead of processing the image itself.

    // Written by Kourosh Derakshan - minor mods by Nigel Spencer.
    /// <summary>
    /// Gets the thumbnail from the image metadata. Returns null if no thumbnail
    /// is stored in the image metadata
    /// </summary>
    /// <param name="path">The full path to the image.</param>
    /// <returns>The thumbnail image contained within the metadata (EXIF) of the image, or
    /// <see langword="null"/> if no thumbnail data is present.</returns>
    /// <remarks>
    /// The ExifTag metadata is copied from the original image to the thumbnail that is returned.
    /// </remarks>
    public static Image GetThumbnail(string path)
    {
        FileStream fileStream = null;
        try
        {
            fileStream = File.OpenRead(path);

            // Last parameter tells GDI+ not to load the actual image data
            Image originalImage = Image.FromStream(fileStream, false, false);

            // GDI+ throws an error if we try to read a property when the image
            // doesn't have that property. Check to make sure the thumbnail property
            // item exists.
            bool propertyFound = false;
            for (int i = 0; i < originalImage.PropertyIdList.Length; i++)
                if (originalImage.PropertyIdList[i] == (int)ExifTags.ThumbnailData)
                {
                    propertyFound = true;
                    break;
                }

            if (!propertyFound)
                return null;

            PropertyItem thumbnailPropertyItem = originalImage.GetPropertyItem((int)ExifTags.ThumbnailData);

            // The image data is in the form of a byte array. Write all
            // the bytes to a stream and create a new image from that stream
            byte[] imageBytes = thumbnailPropertyItem.Value;
            MemoryStream stream = new MemoryStream(imageBytes.Length);
            stream.Write(imageBytes, 0, imageBytes.Length);
            Image thumbnailImage = Image.FromStream(stream);

            // Copy all the original properties to the thumbnail.
            for (int i = 0; i < originalImage.PropertyIdList.Length; i++)
            {
                PropertyItem itemToCopy = originalImage.GetPropertyItem(originalImage.PropertyIdList[i]);
                thumbnailImage.SetPropertyItem(itemToCopy);
            }

            originalImage.Dispose();
            return thumbnailImage;
        }
        finally
        {
            if (fileStream != null)
            {
                fileStream.Dispose();
            }
        }
    }

So this method has been very useful to me, but in the last week I’ve been trying to do something similar in WPF. My first attempt was to just jump straight in and bind a ListBox to some images. This was incredibly slow to load – I mean horrendous! I wanted to use the GetThumbnail method but of course WPF doesn’t understand System.Drawing.Image – I need to use System.Windows.Media.ImageSource instead. I’ve messed briefly with DecodePixelWidth and DecodePixelHeight, but these only seem to control the size of the image that is held in memory (which is good); it still takes forever to produce the thumbnail from the full image. I also noticed that the BitmapFrame class, rather than BitmapSource, seems to be more helpful because it allows thumbnail and metadata access.

Well – it’s late – so it’ll have to be a job for tomorrow. Hopefully I’ll find an equivalent way of doing this in WPF that works and be able to blog the solution. Assuming I can get it working in WPF I’ll also include some performance comparisons against the method shown here.
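
On the type mismatch itself, one possible bridge is to push the System.Drawing.Image through an in-memory PNG and let WPF decode it into a BitmapSource. This is only a sketch of that conversion, with a hypothetical ImageInterop helper. It does nothing for load times by itself, and the extra encode/decode step has a cost of its own, so it is probably only reasonable for small images such as thumbnails.

```csharp
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Media.Imaging;

// Hypothetical helper for illustration only.
public static class ImageInterop
{
    // Converts a GDI+ image into a WPF BitmapSource via an in-memory PNG.
    public static BitmapSource ToBitmapSource(System.Drawing.Image image)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            image.Save(stream, ImageFormat.Png);
            stream.Position = 0;

            BitmapImage bitmap = new BitmapImage();
            bitmap.BeginInit();
            // OnLoad decodes during EndInit, so the stream can safely be disposed afterwards.
            bitmap.CacheOption = BitmapCacheOption.OnLoad;
            bitmap.StreamSource = stream;
            bitmap.EndInit();
            // Freezing makes the bitmap usable from threads other than the one that created it.
            bitmap.Freeze();
            return bitmap;
        }
    }
}
```

In principle the result of GetThumbnail could be passed straight to ImageInterop.ToBitmapSource and then bound to an Image control.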