Recently, I’ve been working a little on an application that allows users to save, tag and bookmark links for later reading, that kind of stuff. Obviously, Web API facilitates those types of apps really well, as data can be exposed in a multitude of formats. So I had this crazy idea: CLR to Kindle? Why not.
Unfortunately, the MOBI format (used by Kindle) is not that easy to support from C#, as, to my knowledge, there is no ready-made DLL port or SDK available. On the other hand, Amazon provides a proprietary command line tool called [Kindlegen][1], which converts HTML into MOBI. We’ll use that. It’s a hacky solution, but it sure is a lot of fun.
<h3>
Kindlegen
</h3>

Of course, to start off you need Kindlegen itself. You can get it from the [Amazon website][1]. It is very simple to use: it takes the name of the HTML file as an argument and generates the MOBI file in the same folder.
```
C:\Tools\kindlegen>kindlegen.exe "name_of_the_html.html"
```
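If you want to drive Kindlegen from C# instead of the prompt, the same call maps onto a `ProcessStartInfo`. Below is a minimal sketch. It assumes `kindlegen.exe` sits in a folder you pass in. The `KindlegenInvocation` class and its method names are hypothetical. The expected output path only encodes the "same folder, same name" behavior described above.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper describing how to call Kindlegen for one HTML file.
public static class KindlegenInvocation
{
    // Equivalent of: kindlegen.exe "name_of_the_html.html"
    public static ProcessStartInfo CreateStartInfo(string kindlegenFolder, string htmlPath)
    {
        return new ProcessStartInfo
        {
            FileName = Path.Combine(kindlegenFolder, "kindlegen.exe"),
            // Quote the path so names containing spaces stay a single argument.
            Arguments = "\"" + htmlPath + "\"",
            UseShellExecute = false,       // required to redirect output
            RedirectStandardOutput = true, // lets the caller inspect Kindlegen's log
            CreateNoWindow = true
        };
    }

    // Kindlegen writes the MOBI next to the HTML file, with the same base name.
    public static string ExpectedMobiPath(string htmlPath)
    {
        return Path.ChangeExtension(htmlPath, ".mobi");
    }
}
```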
Another useful link is the [Amazon Publishing Guidelines][2]. It contains all kinds of information about how to format the HTML file so that the generated ebook is of the highest quality. I will not focus on that at all here, as it is outside the scope of this article. In fact, I’ll just use some HTML copied from this very blog, and as you’ll see, Kindlegen works with pretty much anything (the result just might not be perfectly formatted).
<h3>
Application
</h3>

Our application will be a simple Web API application, created from the MVC4 template in VS2010. Copy the Kindlegen tool to the root of the website, into a “kindlegen” folder.
My model is similar to what I used in other tutorials:
```csharp
using System;

public class Url : IMobi
{
    public int ID { get; set; }
    public string Address { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
    public string Text { get; set; }
    public DateTime CreatedAt { get; set; }
    public string CreatedBy { get; set; }
    public string HTMLRepresentation { get; } // TODO: implemented below
}
```
Url is the typical article-type entity I mentioned before, saved by the user. Notice it implements an IMobi interface, which will be our contract for serializing to MOBI.
```csharp
public interface IMobi
{
    int ID { get; }
    string HTMLRepresentation { get; }
}
```
The IMobi interface defines only the two things needed to create MOBI output: a unique ID (which we’ll use for naming the file) and an HTML representation of the CLR type, which will be flushed into the MOBI ebook.
The HTMLRepresentation property getter on the Url class could take any shape or form. In my example, it composes some simple HTML out of the model’s properties, such as Title, Description, Text, timestamps and so on. You might do that using simple string formatting/concatenation, or use the Razor templating engine or any other templating solution you are happy with.
```csharp
public string HTMLRepresentation
{
    get
    {
        // {0}..{4} are filled with the model's properties
        return string.Format(@"<html>
<body>
<h1>{0}</h1>
<h3>{1}</h3>
<h4>By {2} - on {3}</h4>
{4}
</body>
</html>", Title, Description, CreatedBy, CreatedAt.ToShortDateString(), Text);
    }
}
```
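The `string.Format` version inserts property values verbatim, so a title containing `<` or `&` would produce broken markup. Below is a sketch of the same idea with the plain-text fields HTML-encoded via `System.Net.WebUtility.HtmlEncode`. It assumes, for illustration, that `Text` already holds HTML (e.g. copied from a blog post), so that field is left unencoded. The `ArticleHtml` class is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical helper that composes the ebook HTML for a single article.
public static class ArticleHtml
{
    public static string Build(string title, string description, string createdBy, DateTime createdAt, string bodyHtml)
    {
        var sb = new StringBuilder();
        sb.AppendLine("<html>");
        sb.AppendLine("<body>");
        // Plain-text fields are encoded so characters like < and & stay literal.
        sb.AppendLine("<h1>" + WebUtility.HtmlEncode(title) + "</h1>");
        sb.AppendLine("<h3>" + WebUtility.HtmlEncode(description) + "</h3>");
        // Culture-invariant date, so the output does not depend on server settings.
        sb.AppendLine("<h4>By " + WebUtility.HtmlEncode(createdBy) + " - on "
            + createdAt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + "</h4>");
        // Assumption: the body is already HTML, so it is inserted as-is.
        sb.AppendLine(bodyHtml);
        sb.AppendLine("</body>");
        sb.AppendLine("</html>");
        return sb.ToString();
    }
}
```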
Notice that the IMobi interface can also be implemented on collections/aggregate types to create sets of articles rather than serializing just a single article.
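For example, a reading list could bundle several articles into one ebook. The sketch below assumes each item already carries its own HTML body. `ReadingList`, `Article` and their members are hypothetical names. The IMobi interface is repeated so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

public interface IMobi
{
    int ID { get; }
    string HTMLRepresentation { get; }
}

// Hypothetical article type: only what the aggregate needs.
public class Article
{
    public string Title { get; set; }
    public string BodyHtml { get; set; }
}

// Hypothetical aggregate: one ebook containing many articles, in order.
public class ReadingList : IMobi
{
    public ReadingList(int id, IEnumerable<Article> articles)
    {
        ID = id;
        Articles = articles.ToList();
    }

    public int ID { get; private set; }
    public IList<Article> Articles { get; private set; }

    public string HTMLRepresentation
    {
        get
        {
            var sb = new StringBuilder("<html><body>");
            foreach (var article in Articles)
            {
                // Each article becomes a heading followed by its (already HTML) body.
                sb.Append("<h1>").Append(WebUtility.HtmlEncode(article.Title)).Append("</h1>");
                sb.Append(article.BodyHtml);
            }
            sb.Append("</body></html>");
            return sb.ToString();
        }
    }
}
```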
<h3>
Formatter
</h3>

As with any custom return type in Web API, we'll use a MediaTypeFormatter.
Here is an overview of the formatter:
```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public class MobiMediaTypeFormatter : MediaTypeFormatter
{
    const string supportedMediaType = "text/html";

    public MobiMediaTypeFormatter()
    {
        // ?format=mobi selects this formatter, no custom Accept header needed
        this.AddQueryStringMapping("format", "mobi", new MediaTypeHeaderValue(supportedMediaType));
    }

    public override void SetDefaultContentHeaders(Type type, HttpContentHeaders headers, MediaTypeHeaderValue mediaType)
    {
        if (CanWriteType(type) && mediaType != null && mediaType.MediaType == supportedMediaType)
        {
            // Send the ebook as a binary download named ebook.mobi
            headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            headers.ContentDisposition = new ContentDispositionHeaderValue("attachment");
            headers.ContentDisposition.FileName = "ebook.mobi";
        }
        else
        {
            base.SetDefaultContentHeaders(type, headers, mediaType);
        }
    }

    public override bool CanReadType(Type type)
    {
        // MOBI is output-only
        return false;
    }

    public override bool CanWriteType(Type type)
    {
        return typeof(IMobi).IsAssignableFrom(type);
    }

    public override Task WriteToStreamAsync(Type type, object value, Stream writeStream, HttpContent content, TransportContext transportContext)
    {
        // TODO: implemented in the next section
        throw new NotImplementedException();
    }
}
```
So we have everything in place here, except the actual serialization process (writing to the stream).
A few notes:
- We never deserialize from MOBI, so CanReadType always returns false.
- We map the format=mobi query string to the text/html media type (QueryStringMapping) because, if our controller is e.g. UrlController, we want users to be able to type api/url/1?format=mobi into the browser and get the file directly (without having to send a custom media type in the Accept header).
- For the same reason, we set the “attachment” content disposition in the default headers, so the browser offers the response as a file download.
- Obviously, we only support types implementing IMobi.
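To see what the client ends up receiving, here is a framework-free sketch of the two ideas in the notes above. First, it recognizes `?format=mobi` in the request URI. This is a simplified stand-in for what `AddQueryStringMapping` does for us, not its actual implementation. Second, it stamps the download headers onto response content using the `System.Net.Http.Headers` types. `MobiDownload` and its methods are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;

public static class MobiDownload
{
    // Simplified check (case-insensitive in this sketch): does the query contain format=mobi?
    public static bool IsMobiRequest(Uri requestUri)
    {
        return requestUri.Query
            .TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split(new[] { '=' }, 2))
            .Any(kv => kv.Length == 2
                && string.Equals(Uri.UnescapeDataString(kv[0]), "format", StringComparison.OrdinalIgnoreCase)
                && string.Equals(Uri.UnescapeDataString(kv[1]), "mobi", StringComparison.OrdinalIgnoreCase));
    }

    // The same headers the formatter sets in SetDefaultContentHeaders.
    public static void ApplyDownloadHeaders(HttpContent content, string fileName)
    {
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = fileName
        };
    }
}
```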
<h3>
Creating ebooks
</h3>

OK, for the last piece (or show time, if you will), we will write the MOBI ebook to the response stream. We will use a little trick that allows us to run command line tools (such as Kindlegen) from C# code: starting them with System.Diagnostics.Process.
```csharp
// Additional usings: System.Diagnostics, System.Text, System.Web.Hosting, System.Web.Http
public override Task WriteToStreamAsync(Type type, object value, Stream writeStream, HttpContent content, TransportContext transportContext)
{
    var mobiConvertibleObject = value as IMobi;
    // HostingEnvironment.MapPath also works when HttpContext.Current is not set on this thread
    var serverPath = HostingEnvironment.MapPath("~/kindlegen");
    var tcs = new TaskCompletionSource<object>();

    var filepath = Path.Combine(serverPath, mobiConvertibleObject.ID + ".html");
    var mobipath = Path.Combine(serverPath, mobiConvertibleObject.ID + ".mobi");

    // Kindlegen works on files, so persist the HTML representation first (only once)
    if (!File.Exists(filepath))
    {
        using (var outfile = new StreamWriter(filepath, false, Encoding.UTF8))
        {
            outfile.Write(mobiConvertibleObject.HTMLRepresentation);
        }
    }

    // Run Kindlegen only if the MOBI file has not been generated before
    if (!File.Exists(mobipath))
    {
        using (var kindleGen = new Process())
        {
            kindleGen.StartInfo.UseShellExecute = false;
            kindleGen.StartInfo.RedirectStandardOutput = true;
            kindleGen.StartInfo.FileName = Path.Combine(serverPath, "kindlegen.exe");
            // Quote the path in case it contains spaces
            kindleGen.StartInfo.Arguments = string.Format("\"{0}\"", filepath);
            kindleGen.Start();

            var output = kindleGen.StandardOutput.ReadToEnd();
            kindleGen.WaitForExit();

            // Kindlegen reports failures in its output; also make sure the file was produced
            if (output.Contains("Error(kindlegen)") || !File.Exists(mobipath))
                throw new HttpResponseException(HttpStatusCode.InternalServerError);
        }
    }

    // Copy the MOBI file into the response (read-only, so concurrent requests can share it)
    using (var filestream = new FileStream(mobipath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        filestream.CopyTo(writeStream);
    }

    tcs.SetResult(null);
    return tcs.Task;
}
```

<p>
So what happens here, step by step:<br />
1. We cast the object to <i>IMobi</i>, then get the path of the /kindlegen/ folder (which, if you remember, we copied into our web app).<br />
2. Kindlegen needs an HTML file on disk to generate MOBI (it is a command line tool, so an in-memory representation is not enough). We check whether the HTML file already exists on disk, and if not, we write it.<br />
3. Then we check whether the MOBI file already exists (perhaps it was generated earlier). If not, we start a new process and pass the name of our HTML file to it as an argument. If the output does not contain the <i>Error(kindlegen)</i> character sequence and the MOBI file was created, everything should be fine.<br />
4. We open the MOBI file from disk and copy its contents to the response stream.
</p>
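
Step 3 blocks a request thread while Kindlegen runs (`ReadToEnd` followed by `WaitForExit`). If that becomes a concern, the tool's output can be awaited instead. Below is a minimal sketch using a hypothetical `ToolRunner` helper. It assumes .NET 4.5 or later for `async`/`await` and `StreamReader.ReadToEndAsync`. The final `WaitForExit` normally returns quickly, because standard output closes when the tool exits. The success check mirrors the `Error(kindlegen)` heuristic above rather than any documented Kindlegen contract.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

public sealed class ToolResult
{
    public ToolResult(int exitCode, string output) { ExitCode = exitCode; Output = output; }
    public int ExitCode { get; private set; }
    public string Output { get; private set; }
}

public static class ToolRunner
{
    // Starts a command line tool and awaits its standard output instead of blocking on it.
    public static async Task<ToolResult> RunAsync(string fileName, string arguments)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Completes once the tool closes stdout, which normally happens as it exits.
            string output = await process.StandardOutput.ReadToEndAsync();
            process.WaitForExit();
            return new ToolResult(process.ExitCode, output);
        }
    }

    // Same heuristic as the formatter: look for Kindlegen's error marker in the log.
    public static bool LooksLikeKindlegenError(string output)
    {
        return output.Contains("Error(kindlegen)");
    }
}
```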
<p>
One note here: this approach causes both the HTML and the MOBI file to be generated only once, and all subsequent requests for the same model return the same files from disk (think of them as "immutable"). There is nothing stopping you from doing otherwise: you could regenerate the files every time, or perhaps use a CRC to check whether the HTML representation has changed (e.g. someone modified the text).
</p>
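
One way to implement the change-detection idea is to derive the file name from a hash of the HTML. Edited content then automatically gets a new HTML/MOBI pair, while unchanged content keeps hitting the cached files. This sketch swaps the CRC for SHA-256, simply because SHA-256 ships with the base class library (`System.Security.Cryptography`). The `ContentAddressedNames` class and the `{id}-{hash}` naming scheme are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentAddressedNames
{
    // Returns "{id}-{12 hex characters}": the model ID plus a short fingerprint of its HTML.
    public static string BaseName(int id, string html)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
            // The first 6 bytes are enough to tell versions of the same model apart.
            string fingerprint = BitConverter.ToString(hash, 0, 6).Replace("-", "").ToLowerInvariant();
            return id + "-" + fingerprint;
        }
    }
}
```

In the formatter, `mobiConvertibleObject.ID + ".html"` would become `BaseName(id, html) + ".html"` (and likewise for ".mobi"). Files for outdated versions would then need separate cleanup.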
<h3>
Wiring up
</h3>
<p>
The final step is to wire up the formatter, where <i>config</i> is your <i>HttpConfiguration</i>:
</p>

```csharp
config.Formatters.Add(new MobiMediaTypeFormatter());
```

<p>
If I now request <i>http://localhost:56660/api/url/1</i> (a normal API request), I get the expected output:
</p>
<p>
<a href="/images/2012/09/normal_view.png"><img src="/images/2012/09/normal_view-1024x384.png" alt="" title="normal_view" width="584" height="219" class="aligncenter size-large wp-image-521" /></a>
</p>
<p>
But if I request <i>http://localhost:56660/api/url/1?format=mobi</i>, I get a file download dialog:
</p>
<p>
<a href="/images/2012/09/mobi_download.png"><img src="/images/2012/09/mobi_download.png" alt="" title="mobi_download" width="620" height="475" class="aligncenter size-full wp-image-518" srcset="/images/2012/09/mobi_download.png 625w, /images/2012/09/mobi_download-300x229.png 300w" sizes="(max-width: 620px) 100vw, 620px" /></a>
</p>
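
The same download can also be scripted rather than done through the browser. Below is a sketch using `HttpClient`. It takes the file name from the Content-Disposition header the formatter sets and falls back to a default if the header is missing. The `MobiClient` class is hypothetical. The URL in the usage note is the local development address from above.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class MobiClient
{
    // Downloads the ?format=mobi response and saves it under the server-suggested name.
    public static async Task<string> DownloadAsync(HttpClient client, string url, string targetFolder)
    {
        using (var response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await SaveAsync(response.Content, targetFolder);
        }
    }

    // Writes any HttpContent to disk, using Content-Disposition for the file name.
    public static async Task<string> SaveAsync(HttpContent content, string targetFolder)
    {
        var disposition = content.Headers.ContentDisposition;
        string fileName = disposition != null && !string.IsNullOrEmpty(disposition.FileName)
            ? disposition.FileName.Trim('"')
            : "ebook.mobi";
        // Path.GetFileName drops any directory parts a server might put in the name.
        string path = Path.Combine(targetFolder, Path.GetFileName(fileName));
        using (var file = File.Create(path))
        {
            await content.CopyToAsync(file);
        }
        return path;
    }
}
```

Usage: `await MobiClient.DownloadAsync(new HttpClient(), "http://localhost:56660/api/url/1?format=mobi", ".")`.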
<p>
I can download the file and open it in Calibre, an excellent ebook management tool:
</p>
<p>
<a href="/images/2012/09/mobi_in_calibre.png"><img src="/images/2012/09/mobi_in_calibre.png" alt="" title="mobi_in_calibre" width="620" height="657" class="aligncenter size-full wp-image-519" srcset="/images/2012/09/mobi_in_calibre.png 661w, /images/2012/09/mobi_in_calibre-283x300.png 283w" sizes="(max-width: 620px) 100vw, 620px" /></a>
</p>
<p>
Finally, I can obviously send it to my Kindle:
</p>
<p>
<a href="/images/2012/09/mobi_on_kindle.jpg"><img src="/images/2012/09/mobi_on_kindle.jpg" alt="" title="mobi_on_kindle" width="620" height="827" class="aligncenter size-full wp-image-520" srcset="/images/2012/09/mobi_on_kindle.jpg 620w, /images/2012/09/mobi_on_kindle-225x300.jpg 225w" sizes="(max-width: 620px) 100vw, 620px" /></a>
</p>
<p>
Of course, the formatting is not perfect, but that is just a matter of adjusting the output HTML.
</p>
<h3>
Summary
</h3>
<p>
Generating MOBI out of CLR types via Web API + Kindlegen was just one of the crazy ideas I had this weekend. I hope you enjoyed the article, because I had a lot of fun playing around with this. And now, Sunday Football, so see you next time!
</p>
[1]: http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211
[2]: http://kindlegen.s3.amazonaws.com/AmazonKindlePublishingGuidelines.pdf