Fixing Manning EPUB Code Spans

I've long been a fan and vocal advocate of Manning Publications. Manning books have high-quality content, on par with books from O'Reilly. Broadly speaking, a Manning book has the personal feel of a user manual, while an O'Reilly book gives the impression of an indispensable reference. When Manning first started offering DRM-free electronic versions of their books in PDF format back in 2004, I quickly sent an enthusiastic thank you over to Marjan Bace and the company. Later additions of EPUB and MOBI formats have been equally welcome. Unfortunately, MOBI support has not kept up with later Kindle reading devices, and the last two years of produced EPUB files have a large (literally!) problem with inline code snippets which makes reading unbearable.

Rocky MOBI Road

Manning's most recent support of Amazon.com's MOBI format has been less than stellar. Around 2018 Manning switched to a new vendor for typesetting at least some of the books, and all sorts of problems appeared, including code blocks that were not in monospace and instead were underlined. I contacted Manning's production manager and provided descriptions, screen shots, and suggestions.

The most blatant code block formatting issues went away, but other problems continued to appear from time to time, from large margins to content cruft to incorrect metadata. At the end of 2022 Manning MOBI books show problems across the board, including misaligned bullets and missing background shading.

Worst of all, when Amazon.com updated the firmware on its latest Kindle ebook reading devices (including Kindle Oasis and Kindle Scribe), it became impossible to zoom in on images in Manning MOBI books. When you touch and hold an image to view it by itself, Kindle cuts off part of the image with the text "Back to loc XXX" and "Stay here". It seems that Kindle somehow thinks that the image is a link and that viewing it has navigated the user to another location. Even if the user clicks "Stay here", it is no longer possible to zoom the image or move it around, and part of the image remains cut off by the message that appeared. This does not happen with MOBI books from other publishers; something about how Manning is processing embedded images for MOBI files makes Kindle think that they are links.

I contacted Manning in July 2022, providing complete reproduction instructions. They completely ignored me until I contacted them again in November, four months later. Rather than actually investigating the problem with an actual Kindle, it appears they only looked at the book on a software simulator and threw me a weak "unable to reproduce". After I asked on which Kindle model they had tried to reproduce the problem, the product developer told me his "[K]indle is broken" (seriously?!) and then refused to answer me further about investigations.

Exit MOBI, Enter EPUB

In the meantime Amazon.com has decided to stop using MOBI as a publishing format and instead (finally!) switch to the industry-standard EPUB format. (See The MOBI File is Dead: Long Live the EPUB.) Now Amazon.com supports raw EPUB files in its Send to Kindle service. Manning has long provided EPUB versions along with MOBI. It turns out that uploading EPUB rather than MOBI for Kindle solves all of the MOBI formatting issues listed above, such as misaligned bullets and missing background shading!

Manning's MOBI formats are broken, and Amazon.com no longer supports them. Manning should stop producing MOBI files. There is no point in producing an error-filled product that has been abandoned anyway.

EPUB Inline Code Formatting Problems

Unfortunately all is not roses with Manning's EPUB files. All of the latest Manning EPUB files since around 2020 inexplicably make inline code spans huge, resulting in paragraphs that are visually jarring and hard to read. Here is a snippet from the EPUB stylesheet of Algorithms and Data Structures for Massive Datasets for example:

.fm-code-in-text {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.4
    }

Here's an example of the EPUB source code that uses this:

<span class="fm-code-in-text">(comment-id</span> <span class="fm-code-in-text">-&gt;</span>
    <span class="fm-code-in-text">frequency</span>
Oversized inline code in Algorithms and Data Structures for Massive Datasets.

From the source code you can see that the automated conversion did a poor job semantically, not realizing that whitespace between code spans should be included in the span. But that's not the biggest problem. For some reason Manning decided that inline code should be around 30% larger than the surrounding text! The only explanation I can think of is that Manning's production team has only been testing books on an old Kindle software simulator and has no idea how the books look on recent Kindle devices. The result is that Algorithms and Data Structures for Massive Datasets looks like what is shown in the figure when read on a modern Kindle.

I again contacted Manning, providing screen shots, descriptions, source code, comparisons, and suggestions. I offered to provide my expertise for free, and even to have teleconference calls with the production team, in order to fix this problem for all their customers. But Manning has stopped responding to me altogether, both for the MOBI and EPUB issues.

Fixing Manning EPUBs with PowerShell

Since all Manning EPUB ebooks for the past couple of years are thus basically unreadable on the latest Kindle devices, it looks like I'll have to do the job of their production team and fix them myself. The overall process looks like this:

  1. Explode the EPUB files to a temporary directory. (EPUB files are just ZIP archives with additional restrictions.)
  2. Find the main stylesheet, named stylesheet.css.
  3. Change the .fm-code-in-text CSS styles to simply {font-family: monospace;}.

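There are a few details to be aware of. In particular, a stylesheet may define several inline code selectors with slightly different names, as in this example: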

.fm-code-in-text {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.3
    }
.fm-code-in-text1 {
    font-family: monospace;
    font-size: 1em
    }
.fm-code-in-text2 {
    font-family: monospace;
    font-size: 1.2em;
    line-height: 1.3
    }
.fm-code-in-text3 {
    font-family: monospace;
    font-size: 1em;
    line-height: 1.3
    }
.calibre16 {
    font-family: "Liberation Mono", monospace
    }

Utility versus sed versus PowerShell

What is the best way to perform this processing? One approach would be to create a full-fledged EPUB production pipeline. That might be useful at some point if it turns out I need to give complete makeovers to Manning EPUBs in the future. But for the moment I just need a quick fix, which a simple script should be able to accomplish.

My first inclination would be to use Bash and something like sed, because *nix tools excel at something like this, right? Not so much, it turns out. Unzipping and rezipping seems complex but surmountable in Bash. But Using sed to Replace a Multi-Line String seems way too complicated. I should be able to just read the file into memory and do a multiline replace. Maybe this is doable with some Linux tool.

Surprisingly and happily, PowerShell provides ZIP support out of the box and Multi-line Regular Expression Replace in Powershell is pretty straightforward and easy. I was wanting to improve my PowerShell skills anyway, so that's the way I went.

Script for Fixing EPUB

I provide the full PowerShell scripts at the end, but I'll first step through the major sections. I wanted to start with some FooBar.epub file and produce some FooBar-fixed.epub file with the inline code style corrected. I set up the following parameters in my PowerShell script:

[CmdletBinding()]
Param(
  [Parameter(Mandatory, HelpMessage="Input Archive Path")] [String] $inputArchive,
  [Parameter(HelpMessage="Output Archive Path")] [Alias("o")] [String] $outputArchive,
  [Parameter(HelpMessage="Base Filename Fix Suffix")] [String] $outputFilenameSuffix = '-fixed'
)

$ErrorActionPreference = "Stop"

$inputArchive = Convert-Path $inputArchive # get absolute form of input archive

if(!$outputArchive) { # determine default output archive if not specified
  # a trailing `+` lets the expression continue on the next line
  $newFilename = [IO.Path]::GetFileNameWithoutExtension($inputArchive) +
      $outputFilenameSuffix + [IO.Path]::GetExtension($inputArchive)
  $outputArchive = Join-Path ([IO.Path]::GetDirectoryName($inputArchive)) $newFilename
}

The idea is to have the script automatically determine the output filename by using the base name FooBar and tacking the -fixed suffix on the end, before the extension. Note that I could have hard-coded the .epub extension, but I prefer to use the appropriate PowerShell path-handling functions so that the script will work with any extension (e.g. .zip). I can override the suffix if I want to, but it defaults to -fixed. To process FooBar.epub and produce FooBar-fixed.epub using the default arguments, I can simply use:

manning-fix-inline-code-font-size 'FooBar.epub'
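
To make the default naming rule concrete, here is a small standalone sketch that applies the same [IO.Path] calls to sample paths. The function name Get-FixedArchivePath and the sample directory are hypothetical, invented for this illustration; they are not part of the script, and nothing is read from or written to disk.

```powershell
# Hypothetical helper illustrating the default output-name rule; not part of the actual script.
function Get-FixedArchivePath {
  param(
    [Parameter(Mandatory)] [string] $InputArchive,
    [string] $Suffix = '-fixed'
  )
  $baseName  = [IO.Path]::GetFileNameWithoutExtension($InputArchive)
  $extension = [IO.Path]::GetExtension($InputArchive) # includes the leading dot
  $directory = [IO.Path]::GetDirectoryName($InputArchive)
  # Join-Path supplies the platform's directory separator
  Join-Path $directory ($baseName + $Suffix + $extension)
}

# Sample location only; the directory does not need to exist.
$sampleDir = Join-Path ([IO.Path]::GetTempPath()) 'books'
$defaultName = Get-FixedArchivePath (Join-Path $sampleDir 'FooBar.epub')
$tweakedName = Get-FixedArchivePath (Join-Path $sampleDir 'FooBar.zip') -Suffix ' [tweaked]'

[IO.Path]::GetFileName($defaultName)
[IO.Path]::GetFileName($tweakedName)
```

The first call yields a file named FooBar-fixed.epub and the second FooBar [tweaked].zip, which shows that the suffix always lands between the base name and whatever extension the input had.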

I then extract the archive to a temporary directory in the same directory as the input EPUB file, adding a .tmp extension. Note the use of try { … } finally { … }. I always want to remove the temporary directory, even if something goes wrong.

## uncompress archive to temporary directory in same directory as input archive
$tempArchiveDir = $inputArchive + '.tmp'
Expand-Archive -LiteralPath $inputArchive -DestinationPath $tempArchiveDir -Force

try {

  …

}
finally {

  ## remove temporary directory
  Remove-Item -LiteralPath $tempArchiveDir -Force -Recurse

}

Now for the actual processing, inside the try { … } block. First I need to find the stylesheet, and produce an error if not found.

  ## process `stylesheet.css`
  $stylesheetFileInfo = Get-ChildItem -Path $tempArchiveDir -Recurse -Filter 'stylesheet.css' -File
  if(-not $stylesheetFileInfo) {
    throw "Archive $inputArchive does not contain expected stylesheet ``stylesheet.css``."
  }
  $stylesheetFile = $stylesheetFileInfo.FullName

Next is to read all the stylesheet text, do the replacement, and write it back to the file:

  $oldStylesheetText = [IO.File]::ReadAllText($stylesheetFile)
  $newStylesheetText = ($oldStylesheetText -replace '(?ms)(?<selector>\.fm-code-in-text\d?)\s*{[^}]*}', '${selector} {font-family: monospace;}')
  if([object]::ReferenceEquals($oldStylesheetText, $newStylesheetText)) {
    Write-Warning "Archive stylesheet ``$([IO.Path]::GetRelativePath($tempArchiveDir, $stylesheetFile))`` did not contain inline code needing fixing."
    return; # skip further processing of the archive
  }
  Set-Content -LiteralPath $stylesheetFile -Value $newStylesheetText

As explained in Multi-line Regular Expression Replace in Powershell, I use (?ms) in my regular expression to match across multiple lines. (I couldn't find where this is officially documented, unfortunately, though these appear to be the standard .NET inline regex options: s lets . match newlines, and m lets ^ and $ match at line boundaries.) I use ?<selector> to name the capturing group containing the style name so that I can use it in the replacement, because there may be multiple selectors with slightly different names as explained above. See Regular expressions substitutions for more information on regex in PowerShell.
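
As a quick sanity check of the pattern, the following sketch runs the same -replace against an in-memory sample instead of a real EPUB. The sample CSS is adapted from the excerpts earlier in this post; the variable names are made up for the illustration.

```powershell
# Sample stylesheet text adapted from the excerpts above (illustrative only).
$sampleCss = @'
.fm-code-in-text {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.3
    }
.fm-code-in-text2 {
    font-family: monospace;
    font-size: 1.2em;
    line-height: 1.3
    }
.calibre16 {
    font-family: "Liberation Mono", monospace
    }
'@

# Same pattern and replacement as in the script.
$pattern = '(?ms)(?<selector>\.fm-code-in-text\d?)\s*{[^}]*}'
$fixedCss = $sampleCss -replace $pattern, '${selector} {font-family: monospace;}'

# Print the result: the two inline code rules collapse to one line each.
$fixedCss
```

Both .fm-code-in-text rules come out as single-line rules with only font-family left, while the .calibre16 rule is untouched because its selector does not match.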

If the EPUB file is an older Manning file without the huge font problem, there won't be a match and we need to stop processing with a warning. How to find out if a "-replace" actually did anything is a little tricky (and roundabout): apparently the documentation promises to return the same reference if there was no match, so we can use if([object]::ReferenceEquals($oldStylesheetText, $newStylesheetText)). (The other approach would be to compare the entire string contents.)
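
The content-comparison alternative could look something like the minimal sketch below. It does not rely on -replace handing back the same string instance; it assumes a case-sensitive, ordinal comparison is what is wanted. The helper name Test-TextChanged and the sample strings are hypothetical.

```powershell
# Hypothetical helper: true if the two texts differ anywhere (case-sensitive, ordinal).
function Test-TextChanged {
  param([string] $OldText, [string] $NewText)
  -not [string]::Equals($OldText, $NewText, [StringComparison]::Ordinal)
}

$pattern = '(?ms)(?<selector>\.fm-code-in-text\d?)\s*{[^}]*}'
$replacement = '${selector} {font-family: monospace;}'

# Invented one-line samples standing in for whole stylesheets.
$oversized    = '.fm-code-in-text { font-family: monospace; font-size: 1.29167em }'
$unrelated    = '.calibre16 { font-family: "Liberation Mono", monospace }'
$alreadyFixed = '.fm-code-in-text {font-family: monospace;}'

$oversizedChanged    = Test-TextChanged $oversized ($oversized -replace $pattern, $replacement)
$unrelatedChanged    = Test-TextChanged $unrelated ($unrelated -replace $pattern, $replacement)
$alreadyFixedChanged = Test-TextChanged $alreadyFixed ($alreadyFixed -replace $pattern, $replacement)

"oversized: $oversizedChanged, unrelated: $unrelatedChanged, already fixed: $alreadyFixedChanged"
```

One difference in behavior is worth noting: on a stylesheet that has already been fixed, the pattern still matches but the replacement text is identical, so a content comparison reports no change, whereas a reference check would presumably see a newly built string and carry on.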

After processing we need to recompress the archive.

  ## recompress archive
  Compress-Archive -Path (Join-Path $tempArchiveDir '*') $outputArchive

  return $outputArchive

Tweaking Wrapper Script

The reason we return the path to the output archive is so that we can use it in further processing. I wanted to create a general script for fixing the EPUB files, but I also wanted to perform additional processing to help with my workflow. (After all I have years of Manning books I now need to delete from Amazon.com cloud storage and then send fixed EPUB versions of them to Kindle.) I wanted to automatically supply a different suffix of " [tweaked]", but then copy the fixed version to a separate staging directory for uploading without the suffix (so that Send to Kindle would not show all my Manning books with "[tweaked]" in the title).

$tweakedEpub = manning-fix-inline-code-font-size $inputArchive -outputFilenameSuffix ' [tweaked]'
if($tweakedEpub) {
  Copy-Item -LiteralPath "$tweakedEpub" -Destination "$(Join-Path 'C:\todo\send-to-kindle' ([IO.Path]::GetFileName($inputArchive)))"
}

Outliers

There are a few Manning books that seem to have been produced using a different workflow. Notably the "Grokking" series, such as Grokking Simplicity, have inline code style classes with completely different names. For the most part their code styles either leave the font size unchanged or make it smaller (which is less intrusive visually than making it bigger). There are only a few books from this series anyway, so for the moment I'm leaving them as-is.

However Grokking Streaming Systems has the large-font problem described here, but uses a different style name:

.codechar {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.4
    }
.codechar1 {
    font-family: monospace;
    font-size: 1.2em;
    line-height: 1.4
    }

I could update the script to cover these additional names, but if it turns out only to apply to one or two books, I'll probably just convert them by hand.
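
If more books turn up with these names, widening the selector part of the pattern would probably be enough. The sketch below is a variation of the script's pattern that I have only tried against sample text, not against real books; the (?:fm-code-in-text|codechar) alternation is an assumption based solely on the two naming schemes shown in this post.

```powershell
# Sample text combining both naming schemes (adapted from the excerpts in this post).
$sampleCss = @'
.codechar {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.4
    }
.codechar1 {
    font-family: monospace;
    font-size: 1.2em;
    line-height: 1.4
    }
.fm-code-in-text {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.3
    }
'@

# The script's pattern with the selector widened to accept either naming scheme.
$pattern = '(?<selector>\.(?:fm-code-in-text|codechar)\d?)\s*\{[^}]*\}'
$fixedCss = $sampleCss -replace $pattern, '${selector} {font-family: monospace;}'

$fixedCss
```

Because the named group still captures the full selector, the replacement string from the original script can be reused unchanged.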

Data Wrangling with JavaScript uses a totally different production layout, but the inline code is left at the same size as the surrounding text. It would appear that someone actually went through and put together the stylesheet by hand to some extent, or at least tweaked some web stylesheet. It doesn't need fixing, but it's interesting to ponder what the production history might have been.

code, var, span.CodeInText {
    font-family: monospace;    
    font-size: inherit;
}
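
To spot outliers like these without reading every stylesheet by hand, something like the following sketch could list each rule that uses a monospace font along with its font size. The function name Get-MonospaceRule is hypothetical, and the rule matching is deliberately naive: it assumes flat rules with no nested blocks (such as @media) and no braces inside comments or strings.

```powershell
# Hypothetical helper: lists CSS rules that use a monospace font, with their font-size (if any).
function Get-MonospaceRule {
  param([Parameter(Mandatory)] [string] $Css)
  # Naive rule matcher: selector text, then a brace-delimited body without nested braces.
  $rulePattern = '(?<selector>[^{}]+)\{(?<body>[^}]*)\}'
  foreach ($rule in [regex]::Matches($Css, $rulePattern)) {
    $body = $rule.Groups['body'].Value
    if ($body -notmatch 'font-family\s*:[^;]*monospace') { continue }
    $size = if ($body -match 'font-size\s*:\s*(?<size>[^;\s]+)') { $Matches['size'] } else { $null }
    [pscustomobject]@{
      Selector = $rule.Groups['selector'].Value.Trim()
      FontSize = $size
    }
  }
}

# Sample text adapted from the excerpts above, plus one unrelated rule.
$css = @'
.codechar {
    font-family: monospace;
    font-size: 1.29167em;
    line-height: 1.4
    }
code, var, span.CodeInText {
    font-family: monospace;
    font-size: inherit;
}
p {
    margin: 0
}
'@

$rules = @(Get-MonospaceRule $css)
$rules
```

For this sample it reports .codechar with 1.29167em and the code, var, span.CodeInText rule with inherit, which makes it easy to see at a glance which books would need attention.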

Full Script Source Code

The following is the complete source code for the scripts. They should work as written. manning-fix-inline-code-font-size.ps1 can be used alone. manning-tweak-epub.ps1 is optional and serves only to show how the workflow can be expanded.

manning-fix-inline-code-font-size.ps1

#!/usr/bin/env pwsh
# Manning Fix Inline Code Size by Garret Wilson
# Copyright © 2022 GlobalMentor, Inc.
#
# Fixes the inline code size in a Manning EPUB document `stylesheet.css` file.
# The following CSS:
# ```css
# .fm-code-in-text {
#     font-family: monospace;
#     font-size: 1.29167em;
#     line-height: 1.3
#     }
# .fm-code-in-text1 {
#     font-family: monospace;
#     font-size: 1em
#     }
# .fm-code-in-text2 {
#     font-family: monospace;
#     font-size: 1.2em;
#     line-height: 1.3
#     }
# .fm-code-in-text3 {
#     font-family: monospace;
#     font-size: 1em;
#     line-height: 1.3
#     }
# ```
# Is converted to:
# ```css
# .fm-code-in-text {font-family: monospace;}
# .fm-code-in-text1 {font-family: monospace;}
# .fm-code-in-text2 {font-family: monospace;}
# .fm-code-in-text3 {font-family: monospace;}
# ```

[CmdletBinding()]
Param(
  [Parameter(Mandatory, HelpMessage="Input Archive Path")] [String] $inputArchive,
  [Parameter(HelpMessage="Output Archive Path")] [Alias("o")] [String] $outputArchive,
  [Parameter(HelpMessage="Base Filename Fix Suffix")] [String] $outputFilenameSuffix = '-fixed'
)

$ErrorActionPreference = "Stop"

$inputArchive = Convert-Path $inputArchive # get absolute form of input archive

if(!$outputArchive) { # determine default output archive if not specified
  $newFilename = [IO.Path]::GetFileNameWithoutExtension($inputArchive) + $outputFilenameSuffix + [IO.Path]::GetExtension($inputArchive)
  $outputArchive = Join-Path ([IO.Path]::GetDirectoryName($inputArchive)) $newFilename
}

## uncompress archive to temporary directory in same directory as input archive
$tempArchiveDir = $inputArchive + '.tmp'
Expand-Archive -LiteralPath $inputArchive -DestinationPath $tempArchiveDir -Force

try {

  ## process `stylesheet.css`
  $stylesheetFileInfo = Get-ChildItem -Path $tempArchiveDir -Recurse -Filter 'stylesheet.css' -File
  if(-not $stylesheetFileInfo) {
    throw "Archive $inputArchive does not contain expected stylesheet ``stylesheet.css``."
  }
  $stylesheetFile = $stylesheetFileInfo.FullName
  $oldStylesheetText = [IO.File]::ReadAllText($stylesheetFile)
  # see [Multi-line Regular Expression Replace in Powershell](https://www.apharmony.com/software-sagacity/2014/08/multi-line-regular-expression-replace-in-powershell/)
  # see [Regular expressions substitutions](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_comparison_operators?view=powershell-7.3#regular-expressions-substitutions)
  $newStylesheetText = ($oldStylesheetText -replace '(?ms)(?<selector>\.fm-code-in-text\d?)\s*{[^}]*}', '${selector} {font-family: monospace;}')
  # the [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.replace#system-text-regularexpressions-regex-replace(system-string-system-string-system-string)) promises to return the same reference if there was no match
  if([object]::ReferenceEquals($oldStylesheetText, $newStylesheetText)) {
    Write-Warning "Archive stylesheet ``$([IO.Path]::GetRelativePath($tempArchiveDir, $stylesheetFile))`` did not contain inline code needing fixing."
    return; # skip further processing of the archive
  }
  Set-Content -LiteralPath $stylesheetFile -Value $newStylesheetText

  ## recompress archive
  Compress-Archive -Path (Join-Path $tempArchiveDir '*') $outputArchive

  return $outputArchive
}
finally {

  ## remove temporary directory
  Remove-Item -LiteralPath $tempArchiveDir -Force -Recurse

}

manning-tweak-epub.ps1

#!/usr/bin/env pwsh
# Manning Tweak EPUB by Garret Wilson
# Copyright © 2022 GlobalMentor, Inc.
#
# Calls `manning-fix-inline-code-font-size.ps1` with default ` [tweaked]` base filename suffix.

[CmdletBinding()]
Param(
  [Parameter(Mandatory, HelpMessage="Input Archive Path")] [String] $inputArchive
)

$ErrorActionPreference = "Stop"

$tweakedEpub = manning-fix-inline-code-font-size $inputArchive -outputFilenameSuffix ' [tweaked]'
if($tweakedEpub) {
  Copy-Item -LiteralPath "$tweakedEpub" -Destination "$(Join-Path 'C:\todo\send-to-kindle' ([IO.Path]::GetFileName($inputArchive)))"
}