Notes to self

Ruby for ebook publishing

People often ask what Ruby is good for apart from Rails. Ruby is great for tasks from many different domains, and today I would like to share how anybody can use Ruby to publish ebooks.

Since I used several Ruby tasks to publish my first-ever ebook, Deployment from Scratch, I decided to write down why I think Ruby is great for publishing ebooks.

PDF publishing

There is a whole Ruby toolkit called Asciidoctor for publishing technical content written in AsciiDoc. It’s a great toolkit for producing PDF, EPUB 3, or even manual pages.

Here’s a list of what Asciidoctor can do for you when it comes to PDF (stolen from their page):

  • Custom fonts (TTF or OTF)
  • Full SVG support (thanks to prawn-svg)
  • PDF document outline (i.e., bookmarks)
  • Title page
  • Table of contents page(s)
  • Document metadata (title, authors, subject, keywords, etc.)
  • Configurable page size (e.g., A4, Letter, Legal, etc.)
  • Internal cross-reference links
  • Syntax highlighting with Rouge (preferred), Pygments, or CodeRay
  • Cover pages
  • Page background color or page background image with named scaling
  • Page numbering
  • Double-sided (aka prepress) printing mode (i.e., margins alternate on recto and verso pages)
  • Customizable running content (header and footer)
  • “Keep together” blocks (i.e., page breaks avoided in certain block content)
  • Orphaned section titles avoided
  • Autofit verbatim blocks (as permitted by base_font_size_min setting)
  • Table border settings honored
  • Font-based icons
  • Auto-generated index
  • Automatic hyphenation (when enabled)
  • Permissive line breaking for CJK languages
  • Compression / optimization of output file

If you are thinking of publishing your first technical ebook, it’s a strong contender. Just get familiar with its limitations before starting. You would use AsciiDoc much like Markdown, although the syntax is different:

= Hello, AsciiDoc!
Doc Writer <doc@example.com>

An introduction to http://asciidoc.org[AsciiDoc].

== First Section

* item 1
* item 2

[source,ruby]
puts "Hello, World!"

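Save the AsciiDoc content in a file with the .adoc extension and convert it by running asciidoctor. The default backend generates HTML; -b selects another one, such as docbook5, and loading the asciidoctor-pdf gem with -r adds a pdf backend: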

$ gem install asciidoctor-pdf
$ asciidoctor -b docbook5 mysample.adoc
$ asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc

And because this post is about Ruby, you can also call Asciidoctor from Ruby:

require 'asciidoctor'

Asciidoctor.convert_file 'mysample.adoc'

You can also work with the generated content directly:

html = Asciidoctor.convert_file 'mysample.adoc', to_file: false, header_footer: true
puts html
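If you prefer to script the command-line tool instead, a minimal sketch in plain Ruby could look like the following. It assumes a hypothetical book/ directory of .adoc files and an output/ directory, and the helper names pdf_command and convert_all are made up for illustration. The command is built as an array so that system runs asciidoctor without a shell, which sidesteps quoting problems with unusual file names.

```ruby
require 'fileutils'

# Build the asciidoctor-pdf command for one source file: the same flags as
# the CLI example above, plus -o to choose the output path.
def pdf_command(source, output_dir: "output")
  target = File.join(output_dir, File.basename(source, ".adoc") + ".pdf")
  ["asciidoctor", "-r", "asciidoctor-pdf", "-b", "pdf", "-o", target, source]
end

# Convert every file matching the (hypothetical) pattern and stop at the
# first failure. system returns nil or false when the command fails.
def convert_all(pattern = "book/*.adoc", output_dir: "output")
  FileUtils.mkdir_p(output_dir)
  Dir.glob(pattern).sort.each do |source|
    system(*pdf_command(source, output_dir: output_dir)) or
      abort "asciidoctor failed for #{source}"
  end
end
```

Calling convert_all from a small script or a Rake task would then regenerate every PDF in one go.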

My own journey went from an old GitBook version that could still generate a PDF from Markdown to Pandoc, which let me keep the Markdown I had and enhance it with LaTeX. For anything new, I would look into Asciidoctor first. You can start with their AsciiDoc Writer’s Guide.

As a side note, Asciidoctor PDF uses the Prawn toolkit, which you can also use directly for many other things. I used Prawn to build InvoicePrinter, for example.

Text transformations

Since I ended up with Pandoc and a mixture of Markdown and LaTeX, my default EPUB version didn’t look good. See, in my text, I might have the following:

### Third headline

Text paragraph.

\cat{Something interesting that Tiger shares.}

I made up the fictional character Tiger the Cat to make the heavily technical book feel lighter and more entertaining. I needed to draw a box with a picture of Tiger and text next to it, so I made a LaTeX \cat macro to do just that.

To my surprise, the conversion to EPUB worked, but the result looked horrible. So I needed to replace this macro with an HTML snippet for the EPUB version before the conversion happened.

So I wrote a little script to find these occurrences and produce new sources for the EPUB:

#!/usr/bin/ruby
require 'fileutils'

FileUtils.rm_rf 'epub_chapters'
FileUtils.mkdir 'epub_chapters'

Dir.glob('chapters/*.md') do |file|
  chapter = File.read file
  chapter.gsub!(
    /^\\cat\{(.*)\}$/,
    '<div class="cat"><div class="tiger"><img src=".." /></div>\1</div>'
  )
  name = File.basename file
  File.write "epub_chapters/#{name}", chapter
end

There are many ways to pre- or post-process text, but this was my quick way to fix the EPUB version.
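One way to make such a transformation easier to check is to pull it into a small function. The sketch below keeps the same regular expression idea but uses the block form of gsub, so the captured text is inserted literally instead of passing through the backslash handling of a replacement string. The image path images/tiger.png, the alt text, and the name cat_to_html are hypothetical placeholders, not the ones from the book.

```ruby
# Hypothetical image path; the real one from the book is not shown in the post.
TIGER_IMAGE = "images/tiger.png"

# A line that consists only of \cat{...}, with the braces escaped explicitly.
CAT_MACRO = /^\\cat\{(.*)\}$/

# Replace every \cat{...} line with the HTML box used in the EPUB version.
def cat_to_html(text)
  text.gsub(CAT_MACRO) do
    %(<div class="cat"><div class="tiger"><img src="#{TIGER_IMAGE}" alt="Tiger the Cat" /></div>#{$1}</div>)
  end
end
```

Calling cat_to_html inside the Dir.glob loop, in place of the gsub! call, would leave the rest of the script unchanged.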

Landing page

Your book might sell better with an attractive landing page. If you plan on a full-blown website, I recommend looking into Jekyll, a Ruby static site generator for which I wrote some tips.

When I built my simple landing page, I realized that my chapter list became outdated as I continued to publish beta content. To fix that, I decided to keep a short chapter description within each chapter’s Markdown source, like this:

# Processes

<!--
headline: A closer look at Linux processes. CPU and virtual memory, background processes, monitoring, debugging, systemd, system logging, and scheduled processes.
-->

Running a web application is essentially running a suite of related programs concurrently as processes. Spawning a program process can be as simple as typing its name into a terminal, but how do we ensure that this program won't stop at some point? We need to take a closer look at what Linux processes are and how to bring them back to life from failures.
...

And I wrote a Ruby script that takes this meta information, makes HTML out of it, and updates the landing page:

#!/usr/bin/ruby
require 'redcarpet'

BOOK_DIR="/home/strzibny/Projects/deploymentfromscratch"
CHAPTER_DIR="#{BOOK_DIR}/chapters"

class Chapter
  attr_reader :index, :title, :headline, :html

  def initialize(index:, title:, headline:, html:)
    @index = index
    @title = title
    @headline = headline
    @html = html
  end
end

chapters = []

Dir.glob("#{CHAPTER_DIR}/*.md").sort.drop(1).each.with_index(1) do |file, index|
  content = File.read(file)
  title = content.scan(/^\# (.*)$/).first&.first
  headline = content.scan(/^headline: (.*)$/).first&.first

  if title
    html = Redcarpet::Markdown.new(Redcarpet::Render::HTML.new).render(content)
    chapters << Chapter.new(index: index, title: title, headline: headline, html: html)
  end
end

# ...and later...

BOOK_PAGE = "../index.html"

sections = chapters.map do |chapter|
  <<-EOF
    <div class="chapter">
      <strong>#{chapter.index}. #{chapter.title}</strong>
      <p>
        #{chapter.headline}
      </p>
    </div>
  EOF
end.join("\n")

page = File.read BOOK_PAGE
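# Block form: the chapter HTML is inserted literally, so backslashes
# in a headline aren't treated as back-references
new_page = page.sub(/<!--CHAPTERS START-->.*<!--CHAPTERS END-->/m) do
  "<!--CHAPTERS START-->\n#{sections}\n<!--CHAPTERS END-->"
end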
File.open(BOOK_PAGE, "w") { |file| file.puts new_page }
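The headline text goes into the page as raw HTML in the script above, so a stray & or < in a description could break the markup. A small variation, sketched below, reads the title and headline with String#[] and escapes both with ERB::Util.html_escape from Ruby’s standard library before building the entry. The markup mirrors the chapter div from the script; the function names chapter_meta and chapter_html are hypothetical.

```ruby
require 'erb'

# Read the first "# Title" line and the "headline:" line of a chapter.
# "### Subheading" lines don't match because "#" must be followed by a space.
def chapter_meta(markdown)
  {
    title: markdown[/^# (.+)$/, 1],
    headline: markdown[/^headline: (.+)$/, 1]
  }
end

# Build one landing-page entry; escaping keeps "&" or "<" in a headline
# from breaking the surrounding markup.
def chapter_html(index, meta)
  title = ERB::Util.html_escape(meta[:title].to_s)
  headline = ERB::Util.html_escape(meta[:headline].to_s)
  <<~HTML
    <div class="chapter">
      <strong>#{index}. #{title}</strong>
      <p>#{headline}</p>
    </div>
  HTML
end
```

In the original loop, chapter_html(chapter.index, { title: chapter.title, headline: chapter.headline }) could stand in for the inline heredoc.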

So remember, you can work with your sources and automate the landing page management. I used the redcarpet gem for Markdown processing, and there are also other useful gems like front_matter_parser.
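If you would rather keep the metadata as YAML front matter (one of the formats front_matter_parser reads) but avoid another dependency, the standard library’s YAML parser covers the simple case. This is a minimal sketch, not front_matter_parser’s API: the helper split_front_matter is hypothetical, and it assumes the file starts with a block delimited by --- lines, returning an empty hash when it doesn’t.

```ruby
require 'yaml'

# A leading block between two "---" lines, as used for YAML front matter.
FRONT_MATTER = /\A---[ \t]*\n(.*?)\n---[ \t]*\n/m

# Return [metadata_hash, body]. Text without front matter comes back
# unchanged with an empty hash. safe_load refuses arbitrary Ruby objects.
def split_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  [YAML.safe_load(match[1]) || {}, match.post_match]
end
```

The body can then go to Redcarpet as before, while the hash supplies the headline for the landing page.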

PDF previews

While I was writing the alpha and beta releases of Deployment from Scratch, I wanted to send a preview from time to time. The obvious way is to limit the pages you render and perhaps use a PDF editor to insert something else. Or you can use Ruby.

The Ruby ecosystem features a nice PDF toolkit called HexaPDF that can cut out the pages you want and interleave them with other pages (an introduction, a call to action, a reminder, or final words for the preview). An example:

#!/usr/bin/ruby
require 'hexapdf'

demo = HexaPDF::Document.open("output/book.pdf")

preview = HexaPDF::Document.new

# Hypothetical selection: zero-based indexes of the book pages to copy
PREVIEW_PAGES = 0..19

demo.pages.each_with_index do |page, page_index|
  # Put a custom introduction page in front of the first book page
  if [0].include? page_index
    blank = preview.pages.add.canvas
    blank.font('Amiri', size: 25, variant: :bold)
    blank.text("This is a preview of Deployment from Scratch", at: [20, 800])
    blank.font('Amiri', size: 20)
    blank.text("Follow the book updates at https://deploymentfromscratch.com/.", at: [20, 550])
    blank.text("Write me what you think at strzibny@gmail.com.", at: [20, 500])
    blank.text("Or catch me on Twitter at https://twitter.com/strzibnyj.", at: [20, 450])
    blank.font('Amiri', size: 10)
    blank.text("Copyright by Josef Strzibny. All rights reserved.", at: [20, 20])
  end

  # Copy the selected book page into the preview
  preview.pages << preview.import(page) if PREVIEW_PAGES.include?(page_index)
end

preview.write("output/preview.pdf", optimize: false)

If you don’t need to add custom content, you can also use HexaPDF from the command line to merge pages from one or more PDFs:

$ hexapdf merge output/toc.pdf --pages 1-10 output/book.pdf --force
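When choosing which pages end up in a preview, a page spec such as 1-10 is easier to edit than a list of indexes. The following sketch, with a hypothetical page_indexes helper, turns a 1-based spec like "1-10,15" into the zero-based indexes that each_with_index yields, so the result can be checked with include? while looping over the book pages and copying them with the same pages << import pattern used in the image preview script below.

```ruby
# Turn a 1-based page spec such as "1-10,15" into sorted, zero-based
# indexes. Integer() raises on anything that isn't a number.
def page_indexes(spec)
  spec.split(",").flat_map do |part|
    first, last = part.strip.split("-", 2).map { |n| Integer(n) }
    ((first - 1)..((last || first) - 1)).to_a
  end.uniq.sort
end
```

For example, page_indexes("1-3,5") gives [0, 1, 2, 4]. Overlapping ranges are merged by uniq.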

Image previews

I covered cutting out PDF previews, but I also wanted to include nice little image previews for my landing page. To that end, I separated the final PDF into individual PDF pages and converted them to images.

Although there are various PDF utilities, it’s easy to stick with HexaPDF for the first part of the job:

#!/usr/bin/ruby
require 'fileutils'
require 'hexapdf'

FileUtils.rm_rf 'preview'
FileUtils.mkdir 'preview'

file = "output/deploymentfromscratch.pdf"

pdf = HexaPDF::Document.open(file)

pdf.pages.each_with_index do |page, index|
  target = HexaPDF::Document.new
  target.pages << target.import(page)
  target.write("preview/#{index+1}.pdf", optimize: true)
end

Once I have individual PDFs, I go through them again and convert them to images with ruby-vips, the Ruby binding for libvips:

#!/usr/bin/ruby
require 'fileutils'
require 'vips'

Dir.glob('preview/*.pdf') do |file|
  im = Vips::Image.new_from_file file, scale: 2.5
  im.write_to_file("#{file}.jpg")
end

Notice that I had to increase the scale; otherwise, the result is of poor quality. libvips renders PDF pages at 72 DPI by default, so a scale of 2.5 gives roughly 180 DPI.

Once I have the individual images, I just insert them into my landing page. But you can also extend your Ruby task to do that automatically.
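A minimal sketch of that automation, reusing the marker idea from the chapter list, might look like this. The PREVIEWS START/END markers, the CSS class, the alt text, and the function names are all hypothetical. Sorting by the number in the file name matters because the split script names pages 1.pdf, 2.pdf, …, 10.pdf, and a plain string sort would put 10 before 2.

```ruby
require 'erb'

# Hypothetical markers around the preview gallery in the landing page,
# following the same idea as the chapter list markers.
START_MARK = "<!--PREVIEWS START-->"
END_MARK = "<!--PREVIEWS END-->"

# Order files by the page number in their name, so 2.pdf.jpg comes before 10.pdf.jpg.
def sorted_images(paths)
  paths.sort_by { |path| File.basename(path)[/\d+/].to_i }
end

# One <img> tag per preview image.
def gallery_html(paths)
  sorted_images(paths).map do |path|
    src = ERB::Util.html_escape(path)
    %(<img class="preview" src="#{src}" alt="Book page preview">)
  end.join("\n")
end

# Swap whatever sits between the markers for the new gallery.
# The block form of sub inserts the HTML literally.
def update_gallery(page, html)
  pattern = /#{Regexp.escape(START_MARK)}.*#{Regexp.escape(END_MARK)}/m
  page.sub(pattern) { "#{START_MARK}\n#{html}\n#{END_MARK}" }
end
```

Wired together, something like File.write(path, update_gallery(File.read(path), gallery_html(Dir.glob("preview/*.jpg")))) would refresh the page, with path pointing at your landing page.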

Customers’ management

I built a waitlist of more than 600 people before releasing Deployment from Scratch, and many of the people on the list became customers. But you see, I use Gumroad for selling the book and Mailchimp for the waitlist. Two different products and two separate lists.

What if I want to send a reminder or a special offer to people who haven’t bought the book yet? I certainly don’t want to bother my current customers with an email they don’t need. Or what if I want to find out the overall conversion rate of the waitlist?

Both tools can export their data as CSV, so all we need is a little bit of Ruby:

#!/usr/bin/ruby
require 'csv'

# Customers from Gumroad
customers = 'customers_sep14_2021.csv'

# Waitlist
list = 'subscribed_segment_export_48995a2a64.csv'

# Skip the header row and normalize case and whitespace,
# so the same address matches in both exports
buyer_emails = CSV.read(customers).drop(1).filter_map do |row|
  row[4]&.strip&.downcase # e-mail is the fifth column of the Gumroad export
end

list_emails = CSV.read(list).drop(1).filter_map do |row|
  row[0]&.strip&.downcase # e-mail is the first column of the Mailchimp export
end

# Who hasn't bought the book yet
not_bought = list_emails - buyer_emails

puts not_bought.count

If this is something you might want to do often, you can extend this to use the APIs directly without going through the manual download of the dataset.
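The script above answers the first question. For the conversion rate, a minimal sketch could compare the two address lists as sets; the helper names email_set and waitlist_conversion are hypothetical, and the rate here is defined as the share of waitlist subscribers who also appear among the buyers, so people who bought without joining the waitlist don’t count.

```ruby
require 'set'

# Compare addresses case-insensitively and ignore surrounding whitespace,
# since two exports may not store them the same way.
def email_set(emails)
  emails.compact.map { |email| email.strip.downcase }.reject(&:empty?).to_set
end

# Fraction of waitlist subscribers who also appear among the buyers.
def waitlist_conversion(list_emails, buyer_emails)
  list = email_set(list_emails)
  return 0.0 if list.empty?

  (list & email_set(buyer_emails)).size.fdiv(list.size)
end
```

With list_emails and buyer_emails from the script, (waitlist_conversion(list_emails, buyer_emails) * 100).round(1) gives the percentage.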

Maintainable tasks

Although I started with a Makefile to stay on top of all these tasks, if Make is not in your blood, there is nothing easier than writing these tasks as Rake tasks:

desc "Generate the PDF"
task :generate_pdf do
  # sh echoes the command and fails the task if asciidoctor fails
  sh "asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc"
end

task :prepare_preview do
  # ..
end

And call it by running rake:

$ rake generate_pdf

Build environment

It’s a good idea to keep your book production environment intact. For one, new versions of various tools can break your original workflow and rendering. Or you might forget all the LaTeX packages that got installed along the way.

As with other projects, you can use Vagrant and its Ruby-powered Vagrantfile to keep your project in the same state and survive the unexpected. A starting Vagrantfile can be kept simple with just a little bit of Ruby and Bash:

Vagrant.configure(2) do |config|
  config.vm.box = "fedora-33-cloud"

  config.vm.synced_folder ".", "/vagrant", type: :nfs, nfs_udp: false

  config.vm.provision "shell", inline: <<-SHELL
sudo dnf update -y || :

# Install dependencies...
sudo dnf install ruby pandoc -y || :
SHELL

end

Conclusion

So there you have it – yet another domain where Ruby can help you, and perhaps even shines above the competition. A whole publishing toolkit, a Make-like build utility, PDF toolkits, and Ruby’s power to write simple scripts for text manipulation.

Check out my book
Deployment from Scratch is a unique Linux book about web application deployment. Learn how deployment works from first principles rather than from the YAML files of a specific tool.
by Josef Strzibny