
Automating PDF accessibility work with JavaScript


In our work at the Information Access Group, we need to make sure every PDF we publish is accessible. After exporting a PDF from Microsoft Word or Adobe InDesign, we need to fix a number of things before we can share it.

In 2019, I created a small web app, using the open-source pdfassembler library, that performs a few repetitive tasks on the structure of the PDF file. These include:
  • "stripping" the table structure of layout tables, which means moving the content out of the <Table> tag
  • setting empty paragraphs as artifacts
  • setting table header rows.

By creating this utility, I learned a lot about how PDF files work, including the difference between structure and content, annotations and bookmarks, and some idiosyncrasies of the PDF document format.
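
To illustrate the three structure fixes listed above, here is a minimal sketch that works on a simplified, hypothetical model of a tagged PDF's structure tree. It is not the web app's code and does not use pdfassembler's API: the node shape ({ tag, text, children }), the layout flag and the function names are all assumptions made for the example. In a real PDF, marking content as an artifact also involves the page content stream, which this sketch ignores.

```javascript
// Hypothetical model: each structure node is { tag, text?, children? }.
// This is an illustration only, not pdfassembler's data model.
const TABLE_PARTS = new Set(["THead", "TBody", "TFoot", "TR", "TH", "TD"]);

// A paragraph with no children and only whitespace counts as empty.
const isEmptyParagraph = (node) =>
  node.tag === "P" &&
  (node.children ?? []).length === 0 &&
  (node.text ?? "").trim() === "";

// Retag the cells of a data table's first row as header cells.
function promoteFirstRow(table) {
  const [first, ...rest] = table.children ?? [];
  if (!first || first.tag !== "TR") return table;
  const cells = (first.children ?? []).map((cell) =>
    cell.tag === "TD" ? { ...cell, tag: "TH" } : cell
  );
  return { ...table, children: [{ ...first, children: cells }, ...rest] };
}

// Returns the list of nodes that should replace `node` in its parent.
// `isLayoutTable` is a caller-supplied test (an assumption of this sketch).
function cleanNode(node, isLayoutTable, insideLayout = false) {
  if (isEmptyParagraph(node)) return [{ tag: "Artifact" }];

  // Strip layout tables, plus the rows and cells directly inside them.
  const strip =
    node.tag === "Table"
      ? isLayoutTable(node)
      : insideLayout && TABLE_PARTS.has(node.tag);

  const children = (node.children ?? []).flatMap((child) =>
    cleanNode(child, isLayoutTable, strip)
  );
  if (strip) return children; // hoist the content out of the table scaffolding

  const cleaned = node.children ? { ...node, children } : { ...node };
  return [cleaned.tag === "Table" ? promoteFirstRow(cleaned) : cleaned];
}

const doc = {
  tag: "Document",
  children: [
    { tag: "H1", text: "Annual report" },
    { tag: "P", text: "  " },
    { tag: "Table", layout: true, children: [
      { tag: "TR", children: [
        { tag: "TD", children: [{ tag: "P", text: "Left column" }] },
        { tag: "TD", children: [{ tag: "P", text: "Right column" }] },
      ] },
    ] },
    { tag: "Table", children: [
      { tag: "TR", children: [{ tag: "TD", text: "Year" }, { tag: "TD", text: "Total" }] },
      { tag: "TR", children: [{ tag: "TD", text: "2019" }, { tag: "TD", text: "42" }] },
    ] },
  ],
};

const [cleaned] = cleanNode(doc, (table) => table.layout === true);
console.log(cleaned.children.map((n) => n.tag).join(", "));
// prints: H1, Artifact, P, P, Table
```

The layout table's two paragraphs end up as direct children of the document, the whitespace-only paragraph is retagged, and the remaining data table keeps its rows with the first row's cells promoted to TH.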

This tool has also allowed me to write short scripts that perform specific tasks in PDFs. Working with the structure of a PDF programmatically has helped in a number of large document projects, for example changing the footer content on every page of a 500-page document. When you're working on huge documents, this feels like a superpower!
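
As a rough sketch of what a footer script like the one mentioned above might look like, the example below assumes each page is already available as a plain object with a decoded content stream in a `content` string. That page model, the function names and the sample text are hypothetical. It only handles the simplest case, where the footer is a single literal string shown with the `Tj` operator and written with escaped parentheses. Real files often split text across `TJ` arrays or use encoded font subsets, which this does not cover.

```javascript
// Write text as a PDF literal string, escaping backslashes and parentheses.
function pdfLiteral(text) {
  return "(" + text.replace(/[\\()]/g, (ch) => "\\" + ch) + ")";
}

// Replace one footer string on every page; `pages` is a hypothetical
// array of { content } objects, not a real library's page type.
function replaceFooter(pages, oldText, newText) {
  const from = pdfLiteral(oldText) + " Tj";
  const to = pdfLiteral(newText) + " Tj";
  let changed = 0;
  const updated = pages.map((page) => {
    if (!page.content.includes(from)) return page;
    changed += 1;
    // split/join replaces every occurrence without regex or "$" pitfalls
    return { ...page, content: page.content.split(from).join(to) };
  });
  return { pages: updated, changed };
}

// Build 500 fake pages, each with body text and a footer line.
const pages = Array.from({ length: 500 }, (_, i) => ({
  content:
    `BT /F1 12 Tf 72 720 Td (Page ${i + 1} body) Tj ET\n` +
    `BT /F1 9 Tf 72 30 Td (Draft report) Tj ET`,
}));

const result = replaceFooter(pages, "Draft report", "Final report");
console.log(result.changed); // prints: 500
```

Returning new page objects instead of editing in place makes it easy to compare before and after, and to check the count of changed pages before writing anything back to a file.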