Using the PDF Extract Pack

The PDF Extract Pack lets you extract text from PDF files in Coda. It does not connect to any external services: PDFs are processed directly on Coda’s servers using Mozilla's pdf.js library. The Pack is built on the following open-source libraries: pdf.js-extract, pdf.js, and pureimage.

Installation

Find the PDF Extract Pack in the “Insert” menu in Coda, or add it to your doc from its Pack listing on coda.io.

Usage

First, upload your PDF file to Coda. You can do this by adding a file column to a table, and then dragging and dropping your file there. External files are not currently supported.

From there, you get a number of formulas to work with:

Extract() extracts all text from the PDF file (optionally from a range of pages)

Info() returns basic information about the file

ExtractFull() extracts detailed text information from the PDF file, including textbox positions
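Textbox positions are useful when text comes out in an unexpected order. The TypeScript sketch below shows one possible way to rebuild lines from positioned text boxes. The `TextBox` shape (`str`, `x`, `y`), the `boxesToLines` helper, and the sample coordinates are all assumptions made for illustration; the actual output of ExtractFull() may be structured differently.

```typescript
// Hypothetical shape of one text box; the Pack's real ExtractFull() output may differ.
interface TextBox {
  str: string; // text content of the box
  x: number; // left edge of the box
  y: number; // vertical position, assumed here to grow downward
}

// Group boxes whose y values lie within `tolerance` of the first box on a line,
// then order each line left to right and join the pieces with spaces.
function boxesToLines(boxes: TextBox[], tolerance = 2): string[] {
  // Sort top to bottom, breaking ties left to right.
  const sorted = [...boxes].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextBox[][] = [];
  let current: TextBox[] = [];
  let lineY = Number.NaN;
  for (const box of sorted) {
    if (current.length > 0 && Math.abs(box.y - lineY) <= tolerance) {
      current.push(box);
    } else {
      // Start a new line anchored at this box's y position.
      current = [box];
      lineY = box.y;
      lines.push(current);
    }
  }
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((box) => box.str)
      .join(" ")
  );
}

// Invented sample data: boxes arrive out of order with slightly uneven baselines.
const sampleBoxes: TextBox[] = [
  { str: "Address", x: 120, y: 50.4 },
  { str: "Abraham Lincoln", x: 40, y: 80 },
  { str: "The", x: 40, y: 50 },
  { str: "Gettysburg", x: 60, y: 50.2 },
];
console.log(boxesToLines(sampleBoxes));
```

The tolerance is a judgment call: too small splits a single line whose boxes sit on slightly different baselines, too large merges adjacent lines.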

Note that PDF files, because of their portability and depending on how they were created, may not extract cleanly or may show content in an unexpected order. However, if you’re working with a batch of PDF files in the same format, you may be able to use heuristics and Coda formulas to extract the text you need.
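As an illustration of such a heuristic, the TypeScript sketch below pulls a labeled value out of extracted text with a regular expression. The `extractField` helper, the field labels, and the sample text are hypothetical and chosen only for this example; the same pattern could likely be adapted to a Coda formula.

```typescript
// Find "Label: value" in extracted text and return the value, if present.
// The labels used below are hypothetical; adjust them to what your PDFs contain.
function extractField(text: string, label: string): string | undefined {
  // Escape regex metacharacters so the label is matched literally.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "." does not match newlines, so the capture stops at the end of the line.
  const match = new RegExp(`${escaped}\\s*:\\s*(.+)`, "i").exec(text);
  return match ? match[1].trim() : undefined;
}

// Invented sample standing in for the output of Extract() on one file.
const sampleText = "ACME Corp\nInvoice Number: INV-0042\nTotal:   $1,250.00\n";
console.log(extractField(sampleText, "Invoice Number"));
console.log(extractField(sampleText, "Total"));
console.log(extractField(sampleText, "Due Date"));
```

This only works when every file in the batch places the label and its value on the same line; formats that split them across text boxes would need a position-based approach instead.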

Demo

⁠

File | Info | Extract | ExtractFull
bitcoin.pdf | Untitled | Bitcoin: A Peer-to-Peer Electronic Cash S |
a-lincoln-anthology-003-the-gettysburg-address.pdf | 4224 | The Gettysburg Address Abraham Lincoln |
Heartbleed-Story.pdf | Heartbleed_story_2014-09-24 | Bugs in software and software libraries c |

Known issues

When you upload a file to Coda, it goes through virus scans; after those scans complete, formulas are not automatically recalculated. To work around this, you can try cutting and pasting the PDF file back into the same cell (to “refresh the cell”) or, alternatively, updating the formula (e.g., adding a space)

Large files (beyond Coda’s default limits) may not be supported; you may want to split them first into smaller PDFs, or otherwise optimize those files to shrink their images

This Pack only supports extracting text. Building another Pack with more advanced functionality, such as editing, may be possible with https://pdf-lib.js.org/

Files on external websites (i.e., not uploaded to Coda) are not supported, since the Pack does not have access to external content. You’ll need to upload your files to Coda first
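For the large-file issue above, splitting usually starts with deciding which pages go into which smaller PDF. The TypeScript sketch below only computes those page ranges; it does not split anything itself, and the `pageRanges` helper and the chunk size are assumptions made for illustration. The actual split would be done with a separate PDF tool of your choice.

```typescript
// Compute 1-based, inclusive [firstPage, lastPage] ranges covering a document.
// Hypothetical helper for planning a split; it does not read or write PDF files.
function pageRanges(totalPages: number, pagesPerChunk: number): Array<[number, number]> {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError("pagesPerChunk must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += pagesPerChunk) {
    // The last chunk may be shorter than the others.
    ranges.push([start, Math.min(start + pagesPerChunk - 1, totalPages)]);
  }
  return ranges;
}

// A 25-page file split into chunks of at most 10 pages.
console.log(pageRanges(25, 10));
```

Page count is only a rough proxy for file size; image-heavy pages may still need smaller chunks or image optimization.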

Support

Feel free to reply to this community post. Additional contact information is in the Pack listing.
