The PDF Extract Pack lets you extract text from PDF files in Coda. It does not connect to any external services: PDFs are processed directly on Coda’s servers using Mozilla's pdf.js library. The Pack uses the following open-source libraries: pdf.js-extract, pdf.js, and pureimage.
Find the PDF Extract Pack in the “Insert” menu in Coda, or alternatively add it to your doc here:
First, upload your PDF file to Coda. You can do this by adding a file column to a table, and then dragging and dropping your file there. External files are not currently supported.
From there, you get a number of formulas to work with:
Extract() extracts all text from the PDF file (optionally from a range of pages)
Info() returns basic information about the file
ExtractFull() extracts detailed text information from the PDF file, including textbox positions
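Textbox positions are mainly useful for rebuilding lines of text yourself when the plain extracted text comes out in an unexpected order. The sketch below shows one way to do that outside of Coda, in Python. It is only an illustration: the exact shape of ExtractFull()'s output is not described here, so the field names x, y, and str (borrowed from pdf.js-extract's content items), the assumption that y grows downward, and the helper boxes_to_lines are all hypothetical.

```python
from typing import Dict, List


def boxes_to_lines(boxes: List[Dict], y_tolerance: float = 2.0) -> List[str]:
    """Group text boxes into lines by vertical position, then read each line left to right.

    Assumes each box is a dict with "x", "y", and "str" keys and that
    y grows downward (both are assumptions, not documented Pack behavior).
    """
    # Sort top to bottom, then left to right.
    ordered = sorted(boxes, key=lambda b: (b["y"], b["x"]))
    lines: List[List[Dict]] = []
    for box in ordered:
        # A box joins the current line if its y is close to that line's first box.
        if lines and abs(box["y"] - lines[-1][0]["y"]) <= y_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, order boxes by x and join their text with spaces.
    return [
        " ".join(b["str"] for b in sorted(line, key=lambda b: b["x"]))
        for line in lines
    ]


sample_boxes = [
    {"x": 120, "y": 50.5, "str": "Number"},
    {"x": 10, "y": 100, "str": "Total:"},
    {"x": 10, "y": 50, "str": "Invoice"},
    {"x": 80, "y": 99.2, "str": "42.00"},
]
reconstructed = boxes_to_lines(sample_boxes)
```

The tolerance absorbs small baseline differences between boxes on the same visual line; it would likely need tuning per document layout.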
Note that PDF files, due to their portability and depending on how they were created, may not extract cleanly or show content in the expected order. However, if you’re working with a batch of PDF files that share the same format, you may be able to use heuristics and Coda formulas to extract the text you need.
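As a rough illustration of such a heuristic, the Python sketch below pulls two fields out of extracted text with regular expressions. The sample text, the field names, and the patterns are made up for this example; real files will need patterns matched to their own layout, and the same idea can be expressed with Coda's text formulas instead.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for a batch of similarly formatted invoices.
FIELD_PATTERNS = {
    # "Number" is tried before "No" so the longer label wins.
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE
    ),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a field is missing."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


sample_text = "ACME Corp\nInvoice No: INV-0042\nDate 2024-01-05\nTotal: $1,234.50\n"
fields = extract_fields(sample_text)
```

Returning None for missing fields makes it easy to spot files in the batch that do not follow the expected format.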
When you upload a file to Coda, it goes through virus scans; after those scans complete, formulas are not automatically recalculated. To work around this, you can try cutting the PDF file and pasting it back into the same cell (to “refresh the cell”) or alternatively updating the formula (e.g., adding a space).
Large files (beyond Coda’s default limits) may not be supported; you may want to split them into smaller PDFs first, or otherwise optimize them to shrink their images.
This Pack only supports extracting text. Building another Pack with more advanced functionality like editing may be possible with