Category: Linux Terminal

Added: 3rd of August 2020


Convert a PDF document to text or HTML using pdftotext and pdftohtml in the Linux terminal

Sometimes you might want to export text from a PDF file to a .txt file.

For a single page it is probably quicker to select all, then copy and paste the text, but if you have many pages of information you can use a command-line tool called pdftotext, which is part of the poppler-utils package.

To check whether pdftotext is already installed on your Ubuntu-based distribution, open your terminal and enter the following:

pdftotext -v


Here is the output from my system:

pdftotext version
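
If the command is not found, the tool is probably not installed yet. The sketch below is one way to check from a script; check_pdftotext is a hypothetical helper name, and the install hint assumes an apt-based system such as Ubuntu.

```shell
# Hypothetical helper: report whether pdftotext is available on the PATH.
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

status=$(check_pdftotext)
if [ "$status" = "missing" ]; then
    # Assumption: an apt-based distribution, where the tool ships in poppler-utils.
    echo "pdftotext not found. Install it with: sudo apt install poppler-utils"
else
    echo "pdftotext is already installed"
fi
```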

To export the text from your PDF file to a .txt file, enter the following command, where filename is the name of the PDF you want to export. The text is written to filename.txt in the same folder.
pdftotext filename.pdf
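
If you have a whole folder of PDF files, a small loop saves typing the command for each one. This is only a sketch: convert_all_pdfs is a made-up helper name, and the -layout option (which asks pdftotext to keep the original page layout as far as it can) is optional.

```shell
# Hypothetical helper: convert every .pdf in a folder (default: the current one).
convert_all_pdfs() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the folder contains no PDF files.
        [ -e "$pdf" ] || continue
        # Write the text next to the PDF, e.g. report.pdf -> report.txt.
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
        echo "converted: $pdf"
    done
}

# Example calls (the paths and file names are placeholders):
#   convert_all_pdfs ~/Documents
#   pdftotext -f 2 -l 5 filename.pdf pages-2-to-5.txt   # pages 2 to 5 only
```

The -f and -l options in the last comment set the first and last page to convert, which helps when you only need part of a long document.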


You can also export the text and images in your PDF file to an .html document using the following command. The -c option generates "complex" output, which tries to keep the layout of the original pages:
pdftohtml -c filename.pdf
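
By default the pages can end up in several HTML files. If you would rather have everything in one document, the -s option can be added. The wrapper below is a rough sketch; pdf_to_single_html is a hypothetical name, and the exact output file names may differ between pdftohtml versions.

```shell
# Hypothetical helper: convert one PDF into a single HTML document.
pdf_to_single_html() {
    pdf="$1"
    # Stop early with an error if the input file does not exist.
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 1
    fi
    # -c: complex output that tries to preserve the page layout
    # -s: put all pages into one HTML document
    pdftohtml -c -s "$pdf"
}

# Example call (filename.pdf is a placeholder):
#   pdf_to_single_html filename.pdf
```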


Why would I use this?
You might need to convert a PDF to HTML so that its content can be published on your website, alongside a downloadable version in PDF format.