Category: Linux Terminal
Added: 3rd of August 2020
Convert a PDF document to text or HTML with pdftotext and pdftohtml in the Linux terminal
Sometimes you might want to export text from a PDF file to a .txt file.
For a single page it is probably quicker to select all, then copy and paste the text, but if you have many pages of information you can use a command-line tool called pdftotext, which ships in the poppler-utils package on Ubuntu.
To check whether pdftotext is already installed on your Ubuntu-based distribution, open your terminal and enter the following:
pdftotext -v
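If pdftotext is installed, it prints its version and copyright information.
To export the text from your PDF file to a .txt file, enter the following command, replacing filename with the name of the PDF you want to export. The text is written to filename.txt alongside the original.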
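If the command is not found, the tools still need installing. The following is a minimal sketch of a check for both tools used in this post. The helper name tool_available is made up for illustration, and the install hint assumes an Ubuntu-based system where the tools come from the poppler-utils package.

```shell
# tool_available is a hypothetical helper: succeeds if the named command is on the PATH.
tool_available() {
    command -v "$1" >/dev/null 2>&1
}

for tool in pdftotext pdftohtml; do
    if tool_available "$tool"; then
        echo "$tool is installed"
    else
        # Assumption: Ubuntu-based system, where poppler-utils provides both tools.
        echo "$tool is missing; on Ubuntu try: sudo apt install poppler-utils"
    fi
done
```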
pdftotext filename.pdf
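If you have a whole folder of PDFs, the same command can be run in a loop. This is only a sketch: convert_all_pdfs is a hypothetical helper name, and the use of the -layout option (which asks pdftotext to keep the original physical layout as far as it can) is my choice rather than a requirement.

```shell
# convert_all_pdfs is a hypothetical helper, not part of poppler-utils.
# It converts every .pdf in a directory to a .txt file beside it.
convert_all_pdfs() {
    dir="${1:-.}"                      # default to the current directory
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # skip when the glob matches nothing
        # -layout tries to preserve columns and tables from the original page
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
    done
}

# Example call (the path is a placeholder):
# convert_all_pdfs ~/Documents/reports
```

To export only part of a document, pdftotext also accepts -f and -l for the first and last page, for example pdftotext -f 2 -l 5 filename.pdf pages.txt.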
You can also export the text and images in your PDF file to an HTML document using the following command:
pdftohtml -c filename.pdf
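pdftohtml can write several files for one PDF, so it can help to send them to their own folder. The sketch below assumes you want the HTML output and a copy of the original PDF together in one directory ready for upload. The helper name pdf_to_web and the default folder name html_out are made up for illustration.

```shell
# pdf_to_web is a hypothetical helper; the output folder layout is an assumption.
pdf_to_web() {
    pdf="$1"
    outdir="${2:-html_out}"            # default output folder name (assumed)
    base="$(basename "$pdf" .pdf)"
    mkdir -p "$outdir"
    # -c requests "complex" output that tries to preserve the page layout;
    # the second argument is the base name for the generated HTML files
    pdftohtml -c "$pdf" "$outdir/$base"
    # keep the original PDF beside the HTML so it can be offered as a download
    cp "$pdf" "$outdir/"
}

# Example call:
# pdf_to_web filename.pdf site
```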
Why would I use this?
You might need to convert a PDF to HTML so that its content can be published as a page on your webspace, while still providing a downloadable version in PDF format.