Collate front and back scans of documents with double-sided pages

By | January 17, 2022

I have several longish (25-100 sheets of paper each) documents to scan to PDF. The pages of the documents are printed on both sides. I have access to a very fast multi-sheet scanner but unfortunately it only scans one side of the pages. I can drop the document on the scanner and scan the front side of all the pages into a PDF file, then flip the stack and scan the back side of all the pages into a second PDF file.

This scanning process leaves me with two documents that need to be “zipped” together: one page from the first document, one from the second, another from the first, another from the second, etc. The second document, with the back-side scans, is in reverse order, so pages from that document need to be pulled from the end instead of from the beginning.
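
As a minimal sketch of that zipping order, here plain Python lists stand in for the two scans. The function name collate and the page labels are made up for illustration:

```python
def collate(fronts, backs_reversed):
    """Interleave fronts with backs, where the backs arrive last sheet first."""
    if len(fronts) != len(backs_reversed):
        raise ValueError("both scans must contain the same number of pages")
    collated = []
    # reversed() undoes the flipped-stack order so each back pairs with its front
    for front, back in zip(fronts, reversed(backs_reversed)):
        collated.extend([front, back])
    return collated


# A three-sheet document: sides 1, 3, 5 are fronts; sides 2, 4, 6 are backs.
fronts = ["p1", "p3", "p5"]
backs_reversed = ["p6", "p4", "p2"]  # the flipped stack scans the last sheet first
print(collate(fronts, backs_reversed))
# → ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```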

Here is a graphical illustration of the situation:

[Figure: Collating single-sided scans into a properly ordered document]

The last time I had to scan a document like this, I did the collating by hand since it was only one document. This time there are quite a few documents so doing it manually will be not only tedious, but also a waste of time. Anyway, that’s why we have things like Python.

Here is the basic outline of the Python solution:

  • Open the pdf files and get the number of pages in each.
  • Both pdfs should have an equal number of pages. If they don’t, exit without doing anything.
  • Create a new, empty pdf document.
  • Make a loop over the number of pages. Each time through the loop:
    • Add the next page from the beginning of the ‘fronts’ file to the collated document.
    • Add the next page from the end of the ‘backs’ file to the collated document.
  • Write out the new collated file.
  • Close all the files.

Before I started coding, I did a quick search and found “Merging multiple PDFs into a single PDF using a Python script” and, since that was about 80% of what I needed, I used the code I found there as a starting point.

The following code does the trick:

import PyPDF2 

# Open and read the files
frontsFile = open('scannedfronts.pdf', 'rb')
backsFile = open('scannedbacks.pdf', 'rb')
frontsReader = PyPDF2.PdfFileReader(frontsFile)
backsReader = PyPDF2.PdfFileReader(backsFile)
 
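# The two scans must pair up page for page; otherwise stop before writing anything
if frontsReader.numPages != backsReader.numPages:
    raise SystemExit('Fronts and backs have different page counts; nothing written.')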

numPages = frontsReader.numPages

# Make the PdfFileWriter object which will contain the collated document
collatedWriter = PyPDF2.PdfFileWriter()
 
# Loop through the fronts and backs and add the pages to the new document
for pageNum in range(numPages):
    nextFront = frontsReader.getPage(pageNum)
    nextBack = backsReader.getPage(numPages - 1 - pageNum)
    collatedWriter.addPage(nextFront)
    collatedWriter.addPage(nextBack)
 
# Write out the collated document
pdfOutputFile = open('collated.pdf', 'wb')
collatedWriter.write(pdfOutputFile)
 
# Close the files
pdfOutputFile.close()
frontsFile.close()
backsFile.close()
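
Newer PyPDF2 releases renamed the camelCase classes and methods used above, so on a recent install the same loop might look roughly like this. It is a sketch under assumptions: it assumes a version that provides PdfReader, PdfWriter, reader.pages, and add_page, and the function name collate_pdfs is hypothetical.

```python
def collate_pdfs(fronts_path, backs_path, output_path):
    """Interleave front-side pages with back-side pages scanned in reverse order."""
    # Imported here so the third-party dependency is only needed when
    # a collation is actually run (assumes a newer PyPDF2 release).
    from PyPDF2 import PdfReader, PdfWriter

    with open(fronts_path, "rb") as fronts_file, open(backs_path, "rb") as backs_file:
        fronts = PdfReader(fronts_file)
        backs = PdfReader(backs_file)

        num_pages = len(fronts.pages)
        if num_pages != len(backs.pages):
            raise ValueError("fronts and backs have different page counts")

        writer = PdfWriter()
        for page_num in range(num_pages):
            writer.add_page(fronts.pages[page_num])
            # The backs were scanned with the stack flipped, so read from the end
            writer.add_page(backs.pages[num_pages - 1 - page_num])

        # Write while the input files are still open, since the writer
        # may still need to read page data from them
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
```

Calling collate_pdfs('scannedfronts.pdf', 'scannedbacks.pdf', 'collated.pdf') would then do the same job as the script above. Raising ValueError instead of exiting lets a caller that loops over many documents skip a bad pair and carry on.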

At some point I might polish this up and make it into a nice command-line script to be used on arbitrarily-named files. If I do that, I will post it here.