Scanning in Linux
Processing
Unpaper:
https://github.com/unpaper/unpaper
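A minimal sketch of batch-cleaning scanned pages with unpaper before OCR. It assumes the pages are PNM-family files named scan-1.pgm, scan-2.pgm, ... (hypothetical naming) and writes clean-1.pgm, clean-2.pgm, ... alongside them. The function name clean_pages and the DRY_RUN switch are illustrative only; --overwrite is unpaper's option for replacing output files that already exist, which it refuses to do by default.

```shell
#!/bin/sh
# Sketch: clean each scanned page with unpaper (default deskew/border/noise filters).
# Assumption (hypothetical naming): inputs are scan-N.pgm, outputs become clean-N.pgm.
# Set DRY_RUN=1 to print the unpaper commands instead of running them.
set -eu

clean_pages() {
  local src dst
  for src in "$@"; do
    dst="clean-${src#scan-}"   # scan-3.pgm -> clean-3.pgm
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo unpaper --overwrite "$src" "$dst"
    else
      # --overwrite lets a re-run replace outputs from a previous run
      unpaper --overwrite "$src" "$dst"
    fi
  done
}
```

Usage, from the directory holding the scans: `clean_pages scan-*.pgm`. Look at a few cleaned pages before OCR; unpaper's automatic filters can occasionally crop or mask content on unusual layouts.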
OCR
Tesseract:
https://medium.com/geekculture/tesseract-ocr-understanding-the-contents-of-documents-beyond-their-text-a98704b7c655
https://github.com/tesseract-ocr/tesseract
https://github.com/Pankrat/pdf-ocr-overlay
https://groups.google.com/g/tesseract-ocr/c/h2rlLcZjZAU
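A minimal sketch of turning cleaned pages into one searchable PDF with Tesseract. Tesseract accepts a text file listing one image path per line as its input, and the `pdf` config makes it write `<outputbase>.pdf` with the page image plus an invisible text layer. Assumptions: English text (`-l eng`) and that the order of the arguments is the page order. The function name ocr_to_pdf, the `<base>-pages.txt` list-file name, and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: OCR several page images into a single searchable PDF.
# Assumptions: English text (-l eng); argument order = page order;
# "<base>-pages.txt" is a hypothetical name for the page-list file.
# Set DRY_RUN=1 to print the tesseract command (the list file is kept for inspection).
set -eu

ocr_to_pdf() {
  local base list
  base=$1            # tesseract appends .pdf, producing "$base.pdf"
  shift
  list="$base-pages.txt"
  # One image path per line; tesseract emits one PDF page per listed image.
  printf '%s\n' "$@" > "$list"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo tesseract "$list" "$base" -l eng pdf
  else
    tesseract "$list" "$base" -l eng pdf
    rm -f "$list"
  fi
}
```

Usage: `ocr_to_pdf book clean-1.pgm clean-2.pgm clean-3.pgm` should produce book.pdf. For other languages, swap `eng` for the matching installed traineddata code.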
Convert to PDF
ImageMagick:
magick page*.png output.pdf
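One pitfall with a glob such as `page*.png`: the shell expands it in lexical order, so page10.png lands before page2.png in the PDF. A minimal sketch of a workaround, assuming GNU coreutils `sort -V` (version sort) is available and that file names contain no spaces or newlines; the function name merge_pages and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: merge page images into one PDF in numeric page order.
# Assumes GNU `sort -V` and file names without spaces/newlines (e.g. page1.png, page2.png).
# Set DRY_RUN=1 to print the magick command instead of running it.
set -eu

merge_pages() {
  local out pages
  out=$1
  shift
  # One name per line, version-sorted so page2 comes before page10.
  pages=$(printf '%s\n' "$@" | sort -V)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo magick $pages "$out"
  else
    # Unquoted $pages is intentional: it splits the sorted list into arguments.
    magick $pages "$out"
  fi
}
```

Usage: `merge_pages output.pdf page*.png`. Zero-padding the scan numbers (page001.png, page002.png, ...) avoids the problem entirely, since lexical and numeric order then agree.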