Step 1: Installing the prerequisites
- Install Python 3.9 (64-bit). If you already have another version, install 3.9 alongside it and install the pip packages manually into the 3.9 installation.
- I insist on Python 3.9 because Windows OCR works only on Python 3.9, not on later versions.
- Make sure you tick “Add Python to PATH” during installation, or else things won't work.
- Install Tesseract OCR
- https://github.com/tesseract-ocr/tesseract/releases/download/5.5.0/tesseract-ocr-w64-setup-5.5.0.20241111.exe
- Ensure that it is installed into “C:\Program Files\Tesseract-OCR”
- Install Poppler
- Link: https://blog.withkarthik.com/wp-content/uploads/2025/03/poppler-24.08.0.zip
- Extract the contents into “C:\Program Files\”
- Ensure that the resulting path looks like the one shown in the screenshot below

4. Open the Command Prompt, run the following command, and wait a few minutes for the required packages to install
pip install pillow pandas clipboard opencv-python numpy pdf2image asyncio pytesseract pymupdf psutil winrt
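Once the installs finish, a quick check can confirm that the setup matches the steps above. The snippet below is only a minimal sketch: the function names are mine, and the list of import names (for example `PIL` for pillow, `cv2` for opencv-python, `fitz` for pymupdf) is my assumption about how the pip packages map to modules. It checks the Python version, the Tesseract folder from the step above, and whether each module can be found.

```python
import importlib.util
import sys
from pathlib import Path

# Assumed import names for the pip packages listed above
# (pip names and import names differ for several of them).
REQUIRED_MODULES = ["PIL", "pandas", "clipboard", "cv2", "numpy",
                    "pdf2image", "pytesseract", "fitz", "psutil", "winrt"]

# Install location given in the Tesseract step.
TESSERACT_DIR = Path(r"C:\Program Files\Tesseract-OCR")


def is_python_39(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.9.x."""
    return tuple(version_info[:2]) == (3, 9)


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.9:", is_python_39())
    print("Tesseract folder found:", TESSERACT_DIR.is_dir())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

If anything shows up as missing, re-run the pip command with the Python 3.9 interpreter explicitly, for example `py -3.9 -m pip install ...`.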
Step 2 – Getting the code running
- Create a new folder wherever you want to run this code (make sure you have at least twice as much free space as the total size of the files you are importing)
- Download the ZIP file below and extract all of its contents into that folder
https://blog.withkarthik.com/wp-content/uploads/2025/06/Audit-Vouching-Code.zip
- Run the “Audit Compiled Codecpython-39.pyc”
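Two things in this step can be checked up front: the free-space rule and whether the compiled file fits your Python. The snippet below is a rough sketch, not part of the downloaded code, and the function names are hypothetical. The first helper compares free disk space against twice the size of the files you plan to import. The second relies on the fact that every `.pyc` file starts with a 4-byte magic number tied to the Python version that compiled it.

```python
import importlib.util
import os
import shutil


def has_enough_space(files, work_dir, factor=2):
    """True when work_dir's drive has at least factor x the files' total size free."""
    needed = factor * sum(os.path.getsize(path) for path in files)
    return shutil.disk_usage(work_dir).free >= needed


def pyc_matches_interpreter(pyc_path):
    """True when the .pyc file's magic number matches the running Python."""
    with open(pyc_path, "rb") as handle:
        return handle.read(4) == importlib.util.MAGIC_NUMBER
```

If `pyc_matches_interpreter` returns False, the interpreter you launched is not the version the file was compiled for, and Python will refuse to run it. That is usually a sign that a different Python version is first on your PATH.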
Functionalities Explained
- Import the required files

2. Select the files to import

3. The remaining steps depend on you. The UI is mostly self-explanatory, so you can explore it and pick what you need
Templates – GSTR 7A and PF Payment Challan
- I will share the regex as a separate post
Other Important Aspects:
- Currently, for some reason, the import folder option does not work and makes the program freeze, so please use the import files option
- Select the Native PDF engine for normal computer-generated PDFs.
- For scanned PDFs, use Windows OCR or Tesseract, as the case may be. Each has its own advantages and disadvantages, so I usually use a combination of both
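If you are not sure which engine a file needs, one way to decide is to see whether the PDF already carries a text layer. The snippet below is a minimal sketch of that idea and not how the tool itself decides. The function names and the 20-character threshold are my own assumptions, and it reads text with PyMuPDF (imported as `fitz`), which is one of the packages installed in Step 1.

```python
def has_text_layer(page_texts, min_chars=20):
    """True when the pages together hold at least min_chars non-space characters."""
    visible = sum(len("".join(text.split())) for text in page_texts)
    return visible >= min_chars


def suggest_engine(page_texts):
    """Suggest 'native' for PDFs with a text layer, otherwise 'ocr'."""
    return "native" if has_text_layer(page_texts) else "ocr"


def extract_page_texts(pdf_path):
    """Read the embedded text of every page with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

For a file on disk, `suggest_engine(extract_page_texts("invoice.pdf"))` gives a hint (the file name here is just a placeholder). A result of "ocr" means the PDF has little or no embedded text, so Windows OCR or Tesseract is the better choice.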
<Note>
1. Ensure that the files are of a similar nature. The program vouches every file that is imported
2. Depending on the functionality, check whether the file is scanned or a normal computer-generated invoice. Regex matching works only on computer-generated files
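To see why regex matching needs computer-generated files, here is a small illustration. It is purely hypothetical: the pattern and the function name are made up for this example and are not the regex used in the templates. The point is that a regex can only match text that is actually present in the PDF, and a scanned page has no text until an OCR engine produces it.

```python
import re

# Hypothetical pattern: "Invoice No" followed by an identifier such as INV/2025/001.
INVOICE_NO = re.compile(r"Invoice\s*No\b\.?\s*[:\-]?\s*([A-Z0-9/\-]+)", re.IGNORECASE)


def find_invoice_number(text):
    """Return the first invoice number found in text, or None."""
    match = INVOICE_NO.search(text)
    return match.group(1) if match else None
```

On text taken from a computer-generated invoice this returns the identifier. On the empty text you get from a scanned page without OCR it returns None.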