Phase 2 – OCR Receipt Extraction

🎯 What You’ll Learn in This Phase

What OCR is and how it works
Installing and using pytesseract
Loading a receipt image in Python
Extracting and cleaning text

✅ STEP 1: What is OCR?

OCR (Optical Character Recognition) lets Python "read" text in images. For example, it can take a photo of a receipt like this:

Rice      ₦2000
Milk      ₦1500
Soap      ₦500

Tesseract can turn an image like this into plain text, though text from real photos usually needs some cleanup. 🔥
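Once OCR has run, the result is an ordinary Python string, so normal string tools apply to it. A minimal sketch, assuming the receipt above was read perfectly (the string below is typed by hand, not real OCR output):

```python
# Hand-typed stand-in for OCR output; real output is usually messier
ocr_text = "Rice      ₦2000\nMilk      ₦1500\nSoap      ₦500\n"

print(type(ocr_text).__name__)  # str
print("Milk" in ocr_text)       # True: search for an item
print(ocr_text.count("₦"))      # 3: count the prices
```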

✅ STEP 2: Install Required Tools

🛠 1. Install Tesseract OCR

Windows users: Install Tesseract from this link and note where it was installed. This guide assumes:
C:\Program Files\Tesseract-OCR\tesseract.exe

Mac users:

brew install tesseract

🛠 2. Install Python Libraries:

pip install pytesseract opencv-python pillow
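Before writing any OCR code, it can help to confirm the Tesseract program itself is reachable. A minimal sketch using only the standard library; it assumes Tesseract was added to your PATH, which is typical with Homebrew on Mac but not guaranteed on Windows (which is why the Step 3 script sets tesseract_cmd):

```python
import shutil
import subprocess

# shutil.which searches the folders on PATH, like typing "tesseract" in a terminal
tesseract_path = shutil.which("tesseract")

if tesseract_path is None:
    print("Tesseract is not on PATH. On Windows, point tesseract_cmd at tesseract.exe instead.")
else:
    # 'tesseract --version' prints version info; some releases write it to stderr
    result = subprocess.run([tesseract_path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    print("Found:", tesseract_path)
    print(output.splitlines()[0] if output else "(no version output)")
```

Once pytesseract is installed, pytesseract.get_tesseract_version() performs a similar check from inside Python.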

✅ STEP 3: Code to Extract Text

📁 Create a new file: receipt_reader.py

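import platform

import pytesseract
import cv2
from PIL import Image  # optional here: pytesseract also accepts PIL images

# 1. Set the path to Tesseract (Windows only; on Mac/Linux it is usually found on PATH)
if platform.system() == "Windows":
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# 2. Load the image (cv2.imread returns None instead of raising if it can't read the file)
img = cv2.imread('receipt.jpg')
if img is None:
    raise FileNotFoundError("Could not read receipt.jpg; check the file name and folder.")

# 3. Convert to grayscale (OpenCV loads color images in BGR order)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 4. Extract text
text = pytesseract.image_to_string(gray)

# 5. Show result
print("Extracted Text:\n")
print(text)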
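The raw string from image_to_string often contains blank lines, runs of spaces, and sometimes a trailing form-feed character ("\f"). A minimal cleaning sketch, assuming you only want to drop empty lines and squeeze repeated spaces; the helper name clean_ocr_text is hypothetical:

```python
import re
from typing import List


def clean_ocr_text(raw: str) -> List[str]:
    """Hypothetical helper: tidy raw OCR output into a list of non-empty lines."""
    cleaned = []
    # splitlines() also splits on a form feed ('\f'), so it ends up as an empty piece
    for line in raw.splitlines():
        # Collapse runs of spaces/tabs into a single space, then trim the ends
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip lines that were blank
            cleaned.append(line)
    return cleaned


raw = "Rice      ₦2000\n\n  Milk\t\t₦1500  \nSoap      ₦500\n\f"
for line in clean_ocr_text(raw):
    print(line)
```

Squeezing spaces discards the column alignment, so keep the raw text as well if a later step needs it.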

🧠 Explanation of Each Line

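Line	What It Means
import pytesseract	Loads the Python wrapper around the Tesseract OCR engine
import cv2	Loads OpenCV, used here to read and preprocess the image
from PIL import Image	Loads Pillow, an image library (optional here; pytesseract also accepts PIL images)
tesseract_cmd = ...	Tells pytesseract where tesseract.exe is (needed on Windows)
cv2.imread	Loads receipt.jpg as an image array (returns None if the file can't be read)
cv2.cvtColor	Converts the image to grayscale, which usually makes OCR more reliable
image_to_string	Runs Tesseract on the image and returns the recognized text as a string
print(text)	Displays the extracted text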
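For the curious: OpenCV's color-conversion documentation gives the grayscale value as a weighted sum of the color channels, Y = 0.299·R + 0.587·G + 0.114·B. A pure-Python sketch of that formula for a single pixel (bgr_pixel_to_gray is a hypothetical name). Note that OpenCV stores pixels in B, G, R order and works in integer arithmetic for 8-bit images, so its results are rounded:

```python
def bgr_pixel_to_gray(pixel):
    """Hypothetical helper: weighted sum for BGR -> gray; pixel is a (B, G, R) tuple of 0-255 values."""
    b, g, r = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


white = (255, 255, 255)
red = (0, 0, 255)  # pure red in OpenCV's BGR order

print(round(bgr_pixel_to_gray(white)))  # 255
print(round(bgr_pixel_to_gray(red)))    # 76
```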

✅ STEP 4: Test It

Take a clear, well-lit photo of a real receipt.
Save it as receipt.jpg in the same folder as receipt_reader.py.
From that folder, run:

python receipt_reader.py
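To try several photos without editing the file name in the code each time, one option is to read the path from the command line. A minimal sketch using argparse, assuming you then pass args.image to cv2.imread in place of the hard-coded 'receipt.jpg'; parse_args is a hypothetical helper name:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical helper: read the receipt image path (defaults to receipt.jpg)."""
    parser = argparse.ArgumentParser(description="Extract text from a receipt photo.")
    # nargs="?" makes the argument optional, so the default is used when it is omitted
    parser.add_argument("image", nargs="?", default="receipt.jpg",
                        help="path to the receipt image")
    return parser.parse_args(argv)


# Demo with an explicit argument list; in the script you would call parse_args()
print(parse_args([]).image)                     # receipt.jpg
print(parse_args(["shop_receipt.jpg"]).image)  # shop_receipt.jpg
```

In receipt_reader.py you would call args = parse_args() near the top, load the image with cv2.imread(args.image), and run python receipt_reader.py shop_receipt.jpg.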

✅ Sample Output (illustrative; your results depend on the photo's quality)

Sunlight Detergent  ₦2,000
Indomie Noodles     ₦3,500
Coca-Cola           ₦1,200
Toothpaste          ₦850
USB Cable           ₦1,500
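Because the next step works with this text, it can be handy to save the raw OCR output so you don't need to re-run OCR while experimenting. A minimal sketch, assuming a file name of receipt_text.txt (our choice) and hypothetical helpers save_text and load_text. Writing with encoding="utf-8" matters here: the ₦ sign can fail to encode under some Windows default encodings.

```python
from pathlib import Path


def save_text(text, path="receipt_text.txt"):
    """Hypothetical helper: write OCR text as UTF-8 so symbols like ₦ survive; returns the Path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


def load_text(path="receipt_text.txt"):
    """Hypothetical helper: read previously saved OCR text back in."""
    return Path(path).read_text(encoding="utf-8")
```

In receipt_reader.py, calling save_text(text) after print(text) would store the result for later.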

🎉 Success! Python can now read receipts. Next: parsing the text line by line.
