Phase 2 – Extracting Text from a Receipt Image (OCR)

🔥 Built for greatness. Learn like you're in a private Python masterclass.

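🎯 What You’ll Learn in This Phase

- What OCR is and what it can do with a receipt
- How to install Tesseract and the Python libraries
- How to extract text from a receipt image with pytesseract
- How to run the script and check its output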

✅ STEP 1: What is OCR?

OCR (Optical Character Recognition) lets Python "read" the words in an image. For example, it can take a receipt like this:

Rice      ₦2000
Milk      ₦1500
Soap      ₦500

With Tesseract doing the recognition, Python turns the printed words in an image like this into plain text your program can search and process. 🔥

✅ STEP 2: Install Required Tools

🛠 1. Install Tesseract OCR

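Windows users: Download and run a Tesseract installer for Windows (the Tesseract project's installation documentation lists prebuilt installers).
Note the install path. By default, the executable is at: C:\Program Files\Tesseract-OCR\tesseract.exe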

Mac users:

brew install tesseract

🛠 2. Install Python Libraries:

pip install pytesseract opencv-python pillow
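Before writing any OCR code, it can help to confirm that Python can actually find the Tesseract program. The sketch below is not part of the original tutorial: `find_tesseract` is a hypothetical helper name, and it assumes Tesseract is either on your PATH (typical after `brew install tesseract`) or at the default Windows path shown above. It uses only Python's standard library.

```python
import os
import shutil

# Default Windows install location from this tutorial
# (assumption: you kept the installer's default folder)
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract():
    """Return a path to the tesseract executable, or None if it can't be found."""
    # Works on Mac/Linux, and on Windows if the installer added Tesseract to PATH
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Fall back to the default Windows location
    if os.name == "nt" and os.path.isfile(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found. Re-check the installation steps.")
    else:
        print("Found Tesseract at:", path)
```

If it prints a path, you can hand it to pytesseract with `pytesseract.pytesseract.tesseract_cmd = path` and then call `pytesseract.get_tesseract_version()` as a final check; that call raises an error if Tesseract still can't be found.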

✅ STEP 3: Code to Extract Text

📁 Create a new file: receipt_reader.py

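import os
import pytesseract
import cv2
from PIL import Image  # Pillow (not used directly below; pytesseract also accepts PIL images)

# 1. Set the path to Tesseract (Windows only; on Mac/Linux pytesseract finds tesseract on your PATH)
if os.name == "nt":
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# 2. Load the image (cv2.imread returns None instead of raising an error if it can't read the file)
img = cv2.imread('receipt.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'receipt.jpg'. Check the file name and location.")

# 3. Convert to grayscale (OpenCV loads color images in BGR order)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 4. Extract text
text = pytesseract.image_to_string(gray)

# 5. Show result
print("Extracted Text:\n")
print(text)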

🧠 Explanation of Each Line

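Line                     What It Means
import pytesseract       Loads pytesseract, the Python wrapper around the Tesseract OCR engine
import cv2               Loads OpenCV, used here to read and preprocess the image
from PIL import Image    Imports Pillow's Image class (not used directly here; pytesseract also accepts PIL images)
tesseract_cmd = ...      Tells pytesseract where tesseract.exe is installed (Windows)
cv2.imread               Loads receipt.jpg as an image array (returns None if the file can't be read)
cv2.cvtColor             Converts the image to grayscale, which usually makes OCR more reliable
image_to_string          Runs Tesseract on the image and returns the recognized text as a string
print(text)              Displays the output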

✅ STEP 4: Test It

Save a receipt photo as receipt.jpg in the same folder as the script, then run:

python receipt_reader.py

✅ Sample Output (from an example receipt; your output depends on the image you use)

Extracted Text:

Sunlight Detergent  ₦2,000
Indomie Noodles     ₦3,500
Coca-Cola           ₦1,200
Toothpaste          ₦850
USB Cable           ₦1,500
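Raw OCR output is rarely this tidy: it often contains blank lines, stray spaces, and (with Tesseract 4 and later) a trailing form-feed character, `\f`, that Tesseract uses as a page separator. A small clean-up pass before parsing can help. This is a minimal sketch; `clean_ocr_text` is a hypothetical helper name, and the demo string is a shortened copy of the sample output above.

```python
from typing import List


def clean_ocr_text(raw: str) -> List[str]:
    """Split raw OCR output into trimmed, non-empty lines."""
    lines = []
    # str.splitlines() also breaks on the form-feed character '\f'
    for line in raw.splitlines():
        line = line.strip()  # drop leading/trailing spaces (inner spacing is kept)
        if line:  # skip blank lines
            lines.append(line)
    return lines


if __name__ == "__main__":
    raw = "Sunlight Detergent  ₦2,000\n\n  Indomie Noodles     ₦3,500\n\f"
    print(clean_ocr_text(raw))
    # ['Sunlight Detergent  ₦2,000', 'Indomie Noodles     ₦3,500']
```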

🎉 Success! Python can now pull the raw text out of a receipt image. Next: parsing that text line by line.