🎯 What You’ll Learn in This Phase
- What OCR is and how it works
- Installing and using pytesseract
- Loading a receipt image in Python
- Extracting and cleaning text
✅ STEP 1: What is OCR?
OCR (Optical Character Recognition) converts the text in an image into characters a program can work with, so Python can "read" words from a photo. For example, a receipt image showing this:
Rice ₦2000
Milk ₦1500
Soap ₦500
OCR turns this into a plain text string that Python can search, split, and clean. 🔥
✅ STEP 2: Install Required Tools
🛠 1. Install Tesseract OCR
Windows users: Install from this link
Remember the path: C:\Program Files\Tesseract-OCR\tesseract.exe
Mac users:
brew install tesseract
🛠 2. Install Python Libraries:
pip install pytesseract opencv-python pillow
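Before writing the OCR script, it can help to confirm that Python can actually find the Tesseract program. The sketch below uses only the standard library. `find_tesseract` is a hypothetical helper name, not part of pytesseract, and the fallback is the default Windows path quoted above, so adjust it if you installed Tesseract somewhere else.

```python
import os
import shutil

# Default Windows install location quoted in this tutorial (an assumption
# about your machine; change it if you installed Tesseract elsewhere).
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return the path to the tesseract executable, or None if not found.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # shutil.which searches the directories listed in PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the fixed location, if a file exists there.
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "Tesseract not found; check the install step.")
```

If this prints a path, that same path is what you would assign to `pytesseract.pytesseract.tesseract_cmd` on Windows.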
✅ STEP 3: Code to Extract Text
📁 Create a new file: receipt_reader.py
import sys
import pytesseract
import cv2
from PIL import Image  # Pillow; imported but not used in this script
# 1. Set the path to Tesseract (Windows only)
if sys.platform == 'win32':
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# 2. Load the image (cv2.imread returns None if the file cannot be read)
img = cv2.imread('receipt.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'receipt.jpg'")
# 3. Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# 4. Extract text
text = pytesseract.image_to_string(gray)
# 5. Show result
print("Extracted Text:\n")
print(text)
🧠 Explanation of Each Line
Line | What It Means |
---|---|
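import pytesseract | Loads pytesseract, the Python wrapper around the Tesseract OCR engine |
import cv2 | Loads OpenCV, used here to read and convert the image |
from PIL import Image | Loads Pillow's Image class (imported, but not used in this script) |
tesseract_cmd = ... | Tells pytesseract where the Tesseract executable is (Windows) |
cv2.imread | Loads the receipt image |
cv2.cvtColor | Converts the image to grayscale, which often improves OCR results |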
image_to_string | Runs OCR to extract text |
print(text) | Displays the output |
✅ STEP 4: Test It
- Use a real receipt
- Take a clear photo and save it as receipt.jpg
- Run your code:
python receipt_reader.py
✅ Sample Output
Sunlight Detergent ₦2,000
Indomie Noodles ₦3,500
Coca-Cola ₦1,200
Toothpaste ₦850
USB Cable ₦1,500
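Real OCR output is rarely this tidy: the raw string usually contains blank lines and stray spaces, and some Tesseract versions append a form feed character at the end. A minimal cleaning sketch, using only built-in string methods, might look like this. `clean_ocr_text` is a hypothetical helper name, and the sample string is made up for illustration rather than taken from a real scan.

```python
def clean_ocr_text(raw):
    """Return a list of non-empty lines with whitespace normalised.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    cleaned = []
    for line in raw.splitlines():
        # split() with no arguments drops leading/trailing whitespace
        # and collapses runs of spaces or tabs into one separator.
        line = " ".join(line.split())
        if line:  # skip lines that are empty after trimming
            cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    # Made-up sample with extra spaces, a blank line and a trailing form feed.
    sample = "  Sunlight Detergent   ₦2,000\n\n Indomie Noodles ₦3,500 \n\x0c"
    for line in clean_ocr_text(sample):
        print(line)
```

In receipt_reader.py you could pass the `text` variable to this function and print the cleaned lines instead of the raw string.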
🎉 Success! Python can now read receipts. Next: Parsing line-by-line.