Phase 3 – Parsing Receipt Text into Structured Data

🔥 Crystal-clear structure: you'll know exactly where to paste each piece of code, even if you're brand new.

✅ Overview

We’ll build receipt_reader.py in three parts, plus one optional step that shows and saves the results.

🧱 PART 1: Import & Setup

Paste this at the top of your Python file:

import pytesseract
import cv2
import csv

# Windows only: tell pytesseract where tesseract.exe is installed.
# On macOS/Linux (or if tesseract is already on your PATH), delete this line.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

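Explanation: This imports pytesseract (a Python wrapper around the Tesseract OCR engine), OpenCV (cv2) for loading the image, and csv for writing the results. The last line points pytesseract at the Tesseract program itself, which you install separately; adjust the path if you installed it somewhere else.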
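If you move the script between Windows and macOS/Linux, the hard-coded path gets in the way. Below is a minimal sketch, assuming Tesseract is either on your PATH or at the default Windows location shown above. The helper name find_tesseract is our own invention for illustration, not part of pytesseract.

```python
import os
import shutil

# Assumed default location from the standard Windows installer;
# change it if you installed Tesseract elsewhere.
WINDOWS_DEFAULT = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def find_tesseract(extra_paths=(WINDOWS_DEFAULT,)):
    """Return a path to the tesseract executable, or None if none is found."""
    # First choice: tesseract on PATH (the usual case on macOS/Linux).
    on_path = shutil.which('tesseract')
    if on_path:
        return on_path
    # Fallback: check known install locations one by one.
    for path in extra_paths:
        if os.path.isfile(path):
            return path
    return None

print('tesseract found at:', find_tesseract())
```

In place of the hard-coded line, you could then assign the result to pytesseract.pytesseract.tesseract_cmd when it is not None, and stop with a clear message when it is.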

🧱 PART 2: Load Image and Extract Text

Paste this after Part 1:

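img = cv2.imread('receipt.jpg')
if img is None:
    # cv2.imread returns None (it does not raise) when the file is missing or unreadable
    raise FileNotFoundError("Could not read receipt.jpg; check the file name and folder.")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale before OCR
raw_text = pytesseract.image_to_string(gray)
lines = raw_text.split('\n')
lines = [line.strip() for line in lines if line.strip() != '']  # drop blank lines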

print("Cleaned Lines:")
for line in lines:
    print(line)

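Explanation: This loads receipt.jpg, converts it to grayscale, and runs Tesseract on it to get all the text as one string. It then splits that string into lines, trims spaces from each line, drops the empty ones, and prints what's left.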
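You can try the cleaning step on its own, without an image or Tesseract, by feeding it a sample string. This is a minimal sketch: clean_lines is a hypothetical helper name and the sample text is made up. It uses str.splitlines(), which also treats '\r\n' and the form-feed character that Tesseract output often ends with as line breaks, and it goes one step further than Part 2 by collapsing repeated spaces inside a line.

```python
def clean_lines(raw_text):
    """Split OCR output into trimmed, non-empty lines."""
    cleaned = []
    # splitlines() breaks on '\n', '\r\n' and '\x0c' (form feed), among others
    for line in raw_text.splitlines():
        # Collapse runs of whitespace inside the line to single spaces
        line = ' '.join(line.split())
        if line:
            cleaned.append(line)
    return cleaned

sample = "SAMPLE STORE\n\nRice   ₦1,500\r\nBread ₦800\n\x0c"
print(clean_lines(sample))
```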

🧱 PART 3: Extract Items and Prices

Paste this after Part 2:

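expenses = []

for line in lines:
    # Split at the last ₦ sign: 'Rice ₦1,500' -> ['Rice ', '1,500']
    parts = line.rsplit('₦', 1)
    if len(parts) == 2:
        item = parts[0].strip()
        price_str = parts[1].replace(',', '').strip()  # '1,500' -> '1500'
        try:
            price = int(price_str)
            expenses.append({'item': item, 'price': price})
        except ValueError:
            # Not a whole number (OCR noise, or decimals like '1500.00'): skip the line
            pass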

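Explanation: For each line that contains a ₦ sign, the text before the last ₦ becomes the item name and the number after it becomes the price, with commas removed (so ₦1,500 becomes 1500). Each match is stored as a dictionary such as {'item': 'Rice', 'price': 1500}. Lines with no ₦, or whose price isn't a whole number, are skipped. This only works if Tesseract actually reads the ₦ symbol, so check the "Cleaned Lines" printed by Part 2: OCR may turn it into a different character.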
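If your printed lines show the naira sign misread, or prices with kobo (such as 800.50), a regular expression can be more forgiving than splitting on ₦. This is a sketch under stated assumptions: PRICE_LINE and parse_line are hypothetical names, and treating 'NGN' or a plain 'N' as stand-ins for ₦ is a guess you should check against your own OCR output. It uses Decimal instead of int so decimal prices are kept exactly.

```python
import re
from decimal import Decimal

# Hypothetical pattern: item text, then a currency marker, then a price that
# ends the line. Assumes OCR prints the naira sign as '₦', 'NGN' or a plain 'N'.
PRICE_LINE = re.compile(
    r'(?P<item>.*?)\s*(?:₦|NGN|N)\s*(?P<price>\d[\d,]*(?:\.\d{1,2})?)'
)

def parse_line(line):
    """Return {'item': str, 'price': Decimal}, or None if the line has no price."""
    match = PRICE_LINE.fullmatch(line.strip())
    if match is None:
        return None
    # Drop thousands separators: '1,500.00' -> '1500.00'
    price = Decimal(match.group('price').replace(',', ''))
    return {'item': match.group('item'), 'price': price}

lines = ['SAMPLE STORE', 'Rice ₦1,500', 'Bread N800.50', 'Sugar NGN 1,200']
expenses = [e for e in map(parse_line, lines) if e is not None]
for e in expenses:
    print(f"{e['item']} - ₦{e['price']}")
```

To try it, you could replace the loop in Part 3 with the list comprehension at the bottom. Accepting a bare N is a trade-off: it rescues misread naira signs, but it could also match a line that merely ends in N followed by digits.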

🧱 PART 4 (Optional): Show and Save Results

Paste this after Part 3:

print("\nStructured Data:")
for e in expenses:
    print(f"{e['item']} - ₦{e['price']}")

# newline='' prevents blank lines between rows on Windows
with open('expenses.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Item', 'Price'])
    for e in expenses:
        writer.writerow([e['item'], e['price']])

Explanation: This prints each item and price, then saves them to expenses.csv (with an Item,Price header row) in the folder you run the script from.
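Because each expense is already a dictionary, csv.DictWriter can write the rows without picking fields out by hand. This is a minimal sketch: write_expenses is a hypothetical helper name, and the demo writes to an in-memory io.StringIO so it runs without touching your folder. It keeps the same Item,Price header as the code above.

```python
import csv
import io

def write_expenses(expenses, file):
    """Write {'item': ..., 'price': ...} dicts to an open text file as CSV."""
    writer = csv.DictWriter(file, fieldnames=['Item', 'Price'])
    writer.writeheader()  # header row: Item,Price
    # Rename the keys to match the header used in Part 4
    writer.writerows({'Item': e['item'], 'Price': e['price']} for e in expenses)

# In-memory stand-in for expenses.csv; for the real file, pass
# open('expenses.csv', 'w', newline='', encoding='utf-8') as the file argument.
expenses = [{'item': 'Rice', 'price': 1500}, {'item': 'Bread', 'price': 800}]
buffer = io.StringIO()
write_expenses(expenses, buffer)
print(buffer.getvalue())
```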

✅ Final File Summary (the complete receipt_reader.py)

# PART 1
import pytesseract
import cv2
import csv
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Windows only

# PART 2
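img = cv2.imread('receipt.jpg')
if img is None:
    raise FileNotFoundError("Could not read receipt.jpg; check the file name and folder.")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
raw_text = pytesseract.image_to_string(gray)
lines = [line.strip() for line in raw_text.split('\n') if line.strip()]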

# PART 3
expenses = []
for line in lines:
    parts = line.rsplit('₦', 1)
    if len(parts) == 2:
        item = parts[0].strip()
        price_str = parts[1].replace(',', '').strip()
        try:
            price = int(price_str)
            expenses.append({'item': item, 'price': price})
        except ValueError:
            pass

# PART 4
print("\nStructured Data:")
for e in expenses:
    print(f"{e['item']} - ₦{e['price']}")

with open('expenses.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Item', 'Price'])
    for e in expenses:
        writer.writerow([e['item'], e['price']])

📦 File/Folder Structure:

receipt_reader.py
receipt.jpg        ← your actual image file
expenses.csv       ← will be created after run

✅ Run It:

python receipt_reader.py

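You should see the structured data printed in the terminal, and expenses.csv will be saved in the same folder. Run the command from the folder that contains receipt.jpg, since the script opens it by a relative path.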
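To confirm the file was written correctly, you can read it back with csv.DictReader. This is a minimal sketch: read_expenses is a hypothetical helper, the sample uses an in-memory string instead of your real file, and it assumes whole-number prices as produced by Part 3. The total at the end is an illustrative extra, not part of the original script.

```python
import csv
import io

def read_expenses(file):
    """Read an Item,Price CSV (as written in Part 4) back into dictionaries."""
    reader = csv.DictReader(file)
    return [{'item': row['Item'], 'price': int(row['Price'])} for row in reader]

# In-memory stand-in so the sketch runs without the real file; for the real one:
# with open('expenses.csv', newline='', encoding='utf-8') as f: rows = read_expenses(f)
sample = io.StringIO('Item,Price\r\nRice,1500\r\nBread,800\r\n')
rows = read_expenses(sample)
print(rows)
print('Total:', sum(row['price'] for row in rows))
```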