NITDA/NCAIR ULA - Python Receipt Manager Tutorial

Comprehensive Tutorial

Introduction

This tutorial will teach you how to build a professional receipt management system using Python and Object-Oriented Programming (OOP) principles. The system will:

Extract text from receipt images using Optical Character Recognition (OCR)
Categorize purchased items automatically
Generate expense reports in CSV format
Follow NITDA/NCAIR coding standards for maintainability

Key Concepts

Object-Oriented Programming

We use Python classes to organize our code into logical components (Receipt and ReceiptManager) for better maintainability.

OCR with Tesseract

We use the pytesseract library to extract text from receipt images, converting visual data into processable text.

Data Processing

We clean and structure the extracted data using regular expressions and string manipulation.

File Handling

We save the processed data to CSV files with proper error handling and file existence checks.

Step-by-Step Implementation

1. Setting Up the Receipt Class

The Receipt class handles all operations related to a single receipt:

load_image() - Reads the receipt image file
extract_text() - Uses OCR to extract text from the image
parse_items() - Processes the extracted text to identify items and prices
categorize() - Automatically categorizes items based on keywords

2. Implementing the ReceiptManager

The ReceiptManager coordinates the overall process:

add_receipt() - Adds receipt files to the processing queue
process_receipts() - Processes all queued receipts and saves to CSV
run() - Provides the user interface for interaction

3. Running the Application

The main execution block creates a ReceiptManager instance and starts the program:

if __name__ == "__main__":
    manager = ReceiptManager()
    manager.run()

NITDA/NCAIR Coding Standards

Modular Design: Code is organized into logical classes and methods
Error Handling: Proper checks for file existence and data validity
Documentation: Clear structure makes the code self-documenting
Maintainability: Easy to extend with new features or modify existing ones
User Feedback: Clear status messages guide the user through the process

Complete Implementation

Python Receipt Manager with OCR

Below is the complete implementation following NITDA/NCAIR standards:

import pytesseract
import cv2
import csv
import re
import os
from datetime import datetime

# Configure Tesseract OCR path (adjust if needed)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Predefined categories
CATEGORY_MAP = {
    "tooth": "Personal Care", "paste": "Personal Care", "soap": "Personal Care",
    "detergent": "Cleaning", "omo": "Cleaning", "colgate": "Personal Care",
    "sunlight": "Cleaning", "indomie": "Food", "noodle": "Food", "maggie": "Food",
    "rice": "Food", "milk": "Beverage", "sugar": "Food", "bread": "Food",
    "mayonnaise": "Food", "oil": "Food", "pack": "General", "cream": "Personal Care",
    "shampoo": "Personal Care", "brush": "Personal Care", "chocolate": "Snack",
    "stew": "Food", "meat": "Food", "fish": "Food", "lorem": "Food",
    "ipsum": "Food", "dolor sit amet": "Food", "consectetur": "Snack", "adipiscing elit": "Snack"
}

class Receipt:
    def __init__(self, filename):
        self.filename = filename
        self.items = []

    def load_image(self):
        image = cv2.imread(self.filename)
        if image is None:
            raise FileNotFoundError(f"❌ Could not read image: {self.filename}")
        return image

    def extract_text(self, image):
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return pytesseract.image_to_string(gray)

    def parse_items(self, text):
        lines = [line.strip() for line in text.split('\n') if line.strip()]
        for line in lines:
            if any(word in line.lower() for word in ['total', 'cash', 'change', 'receipt']):
                continue

            match = re.search(r'(.+?)\s+([₦N]?\s?[\d.,]+)[\)]?$', line)
            if match:
                item_name = match.group(1).strip()
                price_str = match.group(2).replace('₦', '').replace('N', '').replace(',', '.').strip()
                try:
                    price = round(float(price_str))
                    category = self.categorize(item_name)
                    self.items.append((item_name, price, category))
                except ValueError:
                    # Skip lines whose price text cannot be converted to a number
                    continue

    def categorize(self, item_name):
        item_name = item_name.lower()
        for keyword, category in CATEGORY_MAP.items():
            if keyword in item_name:
                return category
        return "Uncategorized"

class ReceiptManager:
    def __init__(self):
        self.receipts = []
        self.output_file = "expenses.csv"

    def add_receipt(self, filename):
        if os.path.isfile(filename):
            self.receipts.append(filename)
            print(f"✅ Receipt added: {filename}")
        else:
            print("❌ File not found.")

    def process_receipts(self):
        if not self.receipts:
            print("⚠ No receipts added yet.")
            return

        file_exists = os.path.isfile(self.output_file)
        with open(self.output_file, 'a', newline='') as file:
            writer = csv.writer(file)
            if not file_exists:
                writer.writerow(['Receipt', 'Item', 'Price', 'Category', 'Date'])

            for filename in self.receipts:
                try:
                    receipt = Receipt(filename)
                    img = receipt.load_image()
                    text = receipt.extract_text(img)
                    receipt.parse_items(text)

                    now = datetime.now().strftime("%Y-%m-%d %H:%M")
                    for item_name, price, category in receipt.items:
                        writer.writerow([filename, item_name, price, category, now])

                    print(f"✅ Processed: {filename}")
                except FileNotFoundError as e:
                    print(e)

        self.receipts.clear()
        print("📁 All receipts processed. Check 'expenses.csv' for your report.")

    def run(self):
        print("🔹 PYTHON RECEIPT MANAGER 🔹")
        print("🔁 Developed by Nuhu @ NITDA/NCAIR\n")

        while True:
            print("\nChoose an option:")
            print("1. Add receipt image")
            print("2. Process and generate report")
            print("3. Exit")

            choice = input("Enter your choice (1/2/3): ").strip()

            if choice == '1':
                img_name = input("🖼 Enter image file name (e.g., receipt1.jpg): ").strip()
                self.add_receipt(img_name)

            elif choice == '2':
                self.process_receipts()

            elif choice == '3':
                print("👋 Exiting. Goodbye!")
                break

            else:
                print("❌ Invalid choice. Please enter 1, 2, or 3.")

# Run the program
if __name__ == "__main__":
    manager = ReceiptManager()
    manager.run()

How to Use This Program

Install required packages: pip install pytesseract opencv-python
Download and install Tesseract OCR
Save receipt images in the same folder as the script
Run the script and follow the menu prompts
Check the generated expenses.csv for your report

Simple Explanations for Everyone

Two-line explanations so any Group 5 member can understand and explain confidently

🔷 class Receipt

Handles reading, extracting, and analyzing a single receipt image.
It finds items, prices, and categories from one photo.

__init__(self, filename)

Stores the filename of the receipt image.
This lets us remember which receipt we're working on.

load_image(self)

Loads the image from your computer.
If the image can't be read, it raises an error.

extract_text(self, image)

Turns the receipt image into text using OCR (pytesseract).
Basically, it reads what's written on the paper.

parse_items(self, text)

Goes line by line through the text and finds items with prices.
It also figures out which category each item belongs to.

categorize(self, item_name)

Checks each item name and tries to match it to a category like food or personal care.
If it doesn't match anything, it's marked "Uncategorized".

🔷 class ReceiptManager

Handles all receipts, collects them, and creates the final report.
Think of it as the team captain managing all the receipts together.

__init__(self)

Creates an empty list to hold all receipt files added.
Also sets the name of the CSV file to save results.

add_receipt(self, filename)

Checks if a file exists and adds it to the list of receipts.
It's like telling the system "this receipt is ready to process."

process_receipts(self)

Goes through all added receipts and writes the details into a CSV report.
Each item from each receipt is saved with name, price, category, and time.

run(self)

Shows the main menu where the user can add receipts, process them, or exit.
This is the main loop that runs the app.

🔷 if __name__ == "__main__":

This line tells Python to start the program here.
It runs the whole system by calling the ReceiptManager.

✅ Summary

• Receipt handles one receipt: reads image → gets text → finds items.
• ReceiptManager handles all receipts: adds files → processes → saves report.

Now you can explain this project like a pro! 🎉

Line-by-Line Code Breakdown

Complete explanation in plain English for absolute beginners

1 Importing Required Libraries

import pytesseract
import cv2
import csv
import re
import os
from datetime import datetime

What This Does:

pytesseract - The "text reader" that extracts words from images
cv2 (OpenCV) - Helps the computer understand and process images
csv - Creates spreadsheet files to save our results
re - Helps find patterns in text (like prices)
os - Lets Python interact with your computer's files
datetime - Adds timestamps to track when items were recorded

2 Tesseract OCR Path Setup

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Simple Explanation:

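This is like telling Python: "The tool that reads text from images is installed here on my computer". The r before the path makes it a raw string, so the backslashes in the Windows path are kept as they are instead of being treated as escape characters. If Tesseract is installed somewhere else on your computer, adjust this path.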
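
To see what the r prefix changes, and one possible way to avoid hard-coding the path, here is a minimal sketch. It assumes Tesseract may already be on the system PATH and otherwise falls back to the Windows location used in this tutorial. The helper name find_tesseract is hypothetical and is not part of pytesseract.

```python
import shutil

# Raw string: backslashes are kept literally.
raw_path = r'C:\new'
# Normal string: \n is read as a single newline character.
plain_path = 'C:\new'

print(len(raw_path))    # 6 characters
print(len(plain_path))  # 5 characters

# The install location used in this tutorial (Windows only).
WINDOWS_DEFAULT = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def find_tesseract(default=WINDOWS_DEFAULT):
    """Hypothetical helper: prefer a tesseract found on PATH, else the default."""
    return shutil.which("tesseract") or default

print(find_tesseract())
```

The returned string could then be assigned to pytesseract.pytesseract.tesseract_cmd in place of the fixed path.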

3 Category Dictionary

CATEGORY_MAP = {
    "tooth": "Personal Care", 
    "paste": "Personal Care",
    # ... other items ...
    "adipiscing elit": "Snack"
}

How It Works:

This is like a cheat sheet that tells the program:

If you see "tooth" or "paste" → it's Personal Care
If you see "indomie" → it's Food
If no matches → mark as Uncategorized

This makes automatic categorization possible without manual input.

🔷 The Receipt Class

Handles everything about a single receipt - from loading the image to extracting and categorizing items

Initialization (__init__)

def __init__(self, filename):
    self.filename = filename
    self.items = []

What happens: When we create a new Receipt object, we:

Remember the image file name (self.filename)
Prepare an empty list (self.items) to store found items later
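
A quick sketch of what this means in practice: each Receipt object keeps its own filename and its own items list. The class below is cut down to just the constructor, and the file names and the sample item are made up for illustration.

```python
class Receipt:
    def __init__(self, filename):
        self.filename = filename  # which image this receipt came from
        self.items = []           # filled later by parse_items()

first = Receipt("receipt1.jpg")
second = Receipt("receipt2.jpg")

# Adding an item to one receipt does not touch the other one.
first.items.append(("Bread", 450, "Food"))

print(first.filename, first.items)
print(second.filename, second.items)
```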

Loading the Image (load_image)

def load_image(self):
    image = cv2.imread(self.filename)
    if image is None:
        raise FileNotFoundError(f"❌ Could not read image: {self.filename}")
    return image

Step-by-step:

Tries to open the image file using OpenCV (cv2.imread)
If reading fails (cv2.imread returns None), raises a FileNotFoundError with an error message
If successful, returns the image for processing

Think of this like trying to open a photo on your phone - if it can't be opened, you get an error.

Extracting Text (extract_text)

def extract_text(self, image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)

How it works:

Converts the color image to grayscale (shades of gray) - this often helps OCR work better
Uses pytesseract to "read" the text from the simplified image
Returns all found text as a string

Real-world analogy: Like taking a photo of a receipt and using your phone's "copy text from image" feature.

Finding Items (parse_items)

def parse_items(self, text):
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    
    for line in lines:
        if any(word in line.lower() for word in ['total', 'cash', 'change', 'receipt']):
            continue
            
        match = re.search(r'(.+?)\s+([₦N]?\s?[\d.,]+)[\)]?$', line)
        if match:
            item_name = match.group(1).strip()
            price_str = match.group(2).replace('₦', '').replace('N', '').replace(',', '.').strip()
            
            try:
                price = round(float(price_str))
                category = self.categorize(item_name)
                self.items.append((item_name, price, category))
            except ValueError:
                # Skip lines whose price text cannot be converted to a number
                continue

Detailed breakdown:

Line cleaning: Splits text into lines and removes empty ones
Skip totals: Ignores lines containing "total", "cash", "change", or "receipt"
Pattern matching: Looks for an item name followed by a price at the end of the line
Price cleaning: Removes the ₦ or N currency symbol and replaces commas with periods before converting to a number
Storage: Saves valid items with their prices and categories
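
The "skip totals" step can be tried on its own. This is a minimal sketch that reuses the same test as parse_items(). The function name is_skipped and the sample lines are made up for illustration. Because the check looks for the word anywhere in the line, an item such as "Cashew Nuts" is also skipped, since it contains "cash".

```python
SKIP_WORDS = ['total', 'cash', 'change', 'receipt']

def is_skipped(line):
    """Same test as in parse_items(): True if any skip word appears in the line."""
    return any(word in line.lower() for word in SKIP_WORDS)

for line in ["TOTAL 4500", "Bread 450", "Cashew Nuts 1200"]:
    print(line, "->", "skipped" if is_skipped(line) else "kept")
```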

Key Point: The re.search pattern looks for:
1. Item name (text) → 2. Space → 3. Price (numbers with optional currency symbols)
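
To see the pattern and the price cleaning at work, here is a small sketch that applies them to a few made-up receipt lines. The pattern and the tutorial_price rule are copied from parse_items(). The thousands_price function is a hypothetical alternative, added only to show what happens when commas are treated as thousands separators instead.

```python
import re

# Same pattern as in parse_items().
LINE_PATTERN = re.compile(r'(.+?)\s+([₦N]?\s?[\d.,]+)[\)]?$')

def split_line(line):
    """Return (item_name, raw_price), or None if the line has no trailing price."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

def tutorial_price(raw):
    """Cleaning as in parse_items(): drop the currency symbol, comma becomes period."""
    cleaned = raw.replace('₦', '').replace('N', '').replace(',', '.').strip()
    return round(float(cleaned))

def thousands_price(raw):
    """Hypothetical variant: treat commas as thousands separators and drop them."""
    cleaned = raw.replace('₦', '').replace('N', '').replace(',', '').strip()
    return round(float(cleaned))

print(split_line("Indomie Noodles ₦250"))  # ('Indomie Noodles', '₦250')
print(split_line("Thank you"))             # None
print(tutorial_price("450.00"))            # 450
print(tutorial_price("1,500"))             # 2
print(thousands_price("1,500"))            # 1500
```

With the tutorial's rule, "1,500" becomes "1.500" and is rounded to 2, and a price such as "₦1,500.00" becomes "1.500.00", which cannot be converted and is skipped. If your receipts use commas for thousands, a rule like the hypothetical one may fit better.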

Categorizing Items (categorize)

def categorize(self, item_name):
    item_name = item_name.lower()
    for keyword, category in CATEGORY_MAP.items():
        if keyword in item_name:
            return category
    return "Uncategorized"

How categorization works:

Makes the item name lowercase for easier matching
Checks if any keyword from our dictionary exists in the item name
Returns the matching category if found
Defaults to "Uncategorized" if no matches

Example: "Colgate Toothpaste" → contains "tooth" → returns "Personal Care"
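
The same matching can be tried on a few sample names. This sketch uses only a small subset of CATEGORY_MAP. Since a keyword matches anywhere inside the name, "Foil Paper" is labelled Food because "foil" contains "oil". The categorize_word_start function is a hypothetical variant, not part of the tutorial code, that only accepts a keyword at the start of a word.

```python
import re

# A small subset of the tutorial's CATEGORY_MAP, enough for the demo.
CATEGORY_MAP = {"tooth": "Personal Care", "indomie": "Food", "oil": "Food"}

def categorize(item_name):
    """Substring matching, as in the tutorial."""
    item_name = item_name.lower()
    for keyword, category in CATEGORY_MAP.items():
        if keyword in item_name:
            return category
    return "Uncategorized"

def categorize_word_start(item_name):
    """Hypothetical variant: a keyword must begin a word to count."""
    item_name = item_name.lower()
    for keyword, category in CATEGORY_MAP.items():
        if re.search(r'\b' + re.escape(keyword), item_name):
            return category
    return "Uncategorized"

for name in ["Colgate Toothpaste", "Vegetable Oil", "Foil Paper", "AA Battery"]:
    print(name, "->", categorize(name), "|", categorize_word_start(name))
```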

🔷 The ReceiptManager Class

Manages multiple receipts and handles the overall program flow

Initialization (__init__)

def __init__(self):
    self.receipts = []
    self.output_file = "expenses.csv"

Setup:

self.receipts - Empty list to store receipt filenames
self.output_file - Sets the CSV filename for saving results

This is like preparing a blank notebook (receipts) and deciding where to save the final report (expenses.csv).

Adding Receipts (add_receipt)

def add_receipt(self, filename):
    if os.path.isfile(filename):
        self.receipts.append(filename)
        print(f"✅ Receipt added: {filename}")
    else:
        print("❌ File not found.")

Process flow:

Checks if the file exists using os.path.isfile
If exists → adds to processing queue with success message
If not → shows error message

This is like putting a physical receipt in your "to process" tray.
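
A short sketch of what os.path.isfile accepts and rejects, using a temporary folder so nothing is left on disk. The file names are made up. Note that the check only confirms that a file exists: a file that is not really an image still passes here and only fails later, when load_image() tries to read it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    real_file = os.path.join(folder, "receipt1.jpg")
    with open(real_file, "wb") as f:
        f.write(b"not really an image")

    checks = {
        "existing file": os.path.isfile(real_file),
        "missing file": os.path.isfile(os.path.join(folder, "missing.jpg")),
        "a folder": os.path.isfile(folder),
    }

print(checks)
```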

Processing Receipts (process_receipts)

def process_receipts(self):
    if not self.receipts:
        print("⚠ No receipts added yet.")
        return

    file_exists = os.path.isfile(self.output_file)
    with open(self.output_file, 'a', newline='') as file:
        writer = csv.writer(file)
        if not file_exists:
            writer.writerow(['Receipt', 'Item', 'Price', 'Category', 'Date'])

        for filename in self.receipts:
            try:
                receipt = Receipt(filename)
                img = receipt.load_image()
                text = receipt.extract_text(img)
                receipt.parse_items(text)

                now = datetime.now().strftime("%Y-%m-%d %H:%M")
                for item_name, price, category in receipt.items:
                    writer.writerow([filename, item_name, price, category, now])

                print(f"✅ Processed: {filename}")
            except FileNotFoundError as e:
                print(e)

    self.receipts.clear()
    print("📁 All receipts processed. Check 'expenses.csv' for your report.")

Complete workflow:

First checks if there are receipts to process
Prepares the CSV file (adds headers if new file)
For each receipt:
- Creates a Receipt object
- Loads image and extracts text
- Finds and categorizes items
- Saves each item to CSV with timestamp
Clears the processing queue when done
Shows completion message

Key Features:

Error handling with try/except
Appends to existing CSV without overwriting
Clear user feedback at each step
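
The "append without overwriting" behaviour can be checked in isolation. The sketch below writes to a temporary folder twice and reads the file back: the header row appears only once. The helper names append_rows and read_rows, the sample rows, the fixed timestamp, and the explicit utf-8 encoding are assumptions made for this illustration.

```python
import csv
import os
import tempfile
from datetime import datetime

HEADER = ['Receipt', 'Item', 'Price', 'Category', 'Date']

def append_rows(path, rows):
    """Append rows to the CSV, writing the header only if the file is new."""
    file_exists = os.path.isfile(path)
    with open(path, 'a', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        if not file_exists:
            writer.writerow(HEADER)
        writer.writerows(rows)

def read_rows(path):
    """Read the whole CSV back as a list of rows (every value is a string)."""
    with open(path, newline='', encoding='utf-8') as file:
        return list(csv.reader(file))

# Same timestamp format as in process_receipts(), with a fixed date for the demo.
stamp = datetime(2024, 3, 5, 14, 7).strftime("%Y-%m-%d %H:%M")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "expenses.csv")
    append_rows(path, [["receipt1.jpg", "Bread", 450, "Food", stamp]])
    append_rows(path, [["receipt2.jpg", "Soap", 300, "Personal Care", stamp]])
    saved = read_rows(path)

for row in saved:
    print(row)
```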

Main Menu (run)

def run(self):
    print("🔹 PYTHON RECEIPT MANAGER 🔹")
    print("🔁 Developed by Nuhu @ NITDA/NCAIR\n")

    while True:
        print("\nChoose an option:")
        print("1. Add receipt image")
        print("2. Process and generate report")
        print("3. Exit")

        choice = input("Enter your choice (1/2/3): ").strip()

        if choice == '1':
            img_name = input("🖼 Enter image file name (e.g., receipt1.jpg): ").strip()
            self.add_receipt(img_name)

        elif choice == '2':
            self.process_receipts()

        elif choice == '3':
            print("👋 Exiting. Goodbye!")
            break

        else:
            print("❌ Invalid choice. Please enter 1, 2, or 3.")

User interaction:

Shows welcome message and menu
Continuously displays options until user exits
Handles three main actions:
- 1: Add new receipt (calls add_receipt)
- 2: Process all receipts (calls process_receipts)
- 3: Exit the program
Validates user input and provides feedback

This creates the interactive experience users see when running the program.

Program Starter

if __name__ == "__main__":
    manager = ReceiptManager()
    manager.run()

What happens when you run the file:

Python checks if this is the main file being run (not imported)
Creates a ReceiptManager instance
Starts the program by calling run()

Professional Tip: The if __name__ == "__main__": block ensures this code only runs when the file is executed directly, not when imported as a module.

🎯 Key Takeaways

Receipt Class

Handles one receipt at a time
Does the heavy lifting of image processing
Extracts and organizes item data

ReceiptManager Class

Manages multiple receipts
Handles user interaction
Saves final reports

Now you can explain every part of this project with confidence! 🚀

Receipt Manager Q&A Challenge

1. What is the primary purpose of the pytesseract library in this project?

2. Why do we convert the image to grayscale before OCR processing?

3. What is the purpose of the CATEGORY_MAP dictionary?

4. What does the cv2.imread() function do?

5. What happens if the program fails to load an image?

6. What is the main purpose of the regular expression in parse_items()?

7. Why does the program skip lines containing 'total' or 'cash'?

8. What does the Receipt class primarily represent in this program?

9. What is the main responsibility of the ReceiptManager class?

10. What file extension is used for the output report?

11. Why is the datetime module imported and used?

12. Why is there a try/except block when processing prices?

13. What category is assigned if no keywords match an item?

14. What does the expression round(float(price_str)) accomplish?

15. How are extracted items stored in the Receipt class?

16. What does the 'a' mode in open() do when saving to CSV?

17. What is the first thing process_receipts() checks?

18. What is written first if creating a new CSV file?

19. What gets cleared after processing all receipts?

20. How are prices cleaned before conversion?

21. What is the main purpose of the run() method?

22. How do you add a receipt to be processed?

23. What happens if you enter an invalid menu choice?

24. What is the purpose of the if __name__ == "__main__" block?

25. Why use classes instead of just functions for this project?
