Optical Character Recognition (OCR) with Google Translate

This blog post introduces an Optical Character Recognition (OCR) project that lets users upload an image containing text and extract the text from it. Users can also select a preferred area of the uploaded image for OCR, which improves both flexibility and accuracy.

Table of Contents

  1. Introduction

  2. Features

  3. Getting Started

  4. Usage

  5. Code Explanation

  6. Conclusion

1. Introduction

Optical Character Recognition (OCR) technology is revolutionizing the way we interact with printed and handwritten text. By converting images of text into machine-encoded text, OCR simplifies data entry, document digitization, and text analysis. This project harnesses OCR capabilities to provide a user-friendly interface for text extraction from images.

2. Features

  • Image Upload: Users can upload an image containing text.

  • Select Preferred Area: Users can select the area of the uploaded image to extract text from.

  • Text Extraction: The system processes the uploaded image and extracts text using OCR algorithms.

  • Output Display: The extracted text is displayed to the user for review or further processing.

  • Translation: Users can translate the extracted text to a preferred language.

  • Copy and Download: Users can copy the extracted or translated text to the clipboard or download it as CSV or TXT files.

3. Getting Started

3.1. Initial Requirements

Before running the project, ensure you have the following installed:

  1. Anaconda: Download Anaconda

  2. Python: Download Python

  3. Tesseract: Download Tesseract (For Windows) (Other OS). If the Tesseract binary is not on your PATH, see the note after this list.
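
Note: pytesseract is only a wrapper around the Tesseract binary, so the binary must be discoverable at runtime. If it is not on your PATH (a common situation on Windows), you can point pytesseract at the executable explicitly; the path below is the default Windows install location and is only an example, so adjust it to your setup.

import pytesseract

# Only needed when the Tesseract executable is not on the PATH.
# Default Windows install location shown here; change it to match your machine.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'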

3.2. Environment Setup

  1. Clone the repository to your local machine.

  2. Create the environment:

conda env create -f environment.yml

  3. Activate the environment:

conda activate env_ocr

  4. Update the activated environment:

conda env update -f environment.yml --prune

  5. Run the application:

streamlit run app.py

  6. Access the application through the web browser at http://localhost:8501
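
Once the environment is active, you can run a quick sanity check to confirm that Python can reach the Tesseract binary before launching the app (this is optional and not part of the application itself):

import pytesseract

# Prints the installed Tesseract version if the binary can be found;
# raises TesseractNotFoundError otherwise.
print(pytesseract.get_tesseract_version())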

4. Usage

  • Upload Image: Click on the upload button and select an image containing text.

  • Select Area: Use the selection tool to highlight the area of the image from which you want to extract text.

  • Extract Text: Click on the “Extract Text” button to process the image and extract the text.

  • Translate Text: If needed, select the target language and click on the “Translate” button.

  • Copy and Download: Use the available buttons to copy the extracted or translated text to the clipboard or download it in your preferred format (CSV or TXT).

5. Code Explanation

# file-app.py
import streamlit as st
from PIL import Image
import pytesseract
from googletrans import Translator
import pyperclip
from functions import get_img_resize,get_key_from_value
from configurations import *
from streamlit_js_eval import streamlit_js_eval
from streamlit_cropper import st_cropper
import pandas as pd


translator = Translator()
# Activate wide mode
st.set_page_config(layout='wide')


if 'detected_text' not in st.session_state:
    st.session_state.detected_text = ""
if 'translated_text' not in st.session_state:
    st.session_state.translated_text = ""
if 'img_text' not in st.session_state:
    st.session_state.img_text = ""
if 'init_destination' not in st.session_state:
    st.session_state.init_destination = None
if 'image_file' not in st.session_state:
    st.session_state.image_file = None   
if 'screen_width' not in st.session_state:
    st.session_state.screen_width = None

st.title('Optical Character Recognition (OCR) & Translator')
st.subheader('Please Upload an Image to Begin.')

# First row for language selection
col1, col2 = st.columns(2)

# Left column for source language selection
with col1:
    src = st.selectbox("From (Auto Detect Enabled)",['English', 'Chinese-Simplified', 'Malay', 'Filipino', 'Vietnamese', 'Tamil','Thai'], key='source_lang')
    source = translate_lang[src]
    st.write("")

# Right column for destination language selection
with col2:
    destination = st.selectbox("To",['English', 'Chinese-Simplified', 'Malay', 'Filipino', 'Vietnamese', 'Tamil','Thai'], key='destination_lang')
    dst = translate_lang[destination]
    st.write("")

# Left column for OCR and image upload
with col1:
    image_file = st.file_uploader("Upload Image", type=['jpg', 'png', 'jpeg', 'JPG'])
    if image_file is not None: 
        img = Image.open(image_file)
        st.subheader('Image you Uploaded...')
        # Resize the image to fit the column width in the streamlit page
        screen_width = streamlit_js_eval(js_expressions='screen.width', key = 'SCR')
        st.session_state.screen_width = screen_width
        if st.session_state.screen_width is not None:
            resized_img= get_img_resize(img,st.session_state.screen_width)
            cropped_img = st_cropper(img_file=resized_img,realtime_update=True)
            st.session_state.image_file = cropped_img if cropped_img else image_file

    if st.button("Convert Text"):
        st.session_state.img_text = pytesseract.image_to_string(st.session_state.image_file, config=custom_config)
        st.session_state.detected_text = ' '.join(st.session_state.img_text.split())
        st.write('')

    if st.session_state.detected_text:         
        detected_lang=translator.detect(st.session_state.detected_text).lang
        if detected_lang is None:
            st.write(f"### Please upload a clear image")
        else:
            print_lang = f"Detected Language is {get_key_from_value(detected_lang)}"
            st.write(f"### {print_lang}")
            st.text_area('Extracted Text',st.session_state.img_text,height=200)

with col2:
    # Display the text area with the translated text inside the container
    if st.session_state.detected_text: 
        try:
            with st.spinner('Translating Text...'):
                sour = translator.detect(st.session_state.detected_text).lang
                result = translator.translate(st.session_state.img_text, src=f'{sour}', dest=f'{dst}').text
            st.text_area('Translated Text',result,height=200) 
            st.session_state.translated_text = result
            st.write('')
        except Exception as e:
            st.error("Translation Error: {}".format(str(e)))

# Create a button to copy the text to clipboard
with col1:
    if st.button('Copy Extracted Text'):
        pyperclip.copy(st.session_state.detected_text)
        st.write('Extracted Text copied to clipboard!')

    download_format_extracted = st.selectbox("Select Extracted File Format", ["CSV", "TXT"])

    if download_format_extracted == "CSV":
        extracted_text_df = pd.DataFrame({'Extracted Text': [st.session_state.detected_text]})
        extracted_text_filename = 'extracted_text.csv'
        st.download_button(label="Download Extracted Text", data=extracted_text_df.to_csv(), file_name=extracted_text_filename, mime='text/csv')
    elif download_format_extracted == "TXT":
        extracted_text_filename = 'extracted_text.txt'
        st.download_button(label="Download Extracted Text", data=st.session_state.detected_text, file_name=extracted_text_filename, mime='text/plain')

with col2:
    if st.button('Copy Translated Text'):
        pyperclip.copy(st.session_state.translated_text)
        st.write('Translated Text copied to clipboard!')

    download_format_translated = st.selectbox("Select Translated File Format", ["CSV", "TXT"])

    if download_format_translated == "CSV":
        translated_text_df = pd.DataFrame({'Translated Text': [st.session_state.translated_text]})
        translated_text_filename = 'translated_text.csv'
        st.download_button(label="Download Translated Text", data=translated_text_df.to_csv(), file_name=translated_text_filename, mime='text/csv')
    elif download_format_translated == "TXT":
        translated_text_filename = 'translated_text.txt'
        st.download_button(label="Download Translated Text", data=st.session_state.translated_text, file_name=translated_text_filename, mime='text/plain')

app.py

The app.py file is the main script for the Streamlit application. It handles the following functionalities (a condensed sketch of the end-to-end flow follows the list):

  • Image Upload and Display: Allows users to upload an image and displays it.

  • Area Selection: Enables users to select a specific area of the image for OCR.

  • Text Extraction: Uses Tesseract OCR to extract text from the selected area.

  • Text Translation: Utilizes the googletrans library to translate the extracted text into a specified language.

  • Copy and Download: Provides options to copy the text to the clipboard and download it as CSV or TXT files.
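
For readers who want the core pipeline without the Streamlit plumbing, here is a minimal sketch of the extract-and-translate flow. It assumes Tesseract and the language data files referenced in configurations.py are installed; the image path is only a placeholder.

from PIL import Image
import pytesseract
from googletrans import Translator

from configurations import custom_config  # '--psm 6 -l eng+chi_sim+...'

# Run Tesseract over the image with the multi-language configuration.
img = Image.open('sample.jpg')            # placeholder path
extracted = pytesseract.image_to_string(img, config=custom_config)
cleaned = ' '.join(extracted.split())     # collapse whitespace, as app.py does

# Detect the source language and translate the text to English.
translator = Translator()
detected = translator.detect(cleaned).lang
translated = translator.translate(cleaned, src=detected, dest='en').text
print(detected, translated)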

# file-configurations.py
# Google translator language codes
translate_lang = {'English': 'en', 'Chinese-Simplified': 'zh-CN', 'Chinese-Traditional': 'zh-TW',
                  'Malay': 'ms', 'Filipino': 'tl', 'Vietnamese': 'vi', 'Tamil': 'ta', 'Thai': 'th'}

# Construct the Tesseract configuration string with the specified languages
ocr_lang = ['eng', 'chi_sim', 'chi_tra', 'msa', 'tgl', 'vie', 'tam', 'tha']
custom_config = r'--psm 6 -l ' + '+'.join(ocr_lang)
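
For reference, the joined configuration string that app.py passes to pytesseract.image_to_string looks like this; --psm 6 tells Tesseract to assume a single uniform block of text, and -l lists the language data files to load (which must be installed alongside Tesseract):

from configurations import custom_config

# Prints: --psm 6 -l eng+chi_sim+chi_tra+msa+tgl+vie+tam+tha
print(custom_config)
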
# file-functions.py
import streamlit as st
from googletrans import Translator
from configurations import *

def get_img_resize(img,screen_width):
    aspect_ratio = img.width / img.height
    new_width = int(screen_width/ 3)
    new_height = int(new_width / aspect_ratio)
    resized_img = img.resize((new_width, new_height))
    return resized_img

def get_key_from_value(value):
    for key, val in translate_lang.items():
        if val == value:
            return key
    return None
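
A quick illustration of how these two helpers behave; the image size and screen width are arbitrary example values:

from PIL import Image
from functions import get_img_resize, get_key_from_value

# Resize a 1200x900 image to a third of a 1920px-wide screen,
# preserving the aspect ratio: the result is 640x480.
img = Image.new('RGB', (1200, 900))
print(get_img_resize(img, 1920).size)   # (640, 480)

# Map a googletrans language code back to its display name.
print(get_key_from_value('zh-CN'))      # Chinese-Simplified
print(get_key_from_value('xx'))         # None for unknown codes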

Key libraries used include:

  • streamlit for the web interface

  • PIL for image processing

  • pytesseract for OCR

  • googletrans for translation

  • pyperclip for clipboard operations

  • pandas for handling CSV downloads

Here is a brief overview of the core sections of app.py:

  • Image Upload:
image_file = st.file_uploader("Upload Image", type=['jpg', 'png', 'jpeg', 'JPG'])
  • Area Selection:
if image_file is not None:
    image = Image.open(image_file)
    st.image(image, caption='Uploaded Image.', use_column_width=True)
    # (Code for selecting area)
  • Text Extraction:
if st.button('Extract Text'):
    extracted_text = pytesseract.image_to_string(image)
    st.write('Extracted Text:', extracted_text)
  • Translation:
translator = Translator()
translated_text = translator.translate(extracted_text, dest=target_language).text
st.write('Translated Text:', translated_text)
  • Copy and Download:
if st.button('Copy Extracted Text'):
    pyperclip.copy(extracted_text)
    st.write('Extracted Text copied to clipboard!')

6. Conclusion

This OCR project demonstrates how to build an interactive web application that leverages OCR technology for text extraction and translation. With a user-friendly interface and flexible options, users can easily extract and manipulate text from images, enhancing productivity and accessibility. The project can be further expanded by integrating more advanced OCR algorithms and supporting additional file formats and languages.

Stay tuned for more updates and features! If you have any questions or suggestions, feel free to leave a comment below.

Annexure

requirements.txt

# requirements.txt
streamlit==1.33.0
streamlit-cropper==0.2.2
pytesseract==0.3.10
googletrans==3.1.0a0
pyperclip==1.8.2
streamlit_js_eval==0.1.7

environment.yml

# environment.yml
name: env_ocr
dependencies:
  - python>=3.5
  - anaconda
  - pip
  - pip:
    - -r requirements.txt

Google Translate Supported Languages

https://py-googletrans.readthedocs.io/en/latest/

Tesseract Supported Languages

https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
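
To check programmatically which languages your local setup actually supports, both libraries expose this information. A small sketch (pytesseract.get_languages requires Tesseract 4 or newer):

import pytesseract
from googletrans import LANGUAGES

# Language data files installed for the local Tesseract binary.
print(pytesseract.get_languages(config=''))

# Language codes accepted by googletrans, e.g. {'en': 'english', ...}.
print(LANGUAGES)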