
1 Introduction

Visual impairment, also known as vision impairment or vision loss, is a decrease in the ability to see that cannot be corrected by usual means such as glasses [5]. The term blindness is used for complete or nearly complete vision loss. Visual impairment may make it difficult for people to carry out normal daily activities such as driving, reading, socializing, and walking.

The World Health Organization (WHO) estimates that 285 million people in the world are visually impaired, of whom 39 million are blind and 246 million have low vision; 65% of the visually impaired and 82% of the completely blind are aged 50 years or older [11].

In today's world of advanced technology, many tools have been invented to ease our lives. Assistive technology for people with disabilities is one of the areas that has most attracted researchers' attention, and it should keep pace with current technology and meet the requirements of this digital age. To this end, many technologies have emerged to improve the daily lives of people with visual impairment. One type of technology available to nearly everyone is the mobile application. In this project, we aim to develop a mobile application that assists blind people and improves the quality of their lives. This paper illustrates the problems faced by people with this specific disability and the solutions our work proposes to address them. The rest of the paper is structured as follows: Sect. 2 reviews related systems. Section 3 presents the methodology followed for data collection and analysis. Section 4 specifies the system requirements. Section 5 describes the implementation process. Section 6 presents the evaluation of the implemented system. Finally, Sect. 7 concludes the paper and outlines future work (Fig. 1).

Fig. 1.

Proposed process of the system

2 Related Systems Analysis

This section reviews well-known software solutions that are most related to the proposed project.

2.1 KNFB Reader

The KNFB Reader [10] is capable of reading different types of documents that one might encounter in daily life. The application uses VoiceOver on iOS and Google TalkBack on Android to assist visually impaired users and guide them vocally while using the application.

2.2 Amedia Live Reader

Amedia Live Reader [12] scans a live camera image and reads any text in it in real time. It is designed for visually impaired users and uses VoiceOver on the iOS platform for guidance.

2.3 Google Translate

Google Translate [7] is a mobile application, running on both iOS and Android, that translates between many languages. It provides instant camera translation using the phone camera and, for higher-quality translations, can take a picture of the text.

2.4 Acapela TTS Voices

Acapela TTS [1] turns written text into speech and lets you buy and install voices and integrate them into your Android device, for use with the system or with any TTS-compatible application such as Google TalkBack.

2.5 Adel TTS Voice

The Best-of-Vox [14] application lets you type or paste text and have it read aloud by the voice you have chosen.

2.6 Text Fairy (OCR Text Scanner)

Text Fairy [15] converts a scanned image of a document to text and corrects the viewpoint of the image. It allows editing the extracted text and copying it to the clipboard for use in other apps. It can also convert the scanned page into a PDF.

2.7 CamScanner

CamScanner [6] helps in scanning, storing, syncing and collaborating on numerous contents across smartphones and computers. It uses the phone camera to scan documents and provides text extraction from images with OCR (optical character recognition) for further editing or sharing (Table 1).

Table 1. Summary comparison between related systems

KNFB Reader is a good multi-language OCR reader; however, like Amedia Live Reader, it does not support the Arabic language. Amedia Live Reader has a real-time OCR feature but is difficult to use because it repeats reading the captured text whenever the phone camera moves. Text Fairy and CamScanner perform OCR without the TTS feature, and neither supports the Arabic language. CamScanner has an edge detection feature, which is important for our proposed app. Google Translate has a good OCR engine that supports the Arabic language, and its API is available in the Google Cloud Platform services.

3 Data Collection and Analysis

This section describes the data collection process to identify the needs of the potential users. Data was collected from 30 visually impaired persons of both genders, with ages ranging from 18 to over 50 years.

3.1 Data Collection Methods

Questionnaires and interviews were conducted with visually impaired people and with heads of departments from the governmental foundation for rehabilitation and visual impairment in Jeddah, Saudi Arabia, and the Special Needs Center in the Students Affairs of King Abdulaziz University.

Fig. 2.

Main questionnaires results

3.2 Questionnaires Results

The questionnaire results (Fig. 2) show that 90% of the volunteers use an iPhone. Most volunteers prefer voice note feedback, and about 70% have a problem directing the camera toward objects. The data gathered from the volunteers forms the main requirements of the system.

3.3 Interview Results

The interviews highlighted some findings supported by the questionnaire results, which will be considered in the requirements specification:

  1. There are no good-quality existing apps that support reading Arabic text through the phone camera. There are also accuracy issues in extracting Arabic text from scanned PDF documents.
  2. The iOS platform dominates all other hand-held platforms among visually impaired users because of its good-quality built-in accessibility features.
  3. Visually impaired people are very good at using hand-held devices.
  4. At some levels of visual impairment, people can still see using special techniques and devices.
  5. The difficulty of learning the braille system creates resistance to using it, especially among those who lost their vision late in life.

4 Requirement Specification

Functional and non-functional requirements for the proposed system were specified using the data gathering methods illustrated in the Data Collection and Analysis section.

4.1 Functional Requirements

The system's functional requirements have been categorized as User Manual, Input, Processing, Output, and Feedback and Help.

User Manual

  1. The system shall play the audio user manual file automatically the first time the user opens the application.
  2. The system shall allow the user to replay the audio user manual upon user request.
  3. The system shall allow the user to navigate through the application.

Input

  1. The system shall allow the user to automatically take a picture with the hand-held device camera.
  2. The system shall allow the user to manually take a picture with the hand-held device camera.
  3. The system shall allow the user to import a selected photo from the mobile gallery.

Processing

  1. The system shall be able to detect the edges of the object in real time in order to auto capture the required object.
  2. The system shall be able to detect and extract the printed Arabic text in the captured image.
  3. The system shall keep the user aware, via voice notes, of the progress while extracting text from the image.

Output

  1. The system shall be able to play the extracted text using VoiceOver, the iOS built-in screen reader.
  2. The system shall allow the user to replay the extracted text.

Feedback and Help

  1. The system shall allow the user to read a new object after a task completion.
  2. The system shall allow the user to change the speed of the reading inside the application.

4.2 Non-functional Requirements

This section illustrates the non-functional, i.e. quality, requirements of the proposed system, which are categorized into response time and usability.

Quality Requirements

  1. Response time: the system's OCR shall give the desired response within a reasonable time.
  2. Usability:
     (a) The system shall be easy to use for the visually impaired after listening to the user manual.
     (b) All functions of the system shall have audio responses to facilitate app usage.
     (c) The system interfaces shall be minimal and support accessibility for the visually impaired.

5 Implementation

This section presents the algorithms developed to implement the required functionalities of the proposed system. This is followed by the technologies used in the implementation and a walk-through of the system.

5.1 Developed Algorithms

Several algorithms were developed to implement the requirements gathered from end users. Three main algorithms were used for text extraction, frame extraction and boundaries detection, explained in the following subsections.

Text Extraction

figure a
figure b
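
The core of this step is a call to the OCR service named in Sect. 5.2 (the Google Cloud Vision API). Below is a minimal Swift sketch of that call, assuming the public images:annotate REST endpoint with a TEXT_DETECTION feature; the extractText helper name and the API key placeholder are illustrative, not the authors' code.

    import Foundation
    import UIKit

    /// Illustrative sketch: send a captured image to Google Cloud Vision
    /// TEXT_DETECTION and return the recognized text (Arabic included).
    /// Replace YOUR_API_KEY with a real key; error handling is abbreviated.
    func extractText(from image: UIImage,
                     completion: @escaping (String?) -> Void) {
        guard let jpeg = image.jpegData(compressionQuality: 0.8) else {
            completion(nil); return
        }
        let url = URL(string:
            "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY")!
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        // The first textAnnotation in the response holds the whole
        // detected text block.
        let body: [String: Any] = ["requests": [[
            "image": ["content": jpeg.base64EncodedString()],
            "features": [["type": "TEXT_DETECTION"]]
        ]]]
        request.httpBody = try? JSONSerialization.data(withJSONObject: body)

        URLSession.shared.dataTask(with: request) { data, _, _ in
            guard let data = data,
                  let json = (try? JSONSerialization.jsonObject(with: data))
                    as? [String: Any],
                  let responses = json["responses"] as? [[String: Any]],
                  let annotations = responses.first?["textAnnotations"]
                    as? [[String: Any]],
                  let text = annotations.first?["description"] as? String
            else { completion(nil); return }
            completion(text)
        }.resume()
    }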

Boundaries Detection. Identifying rectangular objects within an image captured by the user's camera is the core of our project. Below is the approach we followed to implement this feature.

Figure 3 shows the intermediate results generated from the algorithms mentioned in the pseudocode, while Fig. 4 shows the final result.

Frame Extraction. Camera Auto Capture captures frames in real time through the mobile camera. These frames are the input for the Boundaries Detection algorithm, which processes them to find the rectangular boundaries of the object and, when found, triggers the auto capture event. To implement this functionality, several packages and algorithms were used, as illustrated in the following.
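
The paper does not name the exact capture package, so the following is a hedged Swift sketch of the frame extraction step, assuming AVFoundation's capture pipeline: each live camera frame is delivered to a delegate and forwarded to the boundaries detection step. The FrameExtractor class and onFrame callback names are illustrative.

    import AVFoundation
    import CoreImage

    /// Illustrative sketch: deliver live camera frames as CIImages so the
    /// boundaries detection algorithm can look for a rectangular object.
    final class FrameExtractor: NSObject,
            AVCaptureVideoDataOutputSampleBufferDelegate {
        let session = AVCaptureSession()
        var onFrame: ((CIImage) -> Void)?

        func start() {
            guard let camera = AVCaptureDevice.default(for: .video),
                  let input = try? AVCaptureDeviceInput(device: camera)
            else { return }
            session.addInput(input)

            let output = AVCaptureVideoDataOutput()
            output.setSampleBufferDelegate(
                self, queue: DispatchQueue(label: "frames"))
            session.addOutput(output)
            session.startRunning()
        }

        // Called once per captured frame; hand the frame to the
        // boundaries detection algorithm via the callback.
        func captureOutput(_ output: AVCaptureOutput,
                           didOutput sampleBuffer: CMSampleBuffer,
                           from connection: AVCaptureConnection) {
            guard let buffer = CMSampleBufferGetImageBuffer(sampleBuffer)
            else { return }
            onFrame?(CIImage(cvPixelBuffer: buffer))
        }
    }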

Fig. 3.

Detecting paper’s boundaries

Fig. 4.
figure 4

Detecting boundaries final result

figure c
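
The boundaries detection in the paper is built with OpenCV (Sect. 5.2). As a compact illustrative stand-in, the same effect can be approximated with Apple's CIDetector rectangle detector, the API the speed test in Sect. 6.1 compares against; this sketch is not the authors' OpenCV implementation.

    import CoreImage

    /// Illustrative stand-in for the paper's OpenCV-based detector:
    /// locate a rectangular object (e.g. a page) in a frame using
    /// Apple's CIDetector. A non-nil result can trigger auto capture.
    func detectBoundaries(in frame: CIImage) -> CIRectangleFeature? {
        let detector = CIDetector(ofType: CIDetectorTypeRectangle,
                                  context: nil,
                                  options: [CIDetectorAccuracy:
                                            CIDetectorAccuracyHigh])
        return detector?.features(in: frame).first as? CIRectangleFeature
    }

    // Usage with the FrameExtractor sketch above: the four corner
    // points describe the detected paper's boundaries.
    // extractor.onFrame = { frame in
    //     if let rect = detectBoundaries(in: frame) {
    //         print(rect.topLeft, rect.topRight,
    //               rect.bottomLeft, rect.bottomRight)
    //     }
    // }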

5.2 Technologies Used

Google Cloud Vision API provides powerful image analytics capabilities through easy-to-use APIs. It enables developers to build applications that can see and understand the content of images, detecting a broad set of entities within an image, from everyday objects to faces and product logos [8].

OpenCV is an open source computer vision and machine learning software library mainly aimed at real-time computer vision applications; it was originally developed by Intel's research center [13].

VoiceOver is a screen reader built into Apple Inc.'s macOS, iOS, tvOS, and watchOS operating systems. Using VoiceOver, the user can operate their Macintosh or iOS device through spoken descriptions. The feature is designed to increase accessibility for blind and low-vision users [4].
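
The paper plays the extracted text through VoiceOver rather than a bundled TTS engine. One hedged sketch of how an app can hand text to VoiceOver is to post an accessibility announcement, as below; the announce helper name is illustrative.

    import UIKit

    /// Illustrative sketch: ask VoiceOver to speak the extracted text
    /// by posting an accessibility announcement. Has no effect when
    /// VoiceOver is not running.
    func announce(_ extractedText: String) {
        UIAccessibility.post(notification: .announcement,
                             argument: extractedText)
    }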

5.3 Walk-Through of the System

See Fig. 5.

Fig. 5.

Application main screens

6 Evaluation

The components of the Nateq system were tested by applying different measures and testing approaches. The testing process is divided into two sub-tests: the system test, conducted on the application itself to examine resource and time consumption, and the usability test, conducted with the application's end users to observe and record their interaction with it. The data gathered from both sub-tests reflect the application's reliability and user satisfaction in terms of functions and user experience.

6.1 System Testing

The system testing examines the application's resource consumption levels, the efficiency of the boundaries detection algorithm, and the speed and accuracy of the text extraction operation.

Resources Consumption Test. Table 2 shows the application's resource consumption in different application states.

Table 2. Resources consumption

When the application launches, it takes about 5.2 MB of the phone's memory. When the camera is active, the application processes the frames, which increases its memory allocation to 7.7 MB while the CPU usage is about 46%. When the captured image is sent to the text extraction server, the application requires a network connection to upload the image (whose size should not exceed 2 MB); the amount downloaded depends on the text in the captured image. Regarding battery consumption, the energy required is high because of the network overhead needed to establish the connection [2].

Boundaries Detection Test. Two testing approaches were applied to the Boundaries Detection algorithm: performance testing and speed testing.

Performance Testing: The ability to identify and detect the boundaries of rectangular objects was tested in different background situations (modes), including:

  1. Detection against a solid color background with high contrast between the background and object colors.
  2. Detection against a solid color background with low contrast between the background and object colors.
  3. Detection against a wobbly (textured) background.

The evaluation was performed on 30 different sample images. Table 3 shows the testing results and success rates for the three cases mentioned above.

Table 3. Boundaries performance testing results

According to the results shown above, the following constraints apply to boundaries detection:

  1. The contrast between the background and the object must be high for better detection.
  2. Solid color backgrounds provide a better detection rate.
  3. Reflected light on the object's surface decreases the detection rate.

Speed Testing: The average execution time for the algorithm is 0.05 s, which is considered good compared to Apple's CIDetector API [3], which detects rectangles in 0.03 s. The execution time for our algorithm was calculated by running the algorithm in real time and averaging 500 different readings.
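
As a rough illustration of how such an average can be obtained (this harness is ours, not the authors' test code), the per-run time is the total elapsed time divided by the number of runs:

    import Foundation

    /// Illustrative timing harness: average execution time of `work`
    /// over `runs` repetitions (500 in the paper's speed test).
    func averageExecutionTime(runs: Int = 500,
                              _ work: () -> Void) -> Double {
        let start = CFAbsoluteTimeGetCurrent()
        for _ in 0..<runs { work() }
        return (CFAbsoluteTimeGetCurrent() - start) / Double(runs)
    }

    // Example: average the boundaries detector over 500 frames.
    // let avg = averageExecutionTime { _ = detectBoundaries(in: frame) }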

OCR Test

Response Time Testing: The speed of the OCR provided in the application was tested under different Internet speeds. Table 4 shows (in seconds) the effect of the device's Internet speed on the OCR processing time.

Table 4. OCR response time in seconds

Extraction Accuracy Testing: The accuracy was tested by applying the OCR to a sample of 40 images with different font styles. The testing is divided into two categories: document fonts (books, papers, etc.) and product fonts (food products, cleaning products, medicines, etc.). Table 5 shows the results for each category.

Table 5. Accuracy testing results

Observations:

  1. Google OCR gives the best results when applied to document/book fonts.
  2. It cannot detect stretched Arabic words.

6.2 Usability Testing

The goal of conducting the usability test is to evaluate Nateq's interface and its functions with real users. The test helps determine whether the application design is usable for first-time users. The user feedback gathered during the test will help the team improve the application.

Tasks to be Evaluated. Six tasks were chosen to test Nateq's functionality. Each task was piloted to determine the suitable performance measures for it. An iPhone-expert visually impaired user was timed while performing these tasks, providing a baseline against which to judge the participants' times.

  1. Task 1: Access the user manual and navigate through the questions.
  2. Task 2: Get text from a photo saved in the photo gallery.
  3. Task 3: Repeat reading the extracted text.
  4. Task 4: Reach the copy text button.
  5. Task 5: Get text from a photo captured using Manual Capture.
  6. Task 6: Get text from a photo captured using Auto Capture.

In addition to task completion time, the number of navigation clicks and the number of action clicks were recorded as measurements. In accessibility mode, users flick (swipe) left or right to move to the next item on screen and single-tap to hear a description of the tapped item; these count as navigation clicks. Users double-tap an item to open or activate it; these count as action clicks.

The following tables (Tables 6, 7 and 8) indicate the tasks performed and the measurement levels used for each task.

Table 6. Excellent performance
Table 7. Acceptable performance
Table 8. Unacceptable performance

Testing results for five participants are shown in Tables 9, 10, 11, 12, 13 and 14.

Table 9. Task 1: Access the user manual and navigate through the questions
Table 10. Task 2: Get text from a photo saved in the photo gallery
Table 11. Task 3: Repeat reading the extracted text
Table 12. Task 4: Reach copy text button
Table 13. Task 5: Get text from a photo captured using manual capture
Table 14. Task 6: Get text from a photo captured using auto capture

According to the previous tables, tasks 1 to 5 show acceptable results when averaging all participants' performance per task, while task 6 revealed some difficulties in using the camera. According to users' feedback, more instructions were needed to explain how auto capture works.

7 Conclusion

The paper concludes by listing the findings of the project work and of testing with the target users:

  1. There is no adequate software that guides blind people in directing the camera toward objects while taking photos.
  2. There is a need for a descriptive photo gallery, as blind people face difficulties finding photos, which are currently labeled only by their creation date.

These could be future research directions in the field of special needs aid.