Screen reader requirements

Introduction

Since its earliest days, computer technology has affected the lives of millions of people, including visually challenged persons. Advances in synthetic speech have led to the development of screen reader software, which captures text from the computer and converts it to audio for use by visually impaired persons or persons with low vision.

This technology has opened up innumerable opportunities for visually impaired persons, including jobs as programmers and in call centers, entry into new fields such as science and mathematics, and many others.

Why another screen reader?

Although screen reading technology has given visually impaired persons so much power, in India it has hardly reached beyond a small, fortunate English-speaking minority.

Hence there is a need for a screen reader that can empower the other visually impaired persons, who speak Indian languages.

There have been some efforts to create screen reading applications in India, but they have their own limitations. Some can read text but provide hardly any other functionality; for example, no formatting information is provided. Another application has merely enabled reading of Hindi characters and is being advertised as a screen reader. Reading one character at a time is of little help to the user.

Target users:

The application is primarily targeted at visually impaired persons and dyslexic persons. Approximately 10 to 20 million persons in India are unable to read or write because of vision problems, and 95 percent of them do not know English. Another potential set of users is the large illiterate population, around 350 million people, who cannot read but can understand a spoken Indian language. A further group is elderly persons who know how to read but would prefer speech assistance because of limitations in their sight.

Inner workings of a screen reading system:

A screen reading system has two main parts.

1. Speech synthesizer: A speech synthesizer is the main component of any screen reading system, yet it is hardly known. Many people confuse a speech synthesizer with a screen reader. A speech synthesizer accepts plain or marked-up text as input and generates speech output. In India, different organizations are involved in developing speech synthesizers for Indian languages; some are private companies and others are institutes. Each of them has its own interface to its speech synthesizer. The problem is to integrate them and provide a unified model on a common platform. One course of action on the Windows operating system may be to create a layer on top of all the synthesizers that follows the Speech Application Programming Interface (SAPI) standard proposed by Microsoft.

2. Screen reading application: Having described synthesizers, it is now easy to explain the screen reader’s job. A screen reader monitors the information on the computer screen and the keyboard and sends that information to the speech synthesizer in text form. There are many challenges in developing a multilingual screen reader, and some of them are shared with a standard screen reader. Let us cover the problems of the standard screen reader first. Monitoring the keyboard is relatively easy: one can simply capture keystrokes and convert them with the help of the built-in keyboard layouts.

Monitoring the screen is a complex job, particularly in a window-based system, by which we mean any system with windowing capabilities. The problems arise because each window is just an array of pixels as far as the screen is concerned, which makes it necessary to get the text from the program that wrote it to the screen. This is possible only when the program has a well-defined interface for obtaining that information, or when the graphics APIs are monitored for text output and a record is maintained.

In the Windows operating system almost all screen components are well defined, and they allow one to get such information in multiple ways. The bare-bones approach is to get the information through the Win32 API; the preferred approach is to hook into Microsoft Active Accessibility (MSAA). MSAA notifies interested applications of changes and provides a standard mechanism for obtaining this information. Although MSAA is at the center of any screen reading application, it still cannot provide information for all components or in all the desired forms. So one has to rely on the Component Object Model (COM), the Win32 API, and at times on display drivers. There is also the issue that each screen component differs from the others. For details see the section on various screen components and their features.
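
The unified synthesizer layer described in item 1 can be sketched as follows. This is a minimal illustration only: it does not use SAPI, the vendor engines and their method names (say_hindi, render) are invented stand-ins, and speech output is simulated by returning a string so that the routing logic can be checked.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Synthesizer(ABC):
    """Common interface the screen reader talks to (hypothetical)."""

    @abstractmethod
    def languages(self) -> set[str]:
        """Language codes this synthesizer can speak."""

    @abstractmethod
    def speak(self, text: str, language: str) -> str:
        """Speak the text; this sketch returns a string instead of audio."""


class VendorAEngine:
    """Stand-in for one vendor's native API; the method name is invented."""

    def say_hindi(self, text: str) -> str:
        return f"[A:hi] {text}"


class VendorBEngine:
    """Stand-in for a second vendor with a different calling convention."""

    def render(self, lang_code: str, text: str) -> str:
        return f"[B:{lang_code}] {text}"


class VendorAAdapter(Synthesizer):
    def __init__(self, engine: VendorAEngine) -> None:
        self._engine = engine

    def languages(self) -> set[str]:
        return {"hi"}

    def speak(self, text: str, language: str) -> str:
        return self._engine.say_hindi(text)


class VendorBAdapter(Synthesizer):
    def __init__(self, engine: VendorBEngine) -> None:
        self._engine = engine

    def languages(self) -> set[str]:
        return {"ta", "te"}

    def speak(self, text: str, language: str) -> str:
        return self._engine.render(language, text)


class SpeechLayer:
    """Routes each request to the first registered synthesizer for the language."""

    def __init__(self) -> None:
        self._synthesizers: list[Synthesizer] = []

    def register(self, synthesizer: Synthesizer) -> None:
        self._synthesizers.append(synthesizer)

    def speak(self, text: str, language: str) -> str:
        for synthesizer in self._synthesizers:
            if language in synthesizer.languages():
                return synthesizer.speak(text, language)
        raise LookupError(f"no synthesizer registered for {language!r}")
```

The screen reader would hold a single SpeechLayer and never call a vendor engine directly, so adding a synthesizer means writing one more adapter.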

Coming to issues specific to multilingual screen readers, we can focus mainly on the codes used to represent languages and their scripts. The preferred coding is Unicode, which has codes for almost every language currently in use. Windows stores Unicode text as 16-bit code units (UTF-16); a single unit can take 65,536 values, and characters outside that range are represented by a pair of units. If the world were based on Unicode only, it would be easier for programmers to write programs that could understand text in any language and handle it appropriately. Unfortunately, various applications use different coding standards, so our approach will be of a mixed nature. The program will use Unicode internally, but when required it can convert from Unicode to other codes (when a speech synthesizer does not understand Unicode) and from other codes to Unicode (when a particular application does not provide Unicode text). Our main focus will be on Unicode, because the current version of Windows and subsequent ones will use Unicode for multilingual applications. When required, a converter can be inserted to accommodate non-Unicode applications.
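
A converter of the kind described above might look like the following sketch. The 8-bit code table is hypothetical: the byte values 0xA1 to 0xA3 are invented for illustration and do not correspond to any real font or standard. Only the Unicode code points on the right-hand side are real.

```python
# Hypothetical 8-bit legacy code table (byte values invented for illustration).
LEGACY_TO_UNICODE = {
    0xA1: "\u0915",  # DEVANAGARI LETTER KA
    0xA2: "\u092E",  # DEVANAGARI LETTER MA
    0xA3: "\u0932",  # DEVANAGARI LETTER LA
}
UNICODE_TO_LEGACY = {char: byte for byte, char in LEGACY_TO_UNICODE.items()}


def to_unicode(data: bytes) -> str:
    """Convert legacy-coded bytes from an application to Unicode text."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))  # ASCII passes through unchanged
        else:
            # Unknown bytes become U+FFFD, the Unicode replacement character.
            out.append(LEGACY_TO_UNICODE.get(byte, "\uFFFD"))
    return "".join(out)


def from_unicode(text: str) -> bytes:
    """Convert Unicode text to legacy bytes for a synthesizer that needs them."""
    out = bytearray()
    for char in text:
        if ord(char) < 0x80:
            out.append(ord(char))
        else:
            # Characters missing from the table are replaced by '?'.
            out.append(UNICODE_TO_LEGACY.get(char, ord("?")))
    return bytes(out)
```

Keeping both directions in one table means the screen reader core can stay Unicode-only, with conversion applied only at the application and synthesizer boundaries.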

Basic features that are required from a screen reader:

Read text by character, word, line, sentence, or paragraph.
Report text attributes such as font (size, type, and style), color, positioning, and formatting (paragraph and character).
Echo text as it is typed, with echo options of none, characters, words, or both.
Work with common applications:

(a) MS Word: Microsoft Word is a word processor with rich word processing capabilities, and it needs to be handled in detail because of its importance in text processing. The user should be able to read the previous, current, and next character, word, line, sentence, and paragraph. Information about tables, format, and alignment should also be provided.

(b) MS Excel: Excel is important because of its calculation abilities. The screen reader should read the cell text (with previous, current, and next character and word), report the current column and row number, read formulas, and report formatting information.

(c) WordPad: The ability to read rich edit controls will enable WordPad reading. For information about the rich edit control, see the next section.

(d) Notepad: The ability to read multiline edit controls will enable Notepad reading; see the next section.

(e) Internet Explorer: Since the Internet provides access to a huge knowledge base, the ability to work with Internet Explorer is vital to any screen reader. The user should be able to read the previous, current, and next character, word, and line, as well as links (with information about visited links, same-page links, and others), headings, paragraphs, frames, lists, and image alt text, and to fill in forms (including edit boxes, list boxes, combo boxes, buttons, radio buttons, and check boxes).

(f) Email: Email is one of the mainstream media of communication, so users should be able to work with email software. Support for Internet Explorer will enable email reading in Outlook Express and Microsoft Outlook, as they internally use the same components.
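
Reading the previous, current, and next unit, which recurs in the application requirements above, can be sketched as below. This is a simplified illustration with assumed names (spans, around); it covers only character, word, and line, and it treats each code point as a character, whereas Indian scripts would need grouping of consonants with their vowel signs.

```python
import re

# One pattern per reading unit; sentence and paragraph are left out of this sketch.
_PATTERNS = {
    "character": re.compile(r".", re.DOTALL),
    "word": re.compile(r"\S+"),
    "line": re.compile(r"[^\n]+"),
}


def spans(text, unit):
    """Return (start, end) offsets of every unit of the given kind."""
    return [match.span() for match in _PATTERNS[unit].finditer(text)]


def around(text, caret, unit):
    """Return (previous, current, next) units relative to the caret offset.

    An entry is None when no such unit exists, for example when the caret
    sits on whitespace between two words.
    """
    previous = current = following = None
    for start, end in spans(text, unit):
        piece = text[start:end]
        if end <= caret:
            previous = piece      # entirely before the caret
        elif start <= caret:
            current = piece       # contains the caret
        else:
            following = piece     # first unit after the caret
            break
    return previous, current, following
```

For the text "one two three" with the caret inside "two", around(text, 5, "word") yields the three words in order, which is the information a read-previous, read-current, or read-next command would pass to the synthesizer.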

Handle various controls, such as:

A. Intrinsic Controls

(1) Button: There are five types of button, as defined in the Platform SDK. They have three main state elements: focus, pushed, and checked. The software should get information about all of them when the button has focus. In addition, the software needs to get the caption.

(A) Push button: read the caption of the button and its state.

(B) Check box: caption and state.

(C) Radio button: whether it is checked, the caption, the number of radio buttons in the group, and the index of the current radio button.

(D) Owner drawn:

(E) Group: read the group caption before the items contained in it are spoken, along with the number of radio buttons.

(2) Combo Box: There are three types of combo boxes. They differ in how they combine a drop-down list and an edit control. A drop-down list is one that can be expanded when needed; otherwise the combo box takes up only as much space as an edit control or a button. The third type, the simple combo box, keeps its list visible at all times.

(A) Drop-down combo box: This has a drop-down list and an edit control in which the user can enter a choice.

(B) Drop-down list box: It has only a drop-down list, which is used to display more options when needed.

The following information is needed: the type of the combo box, its current selection, the current item if the combo box is expanded, and the position of that item in the list. A command to expand the combo box should also be available.

(3) Edit Controls: There are two main types of edit control. There are also other styles about which the user should be informed: alignment (left, centered, or right); auto scroll (horizontal or vertical); case (lower or upper); numeric; read only; password; hide selection; convert to OEM character set; and whether carriage returns are accepted.

The software should also report the selected text.

(A) Single-line edit control: It has only a single line, and pressing Enter does not start a new line; instead the key is passed to the parent. The user must be able to read the previous, current, and next character and word.

(B) Multiline edit control: As the name suggests, it has more than one line, and it must allow reading of units beyond those of the single-line edit control, such as the previous, current, and next line, sentence, and paragraph.

(4) List Box: read the current item, the number of items in the list, the selection in the list, and the position of the item in the list.

(5) Rich Edit Controls: ability to read the previous, current, and next character, word, line, and paragraph. Formatting information, including alignment, should also be provided.

(6) Scroll Bars: read the orientation of the scroll bar and the percentage of scroll box movement, and on request report the total scroll range and scrolling units.

(7) Static Controls: Static controls are used in dialog boxes to display messages and to label controls, so the ability to read them is important. A static control needs to be read in its entirety when it is a message or label, and in units of character, word, and line when read with the mouse simulated by the keyboard.
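
The announcements required for buttons and list boxes in this section could be assembled as in the sketch below. The ButtonInfo structure and both function names are assumptions made for illustration; in a real screen reader the field values would come from the control itself, and the wording of the spoken phrases is only one possible choice.

```python
from dataclasses import dataclass


@dataclass
class ButtonInfo:
    """State gathered for a focused button (hypothetical structure)."""
    kind: str              # "push button", "check box", or "radio button"
    caption: str
    checked: bool = False
    pushed: bool = False
    index: int = 0         # 1-based position in the group (radio buttons)
    group_size: int = 0    # number of radio buttons in the group


def describe_button(info: ButtonInfo) -> str:
    """Build the phrase to speak when the button receives focus."""
    parts = [info.caption, info.kind]
    if info.kind in ("check box", "radio button"):
        parts.append("checked" if info.checked else "not checked")
    elif info.pushed:
        parts.append("pressed")
    if info.kind == "radio button" and info.group_size:
        parts.append(f"{info.index} of {info.group_size}")
    return ", ".join(parts)


def describe_list_item(items: list, selected_index: int) -> str:
    """Speak the current list box item and its position in the list."""
    return f"{items[selected_index]}, {selected_index + 1} of {len(items)}"
```

Speaking the caption first and the state last lets a user who already knows the dialog move on as soon as the caption is heard.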

B. Windows Common Controls:

(1) Animation Controls:

(2) ComboBoxEx Control: same as combo box

(3) Date and Time picker Controls:

(4) Drag List Boxes: same as list box

(5) Flat Scroll Bars: same as scroll bars

(6) Header Controls:

(7) Hot Key Controls

(8) Image Lists:

(9) IP Address Controls: similar to edit boxes, except that each field of the IP address should be readable separately.

(10) List View Controls: similar to list boxes

(11) Month Calendar Controls

(12) Pager Controls

(13) Progress Bar Controls: read the percentage of the task done.

(14) Property sheets:

(15) Rebar Controls

(16) Status Bars: similar to static controls.

(17) Tab Controls

(18) Toolbar Controls

(19) Tooltip Controls: read the tool tip text.

(20) Trackbar Controls:

(21) Tree View: caption of the current item, its level in the tree, and its number within the current level.

(22) Up-Down Controls: read the percentage movement.

(23) Menu: caption, whether the item contains a submenu, whether it opens a dialog box, and whether it is checkable and, if so, its state.
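
Several of the controls above (progress bars, up-down controls, and scroll bars in the previous section) are announced as a percentage, and tree view items by level and position. A minimal sketch of both, using assumed function names and assuming the control reports a position together with its minimum and maximum:

```python
def percent(position: int, minimum: int, maximum: int) -> int:
    """Position of a progress bar, up-down control, or scroll box as 0 to 100."""
    if maximum <= minimum:
        return 0  # empty or invalid range: nothing meaningful to announce
    clamped = min(max(position, minimum), maximum)
    return round(100 * (clamped - minimum) / (maximum - minimum))


def describe_tree_item(caption: str, level: int, index: int, siblings: int) -> str:
    """Caption, depth in the tree, and 1-based position within that level."""
    return f"{caption}, level {level}, {index} of {siblings}"
```

Clamping the position first guards against controls that briefly report a value outside their range.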

Allow work on the desktop.
Allow use of the Control Panel, the Start menu, and the menus of various applications.
Mouse: allow the screen to be read with the mouse via keyboard commands, moving it up, down, left, and right to the nearest text or by pixels in predetermined increments.
Facility to label graphics.
Dictionary to modify pronunciation.
A customization option, preferably through a general-purpose programming language, by exposing the object model of the screen reader.
Option to change synthesizers and languages on the fly.
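
The pronunciation dictionary mentioned in the list above could work along the following lines. This is a sketch under simple assumptions: the class name is hypothetical, entries are matched case-insensitively against whole whitespace-separated tokens, and only ASCII punctuation around a token is ignored, so script-specific punctuation would need extra handling.

```python
import re
import string


class PronunciationDictionary:
    """Replaces words with a user-supplied spelling before synthesis."""

    def __init__(self) -> None:
        self._entries = {}

    def add(self, word: str, spoken: str) -> None:
        self._entries[word.lower()] = spoken

    def apply(self, text: str) -> str:
        def replace(match):
            token = match.group(0)
            # Ignore surrounding ASCII punctuation such as commas or brackets.
            core = token.strip(string.punctuation)
            spoken = self._entries.get(core.lower())
            if spoken is None:
                return token
            return token.replace(core, spoken, 1)

        # Substituting token by token leaves the original whitespace intact.
        return re.sub(r"\S+", replace, text)
```

The dictionary would be applied to text just before it is handed to the synthesizer, so that on-screen text and navigation offsets are unaffected.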

Some implementation options:

For a screen reader, there are basically four options for getting information from the computer. They are listed in increasing order of implementation difficulty, with the most useful technique first.

MSAA: Microsoft Active Accessibility provides access to most of the applications provided by Microsoft. Since it was designed specifically for the benefit of assistive devices, it is desirable to use it before anything else. It is a standard way to retrieve information from applications; moreover, one can get further information about an object where the application provides it. All the common controls support MSAA, as do some Microsoft applications such as MS Word, MS Excel, and Internet Explorer, including Outlook Express.
COM: The Component Object Model, which is widely used in Windows-based applications, also exposes a lot of functionality through automation components. This is useful because it is a standard way of communicating between applications, and MSAA is also built on it.
Win32 API: The Windows 32-bit application programming interface can be used where MSAA and COM do not work. However, getting information this way is difficult, and only limited structural information is available.
Video driver intercept: If all else fails, one has to go through a video driver intercept, which is even harder than the Win32 API.
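
The ordering of the four options suggests a fallback chain in which each source is tried only when the previous one yields nothing. The sketch below illustrates that idea with stand-in providers backed by dictionaries; none of it calls MSAA, COM, or the Win32 API, and the handle numbers and text values are invented.

```python
from __future__ import annotations

from typing import Callable, Optional

# A provider takes a window handle and returns its text, or None if it cannot.
Provider = Callable[[int], Optional[str]]


def table_provider(known: dict[int, str]) -> Provider:
    """Build a stand-in provider backed by a dictionary (illustration only)."""
    return known.get


def read_text(
    handle: int, providers: list[tuple[str, Provider]]
) -> Optional[tuple[str, str]]:
    """Try each provider in order; return (provider name, text) or None."""
    for name, provider in providers:
        text = provider(handle)
        if text is not None:
            return name, text
    return None


# Ordered from most useful and easiest to hardest, as in the list above.
PROVIDERS = [
    ("MSAA", table_provider({1: "OK button"})),
    ("COM", table_provider({2: "cell B4"})),
    ("Win32 API", table_provider({1: "OK", 3: "status text"})),
    ("video driver", table_provider({4: "custom-drawn label"})),
]
```

With this arrangement, read_text(1, PROVIDERS) is answered by the MSAA stand-in even though the Win32 stand-in also knows the handle, while handle 3 falls through to the Win32 stand-in.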

Conclusion

Considering the widespread impact of this technology, it is desirable to build the system quickly, so as to obtain the maximum benefit from it and to gain a competitive advantage over other players who come out with similar products in the future. Another advantage of such a technology will be its cost, as opposed to foreign products, which are beyond the means of the common person.