Wednesday, January 6, 2010

AIDE-D-VOIX HOME AUTOMATION THROUGH SPEECH RECOGNITION


Abstract

Home automation is the technology that enhances the interactivity and autonomy of a home. It is a field with potential explosive growth due to the recent rapid improvements in computing power.

Speech recognition is the ability of a computer system to respond accurately to verbal commands. It makes use of specific artificial intelligence (AI) rules to determine what words the speaker is saying. Speech recognition programs allow people to give commands and enter data using their voices rather than a mouse or keyboard.

Objective

The main aim of designing this software is to provide an accessibility tool for individuals with physical or cognitive impairments and disabilities.

A software program is developed to recognize speech commands. It takes input from the user in the form of speech, recognizes it, acts according to the conditions specified in the code, and activates the corresponding appliance.

“VOICE OUT YOUR SILENT THOUGHTS”
Introduction
Home automation through speech recognition is the basic concept behind this product. A software product is developed to control home appliances through speech commands. The user gives a speech command, which the system has to recognize. Once the speech command is identified, it is compared with the commands given in the source code. If the condition is satisfied, a signal is generated to control the corresponding appliance.
What is Home Automation?
Home automation is a field within building automation, specializing in the specific automation requirements of private homes and in the application of automation techniques for the comfort and security of its residents. Although many techniques used in building automation (such as light and climate control, control of doors and window shutters, security and surveillance systems, etc.) are also used in home automation, additional functions in home automation include the control of multi-media home entertainment systems, automatic plant watering and pet feeding, and automatic scenes for dinners and parties.
The main difference between building automation and home automation is, however, the human interface. When home automation is installed during construction of a new home, control wires are usually added before the drywall is installed. These control wires run to a controller, which then controls the environment.
What is Speech Recognition?
Speech recognition (in many contexts also known as automatic speech recognition, computer speech recognition, or, erroneously, voice recognition) is the process of converting a speech signal into a sequence of words by means of an algorithm implemented as a computer program. Depending on several factors, speech recognition systems can have a wide performance range as measured by word error rate. These factors include the environment, the speaker's speaking rate, and the context (or grammar) being used in recognition. Speech recognition presently comes in two styles: discrete speech and continuous speech. The older technology, discrete speech recognition, requires the user to speak one word at a time. The newer technology, continuous speech recognition, allows the user to dictate at a more or less normal speaking rate. Most speech recognition users would agree that dictation machines can achieve very high performance in controlled conditions. Part of the confusion comes mainly from the mixed usage of the terms speech recognition and dictation.
Speaker-dependent dictation systems requiring a short period of training can capture continuous speech with a large vocabulary at a normal pace with very high accuracy.
Nowadays, speaker-independent software with higher efficiency has been developed. Such software requires very little training and has a vast database of words. Only a short partial training session is needed to further raise its efficiency to the maximum.
Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy (getting one or two words out of one hundred wrong) if operated under optimal conditions. The process of speech recognition:








Who Makes Speech Recognition Software?
There are many developers involved in speech recognition, from hardware manufacturers who are trying to build the feature directly into computer hardware, to software developers who write packages for existing PCs. Among the many participants are:
Dragon Systems. Dragon Systems is the leading developer of software-based speech-recognition systems, with several different packages available depending on the user's needs. Dragon Systems' Dragon NaturallySpeaking Deluxe Edition includes the ability to recognize continuous speech.
IBM. IBM is another leading speech recognition software developer. With IBM’s software, you can surf the Internet hands-free.
Microsoft. Microsoft is working on speech recognition and is trying to build it directly into the operating system so any new PC would automatically be speech-recognition ready.
What Are The System Requirements?
At a minimum, you need:
A Pentium 133 MHz processor
32 MB of RAM
A high-quality, noise-reducing microphone
A 16-bit sound card
Applications of Speech Recognition
Home Automation
Command recognition - Voice user interface with the computer
Dictation
Interactive Voice Response
Medical Transcription
Pronunciation Teaching in computer-aided language learning applications
Automatic Translation, etc.



Why choose Speech Recognition?

The capabilities of speech recognition technology are developing at a remarkable pace. Speech recognition technologies that were only dreamed of a few years ago are now available to everyone at a reasonable cost. Speech recognition on personal computers serves as an aid for people with mobility impairments. For them, the advantages of speech recognition are not a matter of convenience but of necessity. Many people with disabilities who use the technology go from complete dependence on others to having the independence necessary to complete their work hands-free.

Benefits Of Speech Recognition:
Speech recognition offers many benefits, from fast, accurate data entry to help for people with disabilities. These benefits include:
Protects Against Repetitive Stress Injury. If you do not have to type or use a mouse, at least not very often, then you are less likely to receive a repetitive stress injury like carpal tunnel syndrome.
Frees Your Hands and Eyes for Other Jobs. If you do not have to sit in front of your PC and watch the screen or keyboard, you are free to read your notes while working, pace the room while dictating, or write and stretch at the same time.
Aids in Data Entry. Speech recognition allows a typing rate that averages around 45 to 65 words per minute, with some users achieving rates as high as 90 words a minute. A data entry clerk may type 80 words per minute, but cannot do it all day without pausing to walk around and stretch, so the human and speech-recognition rates average out to about the same number. Additionally, speech recognition nearly always finds the exact word, so its spelling is nearly always correct.
Provides PC Access for People with Disabilities. For users who physically cannot type, or who have problems writing or using a keyboard, speech recognition not only opens up the world of computers, it can help open up new independence.
Cost Savings. Although speech-recognition packages initially cost several thousand dollars, today's packages start at less than a hundred dollars and go up from there. Considering the cost of a well-trained legal or medical secretary, the cost of work-related injuries, and the benefits of increased productivity, speech-recognition software can quickly pay for itself.
Requirements
PC
Microphone
Speech recognition engine
Interrupt Inpout32.dll
Relay Circuit (+5V relay)
Appliance


Methodology
A typical complete speech recognition process consists of the following parts: (1) sound acquisition, (2) sound conversion, (3) fragmentation, and (4) recognition.

Sound Acquisition: The user's voice is captured with the help of a microphone in a handset.
Sound Conversion: The digital sound captured by the sound card through the microphone is converted into a more manageable format. The converter translates the stream of amplitudes that forms the digital sound wave into its frequency components. It is still a digital representation, but one more akin to what a human ear really perceives.
Fragmentation: The next stage is the identification of phonemes, the elementary sounds that are the building blocks of words. Each frequency component of the sound is mapped to a specific phoneme.
Recognition: The final step is to analyze the phoneme string. A grammar, the list of words known to the program, lets the engine associate the phonemes with a particular word. This step finishes the conversion from sounds to words.
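As an illustration of the sound-conversion step, the toy sketch below (a hypothetical example, not the engine's actual implementation) uses a naive discrete Fourier transform to translate a frame of amplitudes into its frequency components:

```python
import math

def dft_magnitudes(frame):
    """Naive DFT: turn a frame of amplitudes into frequency-bin magnitudes."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# A pure tone completing 4 cycles over a 64-sample frame should peak at bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin)  # 4
```

A real engine would use a fast Fourier transform over overlapping windowed frames, but the idea is the same: amplitudes in, frequency components out.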

The recognition procedure is divided into two consecutive stages, one operating on the data set and one on the test set:

(1) Training (for data set)
(2) Comparison and Classification (for test set)


(1) Training: The words to be recognized need to be added to the database provided in the software. Words can be added to the database dynamically. Some level of training is required for accurate recognition.
(2) Comparison and Classification: At this stage, the recognized word is compared with the words in the program. Based on the result, the appropriate function is performed.
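The two stages above can be sketched as follows; the function names and the word-to-bit mapping are illustrative assumptions, not the actual program:

```python
# Hypothetical sketch of the training and comparison stages.
commands = {}  # vocabulary: command word -> parallel-port data bit

def train(word, bit):
    """'Training': dynamically add a command word and its appliance bit."""
    commands[word.lower()] = bit

def classify(recognized):
    """'Comparison': match the engine's output against the trained vocabulary."""
    return commands.get(recognized.lower())  # None when no command matches

train("light", 0)
train("fan", 1)
print(classify("FAN"))    # 1
print(classify("radio"))  # None
```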

System Architecture










Theory: Components of Architecture
The components in the architecture are:
1. PC
2. Circuit & Relay
3. Appliance

PC: There are three essential components in the PC:
1. User program
2. Speech recognition engine
3. Interrupt
Speech recognition engine: The recognizer used here is speaker-independent software with a built-in database containing a very large number of words. The user program activates the recognition engine when it detects a sound. Initially, the words chosen to control each appliance in the user program are loaded into the engine database. The engine splits the received string into phonemes and tries to group homophones together from the vast collection of words in the database. These grouped words are then compared with the set of words already saved in the database. If a match occurs, that word is returned to the user program.

User program: The string returned from the engine is then checked against a set of options in the user program. If the condition is satisfied, a signal containing the data is sent to the port using the interrupt INPOUT32.dll. This signal is transferred to the relay circuit, which then forwards the activation or deactivation signal to the appliance.
Interfacing PC – INPOUT32.DLL:
Works seamlessly with all versions of Windows (98, NT, 2000, and XP)

Uses a kernel-mode driver embedded in the DLL

No special software or driver installation required

The driver is automatically installed and configured when the DLL is loaded

No special APIs required; only two functions, Inp32 and Out32

Can be easily used with VC++ and VB



The two functions exported from inpout32.dll are:
1) 'Inp32', which reads data from a specified parallel port register.
2) 'Out32', which writes data to a specified parallel port register.
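A minimal sketch of driving these two functions from a script (hedged: the guard and the echo fallback are assumptions for machines without the DLL; 0x378 is the conventional LPT1 data-register address):

```python
import ctypes
import os

PORT = 0x378  # conventional LPT1 data-register address

def send_to_port(data):
    """Write one byte to the parallel port via inpout32.dll when available."""
    data &= 0xFF
    if os.name == "nt" and os.path.exists("inpout32.dll"):
        dll = ctypes.WinDLL("inpout32.dll")
        dll.Out32(PORT, data)          # Out32(port, value) writes the byte
        return dll.Inp32(PORT) & 0xFF  # Inp32(port) reads it back
    return data  # no DLL present: just echo the byte (sketch only)

print(send_to_port(0x05))  # 5
```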












After the signal is generated by the interrupt and sent to the port, it is transferred to the relay circuit, which in turn passes it to the corresponding appliance.
What is a port?
A port contains a set of signal lines through which the CPU sends data to or receives data from other components. We use ports to communicate with modems, printers, keyboards, mice, etc. In signaling, open signals are "1" and closed signals are "0", as in the binary system. A parallel port sends 8 bits and receives 5 bits at a time. The serial port RS-232 sends only 1 bit at a time, but it is bidirectional, so it can send 1 bit and receive 1 bit at a time.
Parallel Port: We use the parallel port here, since it allows simultaneous control of up to eight appliances.

Signal         Bit   Pin    Direction
-Strobe        ¬C0   1      Output
+Data Bit 0    D0    2      Output
+Data Bit 1    D1    3      Output
+Data Bit 2    D2    4      Output
+Data Bit 3    D3    5      Output
+Data Bit 4    D4    6      Output
+Data Bit 5    D5    7      Output
+Data Bit 6    D6    8      Output
+Data Bit 7    D7    9      Output
-Acknowledge   S6    10     Input
+Busy          ¬S7   11     Input
+Paper End     S5    12     Input
+Select In     S4    13     Input
-Auto Feed     ¬C1   14     Output
-Error         S3    15     Input
-Initialize    C2    16     Output
-Select        ¬C3   17     Output
Ground         -     18-25  Ground
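Since each of the eight data bits D0-D7 can drive one appliance, switching an appliance amounts to setting or clearing one bit of the data-register byte. A small illustrative sketch (the helper name is an assumption):

```python
def set_appliance(state, n, on):
    """Set (on=True) or clear (on=False) bit n of the 8-bit data byte,
    corresponding to the appliance wired to data line Dn (n = 0..7)."""
    if on:
        return state | (1 << n)
    return state & ~(1 << n) & 0xFF

state = 0
state = set_appliance(state, 0, True)   # appliance on D0 on
state = set_appliance(state, 3, True)   # appliance on D3 on
print(format(state, "08b"))  # 00001001
state = set_appliance(state, 0, False)  # appliance on D0 off again
print(format(state, "08b"))  # 00001000
```

The resulting byte is what would be written to the data register (pins 2-9) in one Out32 call, controlling all eight appliances at once.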
Circuit & Relay:
Inverting the input: In the Windows operating system, a signal is sent to all the ports while booting; the appliances should not be activated at that time, so the signal is inverted. Corresponding code is written for activation and deactivation of the appliance. For inverting, a NOT gate (IC 7404) is used.
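On the software side, this hardware inversion means the program writes the one's complement of the intended data byte, so that after the 7404 each appliance sees the intended level. A small sketch:

```python
def invert_byte(b):
    """One's complement of an 8-bit data byte; the 7404 NOT gate inverts it
    back, so boot-time high lines reach the relays as inactive levels."""
    return ~b & 0xFF

print(invert_byte(0b00000001))  # 254, i.e. 0b11111110
# Inverting twice recovers the original byte.
assert invert_byte(invert_byte(0x5A)) == 0x5A
```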




Amplifying the input: The voltage coming out of the PC (~ +2.3 V) is too low to activate the relay (+5 V), so we require an amplifying circuit to increase the voltage. The circuit is designed as follows:







Relay: A relay is an electrically operated switch. Current flowing through the coil of the relay creates a magnetic field, which attracts a lever and changes the switch contacts. The coil current can be on or off, so relays have two switch positions, and they are double-throw (changeover) switches. The relay switches according to the voltage it receives and thereby controls appliance activation/deactivation.

The relay's switch connections are usually labeled COM, NC and NO:
COM = Common, always connect to this; it is the moving part of the switch.
NC = Normally Closed, COM is connected to this when the relay coil is off.
NO = Normally Open, COM is connected to this when the relay coil is on.






Appliance: The signal is thus sent to the appliance. Any appliance that runs on a normal 230 V supply can be automated with this system.
Work Plan
PHASE I:
In the Speech Recognition phase, the speech engine recognizes user speech commands and accuracy of the software is tested.
PHASE II:
After the user commands have been recognized, the user program generates the signal from the PC.
PHASE III:
The signal is generated by the interrupt containing the data, which is to be passed to the relay circuit.
PHASE IV:
The amplifying circuits and the electronic switches/relays are designed.
PHASE V:
Activation or deactivation of the appliance.
Future Enhancements
Wireless automation
Multilingual speech mode
Number of appliances can be increased
Conclusion
This system supports effective control of appliances by the disabled. This method of automating home appliances serves as a communication aid for mobility-impaired people.
