Monday, June 3, 2019
Image To Voice Converter Is Software Computer Science Essay
realize To Voice converter Is Softw ar Computer Science EssayImage to Voice converter is parcel or a device to recognize an role and convert it into human character. The purpose of the conversion is to provide communication aid for blind raft to sense what the object in their hand or in front of them. This converter is too suitable for children at the age of three until six years old for early upbringing part.In this witness converter, it consists of kitchen stove processing and sound gene symmetryn. For an jut out processing, it is a series of calculation techniques for analyzing, reconstructing, compressing, and enhancing assures. When an object is inputting, an digit forget captured finished s plentyning or webcam prove and manipulate of the physical body, accomplished development various specialized software applications such as MATLAB and output like a printer or a monitor.Image processing has some(prenominal) techniques, including template unified, KNN (K-Nea rest live), thresholding and etc. For the template twinned, it is a technique for ascending sm all parts of an realize to match with the template shape it is also routine to identify printed characters, numbers, and other small, simple objects. KNN (K-Nearest Neighbour) is an algorithm that can work very well in practice and easy to understand. It is also a lazy algorithm that does not utilisation the training data points to do any generalization. Besides, thresholding technique is one of the most important approaches to paradigm segmentation. It is a non-linear operating theater that can converts a gray-scale video into a binary go for.The purpose of image processing in this depict is to analysis of a picture utilize techniques that can identify shades, colorises and relationships that cannot be observed by the human eye. Besides that, an image processing is used to solve identification problems, i.e. in forensic practice of medicine or in establishing weather maps f rom satellite photos. It assigns with images in bitmapped graphics form that suck up been scanned in or taken with digital cameras. For sound gene symmetryn is to break a sound through window sound library or play a wav file from computing device.Problem StatementNowadays, many visually damage people still using blind mans stick to sense the road of the direction and object in front of them in this society. With just only a plain stick and a pair of covered eye, it is difficult for a human to deal sense of their direction. Probably, they would not know what the objects around the people which had been blind eye. As we can see the economy nowadays is getting worse, most of the people or family members were getting busy on their busy work action they sop up no extra judgment of conviction to spend on the handicap people to give them a good care. In this causal agent, for all the handicap people especially blind people, they have to get use to it on their living style. In orde r than that, this product is also available to help the small kids to cleanse the ability on distinguishing or differentiate the daily use objects. This is the primer coat why the product mentioned above was developed.Project Aim and ObjectiveThe aim of this take care is to develop an Image to Voice converter which able to recognize an image from the webcam and then convert it into sound by window sound library or wav file with good transaction. To achieve the main objective of this project, there are sub-objectives need to be carry through as followsTo develop a unequaled image recognition algorithms for crops and colours for palpable time application using MATLAB.To analyze the performance of the image recognition algorithm in term of trueness and time processing.To develop an algorithm to convert recognized image to verbalize using MATLAB.To analyze the performance of image to voice conversion algorithm.Test the performance of the closed loop interface for the image and sound processing converter system.To develop Graphical User Interface (GUI) of the image to voice converter for national of user determination.Project Scope/LimitationThe scope of this project is to construct a unique image to voice converter within a flow rate of time at woo not to exceed RM200. Referring to this project, it consists of hardware which is webcam and software which is MATLAB. The system of this project is to capture an image using webcam, then recognize an image and generate a sound using MATLAB with several techniques. This product specially created for visually impaired people or to improve small kids learning capability. there was few limitation of this project which specified as followsShape limitationColour limitationResolution limitationDistance limitationLiterature go offImage processing is a technique to convert an image into digital specification and go through some actions on it, so as to get an enhanced image or to collect some advanced information f rom it. It is a kind of signal exemption in which input is image, like word-painting frame or photograph and output may be image or features related with that image. Frequently, image processing institution consist of treating images as deuce dimensional signals art object applying al pointy set signal processing techniques to them1. For the image recognition process can be divided into several algorithms which are image acquisition, image pre-processing, image segmentation, image equateation and image classification. For the image acquisition, it is a digital image that captured by one or a few image sensors, such as various types of light-sensitive cameras, range sensors, tomography devices, radar, ultra- sonic cameras and etc. According to the type of sensor, the outcome of an image data is an generally ii dimensional image, a three dimensional capacity, or an image order. The pixel take accounts usually correspond to strength of light in one or a few spectral bands, but can also be involved many physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.Image pre-processing is one of the algorithms that can increase the dependability of an optical inspection. This algorithm can be categorise into 2 categories which are image enhancement. Image enhancement requires intensifying the different features of images either for display or analysis targets. The enhancements techniques are edge enhancements, noise filtering, magnifying and sharpening an image. several(prenominal) filter operations which increase or reduce certain image features allow an easier or disruptiveer military rank. For examples, mean filter, normal value filter, wiener filter, and etc. With perpetual use, an image will becomes degraded and has many errors. Image restoration is the process used to restore the degraded image. This process is also used to correct images read from different sensors that show up murky or out of focus2.Next, image segmentation is performed to assemble pixels into salient image areas, for example, areas corresponding to specific surfaces, objects, or inseparable sections of objects. Segmentation could be used for object recognition, occlusion boundary estimation within motion or stereo systems, image density, image editing, or image database. The traditional image segmentation order can be divided into several techniques including gray threshold segmentation method, edge ex piece of groundion method, regional harvest-festival method and split consolidation method and etc. Threshold technique was applied in this project. It is a technique that deals with gray-scale images. For the moment of the influence of noise or illumination, it can be assumed that the majority of pixels belonging to the objects will have a comparatively low gray-level, whereas the basis pixels will have a relatively high gray-level. For example, Black is represented by a gray-level of 0, an d White by a gray-level of 255. Based on this observation, we can divide the pixels in the image into two dominant groups, according to their gray-level. These gray-levels may serve as detectors to distinguish mingled with downplay and objects in the image. On the other hand, if the image is one of smooth-edged objects, then it will not be a pure black and white image hence this would not be able to find two distinct gray-levels characterizing the tushground and the objects. This problem intensifies with the existence of noise3. In order to overcome the ill influence of noise and shading, there are two methods that can solve this problem which are Otsu known as Global Threshold and Neighbourhood known as Adaptive Threshold.For the image representation, all information is commonly represented in binary. This is real of images as well as numbers and text. However, an important differentiation needs to be made between how image data is shown and how it is stored. Displaying includes bitmap representation while storing as a file includes many image formats, such as jpeg and png4. There are few techniques for image representation which are Roundness ratio known as Circularity, Fourier Descriptors and etc.The intent of the image classification procedure is to sort all pixels in a digital image into one of several land cover categories, or themes. This categorized data may then be used to deliver thematic maps of the land cover present in an image. Ordinarily, multispectral data are used to carry out the classification and truly the spectral pattern present within the data for each pixel is used as the numerical basis for categorization. The purpose of image classification is to determine and describe, as a distinct gray level or colour, the characteristics occurring in an image in terms of the object or kind of land cover these characteristics practically express on the ground5. The technique for this algorithm is using template matching and KNN (K-Nearest Neighb our).Table Comparison of image sensors for image acquisition6, 7Types of Image Sensor facultyWeakness1Webcam allow face to face interaction low cost easy to use low closing not portable no optical zoom lenses no auto-focus2Digital Camera high resolution portable with batteries has optical zoom lenses has auto-focus high operating speed slight durability battery consumption faster high cost many daedal government agencyFrom the Table 1, it can be seen that both image sensors have its own strengths and weaknesses. This research will more focus on webcam due to this image sensor is using for this project. Webcam can be used to connect with computer to capture an image for image recognition. On the other hand, it is easy to use and cheaper canvas with digital camera which is more multifactorial and high cost. However, the megapixel of digital camera is higher than webcam...Table Comparison of several types of filter for image pre-processing2, 8Types of filterStrengthWeakness1Med ian filter more robust more smoothing provide good results memory consuming complex computing2 pissed filter intuitive simple to use smoothing not good in sharpen images susceptible to negative outliers3Wiener filter short computation time controls output error straightforward to design results often too blurred spatially ceaselessFrom the Table 2, it can be seen that all filters have its own strengths and weaknesses. This research will focus on two types of filter which are median value filter and mean filter. Median filter have been chosen for this project is because median filter is more robust on average than mean filter and so a not representative pixel in a neighbourhood will not influence the median value significantly. Since the median value needs to be the value of one of the pixels in the neighbourhood, the median filter does not establish new unrealistic pixel values when the filter straddles an edge. This is because of the median filter is better at preserving sharp edges than the mean filter. Also, median filter removes the noise level more than mean filter.Table Comparison of threshold techniques for image segmentation 9, 10Threshold TechniquesStrengthWeakness1Otsu fast ease of coding easy to use less sensitivity assumption of uniform illumination does not use any object structure or spatial coherence complex computation2Neighbourhood produce a good result less computation memory consumption time consumption sensitiveFrom the Table 3, it can be seen that both techniques have its own strengths and weaknesses. Otsus method, named after its inventor Nobuyuki Otsu, is a global threhold that consists of many binarization algorithms11. This method involves iterating through all probable threshold values and computing a measure of propagates for the pixel levels each side of the threshold, i.e. the pixels that can be falls in background or foreground. The purpose is to find the threshold value where the total of foreground and background propagate is at its minimum. Neighbourhood which known as adaptive threshold is used to separate desirable foreground image objects from the background based on the difference in pixel intensities of each region. The differences between both methods were Otsu uses a histogram to threshold the image and the Neighbourhood method uses a histogram to threshold the pixels in a small region/neighbourhood around the pixel. In addition, Otsu methods suffer less errors occur that are caused by the sensitivity of the local algorithms to image noise compare with the Neighbourhood methods.Table Comparison of the two techniques for image representation12Techniques of Image originalStrengthWeakness1Roundness Ratiovery fast algorithmscale, position and gyration invarianthigh accuracy if image shape can be preserved properly after segmentationsusceptible to errors if object shape is changed due to improper segmentation2Fourier Descriptor sensitive speed produce a good result low computation cost overcome the weak discrimination abilityscale, position and rotation invariant difficult to obtain high order invariant moments cannot deal with disjoint shapesFrom the Table 4, it can be seen that both techniques have its own strengths and weaknesses. Roundness is specify in term of a surface of novelty like cylinder, cone or sphere where all marks of the surface alternated by any weather sheet vertical to a common axis in case of cylinder and cone are equal in distance from axis. As the axis and bosom do not exist, measurements have to be made with consultation to surfaces of the figures of revolution only. The circularity of the outline is to measuring roundness12. Fourier Descriptors are used to describe the feature of embodiment of shape. It was founded in the early sixties last century by Cosgriff and Fritzsche. According to the Fourier analysis theory, Fourier coefficients can be often generated by Fourier transformation. Lower frequency coefficients have the general shape of th e signature, and higher frequency coefficients have the more information about the shape. As the harmonic amplitude and the phase angle can represent the Fourier Descriptor, and Fourier coefficients are usually normalized by dividing the first Fourier coefficient separately. Because there are some fast algorithms in computing the coefficient of Fourier series, many recognition systems in machine day-dream using these coefficients as shape features.Table Comparison of several techniques for image classification 13-15Techniques for Image ClassificationStrengthWeakness1 usher matching easy to implement high degree of flexibility high accuracy of spotting shape limitation computation speed susceptible to scaling and rotation2K-Nearest Neighbour easy to implement very effective improve accuracy improve run-time performance poor run-time performance if the training set is larger very sensitive outperformed by more exotic techniques3Neural Network minimize energy function high accurac y easy to use unstable curse of dimensionality space consumptionFrom the Table 1.5, it can be seen that all techniques have its own strengths and weaknesses. This research will focus on two techniques which are Template Matching and K-Nearest Neighbour. The standard template matching technique is known as simple mechanism, high accuracy of detection, and is used as a general model assessment and error estimation. Hence, it plays a very important role in image processing, and is commonly used in object detection and recognition. But the contradiction between rapidity and accuracy is exceptional. The main factors affecting rapidity are searching calculation, and operations of template matching. Appropriately decreasing positions and coincidence computing precision can increase the speed of template matching obviously. That is becoming a focus in this field. Many studies focus on amend the searching algorithm, decreasing the matching times by decreasing the matching points on the tem plate of images, which need to be detected so that rapidity is realized. The ordinary algorithms are pyramid algorithm, genetic algorithm and so on. Each matching operation is based on the template matching, thus it is necessary to pay precaution to improving the computation speed of template matching fundamentally14. The intuition underlying Nearest Neighbour Classification is quite straightforward, examples are classified based on the class of their nearest neighbours, it is often useful to take more than one neighbour into account so the technique is more commonly referred to as K-Nearest Neighbour (KNN) Classification where k-nearest neighbours are used in determining the class. Since the training examples are needed at run-time, i.e. they need to be in memory at run-time it is sometimes also called Memory-Based Classification. Because induction is delayed to run time, it is considered a Lazy Learning technique13..Analysis on Similar Products and Paper LiteraturesOral Image to Voice Converter by Takaaki HASEGAWA and Keiichi OHTANI16In this typography, the authors propose a new speech communication system to convert viva image into voice, Image input Microphone. This system synthesizes the voice from only the oral image. This system provides high security and is not affected by acoustic noise, because actual note is not always necessary to input. Moreover, since the voice is synthesized without recognition, this system is independent of languages.Simulations to convert oral image to voice about Japanese five vowels are carried out as basic investigation. A vocal tract area function is estimated from the oral image, and PARCOR synthesis filter is obtained from the vocal tract area function. The PARCOR synthesis filter is driven by a quiver train. The performance of this system is evaluated by hearing tests of the synthesized voice. As a result, audible voice has been synthesized and the mean recognition rate of Japanese five vowels has been 91%.This pa per describes a system to convert oral image into voice with considering humans lip-reading ability. In the proposed system, the voice is directly synthesized only from the oral image without recognition, and actual utterance is not always necessary to input. They use both the feature of a tongue and the feature of lips obtained from the oral image. Therefore this system is not affected by the acoustic noise, and simultaneously, it provides high security because of no utterance input capability.The system structure of this product is using a vocal tract area function which is equivalent to the transfer function of the vocal tract as a parameter. Indirect means synthesis via the vocal tract area function. The vocal tract area function is obtained from the PARCOR analysis of speech signals, and speech signals are synthesized by inverse processing of PARCOR analysis. Therefore if the vocal tract area function is estimated from oral image signals, they can convert the oral image to the corresponding voice. Human utters various voice by changing the vocal tract, and each articulator moves not independently but cooperatively in utterance, It is generally known that the information of articulation is obtained from lip-reading.Software ComparisonTable below shows that the two comparison of the software between MATLAB and C++.Table Comparison of software between MATLAB and C++17Types of SoftwareStrengthWeakness1MATLAB easy to learn fast numerical algorithms low-budget software fast development slow processing complex computation2C++ mature standard large community fast complex computation difficult to debug low level programmingFrom the Table 6, it can be seen that both types of software have its own strengths and weaknesses. MATLAB is software that has been widely used in image processing and computer vision community. Multiple image analysis function has been build into this software it is very useful image analysis tools for end user. C++ is a standard template library (STL), computer graphics, and image processing. Based on C++ template mechanism, the library accepts all C++ build-in types as the image data, although certain functions are only valid to subset of build-in types. MATLAB has been selected due to the project analysis characteristic. MATLAB version R2010b will be used to analyze the image quality and performance in this project.Project MethodologyThis project has been divided into hardware and software. For the hardware section is the webcam as the input and speaker as the output. For the software section is using MATLAB to recognize image to sound with several image processing techniques.Block DiagramWebcamImage Segmentation(Thresholding)Image encyclopedism(Acquire image)Image Preprocessing(Median filtering)MATLABImage Representation(Roundness Ratio)Sound Generation(WAV file)Image Classification (Template Matching using KNN)SpeakerFigure 1 Block diagram of Image to Voice converter.The block diagram shown in Figure 1 is the b asic concept on the system interface that needed to be carried out. Base on the block diagram, first prepared a webcam. Then, capture the image in front of the webcam. After that, perform a median filtering in image pre-processing using MATLAB. It will filtered unwanted signal or noise inside the image. Next is image segmentation, referring to the literature review, the most suitable method is using Otsus method in thresholding techniques to convert grayscale image into binary image to do segmentation. Secondly, find the largest object and do the image representation using roundness ratio to calculate the ratio of the largest object to determine which one is the nearest to the template ratio. Next stage is image classification, using template matching with KNN techniques to find the small part of the image to match with the template image.After matching done, it will automatically generate a sound from the computer with WAV file.Flow ChartStartAcquire image from webcamPerform median filteringColour Space ConversionThresholding using OtsuImage labellingFind the largest objectImage Representation-roundness ratioTemplate matching using KNNIs the image matched?NoYesGenerate SoundFigure 2 Flow chart of Image to Voice converter.Based on Figure 2, before the root system of image recognition, first, acquire an image in front of the webcam, and then the acquired image will go through image enhancement process to perform median filtering to filter some unwanted noise and sharpening the image. After that, the image will perform a colour space conversion which is convert the image colour space to another colour space, i.e. RGB, HSV, YCbCr and etc. The purpose of converting the colour space is to ensure that the converted image to be as same as the mathematical to the original image. Next, perform a threshold technique using Otsus method to reckon a measure of spread for the pixel levels each side of the threshold. The reason of doing this is to separate the objects fro m the background. Once the thresholding technique is done, perform a image labelling by taking the outside lines in the image and label them as occluding the background. After that, find the largest object and do the image representation using roundness ratio to calculate which object is similar to the template ratio. Then, perform a template matching techniques to find a match between the template and a portion of the image. The template that most closely matches the object is then found using the KNN method to do a matching system with the database image. If the data is matched, it will generate a sound automatically by using MATLAB to load the wav file from the computer or laptop. After that, it will repeat the procedure starting from the first step. If the data is unmatched, it wont generate a sound and it will go back to the first step and repeat the procedure again until the data is matched.Projects MethodMedian FilterMedian filters are nonlinear rank-order filters based on r egenerate each element of the source vector with the median value, taken over the fixed neighbourhood of the processed element. These filters are widely used in image and signal processing applications. The purpose of median filtering is to removes impulsive noise, while keeping the signal blurring to the minimum18.Otsu MethodOtsus method is a widely used method of segmentation, also known as the maximum infra-class variance method or the minimum inter-class variance method. This method involves iterating through all the possible threshold values and calculating a measure of spread for the pixel levels each side of the threshold, i.e. the pixels that either falls in foreground or background. The aim is to find the threshold value where the sum of foreground and background spreads is at its minimum11.Roundness Ratio/CircularityRoundness is specify as a condition of a surface of revolution like cylinder, cone or sphere where all points of the surface intersected by any plane perpendi cular to a common axis in case of cylinder and cone. Since the axis and centre do not exist physically, measurements have to make with reference to surfaces of the figures of revolution only. For measuring roundness, it is only the circularity of the contour which is determined12.Template MatchingThe classical template matching method is charactered as simple mechanism, high accuracy of detection, and is used as a general model evaluation and error estimation. Therefore, it plays a very important role in image processing, and is widely used in object detection and recognition. It is a technique for finding small parts of an image to match with a database image14.K-Nearest Neighbour (KNN)K-Nearest Neighbour (KNN) is a branch of simple classification and regression algorithms. It can be defined as a lazy method. It does not use the training data points to do any generalization. Although classification remains the primary application of KNN, it can use to do density estimation also. Si nce KNN is non parametric, it can do calculation for arbitrary assignation19.Project SpecificationThis project is divided into 3 main sections which are hardware, software and project estimate cost.HardwareThe hardware was using for this project is Logitech HD Webcam C310, below is the basic requirement of the webcamlogitech-hd-webcam-c310.pngFigure 3 Logitech HD Webcam C31020Windows Vista, Windows 7 (32-bit or 64-bit) or Windows 81 GHz512 MB RAM or more200MB hard drive spaceInternet connectionUSB 1.1 port (2.0 recommended)SoftwareThe software for this project is using MATLAB for image recognition and sound generation.Project Estimate CostThe estimate cost for this project is RM89 which was the Logitech HD Webcam C310, because this project was basically software based project and the software to be used is MATLAB from college engineering lab.Gantt Chart
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.