Project

Video-based emotion recognition on social media

Emotions play an important role in shaping how we think and behave in many situations. Our actions depend on the emotions of those around us, and paying attention to those emotions helps us make decisions. When we interact with other people, it is important to give them cues, such as facial expressions and changes in voice pitch, that help them understand how we feel. Social communication is essential to our daily lives and relationships, and being able to interpret and react to others' emotions helps us respond appropriately, for example to irritated people in the workplace or in public places. Video is an effective medium for communication at both personal and professional levels, and it is more effective than a single image because its contiguous frames convey emotions more clearly and support more accurate analysis.
 
In this project, a standalone application for emotion recognition is implemented, using a convolutional neural network (CNN) for image-based prediction and a support vector machine (SVM) for audio-based prediction.

The CNN is applied to emotion recognition because of its capacity for adaptive feature extraction. Features are extracted from the RGB values of video frames. During feature extraction, sampling transforms the image into a form suitable for computation, while pooling reduces the number of parameters in the network and makes feature extraction more robust by rendering it largely impervious to changes in scale and orientation. Max pooling takes the maximum value from each region of the image pixel array. A fully connected layer then performs the final classification, mapping the extracted features of a face to emotion classes; a sketch of this image path is given below.

The audio mining pipeline consists of extracting features from the audio track, preprocessing and labeling those features, and training the SVM. The features considered are Mel-frequency cepstral coefficients (MFCCs), which serve as the SVM's input; a sketch of this audio path follows the CNN example.

The CNN module outputs the emotions detected in the video's frames, while the audio extracted from the video is fed to the SVM module for emotion classification. The results from both modules are analyzed to determine whether the emotions selected by the user are present in the video, and the dominant emotions are used to classify the video into categories; a final sketch below illustrates this combination step.
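
The following is a minimal sketch of such an image module in Python with Keras. The input size (48x48 RGB frames), the seven emotion classes, and the specific layer sizes are illustrative assumptions, not details taken from the project.

```python
# Hypothetical sketch of the CNN image module: convolution for adaptive
# feature extraction from RGB values, max pooling for parameter reduction
# and robustness to scale/orientation, and a fully connected layer for
# the final emotion classification.
from tensorflow.keras import layers, models

NUM_EMOTIONS = 7  # e.g. angry, disgust, fear, happy, sad, surprise, neutral (assumed)

def build_emotion_cnn(input_shape=(48, 48, 3), num_classes=NUM_EMOTIONS):
    model = models.Sequential([
        # Convolution learns local features directly from raw RGB pixels.
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        # Max pooling keeps the maximum value from each 2x2 region of the
        # feature map, cutting parameters and adding some scale invariance.
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Fully connected layers map extracted features to emotion classes.
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```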
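
Below is a minimal sketch of the audio path, assuming librosa for MFCC extraction and scikit-learn for the SVM. The number of coefficients, the time-averaging step, and the kernel choice are assumptions made for illustration.

```python
# Hypothetical sketch of the audio module: MFCC features extracted with
# librosa are averaged over time into fixed-length vectors and used as
# input to a scikit-learn SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    """Load an audio file and return its mean MFCC vector."""
    signal, sample_rate = librosa.load(wav_path)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average each coefficient over time to obtain a fixed-length vector.
    return mfcc.mean(axis=1)

# Training data: one MFCC vector per labeled audio clip (paths and labels
# are placeholders, not from the project).
# X_train = np.vstack([mfcc_features(p) for p in training_paths])
# y_train = training_labels  # e.g. ["happy", "sad", ...]
svm = SVC(kernel="rbf")  # kernel choice is an assumption
# svm.fit(X_train, y_train)
# predicted_emotion = svm.predict([mfcc_features("clip.wav")])[0]
```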

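Finally, a small sketch of how the two modules' outputs might be combined. The majority-vote scheme and the equal weighting of the audio label against individual frame labels are assumptions, since the project's exact combination rule is not specified here.

```python
# Hypothetical combination step: per-frame CNN labels and the SVM's audio
# label are tallied; the dominant emotion determines the video's category,
# and the user-selected emotions are checked for presence.
from collections import Counter

def summarize_video(frame_emotions, audio_emotion, user_selected):
    """frame_emotions: per-frame CNN labels; audio_emotion: SVM label."""
    counts = Counter(frame_emotions)
    counts[audio_emotion] += 1  # weighting audio as one extra vote (assumed)
    dominant = counts.most_common(1)[0][0]
    present = sorted(set(user_selected) & set(counts))
    return dominant, present

dominant, present = summarize_video(
    ["happy", "happy", "neutral", "sad"], "happy", {"happy", "angry"})
print(dominant, present)  # -> happy ['happy']
```
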
Project (M.S., Computer Science)--California State University, Sacramento, 2018.
