Human Activity Recognition

Problem Description

Human activity detection has many applications in fields such as biometrics, surveillance, and the monitoring of elderly people in home environments. Different types of sensors can be used to address this task, and the use of multimodal sensors for human activity recognition is growing. Data from four temporally synchronised modalities is available; using either all modalities or any single one, the human activities must be classified.

Data Set Explanation

The dataset is the freely available UTD-MHAD dataset, which provides four temporally synchronized data modalities: RGB videos, depth videos, skeleton joint positions, and inertial signals, captured with a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions performed by 8 subjects (4 male, 4 female). Each subject repeated each action 4 times. After removing three corrupted sequences, the dataset contains 861 data sequences.
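As a quick sanity check, the sequence count stated above follows directly from the recording protocol (27 actions, 8 subjects, 4 trials each, minus the 3 corrupted recordings):

```python
# Sequence count implied by the UTD-MHAD recording protocol.
actions, subjects, trials, corrupted = 27, 8, 4, 3
total = actions * subjects * trials - corrupted
print(total)  # → 861
```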


Approach

The approach employed is the Hidden Markov Model (HMM), which works well with sequential data: it is essentially a Markov process whose states are hidden rather than directly observed. The outputs (the sensor observations of each action) are visible to the viewer, but the state sequence that produces them is hidden. The model was trained on the skeleton, inertial and depth data for the first three actions: swipe left, swipe right and wave.
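A common way to use HMMs for this kind of classification is to fit one model per action and assign a test sequence to the action whose model gives it the highest likelihood (computed with the forward algorithm). The sketch below illustrates that scheme with a discrete-observation HMM on toy, hand-chosen parameters; the symbol alphabet, state counts and probabilities are illustrative assumptions, not values from the actual trained models.

```python
import math

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed with the forward algorithm. States/symbols are int indices."""
    n_states = len(start_p)
    # Initialise the forward variable with the first observation.
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans_p[p][s] for p in range(n_states)) * emit_p[s][o]
            for s in range(n_states)
        ]
    return math.log(sum(alpha))

def classify(obs, models):
    """Pick the action whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Toy 2-state, 2-symbol models (parameters are illustrative, not trained):
# each entry is (initial probs, transition matrix, emission matrix).
models = {
    "swipe_left":  ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.9, 0.1], [0.8, 0.2]]),
    "swipe_right": ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.1, 0.9], [0.2, 0.8]]),
}
print(classify([0, 0, 1, 0], models))  # → swipe_left
```

In practice the continuous skeleton/inertial/depth features would first be quantised (or a Gaussian-emission HMM used), and each action's model would be trained on that action's sequences before scoring.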


Results

The results of the testing were favourable in general. The skeleton model achieved an accuracy of 67%, the inertial model 94%, and the depth model 67%. The inertial model was the most accurate because of the nature of its data: each action was recorded in only a few seconds with negligible data loss, so the model recognized the actions very well.

The depth model, by contrast, depends on visual detail and required extra feature-extraction effort based on edge detection (performed with the Sobel-Feldman algorithm). Initially, features were extracted with a convolutional neural network (CNN) before applying the HMM classification. Since depth sequences can be sensitive to occlusion and their textures are not as rich as those of colour images, that model turned out to be inaccurate. After applying the edge-detection algorithm and the corresponding feature extraction, the accuracy improved considerably to 67%. The results could possibly be improved further with a larger dataset acquired through additional sensors.
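The Sobel-Feldman step mentioned above convolves each depth frame with two 3x3 kernels to estimate horizontal and vertical intensity gradients, then combines them into an edge-magnitude map. A minimal NumPy sketch of that operation (the toy frame and its step edge are illustrative, not data from UTD-MHAD):

```python
import numpy as np

# Sobel-Feldman kernels for horizontal (KX) and vertical (KY) gradients.
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_magnitude(frame):
    """Gradient-magnitude edge map of a 2-D frame (valid region only)."""
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Accumulate the 3x3 convolution as nine shifted, weighted copies.
    for i in range(3):
        for j in range(3):
            patch = frame[i:i + h - 2, j:j + w - 2]
            gx += KX[i, j] * patch
            gy += KY[i, j] * patch
    return np.hypot(gx, gy)

# Toy depth frame with a vertical depth discontinuity.
frame = np.zeros((5, 5))
frame[:, 3:] = 10.0
edges = sobel_magnitude(frame)  # responds strongly along the step edge
```

The resulting edge maps are lower-dimensional and less texture-dependent than the raw depth frames, which is what made them a more robust input for the HMM stage.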