Researchers in the United States are working on teaching computers to identify and interpret sequences of human physical actions as they take place. The algorithm they are using could also be applied in the medical sector to flag correct or incorrect movements by patients.
Artificial intelligence algorithms can now detect a face in a crowd or identify a person’s emotional state using image recognition. However, when people carry out a series of actions, computer programmes cannot always identify what is really going on. Now two researchers, one from MIT and the other from the University of California, Irvine, have developed a new algorithm to recognise what a person is doing as the action sequence is happening. The research borrows a type of algorithm used in natural language processing, the computer science discipline that investigates techniques for interpreting sentences written in natural language. This type of algorithm has already been used for voice recognition in applications such as Siri and Google Voice, and has now proved its ability to recognise and categorise actions from a video.
Understanding the ‘grammatical structure’ of actions
To understand how the algorithm decodes a range of actions, we first need to recognise that an action is made up of a number of sub-actions. These sub-actions can be thought of as grammatical elements which form a ‘sentence’, in this case the overall activity. Hamed Pirsiavash, a postdoctoral research associate at MIT, explains the analogy between a grammatical sentence and an action: “If you have a complex action – like making tea or making coffee – that comprises several sub-actions, we can basically stitch together these sub-actions, regarding each one as something like verb, noun, adjective, and adverb.” The researchers segmented and classified these grammatical elements as a basis for studying their dataset of YouTube videos. One advantage this new activity-recognition algorithm has over its predecessors is that it can make good guesses about partially completed actions, so it can handle streaming video. Partway through an action, it issues a probability that the action is of the type it is looking for. It may revise that probability as the video continues, and eliminate any hypotheses that do not fit the grammatical structure of the action, but it does not have to wait until the action has been completed to assess it.
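The idea of revising guesses partway through an action can be illustrated with a toy sketch. This is not the researchers' algorithm: the `GRAMMAR` table, the sub-action names and the `stream_probabilities` function are all hypothetical, each activity is reduced to one fixed sequence of sub-actions, and the probability is simply spread evenly over the hypotheses that still fit what has been seen so far.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical toy "grammar": each activity is one ordered list of sub-actions.
# A real system would learn a far richer grammar from video data.
GRAMMAR: Dict[str, List[str]] = {
    "make_tea": ["boil_water", "add_teabag", "pour_water", "steep"],
    "make_coffee": ["boil_water", "add_grounds", "pour_water", "press"],
    "weightlifting": ["grip_bar", "lift", "hold", "lower"],
}


def stream_probabilities(observed: Iterable[str]) -> Iterator[Dict[str, float]]:
    """After each observed sub-action, yield a probability per surviving activity.

    Assumption: surviving hypotheses are treated as equally likely.
    """
    alive = set(GRAMMAR)  # every activity is a candidate before any input
    for step, sub_action in enumerate(observed):
        # Eliminate hypotheses whose grammar does not allow this sub-action here.
        alive = {
            name
            for name in alive
            if step < len(GRAMMAR[name]) and GRAMMAR[name][step] == sub_action
        }
        if not alive:
            yield {}  # nothing in the toy grammar explains the sequence
            return
        share = 1.0 / len(alive)
        yield {name: share for name in sorted(alive)}


if __name__ == "__main__":
    for guess in stream_probabilities(["boil_water", "add_teabag"]):
        print(guess)
```

After the first sub-action, tea and coffee are equally plausible and weightlifting has been ruled out; after the second, only tea remains. The guess is available at every step, without waiting for the sequence to finish.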
Action recognition to assist patients
The researchers tested their algorithm on eight different types of athletic endeavour, such as weightlifting and bowling, using training videos culled from YouTube. They found that, according to metrics standard in the field of computer vision, their algorithm identified new instances of the same activities more accurately than its predecessors. Pirsiavash is particularly interested in potential medical applications of action detection. For example, the ‘grammar’ of properly executed physiotherapy exercises might be quite distinct from that of improperly performed movements. Action-detection algorithms could also help determine whether, for instance, elderly patients have remembered to take their medication, and send an alert if they have failed to do so.