Abstract: Human action recognition is an active research topic in video surveillance and human-computer interaction. Over the past decades, human action recognition algorithms based on a single spatio-temporal feature have made great progress and achieved good results. However, because different spatio-temporal features have different characteristics and emphases, it is difficult for any one feature to describe human action completely. In addition, when spatio-temporal interest points are projected onto a codebook, a relatively large codebook is needed to achieve good performance, which hinders practical application. To address these problems, a human action recognition method based on multiple spatio-temporal features is proposed, which represents human action well and improves practicality. Large-scale experiments on two public and challenging action recognition datasets, KTH and the YouTube action dataset, show that the proposed multi-spatio-temporal features, used with a small codebook, are robust, discriminative, and stable, with performance comparable to state-of-the-art algorithms.