Show simple item record

dc.contributor.advisor: Righi, Rodrigo da Rosa
dc.contributor.author: Reis, Eduardo Souza dos
dc.date.accessioned: 2019-05-30T16:29:33Z
dc.date.accessioned: 2022-09-22T19:34:26Z
dc.date.available: 2019-05-30T16:29:33Z
dc.date.available: 2022-09-22T19:34:26Z
dc.date.issued: 2019-02-28
dc.identifier.uri: https://hdl.handle.net/20.500.12032/62473
dc.description.abstract: Accurately estimating the poses of multiple individuals in unconstrained scenes would improve many vision-based applications, such as person re-identification, human-computer interaction, behavioral analysis, and scene understanding. Thanks to advances in convolutional network research, body-part detectors are now accurate and can estimate spatial positions in still images in real time (30 FPS), for both single- and multi-person scenarios. Multiple individuals interacting in videos, however, impose additional challenges, such as person-to-person occlusion, truncated body parts, extra assignment steps, and more sources of double counting. In the last few years, many advances have partially solved some of these challenges. Nonetheless, handling long-term person-to-person occlusion is not possible in still images, due to the lack of discriminative features for detecting the occluded individual. Most reviewed works address this problem by collecting motion features that correlate body parts across multiple video frames, exploiting temporal dependency. Usually, these approaches either rely only on adjacent frames, to stay close to real time, or process the whole video beforehand, imposing global consistency in an offline manner. Since most of the cited applications require near-real-time processing of complex human motions, which cannot be captured in just a couple of frames, we propose the PastLens model. Our main objective is to provide a cost-efficient alternative to the trade-off between the number of correlated frames and the estimation time. The model imposes spatio-temporal constraints on the convolutional network itself, instead of relying on arbitrarily designed temporal features.
We stretch the receptive field of the mid layers to also include the previous frame, forcing deeper layers to detect features that correlate poses across the two frames, without losing the per-frame configuration. Moreover, we do not constrain the representation of such features, allowing it to be learned during training, alongside the pose estimation. By pose estimation and tracking, we refer to the localization and tracking over time of head, limbs, and torso, followed by the assembly of these body parts into poses that correctly encode the scene. We do not evaluate our approach on benchmarks for facial keypoints or gesture recognition. PoseTrack is the dataset of choice for both training and validation, since it provides a publicly available benchmark for estimating and tracking poses, in addition to a leaderboard that enables direct comparison of our results with their state-of-the-art counterparts. Experimental results indicate that our model reaches competitive accuracy on multi-person videos, while requiring fewer operations and being easier to attach to pretrained networks. Regarding scientific contributions, we provide a cost-efficient way to impose temporal consistency on the human pose estimation (HPE) pipeline through a receptive-field increase alone, letting the temporal features' representation be learned from data. Hence, our results may point towards novel ways of exploring temporal consistency for human pose estimation in videos.
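The abstract's core idea, widening mid-layer receptive fields so that deeper layers can correlate features spanning two concatenated frames, rests on the standard receptive-field recursion for stacked convolutions. A minimal illustrative sketch (the layer configurations below are hypothetical, not the actual PastLens architecture):

```python
def receptive_field(layers):
    """Receptive field of the last layer in a conv stack.

    layers: list of (kernel_size, stride) pairs, input-first.
    Uses the standard recursion r_out = r_in + (k - 1) * j_in,
    j_out = j_in * s, where j is the cumulative stride ("jump").
    """
    r, j = 1, 1  # a single input pixel; jump of 1 at the input
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Hypothetical backbone vs. the same stack with one mid layer widened (3 -> 7),
# illustrating how a single enlarged kernel grows the input region each deep
# feature can "see" (e.g. across two frames stacked along one axis).
baseline  = [(3, 1), (3, 2), (3, 1), (3, 2)]
stretched = [(3, 1), (3, 2), (7, 1), (3, 2)]

print(receptive_field(baseline))   # 13
print(receptive_field(stretched))  # 21
```

The same bookkeeping applies along the temporal axis: once the previous frame lies inside a mid layer's receptive field, every subsequent layer operates on cross-frame features for free, without any hand-designed temporal descriptor.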
dc.description.sponsorship: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.language: pt_BR
dc.publisher: Universidade do Vale do Rio dos Sinos
dc.rights: openAccess
dc.subject: Estimativa de poses humanas
dc.subject: Human pose estimation
dc.title: PastLens: granting temporal consistency to multi-person pose estimation through longer receptive fields
dc.type: Dissertação


Files in this item

File: Eduardo Souza dos Reis_.pdf (2.719 MB, application/pdf)


