VISAPP 2014 Abstracts


Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 109
Title:

Active Contour Segmentation with Affine Coordinate-based Parametrization

Authors:

Q. Xue, L. Igual, A. Berenguel, M. Guerrieri and L. Garrido

Abstract: In this paper, we present a new framework for image segmentation based on parametrized active contours. The contour and the points of the image space are parametrized using a reduced set of control points that have to form a closed polygon in two-dimensional problems and a closed surface in three-dimensional problems. The active contour evolves as the control points move. We use mean value coordinates as the parametrization tool for the interface, which makes it possible to parametrize any point of the space, inside or outside the closed polygon or surface. Region-based energies such as the one proposed by Chan and Vese can be easily implemented in both two- and three-dimensional segmentation problems. We show the usefulness of our approach with several experiments.
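
The mean value coordinate parametrization underlying this framework can be illustrated with a minimal sketch (ours, not the authors' code): it computes Floater's mean value coordinates of an interior point with respect to a closed polygon, assuming a convex polygon for simplicity.

```python
import math

def _angle_at(x, a, b):
    # Unsigned angle at x between the rays x->a and x->b.
    ax, ay = a[0] - x[0], a[1] - x[1]
    bx, by = b[0] - x[0], b[1] - x[1]
    c = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.acos(max(-1.0, min(1.0, c)))

def mean_value_coordinates(x, poly):
    # Mean value coordinates of an interior point x with respect to the
    # closed polygon `poly` (counter-clockwise list of (x, y) vertices).
    n = len(poly)
    w = []
    for i in range(n):
        a_prev = _angle_at(x, poly[(i - 1) % n], poly[i])
        a_next = _angle_at(x, poly[i], poly[(i + 1) % n])
        w.append((math.tan(a_prev / 2.0) + math.tan(a_next / 2.0)) / math.dist(x, poly[i]))
    s = sum(w)
    return [wi / s for wi in w]
```

For any interior point the coordinates are positive, sum to one, and reproduce the point as a weighted combination of the vertices; this linear-reproduction property is what lets a few control points drive both the contour and the space around it.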

Paper Nr: 334
Title:

Region-constrained Feature Matching with Hierarchical Agglomerative Clustering

Authors:

Jung Whan Jang, Mostafiz Mehebuba Hossain and Hyuk-Jae Lee

Abstract: Local feature matching is one of the most fundamental issues in computer vision. Hierarchical agglomerative clustering (HAC) has been effectively used to distinguish inliers from outliers. The drawback of HAC is its large computational complexity, which increases rapidly as the number of feature correspondences increases. To overcome this drawback, this paper proposes a region-constrained feature matching method in which an image is segmented into small regions and feature correspondences are clustered inside each region. Adjacent segmented regions are merged to form larger regions if the correspondences inside them are similar. The merge may increase the accuracy of clustering and, consequently, improves the accuracy of matching operations as well. The proposed region-constrained clustering dramatically reduces the execution time, by as much as a factor of 500 compared to the previous clustering, while achieving a similar matching accuracy.

Short Papers
Paper Nr: 72
Title:

Using Channel Representations in Regularization Terms - A Case Study on Image Diffusion

Authors:

Christian Heinemann, Freddie Åström, George Baravdish, Kai Krajsek, Michael Felsberg and Hanno Scharr

Abstract: In this work we propose a novel non-linear diffusion filtering approach for images based on their channel representation. To derive the diffusion update scheme we formulate a novel energy functional using a soft-histogram representation of image pixel neighborhoods obtained from the channel encoding. The resulting Euler-Lagrange equation yields a non-linear robust diffusion scheme with additional weighting terms stemming from the channel representation which steer the diffusion process. We apply this novel energy formulation to image reconstruction problems, showing good performance in the presence of mixtures of Gaussian and impulse-like noise, e.g. missing data. In denoising experiments on common scalar-valued images our approach performs competitively compared with other diffusion schemes as well as state-of-the-art denoising methods for the considered noise types.

Paper Nr: 93
Title:

Calibrating Focal Length for Paracatadioptric Camera from One Circle Image

Authors:

Huixian Duan, Lin Mei, Yanfeng Shang and Chuanping Hu

Abstract: Camera calibration from circles has great advantages, but for the paracatadioptric camera, the estimation of intrinsic parameters from circle images is still an open and challenging problem. Previous work proved that the paracatadioptric projection of a circle is a quartic curve. However, due to partial occlusion, only part of the quartic curve is visible on the image plane. Consequently, the circle image cannot be directly estimated using image points extracted from the visible part, and the camera parameters cannot be calibrated. To solve this problem, in this paper we study the properties of the paracatadioptric circle image and their application in calibrating the focal length for the case in which the aspect ratio is 1 and the skew is 0. First, we derive the necessary and sufficient conditions that must be satisfied by a paracatadioptric circle image. Next, based on these conditions, a new objective function is presented to correctly estimate the circle image. Then, we show that the focal length can be computed from the estimated paracatadioptric circle image and the principal point, which is estimated from the projected contour of the parabolic mirror. Experimental results on both simulated and real image data demonstrate the effectiveness of our method.

Paper Nr: 112
Title:

Speeding Up Object Detection - Fast Resizing in the Integral Image Domain

Authors:

Michael Gschwandtner, Andreas Uhl and Andreas Unterweger

Abstract: In this paper, we present an approach to resize integral images directly in the integral image domain. For the special case of resizing by a power of two, we propose a highly parallelizable variant of our approach, which is identical to bilinear resizing in the image domain in terms of results, but requires fewer operations per pixel. Furthermore, we modify a parallelized state-of-the-art object detection algorithm which makes use of integral images on multiple scales so that it uses our approach, and compare it to the unmodified implementation. We demonstrate that our modification allows for an average speedup of 6.38% on a dual-core processor with hyper-threading and 12.6% on a 64-core multi-processor system, without impacting the overall detection performance. Moreover, we show that these results can be extended to a whole class of object detection algorithms.
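
The integral-image arithmetic the paper builds on can be sketched as follows (an illustrative re-implementation, not the authors' parallel variant). `downscale2` shows the power-of-two case: each 2x2 block average is obtained with four lookups per output pixel.

```python
def integral_image(img):
    # Summed-area table with an extra zero row/column:
    # ii[y][x] = sum of img[0:y][0:x].
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # O(1) sum of img[y0:y1][x0:x1] from the integral image.
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def downscale2(ii, h, w):
    # Halve the resolution by averaging 2x2 blocks directly from the integral image.
    return [[box_sum(ii, 2 * y, 2 * x, 2 * y + 2, 2 * x + 2) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]
```

Because every box sum costs four lookups regardless of block size, the same scheme extends to any power-of-two factor.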

Paper Nr: 115
Title:

Oriented Half Gaussian Kernels and Anisotropic Diffusion

Authors:

Baptiste Magnier and Philippe Montesinos

Abstract: Nonlinear PDEs (partial differential equations) offer a convenient formal framework for image regularization and are at the origin of several efficient algorithms. In this paper, we present a new approach which is based (i) on a set of half Gaussian kernel filters, and (ii) on a nonlinear anisotropic PDE diffusion. On the one hand, half Gaussian kernels provide oriented filters whose flexibility enables edges to be detected with great accuracy. On the other hand, a nonlinear anisotropic diffusion scheme offers a means to smooth images while preserving fine structures or details, e.g. lines, corners and junctions. Based on the computation of the gradient magnitude and two diffusion directions, we construct a diffusion control function able to achieve precise image regularization. Quantitative experimental results compared with existing PDE approaches and a discussion of the method's parametrization are presented.

Paper Nr: 136
Title:

Restoration of Old Document Images using Different Color Spaces - Restoration of Old Document Images

Authors:

Ederson Marcos Sgarbi, Wellington Aparecido Della Mura, Nikolas Moya, Jacques Facon and Horacio A. Legal Ayala

Abstract: An obstacle in old document interpretation comes from the lack of image quality. Old documents frequently appear with digitization errors, uneven background, and bleed-through effects. A new approach based on morphological color operators to restore the color text is presented. The morphological tools are based on three color spaces: HSI, well known in morphological processes, and YCrCb and YIQ, rarely used in morphological procedures. Experimental results carried out on 100 old documents have proven that using YCrCb and YIQ is as effective as using HSI to recover ancient texts in uneven and foxed background images, without presenting problems in hue ordering.

Paper Nr: 160
Title:

Dynamic Multiscale Visualization of Flight Data

Authors:

Tijmen Klein, Matthew van der Zwan and Alexandru Telea

Abstract: We present a novel set of techniques for the visualization of very large data sets encoding flight information obtained from Air Traffic Control. The aims of our visualization are to provide a smooth way to explore the available information and find outlier spatio-temporal patterns by navigating between fine-scale detail views of the data and coarse overviews of large areas and long time periods. To achieve this, we extend and adapt several image-based visualization techniques, including animation, density maps, and bundled graphs. In contrast to previous methods, we are able to visualize significantly more information on a single screen, with limited clutter, and also create real-time animations of the data. For computational scalability, we implement our method using GPU-accelerated techniques. We demonstrate our results on several real-world data sets, ranging from hours over a country to one month over the entire world.

Paper Nr: 178
Title:

Hand Veins Recognition System

Authors:

João Ricardo Gonçalves Neves and Paulo Lobato Correia

Abstract: Accurate protection systems capable of replacing traditional passwords and ID cards are essential, both for convenience and for security reasons. Hand-vein pattern recognition is one of a vast group of biometric techniques under research that aim to become the reference recognition system. This paper presents a hand-vein biometric recognition system that uses the pattern of the hand's blood vessels to identify an individual. Biometric systems have immense application potential, as they present advantages over traditional identification systems: they work with patterns that are very hard to duplicate, since they differ from person to person, and that are impossible to lose or forget, since biometric characteristics are intrinsically attached to the human body. The developed approach was created with the intent of providing an effective protection system despite being designed and implemented with inexpensive hardware, in comparison with the biometric recognition systems presently offered at a commercial level. The results show that a reliable system can be produced at low cost and used standalone or in combination with other systems.

Paper Nr: 222
Title:

Tone Mapping for Single-shot HDR Imaging

Authors:

Johannes Herwig, Matthias Sobczyk and Josef Pauli

Abstract: The problem of tone mapping for HDR (high dynamic range) to LDR (low dynamic range) conversion is introduced by a unified framework considering all the usual processing steps. Then the specific problem of single-shot HDR is outlined, where special emphasis is placed on the effect of the greater noise floor of these images when compared to the usual exposure-bracketing approach to HDR. We herein tailor the popular tone mapping operators proposed by Reinhard for single-shot HDR. A region-based approach for preprocessing any HDR image in order to increase SNR and perceptual sharpness is introduced as an extension to our initial tone mapping framework. The results are compared with specially developed baseline tone mappers, and an extensive subjective evaluation is performed.
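
As a point of reference, the global Reinhard operator mentioned in the abstract can be sketched as follows (a simplified luminance-only version, not the authors' single-shot adaptation; `key` and `eps` are conventional defaults, not values from the paper).

```python
import math

def reinhard_tonemap(luminances, key=0.18, eps=1e-6):
    # Global Reinhard operator: scale each luminance by key / (log-average
    # luminance), then compress with L / (1 + L) into the displayable range [0, 1).
    log_avg = math.exp(sum(math.log(eps + l) for l in luminances) / len(luminances))
    scaled = [key * l / log_avg for l in luminances]
    return [l / (1.0 + l) for l in scaled]
```

The compression is monotone, so relative ordering of luminances is preserved while the dynamic range is squeezed into the LDR interval.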

Paper Nr: 278
Title:

Study of Interference Noise in Multi-Kinect Set-up

Authors:

Tanwi Mallick, Partha Pratim Das and Arun Kumar Majumdar

Abstract: KinectTM, a low-cost multimedia sensing device, has revolutionized human computer interaction (HCI) by making various applications of human activity tracking affordable and widely available. Often multiple Kinects are used in imaging applications to improve the field of view, depth of field and uni-directional vision of a single Kinect. Unfortunately, multiple Kinects lead to IR Interference Noise (IR Noise, in short) in the depth map. In this paper we analyse the estimators for interference noise, survey various imaging techniques to mitigate the interference at source, and characterize them in parallel to a well-known classification system in the telecom industry. Finally, we compare their performance based on the reported literature and outline our ongoing research to control interference noise by software shuttering.

Paper Nr: 287
Title:

Computational Models of Machine Vision - Goal, Role and Success

Authors:

Tayyaba Azim and Mahesan Niranjan

Abstract: This paper surveys the learning algorithms for visual feature representation and the computational modelling approaches proposed with the aim of developing better artificial object recognition systems. It turns out that most learning theories and schemas have been developed either in the spirit of understanding the biological facts of vision or of designing machines that provide better or competitive perception power compared to humans. In this study, we discuss and analyse the impact of notable statistical approaches that formally map cognitive neural activity at the macro level, as well as those that work independently, without any biological inspiration, towards the goal of developing better classifiers. With the ultimate objective of classification in hand, the dimensions of research in computer vision, and AI in general, have expanded so much that it has become important to understand whether our goals and diagnostics of visual input learning are correct or not. We first highlight the mainstream approaches that have been proposed to solve the classification task ever since the advent of the field, and then suggest some criteria of success that can guide the direction of future research.

Paper Nr: 343
Title:

Towards Relative Altitude Estimation in Topological Navigation Tasks using the Global Appearance of Visual Information

Authors:

Francisco Amorós, Luis Payá, Oscar Reinoso, David Valiente and Lorenzo Fernández

Abstract: In this work, we present a collection of different techniques oriented to altitude estimation in topological visual navigation tasks. All the methods use descriptors based on the global appearance of the scenes. The techniques are tested using our own experimental database, which is composed of a set of omnidirectional images captured under real lighting conditions at several locations and altitudes. We use different representations of the visual information, including the panoramic and orthographic views, and the projection of the omnidirectional image onto the unit sphere. The experimental results demonstrate the effectiveness of some of the techniques.

Paper Nr: 373
Title:

Expression, Pose, and Illumination Invariant Face Recognition using Lower Order Pseudo Zernike Moments

Authors:

Madeena Sultana, Marina Gavrilova and Svetlana Yanushkevich

Abstract: Face recognition is an extremely challenging task in the presence of expression, orientation, and lighting variation. This paper presents a novel expression and pose invariant feature descriptor by combining the Daubechies discrete wavelet transform and lower order pseudo Zernike moments. A novel normalization method is also proposed to obtain illumination invariance. The proposed method can recognize face images regardless of facial orientation, expression, and illumination variation using a small number of features. An extensive experimental investigation is conducted using a large variation of facial orientation, expression, and illumination to evaluate the performance of the proposed method. Experimental results confirm that the proposed approach obtains high recognition accuracy and computational efficiency under different pose, expression, and illumination conditions.

Paper Nr: 385
Title:

Fuzzy-rule-embedded Reduction Image Construction Method for Image Enlargement with High Magnification

Authors:

Hakaru Tamukoh, Noriaki Suetake, Hideaki Kawano, Ryosuke Kubota, Byungki Cha and Takashi Aso

Abstract: This paper proposes a fuzzy-rule-embedded reduction image construction method for image enlargement. A fuzzy rule is generated by considering the distribution of pixel values around a target pixel. The generated rule is embedded into the target pixel in a reduction image. The embedded fuzzy rule is used in a fuzzy inference to generate a highly magnified image from the reduction image. Experimental results, with scale factors of three and four, show that the proposed method realizes high-quality image enlargement in terms of both objective and subjective evaluations in comparison with conventional methods.

Paper Nr: 402
Title:

GPU based Parallel Image Processing Library for Embedded Systems

Authors:

Mustafa Cavus, Hakkı Doganer Sumerkan, Osman Seckin Simsek, Hasan Hassan, Abdullah Giray Yaglikci and Oguz Ergin

Abstract: Embedded image processing systems face many challenges, due to large computational requirements and other physical, power, and environmental constraints. However, recent contemporary mobile devices include a graphics processing unit (GPU) in order to offer better user interfaces in terms of graphics. Some of these embedded GPUs also support OpenCL, which allows the computation capacity of embedded GPUs to be used for general purpose computing. With this OpenCL support, the challenges of image processing in embedded systems become easier to handle. In this paper, we present a new OpenCL-based image processing library, named TRABZ-10, which is specifically designed to run on an embedded platform. Our results show that the functions of TRABZ-10 achieve a 7X speedup on average over the functions of OpenCV on the embedded platform.

Posters
Paper Nr: 26
Title:

Kernel-based Adaptive Image Sampling

Authors:

Jianxiong Liu, Christos Bouganis and Peter Y. K. Cheung

Abstract: This paper presents an adaptive progressive image acquisition algorithm based on the concept of kernel construction. The algorithm takes the conventional route of blind progressive sampling to sample and reconstruct the ground truth image in an iterative manner. During each iteration, an equivalent kernel is built for each unsampled pixel to capture the spatial structure of its local neighborhood. The kernel is normalized by the estimated sample strength in the local area and used as the projection of the influence of this unsampled pixel on the subsequent sampling procedure. The sampling priority of a candidate unsampled pixel is the sum of such projections from other unsampled pixels in the local area. Pixel locations with the highest priority are sampled in the next iteration. The algorithm does not require pre-processing or compressing the ground truth image and can therefore be used in various situations where such procedures are not possible. The experiments show that the proposed algorithm is able to capture the local structure of images to achieve a better reconstruction quality than that of existing methods.

Paper Nr: 44
Title:

Multi-spectral Flash Imaging under Low-light Condition using Optimization with Weight Map

Authors:

Bong-Seok Choi, Dae-Chul Kim, Wang-Jun Kyung and Yeong-Ho Ha

Abstract: Long-exposure shots and flash lights are generally used to acquire images in low-light environments. However, flash lights often induce color distortion and red-eye effects, and they can disturb the subject. On the other hand, long-exposure shots are prone to motion blur due to camera shake or subject motion. Recently, multi-spectral flash imaging has been introduced to overcome the limitations of traditional low-light photography. Multi-spectral flash imaging is performed by combining the invisible and visible spectrum information. However, common multi-spectral flash approaches induce color distortion due to the lower accuracy of the invisible-spectrum image. In this paper, we propose a multi-spectral flash imaging algorithm using optimization with a weight map in order to improve the color accuracy and brightness of the image. The UV/IR and visible-spectrum images are first captured. Then, to compensate the luminance under low-light conditions, tone reproduction is performed using an adaptive curve based on image features obtained with the Naka-Rushton formula. Next, to discriminate uniform regions from detail regions, a weight map is generated using the Canny operator. Finally, the optimization objective function takes into account the output likelihood with respect to the visible-light image and the sparsity of image gradients, as well as the spectral constraints for the IR-red and UV-blue channels. The performance of the proposed method has been subjectively evaluated using z-scores, and we also show that the output images have improved color accuracy and lower noise with respect to other methods.
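
The Naka-Rushton compression used for the tone-reproduction step has this basic form (an illustrative sketch; the exponent `n` and semi-saturation constant `sigma` below are placeholders, not the authors' fitted values).

```python
def naka_rushton(i, sigma=0.5, n=1.0):
    # Naka-Rushton response: R = I^n / (I^n + sigma^n).
    # Maps intensities in [0, inf) into [0, 1); sigma is the semi-saturation
    # level at which the response reaches one half.
    return i ** n / (i ** n + sigma ** n)
```

The curve boosts dark values relative to bright ones, which is why it is useful for compensating luminance under low-light conditions.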

Paper Nr: 55
Title:

Tetrachromatic Metamerism - A Discrete, Mathematical Characterization

Authors:

Alfredo Restrepo Palacios

Abstract: Two light beams that are seen as having the same colour but that have different spectra are said to be metameric. The colour of a light beam is based on the readings of several photodetectors with different spectral responses, and metamerism results when a set of photodetectors is unable to resolve two spectra. The spectra are then said to be metameric. We are interested in exploring the concept of metamerism in the tetrachromatic case. Applications are in computer vision, computational photography and satellite imagery, for example.
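
The underlying linear-algebra fact, that fewer photodetectors than spectral bins implies a null space and hence metamers, can be demonstrated with a toy example (the sensitivity matrix `S` below is invented purely for illustration, not taken from the paper).

```python
def responses(S, spectrum):
    # Response of each photodetector: inner product of its sensitivity
    # curve (one row of S) with the sampled spectrum.
    return [sum(s * e for s, e in zip(row, spectrum)) for row in S]

# Four "tetrachromatic" sensitivity curves sampled at five wavelength bins (toy values).
S = [
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
]
# With 4 sensors and 5 bins, S has a nontrivial null space; this vector is
# annihilated by every row of S.
null_vec = [-1.0, -1.0, -1.0, -1.0, 1.0]
spectrum1 = [0.5, 0.6, 0.7, 0.6, 0.5]
spectrum2 = [a + 0.1 * b for a, b in zip(spectrum1, null_vec)]  # a metamer of spectrum1
```

The two spectra differ, yet every sensor reads the same value for both: the sensor set cannot resolve them, which is exactly the discrete characterization of metamerism.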

Paper Nr: 137
Title:

A Combined Calibration of 2D and 3D Sensors - A Novel Calibration for Laser Triangulation Sensors based on Point Correspondences

Authors:

Alexander Walch and Christian Eitzinger

Abstract: In this paper we describe a 2D/3D vision sensor, which consists of a laser triangulation sensor and a matrix colour camera. The output of this sensor is the fusion of the 3D data delivered by the laser triangulation sensor and the colour information of the matrix camera, in the form of a coloured point cloud. For this purpose, a novel calibration method for the laser triangulation sensor was developed, which makes it possible to use one common calibration object for both cameras and provides their relative spatial position. A sensor system with a SICK Ranger E55 profile scanner and a DALSA Genie colour camera was set up to test the calibration in terms of the quality of the match between the colour information and the 3D point cloud.

Paper Nr: 159
Title:

Converting Underwater Imaging into Imaging in Air

Authors:

Tim Dolereit and Arjan Kuijper

Abstract: The application of imaging devices in underwater environments has become common practice. Protecting the camera's constituent electric parts against water leads to refractive effects emanating from the water-glass-air transition of light rays. These non-linear distortions cannot be modeled by the pinhole camera model. Our new approach focuses on flat-interface systems. By handling refractive effects properly, we are able to convert the problem to imaging conditions in air. We show that, based on the location of virtual object points in water, virtual parameters of a camera following the pinhole camera model can be computed per image ray. This enables us to image the same object as if it were situated in air. Our novel approach works for an arbitrary camera orientation with respect to the refractive interface. We show experimentally that our adopted physical methods can be used for the computation of 3D object points by a stereo camera system with much higher precision than with a naive in-situ calibration.
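
The refraction at the interface that such approaches must model obeys Snell's law; a minimal sketch (ours, collapsing the water-glass-air transition to a single interface for simplicity):

```python
import math

def refract_angle(theta_in, n_in=1.0, n_out=1.33):
    # Snell's law at a flat interface: n_in * sin(theta_in) = n_out * sin(theta_out).
    # Angles are measured from the interface normal, in radians.
    s = n_in * math.sin(theta_in) / n_out
    if abs(s) > 1.0:
        raise ValueError("total internal reflection")
    return math.asin(s)
```

Because the bending depends on the incidence angle, each image ray effectively sees its own virtual camera, which is why per-ray virtual parameters are needed rather than a single pinhole model.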

Paper Nr: 164
Title:

Exemplar-based Human Body Super-resolution for Surveillance Camera Systems

Authors:

Kento Nishibori, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide and Hiroshi Murase

Abstract: In this paper, we propose an exemplar-based super-resolution method applied to a human body in a surveillance video. Since persons are usually captured at low resolution by a video surveillance system, it is sometimes necessary to detect and identify persons not only from the face but also from the appearance of the body. Super-resolution of a human body image is difficult because the appearance of person images varies according to the color of clothing and the posture of the person. Thus, we focus on high-frequency components that could restore the lost high-frequency components of the low-resolution image regardless of the variation of the clothing. Therefore, the purpose of the work presented in this paper is to apply exemplar-based super-resolution using high-frequency components to a low-resolution human body image in order to generate a high-resolution human body image, so that both computer systems and humans can identify persons more accurately. Experiments confirmed the effectiveness of the proposed super-resolution method.
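
The high-frequency components the method relies on are, in essence, the residual after low-pass filtering; a 1D sketch of that decomposition (ours, not the authors' exemplar machinery):

```python
def box_blur_1d(signal, radius=1):
    # Simple moving-average low-pass filter (edge samples use a clipped window).
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def high_frequency(signal, radius=1):
    # High-frequency residual: the original signal minus its low-pass version.
    low = box_blur_1d(signal, radius)
    return [s - l for s, l in zip(signal, low)]
```

The residual is near zero in flat regions (e.g. uniform clothing color) and large near edges, which is why it carries detail largely independent of the clothing's color.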

Paper Nr: 179
Title:

Stabilization of Endoscopic Videos using Camera Path from Global Motion Vectors

Authors:

Navya Amin, Thomas Gross, Marvin C. Offiah, Susanne Rosenthal, Nail El-Sourani and Markus Borschbach

Abstract: Many algorithms for video stabilization have been proposed so far. However, few digital video stabilization procedures for endoscopic videos have been discussed. Endoscopic videos contain severe shakes and distortions as a result of internal factors, such as body movements or the secretion of body fluids, as well as external factors, such as manual handling of endoscopic devices, the introduction of surgical devices into the body, and luminance changes. Feature detection and tracking approaches that successfully stabilize non-endoscopic videos might not give similar results for endoscopic videos due to the presence of these distortions. Our research focuses on developing a stabilization algorithm for such videos. This paper focuses on a special motion estimation method that uses global motion vectors for tracking, applied to different endoscope types (while taking into account the endoscopic region of interest). It presents a robust video processing and stabilization technique that we have developed, together with the results of comparing it with state-of-the-art video stabilization tools. It also discusses the problems specific to endoscopic videos and the processing techniques that, unlike for real-world videos, were necessary for them.

Paper Nr: 219
Title:

A Block Size Optimization Algorithm for Parallel Image Processing

Authors:

J. Alvaro Fernandez and M. Dolores Moreno

Abstract: The aim of this work is to define a strategy for rectangular block partitioning that can be adapted to the number of available processing units in a parallel processing machine, regardless of the input data size. With this motivation, an algorithm for optimal vector block partitioning is introduced and tested in a typical parallel image processing application. The proposed algorithm provides a novel partition method that reduces data sharing between blocks and keeps block sizes as equal as possible for any input size.
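
The near-equal partitioning goal can be sketched in one dimension (an illustrative version of the idea, not the authors' rectangular-block algorithm): distribute the remainder so that no two blocks differ in size by more than one.

```python
def partition_blocks(n_items, n_units):
    # Split n_items into n_units contiguous blocks whose sizes differ by at most one.
    base, extra = divmod(n_items, n_units)
    sizes = [base + (1 if i < extra else 0) for i in range(n_units)]
    bounds, start = [], 0
    for s in sizes:
        bounds.append((start, start + s))  # half-open range [start, start + s)
        start += s
    return bounds
```

Balanced block sizes keep the per-unit workload even, which is the point of matching the partition to the number of processing units.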

Paper Nr: 243
Title:

Local Regression based Colorization Coding

Authors:

Paul Oh, Suk Ho Lee and Moon Gi Kang

Abstract: A new coding technique for color images based on a colorization method is proposed. In colorization-based image coding, the encoder selects the colorization coefficients according to a basis made from the luminance channel. Then, in the decoder, the chrominance channels are reconstructed by utilizing the luminance channel and the colorization coefficients sent from the encoder. The main issue in colorization-based coding is to extract the colorization coefficients so that the compression rate and the quality of the reconstructed color become good enough. In this paper, we use a local regression method to extract the correlated features between the luminance channel and the chrominance channels. The local regions are obtained by performing an image segmentation on the luminance channel in both the encoder and the decoder. Then, in the decoder, the chrominance values in each local region are reconstructed via a local regression method. The use of the correlated features helps to colorize the image with more detail. The experimental results show that the proposed algorithm performs better than JPEG and JPEG2000 in terms of the compression rate and the PSNR value.
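
The per-region regression step can be illustrated with a scalar sketch (ours, with a simple affine model c ≈ aY + b standing in for the paper's colorization coefficients): the encoder fits (a, b) from the region's luminance and chrominance samples, and the decoder reconstructs chrominance from luminance alone.

```python
def fit_affine(y_lum, c_chroma):
    # Least-squares fit of chrominance c ~ a * Y + b within one region
    # (closed-form simple linear regression).
    n = len(y_lum)
    my = sum(y_lum) / n
    mc = sum(c_chroma) / n
    var = sum((y - my) ** 2 for y in y_lum)
    cov = sum((y - my) * (c - mc) for y, c in zip(y_lum, c_chroma))
    a = cov / var if var > 0 else 0.0
    return a, mc - a * my

def reconstruct(y_lum, a, b):
    # Decoder side: chrominance predicted from luminance and the two coefficients.
    return [a * y + b for y in y_lum]
```

Only the two coefficients per region need to be transmitted, which is where the compression gain over sending full chrominance channels comes from.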

Paper Nr: 257
Title:

A Novel Fusion Algorithm for Visible and Infrared Image using Non-subsampled Contourlet Transform and Pulse-coupled Neural Network

Authors:

Chihiro Ikuta, Songjun Zhang, Yoko Uwate, Guoan Yang and Yoshifumi Nishio

Abstract: Image fusion between visible and infrared images is a significant task for computer vision applications such as multi-sensor systems. A visible image is clear and can be perceived directly by the naked eye, but it often suffers from noise, while an infrared image is less clear but has a high anti-noise property. In this paper, we propose a novel image fusion algorithm for visible and infrared images using a non-subsampled contourlet transform (NSCT) and a pulse-coupled neural network (PCNN). First, we decompose the two original images mentioned above into low- and high-frequency coefficients using the NSCT. The low-frequency coefficients of both images are duplicated at multiple scales and processed by a Laplacian filter and an average filter, respectively. The normalized coefficients are then fused using the PCNN. Finally, a fused image is reconstructed from the fused low- and high-frequency coefficients using the inverse NSCT. Experimental results show that the proposed image fusion algorithm surpasses conventional and state-of-the-art image fusion algorithms.

Paper Nr: 259
Title:

Multi-scale Regions from Edge Fragments - A Graph Theory Approach

Authors:

Wajahat Kazmi and Hans Jørgen Andersen

Abstract: In this article we introduce a novel method for detecting multi-scale salient regions around edges using a graph-based image compression algorithm. Images are recursively decomposed into triangles arranged into a binary tree using linear interpolation. The entropy of any local region of the image is inherent in the areas of the triangles and the tree depth. We introduce twin leaves as nodes whose siblings share the same characteristics. Triangles corresponding to the twin leaves are filtered out from the binary tree. Graph connectivity is exploited to get clusters of triangles, followed by ellipse fitting to estimate regions. Salient regions are thus formed as stable regions around edges. Tree hierarchy is then used to generate multi-scale regions. We evaluate our detector by performing image retrieval tests on our building database, which shows that, combined with Spin Images (Lazebnik et al., 2003), their performance is comparable to SIFT (Lowe, 2004). We also show that when they are used together with MSERs (Matas et al., 2002), the performance of MSERs is boosted.

Paper Nr: 306
Title:

3D Object Emphasis using Multiple Projectors

Authors:

Shohei Takada, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a method for emphasizing 3D shapes using patterned light projection from multiple projectors. The projected patterned lights are mixed on the surface of the objects. As a result, object regions that differ from preregistered 3D shapes are colored and emphasized visually. With this method, we do not need any computation for image processing, since the image processing is achieved by mixing the lights projected from the multiple projectors. Furthermore, we do not need to find image correspondences in order to obtain 3D information about the objects. We propose a method for generating projection patterns that visualize small differences in 3D shapes, such as shape defects. The efficiency of the proposed method is tested using multiple projectors.

Paper Nr: 353
Title:

SKen: A Statistical Test for Removing Outliers in Optical Flow - A 3D Reconstruction Case

Authors:

Samuel Macedo, Luis Vasconcelos, Vinicius Cesar, Saulo Pessoa and Judith Kelner

Abstract: 3D reconstruction can be employed in several areas, such as markerless augmented reality, manipulation of interactive virtual objects, and dealing with the occlusion of virtual objects by real ones. However, many improvements to the 3D reconstruction pipeline can still be made in order to increase its efficiency. In this context, this paper proposes a filter for optimizing a 3D reconstruction pipeline. We present the SKen technique, a statistical hypothesis test that classifies features by checking the smoothness of their trajectories. Although it has not been mathematically proven that inlier features follow smooth camera paths, this work shows some evidence of a relationship between smoothness and inliers. By removing features that did not present smooth paths, the quality of the 3D reconstruction was enhanced.

Paper Nr: 365
Title:

Synopsis of an Engineering Solution for a Painful Problem - Phantom Limb Pain

Authors:

A. Mousavi, J. Cole, T. Kalganova, R. Stone, J. Zhang, S. Pettifer, R. Walker, P. Nikopoulou-Smyrni, D. Henderson Slater, A. Aggoun, S. Von Rump and S. Naylor

Abstract: This paper is a synopsis of a recently proposed solution for treating patients who suffer from Phantom Limb Pain (PLP). The underpinning approach of this research and development project is an extension of “mirror box” therapy, which has had some promising results in pain reduction. An outline of an immersive, individually tailored environment giving the patient a virtually realised limb presence, as a means to pain reduction, is provided. The virtual 3D holographic environment is meant to produce immersive, engaging and creative environments and tasks to encourage and maintain patients’ interest, an important aspect in two of the more challenging populations under consideration (over-60s and war veterans). The system is hoped to reduce PLP by more than 3 points on an 11-point Visual Analog Scale (VAS), since a reduction of less than 3 points could be attributed to distraction alone.

Paper Nr: 384
Title:

Switching Median Filter with Signal Dependent Thresholds Designed by using Genetic Algorithm

Authors:

Ryosuke Kubota, Keisuke Onaga and Noriaki Suetake

Abstract: In this paper, we propose a new switching median filter with signal-dependent thresholds designed by a genetic algorithm (GA). A switching median filter detects noise-corrupted pixels based on a threshold and then restores only the detected pixels. The proposed filter deals with random-valued impulse noise, whose distribution is ideally assumed to be uniform. It uses two kinds of thresholds: switching thresholds to detect the noise, and selecting thresholds to choose the suitable switching threshold. The variance of the signals is used as the selecting criterion. All of the switching and selecting thresholds of the proposed filter are then automatically optimized by the GA, employing the distribution distance between the assumed and the detected noise as the fitness function. The validity and effectiveness of the proposed method are verified by experiments.
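The detect-then-restore step described above can be sketched as follows. This is a minimal illustration with a single hypothetical fixed threshold; the paper's thresholds are signal-dependent and optimized by a GA.

```python
# Illustrative sketch of a switching median filter. The threshold here is
# a fixed, hypothetical value; in the paper it is signal-dependent and
# tuned by a genetic algorithm.

def switching_median_filter(img, threshold=40):
    """Restore only pixels whose deviation from the local 3x3 median
    exceeds `threshold`; all other pixels are left untouched."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            med = window[4]  # median of the 9-pixel window
            if abs(img[y][x] - med) > threshold:  # detected as impulse noise
                out[y][x] = med                   # restore with the median
    return out
```

On a flat 100-valued patch with one 255-valued impulse, only the impulse pixel is replaced; a pixel deviating by less than the threshold is kept as-is.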

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 19
Title:

Polygonal Approximation of an Object Contour by Detecting Edge Dominant Corners using Iterative Corner Suppression

Authors:

Rabih Al Nachar, Elie Inaty, Patrick J. Bonnin and Yasser Alayli

Abstract: A new algorithm to detect the straight edge parts that form the contour of an object in an image is discussed in this paper. The algorithm is very robust and can detect true straight edges even when their pixel locations are not perfectly straight due to natural noise at the object borders. These straight edges are then used to report and classify the contour's corners according to their angle and the lengths of their adjacent segments. A new technique for polygonal approximation is also presented to find the best set among these corners to construct the polygon vertices that best describe the approximating contour. It starts by eliminating the corners one after another using an Iterative Corner Suppression (ICS) process, which in turn enables us to obtain the smallest possible approximation error. Experimental results demonstrate the efficiency of this technique in comparison with recently proposed algorithms.

Paper Nr: 52
Title:

The World vs. SCOTT - Synthesis of COncealment Two-level Texture

Authors:

Julien Gosseaume, Kidiyo Kpalma and Joseph Ronsin

Abstract: We propose an original method of Synthesis of COncealment Two-level Texture (SCOTT). SCOTT was designed according to the Human Visual System so that the concealment texture is faithful, in terms of forms and colors, to the visual environment it will be placed in. The simulation results show that the concealment texture is effective even though it is made of simple forms and only a few colors. Although SCOTT was initially designed for reducing the visual pollution caused by man-made equipment (antennas, electrical cabinets, distributor boxes, repeater shelters, etc.), it may be used in many other applications, such as inpainting and even image compression.

Paper Nr: 103
Title:

A Saliency-based Framework for 2D-3D Registration

Authors:

Mark Brown, Jean-Yves Guillemaut and David Windridge

Abstract: Here we propose a saliency-based filtering approach to the problem of registering an untextured 3D object to a single monocular image. The principle of saliency can be applied to a range of modalities and domains to find intrinsically descriptive entities from amongst detected entities, making it a rigorous approach to multi-modal registration. We build on the Kadir-Brady saliency framework due to its principled information-theoretic approach which enables us to naturally extend it to the 3D domain. The salient points from each domain are initially aligned using the SoftPosit algorithm. This is subsequently refined by aligning the silhouette with contours extracted from the image. Whereas other point based registration algorithms focus on corners or straight lines, our saliency-based approach is more general as it is more widely applicable e.g. to curved surfaces where a corner detector would fail. We compare our salient point detector to the Harris corner and SIFT keypoint detectors and show it generally achieves superior registration accuracy.

Paper Nr: 120
Title:

Generic and Real-time Detection of Specular Reflections in Images

Authors:

Alexandre Morgand and Mohamed Tamaazousti

Abstract: In this paper, we propose a generic and efficient method for real-time specular reflection detection in images. The method relies on a new thresholding technique applied in the Hue-Saturation-Value (HSV) color space. A detailed experimental study was conducted in this color space to highlight the properties of specular reflections. Current state-of-the-art methods have difficulties with lighting jumps, being either too specific or too computationally expensive for real-time applications. Our method addresses this problem using the following three steps: an adaptation of the image contrast to handle lighting jumps, an automatic thresholding to isolate specular reflections, and a post-processing step to further reduce the number of false detections. The method has been compared with the state of the art according to our two proposed experimental protocols, based on contours and centers of gravity, and offers fast and accurate results in real time without a priori knowledge of the image.
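The core HSV intuition (specular highlights have high value and low saturation) can be sketched as below. The two thresholds are illustrative placeholders; the paper derives its thresholds automatically, per image.

```python
import colorsys

# Minimal sketch of HSV-based specular-candidate detection. The fixed
# thresholds s_max and v_min are hypothetical; the paper's method
# determines its thresholds automatically.

def is_specular(r, g, b, s_max=0.2, v_min=0.85):
    """Flag an 8-bit RGB pixel as a specular-highlight candidate:
    high value (brightness) combined with low saturation."""
    _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return v >= v_min and s <= s_max

def specular_mask(image):
    """Apply the per-pixel test to a 2D list of (r, g, b) tuples."""
    return [[is_specular(*px) for px in row] for row in image]
```

A near-white pixel such as (250, 250, 245) is flagged, while a saturated red (255, 0, 0) or a dark pixel is not.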

Paper Nr: 121
Title:

On the Segmentation and Classification of Water in Videos

Authors:

Pascal Mettes, Robby T. Tan and Remco Veltkamp

Abstract: The automatic recognition of water has a wide range of applications, yet little attention has been paid to solving this specific problem. Current literature generally treats the problem as part of more general recognition tasks, such as material recognition and dynamic texture recognition, without distinctively analyzing and characterizing the visual properties of water. The algorithm presented here introduces a hybrid descriptor based on the joint spatial and temporal local behaviour of water surfaces in videos. The temporal behaviour is quantified based on the temporal brightness signals of local patches, while the spatial behaviour is characterized by Local Binary Pattern histograms. Based on the hybrid descriptor, the probability of a small region being water is calculated using a Decision Forest. Furthermore, binary Markov Random Fields are used to segment the image frames. Experimental results on a new and publicly available water database and a subset of the DynTex database show the effectiveness of the method for discriminating water from other dynamic and static surfaces and objects.

Paper Nr: 122
Title:

Segmentation of Optic Disc in Retina Images using Texture

Authors:

Suraya Mohammad, D. T. Morris and Neil Thacker

Abstract: The paper describes our work on the segmentation of the optic disc in retinal images. Our approach comprises two main steps: a pixel classification method to identify pixels that may belong to the optic disc boundary, and a circular template matching method to estimate a circular approximation of the optic disc boundary. The features used are based on texture, calculated using the intensity differences of local image patches; this was adapted from Binary Robust Independent Elementary Features (BRIEF). BRIEF is inherently invariant to image illumination and has a lower computational complexity than other existing texture measurement methods. Fuzzy C-Means (FCM) and Naive Bayes are used to cluster and classify the image pixels, respectively. The method was tested on a set of 196 images composed of 110 healthy retina images and 86 glaucomatous images. The average mean overlap ratio between the true optic disc region and the segmented region is 0.81 for both FCM and Naive Bayes. A comparison with a method based on the Hough Transform is also provided.

Paper Nr: 127
Title:

Comparison of Different Color Spaces for Image Segmentation using Graph-cut

Authors:

Xi Wang, Ronny Hänsch, Lizhuang Ma and Olaf Hellwich

Abstract: Graph-cut optimization has been successfully applied to many image segmentation tasks. Within this framework, color information has been extensively used as a perceptual property of objects to segment the foreground object from the background. There are different representations of color in digital images, each with special characteristics. Previous work on segmentation lacks a systematic study of which color space is better suited for image segmentation. This work applies the Graph Cut algorithm for image segmentation based on five different, widespread color spaces and evaluates their performance on public benchmark datasets. Most of the tested color spaces lead to similar results. Segmentations based on the L*a*b* color space are of slightly higher or similar quality compared with all the other methods. In contrast, RGB-based segmentations are mostly worse than segmentations based on any other tested color space.

Paper Nr: 171
Title:

Fast Segmentation for Texture-based Cartography of whole Slide Images

Authors:

Grégory Apou, Benoît Naegel, Germain Forestier, Friedrich Feuerhake and Cédric Wemmert

Abstract: In recent years, new optical microscopes have been developed, providing very high spatial resolution images called Whole Slide Images (WSI). The fast and accurate display of such images for visual analysis by pathologists and the conventional automated analysis remain challenging, mainly due to the image size (sometimes billions of pixels) and the need to analyze certain image features at high resolution. To propose a decision support tool to help the pathologist interpret the information contained by the WSI, we present a new approach to establish an automatic cartography of WSI in reasonable time. The method is based on an original segmentation algorithm and on a supervised multiclass classification using a textural characterization of the regions computed by the segmentation. Application to breast cancer WSI shows promising results in terms of speed and quality.

Paper Nr: 183
Title:

Watershed from Propagated Markers based on Morphological Hierarchical Segmentation and Graph Matching

Authors:

André Roberto Ortoncelli and Franklin César Flores

Abstract: Watershed from propagated markers is a generic method for interactive segmentation of objects in image sequences, combining the classical watershed-from-markers technique with motion estimation. The segmentation mask, given by the segmentation of the object in the previous frame, is the main parameter for computing a set of markers to segment the same object in the current frame. This paper introduces a new version of the watershed from propagated markers. In this proposal, the set of markers and its associated model graph are constructed as a function of the segmentation mask. The input graph is constructed from the hierarchical segmentation of the next frame. The graph matching between the model graph and the input graph provides a pre-segmentation mask that is used to compute the initial markers for the next frame. Experiments were conducted to illustrate the performance of the new version and to compare it with methods found in the literature and with previous versions of the watershed from propagated markers.

Paper Nr: 187
Title:

Image Compensation for Improving Extraction of Driver’s Facial Features

Authors:

Jung-Ming Wang, Han-Ping Chou, Sei-Wang Chen and Chiou-Shann Fuh

Abstract: Extracting a driver's facial features helps to identify the driver's vigilance level. Research on facial feature extraction has also been carried out for vehicle interface control. To acquire the facial features of drivers, research using various visual sensors has been reported. However, potential challenges for such work include rapid illumination variation resulting from ambient lights, abrupt lighting changes (e.g., entering/exiting tunnels and sunshine/shadow), and partial occlusion. In this paper, we propose an image compensation method for improving the extraction of a driver’s facial features. This method has the advantages of fast processing and high adaptability. Our experiments show that the extraction of a driver’s facial features can be improved significantly.

Paper Nr: 239
Title:

Edge-based Foreground Detection with Higher Order Derivative Local Binary Patterns for Low-resolution Video Processing

Authors:

Francis Deboeverie, Gianni Allebosch, Dirk Van Haerenborgh, Peter Veelaert and Wilfried Philips

Abstract: Foreground segmentation is an important task in many computer vision applications and a commonly used approach to separate foreground objects from the background. Extremely low-resolution foreground segmentation, e.g. on video with a resolution of 30x30 pixels, requires modifications of traditional high-resolution methods. In this paper, we adapt a texture-based foreground segmentation algorithm based on Local Binary Patterns (LBPs) into an edge-based method for low-resolution video processing. The edge information in the background model is introduced by a novel LBP strategy with higher order derivatives. To this end, we propose two new LBP operators. Analogous to the gradient operator and the Laplacian operator, the edge information is obtained from the magnitudes of First Order Derivative LBPs (FOD-LBPs) and the signs of Second Order Derivative LBPs (SOD-LBPs). After background subtraction, the foreground corresponds to edges on moving objects. The method is implemented and tested on low-resolution images produced by monochromatic smart sensors. In the presence of illumination changes, the edge-based method outperforms texture-based foreground segmentation at low resolutions. In this work, we demonstrate that edge information becomes more relevant than texture information when the image resolution scales down.
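For reference, the base descriptor that the derivative variants above build on is the plain 3x3 LBP, which can be sketched as follows (a minimal sketch of standard LBP, not of the paper's FOD/SOD operators):

```python
# Minimal 3x3 Local Binary Pattern sketch: the classic descriptor that
# the paper extends with first- and second-order derivative variants.

def lbp_code(img, y, x):
    """8-bit LBP: compare each of the 8 neighbours (clockwise from the
    top-left) against the centre pixel and pack the sign bits into a byte."""
    c = img[y][x]
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(neighbours):
        if img[y + dy][x + dx] >= c:  # neighbour at least as bright
            code |= 1 << bit
    return code
```

On a flat patch every neighbour ties with the centre and the code is 255; an isolated bright centre pixel yields code 0.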

Paper Nr: 252
Title:

Hierarchical Bayesian Modelling of Visual Attention

Authors:

Jinhua Xu

Abstract: The brain employs interacting bottom-up and top-down processes to speed up searching for and recognizing visual targets relevant to specific behavioral tasks. In this paper, we propose a Bayesian model of visual attention that optimally integrates top-down, goal-driven attention and bottom-up, stimulus-driven visual saliency. In this approach, we formulate a multi-scale hierarchical model of objects in natural contexts, where the computing nodes at the higher levels have lower resolutions and larger sizes than the nodes at the lower levels and provide local contexts for them. The conditional probability of a visual variable given its context is calculated in an efficient way. The model includes several existing models of visual attention as special cases. We tested this model as a predictor of human fixations in free-viewing and object-searching tasks in natural scenes and found that it performed very well.

Paper Nr: 260
Title:

Shape Similarity based Surface Registration

Authors:

Manuel Frei and Simon Winkelbach

Abstract: In the last 20 years, many approaches for the registration and localization of surfaces have been developed. Most of them generate solutions by minimizing point distances or maximizing contact areas between surface points. Other algorithms try to detect corresponding points on the two surfaces by searching for points with the same features and aligning them. However, aligning and localizing self-similar surfaces, or surfaces having large regions with approximately constant curvature, is still a complex problem. In this paper a new algorithm for the registration and matching of surfaces is introduced, which extends a contact-area-maximizing approach with surface-based dissimilarity features and thereby solves the problem of registering the problematic surfaces described above. Our evaluation shows the great potential of our approach regarding efficiency, accuracy and robustness for various applications such as scan alignment, pottery assembly and bone reduction.

Paper Nr: 290
Title:

M5AIE - A Method for Body Part Detection and Tracking using RGB-D Images

Authors:

Andre Brandao, Leandro A. F. Fernandes and Esteban Clua

Abstract: The automatic detection and tracking of human body parts in color images is highly sensitive to appearance features such as illumination, skin color and clothes. As a result, the use of depth images has been shown to be an attractive alternative to color images due to its invariance to lighting conditions. However, body part detection and tracking is still a challenging problem, mainly because the shape and depth of the imaged body can change depending on the perspective. We present a hybrid approach, called M5AIE, that uses both color and depth information to perform body part detection, tracking and pose classification. We have developed a modified Accumulative Geodesic Extrema (AGEX) approach for detecting body part candidates. We have also used the Affine-SIFT (ASIFT) algorithm for feature extraction, and we have adapted the conventional matching method to perform tracking and labeling of body parts in a sequence of images containing color and depth information. The results produced by our tracking system were used with the C4.5 Gain Ratio Decision Tree, naïve Bayes and KNN classification algorithms for the identification of the user's pose.

Paper Nr: 295
Title:

Real-time Emotion Recognition - Novel Method for Geometrical Facial Features Extraction

Authors:

Claudio Loconsole, Catarina Runa Miranda, Gustavo Augusto, Antonio Frisoli and Verónica Costa Orvalho

Abstract: Facial emotions provide an essential source of information commonly used in human communication. For humans, their recognition is automatic and exploits the real-time variations of facial features. However, the replication of this natural process using computer vision systems is still a challenge, since automation and real-time system requirements are often compromised in order to achieve accurate emotion detection. In this work, we propose and validate a novel methodology for facial feature extraction to automatically recognize facial emotions with a high degree of accuracy. This methodology uses the output of a real-time face tracker to define and extract two new types of features: eccentricity features and linear features. The features are then used to train a machine learning classifier. As a result, we obtain a processing pipeline that allows classification of the six basic Ekman emotions (plus Contemptuous and Neutral) in real time, not requiring any manual intervention or prior information about facial traits.
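One plausible form of an "eccentricity feature" can be sketched as the eccentricity of an ellipse spanning a facial region's landmarks (e.g. the mouth). This is our own simplified reading; the exact construction in the paper may differ.

```python
import math

# Hedged sketch: eccentricity of the axis-aligned ellipse spanning a set
# of 2D landmark points. This is an illustrative guess at what an
# "eccentricity feature" could look like, not the paper's exact definition.

def region_eccentricity(points):
    """Return 0.0 for a circular region, approaching 1.0 as the
    region's bounding extents flatten into a line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    half_w = (max(xs) - min(xs)) / 2.0
    half_h = (max(ys) - min(ys)) / 2.0
    a = max(half_w, half_h)  # semi-major axis
    b = min(half_w, half_h)  # semi-minor axis
    if a == 0:
        return 0.0
    return math.sqrt(1.0 - (b / a) ** 2)
```

A square landmark layout gives eccentricity 0; a 2:1 layout gives sqrt(0.75), so a widening smile would raise the mouth region's eccentricity.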

Paper Nr: 323
Title:

Key-point Detection with Multi-layer Center-surround Inhibition

Authors:

Foti Coleca, Sabrina Zîrnovean, Thomas Käster, Thomas Martinetz and Erhardt Barth

Abstract: We present a biologically inspired algorithm for key-point detection based on multi-layer and nonlinear center-surround inhibition. A Bag-of-Visual-Words framework is used to evaluate the performance of the detector on the Oxford-IIIT Pet Dataset for pet recognition. The results demonstrate an increased performance of our algorithm compared to the SIFT key-point detector. We further improve the recognition rate by separately training codebooks for the ON- and OFF-type key-points. The results show that our key-point detection algorithms outperform the SIFT detector by having a lower recognition-error rate over a whole range of key-point densities. Randomly selected key-points are also outperformed.

Paper Nr: 324
Title:

Delineation of Rock Fragments by Classification of Image Patches using Compressed Random Features

Authors:

Geoff Bull, Junbin Gao and Michael Antolovich

Abstract: Monitoring of rock fragmentation is a commercially important problem for the mining industry. Existing analysis methods either resort to physically sieving rock samples or to using image analysis software. The currently available software systems for this problem typically work with 2D images and often require a significant amount of time from skilled human operators, particularly to accurately delineate rock fragments. Recent research into 3D image processing promises to overcome many of the issues with analysis of 2D images of rock fragments. However, for many mines it is not feasible to replace their existing image collection systems, and there is still a need to improve on methods for analysing 2D images. This paper proposes a method for delineation of rock fragments using compressed Haar-like features extracted from small image patches, with classification by a support vector machine. The optimum size of the image patches and the number of compressed features have been determined empirically. Delineation results for images of rocks were superior to those obtained using the watershed algorithm with manually assigned markers. Using compressed features is demonstrated to improve the computational efficiency to the point where a machine learning solution is viable.

Paper Nr: 331
Title:

Monte-Carlo Image Retargeting

Authors:

Roberto Gallea, Edoardo Ardizzone and Roberto Pirrone

Abstract: In this paper an efficient method for image retargeting is proposed. It relies on a Monte Carlo model that makes use of image saliency. Each random sample is drawn from a properly defined deformation probability mass function and shrinks or enlarges the image by a fixed size. The shape of the function, which determines which regions of the image are affected by the deformations, depends on the image saliency: highly informative regions are less likely to be chosen, while low-saliency regions are more probable. Such a model does not require any optimization, since its solution is obtained by repeatedly drawing random samples, and it allows real-time application even for large images. Computation time can be further improved using a parallel implementation. The approach is fully automatic, though it can be improved by interactively providing cues such as geometric constraints and/or automatic or manual labeling of relevant objects. The results show that the presented method achieves results comparable or superior to reference methods, while improving efficiency.
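The saliency-weighted sampling idea can be sketched for the simple case of removing whole columns. This is our own simplification (uniform column removal rather than the paper's deformation model), with an arbitrary epsilon to avoid division by zero.

```python
import random

# Sketch of saliency-guided Monte Carlo sampling for shrinking an image:
# low-saliency columns are more likely to be picked for removal. The
# column-removal formulation and the epsilon are our own simplifications.

def pick_column(column_saliency, rng=random):
    """Draw a column index with probability inversely proportional
    to its saliency."""
    weights = [1.0 / (s + 1e-6) for s in column_saliency]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def shrink_width(saliency_rows, n_cols, rng=random):
    """Remove n_cols columns, updating the per-column saliency
    (summed over rows) after each removal; returns the picked indices."""
    cols = [sum(row[i] for row in saliency_rows)
            for i in range(len(saliency_rows[0]))]
    removed = []
    for _ in range(n_cols):
        i = pick_column(cols, rng)
        removed.append(i)
        del cols[i]
    return removed
```

With saliencies [100.0, 0.001], the second column is picked almost every time, illustrating how informative regions survive the shrinking.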

Short Papers
Paper Nr: 48
Title:

Statistical Models of Shape and Spatial Relation - Application to Hippocampus Segmentation

Authors:

Saïd Ettaïeb, Kamel Hamrouni and Su Ruan

Abstract: This paper presents a new method, based on both the Active Shape Model (ASM) and a spatial distance model, to segment brain structures. It combines two types of a priori knowledge: the shapes of the structures and the distances between them. This knowledge consists of shape and distance variability, which is estimated during a training step. The obtained models are then used to simultaneously guide the evolution of the initial structure shapes towards the target contours. The proposed models are applied to extract two hippocampal regions on coronal MRI of the brain. The obtained results are encouraging and show the performance of the proposed model.

Paper Nr: 86
Title:

Focus Evaluation Approach for Retinal Images

Authors:

Diana Veiga, Carla Pereira, Manuel Ferreira, Luís Gonçalves and João Monteiro

Abstract: Digital fundus photographs are often used to provide clinical diagnostic information about several pathologies such as diabetes, glaucoma, macular degeneration, and vascular and neurologic disorders. To allow a precise analysis, digital fundus image quality should be assessed to evaluate whether minimum requirements are met. Poor focus is one of the causes of low image quality. This paper describes a method that automatically classifies fundus images as focused or defocused. Various focus measures described in the literature were tested and included in a feature vector for the classification step. A neural network classifier was used. The HEI-MED and MESSIDOR image sets were utilized in the training and testing phases, respectively. All images were correctly classified by the proposed algorithm.

Paper Nr: 91
Title:

A New Algorithm for Objective Video Quality Assessment on Eye Tracking Data

Authors:

Maria Grazia Albanesi and Riccardo Amadeo

Abstract: In this paper, we present an innovative algorithm based on a voting process approach, to analyse the data provided by an eye tracker during tasks of user evaluation of video quality. The algorithm relies on the hypothesis that a lower quality video is more “challenging” for the Human Visual System (HVS) than a high quality one, and therefore visual impairments influence the user viewing strategy. The goal is to generate a map of saliency of the human gaze on video signals, in order to create a No Reference objective video quality assessment metric. We consider the impairment of video compression (H.264/AVC algorithm) to generate different versions of video quality. We propose a protocol that assigns different playlists to different user groups, in order to avoid any effect of memorization of the visual stimuli on strategy. We applied our algorithm to data generated on a heterogeneous set of video clips, and the final result is the computation of statistical measures which provide a rank of the videos according to the perceived quality. Experimental results show that there is a strong correlation between the metric we propose and the quality of impaired video, and this fact confirms the initial hypothesis.

Paper Nr: 96
Title:

Depth-Scale Method in 3D Registration of RGB-D Sensor Outputs

Authors:

Ismail Bozkurt and Egemen Özden

Abstract: Automatic registration of 3D scans with RGB data is studied in this paper. In contrast to the bulk of research in the field, which relies on 3D geometric consistency, local RGB image feature matches are used to solve for the unknown 3D rigid transformation. The key novelty of this work is the introduction of a simple new measure, which we call the “Depth-scale measure” and which logically represents the size of local image features in the 3D world, thanks to the availability of depth data from the sensor. Depending on the operating characteristics of the target application, we show through experimental results that this measure can be useful and efficient in eliminating outliers. System-level details are also given to help scientists who want to build a similar system.

Paper Nr: 118
Title:

A Comparative Evaluation of 3D Keypoint Detectors in a RGB-D Object Dataset

Authors:

Silvio Filipe and Luís A. Alexandre

Abstract: When processing 3D point cloud data, features must be extracted from a small set of points, usually called keypoints. This is done to avoid the computational complexity required to extract features from all points in a point cloud. There are many keypoint detectors, which suggests the need for a comparative evaluation. When keypoint detectors are applied to 3D objects, the aim is to detect a few salient structures which can be used, instead of the whole object, for applications like object registration, retrieval and data simplification. In this paper, we describe and evaluate the keypoint detectors available in a publicly available point cloud library on real objects and perform a comparative evaluation on 3D point clouds. We evaluate the invariance of the 3D keypoint detectors with respect to rotations, scale changes and translations. The evaluation criteria used are the absolute and the relative repeatability rate. Using these criteria, we evaluate the robustness of the detectors with respect to changes of point of view. In our experiments, the method that achieved the best repeatability rate was ISS3D.
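The repeatability criterion described above can be sketched as follows, assuming both keypoint sets have already been mapped into a common frame; the tolerance value is an illustrative placeholder.

```python
import math

# Sketch of absolute and relative keypoint repeatability: a reference
# keypoint counts as repeated if some keypoint detected after the
# transformation lies within `tol` of it (both sets assumed to be
# expressed in a common frame; `tol` is an illustrative value).

def repeatability(ref_pts, transformed_pts, tol=0.01):
    """Return (absolute, relative) repeatability rates for two lists
    of 3D points given as (x, y, z) tuples."""
    repeated = sum(
        1 for p in ref_pts
        if any(math.dist(p, q) <= tol for q in transformed_pts)
    )
    relative = repeated / len(ref_pts) if ref_pts else 0.0
    return repeated, relative
```

For two reference keypoints of which only one has a nearby match, the absolute rate is 1 and the relative rate is 0.5.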

Paper Nr: 125
Title:

2D Shape Matching based on B-spline Curves and Dynamic Programming

Authors:

Nacéra Laiche and Slimane Larabi

Abstract: In this paper, we propose an approach for two-dimensional shape representation and matching using B-spline modelling and Dynamic Programming (DP), which is robust with respect to affine transformations such as translation, rotation and scale change, and to some distortions. The boundary shape is first split into distinct parts based on curvature. Curvature points are critical attributes for shape description, allowing the representation of the concave and convex parts of an object; in our approach they are obtained by a polygonal approximation algorithm. After that, each part is approximated by a normalized B-spline curve using some global features, including the arc length, the centroid of the shape and moments. Finally, matching and retrieval of similar shapes are performed using a similarity measure defined on the normalized curves with Dynamic Programming. Dynamic programming not only recovers the best matching, but also identifies the most similar boundary parts. Experimental results on several benchmark databases validate the proposed approach.

Paper Nr: 191
Title:

About the Impact of Pre-processing Tools on Segmentation Methods - Applied for Tree Leaves Extraction

Authors:

Manuel Grand-Brochier, Antoine Vacavant, Robin Strand, Guillaume Cerutti and Laure Tougne

Abstract: In this paper, we present a comparative study highlighting the improvements provided by pre-processing tools, such as an input stroke or the use of a distance map, for segmentation approaches. In particular, we highlight new methods for computing the distance map based on the prediction of local color changes (published by G. Cerutti et al. in "ReVeS Participation - Tree Species Classification Using Random Forests and Botanical Features", CLEF 2012). We study different methods using thresholding, clustering, or active contours, tested on the problem of tree leaf extraction. Evaluation criteria such as the Dice index, SSIM and MAD allow us to analyze the performance obtained by each approach, and in particular that of the GAC method, which performs best in this context.

Paper Nr: 201
Title:

A Visibility Graph based Shape Decomposition Technique

Authors:

Foteini Fotopoulou and Emmanouil Z. Psarakis

Abstract: In this paper, a new shape decomposition method named Visibility Shape Decomposition (VSD) is presented. Inspired by an idealization of the visibility matrix as having a block diagonal form, we propose the definition of a neighborhood-based visibility graph, together with a two-step iterative algorithm for its transformation into a block diagonal form that can be used for a visually meaningful decomposition of the candidate shape. Although the proposed technique is applied to shapes of the MPEG7 database, it can be extended to 3D objects. The preliminary results we have obtained are promising.

Paper Nr: 241
Title:

Performance Evaluation of Feature Point Descriptors in the Infrared Domain

Authors:

Pablo Ricaurte, Carmen Chilán, Cristhian A. Aguilera-Carrasco, Boris X. Vintimilla and Angel D. Sappa

Abstract: This paper presents a comparative evaluation of classical feature point descriptors when they are used in the long-wave infrared spectral band. Robustness to changes in rotation, scaling, blur, and additive noise is evaluated using a state-of-the-art framework. Statistical results using an outdoor image data set are presented, together with a discussion of the differences with respect to the results obtained when images from the visible spectrum are considered.

Paper Nr: 263
Title:

1-D Temporal Segments Analysis for Traffic Video Surveillance

Authors:

M. Brulin, C. Maillet and H. Nicolas

Abstract: Traffic video surveillance is an important topic for security purposes and for improving traffic flow management. Video surveillance can be used for different purposes, such as counting vehicles or detecting their speed and behaviors. In this context, it is often important to be able to analyze the video in real time. The huge amount of data generated by the increasing number of cameras is an obstacle to reaching this goal. A solution consists in selecting only the regions of interest in the video, essentially the vehicles on the road areas. In this paper, we propose to extract significant segments of the regions of interest and to analyze them temporally to count vehicles and to characterize their behaviors. Experiments on real data show that precise vehicle counting and high recall and precision for vehicle behavior and traffic analysis are obtained.

Paper Nr: 282
Title:

Face Verification using LBP Feature and Clustering

Authors:

Chenqi Wang, Kevin Lin and Yi-Ping Hung

Abstract: In this paper, we present a mechanism to extract certain special faces, called LBP-Faces, which are designed to represent different kinds of faces around the world, and utilize them as the basis to verify other faces. In particular, we show how our idea can be integrated with Local Binary Patterns (LBP) and improve their performance. Most previous LBP-variant approaches, whether they improve the coding mechanism or optimize the neighbourhood sizes, first divide a face into patch-level regions (e.g. 7×7 patches), concatenate the histograms calculated in each patch to derive a rather high-dimensional vector, and then apply PCA for dimension reduction. In contrast, our work uses the original LBP histograms, trying to retain their major properties such as discriminability and invariance, but in much bigger component-level regions (we divide faces into 7 components). In each component, we cluster the LBP descriptors (in the form of histograms) to derive N cluster centroids, which we define as LBP-Faces. Then, for any input face, we calculate its similarities with all N LBP-Faces and use these similarities as the final features to verify the face. In effect, we project the face image into a new feature space, the LBP-Faces space. The intuition behind this is that when we describe an unknown face, we tend to use descriptions such as how similar the face's eyes or nose are to those of a known face. Results of our experiments on the Labeled Faces in the Wild (LFW) database show that our method outperforms LBP in face verification.
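For reference, a minimal 8-neighbour LBP histogram, the building block the LBP-Faces approach clusters, can be computed as below; this basic version omits uniform-pattern mapping and the component-level clustering described in the abstract:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram of a grayscale
    image (codes 0..255). Each interior pixel is compared with its 8
    neighbours; each comparison contributes one bit to the pixel's code."""
    c = img[1:-1, 1:-1]                     # interior (center) pixels
    H, W = img.shape
    # 8 neighbours in clockwise order, one bit each
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()                # normalized 256-bin histogram
```

In the paper's setting, such histograms would be computed per component region and then clustered; the verification features are the similarities of a new face's histograms to the cluster centroids.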

Paper Nr: 305
Title:

Local Texton Dissimilarity with Applications on Biomass Classification

Authors:

Radu Tudor Ionescu, Andreea-Lavinia Popescu, Dan Popescu and Marius Popescu

Abstract: Texture classification, texture synthesis, and similar tasks form an active topic in computer vision and pattern recognition. This paper presents a novel texture dissimilarity measure based on textons, namely the Local Texton Dissimilarity (LTD), inspired by (Dinu et al., 2012). Textons are represented as a set of features extracted from image patches. We demonstrate the application of the proposed dissimilarity measure to biomass type identification. A new data set of biomass texture images is provided by this work, available at http://biomass.herokuapp.com. Images are separated into three classes, each representing a type of biomass. Biomass type identification and quality assessment are of great importance when the biomass industry needs to produce another energy product, such as biofuel. Two more experiments are conducted on popular texture classification data sets, namely Brodatz and UIUCTex. The proposed method benefits from a faster computational time compared to (Dinu et al., 2012) and a better accuracy when used for texture classification. The performance level of the machine learning methods based on LTD is comparable to that of state-of-the-art methods.

Paper Nr: 310
Title:

A Fast Leaf Recognition Algorithm based on SVM Classifier and High Dimensional Feature Vector

Authors:

Cecilia Di Ruberto and Lorenzo Putzu

Abstract: Plants are fundamental for human beings, so it is very important to catalog and preserve all plant species. Identifying an unknown plant species is not a simple task. Automatic image processing techniques based on leaf recognition can help to find the best features useful for plant representation and classification. Many methods in the literature use only a small and complex set of features, often extracted from the binary images or the boundary of the leaf. In this work we propose a leaf recognition method which uses a new feature set that incorporates shape, color and texture features. A total of 138 features are extracted and used to train an SVM model. The method has been tested on the Flavia dataset, showing excellent performance both in terms of accuracy, which often reaches 100%, and in terms of speed, taking less than a second to process and extract features from an image.

Paper Nr: 315
Title:

Automatic Analysis of In-the-Wild Mobile Eye-tracking Experiments using Object, Face and Person Detection

Authors:

Stijn De Beugher, Geert Brône and Toon Goedemé

Abstract: In this paper we present a novel method for the automatic analysis of mobile eye-tracking data in natural environments. Mobile eye-trackers generate large amounts of data, making manual analysis very time-consuming. Available solutions, such as marker-based analysis, minimize the manual labour but require experimental control, making real-life experiments practically unfeasible. We present a novel method for processing this mobile eye-tracking data by applying object, face and person detection algorithms. Furthermore, we present a temporal smoothing technique to improve the detection rate, and we trained a new detection model for occluded person and face detection. This enables the analysis to be performed on the object level rather than the traditionally used coordinate level. We present speed and accuracy results of our novel detection scheme on challenging, large-scale real-life experiments.

Paper Nr: 328
Title:

Contour Localization based on Matching Dense HexHoG Descriptors

Authors:

Yuan Liu and Paul Siebert

Abstract: The ability to detect and localize an object of interest from a captured image containing a cluttered background is an essential function for an autonomous robot operating in an unconstrained environment. In this paper, we present a novel approach to refining the pose estimate of an object and directly labelling its contours by dense local feature matching. We perform this task using a new image descriptor we have developed called the HexHoG. Our key novel contribution is the formulation of HexHoG descriptors comprising hierarchical groupings of rotationally invariant (S)HoG fields, sampled on a hexagonal grid. These HexHoG groups are centred on detected edges and therefore sample the image relatively densely. This formulation allows arbitrary levels of rotation-invariant HexHoG grouped descriptors to be implemented efficiently by recursion. We present the results of an evaluation based on the ALOI image dataset which demonstrates that our proposed approach can significantly improve an initial pose estimation based on image matching using standard SIFT descriptors. In addition, this investigation presents promising contour labelling results based on processing 2892 images derived from the 1000 image ALOI dataset.

Paper Nr: 344
Title:

Effortless Scanning of 3D Object Models by Boundary Aligning and Stitching

Authors:

Susana Brandão, João P. Costeira and Manuela Veloso

Abstract: We contribute a novel algorithm for the digitization of complete 3D object models that requires little preparation effort from the user. Notably, the presented algorithm, Joint Alignment and Stitching of Non-Overlapping Meshes (JASNOM), completes 3D object models by aligning and stitching two 3D meshes along their boundaries and does not require any previous registration between them. JASNOM's only requirement is the lack of overlap between the meshes, which is simple to achieve for most man-made objects. JASNOM takes advantage of the fact that both meshes can only be connected along their boundaries to reframe the alignment problem as a search for the best assignment between boundary vertices. To make the problem tractable, JASNOM reduces the search space considerably by imposing strong constraints on valid assignments, which transform the original combinatorial problem into a discrete linear problem. By not requiring previous camera registration and by not depending on shape features, JASNOM's contributions range from quick modeling of 3D objects to hole filling in meshes.

Paper Nr: 387
Title:

Audiovisual Data Fusion for Successive Speakers Tracking

Authors:

Quentin Labourey, Olivier Aycard, Denis Pellerin and Michele Rombaut

Abstract: In this paper, a method for tracking human speakers using audio and video data is presented. It is applied to conversation tracking with a robot. Audiovisual data fusion is performed in a two-step process. Detection is performed independently on each modality: face detection based on skin color on video data, and sound source localization based on the time delay of arrival on audio data. The results of these detection processes are then fused, thanks to an adaptation of a Bayesian filter, to detect the speaker. The robot is able to detect the face of the talking person and to detect a new speaker in a conversation.

Paper Nr: 388
Title:

Motion Characterization of a Dynamic Scene

Authors:

Arun Balajee Vasudevan, Srikanth Muralidharan, Shiva Pratheek Chintapalli and Shanmuganathan Raman

Abstract: Given a video, there are many algorithms to separate static and dynamic objects present in the scene. The proposed work is focused on classifying the dynamic objects further as having either repetitive or non-repetitive motion. In this work, we propose a novel approach to achieve this challenging task by processing the optical flow fields corresponding to the video frames of a dynamic natural scene. We design an unsupervised learning algorithm which uses functions of the flow vectors to design the feature vector. The proposed algorithm is shown to be effective in classifying a scene into static, repetitive, and non-repetitive regions. The proposed approach finds significance in various vision and computational photography tasks such as video editing, video synopsis, and motion magnification.

Paper Nr: 399
Title:

Graph Cut and Image Segmentation using Mean Cut by Means of an Agglomerative Algorithm

Authors:

Elaine Ayumi Chiba, Marco Antonio Garcia Carvalho and André Luís Costa

Abstract: Graph partitioning, or graph cut, has been studied by several authors as a tool for image segmentation. It refers to partitioning a graph into several subgraphs such that each of them represents a meaningful object of interest in the image. In this work we propose a hierarchical agglomerative clustering algorithm driven by the cut and mean cut criteria. Some preliminary experiments were performed on the Berkeley BSDS500 benchmark with promising results.

Posters
Paper Nr: 8
Title:

Image Analysis through Shifted Orthogonal Polynomial Moments

Authors:

Rajarshi Biswas and Sambhunath Biswas

Abstract: Image analysis is significant from the standpoint of image description. A well-described image has merits in different research areas, e.g., image compression, machine learning and computer vision. This paper is an attempt to analyze graylevel images through shifted orthogonal polynomial moments computed on a discrete disc. This removes the difficulty of computing the moments on an analytic disc. Excellent rotational invariance as well as illumination invariance is observed.

Paper Nr: 12
Title:

Contour based Split and Merge Segmentation and Pre-classification of Zooplankton in Very Large Images

Authors:

Enrico Gutzeit, Christian Scheel, Tim Dolereit and Matthias Rust

Abstract: Zooplankton is an important component of the aquatic ecosystem and food chain. To understand the influence of zooplankton on the ecosystem, data collection is necessary, and automatic image-based recognition of zooplankton is of growing research interest. Several systems have been developed for zooplankton recognition on low-resolution images, but approaches for large images are rare. Images of this size easily exceed the main memory of standard computers. Our novel automatic segmentation approach is able to handle these large images. We developed a contour-based Split & Merge approach for segmentation and, to reduce the number of non-zooplankton segments, combine it with a pre-classification of the segments with respect to their shape. The latter includes a detection of quasi-round segments and a novel detection of thin segments. Experimental results on several huge images show that we are able to handle these images satisfactorily.

Paper Nr: 13
Title:

Experimental Comparison of Vasculature Segmentation Methods

Authors:

Yuchun Ding and Li Bai

Abstract: Vessel segmentation algorithms play a very important role in vascular disease diagnosis and prediction. Current vessel segmentation research uses mostly images of large vessels, which are relatively easy to extract, but segmenting microvasculature is more challenging and very important for analysing vascular disease such as Alzheimer’s Diseases. The aim of this paper is to report experimental results of several common vessel image segmentation methods. Retinal vessel image database DRIVE is used for 2D experiments and a micro-CT image is used for 3D experiments.

Paper Nr: 20
Title:

Image Registration based on Edge Dominant Corners

Authors:

Rabih Al Nachar, Elie Inaty, Patrick J. Bonnin and Yasser Alayli

Abstract: This paper presents a new algorithm for image registration that works on an image sequence using dominant corners located on the image's edges, under the assumption that the deformation between successive images can be modeled by an affine transformation. To guarantee this assumption, the time interval between acquired images should be small, like the time interval in a video sequence. In the edge image, dominant corners are extracted per linked contour and form a polygon that best approximates the current linked contour. The number of these dominant corners per contour is derived automatically given an approximation error. These dominant corners are shown to be very repeatable under affine transformation. Then, a primitive is constructed from four dominant corners. The invariant measure that characterizes each primitive is the ratio of the areas of two triangles constructed from two triplets selected from these four corners.
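The invariance of such an area ratio follows from the fact that an affine map with matrix A scales every signed area by det(A), so the determinant cancels in the ratio. A short numeric sketch (the choice of triplets here is illustrative, not necessarily the paper's):

```python
import numpy as np

def tri_area(p, q, r):
    """Signed area of the triangle pqr via the cross product."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def area_ratio(c):
    """Affine-invariant measure from four corners c[0..3]: the ratio of
    the areas of two triangles built from two triplets of the corners."""
    return tri_area(c[0], c[1], c[2]) / tri_area(c[0], c[1], c[3])

# The ratio is unchanged by any invertible affine map x -> A @ x + t,
# because both triangle areas are multiplied by the same factor det(A).
```

Matching primitives between two frames then reduces to comparing these scalar invariants, which is what makes the measure useful for registration under the paper's affine-deformation assumption.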

Paper Nr: 35
Title:

A Recursive Approach For Multiclass Support Vector Machine - Application to Automatic Classification of Endomicroscopic Videos

Authors:

Alexis Zubiolo, Grégoire Malandain, Barbara André and Éric Debreuve

Abstract: The two classical steps of image or video classification are: image signature extraction and assignment of a class based on this image signature. The class assignment rule can be learned from a training set composed of sample images manually classified by experts. This is known as supervised statistical learning. The well-known Support Vector Machine (SVM) learning method was designed for two classes. Among the proposed extensions to multiclass (three classes or more), the one-versus-one and one-versus-all approaches are the most popular ones. This work presents an alternative approach to extending the original SVM method to multiclass. A tree of SVMs is built using a recursive learning strategy, achieving a linear worst-case classification complexity in the number of classes. During learning, at each node of the tree, a bi-partition of the current set of classes is determined to optimally separate the current classification problem into two sub-problems. Rather than relying on an exhaustive search among all possible subsets of classes, the partition is obtained by building a graph representing the current problem and looking for a minimum cut of it. The proposed method is applied to the classification of endomicroscopic videos and compared to classical multiclass approaches.

Paper Nr: 161
Title:

Energy based Descriptors and their Application for Car Detection

Authors:

Radovan Fusek, Eduard Sojka, Karel Mozdřeň and Milan Šurkala

Abstract: In this paper, we propose a novel technique for object description. The proposed method is based on investigation of the energy distribution (in the image) that describes the properties of objects. The energy distribution is encoded into a vector of features, and the vector is then used as an input for an SVM classifier. Generally, the technique can be used for detecting arbitrary objects. In this paper, however, we demonstrate the robustness of the proposed descriptors on the problem of car detection. Compared with state-of-the-art descriptors (e.g. HOG, Haar-like features), the proposed approach achieved better results, especially from the viewpoint of the dimensionality of the feature vector; the proposed approach is able to successfully describe the objects of interest with a relatively small set of numbers, without the use of methods for the reduction of the feature vector.

Paper Nr: 173
Title:

Efficient Inference of Spatial Hierarchical Models

Authors:

Jan Mačák and Ondřej Drbohlav

Abstract: The long-term goal of artificial intelligence and computer vision is to be able to build models of the world automatically and to use them for the interpretation of new situations. It is natural that such models are efficiently organized in a hierarchical manner: a model is built from sub-models, these sub-models are in turn built from other models, and so on. These building blocks are usually shareable; different objects may consist of the same components. In this paper, we describe a hierarchical probabilistic model for the visual domain and propose a method for its efficient inference based on data partitioning and dynamic programming. We show the behaviour of the model, which in this case is built manually, and of the inference method on a controlled yet challenging dataset consisting of rotated, scaled and occluded letters. The experiments show that the proposed model is robust to all the above-mentioned aspects.

Paper Nr: 210
Title:

Saliency Detection in Images using Graph-based Rarity, Spatial Compactness and Background Prior

Authors:

Sudeshna Roy and Sukhendu Das

Abstract: Bottom-up saliency detection techniques extract salient regions in an image under free viewing. We approach the problem with three different low-level cues: graph-based rarity, spatial compactness and background prior. First, the image is broken into similarly colored patches, called superpixels. To measure rarity, we represent the image as a graph with superpixels as nodes and exponential color differences as the edge weights between the nodes. Eigenvectors of the Laplacian of the graph are then used, similar to spectral clustering (Ng et al., 2001). Each superpixel is associated with a descriptor formed from these eigenvectors, and the rarity or uniqueness of each superpixel is found using these descriptors. Spatial compactness is computed by combining the disparity in color and the spatial distance between superpixels. The concept of background prior is implemented by finding the weighted Mahalanobis distance of the superpixels from the statistically modeled mean background color. These cues in combination give the proposed saliency map. Experimental results demonstrate that our method outperforms many recent state-of-the-art methods both in terms of accuracy and speed.

Paper Nr: 223
Title:

Liquid Crystal Image Analysis by Image Descriptors

Authors:

Guilherme Enoc Egas de Carvalho, Franklin César Flores, Fernando Carlos Messias Freire and Anderson Reginaldo Sampaio

Abstract: Liquid crystals are substances with high technological impact; new substances are continually being discovered and the properties of these materials need to be examined. When viewed under a microscope using a polarized light source, different liquid crystal phases appear to have distinct textures and colors. The use of digital image processing and computer vision in the analysis of these materials is only beginning. The goal of this work is to propose methods, based on visual descriptors, which are able to identify phase transitions and classify phases in liquid crystals from a sequence of images.

Paper Nr: 232
Title:

Automatic Detection of MEO Satellite Streaks from Single Long Exposure Astronomic Images

Authors:

Anca Ciurte and Radu Danescu

Abstract: Nowadays, there is an increased interest in achieving accurate surveillance of the sky, since the number of objects in Earth's orbit (active satellites and debris) is continuously increasing. Satellites constantly need to be supervised in order to notice deviations from their trajectories and to update their coordinates. This paper presents a new method for satellite detection in 2D astronomic images acquired with a cheap, easy-to-set-up optical surveillance system. The proposed method uses the Radon Transform to identify satellite streaks in images, followed by a set of decision rules to decide whether a streak belongs to a satellite or not. The method was tested on multiple sequences of astronomic images and was found to have a very high detection rate, along with a very low false positive rate.

Paper Nr: 253
Title:

Hand Pose Recognition by using Masked Zernike Moments

Authors:

JungSoo Park, Hyo-Rim Choi, JunYoung Kim and TaeYong Kim

Abstract: In this paper we present a novel way of applying Zernike moments for image matching. Zernike moments are obtained by projecting the image information within a circumscribed circle onto the Zernike basis functions. The problem, however, is that the power of discrimination may be reduced because hand images include a lot of overlapped information due to their shape characteristics. On the other hand, for pose discrimination, shape information of hands excluding the overlapped area can increase the power of discrimination. In order to solve the overlapped-information problem, we present a way of applying subtraction masks. The internal mask R1 eliminates overlapped information in hand images, while the external mask R2 weighs outstanding features of hand images. Mask R3 combines the results from the image masked by R1 and the image masked by R2. The moments obtained by the R3 mask increase the accuracy of discrimination of hand poses, which is shown in experiments comparing against conventional methods.

Paper Nr: 267
Title:

Analysis of Widely-used Descriptors for Finger-vein Recognition

Authors:

Fariba Yousefi, Erdal Sivri, Ozgur Kaya, Selma Suloglu and Sinan Kalkan

Abstract: For finger-vein recognition, many successful methods, such as Line Tracking (LT), Maximum Curvature (MC) and Wide Line Detector (WL), have been proposed. Among these, LT has very slow matching and feature-extraction phases, and LT, MC and WL are translation and rotation dependent. Moreover, as we show in this paper, they are affected by noise. To overcome these drawbacks, we propose using popular feature descriptors widely used for several Computer Vision and Pattern Recognition (CVPR) problems in the literature. The CVPR descriptors we test include Histogram of Oriented Gradients (HOG), Fourier Descriptors (FD), Zernike Moments (ZM), Local Binary Patterns (LBP) and Global Binary Patterns (GBP), which have not been applied to the finger-vein recognition problem before. We compare these descriptors against LT, MC, and WL and evaluate their running times, performance and resilience against noise, rotation and translation. We report that the accuracies of the LT and WL methods are comparable to each other, with WL giving the best accuracy, while the LT method is the slowest. Our results indicate that WL can be used together with ZM and GBP in the case of rotation and noise, respectively.

Paper Nr: 291
Title:

Non-rigid Surface Registration using Cover Tree based Clustering and Nearest Neighbor Search

Authors:

Manal H. Alassaf, Yeny Yim and James K. Hahn

Abstract: We propose a novel non-rigid registration method that computes the correspondences of two deformable surfaces using the cover tree. The aim is to find the correct correspondences without landmark selection and to reduce the computational complexity. The source surface S is initially aligned to the target surface T to generate a cover tree from the densely distributed surface points. The cover tree is constructed by taking into account the positions and normal vectors of the points and used for hierarchical clustering and nearest neighbor search. The cover tree based clustering divides the two surfaces into several clusters based on the geometric features, and each cluster on the source surface is transformed to its corresponding cluster on the target. The nearest neighbor search from the cover tree reduces the search space for correspondence computation, and the source surface is deformed to the target by optimizing the point pairs. The correct correspondence of a given source point is determined by choosing one target point with the best correspondence measure from the k nearest neighbors. The proposed energy function with Jacobian penalty allows deforming the surface accurately and with less deformation folding.

Paper Nr: 301
Title:

Unsupervised Segmentation of Hyperspectral Images based on Dominant Edges

Authors:

Sangwook Lee, Sanghun Lee and Chulhee Lee

Abstract: In this paper, we propose a new unsupervised segmentation method for hyperspectral images based on dominant edge information. In the proposed algorithm, we first apply principal component analysis and select the dominant eigenimages. Then edge operators and histogram equalization are applied to the selected eigenimages, which produces edge images. By combining these edge images, we obtain a binary edge image. Morphological operations are then applied to this binary edge image to remove erroneous edges. Experimental results show that the proposed algorithm produces satisfactory results without any user input.

Paper Nr: 311
Title:

Statistical Features for Image Retrieval - A Quantitative Comparison

Authors:

Cecilia Di Ruberto and Giuseppe Fodde

Abstract: In this paper we present a comparison between various statistical descriptors and analyze their effectiveness in classifying textural images. The chosen statistical descriptors have been proposed by Tamura, Battiato and Haralick. In this work we also test a combination of the three descriptors for texture analysis. The databases used in our study are the well-known Brodatz album and DDSM (Heath et al., 1998). The computed features are classified using the Naive Bayes, RBF, KNN, Random Forest and Random Tree models. The results obtained from this study show that a high classification accuracy can be achieved if the descriptors are all used together.

Paper Nr: 312
Title:

Event Clustering of Lifelog Image Sequence using Emotional and Image Similarity Features

Authors:

Photchara Ratsamee, Yasushi Mae, Masaru Kojima, Mitsuhiro Horade, Kazuto Kamiyama and Tatsuo Arai

Abstract: Lifelog image clustering is the process of grouping images into events based on image similarities. Until now, groups of images with low variance could be easily clustered, but clustering images with high variance is still a problem. In this paper, we tackle the problem of high variance and present a methodology to accurately cluster images into their corresponding events. We introduce a new approach based on rank-order distance techniques using a combination of image similarity and an emotional feature measured from a biosensor. We demonstrate that emotional features along with rank-order distance based clustering can be used to cluster groups of images with low, medium, and high variance. Experimental evidence suggests that, compared to the average clustering precision rate (65.2%) of approaches that only consider visual image features, our technique achieves a higher precision rate (85.5%) when emotional features are integrated.
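For context, one common formulation of the rank-order distance that such clustering builds on (the exact variant used in the paper may differ, and the feature combination with the biosensor signal is not shown here) is:

```python
import numpy as np

def rank_order_distance(dist, a, b):
    """Rank-order distance between samples a and b given a full pairwise
    distance matrix 'dist'. The asymmetric term sums b's ranks of a's
    nearest neighbours up to (and including) b; the symmetric distance
    normalizes by the smaller of the two direct ranks."""
    order = np.argsort(dist, axis=1)   # order[i] = i's neighbours, nearest first
    rank = np.argsort(order, axis=1)   # rank[i, j] = position of j in i's list
    def asym(x, y):
        return sum(rank[y, order[x, k]] for k in range(rank[x, y] + 1))
    return (asym(a, b) + asym(b, a)) / min(rank[a, b], rank[b, a])
```

Pairs that share many close neighbours get a small rank-order distance even when their raw feature distance is moderate, which is why this measure tolerates the high within-event variance the abstract describes.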

Paper Nr: 319
Title:

High Definition Visual Attention based Video Summarization

Authors:

Yiming Qian and Matthew Kyan

Abstract: A High Definition visual attention based video summarization algorithm is proposed to extract feature frames and create a video summary. It uses a colour histogram shot detection algorithm to separate the video into shots, then applies a novel high definition visual attention algorithm to construct a saliency map for each frame. A multivariate mutual information algorithm is applied to select a feature frame to represent each shot. Finally, those feature frames are processed by a self-organizing map to remove redundant frames. The algorithm was assessed against manual key frame summaries for test datasets from www.open-video.org. Of the frames selected by the algorithm, 27.8% to 68.1% were in agreement with the manual frame summaries, depending on the category and length of the video.

Paper Nr: 322
Title:

Optimization of Image Interpolation based on Nearest Neighbour Algorithm

Authors:

Olivier Rukundo and B. T. Maharaj

Abstract: This paper proposes an optimization scheme for image interpolation algorithms, in particular the bilinear algorithm. The key novelty is a decision step in which it is checked whether the four neighbouring pixels have the same value; if so, the conventional bilinear interpolation is replaced by a nearest neighbour interpolation. The experimental results corroborated the efficiency of the proposed scheme over conventional bilinear interpolation and showed improvements in terms of speed and quality, especially in cases where images with less grain texture were interpolated.
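The decision step described above can be sketched in a few lines; this is an illustration of the idea, not the authors' exact implementation:

```python
import numpy as np

def interp_pixel(img, x, y):
    """Interpolate img at real-valued (x, y). If the four surrounding
    pixels are identical, return that value directly (nearest-neighbour
    shortcut); otherwise perform standard bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    p00, p01 = img[y0, x0], img[y0, x0 + 1]
    p10, p11 = img[y0 + 1, x0], img[y0 + 1, x0 + 1]
    if p00 == p01 == p10 == p11:
        return float(p00)                  # uniform patch: skip the arithmetic
    dx, dy = x - x0, y - y0
    top = p00 * (1 - dx) + p01 * dx        # interpolate along the top row
    bot = p10 * (1 - dx) + p11 * dx        # interpolate along the bottom row
    return float(top * (1 - dy) + bot * dy)
```

In flat image regions the shortcut avoids six multiplications per pixel and introduces no error, which matches the reported speed gains on images with little grain texture.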

Paper Nr: 326
Title:

Evaluation of Color Spaces for Robust Image Segmentation

Authors:

Alexander Jungmann, Jan Jatzkowski and Bernd Kleinjohann

Abstract: In this paper, we evaluate the robustness of our color-based segmentation approach in combination with different color spaces, namely RGB, L*a*b*, HSV, and log-chromaticity (LCCS). For this purpose, we describe our deterministic segmentation algorithm, including its gradual transformation of pixel-precise image data into a less error-prone and therefore more robust statistical representation in terms of moments. To investigate the robustness of a specific segmentation setting, we introduce our evaluation framework, which directly works on the statistical representation. It is based on two different types of robustness measures, namely relative and absolute robustness. While relative robustness measures the stability of segmentation results over time, absolute robustness measures the stability regarding varying illumination by comparing results with ground truth data. The significance of these robustness measures is shown by evaluating our segmentation approach with different color spaces. For the evaluation process, an artificial scene was chosen as representative of application scenarios based on artificial landmarks.

Paper Nr: 352
Title:

An Investigation on Local Wrinkle-based Extractor of Age Estimation

Authors:

Choon-Ching Ng, Moi Hoon Yap, Nicholas Costen and Baihua Li

Abstract: Research related to age estimation using face images has become increasingly important due to its potential use in various applications such as age group estimation in advertising and age estimation in access control. In contrast to other facial variations, age variation has several unique characteristics which make it a challenging task. As we age, the most pronounced facial changes are the appearance of wrinkles (skin creases), which is the focus of ageing research in cosmetic and nutrition studies. This paper investigates an algorithm for wrinkle detection and the use of wrinkle data as an age predictor. A novel method for detecting and classifying facial age groups based on a local wrinkle-based extractor (LOWEX) is introduced. First, each face image is divided into several convex regions representing wrinkle distribution areas. Second, these areas are analysed using a Canny filter and then concatenated into an enhanced feature vector. Finally, the face is classified into an age group using a supervised learning algorithm. The experimental results show that the accuracy of the proposed method is 80% when using the FG-NET dataset. This investigation shows that local wrinkle-based features have great potential in age estimation. We conclude that wrinkles can produce a prominent ageing descriptor, and we identify some future research challenges.
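The region-wise pipeline above (divide the face into regions, run an edge filter, concatenate per-region responses) can be sketched minimally as follows. For brevity, a simple horizontal-gradient threshold stands in for the Canny filter used by the paper; names are ours.

```python
def edge_density(region, thresh=50):
    """Fraction of pixel pairs whose horizontal gradient exceeds a
    threshold: a crude stand-in for Canny edge response in a region.
    region: 2D list of grayscale values."""
    h, w = len(region), len(region[0])
    edges = 0
    for y in range(h):
        for x in range(w - 1):
            if abs(region[y][x + 1] - region[y][x]) > thresh:
                edges += 1
    return edges / (h * (w - 1))

def wrinkle_feature(regions):
    """Concatenate per-region edge densities into one feature vector,
    mirroring the region-wise wrinkle analysis idea of LOWEX."""
    return [edge_density(r) for r in regions]
```

The resulting vector would then feed a supervised classifier for the age-group decision.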

Paper Nr: 361
Title:

Uncertainty Fusion based Object Recognition and Tracking in Maritime Scenes using Spatiotemporal Active Contours

Authors:

Ikhlef Bechar, Frederic Bouchara, Thibault Lelore, Vincente Guis and Michel Grimaldi

Abstract: This article addresses the problem of near real-time video analysis of a maritime scene using a (moving) airborne RGB video camera, with the goal of detecting and eventually recognizing a target maritime vessel. This is a very challenging problem, mainly due to the high level of uncertainty of a maritime scene, including a dynamic and noisy background, camera and target motions, and the broad variability of the background's versus the target's appearance. We propose an approach which combines several types of spatiotemporal uncertainty in a single probabilistic framework. This allows us to derive a likelihood ratio with respect to any possible spatiotemporal configuration of the 2D+T video volume. Using the MAP estimation criterion, such a problem can be recast as an energy minimization problem that we solve efficiently using a spatiotemporal active contour approach. We demonstrate the feasibility of the proposed approach using real maritime videos.

Paper Nr: 369
Title:

General Purpose Segmentation for Microorganisms in Microscopy Images

Authors:

S. N. Jensen, R. Irani, T. B. Moeslund and Christian Rankl

Abstract: In this paper, we propose an approach for achieving generalized segmentation of microorganisms in microscopy images. It employs a pixel-wise classification strategy based on local features. Multilayer perceptrons are utilized for classification of the local features and are trained for each specific segmentation problem using supervised learning. This approach was tested on five different segmentation problems in bright field, differential interference contrast, fluorescence and laser confocal scanning microscopy. In all instances good results were achieved, with the segmentation quality scoring a Dice coefficient of 0.831 or higher.
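The Dice coefficient used as the quality score above is a standard overlap measure between a predicted and a ground-truth segmentation mask; a minimal sketch over flattened binary masks:

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A intersect B| / (|A| + |B|) for binary masks given as
    flat lists of 0/1 labels. Returns 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0
```

A Dice score of 0.831, as reported, means the predicted foreground overlaps the ground truth substantially more than it misses it.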

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 24
Title:

Can 3D Shape of the Face Reveal your Age?

Authors:

Baiqiang Xia, Boulbaba Ben Amor, Mohamed Daoudi and Hassen Drira

Abstract: Age reflects the continuous accumulation of durable effects from the past since birth. Human faces deform irreversibly with time and thus contain their aging information. In addition to its richness in anatomical information, the 3D shape of a face has the advantage of being less dependent on pose and independent of illumination, although this has not yet been noticed in the literature. Thus, in this work we investigate the age estimation problem from the 3D shape of the face. With several descriptions grounded in Riemannian shape analysis of facial curves, we first extract features from the ideas of face Averageness, face Symmetry, and its shape variations with Spatial and Gradient descriptors. Then, using Random Forest-based Regression, experiments are carried out following the Leave-One-Person-Out (LOPO) protocol on the FRGCv2 dataset. The proposed approach performs with a Mean Absolute Error (MAE) of 3.29 years using a gender-general test protocol. Finally, with the gender-specific experiments, which first separate the 3D scans into Female and Male subsets, then train and test on each gender-specific subset in LOPO fashion, we improve the MAE to 3.15 years, which confirms the idea that the aging effect differs with gender.
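The evaluation protocol (LOPO splits scored by MAE) is generic and easy to make concrete; this is a sketch of the protocol only, not the authors' code, and the split/metric helpers are our own names.

```python
def mean_absolute_error(predicted, actual):
    """MAE over predicted vs. ground-truth ages, in years."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def lopo_splits(person_ids):
    """Leave-One-Person-Out: each split holds out every scan of one
    person for testing and trains on all remaining scans."""
    for person in sorted(set(person_ids)):
        test = [i for i, p in enumerate(person_ids) if p == person]
        train = [i for i, p in enumerate(person_ids) if p != person]
        yield train, test
```

Holding out all scans of a person at once prevents the regressor from memorising an individual's face rather than learning age-related shape cues.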

Paper Nr: 36
Title:

Exploiting Scene Cues for Dropped Object Detection

Authors:

Adolfo Lopez-Mendez, Florent Monay and Jean-Marc Odobez

Abstract: This paper presents a method for the automated detection of dropped objects in surveillance scenarios, which is a very important task for abandoned object detection. Our method works in single views and exploits prior information about the scene, such as geometry or the fact that a number of false alarms are caused by known objects, such as humans. The proposed approach builds dropped object candidates by analyzing blobs obtained with a multi-layer background subtraction approach. The created dropped object candidates are then characterized both by appearance and by temporal aspects such as the estimated drop time. Next, we incorporate prior knowledge about the possible sizes and positions of dropped objects through an efficient filtering approach. Finally, the output of a human detector is exploited in order to filter out static objects that are likely to be humans remaining still. Experimental results on the publicly available PETS2006 dataset and on several long sequences recorded in metro stations show the effectiveness of the proposed approach. Furthermore, our approach can operate in real time.

Paper Nr: 38
Title:

Egocentric Activity Recognition using Histograms of Oriented Pairwise Relations

Authors:

Ardhendu Behera, Matthew Chapman, Anthony G. Cohn and David C. Hogg

Abstract: This paper presents an approach for recognising activities using video from an egocentric (first-person view) setup. Our approach infers activity from the interactions of objects and hands. In contrast to previous approaches to activity recognition, we do not require intermediate steps such as object detection, pose estimation, etc. Recently, it has been shown that modelling the spatial distribution of visual words corresponding to local features further improves the performance of activity recognition using the bag-of-visual-words representation. Influenced and inspired by this philosophy, our method is based on global spatio-temporal relationships between visual words. We consider the interaction between visual words by encoding their spatial distances, orientations and alignments. These interactions are encoded using a histogram that we name the Histogram of Oriented Pairwise Relations (HOPR). The proposed approach is robust to occlusion and background variation and is evaluated on two challenging egocentric activity datasets consisting of manipulative tasks. We introduce a novel representation of activities based on interactions of local features and experimentally demonstrate its superior performance in comparison to standard activity representations such as bag-of-visual-words.
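The idea of histogramming pairwise orientations between features can be illustrated with a toy version. This sketch only encodes pair orientations, not the full HOPR descriptor (which also uses distances and alignments); the function is ours.

```python
import math

def pairwise_orientation_histogram(points, bins=8):
    """Histogram over the orientations of all ordered point pairs:
    a toy version of encoding pairwise spatial relations between
    visual-word locations given as (x, y) tuples."""
    hist = [0] * bins
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            ang = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * bins) % bins] += 1
    return hist
```

For n features the histogram accumulates n*(n-1) ordered pairs, so it captures global layout without any explicit object detection step.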

Paper Nr: 80
Title:

Revisiting Pose Estimation with Foreshortening Compensation and Color Information

Authors:

Achint Setia, Anoop R. Katti and Anurag Mittal

Abstract: This paper addresses the problem of upper body pose estimation. The task is to detect and estimate the 2D human configuration in static images for six parts: head, torso, and left-right upper and lower arms. The common approach to solve this has been the Pictorial Structure method (Felzenszwalb and Huttenlocher, 2005). We present this as a graphical model inference problem and use the loopy belief propagation algorithm for inference. When a human appears in the fronto-parallel plane, fixed-size part detectors are sufficient and give reliable detection. But when parts like the lower and upper arms move out of the plane, we observe foreshortening and the part detectors become erroneous. We propose an approach that compensates for foreshortening in the upper and lower arms, and effectively prunes the search state space of each part. Additionally, we introduce two extra pairwise constraints to exploit the color similarity information between parts during inference to get better localization of the upper and lower arms. Finally, we present experiments and results on two challenging datasets (Buffy and ETHZ Pascal), showing improvements in lower-arm accuracy and comparable results for the other parts.

Paper Nr: 88
Title:

Action Categorization based on Arm Pose Modeling

Authors:

Chongguo Li and Nelson H. C. Yung

Abstract: This paper proposes a novel method to categorize human actions based on arm pose modeling. Traditionally, human action categorization relies heavily on features extracted from video or images. In this research, we exploit the relationship between action categorization and arm pose modeling, which can be visualized in a graphical model. Given visual observations, both states can be estimated by maximum a posteriori (MAP) estimation, in that arm poses are first estimated under the hypothesis of an action category by dynamic programming, and then the action category hypothesis is validated by a soft-max model based on the estimated arm poses. The prior distribution for every action is estimated in advance by a semi-parametric estimator, and pixel-based dense features including LBP, SIFT, colour-SIFT, and textons are utilized to enhance the likelihood computation by the joint Adaboosting algorithm. The proposed method has been evaluated on videos of walking, waving and jogging from the HumanEva-I dataset. It is found to outperform the mixtures-of-parts method in arm pose modeling, with an action categorization success rate of 96.69%.
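The soft-max validation step mentioned above can be sketched generically: per-category compatibility scores (here arbitrary placeholders) are turned into a posterior, and the most probable category validates the hypothesis. This is a standard soft-max, not the authors' full model.

```python
import math

def softmax(scores):
    """Soft-max over per-category scores, shifted by the max for
    numerical stability; returns a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def categorize(scores, labels):
    """Pick the action label with the highest soft-max posterior."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]
```

In the paper's setting the scores would come from the estimated arm poses under each action hypothesis.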

Paper Nr: 126
Title:

Learning Semantic Attributes via a Common Latent Space

Authors:

Ziad Al-Halah, Tobias Gehrig and Rainer Stiefelhagen

Abstract: Semantic attributes represent knowledge that can be easily transferred to other domains where information and training samples are lacking. However, in the classical object recognition case, where training data is abundant, attribute-based recognition usually results in poor performance compared to methods that use image features directly. We introduce a generic framework that boosts the performance of semantic attributes considerably in traditional classification and knowledge transfer tasks, such as zero-shot learning. It incorporates the discriminative power of the visual features and the semantic meaning of the attributes by learning a common latent space that joins both spaces. We also specifically account for the presence of attribute correlations in the source dataset to generalize more efficiently across domains. Our evaluation of the proposed approach on standard public datasets shows that it is not only simple and computationally efficient but also performs remarkably better than the common direct attribute model.

Paper Nr: 139
Title:

Absolute Spatial Context-aware Visual Feature Descriptors for Outdoor Handheld Camera Localization - Overcoming Visual Repetitiveness in Urban Environments

Authors:

Daniel Kurz, Peter Georg Meier, Alexander Plopski and Gudrun Klinker

Abstract: We present a framework that enables 6DoF camera localization in outdoor environments by providing visual feature descriptors with an Absolute Spatial Context (ASPAC). These descriptors combine visual information from the image patch around a feature with spatial information, based on a model of the environment and the readings of sensors attached to the camera, such as GPS, accelerometers, and a digital compass. The result is a more distinct description of features in the camera image, which correspond to 3D points in the environment. This is particularly helpful in urban environments containing large amounts of repetitive visual features. Additionally, we describe the first comprehensive test database for outdoor handheld camera localization, comprising over 45,000 real camera images of an urban environment, captured under natural camera motions and different illumination settings. For all these images, the dataset not only contains readings of the sensors attached to the camera, but also ground truth information on the full 6DoF camera pose, and the geometry and texture of the environment. Based on this dataset, which we have made available to the public, we show that using our proposed framework provides both faster matching and better localization results compared to state-of-the-art methods.

Paper Nr: 146
Title:

Combining Dense Features with Interest Regions for Efficient Part-based Image Matching

Authors:

Priyadarshi Bhattacharya and Marina L. Gavrilova

Abstract: One of the most popular approaches for object recognition is the bag-of-words model, which represents an image as a histogram of the frequency of occurrence of visual words. But it has some disadvantages. Besides requiring computationally expensive geometric verification to compensate for the lack of spatial information in the representation, it is particularly unsuitable for sub-image retrieval problems, because any noise, background clutter or other objects in the vicinity influence the histogram representation. In our previous work, we addressed this issue by developing a novel part-based image matching framework that utilizes the spatial layout of dense features within interest regions to vastly improve recognition rates for landmarks. In this paper, we improve upon the previously published recognition results by more than 12% and achieve significant reductions in computation time. A region of interest (ROI) selection strategy is proposed along with a new voting mechanism for ROIs. Also, inverse document frequency weighting is introduced in our image matching framework for both ROIs and the dense features inside the ROIs. We provide experimental results for various vocabulary sizes on the benchmark Oxford 5K and INRIA Holidays datasets.
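Inverse document frequency weighting, as introduced in the framework above, down-weights visual words that appear in many images; a minimal sketch over raw visual-word histograms (function names are ours):

```python
import math

def idf_weights(histograms):
    """Per-word IDF: log(N / df), where df is the number of images in
    which the visual word occurs at least once. Words present in every
    image get weight 0 and stop influencing the match."""
    n = len(histograms)
    k = len(histograms[0])
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(k)]
    return [math.log(n / d) if d else 0.0 for d in df]

def weight(hist, idf):
    """Apply the IDF weights to one image's visual-word histogram."""
    return [c * w for c, w in zip(hist, idf)]
```

The same weighting can be applied at two granularities, per ROI and per dense feature inside an ROI, as the abstract describes.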

Paper Nr: 165
Title:

Active Learning in Social Context for Image Classification

Authors:

Elisavet Chatzilari, Spiros Nikolopoulos, Yiannis Kompatsiaris and Josef Kittler

Abstract: Motivated by the widespread adoption of social networks and the abundant availability of user-generated multimedia content, our purpose in this work is to investigate how the known principles of active learning for image classification fit in this newly developed context. The process of active learning can be fully automated in this social context by replacing the human oracle with the user tagged images obtained from social networks. However, the noisy nature of user-contributed tags adds further complexity to the problem of sample selection since, apart from their informativeness, our confidence about their actual content should be also maximized. The contribution of this work is on proposing a probabilistic approach for jointly maximizing the two aforementioned quantities with a view to automate the process of active learning. Experimental results show the superiority of the proposed method against various baselines and verify the assumption that significant performance improvement cannot be achieved unless we jointly consider the samples’ informativeness and the oracle’s confidence.
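The joint criterion (informativeness of a sample combined with confidence in its noisy tag) can be illustrated with a simple product ranking. This is a stand-in for the paper's probabilistic formulation, with hypothetical candidate tuples of our own design.

```python
def select_samples(candidates, n):
    """Rank unlabelled tagged images by the product of classifier
    uncertainty (informativeness) and tag confidence, then return the
    ids of the top-n picks.
    Each candidate: (image_id, uncertainty, tag_confidence)."""
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [c[0] for c in ranked[:n]]
```

Under this score, a highly informative sample with an unreliable tag is passed over in favour of a slightly less informative one whose tag is trustworthy, which is exactly the trade-off the abstract argues is necessary.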

Paper Nr: 169
Title:

Surface Area Analysis for People Number Estimation

Authors:

Hiroyuki Arai, Naoki Ito and Yukinobu Taniguchi

Abstract: An important property of the surface areas of objects as observed by a calibrated monocular camera is introduced, and improved techniques for applying this property to people number estimation are proposed. Standard surface area (SSA) is defined as the surface area of the reverse projection of an image pixel onto a plane at a specific height in the real world. SSA is calculated for each pixel according to the camera calibration parameters. When the target object is bound to a certain plane, for example the floor plane, the sum of SSA over the foreground pixels of one target object becomes constant. Therefore, simple foreground detection and SSA summation yield the number of target objects. This basic idea was proposed in a prior article, but there were two major limitations. One is that the original model could not be applied to the area directly below the camera. The other is that the silhouette of the target object was limited to a simple rectangle. In this paper we propose improved techniques that remove these limitations: slant silhouette analysis removes the first, and silhouette decomposition the second. The validity and the effectiveness of the techniques are confirmed by experiments.

Paper Nr: 174
Title:

Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications

Authors:

Markus Schoeler, Simon Christoph Stein, Jeremie Papon, Alexey Abramov and Florentin Woergoetter

Abstract: Today most recognition pipelines are trained at an off-line stage, providing systems with pre-segmented images and predefined objects, or at an on-line stage, which requires a human supervisor to tediously control the learning. Self-supervised on-line training of recognition pipelines without human intervention is a highly desirable goal, as it allows systems to learn unknown, environment-specific objects on-the-fly. We propose a fast and automatic system, which can extract and learn unknown objects with minimal human intervention by employing a two-level pipeline combining the advantages of RGB-D sensors for object extraction and high-resolution cameras for object recognition. Furthermore, we significantly improve recognition results with local features by implementing a novel keypoint orientation scheme, which leads to highly invariant but discriminative object signatures. Using only one image per object for training, our system is able to achieve a recognition rate of 79% for 18 objects, benchmarked on 42 scenes with random poses, scales and occlusion, while only taking 7 seconds for the training. Additionally, we evaluate our orientation scheme on the state-of-the-art 56-object SDU-dataset, boosting accuracy for one training view per object by +37% to 78% and peaking at a performance of 98% for 11 training views.

Paper Nr: 203
Title:

Joint Learning for Multi-class Object Detection

Authors:

Hamidreza Odabai Fard, Mohamed Chaouch, Quoc-cuong Pham, Antoine Vacavant and Thierry Chateau

Abstract: In practice, multiple objects in images are located by consecutively applying one detector for each class and taking the most confident score. In this work, we show the advantage of grouping similar object classes into a hierarchical structure. While this approach has found interest in image classification, it has not been analyzed for the object detection task. Each node in the hierarchy represents one decision line. All the decision lines are learned jointly using a novel problem formulation. Based on experiments using the PASCAL VOC 2007 dataset, we show that our approach improves detection performance compared to a baseline approach.

Paper Nr: 216
Title:

Subtasks of Unconstrained Face Recognition

Authors:

Joel Z. Leibo, Qianli Liao and Tomaso Poggio

Abstract: Unconstrained face recognition remains a challenging computer vision problem despite recent exceptionally high results (~95% accuracy) on the current gold standard evaluation dataset: Labeled Faces in the Wild (LFW). We offer a decomposition of the unconstrained problem into subtasks based on the idea that invariance to identity-preserving transformations is the crux of recognition. Each of the subtasks in the Subtasks of Unconstrained Face Recognition (SUFR) challenge consists of a same-different face-matching problem on a set of 400 individual synthetic faces rendered so as to isolate a specific transformation or set of transformations. We characterized the performance of 9 different models (8 previously published) on each of the subtasks. One notable finding was that the HMAX-C2 feature was not nearly as clutter-resistant as had been suggested by previous publications. Next we considered LFW and argued that it is too easy a task to continue to be regarded as a measure of progress on unconstrained face recognition. In particular, strong performance on LFW requires almost no invariance, yet it cannot be considered a fair approximation of the outcome of a detection-then-alignment pipeline, since it does not contain the kinds of variability that realistic alignment systems produce when working on non-frontal faces. We offer a new, more difficult, natural image dataset: SUFR-in-the-Wild (SUFR-W), which we created using a protocol similar to LFW's, but with a few differences designed to produce more need for transformation invariance. We present baseline results for eight different face recognition systems on the new dataset and argue that it is time to retire LFW and move on to more difficult evaluations for unconstrained face recognition.

Paper Nr: 230
Title:

Classifying and Visualizing Motion Capture Sequences using Deep Neural Networks

Authors:

Kyunghyun Cho and Xi Chen

Abstract: Gesture recognition using motion capture data and depth sensors has recently drawn increasing attention in visual recognition. Currently most systems classify only datasets with a couple of dozen different actions. Moreover, feature extraction from the data is often computationally complex. In this paper, we propose a novel system to recognize actions from skeleton data with simple, but effective, features using deep neural networks. Features are extracted for each frame based on the relative positions of joints (PO), temporal differences (TD), and normalized trajectories of motion (NT). Given these features, a hybrid multi-layer perceptron is trained, which simultaneously classifies and reconstructs input data. We use a deep autoencoder to visualize the learnt features. The experiments show that deep neural networks can capture more discriminative information than, for instance, principal component analysis can. We test our system on a public database with 65 classes and more than 2,000 motion sequences. We obtain an accuracy above 95%, which is, to our knowledge, the state-of-the-art result for such a large dataset.
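The first two per-frame features named above (relative joint positions, PO, and temporal differences, TD) can be sketched concretely; this is our illustrative reading of those two descriptors over toy 3D joint coordinates, not the authors' code.

```python
def frame_features(joints, prev_joints, root=0):
    """Per-frame feature vector: relative joint positions (PO) with
    respect to a root joint, concatenated with temporal differences
    (TD) from the previous frame. joints: list of (x, y, z) tuples."""
    rx, ry, rz = joints[root]
    po = [c - r for j in joints for c, r in zip(j, (rx, ry, rz))]
    td = [c - p for j, q in zip(joints, prev_joints) for c, p in zip(j, q)]
    return po + td
```

Both descriptors are cheap to compute per frame, which is the point the abstract makes against computationally complex feature extraction.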

Paper Nr: 236
Title:

Multi-feature Real Time Pedestrian Detection from Dense Stereo SORT-SGM Reconstructed Urban Traffic Scenarios

Authors:

Ion Giosan and Sergiu Nedevschi

Abstract: In this paper, a real-time system for pedestrian detection in traffic scenes is proposed. It takes advantage of a pair of stereo video cameras for acquiring the image frames and uses a sub-pixel-level optimized semi-global matching (SORT-SGM) based stereo reconstruction for computing the dense 3D point map with high accuracy. A multiple-paradigm detection module considering 2D, 3D and optical flow information is used for segmenting the candidate obstacles from the scene background. Novel features such as texture dissimilarity, human-body-specific features, distance-related measures and speed are introduced and combined in a feature vector with traditional features such as HoG score, template matching contour score and dimensions. A random forest (RF) classifier is trained and then applied in each frame for distinguishing pedestrians from other obstacles based on the feature vector. A k-NN algorithm on the classification results over the last frames is applied to improve the accuracy and stability of the tracked obstacles. Finally, two comparisons are made: first, between the classification results obtained using the new SORT-SGM and the older local matching approach for stereo reconstruction; and second, between the RF classification results for the different features and other classifiers’ results.

Paper Nr: 284
Title:

Image-based Object Classification of Defects in Steel using Data-driven Machine Learning Optimization

Authors:

Fabian Bürger, Christoph Buck, Josef Pauli and Wolfram Luther

Abstract: In this paper we study the optimization process of an object classification task for an image-based steel quality measurement system. The goal is to distinguish hollow from solid defects inside steel samples by using texture and shape features of reconstructed 3D objects. In order to optimize the classification results, we propose a holistic machine learning framework that should automatically answer the question "How well do state-of-the-art machine learning methods work for my classification problem?" The framework consists of three layers, namely feature subset selection, feature transform and classifier, which subsequently reduce the data dimensionality. A system configuration is defined by feature subset, feature transform function, classifier concept and corresponding parameters. In order to find the configuration with the highest classifier accuracy, the user only needs to provide a set of feature vectors and ground truth labels. The framework performs a totally data-driven optimization using partly heuristic grid search. We incorporate several popular machine learning concepts, such as Principal Component Analysis (PCA), Support Vector Machines (SVM) with different kernels, random trees and neural networks. We show that with our framework even non-experts can automatically generate a ready-to-use classifier system with a significantly higher accuracy compared to a manually arranged system.

Paper Nr: 288
Title:

PhotoCluster - A Multi-clustering Technique for Near-duplicate Detection in Personal Photo Collections

Authors:

Vassilios Vonikakis, Amornched Jinda-Apiraksa and Stefan Winkler

Abstract: This paper presents PhotoCluster, a new technique for identifying non-identical near-duplicate images in personal photo collections. Contrary to existing methods, PhotoCluster estimates the probability that a pair of images may be considered near-duplicate. Its main thrust is a multiple clustering step that produces a non-binary near-duplicate probability for each image pair, which correlates with the average observer opinion. First, PhotoCluster partitions the photo library into groups of semantically similar photos, using global features. Then, the multiple clustering step is applied within the images of these groups, using a combination of global and local features. Computationally expensive comparisons between local features take place only on a limited part of the library, resulting in a low overall computational cost. Evaluation with two publicly available datasets shows that PhotoCluster outperforms existing methods, especially in identifying ambiguous near-duplicate cases.

Paper Nr: 294
Title:

Who is the Hero? - Semi-supervised Person Re-identification in Videos

Authors:

Umar Iqbal, Igor D. D. Curcio and Moncef Gabbouj

Abstract: Given a crowd-sourced set of videos of a crowded public event, this paper addresses the problem of detecting and re-identifying all appearances of every individual in the scene. The persons are ranked according to the frequency of their appearance, and the rank of a person is considered as the measure of his/her importance. Grouping appearances of every person from such videos is a very challenging task. This is due to the unavailability of prior information or training data, large changes in illumination, huge variations in camera viewpoints, severe occlusions and videos from different photographers. These problems are made tractable by exploiting a variety of visual and contextual cues, i.e., appearance, sensor data and co-occurrence of people. A unified framework is proposed for efficient person matching across videos followed by their ranking. Experimental results on two challenging video data sets demonstrate the effectiveness of the proposed algorithm.

Paper Nr: 320
Title:

Detecting Events in Crowded Scenes using Tracklet Plots

Authors:

Pau Climent-Pérez, Alexandre Mauduit, Dorothy N. Monekosso and Paolo Remagnino

Abstract: The main contribution of this paper is a compact representation of the ‘short tracks’ or tracklets present in a time window of a given video input, which allows different crowd events to be analysed and detected. First, tracklets are extracted from a time window using a particle-filter multi-target tracker. After noise removal, the tracklets are plotted into a square image by normalising their lengths to the size of the image. Different histograms are then applied to this compact representation, and different events in a crowd are detected via bag-of-words modelling. Novel video sequences can then be analysed to detect whether an abnormal or chaotic situation is present. The whole algorithm is tested with our own dataset, also introduced in the paper.
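The normalisation step (scaling each tracklet to fit a fixed-size square image) can be sketched as follows; this is our illustrative reading of the representation, marking the cells each tracklet's points fall into, not the authors' implementation.

```python
def plot_tracklets(tracklets, size=64):
    """Normalise each tracklet's bounding box to a size x size grid and
    mark the cells its points fall into, producing the compact
    square-image representation.
    tracklets: lists of (x, y) points in image coordinates."""
    img = [[0] * size for _ in range(size)]
    for points in tracklets:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        w = max(xs) - min(xs) or 1  # avoid division by zero
        h = max(ys) - min(ys) or 1
        for x, y in points:
            u = int((x - min(xs)) / w * (size - 1))
            v = int((y - min(ys)) / h * (size - 1))
            img[v][u] = 1
    return img
```

Histograms computed over this fixed-size image are then comparable across time windows, which is what makes the bag-of-words event modelling possible.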

Paper Nr: 341
Title:

Impact of Facial Cosmetics on Automatic Gender and Age Estimation Algorithms

Authors:

Cunjian Chen, Antitza Dantcheva and Arun Ross

Abstract: Recent research has established the negative impact of facial cosmetics on the matching accuracy of automated face recognition systems. In this paper, we analyze the impact of cosmetics on automated gender and age estimation algorithms. In this regard, we consider the use of facial cosmetics for (a) gender spoofing where male subjects attempt to look like females and vice versa, and (b) age alteration where female subjects attempt to look younger or older than they actually are. While such transformations are known to impact human perception, their impact on computer vision algorithms has not been studied. Our findings suggest that facial cosmetics can potentially be used to confound automated gender and age estimation schemes.

Paper Nr: 346
Title:

Exploring Residual and Spatial Consistency for Object Detection

Authors:

Hao Wang, Ya Zhang and Zhe Xu

Abstract: Local image features show a high degree of repeatability, but their local appearance usually does not carry enough discriminative information to obtain a reliable matching. In this paper, we present a new object matching algorithm based on a novel robust estimation of residual consensus and a flexible spatial consistency filter. We evaluate the similarity between different homography models via a two-parameter integrated Weibull distribution and inlier probability estimates, which can select an uncontaminated model to help eliminate outliers. The spatial consistency test is encoded by the geometric relationships of domain knowledge in two directions, which is invariant to scale, rotation, and translation, and especially robust to flipped images. Experimental results on natural images with cluttered backgrounds demonstrate our method's effectiveness and robustness.

Short Papers
Paper Nr: 2
Title:

Image Flower Recognition based on a New Method for Color Feature Extraction

Authors:

Amira Ben Mabrouk, Asma Najjar and Ezzeddine Zagrouba

Abstract: In this paper, we first present a new method for color feature extraction based on SURF detectors. We then prove its efficiency for flower image classification. To this end, we describe the visual content of the flower images using compact and accurate descriptors. These features are combined, and the learning process is performed using a multiple-kernel framework with an SVM classifier. The proposed method has been tested on the dataset provided by the University of Oxford and achieved better results than our implementation of the method proposed by Nilsback and Zisserman (Nilsback and Zisserman, 2008) in terms of classification rate and execution time.

Paper Nr: 14
Title:

Driver Drowsiness Estimation from Facial Expression Features - Computer Vision Feature Investigation using a CG Model

Authors:

Taro Nakamura, Akinobu Maejima and Shigeo Morishima

Abstract: We propose a method for estimating the degree of a driver’s drowsiness on the basis of changes in facial expressions captured by an IR camera. Typically, drowsiness is accompanied by drooping eyelids. Therefore, most related studies have focused on tracking eyelid movement by monitoring facial feature points. However, drowsiness features emerge not only in eyelid movements but also in other facial expressions. To estimate drowsiness more precisely, we must select other effective features. In this study, we detected a new drowsiness feature by comparing a video image with a CG model fitted to the existing feature point information. In addition, we propose a more precise drowsiness estimation method that uses wrinkle changes, computed as local edge intensity on the face, which expresses drowsiness more directly in its initial stage.

Paper Nr: 16
Title:

Enhanced Hierarchical Conditional Random Field Model for Semantic Image Segmentation

Authors:

Li-Li Wang, Shan-Shan Zhu and N. H. C. Yung

Abstract: Pairwise and higher order potentials in the Hierarchical Conditional Random Field (HCRF) model play a vital role in smoothing region boundaries and extracting actual object contours in the labeling space. However, the pairwise potential evaluated on color information has a tendency to over-smooth small regions that are similar to their neighbors in the color space, and the higher order potential associated with multiple segments is prone to produce incorrect guidance for inference, especially for objects with features similar to the background. To overcome these problems, this paper proposes two enhanced potentials in the HCRF model that are capable of abating the over-smoothing by propagating believed labelings from the unary potential and of performing coherent inference by ensuring reliable segment consistency. Experimental results on the MSRC-21 data set demonstrate that the enhanced HCRF model achieves pleasing visual results, as well as significant improvement in terms of both global accuracy (87.52%) and average accuracy (80.18%), outperforming other algorithms reported in the literature so far.

Paper Nr: 17
Title:

Discriminant Boosted Dynamic Time Warping and Its Application to Gesture Recognition

Authors:

Tarik Arici, Sait Celebi, Ali Selman Aydin and Talha Tarik Temiz

Abstract: Dynamic time warping (DTW) measures the similarity between two data sequences by minimizing an accumulated distance between sequence samples at each iteration; the resulting cost assesses the level of similarity. The DTW cost may then be used to assign a sequence to a class in a classification problem. In machine learning, classification problems are solved using features with good discriminative power, which are generated by exploiting the distribution of the data vectors. Linear Discriminant Analysis (LDA) is one such technique: it finds discriminative projection directions, and features are generated as projections of sequence vectors onto these directions. Unfortunately, such techniques are not directly applicable to warped sequences because the mapping between the test sequences and the training sequences is not known. To solve this problem, we propose a constrained LDA framework that produces direction vectors built from repeated unit vectors whose dimension equals that of a single sequence sample. Such projection vectors can be used without knowing the mapping of test sequence vectors to training sequence vectors. Experimental results show that generating features by discriminant analysis improves performance significantly.
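The accumulated-cost recursion at the heart of DTW can be sketched as follows; this is a minimal NumPy version for 1-D sequences, with an illustrative absolute-difference sample distance, not the authors' implementation:

```python
import numpy as np

def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
    """Accumulated DTW cost between two 1-D sequences x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # D[i, j]: best cost aligning x[:i] with y[:j]
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A small cost indicates similar sequences; for classification, the query is typically assigned to the class of the training sequence with the lowest DTW cost.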

Paper Nr: 22
Title:

Towards Reliable Real-time Person Detection

Authors:

Silviu-Tudor Serban, Srinidhi Mukanahallipatna Simha, Vasanth Bathrinarayanan, Etienne Corvee and Francois Bremond

Abstract: We propose a robust real-time person detection system that aims to serve as a solid foundation for developing solutions at an elevated level of reliability. Our belief is that clever handling of the input data, combined with efficacious training algorithms, is key to obtaining top performance. We introduce a comprehensive training method based on random sampling that compiles optimal classifiers with minimal bias and overfitting. Building upon recent advances in multi-scale feature computation, our approach attains state-of-the-art accuracy while running at a high frame rate.

Paper Nr: 25
Title:

Hidden Conditional Random Fields for Action Recognition

Authors:

Lifang Chen, Nico van der Aa, Robby T. Tan and Remco C. Veltkamp

Abstract: In the field of action recognition, the design of features has been explored extensively, but the choice of action classification methods is limited. Commonly used classification methods such as k-Nearest Neighbors and Support Vector Machines assume conditional independence between features. In contrast, Hidden Conditional Random Fields (HCRFs) include the spatial or temporal dependencies of features, making them better suited to rich, overlapping features. In this paper, we investigate the performance of HCRF and Max-Margin HCRF, and of their baseline versions, the root model and the multi-class SVM respectively, for action recognition on the Weizmann dataset. We introduce the Part Labels method, which explicitly uses the part labels learned by HCRF as a new set of local features. We show that modelling spatial structures in 2D space alone is not sufficient to justify the additional complexity of HCRF, MMHCRF or the Part Labels method for action recognition.

Paper Nr: 32
Title:

Application of Dynamic Distributional Clauses for Multi-hypothesis Initialization in Model-based Object Tracking

Authors:

D. Nitti, G. Chliveros, M. Pateraki, L. De Raedt, E. Hourdakis and P. Trahanias

Abstract: In this position paper we propose the use of the Distributional Clauses Particle Filter in conjunction with a model-based 3D object tracking method in monocular camera sequences. We describe the model-based object tracking method, which relies on contour and edge features for relative 3D pose estimation. We also describe the application of the Distributional Clauses Particle Filter, which takes into account inputs from object tracking. We argue that object dynamics can be modeled via probabilistic rules, which makes it possible to predict and utilise a pose hypothesis space for fully occluded or ‘invisible’ (hidden-away) objects that may re-appear in the camera’s field of view. Important issues, such as losing track of the object in a ‘total occlusion’ scenario, are discussed.

Paper Nr: 33
Title:

Subsign Detection and Classification System for Automated Traffic-sign Inventory Systems

Authors:

Lykele Hazelhoff, Ron op het Veld, Ivo Creusen and Peter H. N. de With

Abstract: Road safety is influenced by the accurate placement and visibility of road signs, which are maintained based on inventories of traffic signs. These inventories are created (semi-)automatically from street-level images, based on object detection and classification. These systems often neglect the complementary signs (subsigns) that are present, although these are clearly important for the meaning and validity of the signs. This paper presents a generic, learning-based approach for both detection and classification of subsigns, which is based on the same principles as the system employed for finding traffic signs and can be used as an extension to automated inventory systems. The system starts with detection of subsigns in a region below each detected sign, followed by analysis of the results obtained for all captures of the same sign. When a subsign is found, the corresponding pixel regions are extracted and subjected to classification. This recognition system is evaluated on 3,104 signs (397 with subsigns) identified by an existing inventory system. At a detection rate of 98%, only 757 signs (24.4% of the signs) are labeled as containing a subsign, while 91.4% of the subsigns of a class known to our classifier are also classified correctly.

Paper Nr: 34
Title:

Dictionary based Pooling for Object Categorization

Authors:

Sean Ryan Fanello, Nicoletta Noceti, Giorgio Metta and Francesca Odone

Abstract: It is well known that image representations learned through ad-hoc dictionaries improve the overall results in object categorization problems. Following the widely accepted coding-pooling visual recognition pipeline, these representations are often tightly coupled with a coding stage. In this paper we show how to exploit ad-hoc representations both within the coding and the pooling phases. We learn a dictionary for each object class and then use local descriptors encoded with the learned atoms to guide the pooling operator. We exhaustively evaluate the proposed approach in both single instance object recognition and object categorization problems. From the applications standpoint we consider a classical image retrieval scenario with the Caltech 101, as well as a typical robot vision task with data acquired by the iCub humanoid robot.

Paper Nr: 43
Title:

Temporally Consistent Snow Cover Estimation from Noisy, Irregularly Sampled Measurements

Authors:

Dominic Rüfenacht, Matthew Brown, Jan Beutel and Sabine Süsstrunk

Abstract: We propose a method for accurate and temporally consistent surface classification in the presence of noisy, irregularly sampled measurements, and apply it to the estimation of snow coverage over time. The input imagery is extremely challenging, with large variations in lighting and weather distorting the measurements. Initial snow cover estimates are obtained using a Gaussian Mixture Model of color. To achieve a temporally consistent snow cover estimation, we use a Markov Random Field that penalizes rapid fluctuations in the snow state, and show that the penalty term needs to be quite large, resulting in slow reactivity to changes. We thus propose a classifier to separate good from uninformative images, which allows a smaller penalty term to be used. We show that incorporating domain knowledge to discard uninformative images leads to better reactivity to changes in snow coverage as well as more accurate snow cover estimates.
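The initial per-pixel classification from a color Gaussian Mixture Model can be sketched with scikit-learn; the synthetic "snow"/"ground" pixel data and the two-component setup below are illustrative assumptions, not the paper's trained model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic RGB pixels: bright "snow" and dark "ground"
snow = rng.normal(230.0, 10.0, (500, 3))
ground = rng.normal(60.0, 20.0, (500, 3))
pixels = np.vstack([snow, ground])

# fit a two-component color GMM and label every pixel
gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
snow_comp = int(np.argmax(gmm.means_.mean(axis=1)))  # snow = brighter component
snow_mask = gmm.predict(pixels) == snow_comp
```

In the paper's pipeline, per-image masks like `snow_mask` would then be smoothed over time by the Markov Random Field penalizing rapid state changes.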

Paper Nr: 47
Title:

Toward Object Recognition with Proto-objects and Proto-scenes

Authors:

Fabian Nasse, Rene Grzeszick and Gernot A. Fink

Abstract: In this paper a bottom-up approach for detecting and recognizing objects in complex scenes is presented. In contrast to top-down methods, no prior knowledge about the objects is required beforehand. Instead, two different views on the data are computed: first, a GIST descriptor is used for clustering scenes with a similar global appearance, which produces a set of Proto-Scenes. Second, a visual attention model based on hierarchical multi-scale segmentation and feature integration is proposed. Regions of interest that are likely to contain an arbitrary object, a Proto-Object, are determined. These Proto-Object regions are then represented by a Bag-of-Features using Spatial Visual Words. The bottom-up approach makes the detection and recognition tasks more challenging but also more efficient and easier to apply to an arbitrary set of objects. This is an important step toward analyzing complex scenes in an unsupervised manner. The bottom-up knowledge is combined with an informed system that associates Proto-Scenes with objects that may occur in them, and an object classifier is trained for recognizing the Proto-Objects. In experiments on the VOC2011 database, the proposed multi-scale visual attention model is compared with current state-of-the-art models for Proto-Object detection. Additionally, the Proto-Objects are classified with respect to the VOC object set.

Paper Nr: 63
Title:

Large-scale Image Retrieval based on the Vocabulary Tree

Authors:

Bo Cheng, Li Zhuo, Pei Zhang and Jing Zhang

Abstract: In this paper, a vocabulary-tree-based large-scale image retrieval scheme is proposed that achieves higher accuracy and speed. The novelty of this paper can be summarized as follows. First, because traditional Scale Invariant Feature Transform (SIFT) descriptors are excessively concentrated in some areas of an image, the extraction process of SIFT features is optimized to reduce their number. Then, combined with the optimized SIFT, a color histogram in the Hue, Saturation, Value (HSV) color space is extracted as another image feature. Moreover, Local Fisher Discriminant Analysis (LFDA) is applied to reduce the dimension of the SIFT and color features, which helps to shorten the feature-clustering time. Finally, the dimension-reduced features are used to generate vocabulary trees, which are then used for large-scale image retrieval. Experimental results on several image datasets show that the proposed method achieves satisfactory retrieval precision.
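A vocabulary tree is conventionally built by hierarchical k-means over the descriptor set (Nistér and Stewénius's construction); a minimal recursive sketch follows, where the branching factor, depth and random descriptors are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(X, branch=2, depth=2, seed=0):
    """Recursively k-means-cluster descriptors X; each node stores its cluster centre."""
    node = {"centre": X.mean(axis=0), "children": []}
    if depth == 0 or len(X) < branch:
        return node  # leaf: too few descriptors or maximum depth reached
    km = KMeans(n_clusters=branch, n_init=10, random_state=seed).fit(X)
    for c in range(branch):
        node["children"].append(
            build_vocab_tree(X[km.labels_ == c], branch, depth - 1, seed))
    return node

rng = np.random.default_rng(0)
tree = build_vocab_tree(rng.normal(size=(200, 8)), branch=3, depth=2)
```

At query time a descriptor is quantized by descending the tree, at each level choosing the child with the nearest centre; the leaves act as visual words.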

Paper Nr: 92
Title:

Image Retrieval with Reciprocal and Shared Nearest Neighbors

Authors:

Agni Delvinioti, Hervé Jégou, Laurent Amsaleg and Michael Houle

Abstract: Content-based image retrieval systems typically rely on a similarity measure between image vector representations, such as in bag-of-words, to rank the database images in decreasing order of expected relevance to the query. However, the inherent asymmetry of k-nearest neighborhoods is not properly accounted for by traditional similarity measures, possibly leading to a loss of retrieval accuracy. This paper addresses this issue by proposing similarity measures that use neighborhood information to assess the relationship between images. First, we extend previous work on k-reciprocal nearest neighbors to produce new measures that improve over the original primary metric. Second, we propose measures defined on sets of shared nearest neighbors for reranking the shortlist. Both these methods are simple, yet they significantly improve the accuracy of image search engines on standard benchmark datasets.
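The k-reciprocal relation — j is among the k nearest neighbors of the query and the query is among the k nearest neighbors of j — can be sketched with a brute-force search; this is illustrative only, not the paper's proposed measures:

```python
import numpy as np

def k_reciprocal_neighbors(X, query_idx, k):
    """Indices j in the k-NN of X[query_idx] that also have query_idx in their own k-NN."""
    d = np.linalg.norm(X - X[query_idx], axis=1)
    knn = set(np.argsort(d)[1:k + 1])  # skip index 0: the query itself
    recip = []
    for j in knn:
        dj = np.linalg.norm(X - X[j], axis=1)
        if query_idx in set(np.argsort(dj)[1:k + 1]):
            recip.append(j)
    return sorted(recip)

points = np.array([[0.0], [1.0], [10.0], [11.0]])
```

The asymmetry the paper targets is visible here: a point can list j as a neighbor while j's own neighborhood, in a denser region, excludes it; the reciprocal test filters such one-sided matches.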

Paper Nr: 99
Title:

Learning a Loopy Model Exactly

Authors:

Andreas Christian Müller and Sven Behnke

Abstract: Learning structured models using maximum-margin techniques has become an indispensable tool for computer vision researchers, as many computer vision applications can be cast naturally as image labeling problems. Pixel-based or superpixel-based conditional random fields are particularly popular examples. Typically, neighborhood graphs, which contain a large number of cycles, are used. As exact inference in loopy graphs is NP-hard in general, learning these models without approximations is usually deemed infeasible. In this work we show that, despite the theoretical hardness, it is possible to learn loopy models exactly in practical applications. To this end, we analyze the use of multiple approximate inference techniques together with cutting-plane training of structural SVMs. We show that our proposed method yields exact solutions with optimality guarantees in a computer vision application, for little additional computational cost. We also propose a dynamic caching scheme to accelerate training further, yielding runtimes comparable to those of approximate methods. We hope that this insight can lead to a reconsideration of the tractability of loopy models in computer vision.

Paper Nr: 106
Title:

Approximate Image Matching using Strings of Bag-of-Visual Words Representation

Authors:

Hong-Thinh Nguyen, Cécile Barat and Christophe Ducottet

Abstract: The Spatial Pyramid Matching approach has become very popular for modeling images as sets of local bags-of-words. The image comparison is then done region-by-region with an intersection kernel. Despite its success, this model has some limitations: the grid partitioning is predefined and identical for all images, and the matching is sensitive to intra- and inter-class variations. In this paper, we propose a novel approach based on approximate string matching to overcome these limitations and improve the results. First, we introduce a new image representation as strings of ordered bags-of-words. Second, we present a new edit distance specifically adapted to strings of histograms in the context of image comparison. This distance identifies local alignments between subregions and allows sequences of similar subregions to be removed so that two images match better. Experiments on 15 Scenes and Caltech 101 show that the proposed approach outperforms the classical spatial pyramid representation and most competing classification methods presented in recent years.
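An edit distance over strings of histograms reduces to the classic dynamic-programming recursion with a custom substitution cost; in the sketch below, the L1 substitution cost and unit insertion/deletion cost are illustrative choices, not the paper's adapted costs:

```python
import numpy as np

def edit_distance(s1, s2, sub_cost, indel_cost=1.0):
    """Edit distance between two sequences with a custom substitution cost."""
    n, m = len(s1), len(s2)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * indel_cost  # delete all of s1's prefix
    D[0, :] = np.arange(m + 1) * indel_cost  # insert all of s2's prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + indel_cost,                      # deletion
                          D[i, j - 1] + indel_cost,                      # insertion
                          D[i - 1, j - 1] + sub_cost(s1[i - 1], s2[j - 1]))  # substitution
    return D[n, m]

# illustrative substitution costs: L1 distance between histograms, or 0/1 for symbols
l1 = lambda a, b: 0.5 * np.abs(np.asarray(a) - np.asarray(b)).sum()
lev = lambda a, b: 0.0 if a == b else 1.0
```

With `lev` this is plain Levenshtein distance; with `l1` the "symbols" are the per-subregion bag-of-words histograms the paper describes.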

Paper Nr: 107
Title:

Environment Adaptive Pedestrian Detection using In-vehicle Camera and GPS

Authors:

Daichi Suzuo, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Hiroyuki Ishida and Yoshiko Kojima

Abstract: In recent years, accurate pedestrian detection from in-vehicle camera images has attracted attention for the development of safety driving assistance systems. Currently, successful methods are based on statistical learning. However, such methods require a large amount of training images, and a decrease in the number of training images degrades detection accuracy. That is, in driving environments with few or no training images, it is difficult to detect pedestrians accurately. Therefore, we propose an approach that collects training images automatically to build classifiers for various driving environments. This is expected to achieve highly accurate pedestrian detection by using an appropriate classifier corresponding to the current location. The proposed method consists of three steps: classification of driving scenes; collection of non-pedestrian images and training of classifiers for each scene class; and association of each scene-class-specific classifier with GPS location information. Through experiments, we confirmed the effectiveness of the method compared to baseline methods.

Paper Nr: 119
Title:

Active Shape Models with SIFT Descriptors and MARS

Authors:

Stephen Milborrow and Fred Nicolls

Abstract: We present a technique for locating landmarks in images of human faces. We replace the 1D gradient profiles of the classical Active Shape Model (ASM) (Cootes and Taylor, 1993) with a simplified form of SIFT descriptors (Lowe, 2004), and use Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991) for descriptor matching. This modified ASM is fast and performs well against existing techniques for automatic face landmarking on frontal faces.

Paper Nr: 133
Title:

Towards Unsupervised Sudden Group Movement Discovery for Video Surveillance

Authors:

Sofia Zaidenberg, Piotr Bilinski and François Brémond

Abstract: This paper presents a novel, unsupervised approach for discovering “sudden” movements in video surveillance footage. The proposed approach automatically detects quick motions in a video, corresponding to any action. A set of possible actions is not required, and the proposed method successfully detects potentially alarm-raising actions without training or camera calibration. Moreover, the system uses a group detection and event recognition framework to relate detected sudden movements to groups of people and to provide a semantic interpretation of the scene. We have tested our approach on a dataset of nearly 8 hours of videos recorded from two cameras in the Parisian subway for a European project. For evaluation, we annotated 1 hour of sequences containing 50 sudden movements.

Paper Nr: 150
Title:

A Pattern Recognition System for Detecting Use of Mobile Phones While Driving

Authors:

Rafael A. Berri, Alexandre G. Silva, Rafael S. Parpinelli, Elaine Girardi and Rangel Arthur

Abstract: It is estimated that 80% of crashes and 65% of near collisions involve drivers who were inattentive to traffic for the three seconds before the event. This paper develops an algorithm for extracting characteristics that allow the identification of cell phone use while driving a vehicle. Experiments were performed on image sets with 100 positive images (with phone) and 100 negative images (no phone), containing frontal images of the driver. A Support Vector Machine (SVM) with a polynomial kernel is the most advantageous classifier for the features provided by the algorithm, obtaining a success rate of 91.57% for the vision system. Tests on videos show that it is possible to use the image datasets for training classifiers in real situations; periods of 3 seconds were correctly classified in 87.43% of cases.
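The classification stage — an SVM with a polynomial kernel — can be sketched with scikit-learn; the synthetic features below stand in for the characteristics the paper's algorithm extracts, so the numbers are illustrative only:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# stand-in data: 200 feature vectors with binary labels (phone / no phone)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# polynomial-kernel SVM, as in the paper's best-performing configuration
clf = SVC(kernel="poly", degree=3).fit(Xtr, ytr)
acc = clf.score(Xte, yte)  # held-out accuracy
```

For video, the paper aggregates such frame-level decisions over 3-second windows; a simple majority vote over the window's predictions is one way to realize that step.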

Paper Nr: 195
Title:

Paper Substrate Classification based on 3D Surface Micro-geometry

Authors:

Hossein Malekmohamadi, Khemraj Emrith, Stephen Pollard, Guy Adams, Melvyn Smith and Steve Simske

Abstract: This paper presents an approach to deriving a novel 3D signature based on the micro-geometry of paper surfaces so as to uniquely characterise and classify different paper substrates. This capability is extremely important for confronting different methods of tampering with valuable documents. We use a 4-light-source photometric stereo (PS) method to recover the dense 3D geometry of paper surfaces captured using an ultra-high-resolution sensing device. We derive a unique signature for each paper type based on the shape index (SI) map generated from the surface normals of the 3D data. We show that the proposed signature can robustly and accurately classify paper substrates with different physical properties and different surface textures. Additionally, we present results demonstrating that our classification model using the 3D signature performs significantly better than conventional 2D image-based descriptors extracted from both printed and non-printed paper surfaces. The accuracy of the proposed method is validated on a dataset comprising 21 printed and 22 non-printed paper types, and a classification success rate of over 92% is achieved in both cases (92.5% for printed surfaces and 96% for non-printed ones).
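The shape index referred to here is Koenderink's curvature-based measure, computed per surface point from the principal curvatures; a minimal sketch (the umbilic convention in the comment is one common choice, and how the paper handles that case is not stated in the abstract):

```python
import math

def shape_index(k1, k2):
    """Koenderink shape index in [-1, 1] from principal curvatures k1 >= k2.

    -1 ~ spherical cup, 0 ~ saddle, +1 ~ spherical cap.
    """
    if k1 == k2:
        # umbilic point: formula is undefined; assign cap/cup/flat by sign
        return 1.0 if k1 > 0 else (-1.0 if k1 < 0 else 0.0)
    return (2.0 / math.pi) * math.atan((k1 + k2) / (k1 - k2))
```

A histogram of `shape_index` values over the recovered surface normals is one plausible form such a per-paper-type signature could take.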

Paper Nr: 202
Title:

Invariant Shape Prior Knowledge for an Edge-based Active Contours - Invariant Shape Prior for Active Contours

Authors:

Mohamed Amine Mezghich, Slim M’Hiri and Faouzi Ghorbel

Abstract: In this paper, we propose a new method to incorporate geometric shape priors into edge-based active contours for robust object detection in the presence of partial occlusion, low contrast and noise. A shape registration method based on the phase correlation of binary images, associated with the level set functions of the active contour and a reference shape, is used to define prior knowledge that makes the model invariant with respect to Euclidean transformations. In the case of several templates, a set of complete invariant shape descriptors is used to select the most suitable one according to the evolving contour. Experimental results show the ability of the proposed approach to constrain an evolving curve towards target shapes that may be occluded and cluttered under rigid transformations.

Paper Nr: 209
Title:

Multi-viewpoint Visibility Coverage Estimation for 3D Environment Perception - Volumetric Representation as a Gateway to High Resolution Data

Authors:

Marek Ososinski and Frédéric Labrosse

Abstract: Estimation of visibility is a crucial element of coverage estimation for large, complex environments. This non-probabilistic problem is often tackled in a 2D context. We present an algorithm that can estimate the visibility of a high-resolution scene from a low-resolution 3D representation. An octree-based voxel representation provides a dataset that is easy to process, and voxel occupancy properties ensure a good approximation of visibility at high resolution. Our system is capable of producing a reasonable solution to the viewpoint placement issue of the Art gallery problem.

Paper Nr: 211
Title:

Pose Recognition in Indoor Environments using a Fisheye Camera and a Parametric Human Model

Authors:

K. K. Delibasis, V. P. Plagianakos and I. Maglogiannis

Abstract: In this paper we present a system that uses computer vision techniques and a deformable 3D human model to recognize the posture of a monitored person, given the human silhouette segmented from the background. The video data are acquired indoors from a fixed fish-eye camera placed in the living environment. The implemented 3D human model works together with a fish-eye camera model, allowing the calculation of the real human position in 3D space and consequently the recognition of the posture of the monitored person. The paper discusses the details of the human model and the fish-eye camera model, as well as the posture recognition methodology. Initial results are also presented for a small number of video sequences of walking or standing humans.

Paper Nr: 229
Title:

A Multi-stage Segmentation based on Inner-class Relation with Discriminative Learning

Authors:

Haoqi Fan, Yuanshi Zhang and Guoyu Zuo

Abstract: In this paper, we propose a segmentation approach that not only segments an object of interest but also labels the different semantic parts of the object; a discriminative model is presented that describes an object in real-world images as multiple, disparate and correlated parts. We propose a multi-stage segmentation approach to make inference on the segments of an object, and train it under the latent structural SVM learning framework. We show that our method yields an average improvement of about 5% on the ETHZ Shape Classes dataset and 4% on the INRIA horses dataset. Finally, extensive experiments with intricate occlusions on the INRIA horses dataset show that the approach achieves state-of-the-art performance under occlusion and deformation.

Paper Nr: 240
Title:

A Method of Weather Recognition based on Outdoor Images

Authors:

Qian Li, Yi Kong and Shi-ming Xia

Abstract: To improve the quality of outdoor video surveillance and to automatically acquire weather conditions, a method for recognizing weather phenomena based on outdoor images is presented. Our method has three features: first, after analysing the effect of weather conditions on images, features such as the power spectrum slope, contrast, noise and saturation are extracted; second, a decision tree is constructed in accordance with the distances between the features; third, when each SVM classifier on a non-leaf node of the decision tree is constructed, a subset of features is selected by assigning weights. The experimental results show that the proposed method can effectively recognize outdoor weather conditions.

Paper Nr: 256
Title:

An Improved Approach for Depth Data based Face Pose Estimation using Particle Swarm Optimization

Authors:

Xiaozheng Mou and Han Wang

Abstract: This paper presents an improved approach for face pose estimation from depth data using particle swarm optimization (PSO). In this approach, the frontal face of the system user is first initialized and its depth image is taken as a person-specific template. Each query face of that user is rotated and translated with respect to its centroid using PSO to match the template. Since the centroid of each query face changes as the face pose changes, a common reference point has to be defined to measure the exact transformation of the query face. Thus, the nose tips of the optimal transformed face and the query face are localized to recompute the transformation from the query face to the optimal transformed face matched with the template. Using the recomputed rotation and translation, the pose of the query face is finally approximated by the relative pose between the query face and the template face. Experiments on a public database show that the accuracy of this new method is more than 99%, which is much higher than the best performance (< 91%) of existing work.
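A generic particle swarm optimizer of the kind used here can be sketched as follows; the inertia and acceleration coefficients, search bounds and the sphere test objective are illustrative — the paper instead minimizes a depth-matching error over rotation/translation parameters:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100, seed=0):
    """Minimize f over R^dim with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))  # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest = x.copy()                                # each particle's best position
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()          # swarm's best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + attraction to personal best + attraction to global best
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

best_x, best_f = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=2)
```

In the paper's setting, each particle would encode a candidate rigid transformation of the query face and `f` would score the match against the person-specific depth template.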

Paper Nr: 269
Title:

Beyond SIFT for Image Classification

Authors:

Sébastien Paris, Xanadu Halkias and Hervé Glotin

Abstract: In classifying images, scenes or objects, the most popular approach is based on the feature extraction-coding-pooling framework, which generates discriminative and robust image representations from densely extracted local patches, mainly SIFT/HOG ones. The majority of recent research focuses on how to improve the coding and pooling stages. In this work, we show that substantial improvements can also be obtained by coding information closer to the pixel level, in the same way that deep-learning architectures do. We introduce a two-layer, stacked coder-pooler architecture in which the first layer is specifically dedicated to extracting, from our so-called Differential Vector (DV) patches, efficient local low-level features that are more discriminative than their classic handcrafted counterparts. This first layer can advantageously replace any classic dense SIFT/HOG patch extraction stage. We demonstrate the effectiveness of our approach on three datasets: UIUC-Sports, Scene 15 and Caltech 101. We achieve excellent performance with simple linear classification while using basic coding and pooling schemes for both layers, i.e. Sparse Coding (SC) and Max-Pooling (MP), respectively.

Paper Nr: 272
Title:

Learning Weighted Joint-based Features for Action Recognition using Depth Camera

Authors:

Guang Chen, Daniel Clarke and Alois Knoll

Abstract: Human action recognition based on joints is a challenging task. The 3D positions of the tracked joints are very noisy when occlusions occur, which increases the intra-class variation of the actions. In this paper, we propose a novel approach to recognizing human actions with weighted joint-based features. Previous work has focused on hand-tuned joint-based features, which are difficult and time-consuming to extend to other modalities. In contrast, we compute the joint-based features using an unsupervised learning approach. To capture the intra-class variance, a multiple kernel learning approach is employed to learn the skeleton structure that combines these joint-based features. We test our algorithm on the Microsoft Research Action3D (MSRAction3D) dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art action recognition algorithms on depth videos.

Paper Nr: 277
Title:

Multi-objective Optimization for Characterization of Optical Flow Methods

Authors:

Jose Delpiano, Luis Pizarro, Rodrigo Verschae and Javier Ruiz-del-Solar

Abstract: Optical flow methods are among the most accurate techniques for estimating displacement and velocity fields in a number of applications that range from neuroscience to robotics. The performance of any optical flow method will naturally depend on the configuration of its parameters. Beyond the standard practice of manual (ad-hoc) selection of parameters for a specific application, in this article we propose a framework for automatic parameter setting that allows searching for an approximated Pareto-optimal set of configurations in the whole parameter space. This final Pareto front characterizes each specific method, enabling proper method comparison. We define two performance criteria, namely the accuracy and speed of the optical flow methods.
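Given (accuracy, runtime) pairs for sampled parameter configurations, the non-dominated (Pareto-optimal) subset can be sketched by brute force; the sample configurations below are illustrative, not measurements from the paper:

```python
def pareto_front(points):
    """Keep configurations not dominated in (accuracy: higher better, runtime: lower better)."""
    return [(acc, t) for acc, t in points
            if not any(a2 >= acc and t2 <= t and (a2, t2) != (acc, t)
                       for a2, t2 in points)]

# illustrative (accuracy, runtime-in-seconds) samples from a parameter sweep
configs = [(0.90, 2.0), (0.80, 1.0), (0.70, 3.0), (0.85, 0.5)]
front = pareto_front(configs)
```

Here (0.80, 1.0) is dropped because (0.85, 0.5) is both more accurate and faster; the surviving front is what the paper uses to characterize and compare optical flow methods.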

Paper Nr: 302
Title:

High Resolution Light Field Photography from Split Ray Imaging and Coded Aperture

Authors:

Shota Taki, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a method for obtaining high resolution 4D light fields by using low resolution camera sensors and controllable coded apertures. Recently, 4D light filed acquisition has been studied extensively in the field of computational photography. Since the 4D light filed consists of much lager information than the ordinary 2D image, we have to use super high resolution camera sensors in order to obtain high resolution 4D light fields. In this paper, we propose a method for obtaining high resolution 4D light fields from low resolution camera sensors. In this method, we combine the standard light field imaging technique with the coded aperture. By using these techniques, we can obtain high resolution 4D light fields from low resolution cameras with small number of image acquisitions. The efficiency of the proposed method is tested by real images.

Paper Nr: 304
Title:

Assisting Navigation in Homogenous Fog

Authors:

Mihai Negru and Sergiu Nedevschi

Abstract: An important cause of road accidents is reduced visibility due to the presence of fog or haze. For this reason, there is a fundamental need for Advanced Driving Assistance Systems (ADAS) based on efficient real-time algorithms able to detect the presence of fog, estimate the fog’s density, determine the visibility distance and inform the driver of the maximum speed at which the vehicle should travel. Our solution improves over existing fog detection methods through the temporal integration of the horizon line and the inflection point in the image. Our method runs in real time, at approximately 50 frames per second. It is based on a single in-vehicle camera and is able to detect daytime fog in a wide range of scenarios, including urban ones.

Paper Nr: 349
Title:

Preliminary Study on the Design of a Low-cost Movement Analysis System - Reliability Measurement of Timed Up and Go Test

Authors:

Asma Hassani, Alexandre Kubicki, Vincent Brost and Fan Yang

Abstract: In this paper, we present experiments on the design of a novel movement analysis system for real-time balance assessment in the frail elderly. Using the Microsoft Kinect sensor, we capture TUG (Timed Up and Go) tests and mainly analyze the sitting-to-standing and back-to-sitting transfers, which represent two of the most commonly executed human movements. Nine spatio-temporal parameters were extracted from recorded joint positions by 3D skeletal sequence processing. To validate and evaluate the developed system, practical tests were performed on ten healthy young subjects, who were asked to perform the TUG in three different conditions: normal, cognitive and motor. The results showed good measurement reliability and reproducibility with high precision. In addition, we observed that even for young healthy subjects there is a significant difference in movement parameters between the normal and cognitive conditions, which is a stimulating result in the dual-task paradigm field. This preliminary study opens a new research and development avenue for geriatric health involving multiple aspects: user-friendliness, hygiene, low cost, home-based environments, and automatic autonomy assessment.

Paper Nr: 354
Title:

From Text Vocabularies to Visual Vocabularies - What Basis?

Authors:

Jean Martinet

Abstract: The popular "bag-of-visual-words" approach for representing and searching visual documents consists in describing images (or video keyframes) using a set of descriptors that correspond to quantized low-level features. Most existing approaches to visual words are inspired by work in text indexing, based on the implicit assumption that visual words can be handled the same way as text words. More specifically, these techniques implicitly rely on the same postulate as text information retrieval, namely that the word distribution of a natural language globally follows Zipf's law -- that is to say, words of a natural language appear in a corpus with a frequency inversely proportional to their rank. However, our study shows that the visual word distribution depends on the choice of low-level features, and especially on the choice of the clustering method. We also show that when the visual word distribution is close to that of text words, the results of an image retrieval system improve. To the best of our knowledge, no prior study has compared the distributions of text words and visual words with the objective of establishing the theoretical foundations of visual vocabularies.
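As a side note, the Zipf assumption the abstract refers to can be checked numerically: under Zipf's law, log(frequency) is roughly linear in log(rank) with slope -1, so a least-squares fit in log-log space estimates the exponent. The sketch below uses hypothetical toy counts (any real check would use the visual-word histogram of a corpus, which is not reproduced here):

```python
import numpy as np

# Hypothetical word counts; in practice these would come from counting
# quantized visual words over an image corpus.
counts = np.array([1000, 480, 330, 260, 200, 170, 150, 130, 115, 100], float)
freqs = np.sort(counts)[::-1]            # frequencies in rank order
ranks = np.arange(1, len(freqs) + 1)

# Zipf: log(f) = log(C) - s*log(r); fit a line in log-log space.
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
zipf_exponent = -slope                   # close to 1 for Zipfian data
```

A distribution produced by a clustering method could be compared against text-word counts by comparing the fitted exponents.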

Paper Nr: 356
Title:

A Survey of Extended Methods to the Bag of Visual Words for Image Categorization and Retrieval

Authors:

Mouna Dammak, Mahmoud Mejdoub and Chokri Ben Amar

Abstract: The semantic gap is a crucial issue in computer vision. The user wishes to retrieve images on a semantic level, but image characterizations can only provide low-level similarity. As a result, building an intermediate stage between high-level semantic concepts and low-level visual features is a challenging task. The Bag of Visual Words (BoW) model arose to address this difficulty in great generality through techniques that learn semantically relevant vocabularies. In spite of its clarity and effectiveness, the construction of a codebook is a critical step, ordinarily performed by coding and pooling steps. Yet it is still difficult to build a compact codebook at reduced computational cost. Several approaches therefore try to overcome these difficulties and improve the image representation. In this paper, we present a survey of the most important published approaches for image categorization and retrieval.

Paper Nr: 401
Title:

Weighted SIFT Feature Learning with Hamming Distance for Face Recognition

Authors:

Guoyu Lu, Yingjie Hu and Nicu Sebe

Abstract: The scale-invariant feature transform (SIFT) has been successfully utilized for face recognition owing to its tolerance to changes in image scale, rotation and distortion. However, a major concern with the original SIFT feature for face recognition is its high dimensionality, which leads to slow image matching. Meanwhile, a large memory capacity is required to store high dimensional SIFT features. Aiming for an efficient approach to these issues, we propose a new integrated method for face recognition in this paper. The new method consists of two novel functional modules: a projection function transforms the original SIFT features into a low dimensional Hamming feature space, while each bit of the Hamming descriptor is ranked based on its discrimination power. Furthermore, a weighting function assigns different weights to correctly matched features based on their matching counts. Our proposed face recognition method has been applied to two benchmark facial image datasets, ORL and Yale. The experimental results show that the new method produces a good recognition rate with much improved computational speed.
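The general idea of mapping SIFT descriptors into a Hamming space can be sketched with a sign-binarised projection. Note this uses a random projection for illustration, whereas the paper learns its projection and bit weights; all names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random projection from 128-D SIFT space to 32 bits;
# the paper learns a projection instead of drawing one at random.
proj = rng.standard_normal((128, 32))

def to_hamming(desc):
    # Project the descriptor and binarise by sign: a 32-bit code.
    return (desc @ proj > 0).astype(np.uint8)

def hamming_distance(a, b):
    # Number of differing bits between two binary codes.
    return int(np.count_nonzero(a != b))

d1 = rng.standard_normal(128)
d2 = d1 + 0.05 * rng.standard_normal(128)  # near-duplicate descriptor
d3 = rng.standard_normal(128)              # unrelated descriptor

close = hamming_distance(to_hamming(d1), to_hamming(d2))
far = hamming_distance(to_hamming(d1), to_hamming(d3))
```

Similar descriptors land on codes with small Hamming distance, which is what makes matching fast and memory-light.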

Posters
Paper Nr: 29
Title:

A Complete Framework for Fully-automatic People Indexing in Generic Videos

Authors:

Dario Cazzato, Marco Leo and Cosimo Distante

Abstract: Face indexing is a very popular research topic and has been investigated over the last 10 years. It can be used for a wide range of applications such as automatic video content analysis, data mining, video annotation and labeling, etc. In this work, a fully automated framework that can detect how many people are present in a generic video (even one with low resolution and/or taken from a mobile camera) is presented. It also extracts the intervals of frames in which each person appears. The main contribution of the proposed work is that neither initialization nor a priori knowledge about the scene content is required. Moreover, this approach introduces a generalized version of the k-means method that, through different statistical indices, automatically determines the number of people in the scene.

Paper Nr: 61
Title:

Obstacle and Planar Object Detection using Sparse 3D Information for a Smart Walker

Authors:

Séverine Cloix, Viviana Weiss, Guido Bologna, Thierry Pun and David Hasler

Abstract: With the increasing proportion of senior citizens, many mobility aid devices have been developed, such as the rollator. However, under some circumstances, the latter may cause accidents. The EyeWalker project aims to develop a small, autonomous device for rollators to help elderly people, especially those with some degree of visual impairment, avoid common dangers like obstacles and hazardous ground changes, both outdoors and indoors. We propose a method for real-time stereo obstacle detection using sparse 3D information. Working with sparse 3D points, as opposed to dense 3D maps, is computationally more efficient and more appropriate for a long battery life. In our approach, 3D data are extracted from a stereo rig of two 2D high dynamic range cameras developed at CSEM (Centre Suisse d'Electronique et de Microtechnique) and processed to perform boosting classification. We also present a deformable 3D object detector in which the 3D points are combined in several different ways, yielding a set of pose estimates used to perform a less ill-posed classification. The evaluation, carried out on real stereo images of obstacles described with both 2D and 3D features, shows promising results for future use in real-world conditions.

Paper Nr: 69
Title:

A Robust, Real-time Ground Change Detector for a “Smart” Walker

Authors:

Viviana Weiss, Séverine Cloix, Guido Bologna, David Hasler and Thierry Pun

Abstract: Nowadays, there are many different types of mobility aids for elderly people. Nevertheless, these devices may lead to accidents, depending on the terrain where they are used. In this paper, we present a robust ground change detector that warns the user of potentially risky situations. Specifically, we propose a robust classification algorithm to detect ground changes based on colour histograms and texture descriptors. In our design, we compare the current frame to the average of the k previous frames using different colour systems and Local Edge Patterns. To assess the performance of our algorithm, we evaluated different Artificial Neural Network architectures. The best results were obtained by feeding the input neurons with measures related to histogram intersections, the Kolmogorov-Smirnov distance, cumulative integrals and the earth mover’s distance. Under real environmental conditions, our results indicate that the proposed detector can accurately distinguish ground changes in real time.
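One of the measures the abstract names, histogram intersection against the average of the k previous frames, can be sketched as follows. The frames here are synthetic stand-ins (dark "floor" vs bright "floor"), not data from the paper:

```python
import numpy as np

def colour_histogram(frame, bins=16):
    # Normalised intensity histogram of one frame (8-bit range).
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    # Sum of bin-wise minima: 1.0 means identical histograms,
    # values near 0 suggest a ground change.
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(1)
prev_frames = [rng.integers(0, 128, (48, 64)) for _ in range(5)]  # dark ground
curr_same = rng.integers(0, 128, (48, 64))        # same kind of ground
curr_changed = rng.integers(128, 256, (48, 64))   # brighter ground

# Reference histogram: average over the k previous frames.
ref = np.mean([colour_histogram(f) for f in prev_frames], axis=0)
same_score = histogram_intersection(ref, colour_histogram(curr_same))
changed_score = histogram_intersection(ref, colour_histogram(curr_changed))
```

A low intersection score for the current frame would then be one input, among others, to the neural network classifier.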

Paper Nr: 83
Title:

Detecting Unusual Inactivity by Introducing Activity Histogram Comparisons

Authors:

Rainer Planinc and Martin Kampel

Abstract: Unusual inactivity in an elderly person’s home is evidence that help is needed. Hence, the automatic detection of abnormal behaviour with a low number of false positives is desired. The aim of this work is to improve the accuracy of inactivity detection by introducing a new approach based on histogram comparison in order to reliably detect abnormal behaviour in elderly people’s homes. The proposed approach compares activity histograms with a pre-trained reference histogram and detects deviations from normal behaviour. Evaluation is performed on a dataset containing 103 days of activity, where six days were reported as containing “unusual” inactivity (i.e., longer absence from home) by an elderly couple.

Paper Nr: 95
Title:

Photo Rating of Facial Pictures based on Image Segmentation

Authors:

Arnaud Lienhard, Marion Reinhard, Alice Caplier and Patricia Ladret

Abstract: A single glance at a face is enough to form a first impression of someone. With the increasing number of pictures available, selecting the most suitable picture for a given use is a difficult task. This work focuses on estimating the image quality of facial portraits. Image quality features such as blur, color representation and illumination are extracted, and it is shown that for facial picture rating it is better to estimate each feature separately on the different picture parts (background and foreground). The performance of the proposed image quality estimator is evaluated and compared with a subjective facial picture quality estimation experiment.

Paper Nr: 108
Title:

VabCut: A Video Extension of GrabCut for Unsupervised Video Foreground Object Segmentation

Authors:

Sebastien Poullot and Shin'Ichi Satoh

Abstract: This paper introduces VabCut, a video extension of GrabCut and an original unsupervised solution to the video foreground object segmentation task. VabCut works on an extension of the RGB colour domain to RGBM, where M is motion. It requires a prior step: the computation of the motion layer (M-layer) of the frame to segment. To compute this layer, we propose to intersect the frame to segment with N temporally close aligned frames. This paper also introduces a new iterative, collaborative method for optimal frame alignment, based on points of interest and RANSAC, which automatically discards outliers and refines the homographies in turn. The whole method is fully automatic and can handle everyday, non-professional video that may be shaky or blurry. We tested VabCut on the SegTrack 2011 benchmark and demonstrated its effectiveness; in particular, it outperforms state-of-the-art methods while being faster.

Paper Nr: 114
Title:

Regional SVM Classifiers with a Spatial Model for Object Detection

Authors:

Zhu Teng, Baopeng Zhang, Onecue Kim and Dong-Joong Kang

Abstract: This paper presents regional Support Vector Machine (SVM) classifiers with a spatial model for object detection. The conventional SVM maps all the features of training examples into a feature space, treats these features individually, and ignores the spatial relationship between them. The regional SVMs with a spatial model proposed in this paper take a three-dimensional relationship of features into account. One dimension of this relationship is incorporated into the regional SVMs themselves. The other two dimensions capture the pairwise relationship of regional SVM classifiers acting on features and are modelled by a simple conditional random field (CRF). The object detection system based on the regional SVM classifiers with the spatial model is demonstrated on several public datasets, and its performance is compared with that of other object detection algorithms.

Paper Nr: 134
Title:

Group Tracking and Behavior Recognition in Long Video Surveillance Sequences

Authors:

Carolina Gárate, Sofia Zaidenberg, Julien Badie and François Brémond

Abstract: This paper makes use of recent advances in group tracking and behavior recognition to process large amounts of video surveillance data from an underground railway station and perform a statistical analysis. The most important advantages of our approach are its robustness when processing long videos and its capacity to recognize several different events at once. This analysis automatically brings forward data about the usage of the station and the various behaviors of groups at different hours of the day. Such data would be very hard to obtain without an automatic group tracking and behavior recognition method. We present the results and interpretation of one month of processed data from a video surveillance camera in the Torino subway.

Paper Nr: 142
Title:

Collaborative Vision Network for Personalized Office Ergonomics

Authors:

Tommi Määttä, Chih-Wei Chen, Aki Härmä and Hamid Aghajan

Abstract: This paper proposes a collaborative vision network that leverages a personal webcam and cameras of the workplace to provide feedback relating to an office-worker’s adherence to ergonomic guidelines. This can lead to increased well-being for the individual and better productivity in their work. The proposed system is evaluated with a recorded multi-camera dataset from a regular office environment. First, analysis results on various ergonomic issues are presented based on personal webcams of the two workers. Second, both personal and ambient cameras are used through sensor fusion to infer the mobility state of one of the workers. Results for various fusion approaches are shown and their impact on vision network design is briefly discussed.

Paper Nr: 156
Title:

In Search of a Car - Utilizing a 3D Model with Context for Object Detection

Authors:

Mikael Nilsson and Håkan Ardö

Abstract: Automatic video analysis of interactions between road users is desirable for city and road planning. A first step of such a system is the localization of road users. In this work, we present a method for detecting a specific car in an intersection from a monocular camera image. A camera calibration and a segmentation are used as inputs to the method. Using these inputs, a sampled search space in the ground plane, including rotations, is explored with a 3D model of a car in order to produce output in the form of rectangle detections in the ground plane. Evaluation on real recorded data, with ground truth for one car obtained using GPS, indicates that the car can be detected over 90% of the time with an average error of around 0.5 m.

Paper Nr: 163
Title:

SVM-based Video Segmentation and Annotation of Lectures and Conferences

Authors:

Stefano Masneri and Oliver Schreer

Abstract: This paper presents a classification system for video lectures and conferences based on Support Vector Machines (SVM). The aim is to classify videos into four different classes (talk, presentation, blackboard, mix). On top of this, the system further analyses presentation segments to detect slide transitions, animations and dynamic content such as video inside the presentation. The developed approach uses various colour and facial features from two different datasets of several hundred hours of video to train an SVM classifier. The system performs the classification on a frame-by-frame basis and does not require pre-computed shot-cut information. To avoid over-segmentation and to take advantage of the temporal correlation of successive frames, the results are merged every 50 frames into a single class. The presented results demonstrate the robustness and accuracy of the algorithm. Given the generality of the approach, the system can easily be adapted to other lecture datasets.
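The 50-frame merging step described here amounts to a majority vote over windows of per-frame labels. A minimal sketch (the labels below are made up for illustration):

```python
from collections import Counter

def merge_predictions(frame_labels, window=50):
    # Merge per-frame SVM labels into one label per window by majority
    # vote, suppressing over-segmentation from frame-level noise.
    merged = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        merged.append(Counter(chunk).most_common(1)[0][0])
    return merged

# Hypothetical per-frame output: mostly "presentation" with a few
# noisy "talk" frames, followed by a blackboard segment.
labels = ["presentation"] * 45 + ["talk"] * 5 + ["blackboard"] * 50
print(merge_predictions(labels))  # → ['presentation', 'blackboard']
```

The five spurious "talk" frames are absorbed by the vote, so the output contains only the two dominant segments.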

Paper Nr: 177
Title:

What to Show? - Automatic Stream Selection among Multiple Sensors

Authors:

Rémi Emonet, E. Oberzaucher and J.-M. Odobez

Abstract: The installation of surveillance networks has been growing exponentially in the last decade. In practice, videos from large surveillance networks are almost never watched, and it is common to see surveillance video wall monitors showing empty scenes. There is thus a need for methods that continuously select the streams to be shown to human operators. This paper addresses this issue and makes three main contributions: it introduces and investigates, for the first time in the literature, the live stream selection task; based on the theory of social attention, it formalizes a way of obtaining ground truth for the task, and hence a way of evaluating stream selection algorithms; and finally, it proposes a two-step approach to solve this task and compares different approaches to interestingness rating within our framework. Experiments conducted on 9 cameras from a metro station and 5 hours of data randomly selected over one week show that, while complex unsupervised activity modeling algorithms achieve good performance, simpler approaches based on the amount of motion perform almost as well for this type of indoor setting.

Paper Nr: 182
Title:

Dense Segmentation of Textured Fruits in Video Sequences

Authors:

Waqar S. Qureshi, Shin'ichi Satoh, Matthew N. Dailey and Mongkol Ekpanyapong

Abstract: Autonomous monitoring of fruit crops based on mobile camera sensors requires methods to segment fruit regions from the background in images. Previous methods based on color and shape cues have been successful in some cases, but the detection of textured green fruits among green plant material remains a challenging problem. A recently proposed method uses sparse keypoint detection, keypoint descriptor computation, and keypoint descriptor classification, followed by morphological techniques to fill the gaps between positively classified keypoints. We propose a textured fruit segmentation method based on super-pixel oversegmentation, dense SIFT descriptors, and bag-of-visual-words histogram classification within each super-pixel. An empirical evaluation of the proposed technique for textured fruit segmentation yields a 96.67% detection rate, a per-pixel accuracy of 97.657%, and a per-frame false alarm rate of 0.645%, compared to a detection rate of 90.0%, accuracy of 84.94%, and false alarm rate of 0.887% for the baseline sparse keypoint-based method. We conclude that super-pixel oversegmentation, dense SIFT descriptors, and bag-of-visual-words histogram classification are effective for in-field segmentation of textured green fruits from the background.
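The per-super-pixel bag-of-visual-words step can be sketched as follows: each dense descriptor inside a super-pixel is assigned to its nearest codebook word and the assignments are accumulated into a normalised histogram. The toy codebook and descriptors below are assumptions for illustration:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    # Assign each dense descriptor to its nearest visual word and build
    # a normalised bag-of-visual-words histogram for one super-pixel.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(2)
vocab = rng.standard_normal((8, 128))   # toy codebook of 8 visual words
# Four descriptors near words 0, 0, 3 and 5 of the codebook.
descs = vocab[[0, 0, 3, 5]] + 0.01 * rng.standard_normal((4, 128))
hist = bow_histogram(descs, vocab)
```

The resulting histogram would then be fed to a classifier that labels the super-pixel as fruit or background.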

Paper Nr: 220
Title:

Fast Violence Detection in Video

Authors:

Oscar Deniz, Ismael Serrano, Gloria Bueno and Tae-Kyun Kim

Abstract: Whereas action recognition has become a hot topic within computer vision, the detection of fights or, in general, aggressive behavior has been comparatively less studied. Such a capability may be extremely useful in some video surveillance scenarios, such as prisons, psychiatric centers or even camera phones. Recent work has applied the well-known Bag-of-Words framework, often used in generic action recognition, to the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results, in which accuracy rates of nearly 90% were achieved, the computational cost of extracting such features is prohibitive for practical applications, particularly in surveillance and media rating systems. The task of violence detection may, however, have specific characteristics that can be leveraged. Inspired by results suggesting that kinematic features alone are discriminative for specific actions, this work proposes a novel method that uses extreme acceleration patterns as its main feature. These extreme accelerations are efficiently estimated by applying the Radon transform to the power spectrum of consecutive frames. Experiments show accuracy improvements of up to 12% with respect to state-of-the-art action recognition methods. Most importantly, the proposed method is at least 15 times faster.
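The flavour of this feature can be sketched with the power spectrum of a frame difference. As a simplification, the sketch replaces the Radon transform with projections of the spectrum onto the two image axes only (i.e. the 0° and 90° projections); the real method projects over many angles:

```python
import numpy as np

def motion_feature(f1, f2):
    # Power spectrum of the difference of consecutive frames; strong
    # energy here indicates abrupt inter-frame change (fast motion).
    spec = np.abs(np.fft.fft2(f2.astype(float) - f1.astype(float))) ** 2
    # Crude stand-in for the Radon transform: axis-aligned projections
    # (angles 0 and 90 degrees) of the power spectrum.
    return np.concatenate([spec.sum(axis=0), spec.sum(axis=1)])

rng = np.random.default_rng(3)
still = rng.integers(0, 256, (32, 32))
fast = np.roll(still, 8, axis=1)   # large displacement between frames

calm = motion_feature(still, still)   # identical frames: no motion
violent = motion_feature(still, fast)
```

A classifier trained on such projection vectors can separate calm from violent sequences without computing expensive spatio-temporal descriptors.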

Paper Nr: 231
Title:

A Bottom-up Approach to Class-dependent Feature Selection for Material Classification

Authors:

Pascal Mettes, Robby Tan and Remco Veltkamp

Abstract: In this work, the merits of class-dependent image feature selection for real-world material classification are investigated. Current state-of-the-art approaches to material classification attempt to discriminate materials based on their surface properties by using a rich set of heterogeneous local features. The primary foundation of these approaches is the hypothesis that materials can be optimally discriminated using a single combination of features. Here, a method for determining the optimal subset of features for each material category separately is introduced. Furthermore, translation- and scale-invariant polar grids are designed to show that, although materials are not restricted to a specific shape, there is a clear structure in the spatial allocation of local features. Experimental evaluation on a database of real-world materials indicates that each material category indeed has its own preference. The use of both class-dependent feature selection and polar grids results in recognition rates that exceed the current state of the art.

Paper Nr: 238
Title:

Self-learning Voxel-based Multi-camera Occlusion Maps for 3D Reconstruction

Authors:

Maarten Slembrouck, Dimitri Van Cauwelaert, David Van Hamme, Dirk Van Haerenborgh, Peter Van Hese, Peter Veelaert and Wilfried Philips

Abstract: The quality of a shape-from-silhouette 3D reconstruction technique strongly depends on the completeness of the silhouettes from each of the cameras. Static occlusion, due to e.g. furniture, makes reconstruction difficult, as we assume no prior knowledge concerning the shape and size of occluding objects in the scene. In this paper, we present a self-learning algorithm that builds an occlusion map for each camera from a voxel perspective. This information is then used to determine which cameras need to be evaluated when reconstructing the 3D model at each voxel in the scene. We show promising results in a multi-camera setup with seven cameras, where the object is reconstructed significantly better than with state-of-the-art methods, despite the occluding object in the center of the room.

Paper Nr: 248
Title:

Constructing Facial Expression Log from Video Sequences using Face Quality Assessment

Authors:

Mohammad A. Haque, Kamal Nasrollahi and Thomas B. Moeslund

Abstract: Facial expression logs from long video sequences effectively provide the opportunity to analyse facial expression changes for medical diagnosis, behaviour analysis, and smart home management. Generating a facial expression log involves expression recognition on each frame of a video. However, expression recognition performance greatly depends on the quality of the face image in the video. When a facial video is captured, it can be subject to problems like low resolution, pose variation, low brightness, and motion blur. Thus, this paper proposes a system for constructing facial expression logs that employs a face quality assessment method, and investigates its influence on the expression logs of long video sequences. A framework is defined to incorporate face quality assessment into the facial expression recognition and logging system. When assessing face quality, a face-completeness metric is used along with other state-of-the-art metrics. Instead of discarding all of the low-quality faces in a video sequence, a windowing approach is applied to select the best-quality faces at regular intervals. Experimental results show good agreement between the expression logs generated from all face frames and those generated by selecting the best faces at regular intervals.

Paper Nr: 249
Title:

Vision based System for Vacant Parking Lot Detection: VPLD

Authors:

Imen Masmoudi, Ali Wali, Anis Jamoussi and Adel M. Alimi

Abstract: The proposed system arises in the context of intelligent parking lot management and presents an approach for detecting and localizing vacant parking spots. Our system provides a camera-based solution that can deal with outdoor parking lots. It returns the real-time state of the parking lot, providing the number of available vacant places and their specific positions in order to guide drivers along the roads. To address real-world challenges, we combine an Adaptive Background Subtraction algorithm, which overcomes the problems of changing lighting and shadow effects, with the Speeded Up Robust Features algorithm, which is robust to scale changes and rotation. Our approach also introduces a new state, “Transition”, for the classification of parking place states.

Paper Nr: 275
Title:

Dynamic Scene Recognition based on Improved Visual Vocabulary Model

Authors:

Lin Yan-Hao and Lu-Fang GAO

Abstract: In this paper, we present a scene recognition framework that processes images and recognizes the scene they depict. We demonstrate and evaluate the performance of our system on a dataset of typical Oxford landmarks. We put forward a novel local k-medoids method for building a vocabulary and introduce a novel soft-assignment quantization method based on the Gaussian mixture model. We also introduce a Gaussian model in order to classify images into different scenes by calculating the probability that an image belongs to a scene, and we further improve the model by drawing out the consistent features and filtering out the noise features. Our experiments show that these methods indeed improve classification performance.

Paper Nr: 279
Title:

2D-3D Face Recognition via Restricted Boltzmann Machines

Authors:

Xiaolong Wang, Vincent Ly, Rui Guo and Chandra Kambhamettu

Abstract: This paper proposes a new scheme for the 2D-3D face recognition problem. Our proposed framework mainly consists of Restricted Boltzmann Machines (RBMs) and a correlation learning model. In the framework, a single-layer network based on RBMs is adopted to extract latent features over two different modalities. Furthermore, the latent hidden-layer features of the different models are projected to form a shared space based on correlation learning. Several different correlation learning schemes are then evaluated against the proposed scheme. We evaluate the advocated approach on a popular face dataset, FRGC V2.0. Experimental results demonstrate that the latent features extracted using RBMs are effective in improving the performance of correlation mapping for 2D-3D face recognition.

Paper Nr: 289
Title:

Training Optimum-Path Forest on Graphics Processing Units

Authors:

Adriana S. Iwashita, Marcos V. T. Romero, Alexandro Baldassin, Kelton A. P. Costa and Joao P. Papa

Abstract: In this paper, we present a Graphics Processing Unit (GPU)-based training algorithm for the Optimum-Path Forest (OPF) classifier. The proposed approach employs the idea of a vector-matrix multiplication to speed up both the traditional OPF training algorithm and a recently proposed Central Processing Unit (CPU)-based OPF training algorithm. Experiments on several public datasets have shown the efficiency of the proposed approach, which proved to be up to 14 times faster on some datasets. To the best of our knowledge, this is the first GPU-based implementation of the OPF training algorithm.
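A common instance of the vector-matrix idea mentioned here is computing all pairwise squared Euclidean distances with a single matrix product via the identity ||a - b||² = ||a||² + ||b||² - 2 a·b, which maps naturally onto GPU hardware. This is offered as a generic illustration of the trick, not as the paper's exact kernel:

```python
import numpy as np

def pairwise_sq_dists(X):
    # All pairwise squared distances at once via one matrix product,
    # instead of a nested loop over sample pairs.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)   # clamp tiny negatives from rounding

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 3))  # 5 samples, 3 features
D = pairwise_sq_dists(X)
```

On a GPU, the `X @ X.T` product is the part that parallelises well, which is the source of the reported speed-up.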

Paper Nr: 293
Title:

Face Recognition based on Binary Images for Link Selection

Authors:

Sanghun Lee, Soochang Kim, Young-hoon Kim and Chulhee Lee

Abstract: A face recognition system that utilizes binary facial images and a bitwise similarity calculation method is proposed for link selection between mobile devices. As a pre-processing step, normalized differences of Gaussians and facial region estimation were used to handle illumination conditions. Binary images were used to extract facial feature sets that did not exceed 700 bytes. Scale pyramids and XNOR+AND similarity scores were used for fast feature matching between reference data sets and pre-processed test data. The proposed method achieved about an 85.9% recognition rate on a database consisting of 135 facial images with various head poses, obtained by enrolling one reference data set per subject.
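One plausible reading of the "XNOR+AND" score (the abstract does not spell out the formula, so this is an assumption) is: XNOR marks the bit positions where two binary codes agree, and an AND with a validity mask keeps only the bits that lie inside the facial region. A minimal sketch with 8-bit toy codes:

```python
def xnor_and_similarity(a, b, mask, nbits=8):
    # XNOR: 1 where the two codes agree; AND with the mask restricts
    # the count to valid (facial-region) bit positions.
    agree = ~(a ^ b) & ((1 << nbits) - 1)
    return bin(agree & mask).count("1")

a    = 0b10110110   # toy binary face code
b    = 0b10110011   # toy probe code
mask = 0b11110000   # hypothetical mask: only the high nibble is valid

print(xnor_and_similarity(a, b, mask))  # → 4
```

Such bitwise scores are cheap on mobile hardware, which fits the link-selection use case described in the abstract.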

Paper Nr: 296
Title:

Boosted Random Forest

Authors:

Yohei Mishina, Masamitsu Tsuchiya and Hironobu Fujiyoshi

Abstract: The generalization ability of random forests is higher than that of other multi-class classifiers because of the effects of bagging and feature selection. Since random forests based on ensemble learning require many decision trees to obtain high performance, they are not suitable for implementation on small-scale hardware such as embedded systems. In this paper, we propose a boosted random forest in which a boosting algorithm is introduced into random forests. Experimental results show that the proposed method, which consists of fewer decision trees, has higher generalization ability compared to the conventional method.

Paper Nr: 297
Title:

Deformable Part Model based Multiple Pedestrian Detection for Video Surveillance in Crowded Scenes

Authors:

Lu Wang, Xiaoli Ji, Qingxu Deng and Mingxing Jia

Abstract: Pedestrian detection is a challenging task in video surveillance. The problem becomes more difficult when occlusion is prevalent. In this paper, we extend a deformable part-based pedestrian detector to pedestrian detection in crowded scenes by considering both body part detection responses and the detections' mutual spatial relationship. Specifically, we first decompose the full-body detector into several body part detectors, whose detection responses can be computed efficiently from the response of the full-body detector. Then, given the detection responses of the body part detectors, hypotheses are nominated by considering both detection scores and the responses' mutual spatial relationship. Finally, a local optimization process is applied to make the final decision, where an objective function encouraging detections with high confidence, high discriminability and low conflict with other detections is proposed to select the best candidate detections. Experimental results show the effectiveness of the proposed approach.

Paper Nr: 303
Title:

Logos Detection from Moving Vehicles

Authors:

A. Ben Hamida, M. Brulin and H. Nicolas

Abstract: To deal with road accidents, especially accidents caused by trucks carrying dangerous products, one possible solution is to monitor these vehicles’ passage. We aim at developing a software technique that confirms that every vehicle entering a tunnel has safely exited, guaranteeing that no accidents or breakdowns have occurred inside. To implement such a solution, we identify ingoing and outgoing trucks by extracting their distinctive marks, which differentiate each vehicle from the others: mounted logos such as license plates and pictograms. To verify the safe exit of a truck, we measure the similarity between the ingoing and outgoing vehicle images by comparing their detected symbols. In this paper, we present a monitoring system capable of extracting logos from moving trucks to verify their safe entries and exits. Both theoretical analyses and experimental results are provided to show the performance of the proposed system.

Paper Nr: 308
Title:

Fast Optimum-Path Forest Classification on Graphics Processors

Authors:

Marcos V. T. Romero, Adriana S. Iwashita, Luciene P. Papa, André N. Souza and João P. Papa

Abstract: Some pattern recognition techniques may present a high computational cost for learning the behaviour of samples. The Optimum-Path Forest (OPF) classifier has recently been developed to overcome such drawbacks. Although it can achieve faster training than some state-of-the-art techniques, OPF can be slower for testing in some situations. Therefore, in this paper we propose a graphics-card implementation of OPF classification, which proved more efficient than the traditional OPF while achieving similar accuracies.
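For context, the OPF testing step that the paper accelerates assigns each sample the label of the training node minimizing the path cost max(trained cost, distance). A plain CPU sketch of just this step (invented data, Euclidean distance; the GPU parallelizes the loop over training nodes) is:

```python
def opf_classify(sample, forest):
    """OPF testing step sketch.

    forest: list of (feature_vector, trained_cost, label) tuples from OPF
    training. The predicted label comes from the node minimizing
    max(trained_cost, distance(sample, node)).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best_cost, best_label = float("inf"), None
    for feat, cost, label in forest:
        c = max(cost, dist(sample, feat))
        if c < best_cost:
            best_cost, best_label = c, label
    return best_label
```

Each test sample's search over training nodes is independent, which is what makes the step a natural fit for graphics processors.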

Paper Nr: 313
Title:

Ghost Pruning for People Localization in Overlapping Multicamera Systems

Authors:

Muhammad Owais Mehmood, Sebastien Ambellouis and Catherine Achard

Abstract: In this paper, we propose a novel ghost pruning technique for multicamera people localization in overlapping scenarios. First, a synergy map is obtained from multiplanar projections across multiple overlapping cameras. Second, an occupancy map is generated by back projection from the synergy map across various image layers. This back-projected occupancy map is combined with constraints to remove ghosts. The novelty of this paper is the introduction of an intuitive ghost pruning technique that does not require any temporal information. Experiments on a sequence of the PETS 2009 dataset show a significant reduction in the number of ghosts. The focus and novelty of this paper lie in the ghost pruning module, but detection metrics show results comparable to those of complete, state-of-the-art multicamera object detection systems.

Paper Nr: 321
Title:

Fall Detection using Ceiling-mounted 3D Depth Camera

Authors:

Michal Kepski and Bogdan Kwolek

Abstract: This paper proposes an algorithm for fall detection using a ceiling-mounted 3D depth camera. The lying pose is separated from common daily activities by a k-NN classifier, which was trained on features expressing the head-floor distance, the person's area and the shape's major length-to-width ratio. In order to distinguish between intentional lying postures and accidental falls, the algorithm also employs motion between static postures. The experimental validation of the algorithm was conducted on realistic depth image sequences of daily activities and simulated falls. It was evaluated on more than 45000 depth images and gave 0% error. To reduce the processing overhead, an accelerometer was used to indicate a potential impact of the person and to trigger the analysis of depth images.
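The k-NN decision over features such as those in the abstract (head-floor distance, person area, length-to-width ratio) can be sketched as a majority vote among nearest training samples. The classifier below and the toy feature values in the test are illustrative only, not the paper's trained model.

```python
def knn_classify(sample, train, k=3):
    """Majority vote among the k training samples nearest to `sample`.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda t: dist(sample, t[0]))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)
```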

Paper Nr: 339
Title:

Novel Parallel Algorithm for Object Recognition with the Ensemble of Classifiers based on the Higher-Order Singular Value Decomposition of Prototype Pattern Tensors

Authors:

Boguslaw Cyganek and Katarzyna Socha

Abstract: In this paper a novel parallel algorithm for tensor-based classifiers for object recognition in digital images is presented. Classification is performed with an ensemble of base classifiers, each operating in an orthogonal subspace obtained with the Higher-Order Singular Value Decomposition (HOSVD) of the prototype pattern tensors. Parallelism of the system is realized through functional and data decompositions at different levels of computation. First, the parallel implementation of the HOSVD is presented. Then, a second level of parallelism is gained by partitioning the input dataset. Each of the partitions is used to train a separate tensor classifier of the ensemble. In addition to the computational speed-up and lower memory requirements, the accuracy of the ensemble proved to be higher than that of a single classifier. The method was tested in the context of object recognition in computer vision. The experiments show high accuracy and accelerated performance in both the training and classification stages.

Paper Nr: 348
Title:

A Robust Metric for the Evaluation of Visual Saliency Models

Authors:

Puneet Sharma and Ali Alsam

Abstract: Finding a robust metric for evaluating visual saliency algorithms has been the subject of research for decades. Motivated by the shuffled AUC metric, in this paper we propose a robust AUC metric that uses a statistical analysis of the fixation data to better judge the goodness of different saliency algorithms. To calculate the robust AUC metric, we use the first eigenvector obtained from the statistical analysis to define the area from which non-fixations are selected, thus mitigating the effect of the center bias. Our results show that the proposed metric performs similarly to the shuffled AUC metric, but given that the proposed metric is derived from the statistics of the data set, we believe that it is more robust.
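The AUC underlying both the shuffled and the proposed metric can be computed as the probability that the saliency score at a fixated location exceeds the score at a sampled non-fixation (the Mann-Whitney formulation, ties counted as half). A sketch over already-extracted score lists, assuming the non-fixation sampling region has been chosen beforehand:

```python
def auc_from_scores(fix_scores, nonfix_scores):
    """P(fixation score > non-fixation score) over all pairs.

    The metrics differ only in *where* nonfix_scores are sampled from
    (uniformly, from other images' fixations, or from an eigenvector-
    defined region); the AUC computation itself is the same.
    """
    wins = 0.0
    for f in fix_scores:
        for n in nonfix_scores:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(fix_scores) * len(nonfix_scores))
```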

Paper Nr: 398
Title:

Texture Classification with Fisher Kernel Extracted from the Continuous Models of RBM

Authors:

Tayyaba Azim and Mahesan Niranjan

Abstract: In this paper, we introduce a novel technique for deriving Fisher kernels from the Gaussian Bernoulli restricted Boltzmann machine (GBRBM) and the factored 3-way restricted Boltzmann machine (FRBM) to yield better texture classification results. GBRBM and FRBM are both stochastic probabilistic models that have already shown their suitability for modelling real-valued continuous data; however, they are not efficient models for classification based on their likelihood performances (Jaakkola and Haussler, 1999; Azim and Niranjan, 2013). We induce discrimination in these models with the help of a Fisher kernel that is constructed from the gradients of the parameters of the generative model. From the empirical results shown on two different texture data sets, i.e. Emphysema and Brodatz, we demonstrate how a useful texture classifier can be built from a very compact generative model that represents the data discriminately in the Fisher score space. The proposed discriminative technique allows us to achieve competitive classification performance on texture data sets without expanding the size of the generative model with a large number of hidden units. Comparative analysis also shows that the factored 3-way RBM is a good representative model of textures, giving rise to a Fisher score space that is less sparse and efficient for classification.

Area 4 - Applications and Services

Full Papers
Paper Nr: 131
Title:

On the Usage of Sensor Pattern Noise for Picture-to-Identity Linking through Social Network Accounts

Authors:

Riccardo Satta and Pasquale Stirparo

Abstract: Digital imaging devices have gained an important role in everyone’s life, due to continuously decreasing prices and the growing interest in photo sharing through social networks. As a result, everyone continuously leaves visual “traces” of his/her presence and life on the Internet, which can constitute precious data for forensic investigators. Digital Image Forensics is the task of analysing such digital images to collect evidence. In this field, the recent introduction of techniques able to extract a unique “fingerprint” of the source camera of a picture, e.g. based on the Sensor Pattern Noise (SPN), has paved the way for a series of useful tools for the forensic investigator. In this paper, we propose a novel usage of SPN: finding social network accounts belonging to a certain person of interest who has shot a given photo. This task, which we name Picture-to-Identity linking, can be useful in a variety of forensic cases, e.g., finding stolen camera devices, cyber-bullying, or on-line child abuse. We experimentally test a method for Picture-to-Identity linking on a benchmark data set of publicly accessible social network accounts collected from the Internet. We report promising results, which show that the technique has practical value for forensic practitioners.

Paper Nr: 198
Title:

Towards a Heuristic based Real Time Hybrid Rendering - A Strategy to Improve Real Time Rendering Quality using Heuristics and Ray Tracing

Authors:

Paulo Andrade, Thales Sabino and Esteban Clua

Abstract: Hybrid rendering combines the speed of raster-based rendering with the photorealism of ray-traced rendering in order to achieve both speed and visual quality for interactive applications. Since ray tracing images is a demanding task, a hybrid renderer must use ray tracing carefully in order to maintain an acceptable frame rate. Fixed solutions, where only shadows or reflective objects are ray traced, not only cannot guarantee real time but can also represent a waste of processing if the final result differs minimally from a raster-only result. In our work, we present a method to improve hybrid rendering by analysing the scene in real time and deciding what should be ray traced, in order to provide the best visual experience within acceptable frame rates.

Paper Nr: 200
Title:

Iris Liveness Detection Methods in Mobile Applications

Authors:

Ana F. Sequeira, Juliano Murari and Jaime S. Cardoso

Abstract: Biometric systems are vulnerable to different kinds of attacks. In particular, systems based on iris are vulnerable to direct attacks consisting of the presentation of a fake iris to the sensor in an attempt to access the system as if by a legitimate user. The analysis of some countermeasures against this type of attacking scheme is the problem addressed in the present paper. Several state-of-the-art methods were implemented and included in a feature selection framework so as to determine the best cardinality and the best subset that leads to the highest classification rate. Three different classifiers were used: discriminant analysis, K nearest neighbours and Support Vector Machines. The implemented methods were tested on existing databases for iris liveness purposes (Biosec and Clarkson) and on a new fake database which was constructed for the evaluation of iris liveness detection methods in the mobile scenario. The results suggest that this new database is more challenging than the others. Therefore, improvements are required in this line of research to achieve good performance in real-world mobile applications.

Paper Nr: 244
Title:

Virtual Touch Screen “VIRTOS” - Implementing Virtual Touch Buttons and Virtual Sliders using a Projector and Camera

Authors:

Takashi Homma and Katsuto Nakajima

Abstract: We propose a large interactive display with virtual touch buttons and sliders on a pale-colored flat wall. Our easy-to-install system consists of a front projector and a single commodity camera. A button touch is detected based on the area of the shadow cast by the user’s hand; this shadow becomes very small when the button is touched. The shadow area is segmented by a brief change of the button to a different color when a large foreground (i.e., the hand and its shadow) covers the button region. Therefore, no time consuming operations, such as morphing or shape analysis, are required. Background subtraction is used to extract the foreground region. The reference image for the background is continuously adjusted to match the ambient light. Our virtual slider is based on this touch-button mechanism. When tested, our scheme proved robust to differences in illumination. The response time for touch detection was about 150 ms. Our virtual slider has a quick response and proved suitable as a controller for a Breakout-style game.
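The shadow-area touch test described above can be sketched as background subtraction followed by counting dark changed pixels: while the hand hovers, it casts a visible shadow; at contact, the shadow shrinks to nearly nothing. The thresholds and the darkness heuristic below are invented for illustration and do not come from the paper.

```python
def shadow_area(frame, background, diff_thresh=30, shadow_level=60):
    """Count pixels that differ from the background reference and are dark,
    i.e. likely shadow rather than the hand itself (rough heuristic).

    frame / background: 2-D lists of grayscale intensities (0-255).
    """
    area = 0
    for row, brow in zip(frame, background):
        for p, b in zip(row, brow):
            if abs(p - b) > diff_thresh and p < shadow_level:
                area += 1
    return area

def is_button_touched(frame, background, touch_thresh=2):
    """The cast shadow collapses to (near) zero at the moment of contact."""
    return shadow_area(frame, background) <= touch_thresh
```

In the real system the background reference is continuously adapted to ambient light; here it is simply passed in as a fixed array.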

Paper Nr: 245
Title:

Quantitative Analysis of Pulmonary Emphysema using Isotropic Gaussian Markov Random Fields

Authors:

Chathurika Dharmagunawardhana, Sasan Mahmoodi, Michael Bennett and Mahesan Niranjan

Abstract: A novel texture feature based on isotropic Gaussian Markov random fields is proposed for the diagnosis and quantification of emphysema and its subtypes. Spatially varying parameters of isotropic Gaussian Markov random fields are estimated, and their local distributions, constructed using normalized histograms, are used as effective texture features. These features integrate the essence of both statistical and structural properties of the texture. Isotropic Gaussian Markov random field parameter estimation is computationally more efficient than methods using other MRF models and is suitable for the classification of emphysema and its subtypes. Results show that the novel texture features perform well in discriminating different lung tissues, giving comparable results with the current state-of-the-art texture-based emphysema quantification. Furthermore, supervised lung parenchyma tissue segmentation is carried out, and effective pathology extents and successful tissue quantification are achieved.

Paper Nr: 286
Title:

Event-driven Dynamic Platform Selection for Power-aware Real-time Anomaly Detection in Video

Authors:

Calum G Blair and Neil M Robertson

Abstract: In surveillance and scene awareness applications using power-constrained or battery-powered equipment, performance characteristics of processing hardware must be considered. We describe a novel framework for moving processing platform selection from a single design-time choice to a continuous run-time one, greatly increasing flexibility and responsiveness. Using Histogram of Oriented Gradients (HOG) object detectors and Mixture of Gaussians (MoG) motion detectors running on 3 platforms (FPGA, GPU, CPU), we characterise the processing time, power consumption and accuracy of each task. Using a dynamic anomaly measure based on contextual object behaviour, we reallocate these tasks between processors to provide faster, more accurate detections when an increased anomaly level is seen, and reduced power consumption in routine or static scenes. We compare power- and speed-optimised processing arrangements with automatic event-driven platform selection, showing the power and accuracy tradeoffs between each. Real-time performance is evaluated on a parked vehicle detection scenario using the i-LIDS dataset. Automatic selection is 10% more accurate than power-optimised selection, at the cost of 12W higher average power consumption in a desktop system.
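The run-time platform choice can be caricatured as a rule over per-platform power/accuracy profiles: maximize accuracy when the anomaly measure is high, otherwise minimize power among platforms meeting an accuracy floor. Every number, threshold and name below is invented for illustration and does not come from the paper's measurements.

```python
# Illustrative profiles only -- NOT measurements from the paper.
PLATFORMS = {
    "FPGA": {"power_w": 5.0,   "accuracy": 0.80},
    "GPU":  {"power_w": 120.0, "accuracy": 0.90},
    "CPU":  {"power_w": 45.0,  "accuracy": 0.85},
}

def select_platform(anomaly, threshold=0.5, accuracy_floor=0.75):
    """High anomaly: pick the most accurate platform.
    Routine scenes: pick the lowest-power platform meeting the floor."""
    if anomaly >= threshold:
        return max(PLATFORMS, key=lambda p: PLATFORMS[p]["accuracy"])
    ok = [p for p in PLATFORMS if PLATFORMS[p]["accuracy"] >= accuracy_floor]
    return min(ok, key=lambda p: PLATFORMS[p]["power_w"])
```

The paper's framework makes this decision continuously at run time, per task, rather than once at design time.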

Paper Nr: 340
Title:

Optimization of Endoscopic Video Stabilization by Local Motion Exclusion

Authors:

Thomas Gross, Navya Amin, Marvin C. Offiah, Susanne Rosenthal, Nail El-Sourani and Markus Borschbach

Abstract: Hitherto, video stabilization algorithms for different types of videos have been proposed. Our work focuses mainly on developing stabilization algorithms for endoscopic videos, which include distortions peculiar to endoscopy. In this paper, we deal with the optimization of the motion detection procedure, which is the most important step in the development of a video stabilization algorithm. We present a robust motion estimation procedure to enhance the quality of the outcome. The outcome of the subsequent steps in the stabilization, namely motion compensation and image composition, depends on the precision of the motion estimation step. The results of a previous version of the stabilization algorithm are compared here to a new optimized version. Furthermore, improvements of the outcomes measured using video quality estimation methods are also discussed.

Short Papers
Paper Nr: 30
Title:

Mastering the Art of Persuasion - Intelligent Tutoring System for Presenters

Authors:

Anh-Tuan Nguyen, Wei Chen and Matthias Rauterberg

Abstract: Public speaking is a non-trivial task, since it is affected by how nonverbal behaviors are expressed. Practicing to deliver the appropriate expressions is difficult, as they are mostly given subconsciously. This paper presents our empirical study on the nonverbal behaviors of presenters. This information was used as the ground truth to develop an intelligent tutoring system. The system can capture bodily characteristics of presenters via a depth camera, interpret this information in order to assess the quality of the presentation, and then give feedback to users. Feedback is delivered immediately through a virtual conference room, in which the reactions of the simulated avatars are controlled based on the performance of the presenter.

Paper Nr: 78
Title:

Improved Pulse Detection from Head Motions using DCT

Authors:

Ramin Irani, Kamal Nasrollahi and Thomas B. Moeslund

Abstract: The heart's pulsation sends blood throughout the body. The rate at which the heart performs this vital task, the heartbeat rate, is of crucial importance to the body. Therefore, measuring the heartbeat rate, a.k.a. pulse detection, is very important in many applications, especially medical ones. To measure it, physicians traditionally either sense the pulsations of some blood vessels or install sensors on the body. In either case, there is a need for physical contact between the sensor and the body to obtain the heartbeat rate. This might not always be feasible, for example, in applications like remote patient monitoring. In such cases, contactless sensors, mostly based on computer vision techniques, are emerging as interesting alternatives. This paper proposes such a system, in which the heartbeats (pulses) are detected from the subtle motions that appear on the face due to blood circulation. The proposed system has been tested with different facial expressions. The experimental results show that the proposed system is accurate and robust and outperforms the state of the art.
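One way to realise DCT-based pulse estimation is to take the DCT of a head-motion trace and pick the strongest coefficient inside a plausible pulse band. The naive DCT-II below and the band limits (0.75-4 Hz, i.e. 45-240 bpm) are assumptions for illustration, not the authors' pipeline.

```python
import math

def dct2(signal):
    """Naive O(n^2) DCT-II (unnormalized)."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(signal)) for k in range(n)]

def estimate_bpm(trace, fs, lo_hz=0.75, hi_hz=4.0):
    """Heart rate (bpm) from a motion trace sampled at fs Hz.

    DCT-II coefficient k corresponds to frequency k * fs / (2 * n) Hz;
    we pick the strongest coefficient inside the pulse band.
    """
    n = len(trace)
    coeffs = dct2(trace)
    band = [k for k in range(1, n) if lo_hz <= k * fs / (2 * n) <= hi_hz]
    k_peak = max(band, key=lambda k: abs(coeffs[k]))
    return 60.0 * k_peak * fs / (2 * n)
```

On a synthetic 1.2 Hz oscillation sampled at 30 fps this recovers roughly 72 bpm; a real pipeline would first track facial feature points to obtain the trace.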

Paper Nr: 94
Title:

Client-side Mobile Visual Search

Authors:

Andreas Hartl, Dieter Schmalstieg and Gerhard Reitmayr

Abstract: Visual search systems present a simple way to obtain information about our surroundings, our location or an object of interest. Typically, mobile applications of visual search remotely connect to large-scale systems capable of dealing with millions of images. Querying such systems may induce considerable delays, which can severely harm usability or even lead to complete rejection by the user. In this paper, we investigate an interim solution and system design using a local visual search system for embedded devices. We optimized a traditional visual search system to decrease runtime and storage space in order to scale to thousands of training images on current off-the-shelf smartphones. We demonstrate practical applicability in a prototype for mobile visual search on the same target platform. Compared with the unmodified version of the pipeline, we achieve up to a two-fold speed-up in runtime, save 85% of storage space and provide substantially increased recognition performance. In addition, we integrate the pipeline with a popular Augmented Reality SDK on Android devices and use it as a pre-selector for tracking datasets. This allows a large number of tracking targets to be used instantly without requiring user intervention or costly server-side recognition.

Paper Nr: 116
Title:

MobBIO: A Multimodal Database Captured with a Portable Handheld Device

Authors:

Ana F. Sequeira, João C. Monteiro, Ana Rebelo and Hélder P. Oliveira

Abstract: Biometrics represents a return to a natural way of identification: testing someone by what (s)he is, instead of relying on something (s)he owns or knows, seems likely to be the way forward. Biometric systems that include multiple sources of information are known as multimodal. Such systems are generally regarded as an alternative to fight a variety of problems all unimodal systems stumble upon. One of the main challenges in the development of biometric recognition systems is the shortage of publicly available databases acquired under real unconstrained working conditions. Motivated by this need, the MobBIO database was created using an Asus EeePad Transformer tablet, with mobile biometric systems in mind. The proposed database is composed of three modalities: iris, face and voice.

Paper Nr: 132
Title:

Towards Fully Automated Person Re-identification

Authors:

Matteo Taiana, Dario Figueira, Athira Nambiar, Jacinto Nascimento and Alexandre Bernardino

Abstract: In this work we propose an architecture for fully automated person re-identification in camera networks. Most works on re-identification operate with manually cropped images both for the gallery (training) and the probe (test) set. However, in a fully automated system, re-identification algorithms must work in series with person detection algorithms, whose output may contain false positives, detections of partially occluded people and detections with bounding boxes misaligned to the people. These effects, when left untreated, may significantly jeopardise the performance of the re-identification system. To tackle this problem we propose modifications to classical person detection and re-identification algorithms, which enable the full system to deal with occlusions and false positives. We show the advantages of the proposed method on a fully labelled video data set acquired by 8 high-resolution cameras in a typical office scenario at working hours.

Paper Nr: 135
Title:

Appearance-based Eye Control System by Manifold Learning

Authors:

Ke Liang, Youssef Chahir, Michèle Molina, Charles Tijus and François Jouen

Abstract: Eye movements are increasingly employed to study usability issues in HCI (Human-Computer Interaction) contexts. In this paper we introduce our appearance-based eye control system, which utilizes 5 specific eye movements, namely closed-eye movement and eye movements with gaze fixation at four positions (up, down, right, left), for HCI applications. In order to measure these eye movements, we employ a fast appearance-based gaze tracking method with a manifold learning technique. First, we propose to concatenate local Center-Symmetric Local Binary Pattern (CS-LBP) descriptors for each subregion of the eye image to form an eye appearance feature vector. A calibration phase is then introduced to construct training samples by spectral clustering. After that, Laplacian Eigenmaps are applied to the training set and the unseen input together to obtain the structure of the eye manifolds. Finally, we can infer the eye movement of the new input from its distances to the clusters in the training set. Experimental results demonstrate that our system, with a quick 4-point calibration, not only reduces the run-time cost but also provides a way to measure eye movements without measuring gaze coordinates for an HCI application such as our eye control system.
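The CS-LBP descriptor mentioned above compares the four center-symmetric neighbour pairs around a pixel, yielding a 4-bit code per pixel (histogrammed over subregions to form the feature vector). A sketch following the standard CS-LBP definition, with an illustrative threshold value:

```python
def cs_lbp_code(patch, t=0.01):
    """4-bit CS-LBP code for the center of a 3x3 patch.

    Compares the four center-symmetric neighbour pairs; bit i is set
    when the first pixel exceeds its opposite by more than threshold t.
    """
    # 8-neighbours clockwise from top-left; pair i is (n[i], n[i + 4])
    n = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
         patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i in range(4):
        if n[i] - n[i + 4] > t:
            code |= 1 << i
    return code
```

Unlike plain LBP's 8-bit codes, CS-LBP produces only 16 possible codes, giving shorter and flatter histograms per subregion.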

Paper Nr: 176
Title:

Shape Segmentation using Medial Point Clouds with Applications to Dental Cast Analysis

Authors:

Jacek Kustra, Andrei Jalba and Alexandru Telea

Abstract: We present an automatic surface segmentation method for dental cast scans based on the point density properties of the surface skeleton of such shapes. We produce quasi-flat segments separated by soft ridges, in contrast to classical surface segmentation methods that require sharp ridges. We compute the surface skeleton by a fast 3D skeletonization technique followed by its regularization using surface geodesics. We segment the resulting skeleton by a mean-shift approach and transfer the segmentation results back to the surface. We demonstrate our results on an industrial dental-cast segmentation application and several generic 3D shape models.

Paper Nr: 261
Title:

Automated Arteriole and Venule Recognition in Retinal Images using Ensemble Classification

Authors:

M. M. Fraz, A. R. Rudincka, C. G. Owen, D. P. Strachan and S. A. Barman

Abstract: The shape and size of retinal vessels have been prospectively associated with cardiovascular outcomes in adult life, and with cardiovascular precursors in early life, suggesting life course patterning of vascular development. However, the shape and size of arterioles and venules may show similar or opposing associations with disease precursors / outcomes. Hence accurate detection of vessel type is important when considering cardio-metabolic influences on vascular health. This paper presents an automated method of identifying arterioles and venules, based on colour features, using an ensemble classifier of bootstrapped decision trees. The classifier utilizes pixel-based features, vessel profile based features and vessel segment based features from both the RGB and HSI colour spaces. To the best of our knowledge, a decision-tree-based ensemble classifier has been used for the first time for arteriole/venule classification. The classification is performed across the entire image, including the optic disc. The methodology is evaluated on 3149 vessel segments from 40 colour fundus images acquired from an adult population based study in the UK (EPIC Norfolk), resulting in an 83% detection rate. This methodology can be further developed into an automated system for the measurement of the arterio-venous ratio and the quantification of arterio-venous nicking in retinal images, which may be of use in identifying those at high risk of cardiovascular events, in need of early intervention.

Paper Nr: 276
Title:

Image Registration to Assist the Diagnosis of Pelvic Floor Disorder in MR Defecography

Authors:

Cicero L. Costa, Marcos A. Batista, Denise Guliato, Tulio A. A. Macedo and Celia Z. Barcelos

Abstract: Over the last decades, interest in the use of defecography for the investigation of defecation problems and pelvic floor disorders has increased. MR defecography assists in the diagnosis of pelvic floor weakening, fecal incontinence, painful defecation and genital prolapse. To identify an abnormal morphological variation of the structures relevant for the diagnosis, the radiologist derives several static measures at different moments and during different maneuvers of the exam. However, there is poor agreement between independent observers on the measurement of the anorectal angle, which is a critical parameter for the interpretation of the defecography. With the aim of reducing the inter-observer variability and assisting the radiologist in the interpretation of MR defecography for the diagnosis of fecal incontinence, we propose calculating the dynamic changes of the anorectal junction during the defecation activity. To that end, we propose to propagate, automatically, the location of pre-defined landmarks throughout the frames of the MR defecography, for each maneuver, via image registration based on a variational model. The analysis of the results shows that our proposal succeeded in propagating the initial landmarks and in calculating the dynamic changes during each maneuver.

Paper Nr: 307
Title:

People Re-identification using Deep Convolutional Neural Network

Authors:

Guanwen Zhang, Jien Kato, Yu Wang and Kenji Mase

Abstract: One key issue for people re-identification is to find good features or representations to bridge the gaps among different appearances of the same person, which are introduced by large variances in viewpoint, illumination and non-rigid deformation. In this paper, we create a deep convolutional neural network (deep CNN) to solve this problem and integrate feature learning and re-identification into one framework. In order to deal with such a ranking-like comparison problem, we introduce a linear support vector machine (linear SVM) to replace the conventional softmax activation function. Instead of learning the cross-entropy loss, we adopt a margin-based loss on pair-wise images to measure the similarity of the comparing pair. Although the proposed model is quite simple, the experimental results show encouraging performance of our method.
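The margin-based pair loss that replaces softmax can be sketched as an SVM-style hinge loss on a scalar similarity score, with label +1 for same-identity pairs and -1 otherwise. The function name and the margin value are illustrative assumptions, not the paper's exact formulation.

```python
def pairwise_hinge_loss(score, same_identity, margin=1.0):
    """Hinge loss on a scalar similarity score for one image pair.

    y = +1 for the same person, -1 otherwise; the loss is zero once the
    score is on the correct side of the margin (a sketch of replacing the
    softmax/cross-entropy layer with a margin-based SVM objective).
    """
    y = 1.0 if same_identity else -1.0
    return max(0.0, margin - y * score)
```

Minimizing this over pairs pushes same-identity scores above the margin and different-identity scores below its negative, which matches the ranking-like nature of re-identification better than per-class probabilities.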

Paper Nr: 375
Title:

Towards a More Effective Way of Presenting Virtual Reality Museums Exhibits

Authors:

Constantinos Terlikkas and Charalambos Poullis

Abstract: In this work, we present the design, development and comparison of two immersive applications using Virtual Reality CAVE technology: a virtual museum following the traditional paradigm for museum exhibit placement, and a virtual museum where no spatial restrictions exist. Our goal is to identify the most effective method of arranging museum exhibits when no constraints are present. Additionally, we present the significance of the folklore museum in Cyprus, since this affects the design process.

Paper Nr: 376
Title:

A QoS Control Method for Camera Network based People Detection Systems

Authors:

Toru Abe, Adrian Agusta, Yuto Mitsuhashi and Takuo Suganuma

Abstract: Various people detection systems based on camera networks have been developed, and their services (output of users’ locations) are utilized in a variety of applications. Usually, each application requires a people detection system to keep its quality-of-service (QoS) at a certain level. However, required system QoS levels vary widely among different applications, and the QoS requirements of each application range over various QoS factors, such as the coverage area, resolution, and frequency of users’ locations. Moreover, a trade-off between QoS factors arises from limitations on the system resources, which fluctuate due to changes in circumstances. Consequently, it is difficult for such systems to stably fulfill the diverse QoS requirements of individual applications. To deal with these difficulties, we propose a QoS control method for camera network based people detection systems. Taking into account the trade-off between several QoS factors under limited and varied system resources, our method dynamically adjusts system parameters and controls system QoS to provide each application with users’ locations at the required QoS level. Experimental results indicate that our method maintains system QoS well under changes in application requirements and system resources.

Paper Nr: 381
Title:

Effects of Stereoscopy on a Human-Computer Interface for Network Centric Operations

Authors:

Alessandro Zocco, Davide Cannone and Lucio Tommaso De Paolis

Abstract: Network Centric Warfare can be accomplished thanks to a network of geographically distributed forces, granting a flow of increased contents, quality and timeliness of information, building up a shared situational awareness. When this flow is displayed to an operator, there is the possibility of reaching a state of information overload. To avoid this situation, new ways to conceive the interface between human and computer must be evaluated. This paper proposes an experimental stereoscopic 3D synthetic environment aimed at improving the understanding of modern battle spaces. This facility is part of the LOKI Project, a Command and Control system for Electronic Warfare developed by Elettronica S.p.A. We discuss technical details of the system and describe a preliminary usability study. This first evaluation is very positive and encourages continuing research into Human-Computer Interaction for military applications.

Posters
Paper Nr: 27
Title:

Grasping Guidance for Visually Impaired Persons based on Computed Visual-auditory Feedback

Authors:

Michael Hild and Fei Cheng

Abstract: We propose a system for guiding a visually impaired person toward a target product on a store shelf using visual-auditory feedback. The system uses a hand-held, monopod-mounted CCD camera as its sensor and recognizes a target product in the images using sparse feature vector matching. Processing is divided into two phases: In Phase 1, the system acquires an image, recognizes the target product, and computes the product location in the image. Based on the location data, it issues a voice-based command to the user, in response to which the user moves the camera closer toward the target product and adjusts the direction of the camera in order to keep the target product in the camera's field of view. When the user's hand has reached grasping range, the system enters Phase 2, in which it guides the user's hand to the target product. The system is able to keep the camera's direction steady during grasping even though the user has a tendency to unintentionally rotate the camera because of the twisting of his upper body while reaching out for the product. Camera direction correction is made possible by a digital compass attached to the camera. The system is also able to guide the user's hand right in front of the product even though the exact product position cannot be determined directly in the last stage, because the product disappears behind the user's hand. Experiments with our prototype system show that system performance is highly reliable in Phase 1 and reasonably reliable in Phase 2.

Paper Nr: 37
Title:

Multilevel Group Analysis on Bayesian in fMRI Time Series

Authors:

Feng Yang, Kuang Fu and Ai Zhou

Abstract: This paper proposes a method for processing fMRI time series based on Bayesian inference for group analysis. The method uses multiple levels, divided by session, subject and group, as paired comparisons to reinforce the posterior probability in the group analysis, using results from single subjects as priors. It also incorporates classical statistics, i.e., a t-test, to obtain voxel activation at the subject level as a prior for Bayesian inference at the group level. The approach effectively reduces computational expense and complexity, and shows robust Bayesian inference for group analysis.

Paper Nr: 53
Title:

Automated Segmentation of Cell Structure in Microscopy Images

Authors:

Nicole Kerrison and Andy Bulpitt

Abstract: Understanding cell movement is important in helping to prevent and cure damage and disease. Increasingly, this study is performed by obtaining video footage of cells in vitro. However, as the number of images obtained for cellular analysis increases, so does the need for automated segmentation of these images, since segmentation is difficult and time-consuming to perform manually. We propose to automate the segmentation of all parts of a cell visible in DIC microscopy video frames by providing an efficient method for correcting the lighting bias and a novel combination of techniques to detect different cell areas and isolate the parts of the cell vital to its movement. To the best of our knowledge, ours is the only method able to automatically detect thin cellular membranes in DIC images. We show that the method can be used to isolate features in order to detect variations vital to motility in differently affected cells.

Paper Nr: 56
Title:

Online Brain Tissue Classification in Multiple Sclerosis using a Scanner-integrated Image Analysis Pipeline

Authors:

Refaat E. Gabr, Amol Pednekar, Xiaojun Sun and Ponnada A. Narayana

Abstract: With recent advances in the field, magnetic resonance imaging (MRI) has become a powerful quantitative imaging modality for the study of neurological disorders. The quantitative power of MRI is significantly enhanced with multi-contrast and high-resolution techniques. However, those techniques generate large volumes of data which, combined with the sophisticated state-of-the-art image analysis methods, result in a very high computational load. In order to keep the scanner workflow uninterrupted, processing has to be performed off-line, leading to delayed access to the quantitative results. This time delay also precludes the evaluation of data quality, and prevents the caregiver from using the results of quantitative analysis to guide subsequent studies. We developed a scanner-integrated system for fast online processing of dual-echo fast spin-echo and fluid-attenuated inversion recovery images to quickly classify different brain tissues and generate white matter lesion maps in patients with multiple sclerosis (MS). The segmented tissues were imported back into the patient database on the scanner for clinical interpretation by the radiologist. The analysis pipeline included rigid-body registration, skull stripping, nonuniformity correction, and tissue segmentation. In six MS patients, the average time taken by the processing pipeline to the final segmentation of the brain into white matter, grey matter, cerebrospinal fluid, and white matter lesions was ~2 min, making it feasible to generate lesion maps immediately after the scan.

Paper Nr: 68
Title:

Multi Touch Shape Recognition for Projected Capacitive Touch Screen

Authors:

I. Guarneri, A. Capra, G. M. Farinella, F. Cristaldi and S. Battiato

Abstract: Devices equipped with touch screens are nowadays widely diffused. One of the most significant factors behind this success is their easy and intuitive interface, which allows friendly user-device interaction. Touch shape recognition is a topic that has contributed to the realization of these types of interfaces. In this paper we propose a solution able to discriminate among different classes of touch shapes. We focus on the problem of recognizing typical touches performed on mid-sized devices such as tablets and phablets. The proposed solution discriminates among a single finger, multiple fingers and the palm, reaching high recognition accuracy while maintaining low computational complexity.

Paper Nr: 144
Title:

Remote Execution vs. Simplification for Mobile Real-time Computer Vision

Authors:

Philipp Hasper, Nils Petersen and Didier Stricker

Abstract: Mobile implementations of computationally complex algorithms are often prohibitive due to performance constraints. There are two possible solutions for this: (1) adopting a faster but less powerful approach which results in a loss of accuracy or robustness. (2) using remote data processing which suffers from limited bandwidth and communication latencies and is difficult to implement in real-time interactive applications. Using the example of a mobile Augmented Reality application, we investigate those two approaches and compare them in terms of performance. We examine different workload balances ranging from extensive remote execution to pure onboard processing. The performance behavior is systematically analyzed under different network qualities and device capabilities. We found that even with a fast network connection, optimizing for maximum offload (thin-client configuration) is at a disadvantage compared to splitting the workload between remote system and client. Compared to remote execution, a simplified onboard algorithm is only preferable if the classification data set is below a certain size.

Paper Nr: 189
Title:

Kinect based People Identification System using Fusion of Clustering and Classification

Authors:

Aniruddha Sinha, Diptesh Das, Kingshuk Chakravarty, Amit Konar and Sudeepto Dutta

Abstract: The demand for human identification in a non-intrusive manner has risen in recent years. Several works have already addressed this problem using gait-cycle detection from human skeleton data, with Microsoft Kinect as the capture sensor. In this paper we propose a novel method for automatic human identification in real time that efficiently fuses supervised and unsupervised learning on gait-based features using Dempster-Shafer (DS) theory. The performance of the proposed fusion-based algorithm is compared with that of the standard supervised and unsupervised algorithms; the proposed algorithm achieves 71% recognition accuracy.
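The Dempster-Shafer fusion step named in the abstract can be illustrated with Dempster's rule of combination. The sketch below is a generic, minimal implementation of the rule over a two-identity frame of discernment; the mass values, labels, and the way masses would be derived from the supervised and unsupervised classifiers are hypothetical, not taken from the paper.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments (dicts of frozenset -> mass)
    with Dempster's rule, normalizing out the conflicting mass."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    norm = 1.0 - conflict  # assumes the two sources are not in total conflict
    return {k: v / norm for k, v in combined.items()}

# Hypothetical example: a supervised and an unsupervised classifier each
# assign mass to candidate identities A and B, plus ignorance (whole frame).
A, B = frozenset({"A"}), frozenset({"B"})
theta = A | B
m_sup = {A: 0.6, B: 0.1, theta: 0.3}
m_unsup = {A: 0.5, B: 0.2, theta: 0.3}
fused = dempster_combine(m_sup, m_unsup)
```

Agreement between the two sources concentrates mass on identity A, which is the behaviour such a fusion scheme relies on.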

Paper Nr: 204
Title:

Hand Recognition using Texture Histograms - A Proposed Technique for Image Acquisition and Recognition of the Human Palm

Authors:

Luiz Antônio Pereira Neves, Dionei José Müller, Fellipe Alexandre, Pedro Machado Guillen Trevisani, Pedro Santos Brandi and Rafael Junqueira

Abstract: This paper presents a technique for biometric identification based on image acquisition of the palm of the human hand. A computer system called Palm Print Authentication System (PPAS) was implemented using the proposed technique; it identifies the human palm by processing texture and geometrical data, employing the Local Binary Pattern (LBP) method. The proposed methodology has four steps: image acquisition; image pre-processing (normalization); segmentation for biometric extraction; and hand recognition. The technique has been tested on 50 different images, and the tests have produced promising results, showing that the approach is not only robust but also quite efficient.
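The Local Binary Pattern method the abstract names admits a compact illustration. The sketch below shows the basic 3x3 LBP operator and a histogram descriptor in its standard generic formulation; it is not the authors' implementation, and the clockwise neighbour ordering is an arbitrary convention.

```python
import numpy as np

def lbp_pixel(patch):
    """LBP code of the centre pixel of a 3x3 patch: each neighbour
    contributes one bit, set when it is >= the centre value."""
    c = patch[1, 1]
    # Neighbour order: clockwise starting at top-left (a convention).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i, j] >= c:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels,
    usable as a texture descriptor for matching."""
    h = np.zeros(256, dtype=int)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            h[lbp_pixel(img[i - 1:i + 2, j - 1:j + 2])] += 1
    return h
```

A palm image would typically be divided into cells, with one such histogram per cell concatenated into the final feature vector.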

Paper Nr: 250
Title:

Interfacing Assessment using Facial Expression Recognition

Authors:

Rune A. Andersen, Kamal Nasrollahi, Thomas B. Moeslund and Mohammad A. Haque

Abstract: One of the most important issues in gaming is the choice of interfacing technology. The gamepad has traditionally been a popular interface for the gaming industry, but recently motion-controlled interfacing has been widely adopted. The purpose of this paper is to study whether the motion-controlled interface is a feasible alternative to the gamepad when evaluated from a user experience point of view. To do so, a custom game has been developed and 25 test subjects have been asked to play the game using both types of interfaces. To evaluate the users' experiences during the game, hedonic and pragmatic quality are assessed using both subjective and objective evaluation methods, in order to cross-validate the obtained results. An application of computer vision, facial expression recognition, has been used as a non-obtrusive objective hedonic measure, while the score obtained by the user during the game has been used as a pragmatic quality measure. To the best of our knowledge, facial expression recognition has not been used before to assess the hedonic quality of game interfaces. The thorough experimental results show that the user experience of the motion-controlled interface is significantly better than that of the gamepad interface, in terms of both hedonic and pragmatic quality. The facial expression recognition system proved to be a useful non-obtrusive way to objectively evaluate the hedonic quality of the interfacing technologies.

Paper Nr: 285
Title:

A General-purpose Crowdsourcing Platform for Mobile Devices

Authors:

Ariel Amato, Felipe Lumbreras and Angel D. Sappa

Abstract: This paper presents details of a general-purpose micro-task on-demand platform based on the crowdsourcing philosophy. The platform was specifically developed for mobile devices in order to exploit their strengths, namely: i) massivity, ii) ubiquity and iii) embedded sensors. The combined use of mobile platforms and the crowdsourcing model makes it possible to tackle tasks from the simplest to the most complex. User experience is the highlighted feature of the platform, for both the task-proposer and the task-solver. Tools appropriate to a specific task are provided to the task-solver so that the job can be performed in a simpler, faster and more appealing way. Moreover, a task can be easily submitted by selecting predefined templates, which cover a wide range of possible applications. Examples of its usage in computer vision and computer games illustrate the potential of the platform.

Paper Nr: 370
Title:

An Application to Interact with 3D Models Reconstructed from Medical Images

Authors:

Félix Paulano, Juan J. Jiménez and Rubén Pulido

Abstract: Although the reconstruction of 3D models from medical images is not an easy task, there are many algorithms to perform it. However, the reconstructed models are usually large, contain many outliers and often lack a correct topology. To interact with these models, the methods must be fast and robust. In this paper, we present an application that enables interaction with models reconstructed from medical images. The application uses Marching Cubes to generate triangle soups from the medical scans. The user can then define models by selecting sets of triangles. Once the models have been defined, the application allows the user to interact with them. In addition, detailed collision detection is calculated between the models in the scene, not only to prevent models from interpenetrating but also to determine which triangles overlap. The calculation of distances and nearest points provides a visual aid while the user interacts with the models. Finally, the Leonar3Do system has been incorporated to improve the interaction and to provide stereo visualization. The presented application can be used in the field of education, since users can manipulate individual body parts to examine them. Moreover, the application can be utilized in the preparation of an intervention, or even as a guide for it, since it enables the use of models reconstructed from real medical scans.

Paper Nr: 374
Title:

Immersive Visualizations in a VR Cave Environment for the Training and Enhancement of Social Skills for Children with Autism

Authors:

Skevi Matsentidou and Charalambos Poullis

Abstract: Autism is a complex developmental disorder characterized by severe impairment in social, communicative, cognitive and behavioral functioning. Several studies have investigated the use of technology and Virtual Reality for social skills training for people with autism, with promising and encouraging results (D. Strickland, 1997; Parsons S. & Cobb S., 2011). In addition, it has been demonstrated that Virtual Reality technologies can be used effectively by some people with autism, and that they have helped or could help them in the real world (S. Parsons, A. Leonard, P. Mitchell, 2006; S. Parsons, P. Mitchell, 2002). The goal of this research is to design and develop an immersive visualization application in a VR CAVE environment for educating children with autism. The main goal of the project is to help children with autism learn and enhance their social skills and behaviours. Specifically, we will investigate whether a VR CAVE environment can be used effectively by children with mild autism, and whether children can benefit from it and apply the knowledge in their real lives.

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 23
Title:

3D Object Recognition based on the Reference Point Ensemble

Authors:

Toshiaki Ejima, Shuichi Enokida, Toshiyuki Kouno, Hisashi Ideguchi and Tomoyuki Horiuchi

Abstract: In this paper, we propose a high-performance 3D recognition method based on the reference point ensemble, a natural extension of the generalized Hough transform. The reference point ensemble consists of several reference points, each of which is color-coded green or red: the red reference points are used to verify hypotheses, and the green reference points are used for Hough voting. The configuration of the reference points in the ensemble is designed according to the model shape. In the proposed method, a set of reference point ensembles is generated from the local features of a given 3D scene. Each generated reference point ensemble is a hypothetical 3D pose of a given object in the scene. Hypotheses that pass verification by the red reference points are used for Hough voting. Hough voting is performed independently in each green point space, which reduces the voting space to three dimensions. Although a six-dimensional voting space is generally needed for 3D recognition, in the proposed method the six-dimensional voting space is decomposed into a few three-dimensional spaces. This decomposition, and the verification using green and red reference points, have been demonstrated experimentally to be effective for 3D recognition. In other words, effective recognition is achieved by skillfully switching between two modes: (A) individual mode, in which each hypothesis is voted independently in its green Hough space and verified with the red reference points; and (B) ensemble mode, in which registration into the PHL (promising hypothesis list) is verified and the total votes are aggregated. This mode-switching mechanism is the most significant characteristic of the proposed method.

Paper Nr: 71
Title:

Likelihood Functions for Errors-in-variables Models - Bias-free Local Estimation with Minimum Variance

Authors:

Kai Krajsek, Christian Heinemann and Hanno Scharr

Abstract: Parameter estimation in the presence of noisy measurements characterizes a wide range of computer vision problems, and many of them can be formulated as errors-in-variables (EIV) problems. In this paper we provide a closed-form likelihood function for EIV problems with arbitrary covariance structure. Previous approaches either do not offer a closed form, are restricted in the structure of the covariance matrix, or involve nuisance parameters. Using such a likelihood function, we provide a theoretical justification for well-established estimators of EIV models. Furthermore, we provide two maximum likelihood estimators for EIV parameters, a straightforward extension of a well-known estimator and a novel, local estimator, as well as confidence bounds by means of the Cramer-Rao Lower Bound. We show their performance through numerical experiments on optical flow estimation, as it is well explored and understood in the literature. The straightforward extension turned out to have oscillating behavior, while the novel, local one performs favorably with respect to other methods. For small motions, it even performs better than an excellent global optical flow algorithm at the majority of pixel locations.

Paper Nr: 81
Title:

Feature Matching using CO-Inertia Analysis for People Tracking

Authors:

Srinidhi Mukanahallipatna Simha, Duc Phu Chau and Francois Bremond

Abstract: Robust object tracking is a challenging computer vision problem due to dynamic changes in object pose, illumination, appearance and occlusions. Tracking objects between frames requires accurate matching of their features. We investigate real-time matching of mobile object features for frame-to-frame tracking. This paper presents a new feature matching approach between objects for tracking that incorporates a multivariate analysis method called Co-Inertia Analysis (COIA). The approach is introduced to compute the similarity between Histogram of Oriented Gradients (HOG) features of the tracked objects. The experiments conducted show the effectiveness of this approach for mobile object feature tracking.

Paper Nr: 82
Title:

Image-based Modelling of Ocean Surface Circulation from Satellite Acquisitions

Authors:

Dominique Béréziat and Isabelle Herlin

Abstract: Satellite image sequences make it possible to visualise the ocean surface and its underlying dynamics. Processing these images is therefore of major interest for better understanding the observed processes. As demonstrated by the state of the art, image assimilation allows surface motion to be retrieved from image sequences, based on assumptions on the dynamics. In this paper we demonstrate that a simple heuristic, the Lagrangian constancy of velocity, can successfully replace the complex physical properties described by the Navier-Stokes equations for assessing surface circulation from satellite images. A data assimilation method is proposed that adds a term a(t) to this Lagrangian constancy equation, summarising all physical processes other than advection. A cost function is designed that quantifies the discrepancy between satellite data and model values; it is minimised by the BFGS solver with a dual method of data assimilation. The result is the motion field and the additional term a(t), which models the forces, other than advection, that contribute to surface circulation. The approach has been tested on Sea Surface Temperature images of the Black Sea. Results are given on four image sequences and compared with state-of-the-art methods.

Paper Nr: 84
Title:

3D Head Model Fitting Evaluation Protocol on Synthetic Databases for Acquisition System Comparison

Authors:

Catherine Herold, Vincent Despiegel, Stéphane Gentric, Séverine Dubuisson and Isabelle Bloch

Abstract: Automatic face recognition has been integrated into many systems thanks to the improvement of face comparison algorithms. One of the main applications of facial biometry is identity authentication at border control, which has already been adopted by many airports. In order to perform a fast identity check, gates have been developed to extract the ID document information on the one hand and to acquire the facial information of the user on the other. The design of such gates, and in particular their camera configuration, has a high impact on the output acquisitions and therefore on the quality of the extracted facial features. Since it is very difficult to validate such gates by testing different configurations on real data in exactly the same conditions, we propose a validation protocol based on simulated passages. The method relies on synthetic sequences, which can be generated for any camera configuration with fixed identity and pose parameters, and can also integrate different lighting conditions. We detail this methodology and present results in terms of the geometrical error obtained with different camera configurations, illustrating the impact of gate design on 3D head fitting accuracy, and hence on facial authentication performance.

Paper Nr: 87
Title:

Quality Assessment of Compressed Video for Automatic License Plate Recognition

Authors:

Anna Ukhanova, Jesper Støttrup-Andersen, Søren Forchhammer and John Madsen

Abstract: Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of the compression by H.264/AVC and H.265/HEVC standards on the recognition performance. We compare logarithmic and logistic functions for quality modeling. Our results show that a logistic function can better describe the dependence of recognition performance on the quality for both compression standards. We observe that automatic license plate recognition in our study has a behavior similar to human recognition, allowing the use of the same mathematical models. We furthermore propose an application of one of the models for video surveillance systems.
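The logistic quality model favoured in the abstract can be sketched as follows. The parameter values (saturation r_max, slope k, midpoint q0) are hypothetical placeholders, not the fitted values reported in the paper, and the quality measure q stands in for whatever bitrate- or distortion-based measure is used.

```python
import math

def logistic_recognition(q, r_max=1.0, k=0.5, q0=30.0):
    """Recognition rate as a logistic function of a quality measure q:
    saturates at r_max, with midpoint q0 and steepness k (all illustrative)."""
    return r_max / (1.0 + math.exp(-k * (q - q0)))
```

Unlike a logarithmic model, the logistic curve saturates, which matches recognition performance that plateaus once compression quality is high enough.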

Paper Nr: 98
Title:

Perception-prediction-control Architecture for IP Pan-Tilt-Zoom Camera through Interacting Multiple Models

Authors:

Pierrick Paillet, Romaric Audigier, Frederic Lerasle and Quoc-Cuong Pham

Abstract: IP Pan-Tilt-Zoom (IP PTZ) cameras are now common in video surveillance, as they are easy to deploy and can take high-resolution pictures of targets in a large field of view thanks to their pan-tilt and zoom capabilities. However, the closer the view, the higher the risk of losing the target. Furthermore, off-the-shelf cameras used in large video-surveillance areas exhibit significant motion delays. In this paper, we propose a new motion control architecture that manages tracking and zoom delays through an Interacting Multiple Models analysis of the target motion, increasing tracking performance and robustness.

Paper Nr: 104
Title:

3D Reconstruction with Mirrors and RGB-D Cameras

Authors:

Abdullah Akay and Yusuf Sinan Akgul

Abstract: RGB-D cameras such as Microsoft's Kinect have found many application areas in robotics, 3D modelling and indoor vision due to their low cost and ease of use. 3D reconstruction with RGB-D cameras is relatively convenient because they provide RGB and depth data simultaneously for each image element. However, for a full 3D reconstruction of a scene, a single fixed RGB-D camera is inadequate, and using multiple cameras brings many challenges, such as bandwidth limitations and synchronization. To overcome these difficulties, we propose a solution that employs mirrors to introduce virtual RGB-D cameras into the system. The proposed system has no space limitations, data bandwidth constraints or synchronization problems, and is cheaper because no extra cameras are required. We develop formulations for the simultaneous calibration of real and virtual RGB and RGB-D cameras, and we also provide methods for 3D reconstruction from these cameras. We conduct several experiments to assess our system; the numerical and visual results are satisfactory.
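The geometric core of a mirror-based virtual camera is reflection across the mirror plane: a point seen in the mirror behaves as if observed from the reflected viewpoint. A minimal sketch, assuming a known unit normal n and offset d for the plane n·x = d; the paper's calibration formulations are not reproduced here.

```python
import numpy as np

def reflect_point(X, n, d):
    """Reflect 3D point X across the plane n.x = d (n need not be unit;
    it is normalized internally). Applying the reflection twice is the
    identity, a handy sanity check for mirror geometry."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    X = np.asarray(X, dtype=float)
    return X - 2.0 * (X.dot(n) - d) * n
```

Reflecting the real camera centre (and flipping handedness of its orientation) with the same formula yields the pose of the virtual camera behind the mirror.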

Paper Nr: 141
Title:

A Bayesian Framework for Enhanced Geometric Reconstruction of Complex Objects by Helmholtz Stereopsis

Authors:

Nadejda Roubtsova and Jean-Yves Guillemaut

Abstract: Helmholtz stereopsis is an advanced 3D reconstruction technique for objects with arbitrary reflectance properties that uniquely characterises surface points by both depth and normal. Traditionally, in Helmholtz stereopsis consistency of depth and normal estimates is assumed rather than explicitly enforced. Furthermore, conventional Helmholtz stereopsis performs maximum likelihood depth estimation without neighbourhood consideration. In this paper, we demonstrate that reconstruction accuracy of Helmholtz stereopsis can be greatly enhanced by formulating depth estimation as a Bayesian maximum a posteriori probability problem. In reformulating the problem we introduce neighbourhood support by formulating and comparing three priors: a depth-based, a normal-based and a novel depth-normal consistency enforcing one. Relative performance evaluation of the three priors against standard maximum likelihood Helmholtz stereopsis is performed on both real and synthetic data to facilitate both qualitative and quantitative assessment of reconstruction accuracy. Observed superior performance of our depth-normal consistency prior indicates a previously unexplored advantage in joint optimisation of depth and normal estimates.

Paper Nr: 188
Title:

A Graph-based MAP Solution for Multi-person Tracking using Multi-camera Systems

Authors:

Xiaoyan Jiang, Marco Körner, Daniel Haase and Joachim Denzler

Abstract: Accurate multi-person tracking under complex conditions is an important topic in computer vision, with various application scenarios such as visual surveillance. Taking into account the difficulties caused by 2D occlusions, missing detections, and false positives, we propose a two-stage graph-based object tracking-by-detection approach using multiple calibrated cameras. Firstly, data association is formulated as a maximum a posteriori (MAP) problem. After transformation, we show that this single MAP problem is equivalent to finding min-cost paths in a two-stage directed acyclic graph. The first graph extracts an optimal set of tracklets based on hypotheses on the ground plane, using both 2D appearance features and 3D spatial distances. Subsequently, the tracklets are linked into complete tracks in the second graph, utilizing spatial and temporal distances. This results in a global optimization over all the 2D detections obtained from the multiple cameras. Finally, experimental results on three difficult sequences of the PETS’09 dataset, with comparison to state-of-the-art methods, show the precision and consistency of our approach.

Paper Nr: 208
Title:

Robust Pictorial Structures for X-ray Animal Skeleton Tracking

Authors:

Manuel Amthor, Daniel Haase and Joachim Denzler

Abstract: The detailed understanding of animals in locomotion is a relevant field of research in biology, biomechanics and robotics. To examine the locomotor system of birds in vivo and in a surgically non-invasive manner, high-speed X-ray acquisition is the state of the art. For a biological evaluation, it is crucial to locate relevant anatomical structures of the locomotor system. There is an urgent need for automating this task, as vast amounts of data exist and a manual annotation is extremely time-consuming. We present a biologically motivated skeleton model tracking framework based on a pictorial structure approach which is extended by robust sub-template matching. This combination makes it possible to deal with severe self-occlusions and challenging ambiguities. As opposed to model-driven methods which require a substantial amount of labeled training samples, our approach is entirely data-driven and can easily handle unseen cases. Thus, it is well suited for large scale biological applications at a minimum of manual interaction. We validate the performance of our approach based on 24 real-world X-ray locomotion datasets, and achieve results which are comparable to established methods while clearly outperforming more general approaches.

Paper Nr: 228
Title:

Improved ICP-based Pose Estimation by Distance-aware 3D Mapping

Authors:

Hani Javan Hemmat, Egor Bondarev, Gijs Dubbelman and Peter H. N. de With

Abstract: In this paper, we propose and evaluate various distance-aware weighting strategies to increase the accuracy of pose estimation by improving the accuracy of a voxel-based model generated from data obtained by low-cost depth sensors. We investigate two strategies: (a) weight definition, to prioritize the sensed data according to its accuracy, and (b) model updating, to determine how strongly newly captured data influences the existing synthetic 3D model. Specifically, we propose Distance-Aware (DA) and Distance-Aware Slow-Saturation (DASS) updating methods to intelligently integrate the depth data into the 3D model, according to the distance-sensitivity metric of a low-cost depth sensor. We validate the proposed methods by applying them to a benchmark of datasets and comparing the resulting pose trajectories to the corresponding ground truth. The obtained improvements are measured in terms of Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) and compared against the performance of the original Kinfu. The validation shows that, on average, our most promising method, DASS, improves pose estimation in terms of ATE and RPE by 43.40% and 48.29%, respectively. The method shows robust performance for all datasets, reaching a 90% pose-error reduction in the best case.
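The weight-definition idea can be sketched as an inverse-variance weight, assuming the commonly reported quadratic growth of low-cost depth-sensor noise with distance. The constant sigma0 and the plain running-average fusion below are illustrative stand-ins, not the paper's DA/DASS update rules.

```python
def depth_weight(z, sigma0=0.0012, z0=1.0):
    """Inverse-variance weight for a depth sample at distance z (metres),
    under an assumed noise model sigma(z) = sigma0 * (z / z0)^2."""
    sigma = sigma0 * (z / z0) ** 2
    return 1.0 / (sigma * sigma)

def fuse(value_old, weight_old, depth_new, z_new):
    """Weighted running average, in the style of TSDF-like voxel
    integration: near (low-noise) samples move the model more than
    far (high-noise) ones."""
    w_new = depth_weight(z_new)
    value = (value_old * weight_old + depth_new * w_new) / (weight_old + w_new)
    return value, weight_old + w_new
```

With a quadratic noise model, doubling the distance reduces the weight by a factor of sixteen, so distant measurements only gently perturb a model built from close-range data.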

Paper Nr: 234
Title:

Shape from Silhouette in Space, Time and Light Domains

Authors:

Maxim Mikhnevich and Denis Laurendeau

Abstract: This paper presents an image segmentation approach for obtaining a set of silhouettes, along with the Visual Hull, of an object observed from multiple viewpoints. The proposed approach can deal with almost any type of appearance characteristic: textured or textureless, shiny or Lambertian surface reflectance, opaque or transparent objects. Compared to more classical methods for silhouette extraction from multiple views, for which certain assumptions are made about the object or scene, neither the background nor the object’s appearance properties are modeled. The only assumption is the constancy of the unknown background at a given camera viewpoint while the object is in motion. The principal idea of the method is the estimation of the temporal evolution of each pixel, which makes it possible to estimate the background likelihood. Furthermore, the object is captured under different lighting conditions in order to cope with shadows. All the information from the space, time and lighting domains is merged in an MRF framework, and the constructed energy function is minimized via graph cuts.

Paper Nr: 247
Title:

Free Viewpoint Video for Soccer using Histogram-based Validity Maps in Plane Sweeping

Authors:

Patrik Goorts, Steven Maesen, Maarten Dumont, Sammy Rogmans and Philippe Bekaert

Abstract: In this paper, we present a method to accomplish free viewpoint video for soccer scenes. This will allow the rendering of a virtual camera, such as a virtual rail camera, or a camera moving around a frozen scene. We use 7 static cameras in a wide baseline setup (10 meters apart from each other). After debayering and segmentation, a crude depth map is created using a plane sweep approach. Next, this depth map is filtered and used in a second, depth-selective plane sweep by creating validity maps per depth. The complete method employs NVIDIA CUDA and traditional GPU shaders, resulting in a fast and scalable solution. The results, using real images, show the effective removal of artifacts, yielding high quality images for a virtual camera.
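On rectified views, the plane-sweep principle behind the crude depth map reduces to scoring a stack of disparity hypotheses. Below is a toy 1D sketch, assuming rectified scanlines and a plain absolute-difference cost; the paper's CUDA implementation, segmentation step and histogram-based validity maps are not reproduced.

```python
import numpy as np

def disparity_sweep(ref, other, max_d):
    """Per-pixel disparity minimizing absolute difference between a
    reference scanline and a second view, i.e. a fronto-parallel plane
    sweep on a rectified 1D pair. Returns the winning disparity per pixel."""
    n = ref.shape[0]
    cost = np.full((max_d + 1, n), np.inf)  # one cost row per plane/disparity
    for d in range(max_d + 1):
        if d == 0:
            cost[d] = np.abs(ref - other)
        else:
            # ref pixel x matches other pixel x - d
            cost[d, d:] = np.abs(ref[d:] - other[:-d])
    return np.argmin(cost, axis=0)  # winner-takes-all over the sweep
```

A real plane sweep replaces the shift by a per-plane homography warp of each camera image, but the cost-volume-and-argmin structure is the same.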

Paper Nr: 265
Title:

Control of a PTZ Camera in a Hybrid Vision System

Authors:

Francois Rameau, Cédric Demonceaux, Désiré Sidibé and David Fofi

Abstract: In this paper, we propose a new approach to steer a PTZ camera in the direction of an object detected by another, fixed camera equipped with a fisheye lens. This heterogeneous association of two cameras with different characteristics is called a hybrid stereo-vision system. The presented method employs epipolar geometry in a smart way in order to reduce the search range for the desired region of interest. Furthermore, we propose a target recognition method designed to cope with illumination problems, the distortion of the omnidirectional image, and the inherent dissimilarity in resolution and color response between the two cameras. Experimental results with synthetic and real images show the robustness of the proposed method.

Paper Nr: 283
Title:

Generalized Preemptive RANSAC - Making Preemptive RANSAC Feasible even in Low Resources Devices

Authors:

Severino Gomes-Neto and Bruno M. Carvalho

Abstract: This paper examines a generalized version of Preemptive RANSAC for visual motion estimation. The described approach employs the BRUMA function to handle varying block sizes and varying percentages of hypotheses to be removed during the hypothesis rejection phase. A flexible number of hypotheses is also generated in order to balance the preemption scheme. Experiments were performed for both forward and sideways motions in a simulated synthetic environment, with ground truth used to compare Standard Preemptive RANSAC against its generalized version. The simulations confirmed that the quality of the results produced by Standard Preemptive RANSAC degrades as the available hardware resources decrease, as opposed to the results produced by Generalized Preemptive RANSAC, with the errors of Standard Preemptive RANSAC being up to eleven times larger than those of the generalized version.
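To make the preemption idea concrete, here is a minimal, hypothetical sketch of preemptive hypothesis scoring: hypotheses are scored block by block and the weakest are discarded after each block. The `block_size` and `keep_fraction` knobs stand in for the quantities the generalized scheme allows to vary (they are fixed in Standard Preemptive RANSAC); the BRUMA function itself is not reproduced.

```python
import random

def preemptive_ransac(hypotheses, observations, score_fn,
                      block_size=10, keep_fraction=0.5):
    """Score hypotheses on observation blocks, discarding the weakest
    after each block until one survivor remains."""
    observations = list(observations)
    random.shuffle(observations)                 # decorrelate block order
    scores = {h: 0.0 for h in range(len(hypotheses))}
    alive = list(scores)
    for start in range(0, len(observations), block_size):
        block = observations[start:start + block_size]
        for h in alive:                          # score only survivors
            scores[h] += sum(score_fn(hypotheses[h], o) for o in block)
        alive.sort(key=lambda h: scores[h], reverse=True)
        alive = alive[:max(1, int(len(alive) * keep_fraction))]
        if len(alive) == 1:                      # preemption finished
            break
    return hypotheses[alive[0]]
```

A resource-constrained device would simply run this with smaller blocks and a more aggressive `keep_fraction`, which is the regime the paper studies.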

Short Papers
Paper Nr: 42
Title:

Extrinsic Parameter Self-Calibration and Nonlinear Filtering for in-Vehicle Stereo Vision Systems at Urban Environments

Authors:

Basam Musleh, David Martin, Jose Maria Armingol and Arturo De La Escalera

Abstract: The present work analyses the continuous self-calibration of the extrinsic parameters of a stereo vision system for safe visual odometry applications in vehicles in urban environments. The calibration method determines the extrinsic parameters of the stereo vision system from knowledge of the geometry of the ground in front of the cameras. Slight changes in the road profile cause variations in the extrinsic parameters of the stereo rig, which must be filtered and kept within tolerance values. Accordingly, the height, pitch and roll parameters are filtered to eliminate pose outliers of the stereo rig that appear when the vehicle is maneuvering. The approach first computes the road profile slope, the theoretical horizon, and the slope of the straight line in the free map. Secondly, nonlinear filtering is applied using an Unscented Kalman Filter to improve the estimation of the height, pitch and roll parameters.

Paper Nr: 65
Title:

Local Analysis of Confidence Measures for Optical Flow Quality Evaluation

Authors:

Patricia Márquez-Valle, Debora Gil, Rudolf Mester and Aura Hernàndez-Sabaté

Abstract: Optical Flow (OF) techniques able to face the complexity of real sequences have been developed in recent years. Even using the most appropriate technique for a specific problem, at some points the output flow might fail to achieve the minimum error required by the system. Confidence measures computed from either the input data or the OF output should discard those points where the OF is not accurate enough for further use. It follows that evaluating the capability of a confidence measure to bound the OF error is as important as its definition. In this paper we analyze different confidence measures and point out their advantages and limitations for use in real-world settings. We also examine their agreement with current tools for evaluating the performance of confidence measures.

Paper Nr: 74
Title:

Enhanced 3D Face Processing using an Active Vision System

Authors:

Morten Lidegaard, Rasmus F. Larsen, Dirk Kraft, Jeppe B. Jessen, Richard Beck, Thiusius R. Savarimuthu, Claus Gramkow, Ole K. Neckelmann, Jonas Haustad and Norbert Krüger

Abstract: We present an active face processing system based on 3D shape information extracted by means of stereo. We use two sets of stereo cameras with different fields of view (FOV): one with a wide FOV is used for face tracking, while the other with a narrow FOV is used for face identification. We argue for two advantages of such a system: first, an extended working range, and second, the possibility to place the narrow-FOV camera in such a way that a much better reconstruction quality can be achieved than with a static camera, even if the face had been fully visible in the periphery of the narrow-FOV camera. We substantiate these two observations with qualitative results on face reconstruction and quantitative results on face recognition. As a consequence, such a set-up makes it possible to achieve a better and much more flexible system for 3D face reconstruction, e.g. for recognition or emotion estimation based on the characteristics of a given face.

Paper Nr: 77
Title:

Fast Semi-automatic Target Initialization based on Visual Saliency for Airborne Thermal Imagery

Authors:

Çağlar Aytekin, Emre Tunalı and Sinan Öz

Abstract: In this study, a semi-automatic target initialization algorithm is introduced based on a recently proposed visual saliency approach. First, a center-surround difference based initial window selection is applied around the point coordinate provided by the user, in order to select the window most likely to contain the actual target and a background satisfying piecewise connectivity. Then, a recently proposed visual saliency algorithm is exploited to detect the bounding box encapsulating the most salient part of the object. The experiments show that the saliency-based tracking window initialization is capable of handling marking errors, i.e. erroneous user inputs, and boosts the performance of several tracking algorithms in terms of the number of frames in which successful tracking is achieved, compared with several fixed window size initializations.

Paper Nr: 97
Title:

Human Body Orientation Estimation using a Committee based Approach

Authors:

Manuela Ichim, Robby T. Tan, Nico van der Aa and Remco Veltkamp

Abstract: Human body orientation estimation is useful for analyzing the activities of a single person or a group of people. Estimating body orientation can be subdivided into two tasks: human tracking and orientation estimation. In this paper, the second task is accomplished using HoG descriptors and other cues such as the velocity direction, the presence of a face, and temporal smoothness. Three different classifiers, a Gaussian Mixture Model, a Neural Network and a Support Vector Machine, are combined with the information from those cues to form a committee. The performance of the method is evaluated and the contribution of each classifier to the final prediction is assessed. Overall, the proposed approach outperforms the state-of-the-art method in terms of both estimation accuracy and computation time.
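In the simplest reading, a committee of classifiers can be fused by weighted voting over their per-frame predictions. The sketch below is our illustrative assumption of such a fusion step, not the paper's exact combination rule:

```python
from collections import Counter

def committee_vote(predictions, weights=None):
    """Fuse per-classifier orientation predictions (e.g. from a GMM, a
    neural network and an SVM) by weighted voting; the class with the
    largest total weight wins."""
    weights = weights or [1.0] * len(predictions)
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return tally.most_common(1)[0][0]
```

Cues such as velocity direction or face presence could enter this scheme either as additional voters or by modulating the weights.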

Paper Nr: 100
Title:

Map-based Lane and Obstacle-free Area Detection

Authors:

T. Kowsari, S. S. Beauchemin and M. A. Bauer

Abstract: With the emergence of intelligent Advanced Driving Assistance Systems (i-ADAS), the effective detection of vehicular surroundings is considered a necessity. The effectiveness of such systems directly depends on their performance in various environments such as rural and urban roads and highways. Most current lane detection techniques are not suitable for urban roads with complex lane shapes and frequent occlusions. We propose a map-based lane detection approach which can robustly detect lanes in urban and rural environments and on highways. We also present an algorithm for detecting obstacle-free areas in the detected lanes based on stereo depth maps of driving scenes. Experiments show that our approach reliably detects lanes and obstacle-free areas within them, even in the case of partially occluded or worn-off lane markers.

Paper Nr: 129
Title:

Multiview Point Cloud Filtering for Spatiotemporal Consistency

Authors:

Robert Skupin, Thilo Borgmann and Thomas Sikora

Abstract: This work presents algorithms to resample and filter point cloud data reconstructed from multiple cameras and multiple time instants. In an initial resampling stage, a voxel or a surface mesh based approach resamples the point cloud data into a common sampling grid. Subsequently, the resampled data undergoes a filtering stage based on clustering to remove artifacts and achieve spatiotemporal consistency across cameras and time instants. The presented algorithms are evaluated in a view synthesis scenario. Results show that view synthesis with enhanced depth maps as produced by the algorithms leads to less artifacts than synthesis with the original source data.

Paper Nr: 130
Title:

Towards Robust Image Registration for Underwater Visual SLAM

Authors:

Antoni Burguera, Francisco Bonin-Font and Gabriel Oliver

Abstract: This paper proposes a simple and practical approach to underwater visual SLAM. The proposal improves traditional EKF-SLAM by adopting a trajectory-based scheme that reduces the computational requirements. Linearization errors are also reduced by means of an IEKF. One of the most important parts of the proposed SLAM approach is robust image registration, which is used in the data association step and makes it possible to close loops reliably. Thanks to this, as shown in the experiments, the presented approach provides accurate pose estimates using both a simulated robot and a real one.

Paper Nr: 138
Title:

Dense Long-term Motion Estimation via Statistical Multi-step Flow

Authors:

Pierre-Henri Conze, Philippe Robert, Tomás Crivelli and Luce Morin

Abstract: We present statistical multi-step flow, a new approach for dense motion estimation in long video sequences. Towards this goal, we propose a two-step framework including an initial dense motion candidates generation and a new iterative motion refinement stage. The first step performs a combinatorial integration of elementary optical flows combined with a statistical candidate displacement fields selection and focuses especially on reducing motion inconsistency. In the second step, the initial estimates are iteratively refined considering several motion candidates including candidates obtained from neighboring frames. For this refinement task, we introduce a new energy formulation which relies on strong temporal smoothness constraints. Experiments compare the proposed statistical multi-step flow approach to state-of-the-art methods through both quantitative assessment using the Flag benchmark dataset and qualitative assessment in the context of video editing.

Paper Nr: 162
Title:

Feature Evaluation and Management for Camera Pose Tracking on 3D Models

Authors:

Martin Schumann, Jan Hoppenheit and Stefan Müller

Abstract: Our tracking approach uses feature evaluation and management to estimate the camera pose from the camera image and a given geometric model. The aim is to obtain a minimal but high-quality set of correspondences between 2D image lines and 3D model edges to improve accuracy and computation time. Reducing the amount of feature data makes it possible to use arbitrarily complex models for tracking. Additionally, the presence of a 3D model delivers useful information for predicting reliable features that can be matched in the camera image with high probability, avoiding possible false matches. Therefore, a quality measure is defined to evaluate and select the features best suited for tracking, based on criteria from the rendering process and knowledge about the environment, such as geometry and topology, perspective projection, lighting and matching success feedback. We test the feature management to analyze the importance and influence of each quality criterion on the tracking and to find an optimal weighting.

Paper Nr: 167
Title:

Face Pose Tracking under Arbitrary Illumination Changes

Authors:

Ahmed Rekik, Achraf Ben-Hamadou and Walid Mahdi

Abstract: This paper presents a new method for 3D face pose tracking under arbitrary illumination changes using color images and depth data acquired by RGB-D cameras (e.g., Microsoft Kinect, Asus Xtion Pro Live, etc.). The method is based on the optimization of an objective function combining photometric and geometric energies. The geometric energy is computed from the depth data, while the photometric energy is computed at each frame by comparing the current face texture to its counterpart in the reference face texture defined in the first frame. To handle the effect of changing lighting conditions, we use a facial illumination model to determine which lighting variation has to be applied to the current face texture to make it as close as possible to the reference texture. We demonstrate the accuracy and robustness of our method under normal lighting conditions by performing a set of experiments on the Biwi Kinect head pose database. Moreover, the robustness to illumination changes is evaluated using a set of sequences of different persons recorded under severe lighting condition changes. These experiments show that our method is robust and precise under both normal and severe lighting conditions.

Paper Nr: 168
Title:

Swapping-based Annealed Particle Filter with Occlusion Handling for 3D Human Body Tracking

Authors:

Xuan Son Nguyen

Abstract: In this paper, we propose a new approach for 3D human body tracking. We first extend the idea of Swapping-based Partitioned Sampling (SBPS), which was introduced by Dubuisson et al. for solving the articulated object tracking problem in high-dimensional state spaces. This extension aims to deal with self-occlusion and constraints between parts of the human body, which are not taken into account in SBPS. We prove that, under the same assumptions required by SBPS, the posterior distribution is correctly estimated in our framework. We then introduce a new approach for 3D human body tracking based on this new framework and the Annealed Particle Filter (APF). Experiments with multi-camera walking sequences from the HumanEva I dataset show the efficiency of the proposed approach in terms of both accuracy and computation time.

Paper Nr: 193
Title:

Hide and Seek - An Active Binocular Object Tracking System

Authors:

Pramod Chandrashekhariah and Jochen Triesch

Abstract: We introduce a novel active stereo vision based object tracking system for a humanoid robot. The system tracks a moving object that is dynamically changing its appearance and scale. The system features a built-in learning process that simultaneously learns short-term models for the object and potential distractors. These models evolve over time, rectifying the inaccuracies of the tracking in a cluttered scene and allowing the system to identify unusual events such as sudden displacement, hiding behind or being masked by an occluder, and sudden disappearance from the scene. The system deals with these through different response modes, such as active search when the object is lost, intentional waiting for reappearance when the object is hidden, and reinitialization of the track when the object is suddenly displaced by the user. We demonstrate our system on the iCub robot in an indoor environment and evaluate its performance. Our experiments show a performance enhancement for long occlusions through the learning of distractor models.

Paper Nr: 206
Title:

Color Supported Generalized-ICP

Authors:

Michael Korn, Martin Holzkothen and Josef Pauli

Abstract: This paper presents a method to support point cloud registration with color information. For this purpose we integrate L*a*b* color space information into the Generalized Iterative Closest Point (GICP) algorithm, a state-of-the-art Plane-To-Plane ICP variant. A six-dimensional k-d tree based nearest neighbor search is used to match corresponding points between the clouds. We demonstrate that the additional effort in general does not have an excessive impact on the runtime, since the number of iterations can be reduced. The influence on the estimated 6 DoF transformations is quantitatively evaluated on six different datasets. We show that the modified algorithm can improve the results without needing any special parameter adjustment.
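The six-dimensional correspondence search can be pictured as nearest-neighbor matching in a joint position-color space. The following brute-force sketch is a stand-in for the paper's k-d tree, and the `color_weight` knob for balancing metric against L*a*b* axes is our own assumption:

```python
def nearest_6d(query, cloud, color_weight=1.0):
    """Each point is (x, y, z, L, a, b); the correspondence is the
    cloud point minimizing squared distance in the joint space."""
    def d2(p, q):
        geo = sum((p[i] - q[i]) ** 2 for i in range(3))      # x, y, z
        col = sum((p[i] - q[i]) ** 2 for i in range(3, 6))   # L*, a*, b*
        return geo + color_weight * col
    return min(range(len(cloud)), key=lambda i: d2(cloud[i], query))
```

With `color_weight=0` this degenerates to the purely geometric matching of standard GICP, which is what the color channels are meant to disambiguate.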

Paper Nr: 233
Title:

Three-Dimensional Visual Reconstruction of Path Shape Using a Cart with a Laser Scanner

Authors:

Kikuhito Kawasue, Ryunosuke Futami and Hajime Kobayashi

Abstract: A movable system for three-dimensional measurement of the shape of a path (road) surface has been developed. The measurement is taken by rolling the proposed measurement cart along the path. The measurement system is composed of a laser scanner, a CCD camera, an omni-directional camera and a computer. The laser scanner measures the cross-sectional shape of the path at a rate of 40 Hz. The CCD camera views downward to observe the texture of the path surface, and the relative movement of the measurement cart along the path is detected by analysing the optical flow of the texture movement. Cross-sectional shapes of the path are accumulated, and the three-dimensional path shape is reconstructed on the basis of the movement of the measurement cart. The image data recorded by the omni-directional camera are mapped onto the three-dimensional shape data, and the three-dimensional path is visualized in color on the computer. The reconstructed path data can be used for the repair and design of paths in the field of civil engineering. The experimental results show the feasibility of our system.

Paper Nr: 327
Title:

Calibration and 3D Ground Truth Data Generation with Orthogonal Camera-setup and Refraction Compensation for Aquaria in Real-time

Authors:

Klaus Müller, Jens Schlemper, Lars Kuhnert and Klaus-Dieter Kuhnert

Abstract: In this paper we present a novel approach to generate precise 3D ground-truth data that takes into account the refraction caused by the fish tank. We use an accurate and easy-to-handle calibration method to calibrate two orthogonally aligned high-resolution cameras in real-time. For precise fish shape segmentation we combine two different background subtraction algorithms, which can also be trained while fish are swimming inside the aquarium. The presented approach also takes shadow removal and mirroring into account. For refraction compensation at the air-water boundary we developed an algorithm that calculates the ray deflection of every shape pixel and computes the 3D model in real-time.

Paper Nr: 333
Title:

Ego-motion Recovery and Robust Tilt Estimation for Planar Motion using Several Homographies

Authors:

Mårten Wadenbäck and Anders Heyden

Abstract: In this paper we suggest an improvement to a recent algorithm for estimating the pose and ego-motion of a camera which is constrained to planar motion at a constant height above the floor, with a constant tilt. Such motion is common in robotics applications where a camera is mounted onto a mobile platform and directed towards the floor. Due to the planar nature of the scene, images taken with such a camera will be related by a planar homography, which may be used to extract the ego-motion and camera pose. Earlier algorithms for this particular kind of motion were not concerned with determining the tilt of the camera, focusing instead on recovering only the motion. Estimating the tilt is a necessary step in order to create a rectified map for a SLAM system. Our contribution extends the aforementioned recent method, and we demonstrate that our enhanced algorithm gives more accurate estimates of the motion parameters.

Paper Nr: 337
Title:

Uncalibrated Image Rectification for Coplanar Stereo Cameras

Authors:

Vinícius Cesar, Thiago Farias, Saulo Pessoa, Samuel Macedo, Judith Kelner and Ismael Santos

Abstract: Nowadays, underwater maintenance tasks, mostly in the oil and gas industries, are assisted by computer vision algorithms. An important part of these procedures is the rectification of stereo images, which is the first step in the stereo 3D reconstruction pipeline. Some aspects of the underwater environment make the rectification process difficult: it is a very noisy scenario, and the equipment is almost textureless. For this demanding scenario, this article proposes a novel technique that rectifies a set of images more accurately than state-of-the-art methods. Tests were carried out that prove the efficiency of the proposed technique.

Paper Nr: 351
Title:

An Efficient Solution to 3D Reconstruction from Two Uncalibrated Views under SV Constraint

Authors:

Shuyang Dou, Hiroshi Nagahashi and Xiaolin Zhang

Abstract: In this paper, an efficient solution is proposed to the problem of 3D reconstruction from two uncalibrated views under the Standard Vergence (SV) constraint. This solution consists of three core steps: first, set up the camera configuration according to the SV constraint; second, estimate the camera's focal length and the relative pose between the two views; last, reconstruct the scene optimally by minimizing the reprojection error. By analysing the degenerate camera motion under the SV constraint, a novel method for efficiently estimating the camera's focal length and relative pose is proposed. Both synthetic and real data experiments showed that this new method provides close initial estimates, which result in fast convergence in the most time-consuming step, the final optimization. The main contribution of this paper is that it is the first to introduce the SV constraint into the 3D reconstruction problem and to propose an efficient solution which exploits this constraint.

Paper Nr: 360
Title:

Using Robot Skills for Flexible Reprogramming of Pick Operations in Industrial Scenarios

Authors:

Rasmus S. Andersen, Lazaros Nalpantidis, Volker Krüger, Ole Madsen and Thomas B. Moeslund

Abstract: Traditional robots used in manufacturing are very efficient at solving specific tasks that are repeated many times. The robots are, however, difficult to (re-)configure and (re-)program. This can often only be done by expert robotic programmers, computer vision experts, etc., and it additionally requires a great deal of time. In this paper we present and use a skill based framework for robotic programming. In this framework, we develop a flexible pick skill that can easily be reprogrammed to solve new specific tasks, even by non-experts. Using the pick skill, a robot can detect rotationally symmetric objects on tabletops and pick them up in a user-specified manner. The programming itself is primarily done through kinesthetic teaching. We show that the skill is robust towards the location and shape of the object to pick, and that objects from a real industrial production line can be handled. Also, preliminary tests indicate that non-expert users can learn to use the skill after only a short introduction.

Paper Nr: 380
Title:

Sampling based Bundle Adjustment using Feature Matches between Ground-view and Aerial Images

Authors:

Hideyuki Kume, Tomokazu Sato and Naokazu Yokoya

Abstract: This paper proposes a new Structure-from-Motion pipeline that uses feature matches between ground-view and aerial images to remove accumulative errors. In order to find good matches among unreliable ones, we propose new RANSAC-based outlier elimination methods in both the feature matching and bundle adjustment stages. To this end, in the feature matching stage, the consistency of the orientation and scale extracted from the images by a feature descriptor is checked. In the bundle adjustment stage, we focus on the consistency between the estimated geometry and the matches. In experiments, we quantitatively evaluate the performance of the proposed feature matching and bundle adjustment.

Paper Nr: 396
Title:

Probabilistic Object Identification through On-demand Partial Views

Authors:

Susana Brandão, Manuela Veloso and João P. Costeira

Abstract: This paper addresses the problem of object identification from multiple 3D partial views, collected from different view angles with the objective of disambiguating between similar objects. We assume a mobile robot equipped with a depth sensor that autonomously grasps an object from different positions, with no previously known pattern. The challenge is to efficiently combine the set of observations into a single classification. We approach the problem with a sequential importance resampling filter that combines the sequence of observations and that, by its sampling nature, can handle the large number of possible partial views. In this context, we introduce innovations at the level of the partial view representation and in the formulation of the classification problem. We provide a qualitative comparison to support our representation and illustrate the identification process with a case study.
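A sequential importance resampling filter over object-identity hypotheses can be sketched as follows; the `likelihood` model and the effective-sample-size resampling threshold are illustrative assumptions, not the paper's formulation:

```python
import random

def sir_identify(particles, observations, likelihood, seed=0):
    """Particles are object-identity hypotheses; each new partial view
    reweights them by its likelihood, and the set is resampled when the
    effective sample size collapses (weight degeneracy)."""
    rng = random.Random(seed)
    n = len(particles)
    weights = [1.0 / n] * n
    for view in observations:
        weights = [w * likelihood(p, view) for p, w in zip(particles, weights)]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
        if 1.0 / sum(w * w for w in weights) < 0.5 * n:  # low ESS
            particles = rng.choices(particles, weights=weights, k=n)
            weights = [1.0 / n] * n
    support = {}  # aggregate weight per identity
    for p, w in zip(particles, weights):
        support[p] = support.get(p, 0.0) + w
    return max(support, key=support.get)
```

The sampling nature of the filter is what keeps the cost manageable when the space of possible partial views is large.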

Posters
Paper Nr: 9
Title:

Part-based 3D Multi-person Tracking using Depth Cue in a Top View

Authors:

Cyrille Migniot and Fakhreddine Ababsa

Abstract: While the problem of tracking 3D human motion has been widely studied, the top view is rarely taken into consideration, even though in video surveillance the camera is most of the time placed above the persons. This is because the human shape is more discriminative in the front view. We propose in this paper a markerless 3D human tracking method for the top view. To do this we use the depth and color image sequences provided by a Kinect. First, a 3D model is fitted to these cues in a particle filter framework. Then we introduce a process in which the body parts are linked in a complete 3D model but weighted separately, so as to reduce the computing time and optimize the resampling step. We find that this part-based tracking increases the accuracy. The process is real-time and works with multiple targets.

Paper Nr: 49
Title:

Objects Tracking in Catadioptric Images using Spherical Snake

Authors:

Anisse Khald, Amina Radgui and Mohammed Rziza

Abstract: This work addresses the problem of object tracking in the context of omnidirectional vision. Few articles deal with this problem in catadioptric vision. This paper describes a new approach to processing omnidirectional (gray level) images based on the inverse stereographic projection onto the half-sphere, using the spherical model. For object tracking, we use a snake, optimized with the greedy algorithm, adapting its different operators. The algorithm respects the deformed geometry of omnidirectional images through a spherical neighbourhood, a spherical gradient and a reformulation of the optimization algorithm on the spherical domain. This tracking method, which we call the spherical snake, makes it possible to follow the changes in shape and size of a 2D object across its displacements in the spherical image.

Paper Nr: 51
Title:

High Performance Particle Tracking Velocimetry for Fluidized Beds

Authors:

Jouni Elfvengren, Jari Kolehmainen and Pentti Saarenrinne

Abstract: Fluidized beds are used in a wide variety of industrial applications, ranging from energy production to the chemical industry. Particle tracking velocimetry (PTV) is an efficient way to study small-scale behavior inside fluidized beds. An accurate PTV algorithm has to perform well also in relatively dense suspensions, where particles may overlap and form clusters. PTV algorithms typically proceed from locating the particles to tracking their motion. Particle locating has typically been based on either profile matching or image intensity thresholding. This study proposes a combined method that tries to take advantage of both approaches to overcome the difficulties associated with dense suspensions. The method was tested on a synthetic case and on an experimental fluidized bed case. The synthetic tests showed a slight increase in error as the number of particles increased, but the error level remained acceptable. Results obtained from the fluidized bed were visually inspected; most of the particles were tracked correctly, which suggests that the proposed method also performs well in practice.

Paper Nr: 70
Title:

Monocular 3D Pose Tracking of a Specular Object

Authors:

Nassir W. Oumer

Abstract: A space object such as a satellite has a highly specular surface, and when it is exposed to a directional light source, visual tracking is very difficult. However, camera-based tracking provides an inexpensive solution to the problem of on-orbit servicing of a satellite, such as orbital-life extension by repairing and refuelling, and debris removal. In this paper we present a real-time pose tracking method applied to such an object under direct sunlight, adapting keypoint and edge-based approaches with known simple geometry. The implemented algorithm is relatively accurate and robust to specular reflection. We show results based on real images from a simulation system for on-orbit servicing, consisting of two six-degree-of-freedom robots, a Sun simulator and a full-scale satellite mock-up.

Paper Nr: 75
Title:

Monocular Rear Approach Indicator for Motorcycles

Authors:

Joerg Deigmoeller, Herbert Janssen, Oliver Fuchs and Julian Eggert

Abstract: Conventional rear-view mirrors on motorcycles only allow limited visibility, as they are shaky and cover a small field of view. Especially at high speeds with strong headwind, it is difficult for the rider to turn his head to observe blind spots. To support the rider in observing the rear and blind spots, a monocular system that indicates approaching vehicles is proposed in this paper. The vision-based indication relies on sparse optical flow estimation. In a first step, a rough separation of background and approaching-object pixel motion is done in an efficient and computationally cheap way. In a post-processing step, the pixel motion information is further checked for geometrically meaningful transformations and continuity over time. As a prototype, the system has been mounted on a Honda Pan-European motorcycle, with a monitor in the dashboard that shows the rear-view image to the rider. If an approaching object is detected, the rider gets an indication on the monitor. The rear view on the monitor not only acts as the HMI (Human Machine Interface) for the indication, but also significantly extends the visibility compared to mirrors. The algorithm has been extensively evaluated for relative speeds from 20 km/h to 100 km/h (speed differences between the motorcycle and the approaching vehicle), in normal, rainy and night conditions. Results show that the approach offers a sensing range from 20 m at low speed up to 60 m at night.

Paper Nr: 76
Title:

Hand-eye Calibration with a Depth Camera: 2D or 3D?

Authors:

Svenja Kahn, Dominik Haumann and Volker Willert

Abstract: Real-time 3D imaging applications such as on-the-fly 3D inspection or 3D reconstruction can be created by rigidly coupling a depth camera with an articulated measurement arm or a robot. For such applications, the "hand-eye transformation" between the depth camera and the measurement arm needs to be known. For depth cameras, the hand-eye transformation can be estimated either from 2D images or from the 3D measurements captured by the depth camera. This paper compares 2D image based and 3D measurement based hand-eye calibration. First, two hand-eye calibration approaches are introduced which differ in the way the camera pose is estimated (either with 2D or with 3D data). The main problem for the evaluation is that the ground truth hand-eye transformation is not available, and thus a direct evaluation of the accuracy is not possible. Therefore, we introduce quantitative 2D and 3D error measures that allow for an implicit evaluation of the accuracy of the calibration without explicitly knowing the real ground truth transformation. In terms of 3D precision, the 3D calibration approach provides more accurate results on average, but requires more manual preparation and much more computation time than the 2D approach.

Paper Nr: 79
Title:

3D Reconstruction of Dynamic Scenes from Two Asynchronous Video-streams

Authors:

Artashes Mkhitaryan and Darius Burschka

Abstract: We present an algorithm for the reconstruction of dynamic scenes from the video input of an asynchronous stereo pair. Our method presumes that the timestamps of the frame acquisitions and the intrinsic parameters of the stereo cameras are known, and that the extrinsic parameters can be estimated from static regions of the scene. It computes the 3D trajectory of each individual point by modeling its motion parameters. In addition to the two asynchronous images from the camera pair, our algorithm requires two further images from one of the cameras for its processing. Since dynamic scene elements cause wrong reconstructions, we introduce a two-stage process in which the directions of the possible motion components are estimated prior to the 3D pose estimation of the corresponding point.

Paper Nr: 85
Title:

Fast Target Redetection for CAMSHIFT using Back-projection and Histogram Matching

Authors:

Abdul Basit, Matthew N. Dailey, Pudit Laksanacharoen and Jednipat Moonrinta

Abstract: Most visual tracking algorithms lose track of the target object (start tracking a different object or part of the background) or report an error when the object being tracked leaves the scene or becomes occluded in a cluttered environment. We propose a fast algorithm for mobile robots tracking humans or other objects in real-life scenarios to avoid these problems. The proposed method uses an adaptive histogram threshold matching algorithm to suspend the CAMSHIFT tracker when the target is insufficiently clear. While tracking is suspended, any method would need to continually scan the entire image in an attempt to redetect and reinitialize tracking of the specified object. However, searching the entire image for an arbitrary target object requires an extremely efficient algorithm to be feasible in real time. Our method, rather than a detailed search over the entire image, makes efficient use of the backprojection of the target object’s appearance model to hypothesize and test just a few candidate locations for the target in each image. Once the target object is redetected and sufficiently clear in a new image, the method reinitializes tracking. In a series of experiments with four real-world videos, we find that the method is successful at suspending and reinitializing CAMSHIFT tracking when the target leaves and reenters the scene, with successful reinitialization and very low false positive rates.
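The histogram back-projection this abstract relies on is a standard technique: each pixel is scored by the weight of its intensity bin in the target's appearance histogram, so candidate target locations can be hypothesized cheaply. A minimal pure-Python sketch on a tiny grayscale image (the bin count and sample values are illustrative assumptions, not the authors' implementation):

```python
def histogram(values, bins=8, vmax=256):
    """Normalized intensity histogram of a sample of pixel values."""
    h = [0] * bins
    for v in values:
        h[v * bins // vmax] += 1
    total = float(len(values))
    return [c / total for c in h]

def back_project(image, model_hist, bins=8, vmax=256):
    """Score each pixel by the model-histogram weight of its intensity bin."""
    return [[model_hist[v * bins // vmax] for v in row] for row in image]

# Target model: a bright patch; scene: mostly dark with a few bright pixels.
model = histogram([200, 210, 220, 230])
scene = [[10, 20, 215],
         [15, 225, 205],
         [12, 18, 22]]
scores = back_project(scene, model)
# Bright pixels receive high scores; dark pixels fall in empty bins and score 0.
```

Peaks in the resulting score map are natural candidate locations to test before reinitializing a tracker.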

Paper Nr: 158
Title:

Global Camera Parameterization for Bundle Adjustment

Authors:

Čeněk Albl and Tomás Pajdla

Abstract: Bundle adjustment is an important optimization technique in computer vision and a key part of Structure from Motion computation. An important problem in bundle adjustment is choosing a proper parameterization of the cameras, especially of their orientations. In this paper we propose a new parameterization of a perspective camera based on quaternions, with no redundancy in dimensionality and no constraints on the rotations. We conducted extensive experiments comparing this parameterization to four other widely used parameterizations. The proposed parameterization is non-redundant and global, and achieves the same performance in all investigated parameters, making it a viable and practical choice for bundle adjustment.
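The quaternion representation of camera orientation mentioned in this abstract follows the standard textbook mapping from a quaternion (w, x, y, z) to a 3x3 rotation matrix; the sketch below shows that generic formula only, not the paper's specific non-redundant parameterization:

```python
import math

def quat_to_rotmat(w, x, y, z):
    """Convert a (not necessarily unit) quaternion to a 3x3 rotation matrix,
    normalizing first so any nonzero quaternion yields a valid rotation."""
    n = math.sqrt(w*w + x*x + y*y + z*z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# The identity quaternion maps to the identity rotation.
R = quat_to_rotmat(1.0, 0.0, 0.0, 0.0)
```

Because the normalization is built into the map, an optimizer can update the four quaternion components freely, which is one common way to avoid explicit unit-norm constraints.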

Paper Nr: 251
Title:

Improving Visual Tracking Robustness in Cluttered and Occluded Environments using Particle Filter with Hybrid Resampling

Authors:

Flavio de Barros Vidal, Diego A. L. Cordoba, Alexandre Zaghetto and Carla M. C. C. Koike

Abstract: Occlusions and cluttered environments represent real challenges for visual tracking methods. In order to increase robustness in such situations, we present in this article a method for visual tracking using a particle filter with hybrid resampling. Our approach uses a particle filter to estimate the state of the tracked object, and both the particles' inertia and update information are used in the resampling stage. The proposed method is tested on a public benchmark and the results are compared with other tracking algorithms. The results show that our approach performs better in cluttered environments, as well as in situations with total or partial occlusions.

Paper Nr: 258
Title:

Robust Multi-Human Tracking by Detection Update using Reliable Temporal Information

Authors:

Lu Wang, Qingxu Deng and Mingxing Jia

Abstract: In this paper, we present a multiple human tracking approach that takes single-frame human detection results as input and associates them hierarchically to form trajectories, while improving the original detection results by making use of reliable temporal information. It works by first forming tracklets, from which reliable temporal information can be extracted, and then refining the detection responses inside the tracklets. After that, local conservative tracklet association is performed and reliable temporal information is propagated across tracklets. Global tracklet association is performed last to resolve association ambiguities. Comparison with two state-of-the-art approaches demonstrates the effectiveness of the proposed approach.

Paper Nr: 274
Title:

Cross-spectral Stereo Correspondence using Dense Flow Fields

Authors:

Naveen Onkarappa, Cristhian A. Aguilera-Carrasco, Boris X. Vintimilla and Angel D. Sappa

Abstract: This manuscript addresses the cross-spectral stereo correspondence problem. It proposes using a dense flow field based representation instead of the original cross-spectral images, which have a low correlation. In this way, working in the flow field space, classical cost functions can be used as similarity measures. Preliminary experimental results on urban environments have been obtained, showing the validity of the proposed approach.

Paper Nr: 316
Title:

Precise 3D Pose Estimation of Human Faces

Authors:

Ákos Pernek and Levente Hajder

Abstract: Robust human face recognition is one of the most important open tasks in computer vision. This study deals with a challenging subproblem of face recognition: the aim of the paper is to give a precise estimation for the 3D head pose. The main contribution of this study is a novel non-rigid Structure from Motion (SfM) algorithm which utilizes the fact that the human face is quasi-symmetric. The input of the proposed algorithm is a set of tracked feature points of the face. In order to increase the precision of the head pose estimation, we improved one of the best eye corner detectors and fused the results with the input set of feature points. The proposed methods were evaluated on real and synthetic face sequences. The real sequences were captured using regular (low-cost) web-cams.

Paper Nr: 335
Title:

A Method of Eliminating Interreflection in 3D Reconstruction using Structured Light 3D Camera

Authors:

Lam Quang Bui and Sukhan Lee

Abstract: Interreflection, one of the main components of the global illumination effect, degrades the performance of structured light 3D cameras. In this paper, we present a method of eliminating interreflection in 3D reconstruction using a structured light 3D camera without any modification of the structured light pattern. The key idea is to rely on the patterns in the final layer of HOC (Hierarchical Orthogonal Coding), where the effect of interreflection is weakest due to the small light stripes in the pattern, to eliminate the reflected boundaries as well as fill in the missing boundaries in the upper layers. Experimental results show that the effect of interreflection is significantly reduced with the proposed algorithm in comparison with the original HOC decoding method. The proposed method can be readily incorporated into existing structured light 3D cameras without any extra pattern or hardware.

Paper Nr: 338
Title:

A Dense Map Building Approach from Spherical RGBD Images

Authors:

Tawsif Gokhool, Maxime Meilland, Patrick Rives and Eduardo Fernández-Moral

Abstract: Visual mapping is a required capability for practical autonomous mobile robots, for which there is a growing industry with applications ranging from the service to industrial sectors. Prior to map building, Visual Odometry (VO) is an essential step in the process of pose graph construction. In this work, we first propose to tackle the pose estimation problem by using both photometric and geometric information in a direct RGBD image registration method. Secondly, the mapping problem is tackled with a pose graph representation, whereby, given a database of augmented visual spheres, a travelled trajectory with redundant information is pruned to a skeletal pose graph. Both methods are evaluated on data acquired with a recently proposed omnidirectional RGBD sensor for indoor environments.

Paper Nr: 359
Title:

A Framework for 3D Object Identification and Tracking

Authors:

Georgios Chliveros, Rui P. Figueiredo, Plinio Moreno, Maria Pateraki, Alexandre Bernardino, Jose Santos-Victor and Panos Trahanias

Abstract: In this paper we present a framework for the estimation of the pose of an object in 3D space: from the detection and subsequent recognition from a 3D point-cloud, to tracking in the 2D camera plane. The detection process proposes a way to remove redundant features, which leads to significant computational savings without affecting identification performance. The tracking process introduces a method that is less sensitive to outliers and is able to perform in soft real-time. We present preliminary results that illustrate the effectiveness of the approach both in terms of accuracy and computational speed.

Paper Nr: 368
Title:

FacialStereo: Facial Depth Estimation from a Stereo Pair

Authors:

Gagan Kanojia and Shanmuganathan Raman

Abstract: Consider the problem of sparse depth estimation from a given stereo image pair. This classic computer vision problem has been addressed by various algorithms over the past three decades. The traditional solution is to match feature points in the two images to estimate the disparity and therefore the depth. In this work, we consider the special case of scenes containing people with their front-on faces visible to the camera, and we want to estimate how far a person is from the camera. This paper proposes a novel method to identify the depth of faces, and even the depth of a single facial feature (eyebrows, eyes, nose, and lips), from the camera using a stereo pair. The proposed technique employs active shape models (ASM) and face detection. ASM is a model-based technique consisting of a shape model, which contains data regarding the valid shapes of a face, and a profile model, which contains the texture of the face, used to localize the facial features in the stereo pair. We demonstrate how the depth of faces can be obtained by estimating disparities from the landmark points.
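The disparity-to-depth step this abstract builds on is the standard rectified-stereo relation Z = f * B / d. A minimal sketch with illustrative camera values (the focal length, baseline, and disparity below are assumptions for the example, not values from the paper):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified-stereo depth: Z = f * B / d, where d is the horizontal
    offset (in pixels) of the same landmark between the two views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Illustrative values: 700 px focal length, 10 cm baseline, and a facial
# landmark matched across the pair with a 35 px disparity.
z = depth_from_disparity(700.0, 0.10, 35.0)  # 2.0 m
```

Matched ASM landmark pairs would each yield such a disparity, giving per-feature depths rather than a single depth for the whole face.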

Paper Nr: 382
Title:

An Approximation Algorithm for Computing the Visibility Region of a Point on a Terrain and Visibility Testing

Authors:

Sharareh Alipour, Mohammad Ghodsi, Ugur Gudukbay and Morteza Golkari

Abstract: Given a terrain and a query point p on or above it, we want to count the number of triangles of the terrain that are visible from p. We present an approximation algorithm to solve this problem, implement it, and run it on real data sets. The experimental results show that our approximate solution is very close to the exact solution and that, compared to similar works, the running time of our algorithm is better. An analysis of the time complexity of the algorithm is also presented. We also consider the visibility testing problem, where the goal is to test whether p and a given triangle of the terrain are visible to each other. We propose an algorithm for this problem and show that its average running time is the same as that of testing the visibility between two query points p and q.

Paper Nr: 392
Title:

TV Minimization of Direct Algebraic Method of Optical Flow Detection via Modulated Integral Imaging using Correlation Image Sensor

Authors:

Toru Kurihara and Shigeru Ando

Abstract: A novel mathematical method and sensing system that detect the velocity vector distribution on an optical image with pixel-wise spatial resolution and frame-wise temporal resolution are extended by total variation minimization. We applied a fast total variation minimization technique to the exact algebraic method of optical flow detection. Simulation results showed that the directional error caused by the local aperture problem decreased effectively by virtue of global optimization. Experimental results showed edge-preserving characteristics on the boundaries of motion.

Paper Nr: 395
Title:

A Semi-Lagrangian Approximation of the Oren–Nayar PDE for the Orthographic Shape–from–Shading Problem

Authors:

Silvia Tozza and Maurizio Falcone

Abstract: Several advances have been made in the last ten years to improve the Shape–from–Shading model in order to allow its use on real images. The classic Lambertian model, suitable for reconstructing 3D surfaces with uniform reflection properties, has been shown to be unsuitable for other types of surfaces, for example rough objects made of materials such as clay. Other models have been proposed, but it is still unclear which model would be best. For this reason, we start our analysis with non-Lambertian surfaces, the goal being to find a single model flexible enough to deal with many kinds of real images. As a starting point for this larger project, we consider the non-Lambertian Oren–Nayar reflectance model. In this paper we construct a semi-Lagrangian approximation scheme for its nonlinear partial differential equation and compare its performance with the classical model in terms of several error indicators on a series of benchmark images.