VISAPP 2013 Abstracts


Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 66
Title:

A Novel Real-time Edge-Preserving Smoothing Filter

Authors:

Simon Reich, Alexey Abramov, Jeremie Papon, Florentin Wörgötter and Babette Dellen

Abstract: The segmentation of textured and noisy areas in images is a very challenging task due to the large variety of objects and materials in natural environments, which cannot be solved by a single similarity measure. In this paper, we address this problem by proposing a novel edge-preserving texture filter, which smudges the color values inside uniformly textured areas, thus making the processed image more workable for color-based image segmentation. Due to the highly parallel structure of the method, the implementation on a GPU runs in realtime, allowing us to process standard images within tens of milliseconds. By preprocessing images with this novel filter before applying a recent real-time color-based image segmentation method, we obtain significant improvements in performance for images from the Berkeley dataset, outperforming an alternative version using a standard bilateral filter for preprocessing. We further show that our combined approach leads to better segmentations in terms of a standard performance measure than graph-based and mean-shift segmentation for the Berkeley image dataset.
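The standard bilateral filter that the authors use as a preprocessing baseline can be sketched in 1D as follows (our own illustration, not the paper's filter): each sample is replaced by a weighted average whose weights fall off with both spatial distance and intensity difference, so flat regions are smoothed while edges survive.

```python
# Minimal 1D bilateral filter sketch (illustrative only).
from math import exp

def bilateral_1d(signal, sigma_s, sigma_r, radius):
    """Edge-preserving smoothing: spatial * range Gaussian weights."""
    out = []
    for i, v in enumerate(signal):
        wsum = acc = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = exp(-((i - j) ** 2) / (2 * sigma_s ** 2)) \
                * exp(-((signal[j] - v) ** 2) / (2 * sigma_r ** 2))
            wsum += w
            acc += w * signal[j]
        out.append(acc / wsum)
    return out
```

Small intensity variations (noise, texture) are averaged away while a large step edge, whose range weight is near zero across the edge, is preserved.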

Paper Nr: 107
Title:

Bayesian Estimation of Camera Characteristics including Spectral Sensitivities from a Color Chart Image without Manual Parameter Tuning

Authors:

Yusuke Murayama, Pengchang Zhang and Ari Ide-Ektessabi

Abstract: We propose a new practical method for identifying the characteristics of a color digital camera: the spectral sensitivity function, the linearization function, and the noise variance of each color channel. The only input is an image of a color chart acquired by the camera under an illuminant of known spectral content, and the camera characteristics are obtained automatically. The proposed method was developed in the Bayesian statistical framework in order to improve upon previous methods, namely, to eliminate trial-and-error parameter tuning and to identify the linearization function as well as the spectral sensitivities. The polyline linearization function and the noise variance of a color channel were treated as hyperparameters and estimated by the marginalized likelihood criterion. The hyperparameters governing the smoothness of the sensitivity curves were estimated in the same way. The spectral sensitivity of a color channel was then obtained as the maximum a posteriori solution. In experiments using synthetic data, the proposed method was found to adapt well to a wide range of sensitivity-curve shapes and sensor noise levels.

Paper Nr: 201
Title:

A Mobile AR System for Sports Spectators using Multiple Viewpoint Cameras

Authors:

Ruiko Miyano, Takuya Inoue, Takuya Minagawa, Yuko Uematsu and Hideo Saito

Abstract: In this paper, we aim to develop an AR system which supports spectators watching a sports game through smartphones in the stands. The final goal of this system is that a spectator can view information about players through a smartphone and share experiences with other spectators. To this end, we propose a system consisting of smartphones and fixed cameras. The fixed cameras are set up to cover the whole sports field and are used to analyze the players. The smartphones held by spectators are used to estimate the positions on the sports field at which they are looking. We built an AR system which annotates players' information onto the smartphone image, and we evaluated the accuracy and the processing time of our system to demonstrate its practicality.

Short Papers
Paper Nr: 17
Title:

A Modified Inter-view Prediction Scheme for Multiview Video Coding to Improve View’s Interactivity

Authors:

Ayman Hamdan, Hussein A. Aly and Mohamed M. Fouad

Abstract: In this paper, we modify the inter-view prediction MVC-HBP scheme of the standard Multiview Video Coding (MVC). This modification reduces the amount of data that must be extracted to decode a given view from the MVC bit-stream, thus improving random access. The proposed scheme is compared to the MVC standard using real data sequences. Clear improvements over the competing approach are shown in terms of data size and random access, with comparable rate-distortion (RD) values.

Paper Nr: 53
Title:

Low-cost Automatic Inpainting for Artifact Suppression in Facial Images

Authors:

André Sobiecki, Alexandru Telea, Gilson Giraldi, Luiz Antonio Neves and Carlos Eduardo Thomaz

Abstract: Facial images are often used in applications that need to recognize or identify persons. Many existing facial recognition tools have limitations with respect to facial image quality attributes such as resolution, face position, and artifacts present in the image. In this paper we describe a new low-cost framework for preprocessing low-quality facial images in order to render them suitable for automatic recognition. For this, we first detect artifacts based on the statistical difference between the target image and a set of pre-processed images in the database. Next, we eliminate artifacts by an inpainting method which combines information from the target image and similar images in our database. Our method has low computational cost and is simple to implement, which makes it attractive for usage in low-budget environments. We illustrate our method on several images taken from public surveillance databases, and compare our results with existing inpainting techniques.

Paper Nr: 88
Title:

Half Gaussian Kernels Based Shock Filter for Image Deblurring and Regularization

Authors:

Baptiste Magnier, Huanyu Xu and Philippe Montesinos

Abstract: In this paper, a shock-diffusion model is presented to restore images that are both blurred and noisy. The proposed approach uses a half smoothing kernel to obtain precise edge directions and applies different shock-diffusion strategies to different image regions. Experimental results on real images show that the proposed model can effectively eliminate noise and enhance edges while simultaneously preserving small objects and corners. Compared to other approaches, the proposed method offers better results both visually and in qualitative measurements.

Paper Nr: 94
Title:

SMQT-based Tone Mapping Operators for High Dynamic Range Images

Authors:

Mikael Nilsson

Abstract: In this paper, tone mapping operations based on the nonlinear Successive Mean Quantization Transform (SMQT) are proposed in order to convert high dynamic range images to low dynamic range images. An SMQT-based tone mapping applied on the luminance channel is derived, as well as an SMQT-based method working directly on all RGB channels. Both methods are compared to other state-of-the-art methods and produce visually similar results. The processing speeds of the SMQT-based methods are discussed and found to be among the fastest reported on a single CPU. Furthermore, an additional improvement to the processing speed and its impact on image quality are investigated.
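The Successive Mean Quantization Transform at the heart of this method can be sketched as follows (a minimal version of our own; the actual operator and its tone-mapping use are more involved): at each level the data are split about their local mean and one output bit is emitted, so L levels map the input onto 2^L output levels.

```python
def smqt(values, levels):
    """Successive Mean Quantization Transform of a flat list of values."""
    out = [0] * len(values)

    def split(ids, level):
        if not ids or level == 0:
            return
        m = sum(values[i] for i in ids) / len(ids)   # local mean of this subset
        hi = [i for i in ids if values[i] > m]
        lo = [i for i in ids if values[i] <= m]
        for i in hi:                                  # emit one bit per level
            out[i] |= 1 << (level - 1)
        split(lo, level - 1)                          # recurse on both halves
        split(hi, level - 1)

    split(list(range(len(values))), levels)
    return out
```

Because each split is relative to the local mean, the transform spreads the output codes over the available range regardless of the input's absolute dynamic range, which is what makes it attractive for tone mapping.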

Paper Nr: 105
Title:

Evaluation of Sharpness Measures and Proposal of a Stop Criterion for Reverse Diffusion in the Context of Image Deblurring

Authors:

Pol Moreno and Felipe Calderero

Abstract: The heat equation can be used to model the diffusion process shown in a defocused (blurry) region of a picture taken with a conventional camera lens. The original focused image can be recovered by reverting the heat equation, that is, by reverse diffusion. However, the main difficulty with this technique is that it becomes unstable very quickly due to the finite precision of pixel values, and the image values blow up. For that reason, detecting the exact time when the reverse diffusion process should stop is crucial. The goal of this work is to evaluate the behavior of different no-reference state-of-the-art sharpness measures (that is, measures that do not require a perfectly focused reference image) for the forward and inverse diffusion processes, and to propose a robust stop criterion to reliably detect the moment before each region becomes unstable. To find a good stop criterion, we carry out a set of experiments with test and real images. The results in this paper are valuable not only for estimating monocular depth from blur cues, but also for any other image processing task that requires image deblurring.
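The forward/reverse diffusion idea can be illustrated with a one-dimensional toy example (our own sketch; the paper works on 2D images): one explicit Euler step of the heat equation u_t = u_xx blurs the signal, and stepping with the sign reversed sharpens it. As the abstract notes, iterating the reverse step eventually blows up, hence the need for a stop criterion.

```python
def laplacian(u):
    """Discrete 1D Laplacian with replicated borders."""
    n = len(u)
    return [u[max(i - 1, 0)] - 2 * u[i] + u[min(i + 1, n - 1)] for i in range(n)]

def diffuse(u, dt):
    """One forward heat-equation step: smooths the signal."""
    return [ui + dt * li for ui, li in zip(u, laplacian(u))]

def sharpen(u, dt):
    """One reverse-diffusion step, u_t = -u_xx (unstable if iterated too long)."""
    return [ui - dt * li for ui, li in zip(u, laplacian(u))]
```

A single forward step flattens a peak, and a single reverse step restores it part of the way; the stop criterion decides how many reverse steps are safe.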

Paper Nr: 109
Title:

Color Quantization via Spatial Resolution Reduction

Authors:

Giuliana Ramella and Gabriella Sanniti di Baja

Abstract: A color quantization algorithm is presented, which is based on the reduction of the spatial resolution of the input image. The maximum number of colors nf desired for the output image is used to fix the proper spatial resolution reduction factor. This is used to build a lower resolution version of the input image with size nf. Colors found in the lower resolution image constitute the palette for the output image. The three components of each color of the palette are interpreted as the coordinates of a voxel in the 3D discrete space. The Voronoi Diagram of the set of voxels corresponding to the colors of the palette is computed and is used for color mapping of the input image.
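The pipeline described above lends itself to a compact sketch (ours, hedged: we use simple block averaging for the resolution reduction and a brute-force nearest-colour search in place of the 3D Voronoi diagram, which the paper uses to make the mapping efficient):

```python
def downscale(img, factor):
    """Average non-overlapping factor x factor blocks of an RGB image."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x]
                     for y in range(by, min(by + factor, h))
                     for x in range(bx, min(bx + factor, w))]
            n = len(block)
            row.append(tuple(sum(c[i] for c in block) // n for i in range(3)))
        out.append(row)
    return out

def quantize(img, factor):
    """Map every pixel to its nearest color in the low-resolution palette."""
    palette = {c for row in downscale(img, factor) for c in row}
    def nearest(p):
        return min(palette, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    return [[nearest(p) for p in row] for row in img]
```

The reduction factor controls the palette size: the coarser the low-resolution image, the fewer colors survive into the palette.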

Paper Nr: 215
Title:

Design of Focusing Catadioptric Systems using Differential Geometry

Authors:

Tobias Strauß

Abstract: In recent years catadioptric systems, consisting of lenses and mirrors, have gained increasing popularity for the task of environmental perception. However, focusing of such systems is a common problem, as it is often not considered during the design process of the optical system. This paper presents a novel approach to address focus in the design of optics with rotational symmetry. The approach not only addresses the construction of catadioptric systems but can also be used to calculate conventional optics. The approach is based on the calculation of the first-order approximation of the meridional and sagittal focus using differential geometry. Additional conditions, such as a single viewpoint, can be considered as well. The derived equations are combined into a set of ordinary differential equations that is used to calculate the shape of the optical system via numerical integration. The design concept has been verified by multi-chromatic ray tracing simulations.

Paper Nr: 275
Title:

A Faster Method Aiming Iris Extraction

Authors:

Jeovane Honório Alves, Gilson Antonio Giraldi and Luiz Antônio Pereira Neves

Abstract: In this paper, we present a technique for iris segmentation. The method finds the pupil in the first step. Next, it segments the iris using the pupil location. The proposed approach is based on the mathematical morphology operators of opening and closing, as well as histogram expansion and thresholding. The CASIA Iris Database from the Institute of Automation of the Chinese Academy of Sciences has been used for the tests. Several tests were performed with 200 different images, showing the efficiency of the proposed method.

Posters
Paper Nr: 74
Title:

How to use Information Theory for Image Inpainting and Blind Spot Filling-in?

Authors:

J. M. Berthommé, T. Chateau and M. Dhome

Abstract: This paper shows how information theory can drive both the digital image inpainting process and the optical illusion due to the blind spot. The position defended is that the missing information is padded with the ``most probable information around'' via a simple filling-in scheme. Thus the proposed algorithm aims to keep the entropy constant: it takes care not to create too much novelty and not to destroy too much information. For this, the image is broken down into regular squares in order to build a dictionary of unique words and to estimate the entropy. The occluded region is then completed, word by word and layer by layer, by picking the element which respects the existing image, which minimizes the entropy deviation if there are several candidates, and which limits the entropy's potential increase when no compatible word exists and a new one must be introduced.
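The dictionary-and-entropy machinery the abstract describes can be sketched as follows (names and the non-overlapping square tiling are our assumptions):

```python
# Shannon entropy of an image over a dictionary of square "words".
from math import log2
from collections import Counter

def patch_entropy(img, size):
    """Tile the image into size x size words and return the entropy (bits)."""
    h, w = len(img), len(img[0])
    words = Counter()
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            word = tuple(tuple(img[y + dy][x + dx] for dx in range(size))
                         for dy in range(size))
            words[word] += 1
    total = sum(words.values())
    return -sum(c / total * log2(c / total) for c in words.values())
```

A filling-in step would then compare this entropy before and after inserting a candidate word and prefer the candidate with the smallest deviation.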

Paper Nr: 98
Title:

An Image Quality Assessment Technique using Defocused Blur as Evaluation Metric

Authors:

Huei-Yung Lin and Xin-Han Chou

Abstract: In this paper, an image quality assessment technique based on defocus blur identification is proposed. Some representative image regions containing edge features are first extracted automatically. A histogram analysis based on the comparison of real and synthesized defocused regions is then carried out to estimate the blur extent. By iteratively changing the convolution parameters, the best blur extent is identified from histogram matching. The image quality is finally evaluated based on the overall blur extent of the selected regions. We have performed the experiments using real scene images. It is shown that accurate image quality assessment results can be achieved using the proposed technique.
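The blur-extent search can be sketched in 1D (our hedged illustration: we stand in for the paper's defocus model with a simple box blur and for its histogram comparison with a squared-difference match over candidate blur parameters):

```python
def box_blur(signal, radius):
    """Blur a 1D signal by averaging over a (2*radius+1)-wide window."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def estimate_blur(sharp, observed, radii):
    """Pick the blur radius whose synthesized blur best matches the observation."""
    return min(radii, key=lambda r: sum((a - b) ** 2
                                        for a, b in zip(box_blur(sharp, r), observed)))
```

Iterating over candidate radii and keeping the best match mirrors the paper's iterative change of convolution parameters; the chosen radius then serves as the blur extent of that region.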

Paper Nr: 117
Title:

Pose Estimation using a Hierarchical 3D Representation of Contours and Surfaces

Authors:

Anders Glent Buch, Dirk Kraft, Joni-Kristian Kämäräinen and Norbert Krüger

Abstract: We present a system for detecting the pose of rigid objects using texture and contour information. From a stereo image view of a scene, a sparse hierarchical scene representation is reconstructed using an early cognitive vision system. We define an object model in terms of a simple context descriptor of the contour and texture features to provide a sparse, yet descriptive object representation. Using our descriptors, we do a search in the correspondence space to perform outlier removal and compute the object pose. We perform an extensive evaluation of our approach with stereo images of a variety of real-world objects rendered in a controlled virtual environment. Our experiments show the complementary role of 3D texture and contour information allowing for pose estimation with high robustness and accuracy.

Paper Nr: 126
Title:

Action Recognition by Matching Clustered Trajectories of Motion Vectors

Authors:

Michalis Vrigkas, Vasileios Karavasilis, Christophoros Nikou and Ioannis Kakadiaris

Abstract: A framework for action representation and recognition based on the description of an action by time series of optical flow motion features is presented. In the learning step, the motion curves representing each action are clustered using Gaussian mixture modeling (GMM). In the recognition step, the optical flow curves of a probe sequence are also clustered using a GMM and the probe curves are matched to the learned curves using a non-metric similarity function based on the longest common subsequence which is robust to noise and provides an intuitive notion of similarity between trajectories. Finally, the probe sequence is categorized to the learned action with the maximum similarity using a nearest neighbor classification scheme. Experimental results on common action databases demonstrate the effectiveness of the proposed method.
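The longest-common-subsequence similarity used for curve matching can be sketched as follows (our illustration; the paper applies it to clustered optical-flow curves, and eps is an assumed spatial matching threshold):

```python
def lcss(a, b, eps):
    """LCSS similarity in [0, 1] between two 2D trajectories (lists of points).
    Two points match when both coordinates lie within eps of each other."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if (abs(a[i - 1][0] - b[j - 1][0]) <= eps
                    and abs(a[i - 1][1] - b[j - 1][1]) <= eps):
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[m][n] / min(m, n)
```

Because unmatched points are simply skipped rather than penalized, the measure tolerates noise and outliers, which is the robustness property the abstract highlights.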

Paper Nr: 137
Title:

Signal Activity Estimation with Built-in Noise Management in Raw Digital Images

Authors:

Angelo Bosco, Davide Giacalone, Arcangelo Bruna, Sebastiano Battiato and Rosetta Rizzo

Abstract: Discriminating smooth image regions from areas in which significant signal activity occurs is a widely studied subject and is important in low level image processing as well as computer vision applications. In this paper we present a novel method for estimating signal activity in an image directly in the CFA (Color Filter Array) Bayer raw domain. The solution is robust against noise in that it utilizes low level noise characterization of the image sensor to automatically compensate for high noise levels that contaminate the image signal.

Paper Nr: 158
Title:

Fusion of Dehazing and Retinex using Transmission for Visibility Enhancement

Authors:

Jaepil Ko

Abstract: Outdoor images are easily degraded by aerosols such as haze and fog. Existing dehazing methods based on the atmospheric scattering model improve image contrast and color fidelity at the cost of brightness. We propose a visibility enhancement method that combines dehazing and retinex through the transmission map. The proposed method retains both color fidelity and brightness without over-saturation.

Paper Nr: 177
Title:

Colour Processing in Tetrachromatic Spaces - Uses of Tetrachromatic Colour Spaces

Authors:

Alfredo Restrepo Palacios

Abstract: We exploit the geometry of the 4D hypercube in order to visualize tetrachromatic images.

Paper Nr: 233
Title:

Perceptual Comparison of Demosaicing Algorithms and In-camera Demosaicing with JPEG Compression

Authors:

Bartolomeo Montrucchio

Abstract: Color image acquisition in digital cameras is often performed using CCD or CMOS sensor chips with a color filter array on top of a single monochromatic sensor. In this paper, a perceptual comparison is performed among three well-known demosaicing algorithms plus in-camera demosaicing with lossy JPEG compression, by means of subjective tests, that is, with the help of human observers. The novelty of the approach is that the chosen algorithms are representative of those used in the commercial raw image converters employed by graphics professionals, and that the test was performed on a large number of people, achieving results only partially similar to those obtained by means of computed metrics. The results show that, in most conditions and for users who are not particularly expert, the capability of the most advanced demosaicing algorithms to produce an almost perfect reconstruction of the full-color image is not strictly required. Only for selected categories of images is it possible to find a clear winner among the algorithms.

Paper Nr: 277
Title:

Splined-based Motion Vector Encoding Scheme

Authors:

Parnia Farokhian and Chris Joslin

Abstract: This paper presents a new motion vector (MV) encoding scheme based on a curve-fitting technique. The MVs of collocated blocks along the video sequence form a set of data points that are mapped into a smaller set containing the key MVs, which represent the coefficients of the best-fitted curve. In order to reconstruct each MV to its original value, achieving lossless MV compression, the MVs of each set are mapped into four categories, each representing the condition for recovering that MV from the curve. To reduce the number of key MVs, the best matching block in the motion estimation process is chosen so that it results in the minimum number of key MVs while the number of bits required for encoding its residual energy is bounded by a predetermined bitrate threshold. The algorithm utilizes rate control techniques to estimate the number of bits per candidate residual block within the search window, avoiding the computational complexity of selecting the best matching block. Experimental results show a saving of up to 42.7% in MV bitrate compared to the method used in H.264/AVC.
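The curve-fitting step can be illustrated with a least-squares fit (our simplification: a straight line instead of the paper's spline, applied to one motion-vector component of collocated blocks across frames):

```python
def fit_line(mv_component):
    """Least-squares line a + b*t through one MV component over time."""
    n = len(mv_component)
    xs = range(n)
    sx, sy = sum(xs), sum(mv_component)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, mv_component))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def residuals(mv_component):
    """Per-frame deviation from the fitted curve; in a lossless scheme only
    these deviations (plus the coefficients a, b) would need to be coded."""
    a, b = fit_line(mv_component)
    return [y - (a + b * t) for t, y in enumerate(mv_component)]
```

When the motion is smooth, the residuals are near zero and the whole MV series collapses to a couple of coefficients, which is the source of the bitrate saving.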

Paper Nr: 307
Title:

Experimental Evaluation of Bayesian Image Reconstruction Combined with Spatial-Superresolution and Spectral Reflectance Recovery

Authors:

Yusuke Murayama, Pengchang Zhang and Ari Ide-Ektessabi

Abstract: Acquisition of a multispectral image and analysis of the object based on spectral information recovered from the image have recently received attention in the digital archiving of cultural assets. However, multispectral imaging faces problems such as long image acquisition times and demanding registration between band images. In order to solve them, we have proposed an extended method combining Bayesian image superresolution with spectral reflectance recovery. In this study we quantitatively evaluated the performance of the proposed technique using a typical 6-band multispectral scanner and a Japanese painting. The accuracy of the recovered spectral reflectance was investigated with respect to the ratio of the capturing resolution to the recovering resolution. The experimental results indicated that the spatial resolution can be increased by around 1.7 times, which means the image capturing time can be reduced to almost one third, and the angle of view can also be extended by 1.7 times.

Paper Nr: 309
Title:

Novel Wireless Capsule Endoscopy Diagnosis System with Adaptive Image Capturing Rate

Authors:

Zhi Jin, Tammam Tillo, Eng Gee Lim, Zhao Wang and Jimin Xiao

Abstract: Wireless Capsule Endoscopy (WCE) is a device used to diagnose the gastrointestinal (GI) tract, and it is one of the most used tools to inspect the small intestine. Inspection by WCE is non-invasive, and consequently it is more popular than other methods traditionally adopted for examination of the GI tract. From the point of view of physicians, WCE is a favorable approach for increasing both the efficiency and the accuracy of the diagnosis. The most significant drawback of WCE is the time it takes a physician to check all the frames captured in the GI tract, which is too long and can be up to 4 hours. Many anomaly-based techniques have been proposed to help physicians shorten the diagnosis time; however, these techniques still suffer from a high false alarm rate, which limits their actual use. Therefore, in this paper we propose a two-stage diagnosis system that first uses a normal capsule to capture the whole GI tract, and then applies an automatic detection technique that detects anomalies with a high false alarm rate. The low specificity of the first capsule ensures that no anomalies will be missed in the first stage of the process. The second stage of the proposed diagnosis system uses a different capsule with an adaptive image capturing rate to re-capture the GI tract. In this stage the capsule uses a high image capturing rate for segments of the GI tract where an anomaly was detected in the first stage, whereas in the other segments a lower image capturing rate is used in order to make better use of the second capsule's battery. Consequently, the second generated video, which will be inspected by the physician, will have a higher-resolution sequence around the areas with suspected lesions.

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 30
Title:

Segmentation of Tracheal Rings in Videobronchoscopy Combining Geometry and Appearance

Authors:

Carles Sánchez, Debora Gil, Antoni Rosell, Albert Andaluz and F. Javier Sánchez

Abstract: Videobronchoscopy is a medical imaging technique that allows interactive navigation inside the respiratory pathways and minimally invasive interventions. Tracheal procedures are ordinary interventions that require measurement of the percentage of obstructed pathway for injury (stenosis) assessment. Visual assessment of stenosis in videobronchoscopic sequences requires high expertise in tracheal anatomy and is prone to human error. Accurate detection of tracheal rings is the basis for automated estimation of the size of a stenosed trachea. Processing of videobronchoscopic images acquired at the operating room is a challenging task due to the wide range of artifacts and acquisition conditions. We present a model of the geometry and appearance of tracheal rings for their detection in videobronchoscopic videos. Experiments on sequences acquired at the operating room show a performance close to inter-observer variability.

Paper Nr: 48
Title:

Blood Vessel Characterization in Colonoscopy Images to Improve Polyp Localization

Authors:

Joan M. Núñez, Jorge Bernal, Javier Sánchez and Fernando Vilariño

Abstract: This paper presents an approach to mitigate the contribution of blood vessels to the energy image used at different tasks of automatic colonoscopy image analysis. This goal is achieved by introducing a characterization of endoluminal scene objects which allows us to differentiate between the trace of 2-dimensional visual objects, such as vessels, and shades from 3-dimensional visual objects, such as folds. The proposed characterization is based on the influence that the object shape has in the resulting visual feature, and it leads to the development of a blood vessel attenuation algorithm. A database consisting of manually labelled masks was built in order to test the performance of our method, which shows an encouraging success in blood vessel mitigation while keeping other structures intact. Moreover, by extending our method to the only available polyp localization algorithm tested on a public database, blood vessel mitigation proved to have a positive influence on the overall performance.

Paper Nr: 100
Title:

Combining Depth Information for Image Retargeting

Authors:

Huei-Yung Lin, Chin-Chen Chang and Jhih-Yong Huang

Abstract: This paper presents a novel image retargeting approach for ranging cameras. The proposed approach first extracts three feature maps: a depth map, a saliency map, and a gradient map. The depth map and the saliency map are then used to separate the main contents from the background and thus to compute a map of salient objects. After that, the approach constructs an importance map as a weighted sum of these four maps. Finally, it constructs the target image using the seam carving method based on the importance map. Unlike previous approaches, the proposed approach preserves the salient object well and maintains the gradient and visual effects in the background. Moreover, it protects the salient object from being destroyed by the seam carving algorithm. The experimental results show that the proposed approach performs well in terms of resized quality.

Paper Nr: 146
Title:

Robust Iris Segmentation under Unconstrained Settings

Authors:

João C. Monteiro, Hélder P. Oliveira, Ana F. Sequeira and Jaime S. Cardoso

Abstract: The rising challenges in the field of iris recognition, concerning the development of accurate recognition algorithms for images acquired under unconstrained conditions, are leading to a renewed interest in the area. Although several works already report excellent recognition rates, these values are obtained by acquiring images in very controlled environments. The use of such systems in daily security activities, such as airport security and bank account management, is therefore hindered by the inherently unconstrained nature under which images are to be acquired. The proposed work focuses on mutual context information from the iris centre and the iris limbic contour to perform robust and accurate iris segmentation in noisy images. A random subset of the UBIRIS.v2 database was tested, with a promising E1 classification rate of 0.0109.

Paper Nr: 180
Title:

Simple, Fast, Accurate Melanocytic Lesion Segmentation in 1D Colour Space

Authors:

F. Peruch, F. Bogo, M. Bonazza, M. Bressan, V. Cappelleri and E. Peserico

Abstract: We present a novel technique for melanocytic lesion segmentation, based on one-dimensional Principal Component Analysis (PCA) in colour space. Our technique is simple and extremely fast, segmenting high-resolution images in a fraction of a second even with the modest computational resources available on a cell phone – an improvement of an order of magnitude or more over state-of-the-art techniques. Our technique is also extremely accurate: very experienced dermatologists disagree with its segmentations less than they disagree with the segmentations of all state-of-the-art techniques we tested, and in fact less than they disagree with the segmentations of dermatologists of moderate experience.
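The core of one-dimensional PCA in colour space can be sketched as follows (our hedged version: power iteration finds the principal colour axis, and we simply threshold the 1D projections at their mean; the paper's actual thresholding of the projection is more refined):

```python
def principal_axis(pixels, iters=50):
    """Principal colour axis of RGB pixels via power iteration on the 3x3 covariance."""
    n = len(pixels)
    mean = [sum(p[i] for p in pixels) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(3)] for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def segment(pixels):
    """Project each pixel onto the principal axis and threshold at the mean."""
    mean, axis = principal_axis(pixels)
    proj = [sum((p[i] - mean[i]) * axis[i] for i in range(3)) for p in pixels]
    t = sum(proj) / len(proj)
    return [int(s > t) for s in proj]
```

Reducing the colour space to a single well-chosen axis is what makes the approach fast enough for a cell phone: the per-pixel work is one dot product and one comparison.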

Paper Nr: 186
Title:

A JMVC-based Error Concealment Method for Stereoscopic Video

Authors:

Xiaorui Zhu, Li Zhuo and Xiaoqin Song

Abstract: When transmitted over error-prone environments, stereoscopic video data may undergo transmission errors and losses. In order to improve the quality of the reconstructed video, a Joint Multi-view Coding (JMVC) based error concealment method for stereoscopic video is proposed in this paper. In this method, errors in the independent view are concealed by a traditional two-dimensional (2-D) video error concealment algorithm. For a lost macroblock (MB) in the dependent view, intra- and inter-view correlations are utilized to conceal the errors based on the characteristics of the stereoscopic video and the MB's coding mode. Based on the partition modes of related reference MBs, the lost MBs are divided into two types: smooth blocks and texture blocks. Smooth blocks are processed by an improved Boundary Smooth Degree (BSD), while texture blocks are reconstructed in units of 8×8 blocks using the pixel Sum of Absolute Differences (SAD). Experimental results show that, compared with conventional error concealment methods for stereoscopic video coding, the proposed method achieves better subjective and objective performance.

Paper Nr: 193
Title:

DCT based Temporal Image Signature Approach

Authors:

Haroon Qureshi

Abstract: A saliency map can be defined as a visual representation of a corresponding scene. Automatic detection of salient regions in a scene is an important challenge in the area of video analysis. In this paper, a combination of the Image Signature (IS) approach (Hou et al., 2012) and the Temporal Spectral Residual (TSR) approach (Cui et al., 2009) is presented. This is done by extending the Image Signature approach to the temporal domain. This approach allows us to detect visually prominent features separately and more reliably. These salient features, together with subsequently applied image processing steps, enable a detection of salient regions that are not clearly visible or highlighted by other state-of-the-art approaches. It is also shown how the extracted saliency map can be used to detect more prominent details for mask segmentation.

Paper Nr: 216
Title:

A Robust 3D Shape Descriptor based on the Electrical Charge Distribution

Authors:

Fattah Alizadeh and Alistair Sutherland

Abstract: Defining a robust shape descriptor is an enormous challenge in the 3D model retrieval domain. Therefore, a great deal of research has been conducted to propose new shape descriptors which meet the retrieval criteria. This paper proposes a new shape descriptor based on the distribution of electrical charge, which has valuable characteristics such as insensitivity to translation, scale and rotation, and robustness to noise as well as to simplification operations. After extracting the canonical form representation of the models, they are treated as surfaces placed in free space and a charge Q is distributed over them. After calculating the amount of charge on each face of the model, a set of concentric spheres encloses the model, and the total amount of charge distributed on the model's surface between adjacent spheres generates the Charge Distribution Descriptor (CDD). A beneficial two-phase description using the number of Charged-Dense Patches for each model is utilized to boost the discrimination power of the system. The strength of our approach is verified through experiments on the McGill dataset. The results demonstrate a higher ability of our system compared to other well-known approaches.

Paper Nr: 225
Title:

Compressed Domain Moving Object Detection based on H.264/AVC Macroblock Types

Authors:

Marcus Laumer, Peter Amon, Andreas Hutter and André Kaup

Abstract: This paper introduces a low-complexity frame-based object detection algorithm for H.264/AVC video streams. The method solely parses and evaluates H.264/AVC macroblock types extracted from the video stream, which requires only partial decoding. Different macroblock types indicate different properties of the video content. This fact is used to segment a scene into foreground and background or, more precisely, to detect moving objects within the scene. The main advantage of this algorithm is that it is well suited for massively parallel processing, because it is very fast and can be combined with several other pre- and post-processing algorithms without decreasing their performance. The algorithm is able to process about 3600 frames per second of CIF-resolution video streams, measured on an Intel Core i5-2520M CPU @ 2.5 GHz with 4 GB RAM.
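The macroblock-type rule can be sketched as a toy classifier (our own illustration with simplified type names; the real H.264/AVC type set and the paper's per-type rules are richer): skipped macroblocks copy co-located content and are taken as static background, while coded types suggest motion.

```python
def mb_mask(mb_type_grid):
    """Per-macroblock foreground mask from a grid of macroblock type strings:
    1 for types that signal coded content (moving), 0 for Skip (static)."""
    return [[0 if t == "P_Skip" else 1 for t in row] for row in mb_type_grid]
```

Because the rule is a per-macroblock lookup with no pixel decoding, it parallelizes trivially, which matches the throughput figures reported above.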

Short Papers
Paper Nr: 10
Title:

Automatic Image Matting Fusing Time-of-Flight and Color Cameras Data Streams

Authors:

Piercarlo Dondi, Luca Lombardi, Andrea La Rosa and Luigi Cinque

Abstract: In this paper we present a new approach for automatic image matting. Image matting is a set of techniques related to the accurate classification of background and foreground in images and video sequences. It is not a new problem: matting techniques are currently a prerequisite of several image and video editing applications, but almost all existing solutions require an interactive step with a human expert. Our proposal solves the problem automatically by fusing two different video streams: the color one, coming from a standard RGB camera, and the depth one, produced by a Time-of-Flight device. The proposed method extends the Soft Scissors interactive algorithm: the main novelties of our approach are the complete automation of the matting process and the computational efficiency, obtained by porting the most computationally intensive sections of the algorithm to a CUDA architecture.

Paper Nr: 25
Title:

Evaluation and Comparison of Textural Feature Representation for the Detection of Early Stage Cancer in Endoscopy

Authors:

Arnaud A. A. Setio, Fons van der Sommen, Svitlana Zinger, Erik J. Schoon and Peter H. N. de With

Abstract: Esophageal cancer is the fastest rising type of cancer in the Western world. The novel technology of High-Definition (HD) endoscopy enables physicians to find texture patterns related to early cancer, and encourages the development of a Computer-Aided Decision (CAD) system to help physicians identify early cancer faster and decrease the miss rate. However, an appropriate texture feature extraction method, which is needed for classification, has not yet been studied. In this paper, we compare several techniques for texture feature extraction, including co-occurrence matrix features, LBP and Gabor features, and evaluate their performance in detecting early stage cancer in HD endoscopic images. In order to exploit more image characteristics, we introduce an efficient combination of the texture and color features. Furthermore, we add a specific preprocessing step designed for endoscopy images, which improves the classification accuracy. After reducing the feature dimensionality using Principal Component Analysis (PCA), we classify the selected features with a Support Vector Machine (SVM). The experimental results, validated by an expert gastroenterologist, show that the proposed feature extraction is promising and reaches a classification accuracy of up to 96.48%.
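For readers unfamiliar with one of the compared features, a minimal radius-1, 8-neighbour LBP (without the interpolation or uniform-pattern refinements used in practice) can be sketched as:

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour Local Binary Pattern at radius 1.

    Each interior pixel gets an 8-bit code: bit k is set when the k-th
    neighbour is >= the centre pixel.
    """
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]                       # interior (centre) pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit  # set bit for brighter neighbour
    return code
```

A histogram of these codes over an image region is then the texture feature vector that enters the classifier.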

Paper Nr: 39
Title:

Joint Segmentation and Tracking of Object Surfaces in Depth Movies along Human/Robot Manipulations

Authors:

Babette Dellen, Farzad Husain and Carme Torras

Abstract: A novel framework for joint segmentation and tracking of object surfaces in depth videos is presented. Initially, the 3D colored point cloud obtained using the Kinect camera is used to segment the scene into surface patches, defined by quadratic functions. The computed segments together with their functional descriptions are then used to partition the depth image of the subsequent frame in a manner consistent with the preceding frame. This way, solutions established in previous frames can be reused, which improves the efficiency of the algorithm and the coherency of the segmentations along the movie. The algorithm is tested on scenes showing human and robot manipulations of objects. We demonstrate that the method can successfully segment and track the human/robot arm and object surfaces along the manipulations. The performance is evaluated quantitatively by measuring the temporal coherency of the segmentations and the segmentation covering using ground truth. The method provides a visual front-end designed for robotic applications, and can potentially be used in the context of manipulation recognition, visual servoing, and robot-grasping tasks.

Paper Nr: 47
Title:

A New Evaluation Framework and Image Dataset for Keypoint Extraction and Feature Descriptor Matching

Authors:

Iñigo Barandiaran, Camilo Cortes, Marcos Nieto, Manuel Graña and Oscar E. Ruiz

Abstract: Keypoint extraction and description mechanisms play a crucial role in image matching, where several image points must be accurately identified to robustly estimate a transformation or to recognize an object or a scene. New procedures for keypoint extraction and feature description are continuously emerging. In order to assess them accurately, normalized data and evaluation protocols are required. In response to these needs, we present (1) a new evaluation framework that allows assessing the performance of state-of-the-art feature point extraction and description mechanisms, (2) a new image dataset acquired under controlled affine and photometric transformations, and (3) a testing image generator. Our evaluation framework generates detailed curves about the performance of different approaches, providing valuable insight into their behavior. Also, it can be easily integrated into many research and development environments. The contributions mentioned above are available on-line for the use of the scientific community.

Paper Nr: 49
Title:

Dense Multi-modal Registration with Structural Integrity using Non-local Gradients

Authors:

Sheshadri Thiruvenkadam

Abstract: In this work, the challenging problem of dense non-rigid registration (NRR) for multi-modal data is addressed. We look at a class of differentiable metrics based on a weighted L2 distance of non-local image gradients. For an intensity-dependent choice of weights, the metric gives greater multi-modal capability than using gradients alone. In a variational dense deformation setting, the metric is coupled with non-local regularization to make the framework feature-based. This combination maintains the visual quality of the registered image, and gives a good correspondence for features of similar geometry under the challenges of noise, large motion, and the presence of small structures. We also address computational speed-ups of the energy minimization using an approximation scheme. The proposed approach is demonstrated on synthetic and medical data, and the results are quantitatively compared with MI-based diffeomorphic NRR.

Paper Nr: 51
Title:

Detecting Focal Regions using Superpixels

Authors:

Richard Lowe and Mark Nixon

Abstract: We introduce a new method that can automatically determine regions of focus within an image. The focus is determined by generating Content-Driven Superpixels and subsequently exploiting consistency properties of scale-space. These superpixels can be analysed to produce the focal image regions. In our new analysis, Light-Field Photography provides an efficient way to test our algorithm in a controlled manner. An image taken with a light-field camera can be viewed from different perspectives and focal planes, so by manually modifying the focal plane we can determine whether the focal areas are correctly extracted. We show improved results of our new approach compared with some prior techniques and demonstrate the advantages our new approach offers.

Paper Nr: 73
Title:

Recurrence Matrices for Human Action Recognition

Authors:

V. Javier Traver, Pau Agustí and Filiberto Pla

Abstract: One important issue in action characterization is properly capturing temporally related information. In this work, recurrence matrices are explored as a way to represent action sequences. A recurrence matrix (RM) encodes all pair-wise comparisons of the frame-level descriptors. By its nature, a recurrence matrix can be regarded as a temporally holistic action representation, but it can hardly be used directly, and some descriptor is therefore required to compactly summarize its contents. Two simple RM-level descriptors computed from a given recurrence matrix are proposed, together with a general procedure to combine a set of RM-level descriptors. This procedure relies on a combination of early and late fusion strategies. Recognition performance indicates that the proposed descriptors are competitive provided that enough training examples are available. One important finding is that both the choice of feature subsets and the way they are combined have a significant impact on performance, an issue that is generally overlooked.
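In its simplest form, a recurrence matrix is just the table of pairwise distances between per-frame descriptors; a small numpy sketch (Euclidean distance is an assumption here, the paper may use other frame comparisons):

```python
import numpy as np

def recurrence_matrix(frames):
    """All pairwise Euclidean distances between frame-level descriptors.

    frames: (T, D) array with one D-dimensional descriptor per frame.
    Entry (i, j) compares frames i and j; small values mark recurrences.
    """
    F = np.asarray(frames, dtype=float)
    diff = F[:, None, :] - F[None, :, :]   # broadcasted (T, T, D) differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# Frames 0 and 2 share the same descriptor, so they "recur".
R = recurrence_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
```

Because the matrix is T×T, its size grows with the sequence length, which is why compact RM-level descriptors are needed on top of it.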

Paper Nr: 91
Title:

Curvature-Scale-based Contour Understanding for Leaf Margin Shape Recognition and Species Identification

Authors:

Guillaume Cerutti, Laure Tougne, Didier Coquin and Antoine Vacavant

Abstract: In the context of a mobile application for tree species identification, designed for a wide range of users and with didactic purposes, we developed a method based on the computation of explicit leaf shape descriptors inspired by the criteria used in botany. This paper focuses on the characterization of the leaf contour, the extraction of its properties, and its description using botanical terms. Contour properties are investigated using the Curvature-Scale Space representation, potential teeth are explicitly extracted and described, and the margin is classified into a set of inferred shape classes. Results are presented for both margin shape characterization and leaf classification over nearly 80 tree species.

Paper Nr: 132
Title:

A Dense Medial Descriptor for Image Analysis

Authors:

Matthew van der Zwan, Yuri Meiburg and Alexandru Telea

Abstract: We present dense medial descriptors, a new technique which generalizes the well-known medial axes to encode and manipulate whole 2D grayscale images, rather than binary shapes. To compute our descriptors, we first reduce an image to a set of threshold sets in luminance space. Next, we compute a simplified representation of each threshold set using a noise-resistant medial axis transform. Finally, we use these medial axis transforms to perform a range of operations on the input image, from perfect reconstruction to segmentation, simplification, and artistic effects. Our pipeline can robustly handle any 2D grayscale image, is easy to use, and allows an efficient CPU- or GPU-based implementation. We demonstrate our dense medial descriptors with several image-processing applications.

Paper Nr: 145
Title:

Segmentation of Crystal Defects via Local Analysis of Crystal Distortion

Authors:

Matt Elsey and Benedikt Wirth

Abstract: We propose a variational method to simultaneously detect dislocations and grain boundaries in an image of a crystal, as well as the local crystal distortion. To this end we extract a distortion field F from the image which is nearly curl-free except at dislocations and grain boundaries. The sparsity of the curl is promoted by an L1-regularization of curl F in the employed energy functional. The structure of the functional admits a fast and parallelizable minimization, such that fairly large images can be analyzed in a few minutes.
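On a discrete grid, the curl that the L1 term penalizes can be approximated with finite differences; a sketch of the scalar 2D curl, checked on a curl-free gradient field (the variational minimization itself is not reproduced here):

```python
import numpy as np

def curl2d(F1, F2, h=1.0):
    """Scalar curl dF2/dx - dF1/dy of a 2D vector field on a grid.

    Axis 0 is y and axis 1 is x, with grid spacing h; np.gradient uses
    central differences in the interior.
    """
    return np.gradient(F2, h, axis=1) - np.gradient(F1, h, axis=0)

# A distortion field that is the gradient of a smooth potential is
# curl-free; dislocations would show up as isolated nonzero curl.
y, x = np.mgrid[0:16, 0:16].astype(float)
u = x ** 2 + x * y                                       # smooth potential
F1, F2 = np.gradient(u, axis=1), np.gradient(u, axis=0)  # F = grad(u)
C = curl2d(F1, F2)
```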

Paper Nr: 149
Title:

Sampled Multi-scale Color Local Binary Patterns

Authors:

Yu Zhang, Stéphane Bres and Liming Chen

Abstract: In this paper, we propose a novel representation, called Sampled Multi-scale Color Local Binary Pattern (SMC-LBP), and apply it to Visual Object Classes (VOC) recognition. The Local Binary Pattern (LBP) has been proven effective for image representation, but it is too local to be robust. Moreover, its design cannot fully exploit the discriminative capacity of the available features or cope with the varied lighting and viewing conditions of real-world scenes. To address these problems, we propose SMC-LBP, which randomly samples neighboring pixels across circles of different scales, instead of the pixels of a single circle as in the original LBP scheme. The proposed descriptor has several advantages: (1) it encodes not only a single scale but multiple scales of image patterns, and hence provides more complete image information than the original LBP descriptor; (2) it incorporates color information, which enhances its photometric invariance and discriminative power. The experimental results on the PASCAL VOC 2007 image benchmark show a significant accuracy improvement of the proposed descriptor compared with both the original LBP and other popular texture descriptors.

Paper Nr: 160
Title:

Unposed Object Recognition using an Active Approach

Authors:

Wallace Lawson and J. Gregory Trafton

Abstract: Object recognition is a practical problem with a wide variety of potential applications. Recognition becomes substantially more difficult when objects have not been presented in some logical, “posed” manner selected by a human observer. We propose to solve this problem using active object recognition, where the same object is viewed from multiple viewpoints when it is necessary to gain confidence in the classification decision. We demonstrate the effect of unposed objects on a state-of-the-art approach to object recognition, then show how an active approach can increase accuracy. The active approach works by attaching confidence to recognition, prompting further inspection when confidence is low. We demonstrate a performance increase on a wide variety of objects from the RGB-D database, showing a significant increase in recognition accuracy.
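The confidence-driven inspection loop can be sketched as follows; `classify` and the 0.8 threshold are hypothetical stand-ins for the paper's actual recognizer and stopping rule:

```python
def active_recognize(views, classify, threshold=0.8):
    """Keep inspecting viewpoints until the classifier is confident.

    views: iterable of viewpoint observations, in inspection order.
    classify: hypothetical function returning a (label, confidence) pair.
    Returns the most confident decision seen so far.
    """
    best = None
    for view in views:
        label, conf = classify(view)
        if best is None or conf > best[1]:
            best = (label, conf)
        if conf >= threshold:   # confident enough: stop moving the camera
            break
    return best

# Toy classifier: confidence grows as more distinctive views arrive.
fake = {"side": ("mug", 0.4), "top": ("bowl", 0.5), "front": ("mug", 0.9)}
result = active_recognize(["side", "top", "front"], fake.get)
```

The design point is that extra viewpoints are only requested when confidence is low, so posed objects are still recognized from a single view.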

Paper Nr: 162
Title:

Combining Holistic Descriptors for Scene Classification

Authors:

Kelly Assis de Souza Gazolli and Evandro Ottoni Teatini Salles

Abstract: Scene classification is an important issue in the field of computer vision. To address this problem, we explore in this paper a combination of holistic descriptors for the scene categorization task. We first describe the Contextual Mean Census Transform (CMCT), an image descriptor that combines the distribution of local structures with contextual information. CMCT is a holistic descriptor based on CENTRIST and, like CENTRIST, encodes the structural properties within an image while suppressing detailed textural information. Second, we present GistCMTC, a combination of the Contextual Mean Census Transform descriptor with Gist, to generate a new holistic descriptor that represents scenes more accurately. Experimental results on four datasets demonstrate that the proposed methods achieve competitive performance against previous methods.

Paper Nr: 175
Title:

An Accurate Hand Segmentation Approach using a Structure based Shape Localization Technique

Authors:

Jose M. Saavedra, Benjamin Bustos and Violeta Chang

Abstract: Hand segmentation is an important stage in a variety of applications such as gesture recognition and biometrics. The accuracy of the hand segmentation process becomes more critical in applications that are based on hand measurements, as in the case of biometrics. In this paper, we present a very accurate hand segmentation technique relying on both hand localization and color information. First, our proposal locates a hand in an input image; the hand location is then used to extract a training region which plays a critical role in segmenting the whole hand accurately. We use a structure-based method (STELA), originally proposed for 3D model retrieval, for the hand localization stage. STELA exploits not only locality but also structural information of the hand image, and does not require a large image collection for training. Second, our proposal separates the hand region from the background using the color information captured from the training region. In this way, the segmentation depends only on the user's skin color. This segmentation approach allows us to handle a variety of skin colors and illumination conditions. In addition, our proposal is fully automatic; no user calibration stage is required. Our results show 100% success in the hand localization process for different kinds of images, and very accurate hand segmentation, achieving over 90% correct segmentation with only 5% false positives.

Paper Nr: 178
Title:

Segmentation of Kinect Captured Images using Grid based 3D Connected Component Labeling

Authors:

Aniruddha Sinha, T. Chattopadhyay and Apurbaa Mallik

Abstract: In this paper, the authors present a grid-based 3-Dimensional (3D) connected component labeling method to segment video frames captured with a Kinect RGB-D sensor. The Kinect captures the RGB values of the scene as well as its depth using two different cameras/sensors. A calibration between these two sensors enables us to generate a point cloud (a 6-tuple containing the RGB values as well as the position along the x, y and z directions with respect to the camera) for each pixel in the depth image. In the proposed method we initially construct the point cloud for all pixels in the depth image. The space containing the cloud points is then divided into 3D grids, and components that are connected in the 3D space are labeled with the same index. The proposed method can segment images even when the projections of two spatially distinct objects overlap in the projected plane. We tested the segmentation method on the HARL dataset with different grid sizes and obtained an overall segmentation accuracy of 83.8% for the optimal grid size.
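The labeling step amounts to connected-component labeling on a 3D occupancy grid of cells; a minimal 6-connected flood-fill sketch (the construction of the grid from the point cloud is omitted, and 6-connectivity is an assumption):

```python
from collections import deque

def label_grid(occ):
    """6-connected component labeling of a 3D occupancy grid.

    occ: nested lists occ[z][y][x] of 0/1. Returns a same-shaped grid with
    0 for empty cells and components numbered from 1.
    """
    Z, Y, X = len(occ), len(occ[0]), len(occ[0][0])
    labels = [[[0] * X for _ in range(Y)] for _ in range(Z)]
    count = 0
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                if occ[z][y][x] and not labels[z][y][x]:
                    count += 1                      # new component found
                    labels[z][y][x] = count
                    q = deque([(z, y, x)])
                    while q:                        # breadth-first flood fill
                        cz, cy, cx = q.popleft()
                        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                            nz, ny, nx = cz + dz, cy + dy, cx + dx
                            if (0 <= nz < Z and 0 <= ny < Y and 0 <= nx < X
                                    and occ[nz][ny][nx]
                                    and not labels[nz][ny][nx]):
                                labels[nz][ny][nx] = count
                                q.append((nz, ny, nx))
    return labels

# Two occupied cells that touch in no axis direction form two components.
occ = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
lab = label_grid(occ)
```

Labeling in 3D is what lets the method keep apart objects whose 2D projections overlap.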

Paper Nr: 188
Title:

Measuring Bitumen Coverage of Stones using a Turntable and Specular Reflections

Authors:

Hanna Källén, Anders Heyden and Per Lindh

Abstract: The durability of a road depends, among other factors, on the affinity between the stones in the top layer and the bitumen that holds them together. Poor adherence will cause stones to detach from the road surface more easily. The rolling bottle method is the standard way to determine the affinity between stones and bitumen: a number of stones covered in bitumen are put in a rolling bottle filled with water, and after rolling for a number of hours the bitumen coverage is estimated by visually inspecting the stones. This paper describes a method for automatically estimating the degree of bitumen coverage using image analysis instead of manual inspection. The proposed method is based on the observation that bitumen reflects light much better than raw stone, and uses these reflections to estimate the degree of coverage. The stones are placed on an illuminated turntable with a camera directly above them. Turning the table illuminates different sides of the stones and causes reflections in different parts of the images. The results are compared to manual inspection and agree well with it.

Paper Nr: 194
Title:

A PCA and Statistic based Approach for Modeling Contextual and Operational TV News Characteristics

Authors:

Tarek Zlitni and Walid Mahdi

Abstract: The exploitation of digital TV streams still raises particular difficulties. Characterized by their opaque content, these streams are a source of multimedia information whose access still requires pattern recognition technologies, for both the sound and the image. The goal is to improve the exploitation of this source of information through easier access and diffusion on mass-communication devices (e.g., TV programs anytime anywhere, TV replay) and community platforms (Facebook, Twitter, YouTube, Dailymotion, etc.). In this article, we propose an approach for automatically structuring TV news programs into topics. The originality of the approach lies first in the combined use of the contextual and operational characteristics that govern the organization of television news content, and second in the modeling of these characteristics with PCA and statistical models.

Paper Nr: 197
Title:

DITEC - Experimental Analysis of an Image Characterization Method based on the Trace Transform

Authors:

Igor G. Olaizola, Iñigo Barandiaran, Basilio Sierra and Manuel Graña

Abstract: Global and local image feature extraction is one of the most common tasks in computer vision, since features provide the basic information for further processing and can be employed in applications such as image search and retrieval, object recognition, 3D reconstruction, augmented reality, etc. The main criteria for evaluating a feature extraction algorithm are its discriminative capability, robustness and invariance to certain transformations. However, other aspects such as computational performance or feature length can be crucial for domain-specific applications with particular constraints (real-time, massive datasets, etc.). In this paper, we analyze the main characteristics of the DITEC method used as both a global and a local descriptor. Our results show that DITEC can be effectively applied in both contexts.

Paper Nr: 203
Title:

A 2D Matching Method for Reconstruction of 3D Proximal Femur using X-ray Images

Authors:

Sonia Akkoul, Adel Hafiane, Rémy Leconge, Eric Lespessailles and Rachid Jennane

Abstract: The reconstruction of the femur shape from a limited number of 2D X-ray images is a challenging task, but it is desirable as it lowers both acquisition costs and radiation dose. The aim of this paper is to use a small number of 2D X-ray images to reconstruct a 3D proximal femur surface without any prior knowledge of the shape model. The proposed method combines the coordinates of 2D binary contour points with their normals to find the best matching between 2D point pairs. The obtained results are promising. The estimated error shows that it is possible to rebuild the proximal femur shape from a limited number of radiographs.

Paper Nr: 204
Title:

On Line-based Homographies in Urban Environments

Authors:

Nils Hering, Lutz Priese and Frank Schmitt

Abstract: This paper contributes to matching and registration in urban environments. We develop a new method to extract only those line segments that belong to the contours of buildings. This method includes vanishing point detection, removal of wild structures, and sky and roof detection. A registration of two building facades is achieved by computing a line-based homography between them. The known crucial instability of line homographies is overcome with a line-panning and iteration technique. This homography approach is also able to separate individual facades within a building.

Paper Nr: 265
Title:

Top-Down Visual Attention with Complex Templates

Authors:

Jan Tünnermann, Christian Born and Bärbel Mertsching

Abstract: Visual attention can support autonomous robots in visual tasks by assigning resources to relevant portions of an image. In this biologically inspired concept, conspicuous elements of the image are typically determined with regard to different features such as color, intensity or orientation. Studies of human visual attention suggest that these bottom-up processes are complemented – and in many cases overruled – by top-down influences that modulate the attentional focus with respect to the current task or a priori knowledge. In artificial attention, one branch of research investigates visual search for a given object within a scene by the use of top-down attention. Current models require extensive training for a specific target or are limited to very simple templates. Here we propose a multi-region template model that can direct the attentional focus with respect to complex target appearances without any training. The template can be adaptively adjusted to compensate for gradual changes in the object’s appearance. Furthermore, the model is integrated into the framework of region-based attention and can be combined with bottom-up saliency mechanisms. Our experimental results show that the proposed method outperforms an approach that uses single-region templates and performs on par with state-of-the-art feature fusion approaches that require extensive training.

Paper Nr: 274
Title:

Automatic Pill Identification from Pillbox Images

Authors:

David E. Madsen, Katie S. Payne, Jason Hagerty, Nathan Szanto, Mark Wronkiewicz, Randy H. Moss and William V. Stoecker

Abstract: There is a vital need for fast and accurate recognition of medicinal tablets and capsules. Efforts to date have centered on automatic segmentation and on color and shape identification. Our system combines these with pre-processing before imprint recognition. Using the National Library of Medicine Pillbox database, regression analysis applied to automatic color and shape recognition allows for successful pill identification. Measured errors for the subtasks of segmentation and color recognition on this database are 1.9% and 2.2%, respectively. Imprint recognition with optical character recognition (OCR) is key to exact pill identification, but remains a challenging problem; therefore, overall recognition accuracy is not yet known.

Posters
Paper Nr: 7
Title:

Medical Volume Segmentation based on Level Sets of Probabilities

Authors:

Yugang Liu and Yizhou Yu

Abstract: In this paper, we present a robust and accurate method for biomedical image segmentation using level sets of probabilities. The level set method is a popular technique in biomedical image segmentation. Our method integrates a probabilistic classifier with the level set method, making the latter less vulnerable to local minima. Given the local attributes within a neighborhood of a voxel, this classifier outputs an estimated likelihood of the voxel being part of an object of interest. Our method obtains a posterior probabilistic mask of the object of interest from such estimated likelihoods, an edge field and a smoothness prior. We further alternate classifier training and the level set method to improve the performance of both. We have successfully applied our method to the segmentation of various organs and tissues in the Visible Human dataset. Experiments and comparisons demonstrate that our method can accurately extract volumetric objects of interest and outperforms traditional level-set-based segmentation algorithms.

Paper Nr: 33
Title:

Semi-automatic Endocardium Segmentation in Cine MRI based on Robust Description and Matching of Interest Points

Authors:

Manuel Grand-Brochier, Christophe Tilmant and Michel Dhome

Abstract: In this paper, we propose a new method for semi-automatic endocardium segmentation, based on an analysis of local interest points coupled with an estimation of the deformation by Thin-Plate Splines. From a medical point of view, this method allows the activity of a cardiac patient to be studied. To apply our approach to real sequences, we use the MICCAI database (from the MICCAI 2009 challenge), which also provides a ground truth (segmentation by an expert). This approach allows us, on the one hand, to segment phases such as diastole and systole, and on the other hand, to track the deformation undergone by the endocardium during a cardiac cycle. Finally, we validate this method by comparison with an expert's segmentation.

Paper Nr: 34
Title:

A Texture-based Classification Method for Proteins in Two-Dimensional Electrophoresis Gel Images - A Feature Selection Method using Support Vector Machines and Genetic Algorithms

Authors:

Carlos Fernandez-Lozano, Jose A. Seoane, Marcos Gestal, Daniel Rivero, Julian Dorado and Alejandro Pazos

Abstract: In this paper, the influence of textural information in two-dimensional electrophoresis gel images is studied. A Genetic Algorithm-based feature selection technique is used to select the most representative textural features and to reduce the original set (296 features) to a more efficient subset. The method uses a Support Vector Machine classifier. Different experiments were performed: the pattern set was divided into two parts (training and validation), extracting a total of 30%, 20% or 0% of the data for validation, with 10-fold cross-validation used for validation. Extracting 0% means that the training set itself is used for validation. For each division, 10 different trials were run. The experiments measure the behaviour of the system and identify the most representative textural features for the classification of proteins in two-dimensional gel electrophoresis images. This information can be useful for a protein segmentation process.

Paper Nr: 38
Title:

Iterative Human Segmentation from Detection Windows using Contour Segment Analysis

Authors:

Cyrille Migniot, Pascal Bertolino and Jean-Marc Chassery

Abstract: This paper presents a new algorithm for human segmentation in images. The human silhouette is estimated in positive windows already obtained with an existing efficient detection method. This accurate segmentation reuses the data computed during detection. First, a pre-segmentation step computes the likelihood of contour segments being part of a human silhouette. Then, a contour segment oriented graph is constructed from the shape continuity cue and the prior cue obtained by the pre-segmentation. Segmentation is thus posed as the computation of the shortest-path cycle, which corresponds to the human silhouette. Additionally, the process is applied iteratively to eliminate irrelevant paths and to increase the segmentation performance. The approach is tested on a human image database and the segmentation performance is evaluated quantitatively.

Paper Nr: 40
Title:

Drowsiness Detection based on Video Analysis Approach

Authors:

Belhassen Akrout, Walid Mahdi and Abdelmajid Ben Hamadou

Abstract: The lack of concentration due to driver fatigue is a major cause of the high number of road accidents. This article describes a new approach to automatically detect reduced alertness with a system based on video analysis, in order to warn the driver and thereby reduce the number of accidents. Our approach is based on a temporal analysis of the opening and closing states of the eyes. Unlike many other works, it relies only on the analysis of geometric features captured from face video sequences and does not require any device attached to the driver.

Paper Nr: 42
Title:

An Entropy-based Method for Color Image Registration

Authors:

Shu-Kai S. Fan and Yu-Chiang Chuang

Abstract: In this paper, an entropy-based objective function is developed from the histogram of the color intensity difference data. The proposed registration method orients the sensed image toward the reference image by minimizing the entropy of the color intensity differences while iteratively updating the parameters of the similarity transformation. For performance evaluation, the proposed method is compared to two well-known registration methods on a suite of test images. The experimental study verifies the effectiveness of the proposed method: it is shown to be very effective in image registration and outperforms the other two methods on the test image sets.
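The objective can be sketched in a few lines: histogram the per-pixel differences and take the Shannon entropy, which is minimal when the images align. This is a single-channel grayscale sketch; the paper works with color differences:

```python
import numpy as np

def difference_entropy(ref, sensed, bins=256):
    """Shannon entropy of the histogram of intensity differences.

    Well-aligned images concentrate their differences in few histogram
    bins, giving low entropy; registration minimises this objective.
    """
    d = np.asarray(ref, dtype=float) - np.asarray(sensed, dtype=float)
    hist, _ = np.histogram(d, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum())

img = np.arange(100, dtype=float).reshape(10, 10)
e_aligned = difference_entropy(img, img)        # perfect alignment
e_misaligned = difference_entropy(img, img.T)   # misaligned copy
```

In the full method, this scalar is evaluated inside an optimization loop over the similarity-transformation parameters.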

Paper Nr: 44
Title:

Neural Network Adult Videos Recognition using Jointly Face Shape and Skin Feature Extraction

Authors:

Hajar Bouirouga, Sanaa Elfkihi, Abdeilah Jilbab and Driss Aboutajdine

Abstract: This paper presents a novel approach for adult video detection using face shape, a skin threshold technique and a neural network. The goal of employing skin-color information is to select an appropriate color model that allows pixels to be verified under different lighting conditions and other variations. The videos are then classified by the neural network. Simulations show that the system achieves a true detection rate of 95.4%.

Paper Nr: 70
Title:

Gradient Color Tensor based Approach for Spectral Matting

Authors:

Adam Ghorbel, Marwen Nouri and Emmanuel Marilly

Abstract: Image matting aims to extract foreground objects from a given image in a fuzzy manner. One of the major state-of-the-art methods in this field is spectral matting, which automatically computes fuzzy matting components using the smallest eigenvectors of a Laplacian matrix generated from affinity computations between adjacent pixels of an image. The quality of the results is closely tied to the ability to define an affinity matrix that separates the different pixel clusters well. To achieve better matting results, we propose a new spectral matting approach that uses the gradient color tensor of color images to enhance the affinity computation process.

Paper Nr: 71
Title:

Dual-mode Detection for Foreground Segmentation in Low-contrast Video Images

Authors:

Du-Ming Tsai and Wei-Yao Chiu

Abstract: In video surveillance, the detection of foreground objects in an image sequence from a still camera is critical for object tracking, activity recognition, and behavior understanding. In this paper, a dual-mode scheme for foreground segmentation is proposed. The mode is based on the most frequently occurring gray level over the observed consecutive image frames, and is used to represent the background in the scene. In order to accommodate dynamic changes of the background, the proposed method uses a dual-mode model for background representation. The dual-mode model can represent two main states of the background and detect a more complete silhouette of the foreground object against a dynamic background. The proposed method can promptly calculate the exact gray-level mode of individual pixels in image sequences by simply dropping the last image frame and adding the current image in the observed period. A comparative evaluation of foreground segmentation methods is performed on Microsoft’s Wallflower dataset. The results show that the proposed method can quickly respond to illumination changes and extract foreground objects well against a low-contrast background.
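The rolling-mode update described in this abstract — dropping the oldest frame's contribution and adding the newest, so the per-pixel gray-level mode stays exact without rescanning the window — can be sketched as follows (our illustration under assumed names; the second background mode and the foreground test of the dual-mode scheme are omitted):

```python
import numpy as np
from collections import deque

class RollingMode:
    """Per-pixel gray-level mode over a sliding window of frames."""

    def __init__(self, shape, window, levels=256):
        self.window = deque(maxlen=window)
        # one gray-level histogram per pixel
        self.hist = np.zeros(shape + (levels,), dtype=np.int32)
        self.rows, self.cols = np.indices(shape)

    def update(self, frame):
        if len(self.window) == self.window.maxlen:
            # subtract the counts of the frame about to be dropped
            old = self.window[0]
            self.hist[self.rows, self.cols, old] -= 1
        self.window.append(frame)
        self.hist[self.rows, self.cols, frame] += 1

    def mode(self):
        # most frequent gray level per pixel over the current window
        return np.argmax(self.hist, axis=-1)

# three steady frames, then an outlier: the window mode stays at 10
bg = RollingMode(shape=(2, 2), window=3)
for v in (10, 10, 10, 200):
    bg.update(np.full((2, 2), v, dtype=np.int64))
assert (bg.mode() == 10).all()
```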

Paper Nr: 75
Title:

An Image Segmentation Assessment Tool ISAT 1.0

Authors:

Anton Mazhurin and Nawwaf Kharma

Abstract: This paper presents algorithms and their software implementation, which assess the quality of segmentation of any image, given an ideal segmentation (or ground-truth image) and a usually less-than-ideal segmentation result (or machine-segmented image). The software first identifies every region in both the ground-truth and machine-segmented images, establishes as much correspondence as possible between the images, then computes two sets of quality measures: one region-based and the other pixel-based. The paper describes the algorithms used to assess segmentation quality and presents results of applying the software to images from the Berkeley Segmentation Dataset. The software, which is freely available for download, facilitates R&D work in image segmentation, as it provides a tool for assessing the results of any image segmentation algorithm, allowing developers of such algorithms to focus their energies on solving the segmentation problem, and enabling them to test large sets of images swiftly and reliably.

Paper Nr: 92
Title:

Template Matching for Detection of Starry Milia-Like Cysts in Dermoscopic Images

Authors:

Viswanaath Subramanian, Randy H. Moss, Ryan K. Rader, Sneha K. Mahajan and William V. Stoecker

Abstract: Early detection of melanoma by magnified visible-light imaging (dermoscopy) is hindered by lesions which mimic melanoma. Automatic discrimination of melanoma from mimics could allow detection of melanoma at an earlier stage. Seborrheic keratoses are common mimics; these have distinctive bright structures: starry milia-like cysts (MLCs). We report discrimination of MLCs from mimics by features extracted from starry MLC (star) candidates. After pre-processing, 2D template matching is optimized with respect to star template size, histogram pre-processing, and 2D statistics. The novel aspects of this research are new details for region-of-interest (ROI) analysis of the centers of the star candidates, a new method for determining the shape of hazy objects, and multiple template matching using unprocessed ROIs, shape-limited ROIs, and histogram-equalized ROIs. Features retained by logistic regression in the final model for the MLC-vs.-mimic decision include star size, 2D first correlation coefficient, correlation coefficient to the star shape template, equalized correlation coefficient, relative star brightness, and statistical features at the star center. These methods allow optimization of MLC features found by 2D template correlation. This research confirms the importance of fine ROI features and ROI neighborhoods in medical imaging.

Paper Nr: 115
Title:

On the Detection and Matching of Structures on Less-textured Scenes

Authors:

Wan-Lei Zhao, Wonmin Byeon and Thomas M. Breuel

Abstract: Due to the lack of non-zero gradients around structures in less-textured scenes, current local features can hardly be applied to less-textured object detection. To deal with this issue, two types of local structures, namely corners and closed regions, are proposed in this paper. They are based purely on object contours, which are easier to obtain in less-textured scenes. Compared to existing detectors, these features describe objects’ local structures in a better way. In addition, these new types of local structures bring the further advantage of allowing different levels of abstraction over the object structures. Their effectiveness has been evaluated under various transformations.

Paper Nr: 134
Title:

Object Colour Extraction for CCTV Video Annotation

Authors:

Muhammad Fraz, Iffat Zafar and Eran Edirisinghe

Abstract: In this paper, we address the problem of object colour extraction in CCTV videos and propose a framework for efficient extraction of object colours that minimizes the effect of variable illumination. CCTV videos are generally of very low quality due to the significant presence of factors such as noise, variable illumination, the colour of the light source, poor contrast, camera calibration, etc. The proposed framework makes use of the conventional Grey World (GW) Colour Constancy (CC) method to reduce the effect of variable illumination. We propose a novel technique for the enhancement of colour information in video frames. The framework improves the results of the colour constancy system while maintaining the actual colour balance within the image. Colour extraction is performed by quantizing the HSV space into bins along ‘Hue’, ‘Value’ and ‘Saturation’. A novel set of procedures is also proposed to fine-tune the extraction of white. Finally, temporal accumulation of results is performed to increase the accuracy of extraction. The proposed system achieves an accuracy of up to 93% when tested on a comprehensive CCTV test dataset.

Paper Nr: 222
Title:

Linear Plane Border - A Primitive for Range Images Combining Depth Edges and Surface Points

Authors:

David Jimenez-Cabello, Sven Behnke and Daniel Pizarro Perez

Abstract: Detecting primitives, like lines and planes, is a popular first step for the interpretation of range images. Real scenes are, however, often cluttered and range measurements are noisy, such that the detection of pure lines and planes is unreliable. In this paper, we propose a new primitive that combines properties of planes and lines: Linear Plane Borders (LPB). These are planar stripes of a certain width that are delineated at one side by a linear edge (i.e. depth discontinuity). The design of this primitive is motivated by the contours of many man-made objects. We extend the J-Linkage algorithm to robustly detect multiple LPBs in range images from noisy sensors. We validated our method using qualitative and quantitative experiments with real scenes.

Paper Nr: 242
Title:

Using Visual Attention in a CBIR System - Experimental Results on Landmark and Object Recognition Tasks

Authors:

Franco Alberto Cardillo, Giuseppe Amato and Fabrizio Falchi

Abstract: Many novel applications in the field of object recognition and pose estimation have been built on local invariant features extracted from key points in high-contrast regions of the images. The visual saliency of those regions is not considered by state-of-the-art detection algorithms, which assume the user is interested in the whole image. In this paper we present the experimental results of applying a biologically-inspired model of visual attention to the problem of local feature selection in landmark and object recognition tasks. The results show that the approach improves the accuracy of the classifier in the object recognition task and preserves a good accuracy in the landmark recognition task.

Paper Nr: 243
Title:

An Active Contour Model with Improved Shape Priors using Fourier Descriptors

Authors:

Fareed Ahmed, Huu Dien Khue Le, Julien Olivier and Romuald Boné

Abstract: Snakes, or active contours, are widely used for image segmentation, and there are many different implementations. No matter which implementation is employed, the segmentation results suffer greatly in the presence of occlusions, noise, concavities or abnormal modification of shape. If some prior knowledge about the shape of the object is available, then adding it to an existing model can greatly improve the segmentation results. In this work, the inclusion of such shape constraints for explicit active contours is presented. These shape priors are introduced through the use of Fourier-based descriptors, which makes them invariant to translation, scaling and rotation and enables the deformable model to converge towards the prior shape even in the presence of occlusion and context noise. These shape constraints are computed in descriptor space, so no reconstruction is required. Experimental results clearly indicate that the inclusion of these shape priors greatly improves the segmentation results in comparison with the original snake model.

Paper Nr: 261
Title:

3D Corner Detection and Matching for Manmade Scene/Object Structure Cognition

Authors:

Jiao Tian and Derek Molloy

Abstract: In this paper, we describe a novel framework for 3D corner detection and matching. The proposed method is based on the assumption that the viewed scene contains definite planar surfaces. The contribution of our method is the integration of the constraints imposed by the existing planes with the local feature matches to achieve improved plane decomposition as well as optimal feature grouping. We describe the foundation of the framework and show how it can be employed in applications including 3D reconstruction, plane extraction and robot navigation. The effectiveness of our framework is validated through experiments on synthetic 3D objects and real architectural images.

Paper Nr: 264
Title:

Can Feature Points Be Used with Low Resolution Disparate Images? - Application to Postcard Data Set for 4D City Modeling

Authors:

Lara Younes, Barbara Romaniuk and Eric Bittar

Abstract: We propose an experimental design for the comparison of state-of-the-art feature detector-descriptor combinations. Our aim is to rank the detector-descriptor combinations that perform best in our project. We deal with disparate images that represent the evolution of buildings in the city of Rheims over time. We obtained promising results for matching buildings that evolve temporally.

Paper Nr: 283
Title:

Contour-based Shape Recognition using Perceptual Turning Points

Authors:

Loke Kar Seng

Abstract: This paper presents a new biologically and psychologically motivated edge-contour feature that can be used for shape-based object recognition. Our experiments indicate that this new feature performs as well as or better than existing methods. The method has the advantage that its computation is comparatively simple.

Paper Nr: 284
Title:

The Median Split Algorithm for Detection of Critical Melanoma Color Features

Authors:

Kaushik V. S. N. Ghantasala, Raeed H. Chowdhury, Uday Guntupalli, Jason Hagerty, Randy H. Moss, Ryan K. Rader and William V. Stoecker

Abstract: Detection of melanoma remains an empirical clinical science. New tools for automatic discrimination of melanoma from benign lesions in digitized dermoscopy images may allow an improvement in the early detection of melanoma. This research implements a fast version of the median split algorithm in an open-source format and applies it to four-color splitting of the lesion area to capture the architectural disorder apparent in melanoma colors. Our version of the median split algorithm splits colors along the color axis with maximum range. For a set of 888 dermoscopy images, the best model for discrimination produces an area under the receiver operating characteristic curve of 0.821. Logistic regression analysis of 242 parameter variables obtained from the 888 images shows that the most important features in the final model, measured by Wald chi-square significance, are the lengths of two peripheral inter-color boundaries and one measure of boundary overlay by different colors. The median split algorithm is fast, requiring less than one second per image and only a four-color splitting, yet it captures sufficient critical information regarding color disorder, with peripheral inter-color boundaries showing the highest significance for melanoma discrimination.
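The splitting rule described in this abstract — always cutting along the color axis with maximum range — is in the spirit of classic median-cut quantization and can be sketched as follows (our illustration, not the published code; function and variable names are our own):

```python
import numpy as np

def median_split(pixels, n_colors=4):
    """Split an (N, 3) pixel array into n_colors boxes.

    At each step the box whose widest channel has the largest range
    is cut at that channel's median, and each final box is summarized
    by its mean color.
    """
    boxes = [pixels]
    while len(boxes) < n_colors:
        # pick the box with the largest single-channel range
        ranges = [np.ptp(b, axis=0).max() for b in boxes]
        box = boxes.pop(int(np.argmax(ranges)))
        # cut that box along its widest channel at the median
        axis = int(np.argmax(np.ptp(box, axis=0)))
        order = np.argsort(box[:, axis])
        mid = len(order) // 2
        boxes += [box[order[:mid]], box[order[mid:]]]
    return [b.mean(axis=0) for b in boxes]

# four-color splitting of 1000 random RGB pixels
rng = np.random.default_rng(1)
pix = rng.integers(0, 256, size=(1000, 3)).astype(float)
palette = median_split(pix, n_colors=4)
assert len(palette) == 4
```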

Paper Nr: 285
Title:

Keypoints Detection in RGB-D Space - A Hybrid Approach

Authors:

Nizar Sallem, Michel Devy, Radu Rusu and Suat Gedikili

Abstract: Feature detection is an important image processing technique whose aim is to find a subset, often discrete, of a query image satisfying uniqueness and discrimination criteria, so that the image can be abstracted to the computed features. Detected features are then used in video indexing, registration, object and scene reconstruction, structure from motion, etc. In this article we discuss the definition and implementation of such features in the RGB-Depth space RGB-D. We focus on corners as they are the most used features in image processing. We show the advantage of using 3D data over image-only techniques and the power of combining geometric and colorimetric information to find corners in a scene.

Paper Nr: 310
Title:

An Efficient Method for Surface Registration

Authors:

Tomislav Pribanic, Yago Diez, Sergio Fernandez and Joaquim Salvi

Abstract: 3D object data acquired from different viewpoints are usually expressed in different spatial coordinate systems, whose spatial relations are defined by Euclidean transformation parameters: three rotation angles and a translation vector. The computation of those Euclidean parameters is the task of surface registration. In a nutshell, all registration methods revolve around two goals: first, how to extract the most reliable features for correspondence search between views in order to come up with a set of candidate solutions; secondly, how to quickly pinpoint the best, i.e. satisfying, solution. Occasionally a registration method also expects other data, e.g. normal vectors, to be provided besides the 3D position data. However, no method has assumed the possibility that part of the Euclidean parameters could be reliably known in advance. Acknowledging technological advancements, we argue that it has become relatively convenient to include in a 3D reconstruction system an inertial sensor which readily provides information about data orientation. Assuming that such data are provided, we demonstrate a simple yet time-efficient and accurate registration method.

Paper Nr: 321
Title:

Automatic Detection of Skin Cancer - Current Status, Path for the Future

Authors:

William V. Stoecker, Nabin Mishra, Robert LeAnder, Ryan K. Rader and R. Joe Stanley

Abstract: How far are we away from a Star-Trek-like device that can analyze a lesion and assess its malignancy? We review the main challenges in this field in light of the Blois paradigm of clinical judgment and computers. The research community has failed to adequately address several challenges ripe for the application of digital technology: 1) early detection of changing lesions, 2) detection of non-melanoma skin cancers, and 3) detection of benign melanoma mimics. We highlight a new device and recent image analysis advances in abnormal color and texture detection. Anthropomorphic paradigms can be applied to machine vision. Data fusion has the potential to move automatic diagnosis of skin lesions closer to clinical practice. The fusion of Blois’ high-level clinical information with low-level image data can yield high sensitivity and specificity. Synergy between detection devices and humans can get us closer to this Star-Trek-like device.

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 13
Title:

Facial Landmarks Localization Estimation by Cascaded Boosted Regression

Authors:

Louis Chevallier, Jean-Ronan Vigouroux, Alix Goguey and Alexey Ozerov

Abstract: Accurate detection of facial landmarks is very important for many applications such as face recognition or analysis. In this paper we describe an efficient detector of facial landmarks based on a cascade of boosted regressors with an arbitrary number of levels. We define as many regressors as landmarks and train them separately. We describe how training is conducted for the series of regressors by supplying training samples centered on the predictions of the previous levels. We employ gradient boosted regression and evaluate three different kinds of weak elementary regressors, each based on Haar features: non-parametric regressors, simple linear regressors and gradient boosted trees. We discuss trade-offs between the number of levels and the number of weak regressors for optimal detection speed. Experiments performed on three datasets suggest that our approach is competitive with state-of-the-art systems regarding precision, speed and stability of the prediction on video streams.

Paper Nr: 23
Title:

Skeleton Point Trajectories for Human Daily Activity Recognition

Authors:

Adrien Chan-Hon-Tong, Nicolas Ballas, Catherine Achard, Bertrand Delezoide, Laurent Lucat, Patrick Sayd and Françoise Prêteux

Abstract: Automatic human action annotation is a challenging problem which overlaps with many computer vision fields such as video surveillance, human-computer interaction and video mining. In this work, we propose a skeleton-based algorithm to classify segmented human-action sequences. Our contribution is twofold. First, we propose and evaluate different trajectory descriptors on skeleton datasets. Six short-term trajectory features based on position, speed or acceleration are first introduced. The last descriptor is the most original, since it extends the well-known bag-of-words approach to a bag-of-gestures one for the 3D positions of articulations. All these descriptors are evaluated on two public databases with state-of-the-art machine learning algorithms. The second contribution is to measure the influence of missing data on skeleton-based algorithms. Indeed, skeleton extraction algorithms commonly fail on real sequences with side or back views and very complex postures. Thus, on these real data, we compare recognition methods based on images with those based on skeletons in the presence of substantial missing data.

Paper Nr: 31
Title:

Linear Subspace Learning based on a Learned Discriminative Dictionary for Sparse Coding

Authors:

Shibo Gao, Yizhou Yu and Yongmei Cheng

Abstract: Learning linear subspaces for high-dimensional data is an important task in pattern recognition. A modern approach for linear subspace learning decomposes every training image into a more discriminative part (MDP) and a less discriminative part (LDP) via sparse coding before learning the projection matrix. In this paper, we present a new linear subspace learning algorithm through discriminative dictionary learning. Our main contribution is a new objective function and its associated algorithm for learning an overcomplete discriminative dictionary from a set of labeled training examples. We use a Fisher ratio defined over sparse coding coefficients as the objective function. Atoms from the optimized dictionary are used for subsequent image decomposition. We obtain local MDPs and LDPs by dividing images into rectangular blocks, followed by blockwise feature grouping and image decomposition. We learn a global linear projection with higher classification accuracy through the local MDPs and LDPs. Experimental results on benchmark face image databases demonstrate the effectiveness of our method.

Paper Nr: 104
Title:

Social Cues in Group Formation and Local Interactions for Collective Activity Analysis

Authors:

Khai N. Tran, Apurva Bedagkar-Gala, Ioannis A. Kakadiaris and Shishir K. Shah

Abstract: This paper presents a novel and efficient framework for group activity analysis. People in a scene can be intuitively represented by an undirected graph where vertices are people and the edges between two people are weighted by how much they are interacting. Social signaling cues are used to describe the degree of interaction between people. We propose a graph-based clustering algorithm to discover interacting groups in crowded scenes. The grouping of people in the scene serves to isolate the groups engaged in the dominant activity, effectively eliminating dataset contamination. Using discovered interacting groups, we create a descriptor capturing the motion and interaction of people within it. A bag-of-words approach is used to represent group activity and a SVM classifier is used for activity recognition. The proposed framework is evaluated in its ability to discover interacting groups and perform group activity recognition using two public datasets. The results of both the steps show that our method outperforms state-of-the-art methods for group discovery and achieves recognition rates comparable to state-of-the-art methods for group activity recognition.

Paper Nr: 171
Title:

Extension of Robust Principal Component Analysis for Incremental Face Recognition

Authors:

Haïfa Nakouri and Limam Mohamed

Abstract: Face recognition performance is highly affected by image corruption, shadowing and various face expressions. In this paper, an efficient incremental face recognition algorithm, robust to image occlusion, is proposed. This algorithm is based on robust alignment by sparse and low-rank decomposition for linearly correlated images, extended to be incrementally applied for large face data sets. Based on the latter, incremental robust principal component analysis (PCA) is used to recover the intrinsic data of a sequence of images of one subject. A new similarity metric is defined for face recognition and classification. Experiments on five databases, based on four different criteria, illustrate the efficiency of the proposed method. We show that our method outperforms other existing incremental PCA approaches such as incremental singular value decomposition, add block singular value decomposition and candid covariance-free incremental PCA in terms of recognition rate under occlusions, facial expressions and image perspectives.

Paper Nr: 206
Title:

Automated Classification of Therapeutic Face Exercises using the Kinect

Authors:

Cornelia Lanz, Birant Sibel Olgay, Joachim Denzler and Horst-Michael Gross

Abstract: In this work, we propose an approach for the unexplored topic of therapeutic facial exercise recognition using depth images. In cooperation with speech therapists, we determined nine exercises that are beneficial for the therapy of patients suffering from dysfunction of facial movements. Our approach employs 2.5D images and 3D point clouds, which were recorded using Microsoft’s Kinect. Extracted features comprise the curvature of the face surface and characteristic profiles that are derived using distinctive landmarks. We evaluate the discriminative power and the robustness of the features with respect to the above-mentioned application scenario. Using manually located face regions for feature extraction, we achieve an average recognition accuracy of about 91% for the nine facial exercises. However, in a real-world scenario, manual localization of regions for feature extraction is not feasible. Therefore, we additionally examine the robustness of the features and show that they are beneficial in a real-world, fully automated scenario as well.

Paper Nr: 217
Title:

Vision-based Hand Pose Estimation - A Mixed Bottom-up and Top-down Approach

Authors:

Davide Periquito, Jacinto C. Nascimento, Alexandre Bernardino and João Sequeira

Abstract: Tracking a human hand position and orientation in image sequences is nowadays possible with local search methods, given that a good initialization is provided and that the hand pose and appearance have small frame-to-frame changes. However, if the target moves too quickly or disappears from the field of view, reinitialization of the tracker is necessary. Fully automatic initialization is a very challenging problem due to multiple factors, including the difficulty in identifying landmarks on individual fingers and reconstructing the hand pose from their position. In this paper, we propose an appearance based approach to generate candidates for hand postures given a single image. The method is based on matching hand silhouettes to a previously trained database, therefore circumventing the need for explicit geometric pose reconstruction. A dense sampling of the hand appearance space is obtained through a simulation environment and the corresponding silhouettes stored in a database. In run time, the acquired silhouettes are efficiently retrieved from the database using a mixture of bottom-up and top-down processes. We assess the performance of our approach in a series of simulations, evaluating the influence of the bottom-up and top-down processes in terms of estimation error and computation time, and show promising results obtained with real sequences.

Short Papers
Paper Nr: 4
Title:

Detection of Symmetry Points in Images

Authors:

Christoph Dalitz, Regina Pohle-Fröhlich and Tobias Bolten

Abstract: This article proposes a new method for detecting symmetry points in images. Like other symmetry detection algorithms, it assigns a “symmetry score” to each image point. Our symmetry measure is based only on scalar products between gradients and is therefore both easy to implement and of low runtime complexity. Moreover, our approach also yields the size of the symmetry region without additional computational effort. As both axial symmetries and some rotational symmetries can result in a point symmetry, we propose and evaluate different methods for identifying the rotational symmetries. We evaluate our method on two different test sets of real-world images and compare it to several other rotational symmetry detection methods.
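The idea of a gradient scalar-product symmetry score can be sketched as follows (a brute-force illustration of the concept, not the authors' implementation; the per-pixel estimate of the symmetry-region size is omitted):

```python
import numpy as np

def point_symmetry_score(gx, gy, radius):
    """Point-symmetry score per pixel from scalar products of gradients.

    At a symmetry center, the gradients at mirrored offsets +o and -o
    are antiparallel, so their scalar product is negative; summing the
    negated products over all offset pairs within `radius` therefore
    produces a high score at symmetry points.
    """
    score = np.zeros_like(gx, dtype=float)
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if (dy, dx) > (0, 0)]  # count each mirrored pair once
    for dy, dx in offsets:
        # gradients at p + o and p - o (np.roll wraps at the borders)
        gxp = np.roll(gx, (-dy, -dx), axis=(0, 1))
        gxm = np.roll(gx, (dy, dx), axis=(0, 1))
        gyp = np.roll(gy, (-dy, -dx), axis=(0, 1))
        gym = np.roll(gy, (dy, dx), axis=(0, 1))
        score -= gxp * gxm + gyp * gym
    return score

# a radially symmetric blob: the score should peak at its center
y, x = np.mgrid[0:31, 0:31]
img = np.exp(-((y - 15.0) ** 2 + (x - 15.0) ** 2) / 20.0)
gy, gx = np.gradient(img)
s = point_symmetry_score(gx, gy, radius=3)
assert s[15, 15] == s.max()
```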

Paper Nr: 19
Title:

Visual Estimation of Object Density Distribution through Observation of its Impulse Response

Authors:

Artashes Mkhitaryan and Darius Burschka

Abstract: In this paper we introduce a novel vision-based approach for estimating physical properties of an object such as its center of mass and mass distribution. Passive observation only allows the center of mass to be approximated by the centroid of the object, which is correct only for objects that consist of one material and have a uniform mass distribution. We introduce an active interaction technique with the object, derived from an analogy to system identification with impulse functions. We treat the object as a black box and estimate its internal structure by analyzing the response of the object to external impulses. The impulses are realized by striking the object at points computed from its external geometry. We determine the center of mass from the profile of the observed angular motion of the object, captured by a high frame-rate camera. We use the motion profiles from multiple strikes to compute the mass distribution. Knowledge of these properties of the object leads to more energy-efficient and stable object manipulation. As we show in our real-world experiments, our approach is able to estimate the intrinsic layered density structure of an object.

Paper Nr: 41
Title:

Unsupervised Feature Learning using Self-organizing Maps

Authors:

Marco Vanetti, Ignazio Gallo and Angelo Nodari

Abstract: In recent years a great amount of research has focused on algorithms that learn features from unlabeled data. In this work we propose a model based on the Self-Organizing Map (SOM) neural network to learn features useful for the problem of automatic natural image classification. In particular we use the SOM model to learn single-layer features from the extremely challenging CIFAR-10 dataset, containing 60,000 tiny labeled natural images, and subsequently use these features with a pyramidal histogram encoding to train a linear SVM classifier. Despite the large number of images, the proposed feature learning method requires only a few minutes on an entry-level system; nevertheless, we show that a supervised classifier trained with the learned features provides significantly better results than using raw pixel values or other handcrafted features designed specifically for image classification. Moreover, exploiting the topological property of the SOM neural network, it is possible to reduce the number of features and speed up the supervised training process by combining topologically close neurons, without repeating the feature learning process.

Paper Nr: 52
Title:

A Prior-knowledge based Casted Shadows Prediction Model Featuring OpenStreetMap Data

Authors:

M. Rogez, L. Tougne and L. Robinault

Abstract: We present a prior-knowledge-based shadow prediction model, focused on outdoor scenes, which predicts the camera pixels that are likely to be part of shadows cast by surrounding buildings. We employ a geometrical approach which models the surrounding buildings, their shadows and the camera. One innovative aspect of our method is to retrieve building data automatically from OpenStreetMap, a community project providing free geographic data. We provide both qualitative and quantitative results in two different contexts to assess the performance of our prediction model. While our method alone cannot easily achieve pixel precision, it opens opportunities for more elaborate shadow detection algorithms and occlusion-aware models.

Paper Nr: 61
Title:

Fusion of Color and Depth Camera Data for Robust Fall Detection

Authors:

Wouter Josemans, Gwenn Englebienne and Ben Kröse

Abstract: The availability of cheap imaging sensors makes it possible to increase the robustness of vision-based alarm systems. This paper explores the benefit of data fusion in the application of fall detection. Falls are a common source of injury for elderly people and automatic fall detection is, therefore, an important development in automated home care. We first evaluate a skeleton-based classification method that uses the Microsoft Kinect as a sensor. Next, we evaluate an overhead camera-based method that looks at bounding ellipse features. Then, we fuse the data from these two methods by validating the skeleton tracked by the Kinect. Data fusion proves beneficial, since the data fusion approach outperforms the other methods.

Paper Nr: 77
Title:

Bag-of-Words for Action Recognition using Random Projections - An Exploratory Study

Authors:

Pau Agustí, V. Javier Traver, Filiberto Pla and Raúl Montoliu

Abstract: During the last years, the bag-of-words (BoW) approach has become quite popular for representing actions from video sequences. While BoW is conceptually very simple and practically effective, it suffers from some drawbacks. In particular, the quantization procedure behind BoW usually relies on computationally heavy k-means clustering. In this work we explore whether approaches as simple as random projections, which are data-agnostic, can represent a practical alternative. Results reveal that this randomized quantization offers an interesting computation-accuracy trade-off: although recognition performance is not yet as high as with k-means, it is still competitive, with a speed-up of more than one order of magnitude.
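A randomized alternative to k-means vocabularies, in the spirit of this abstract, can be sketched as follows (our illustration: descriptors are mapped to visual words by the sign pattern of random projections; the paper's exact projection scheme may differ, and all names are our own):

```python
import numpy as np

def random_projection_words(descriptors, n_bits=8, seed=0):
    """Quantize (N, d) descriptors into 2**n_bits visual words using
    the sign pattern of n_bits random projections -- data-agnostic,
    so no clustering pass over the data is needed."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((descriptors.shape[1], n_bits))
    bits = (descriptors @ proj > 0).astype(np.int64)  # (N, n_bits) signs
    weights = 1 << np.arange(n_bits)                  # read bits as an integer
    return bits @ weights                             # word index per descriptor

def bag_of_words(words, n_bits=8):
    """Histogram of word occurrences: the bag-of-words representation."""
    return np.bincount(words, minlength=2 ** n_bits)

# 500 random 64-D descriptors -> one 256-bin BoW histogram
desc = np.random.default_rng(2).standard_normal((500, 64))
hist = bag_of_words(random_projection_words(desc))
assert hist.sum() == 500 and hist.size == 256
```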

Paper Nr: 79
Title:

Gesture Recognition using Skeleton Data with Weighted Dynamic Time Warping

Authors:

Sait Celebi, Ali S. Aydin, Talha T. Temiz and Tarik Arici

Abstract: With Microsoft’s launch of the Kinect in 2010, and the release of the Kinect SDK in 2011, numerous applications and research projects exploring new ways of human-computer interaction have been enabled. Gesture recognition is a technology often used in human-computer interaction applications. Dynamic time warping (DTW) is a template matching algorithm and is one of the techniques used in gesture recognition. To recognize a gesture, DTW warps a time sequence of joint positions onto reference time sequences and produces a similarity value. However, not all body joints are equally important in computing the similarity of two sequences. We propose a weighted DTW method that weights joints by optimizing a discriminant ratio. Finally, we demonstrate the recognition performance of our proposed weighted DTW with respect to conventional DTW and the state-of-the-art.
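The joint-weighted DTW described in this abstract can be sketched as follows (an illustrative implementation only: in the paper the weights are optimized from a discriminant ratio, whereas here they are simply given):

```python
import numpy as np

def weighted_dtw(seq_a, seq_b, joint_weights):
    """Weighted DTW distance between two skeleton sequences.

    seq_a, seq_b: (T, J, 3) arrays of J joint positions per frame.
    joint_weights: length-J weights emphasizing the informative joints.
    """
    w = np.asarray(joint_weights, dtype=float)

    def frame_cost(fa, fb):
        # weighted sum of per-joint Euclidean distances
        return float(np.sum(w * np.linalg.norm(fa - fb, axis=-1)))

    n, m = len(seq_a), len(seq_b)
    # classic DTW dynamic program over the accumulated-cost matrix
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = frame_cost(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# identical sequences warp onto each other with zero cost
rng = np.random.default_rng(3)
a = rng.standard_normal((10, 20, 3))
assert weighted_dtw(a, a, np.ones(20)) == 0.0
```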

Paper Nr: 128
Title:

Integrating Spatial Layout of Object Parts into Classification without Pairwise Terms - Application to Fast Body Parts Estimation from Depth Images

Authors:

Mingyuan Jiu, Christian Wolf and Atilla Baskurt

Abstract: Object recognition or human pose estimation methods often resort to a decomposition into a collection of parts. This local representation has significant advantages, especially in case of occlusions and when the “object” is non-rigid. Detection and recognition require modeling the appearance of the different object parts as well as their spatial layout. The latter can be complex and requires the minimization of complex energy functions, which is prohibitive in most real world applications and therefore often omitted. However, ignoring the spatial layout puts all the burden on the classifier, whose only available information is local appearance. We propose a new method to integrate the spatial layout into the parts classification without costly pairwise terms. We present an application to body parts classification for human pose estimation. As a second contribution, we introduce edge features from gray images as a complement to the well-known depth features used for body parts classification from Kinect data.

Paper Nr: 143
Title:

Extending Recognition in a Changing Environment

Authors:

Daniel Harari and Shimon Ullman

Abstract: We consider the task of visual recognition of objects and their parts in a dynamic environment, where the appearances, as well as the relative positions between parts, change over time. We start with a model of an object class learned from a limited set of view directions (such as side views of cars or airplanes). The algorithm is then given a video input which contains the object moving and changing its viewing direction. Our aim is to reliably detect the object as it changes beyond its known views, and use the dynamically changing views to extend the initial object model. To achieve this goal, we construct an object model at each time instant by combining two sources: consistency with the measured optical flow, together with similarity to the object model at an earlier time. We introduce a simple new way of updating the object model dynamically by combining approximate nearest neighbors search with kernel density estimation. Unlike tracking-by-detection methods that focus on tracking a specific object over time, we demonstrate how the proposed method can be used for learning, by extending the initial generic object model to cope with novel viewing directions, without further supervision. The results show that the adaptive combination of the initial model with even a single video sequence already provides useful generalization of the class model to novel views.

Paper Nr: 153
Title:

Detection and Classification of Facades

Authors:

Panagiotis Panagiotopoulos and Anastasios Delopoulos

Abstract: This paper presents a framework that exploits the expressive power of probabilistic geometric grammars to cope with the task of facade classification. In particular, we work on a dataset of rectified facades and we attempt to discover the origin of a number of query facade segments, contaminated with noise. The building blocks of our description are the windows of the facade. To this end, we develop an algorithm that detects them accurately. Our core contribution, though, lies in the probabilistic manipulation of the geometry of the detected windows. In particular, we propose a simple probabilistic grammar to model this geometry and a methodology for learning the parameters of the grammar from a single instance of each facade through a MAP estimation procedure. The produced generative model is essentially a detector of the particular facade. After producing one model per facade in our dataset, we proceed with the classification of the query segments. Promising results indicate that the simultaneous use of an appearance model together with our geometric formulation consistently achieves higher classification rates than the exclusive use of the appearance model itself, justifying the value of probabilistic geometric grammars for the task of facade classification.

Paper Nr: 155
Title:

Depth-Assisted Rectification of Patches - Using RGB-D Consumer Devices to Improve Real-time Keypoint Matching

Authors:

João Paulo Lima, Francisco Simões, Hideaki Uchiyama, Veronica Teichrieb and Eric Marchand

Abstract: This paper presents a method named Depth-Assisted Rectification of Patches (DARP), which exploits depth information available in RGB-D consumer devices to improve keypoint matching of perspectively distorted images. This is achieved by generating a projective rectification of a patch around the keypoint, which is normalized with respect to perspective distortions and scale. The DARP method runs in real-time and can be used with any local feature detector and descriptor. Evaluations with planar and non-planar scenes show that DARP can obtain better results than existing keypoint matching approaches in oblique poses.

Paper Nr: 183
Title:

On Reducing the Number of Visual Words in the Bag-of-Features Representation

Authors:

Giuseppe Amato, Fabrizio Falchi and Claudio Gennaro

Abstract: A new class of applications based on visual search engines is emerging, especially on smart-phones, which have evolved into powerful tools for processing images and videos. The state-of-the-art algorithms for large-scale visual content recognition and content-based similarity search today use the “Bag of Features” (BoF) or “Bag of Words” (BoW) approach. The idea, borrowed from text retrieval, enables the use of inverted files. A very well known issue with this approach is that the query images, as well as the stored data, are described with thousands of words. This poses obvious efficiency problems when using inverted files to perform efficient image matching. In this paper, we propose and compare various techniques to reduce the number of words describing an image to improve efficiency, and we study the effects of this reduction on effectiveness in landmark recognition and retrieval scenarios. We show that very significant improvements in performance are achievable while still preserving the advantages of the basic BoF approach.
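One simple reduction strategy of the kind the paper compares can be sketched as follows (an illustrative assumption on my part, not the authors' exact criterion): keep only the k visual words with the highest tf-idf score for the image, so far fewer inverted-file postings are touched at query time.

```python
import numpy as np

def reduce_to_top_k(word_counts, idf, k):
    """Keep only the k visual words with the highest tf-idf score in
    the image, zeroing the rest of the BoW vector."""
    counts = np.asarray(word_counts, dtype=float)
    scores = counts * np.asarray(idf, dtype=float)
    keep = np.argsort(scores)[-k:]          # indices of the k highest scores
    reduced = np.zeros_like(counts)
    reduced[keep] = counts[keep]
    return reduced
```

The resulting sparse vector indexes (and queries) only k postings per image instead of thousands, which is the efficiency gain the abstract refers to.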

Paper Nr: 184
Title:

Facial Age Simulation using Age-specific 3D Models and Recursive PCA

Authors:

Anastasios Maronidis and Andreas Lanitis

Abstract: Facial age simulation is a topic that has been gaining increasing interest in computer vision. In this paper, a novel age simulation method that utilizes age-specific shape and texture models is proposed. During the process of generating age-specific shape models, 3D face measurements acquired from real human faces are used in order to tune a generic 3D face shape model to represent face shapes belonging to certain age groups. A number of diagnostic studies have been conducted in order to validate the compatibility of the tuned shape models with the corresponding age groups. The shape age-simulation process utilizes age-specific shape models that incorporate age-related constraints during a 3D shape reconstruction phase. Age simulation is completed by predicting the texture at the target age based on a recursive PCA method that aims to superimpose age-related texture modifications in a way that preserves identity-related characteristics of the subject in the source image. Preliminary results indicate the potential of the proposed method.

Paper Nr: 219
Title:

Wavelet-based Circular Hough Transform and Its Application in Embryo Development Analysis

Authors:

Marcelo Cicconet, Davi Geiger and Kris Gunsalus

Abstract: Detecting object shapes from images remains a challenging problem in computer vision, especially in cases where some a priori knowledge of the shape of the objects of interest exists (such as circle-like shapes) and/or multiple object shapes overlap. This problem is important in the field of biology, particularly in the area of early-embryo development, where the dynamics is given by a set of cells (nearly-circular shapes) that overlap and eventually divide. We propose an approach to this problem that relies mainly on a variation of the circular Hough transform where votes are weighted by wavelet kernels, and a fine-tuning stage based on dynamic programming. The wavelet-based circular Hough transform can be seen as a geometry-driven pooling mechanism in a set of convolved images, thus having important connections with well-established machine learning methods such as convolution networks.
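For reference, a plain circular Hough accumulator for a fixed radius looks like the sketch below; the paper's variation weights these votes with wavelet kernels, whereas here the votes are uniform for clarity.

```python
import numpy as np

def circular_hough(edge_points, shape, radius):
    """Circular Hough accumulator for a fixed radius: every edge point
    votes for all candidate centres at distance `radius` from it, so
    circle centres accumulate many coincident votes."""
    acc = np.zeros(shape)
    thetas = np.linspace(0, 2 * np.pi, 90, endpoint=False)
    for (y, x) in edge_points:
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        # Discard votes that fall outside the image.
        ok = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        np.add.at(acc, (cy[ok], cx[ok]), 1.0)
    return acc
```

The peak of the accumulator then gives the most likely circle centre; replacing the unit vote with a wavelet-kernel weight turns each vote pass into a convolution, which is the connection to convolution networks the abstract mentions.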

Paper Nr: 224
Title:

Multi-view People Detection on Arbitrary Ground in Real-time

Authors:

Ákos Kiss and Tamás Szirányi

Abstract: We present a method to detect the accurate 3D positions of people from multiple views, regardless of the geometry of the ground. In our new method we search for intersections of 3D primitives (cones) to find the positions of feet. The cones are computed by back-projecting ellipses covering feet in the input images. Instead of computing the complex intersection body, we use an approximation to speed up the intersection computation. We found that feet positions are determined accurately, and the height map of the ground can be reconstructed with small error. We compared our method to other multi-view detectors (using a somewhat different test methodology) and achieved comparable results, with the benefit of handling arbitrary ground. We also present an accurately reconstructed height map of non-planar ground. Our algorithm is fast and most of its steps are parallelizable, making it suitable for smart camera systems.

Paper Nr: 231
Title:

Using Whole and Part-based HOG Filters in Succession to Detect Cars in Aerial Images

Authors:

Satish Madhogaria, Marek Schikora and Wolfgang Koch

Abstract: Vehicle detection in aerial images plays a key role in surveillance, transportation control and traffic monitoring. It forms an important aspect in the deployment of autonomous Unmanned Aerial Systems (UAS) in rescue and surveillance missions. In this paper, we propose a two-stage algorithm for efficient detection of cars in aerial images. We discuss how a sophisticated detection technique may not give the best results when applied to large-scale images with complicated backgrounds. We use a relaxed version of HOG (Histogram of Oriented Gradients) and SVM (Support Vector Machine) to extract hypothesis windows in the first stage. The second stage is based on discriminatively trained part-based models. We create a richer model to be used for detection from the hypothesis windows by detecting and locating parts in the root object. Using a two-stage detection procedure not only improves the accuracy of the overall detection but also helps us take full advantage of the accuracy of sophisticated algorithms while ruling out their shortcomings in real scenarios. We analyze the results obtained from a Google Earth dataset and also from images taken by a camera mounted beneath a flying aircraft. With our approach we could achieve a recall rate of 90% with a precision of 94%.

Paper Nr: 240
Title:

A Pyramid of Concentric Circular Regions to Improve Rotation Invariance in Bag-of-Words Approach for Object Categorization

Authors:

Arnaldo Câmara Lara and Roberto Hirata Jr.

Abstract: The bag-of-words (BoW) approach has been shown to be effective in image categorization. Spatial pyramids in conjunction with the original BoW approach improve overall performance in the categorization process. This work proposes a new way of partitioning an image into concentric circular regions and calculating histograms of codewords for each circular region. The histograms of the regions are concatenated, forming the image descriptor. This slight and simple modification preserves the performance of the original spatial information and adds robustness to image rotation. The pyramid of concentric circular regions proved to be almost 78% more robust to rotation of images in our tests compared to the traditional rectangular spatial pyramids.
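The ring-based partition can be sketched as below (a minimal illustration assuming keypoint locations and precomputed codeword indices as input): keypoints are binned by their distance to the image centre, one codeword histogram is built per ring, and the histograms are concatenated.

```python
import numpy as np

def concentric_bow_descriptor(points, codes, n_words, image_shape, n_rings=3):
    """BoW descriptor over concentric circular regions. Because a ring
    is invariant to rotation about the image centre, the concatenated
    descriptor is far more stable under image rotation than
    rectangular spatial-pyramid cells."""
    h, w = image_shape
    center = np.array([h / 2.0, w / 2.0])
    r_max = np.linalg.norm(center)                      # farthest possible point
    radii = np.linalg.norm(points - center, axis=1)
    # Assign each keypoint to one of n_rings equal-width rings.
    ring = np.minimum((radii / r_max * n_rings).astype(int), n_rings - 1)
    hists = [np.bincount(codes[ring == k], minlength=n_words).astype(float)
             for k in range(n_rings)]
    return np.concatenate(hists)
```

Rotating the image about its centre permutes keypoints within each ring but never moves them across rings, so the descriptor is unchanged.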

Paper Nr: 246
Title:

Using Skin Segmentation to Improve Similar Product Recommendations in Online Clothing Stores

Authors:

Noran Hasan, Ahmed Hamouda, Tamer Deif, Motaz El-Sabban and Ramy Shahin

Abstract: Image matching and retrieval in the domain of clothing, as used in online shopping for recommending similar products, is often distracted by the existence of a mannequin/model wearing the product. The existence of a model adds clutter to both the shape and color features of the product. In this paper, we propose a novel image pre-processing pipeline that minimizes skin and background segments generated from generic GraphCut segmentation. Experiments judged by human subjects show very promising gains of around 23% in retrieval precision of the top 25 similar products compared to the baseline system.

Paper Nr: 249
Title:

Audio/Visual Recurrences and Decision Trees for Unsupervised TV Program Structuring

Authors:

Alina Elma Abduraman, Sid-Ahmed Berrani and Bernard Merialdo

Abstract: This paper addresses the problem of unsupervised TV program structuring. Program structuring allows direct and non-linear access to the desired parts of a program. Our work addresses the structuring of programs like news, entertainment, shows and magazines. It is based on the detection of audio and visual recurrences. It proposes an effective classification and selection system, based on decision trees, that allows the detection of “separators” among these recurrences. Separators are short audio/visual sequences that delimit the different parts of a program. The decision trees are built based on attributes issued from techniques like applause detection, scene segmentation, face/speaker detection and clustering. The approach has been evaluated on a 112-hour dataset corresponding to 169 episodes of TV programs.

Paper Nr: 257
Title:

Learning and Classification of Car Trajectories in Road Video by String Kernels

Authors:

Luc Brun, Alessia Saggese and Mario Vento

Abstract: An abnormal behavior of a moving vehicle or a moving person is characterized by an unusual or unexpected trajectory. The definition of expected trajectories refers to supervised learning, where a human operator should define expected behaviors. Conversely, the definition of usual trajectories requires automatically learning the dynamics of a scene in order to extract its typical trajectories. We propose, in this paper, a method able to identify abnormal behaviors based on a new unsupervised learning algorithm. The original contributions of the paper lie in the following aspects: first, the evaluation of similarities between trajectories is based on string kernels. Such kernels allow us to define a kernel-based clustering algorithm in order to obtain groups of similar trajectories. Finally, identification of abnormal trajectories is performed according to the typical trajectories characterized during the clustering step. The experimentation, conducted over a real dataset, confirms the efficiency of the proposed method.

Paper Nr: 269
Title:

Vehicle Detection with Context

Authors:

Yang Hu and Larry S. Davis

Abstract: Detecting vehicles in satellite images has a wide range of applications. Existing approaches usually identify vehicles from their appearance. They typically generate many false positives due to the existence of a large number of structures that resemble vehicles in the images. In this paper, we explore the use of context information to improve vehicle detection performance. In particular, we use shadows and the ground appearance around vehicles as context clues to validate putative detections. A data-driven approach is applied to learn typical patterns of vehicle shadows and the surrounding “road-like” areas. By observing that vehicles often appear in parallel groups in urban areas, we also use the orientations of nearby detections as another context clue. A conditional random field (CRF) is employed to systematically model and integrate these different sources of contextual knowledge. We present results on two sets of images from Google Earth. The proposed method significantly improves the performance of the base appearance-based vehicle detector. It also outperforms another state-of-the-art context model.

Paper Nr: 298
Title:

Spatio-temporal Video Retrieval by Animated Sketching

Authors:

Steven Verstockt, Olivier Janssens, Sofie Van Hoecke and Rik Van de Walle

Abstract: In order to improve content-based searching in digital video, this paper proposes a novel intuitive querying method based on animated sketching. By sketching two or more frames of the desired scene, users can intuitively find the video sequences they are looking for. To find the best match for the user input, the proposed algorithm generates the edge histogram descriptors of both the sketches’ static background and its moving foreground objects. Based on these spatial descriptors, the set of videos is queried a first time to find video sequences in which similar background and foreground objects appear. This spatial filtering already results in sequences with similar scene characteristics as the sketch. However, further temporal analysis is needed to find the sequences in which the specific action, i.e. the sketched animation, occurs. This is done by matching the motion descriptors of the motion history images of the sketch and the video sequences. The sequences with the highest match are returned to the user. Experiments on a heterogeneous set of videos demonstrate that the system allows more intuitive video retrieval and yields appropriate query results, which match the sketches.

Paper Nr: 322
Title:

Human-centered Region Selection and Weighting for Image Retrieval

Authors:

Jean Martinet

Abstract: We present an application of gaze tracking to image and video indexing, in the form of a model for selecting and weighting Regions of Interest (RoIs). Image/video indexing refers to the process of creating a synthetic representation of the media, for instance for retrieval purposes. It usually consists of labeling the media with semantic keywords describing its content. When automated, this process is based on the analysis of visual features, which can be extracted either from the whole image or keyframe, or locally from regions. Since most of the time the whole image is not relevant for indexing (e.g. large flat regions with no specific semantic interpretation, blurred regions, or background regions that may not be relevant for retrieval purposes and should be filtered out), it would be preferable to concentrate the labeling process on specific RoIs that are considered representative of the scene, like the main subjects. The objective of the work presented here is to take advantage of natural human gaze information in order to define a human-centered Region of Interest selection and weighting technique in the context of media retrieval.

Posters
Paper Nr: 15
Title:

Text Recognition in Natural Images using Multiclass Hough Forests

Authors:

Gökhan Yildirim, Radhakrishna Achanta and Sabine Süsstrunk

Abstract: Text detection and recognition in natural images are popular yet unsolved problems in computer vision. In this paper, we propose a technique that attempts to detect and recognize text in a unified manner by searching for words directly without reducing the image into text regions or individual characters. We present three contributions. First, we modify an object detection framework called Hough Forests (Gall et al., 2011) by introducing “Cross-Scale Binary Features” that compare the information between the same image patch at different scales. We use this modified technique to produce likelihood maps for every text character. Second, our word-formation cost function and computed likelihood maps are used to detect and recognize the text in natural images. We test our technique with the Street View House Numbers (Netzer et al., 2011) and the ICDAR 2003 (Lucas et al., 2003) datasets. For the SVHN dataset, our algorithm outperforms recent methods and has comparable performance using fewer training samples. We also exceed the state-of-the-art word recognition performance for the ICDAR 2003 dataset by 4%. Our final contribution is a realistic dataset generation code for text characters.

Paper Nr: 29
Title:

An Improved Feature Vector for Content-based Image Retrieval in DCT Domain

Authors:

Cong Bai, Kidiyo Kpalma and Joseph Ronsin

Abstract: This paper proposes an improved approach for content-based image retrieval in the Discrete Cosine Transform domain. For each 4x4 DCT block, we calculate the statistical information of three groups of AC coefficients and propose to use these values to form the AC-Pattern, and use DC coefficients of neighboring blocks to construct the DC-Pattern. The histograms of these two patterns are constructed and their selections are concatenated as the feature descriptor. Similarity between the feature descriptors is measured by the χ2 distance. Experiments executed on widely used face and texture databases show that better performance can be observed with the proposed approach compared with classical methods and state-of-the-art approaches.
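The χ2 distance between two histogram descriptors is a standard formula and can be written directly (the small epsilon guarding against empty bins is my addition for numerical safety):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms: per-bin squared
    differences, normalized by the bin mass so that well-populated
    bins do not dominate the comparison."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))
```

It is zero for identical histograms and symmetric in its arguments, which makes it a convenient drop-in similarity measure for concatenated pattern histograms like the AC-/DC-Patterns above.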

Paper Nr: 95
Title:

A POMDP-based Camera Selection Method

Authors:

Li Qian, Sun Zheng-Xing and Chen Song-Le

Abstract: This paper addresses the problem of camera selection in multi-camera systems and proposes a novel selection method based on a partially observable Markov decision process (POMDP) model. An innovative evaluation function identifies the most informative of several multi-view video streams by extracting and scoring features related to global motion, attributes of moving objects, and special events such as the appearance of new objects. The experiments show that the proposed visual evaluation criteria successfully measure changes in scenes and that our camera selection method effectively reduces camera switching.

Paper Nr: 96
Title:

Smart Video Orchestration for Immersive Communication

Authors:

Alaeddine Mihoub and Emmanuel Marilly

Abstract: In the context of immersive communication, and in order to enrich attentional immersion in videoconferences for remote attendants, the problem of camera orchestration arises. It consists of selecting and displaying the most relevant view or camera. HMMs have been chosen to model the different video events and video orchestration models. A specific algorithm, taking high-level observations as input and enabling non-expert users to train the videoconferencing system, has been developed.

Paper Nr: 121
Title:

Computer Assisted Quantification of Hyoid Bone Motion in Fluoroscopic Videos

Authors:

Ishtiaque Hossain, Angela Roberts-South, Mandar Jog and Mahmoud R. El-Sakka

Abstract: The Videofluoroscopic Swallowing Study is a technique commonly used by radiologists to detect abnormalities in the swallowing process. While the subject swallows the food, X-ray images are taken and then compiled in video form. The video is later analyzed by the radiologist using visual means. Since the nature of the inspection is highly subjective, the result of the inspection can hardly be considered reliable. One of the assessed measures is the elevation of the hyoid bone during the swallow. This research introduces a semi-automatic method which identifies the hyoid bone in fluoroscopic videos and quantifies its motion. Before identifying the hyoid bone, the region-of-interest is automatically identified using a classification-based approach, and subsequent image processing procedures are applied to the identified region-of-interest. Results show that the proposed method can accurately quantify the motion of the hyoid bone.

Paper Nr: 127
Title:

Speed Up Learning based Descriptor for Face Verification

Authors:

Hai Wang, Bongnam Kang, Jongmin Yoon and Daijin Kim

Abstract: Many state-of-the-art face recognition algorithms use the local feature descriptor known as Local Binary Patterns (LBP). Many extensions of LBP exist, but their performance is still limited. Recently, the Learning-based Descriptor (LE) was introduced for face verification; it shows high discrimination power but, compared with LBP, it is expensive to compute. In this paper, we propose a novel coding approach for the LE descriptor which can keep the most discriminative LBP-like features as well as significantly shorten the feature extraction time. Since the proposed method speeds up the LE descriptor’s feature extraction time, we call it the Speeded Up Learning Descriptor, or SULE for short. Tests on the LFW standard benchmark show the superiority of SULE with respect to several state-of-the-art feature descriptors regularly used in face verification applications.

Paper Nr: 136
Title:

Small Vocabulary with Saliency Matching for Video Copy Detection

Authors:

Huamin Ren, Thomas B. Moeslund, Sheng Tang and Heri Ramampiaro

Abstract: The importance of copy detection has led to a substantial amount of research in recent years, among which Bag of visual Words (BoW) plays an important role due to its ability to effectively handle occlusion and some minor transformations. One crucial issue in BoW approaches is the size of the vocabulary. BoW descriptors under a small vocabulary can be both robust and efficient, while keeping a high recall rate compared with a large vocabulary. However, the high false positive rate of a small vocabulary also limits its application. To address this problem in small vocabularies, we propose a novel matching algorithm based on salient visual word selection. More specifically, the variations of visual words across a given video are represented as trajectories, and those containing locally asymptotically stable points are selected as salient visual words. Then we measure the similarity of two videos through saliency matching based merely on the selected salient visual words, to remove false positives. Our experiments show that a small codebook with saliency matching is quite competitive in video copy detection. With the incorporation of the proposed saliency matching, the precision can be improved by 30% on average compared with the state-of-the-art technique. Moreover, our proposed method is capable of detecting severe transformations, e.g. picture-in-picture and post production.

Paper Nr: 139
Title:

Enhanced Micro-structure Descriptor based Image Retrieval

Authors:

Jie Yuan, Baogang Wei, Li Liu, Yin Zhang and Lidong Wang

Abstract: We present a new image retrieval method based on an enhanced micro-structure descriptor in this paper. Traditional micro-structure based descriptors such as the multi-texton histogram (MTH) and the micro-structure descriptor (MSD) can integrate local texture, shape and color features from the image into a feature vector, but their description abilities for natural images are not sufficient. In this paper we first use a new local pattern map to create a filter map, and the enhanced MSD descriptor (EMSD) is extracted based on the color co-occurrence relationship on the filtered and quantized image. We then normalize the distance between images to the range [0, 1]. The proposed method is extensively tested on the Corel-5000 and Corel-10000 image sets, and experimental results show that our method can achieve a higher retrieval precision than MSD and other representative methods.

Paper Nr: 167
Title:

Pupil Localization by a Template Matching Method

Authors:

Donatello Conte, Rosario Di Lascio, Pasquale Foggia, Gennaro Percannella and Mario Vento

Abstract: In this paper, a new algorithm for pupil localization is proposed. The algorithm is based on a template matching approach; the original contribution is that the model of the pupil that is used is not fixed, but it is automatically constructed on the first frame of the video sequence to be examined. Therefore the model is adaptively tuned to each subject, in order to improve the robustness and the accuracy of the detection. The results show the effectiveness of the proposed algorithm.

Paper Nr: 181
Title:

Secure Image Retrieval Scheme in the Encrypted Domain

Authors:

Pei Zhang, Li Zhuo, Yingdi Zhao, Bo Cheng, Jing Zhang and Xiaoqin Song

Abstract: Currently, image retrieval methods focus on improving retrieval performance while ignoring the problem of preserving privacy. Images contain a great deal of personal privacy information, and leakage of this information will result in serious negative effects. Ensuring image retrieval performance while preserving the confidentiality of data has become the key issue in the field of image retrieval. Based on Content-based Image Retrieval (CBIR), we propose a secure image retrieval scheme in the encrypted domain, where the encrypted features can be used in similarity comparison directly. This paper compares ciphertext retrieval with plaintext retrieval to illustrate that the proposed scheme achieves comparable retrieval performance while ensuring image information security at the same time.

Paper Nr: 185
Title:

Video Shot Boundary Detection using Visual Bag-of-Words

Authors:

Jukka Lankinen and Joni-Kristian Kämäräinen

Abstract: Recently, a convergence of techniques used in image analysis and video processing has occurred. Many computation- and memory-intensive image analysis methods have become available for per-frame processing of videos due to the increased computing power of desktop computers and efficient implementations on multiple cores and graphical processing units (GPUs). As our main contribution in this work, we solve the problem of shot boundary detection using a popular image analysis (object detection) approach: visual bag-of-words (BoW). The baseline approach for shot boundary detection has been the colour histogram, which is at the core of many top methods, but our BoW method of similar complexity in terms of parameters clearly outperforms colour histograms. Interestingly, an “AND-combination” of colour and BoW histogram detection is clearly superior, indicating that colour and local features provide complementary information for video analysis.
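The AND-combination can be sketched as below (a minimal illustration assuming per-frame normalized histograms and fixed thresholds; the paper's actual detectors may differ in their distance and threshold choices):

```python
import numpy as np

def l1_diff(h1, h2):
    """L1 difference between two normalized frame histograms."""
    return float(np.abs(np.asarray(h1, float) - np.asarray(h2, float)).sum())

def and_shot_boundaries(colour_hists, bow_hists, t_colour, t_bow):
    """AND-combination detector: the transition i -> i+1 is declared a
    shot boundary only when BOTH the colour-histogram and the
    BoW-histogram differences exceed their thresholds, so a cue that
    fires alone (e.g. a lighting change moving only colour) is ignored."""
    cuts = []
    for i in range(len(colour_hists) - 1):
        dc = l1_diff(colour_hists[i], colour_hists[i + 1])
        db = l1_diff(bow_hists[i], bow_hists[i + 1])
        if dc > t_colour and db > t_bow:
            cuts.append(i + 1)
    return cuts
```

Requiring both complementary cues to agree is what suppresses the false alarms that either histogram produces on its own.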

Paper Nr: 195
Title:

A Video Copy Detection System based on Human Visual System

Authors:

Yu Bai, Li Zhuo, YingDi Zhao and Xiaoqin Song

Abstract: The technology of near-duplicate video detection is currently a research hot spot in the field of multimedia information processing. It has great value in areas such as large-scale video information indexing and copyright protection. In the case of large-scale data, it is very important to ensure the accuracy and robustness of detection while improving the processing speed of video copy detection. In this respect, an HVS (Human Visual System)-based video copy detection system is proposed in this paper. This system utilizes a visual attention model to extract the region of interest (ROI) in keyframes and extracts the Surfgram feature only from the information in the ROI, rather than all of the information in the keyframe, thus effectively reducing the amount of data to process. The experimental results have shown that the proposed algorithm can effectively improve the speed of detection and exhibits good robustness against brightness changes, contrast changes, frame drops and Gaussian noise.

Paper Nr: 199
Title:

A Region Driven and Contextualized Pedestrian Detector

Authors:

Thierry Chesnais, Thierry Chateau, Nicolas Allezard, Yoann Dhome, Boris Meden, Mohamed Tamaazousti and Adrien Chan-Hon-Tong

Abstract: This paper tackles the real-time pedestrian detection problem using a stationary calibrated camera. Problems frequently encountered are that a generic classifier cannot be adjusted to each situation, and that the perspective deformations of the camera can profoundly change the appearance of a person. To avoid these drawbacks, we contextualize a detector with information coming directly from the scene. Our method comprises three distinct parts. First, an oracle gathers examples from the scene. Then, the scene is split into different regions and one classifier is trained for each one. Finally, each detector is automatically tuned to achieve the best performance. Designed to make the camera network installation procedure easier, our method is completely automatic and does not need any knowledge about the scene.

Paper Nr: 218
Title:

Multi-class Image Classification - Sparsity does it Better

Authors:

Sean Ryan Fanello, Nicoletta Noceti, Giorgio Metta and Francesca Odone

Abstract: It is well established that sparse representations improve the overall accuracy and performance of many image classification systems. This paper deals with the problem of finding sparse and discriminative representations of images in multi-class settings. We propose a new regularized functional, a modification of the standard dictionary learning problem, designed to learn one dictionary per class. With this new formulation, while positive examples are constrained to have sparse descriptions, we also consider a contribution from negative examples, which are forced to be described in a denser and smoother way. The descriptions we obtain are meaningful for a given class and highly discriminative with respect to other classes, while at the same time guaranteeing real-time performance. We also propose a new approach to the classification of single image features based on the dictionary response. Thanks to this formulation, it is possible to classify local features directly based on their sparsity factor without losing statistical information or spatial configuration, while being more robust to clutter and occlusions. We validate the proposed approach in two image classification scenarios, namely single-instance object recognition and object categorization. The experiments show its effectiveness in terms of performance and speak in favor of the generality of our method.

Paper Nr: 239
Title:

Using n-grams Models for Visual Semantic Place Recognition

Authors:

Mathieu Dubois, Emmanuelle Frenoux and Philippe Tarroux

Abstract: The aim of this paper is to present a new method for visual place recognition. Our system combines global image characterization and visual words, which allows us to use efficient Bayesian filtering methods to integrate several images. More precisely, we extend the classical HMM model with techniques inspired by the field of Natural Language Processing. This paper presents our system and the Bayesian filtering algorithm. The performance of our system and the influence of its main parameters are evaluated on a standard database. The discussion highlights the interest of using such models and proposes improvements.
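The Bayesian filtering the abstract refers to can be illustrated by a standard HMM forward update over discrete places. The sketch below is purely illustrative: the place labels, transition matrix and observation likelihoods are invented, and this is not the authors' model.

```python
# Illustrative HMM forward (Bayesian) filtering over discrete places.

def forward_update(belief, transition, likelihood):
    """One filtering step: predict with the transition model,
    then correct with the current observation likelihood."""
    n = len(belief)
    # Prediction: predicted[j] = sum_i belief[i] * P(place j | place i)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: weight by the likelihood of the current image per place.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Two hypothetical places ("office", "corridor") with sticky transitions.
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
# Three consecutive images whose visual-word likelihoods favour place 0.
for likelihood in ([0.8, 0.2], [0.7, 0.3], [0.9, 0.1]):
    belief = forward_update(belief, transition, likelihood)
print(max(range(2), key=lambda j: belief[j]))  # 0 (most probable place)
```

Integrating several images this way makes the place estimate more stable than classifying each image independently.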

Paper Nr: 258
Title:

Automatic Text Localisation in Scanned Comic Books

Authors:

Christophe Rigaud, Dimosthenis Karatzas, Joost van de Weijer, Jean Christophe Burie and Jean-Marc Ogier

Abstract: Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enables direct content-based search, as opposed to metadata-only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for automatic text localization in scanned comic book pages, an essential step towards fully automatic comic book understanding. We focus on speech text, as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing text localization methods found in the literature and results are presented.

Paper Nr: 279
Title:

Optimized Cascade of Classifiers for People Detection using Covariance Features

Authors:

Malik Souded and Francois Bremond

Abstract: People detection in static images and video sequences is a critical task in many computer vision applications, such as image retrieval and video surveillance. It is also one of the most challenging tasks due to the large number of possible situations, including variations in people's appearance and poses. The proposed approach optimizes an existing approach based on classification on Riemannian manifolds using covariance matrices in a boosting scheme, making training and detection faster while maintaining equivalent performance. This optimization is achieved by clustering negative samples before training, yielding a smaller number of cascade levels and fewer weak classifiers in most levels compared with the original approach. Our work was evaluated and validated on the INRIA Person dataset.

Paper Nr: 299
Title:

How to Exploit Scene Constraints to Improve Object Categorization Algorithms for Industrial Applications?

Authors:

Steven Puttemans and Toon Goedemé

Abstract: State-of-the-art object categorization algorithms are designed to be highly robust against scene variations such as illumination changes, occlusions, scale changes, orientation and location differences, background clutter and object intra-class variability. However, in industrial machine vision applications where objects with variable appearance have to be detected, many of these variations are in fact constant and can be seen as constraints on the scene, which in turn can reduce the enormous search space for object instances. In this position paper we explore the possibility of fixing some of these variations according to application-specific scene constraints and investigate the influence of these adaptations on three main aspects of object categorization algorithms: the amount of training data needed, the speed of detection and the number of false detections. Moreover, we propose steps to simplify the training process under such scene constraints.

Area 4 - Applications and Services

Full Papers
Paper Nr: 172
Title:

Performance Evaluation of BRISK Algorithm on Mobile Devices

Authors:

Alexander Gularte, Camila Thomasi, Rodrigo de Bem and Diana Adamatti

Abstract: The large amount of research on local feature extraction algorithms in recent years, allied with the popularization of mobile devices, makes efficient and accurate algorithms suitable for such devices highly desirable. Despite this, there are few approaches adequate to run efficiently in complexity-, cost- and power-constrained mobile environments. The main objective of this work is to evaluate the performance of the recently proposed BRISK algorithm on mobile devices. To this end, a mobile implementation, named M-BRISK, is proposed. Several implementation strategies are considered and successfully applied to execute the algorithm on a real-world mobile device. As evaluation criteria, repeatability, recall, precision and running time metrics are used, as well as comparisons with the classical, well-established SURF algorithm and with the more recently proposed ORB. The results confirm that the proposed mobile implementation of BRISK (M-BRISK) performs well and is well suited to mobile devices.
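Precision and recall, two of the evaluation criteria named above, can be computed from the overlap between ground-truth and detected keypoint matches. The counts below are invented for illustration and are unrelated to the study's data.

```python
# Toy precision/recall computation over keypoint match sets.

def precision_recall(true_matches, detected_matches):
    tp = len(true_matches & detected_matches)   # correct detections
    precision = tp / len(detected_matches)      # fraction of detections that are correct
    recall = tp / len(true_matches)             # fraction of ground truth recovered
    return precision, recall

true_matches = {(1, 4), (2, 7), (3, 9), (5, 6)}  # ground-truth keypoint pairs
detected_matches = {(1, 4), (2, 7), (8, 8)}      # matcher output
p, r = precision_recall(true_matches, detected_matches)
print(round(p, 3), r)  # 0.667 0.5
```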

Paper Nr: 270
Title:

Non-rigid Surface Tracking for Virtual Fitting System

Authors:

Naoki Shimizu, Takumi Yoshida, Tomoki Hayashi, Francois de Sorbier and Hideo Saito

Abstract: In this paper, we describe a method for overlaying a texture onto a T-shirt, aimed at improving current virtual fitting systems, in which users can try on clothes virtually. To realize such systems, depth cameras have been used: they can capture 3D data in real time and are employed in several industrial virtual cloth fitting systems. However, these systems consider the shape of the clothes the user is wearing only roughly, or not at all, so their rendered results look unnatural. For a better fit, we need to estimate the 3D shape of the cloth surface and overlay onto it a texture of the cloth that the user wants to see. There are methods that register a 3D deformable mesh onto captured depth data of a target surface. Although such registration methods are very accurate, most of them require a large amount of processing time, manually set markers, or special rectangles. The main contribution of our method is to overlay a texture onto the texture of a T-shirt in real time without modifying the surface.

Paper Nr: 291
Title:

iRep3D: Efficient Semantic 3D Scene Retrieval

Authors:

Xiaoqi Cao and Matthias Klusch

Abstract: In this paper, we present a new repository, called iRep3D, for efficient retrieval of semantically annotated 3D scenes in XML3D, X3D or COLLADA. The semantics of a 3D scene can be described by means of its annotations with concepts and services defined in appropriate OWL2 ontologies. The iRep3D repository indexes annotated scenes with respect to these annotations and geometric features in three different scene indices. For concept- and service-based scene indexing, iRep3D utilizes a new approximated logical subsumption-based measure, while the geometric feature-based indexing adheres to the standard specifications of XML-based 3D scene graph models. Each query for 3D scenes is processed by iRep3D in these indices in parallel and answered with the top-k relevant scenes from the final aggregation of the resulting rank lists. Results of an experimental performance evaluation over a preliminary test collection of more than 600 X3D and XML3D scenes show that iRep3D significantly outperforms both the semantic-driven multimedia retrieval systems FB3D and RIR, and the non-semantic 3D model repository ADL, in terms of precision, with reasonable response time on average.

Short Papers
Paper Nr: 147
Title:

GAIL: Geometry-aware Automatic Image Localization

Authors:

Luca Benedetti, Massimiliano Corsini, Matteo Dellepiane, Paolo Cignoni and Roberto Scopigno

Abstract: Access to and integration of the massive amount of information that the web can provide can be of great help in a number of fields, including tourism and the promotion of artistic sites. A “virtual visit” of a place can be a valuable experience before, during and after the on-site visit. For this reason, contributions from the public could be merged to provide a realistic and immersive visit of known places. We propose an automatic image localization system which is able to recognize the site that has been framed and calibrate the image with respect to a pre-existing 3D representation. The system is characterized by very high accuracy and is able to validate, in a completely unsupervised manner, the result of the localization. Given an unlocalized image, the system selects a relevant set of pre-localized images, performs a partial Structure from Motion reconstruction of this set, and then obtains an accurate camera calibration of the image with respect to the model by minimizing distances between projections of corresponding image features on the model surface. The accuracy reached is sufficient to seamlessly view the input image correctly superimposed on the 3D scene.

Paper Nr: 148
Title:

Geo-positional Image Forensics through Scene-terrain Registration

Authors:

P. Chippendale, M. Zanin and M. Dalla Mura

Abstract: In this paper, we explore the topic of geo-tagged photo authentication and introduce a novel forensic tool created to semi-automate the process. We will demonstrate how a photo’s location and time can be corroborated through the correlation of geo-modellable features to embedded visual content. Unlike previous approaches, a machine-vision processing engine iteratively guides users through the photo registration process, building upon available meta-data evidence. By integrating state-of-the-art visual-feature to 3D-model correlation algorithms, camera intrinsic and extrinsic calibration parameters can also be derived in an automatic or semi-supervised interactive manner. Experimental results, considering forensic scenarios, demonstrate the validity of the system introduced.

Paper Nr: 253
Title:

Towards Automatic Direct Observation of Procedure and Skill (DOPS) in Colonoscopy

Authors:

Mirko Arnold, Anarta Ghosh, Glen Doherty, Hugh Mulcahy, Christopher Steele, Stephen Patchett and Gerard Lacey

Abstract: The quality of individual colonoscopy procedures is currently assessed by the performing endoscopist. In light of the recently reported quality issues in colonoscopy screening, there may be significant benefits in augmenting this form of self-assessment with automatic assistance systems. In this paper, we propose a system for the assessment of individual colonoscopy procedures, based on image analysis and machine learning. The system rates procedures according to criteria of the validated Direct Observation of Procedure and Skill (DOPS) assessment developed by the Joint Advisory Group on GI Endoscopy (JAG) in the UK, in which experts assess procedures using a standardized assessment form.

Paper Nr: 308
Title:

Making Digital Signage Adaptive through a Genetic Algorithm - Utilizing Viewers’ Involuntary Behaviors

Authors:

Ken Nagao and Issei Fujishiro

Abstract: Digital signage has become increasingly popular due to recent developments in the underlying hardware technology and improvements in installation environments. In digital signage, it is important to make the content more attractive to viewers by evaluating its current attractiveness on the fly, in order to deliver the sender's message more effectively. Most previous work on this evaluation does not take the viewers' feelings towards the content into account, and the content is improved manually, if needed, in an off-line manner. In this paper, we present a novel method that does not rely on such manual evaluation and automatically adapts the content to the viewers. To this end, we take advantage of the viewers' involuntary behaviors in front of the digital signage for online updates through the use of a genetic algorithm.
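The online-update idea can be sketched with a minimal genetic algorithm that evolves content parameters towards a higher attention score. Everything here is hypothetical: the three-gene encoding, the stand-in fitness function (in practice, measured viewer behavior) and the GA settings are not the authors' design.

```python
import random

random.seed(0)

TARGET = [0.7, 0.2, 0.9]  # hypothetical ideal colour/layout/tempo settings

def fitness(genome):
    # Higher when the genome is closer to the (in practice unknown) optimum;
    # stands in for an attention score derived from involuntary behaviors.
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def mutate(genome, sigma=0.1):
    return [min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.random() for _ in range(3)] for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                       # selection (elitist)
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]                 # variation
    population = parents + children                 # next generation

best = max(population, key=fitness)
print(round(-fitness(best), 4))  # remaining squared error of best genome
```

Elitist selection guarantees the best score never degrades between generations, which suits a signage loop where content should only get more engaging over time.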

Posters
Paper Nr: 26
Title:

Counterfeit Detection and Value Recognition of Euro Banknotes

Authors:

Sebastiano Battiato, Giovanni Maria Farinella, Arcangelo Bruna and Giuseppe Claudio Guarnera

Abstract: This paper describes both the hardware and software components of a system to detect counterfeit Euro banknotes. The system is also able to recognize banknote values. The proposed method makes use of images acquired with a near-infrared camera and works without mechanical parts, which makes the overall system low-cost. The effectiveness of the proposed solution has been thoroughly tested on a dataset composed of genuine and fake Euro banknotes provided by Italy's central bank.

Paper Nr: 84
Title:

Application to Quantify Fetal Lung Branching on Rat Explants

Authors:

Pedro L. Rodrigues, Sara Granja, António Moreira, Nuno Rodrigues and João L. Vilaça

Abstract: Recently, fetal lung rat explants have become an essential tool for molecular research on the regulating mechanisms of branching morphogenesis. The development of accurate and reliable segmentation techniques may be essential to improve research outcomes. This work presents an image processing method to measure the perimeter and area of lung branches in fetal rat explants. The algorithm starts by reducing the noise corrupting the image in a pre-processing stage. The outcome is input to a watershed operation that automatically segments the image into primitive regions. Then, an image pixel is selected within the lung explant epithelium, allowing region growing between neighbouring watershed regions. This growing process is controlled by the statistical distribution of each region. When compared with manual segmentation, the results show the same tendency for lung development. High similarities were harder to obtain in the last two days of culture, due to the increased number of peripheral airway buds and the complexity of the lung architecture. However, with semi-automatic measurements the standard deviation was lower and the results between independent researchers were more coherent.
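The watershed-then-grow step can be sketched as a seeded merge over an adjacency graph of watershed regions. The region ids, mean intensities, adjacency and the simple mean-difference criterion below are all illustrative, not the authors' implementation (which uses a fuller statistical test per region).

```python
# Minimal seeded region growing over watershed-like label regions:
# merge a neighbouring region when its mean intensity is close enough
# to that of the seed region.

def grow(region_means, adjacency, seed, max_diff=10.0):
    """Greedy merge of adjacent regions whose mean differs by <= max_diff."""
    grown, frontier = {seed}, [seed]
    while frontier:
        r = frontier.pop()
        for nb in adjacency[r]:
            if nb not in grown and abs(region_means[nb] - region_means[seed]) <= max_diff:
                grown.add(nb)
                frontier.append(nb)
    return grown

# Hypothetical watershed output: region ids, mean intensities, adjacency.
means = {0: 100.0, 1: 105.0, 2: 108.0, 3: 180.0, 4: 95.0}
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3]}
print(sorted(grow(means, adjacency, seed=0)))  # [0, 1, 2]
```

Note that region 4 has a similar mean to the seed but is only reachable through the dissimilar region 3, so it is correctly excluded: growth respects connectivity, not just intensity.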

Paper Nr: 85
Title:

Automatic Modeling of an Orthotic Bracing for Nonoperative Correction of Pectus Carinatum

Authors:

João L. Vilaça, Pedro L. Rodrigues, António H. J. Moreira, João Gomes Fonseca, A. C. M. Pinho, Jaime C. Fonseca and Nuno Rodrigues

Abstract: Pectus Carinatum is a deformity of the chest wall, characterized by an anterior protrusion of the sternum, often corrected surgically for cosmetic reasons. This work presents an alternative to the current open surgery option, proposing a novel technique based on a personalized orthosis. Two different processes for the orthosis' personalization are presented: one based on a 3D laser scan of the patient's chest, followed by reconstruction of the thoracic wall mesh using a radial basis function, and a second based on a computed tomography scan followed by a neighbouring-cells algorithm. The axial position where the orthosis is to be located is automatically calculated using a ray-triangle intersection method, whose outcome is input to a pseudo Kochanek interpolating spline to define the orthosis curvature. Results show no significant differences between the patient's chest physiognomy and the curvature angle and size of the orthosis, allowing a better cosmetic outcome and less initial discomfort.

Paper Nr: 89
Title:

A Tool for Brain Magnetic Resonance Image Segmentation

Authors:

Baptiste Magnier, Philippe Montesinos and Daniel Diep

Abstract: This paper presents a regularization method for brain magnetic resonance images that preserves grey/white matter edges using rotating smoothing filters. After a preprocessing step, the originality of this approach lies in mixing ideas from pixel classification, which roughly determines whether a pixel belongs to a homogeneous region or an edge, and from an anisotropic edge detector, which computes two precise diffusion directions. These directions are used by an anisotropic diffusion scheme that is accurately controlled near edges and corners. Comparing our results with existing algorithms allows us to validate the robustness of our method.

Paper Nr: 97
Title:

Reconstructing Archeological Vessels from Fragments using Anchor Points Residing on Shard Fragment Borders

Authors:

Zexi Liu, Fernand Cohen and Ezgi Taslidere

Abstract: This paper presents a method to assist in the tedious process of reconstructing ceramic vessels from excavated fragments. The method models fragment borders as 3D curves and uses intrinsic differential anchor points on these curves. Corresponding anchors on different fragments are identified using absolute invariants and a longest-string search technique. A rigid transformation is computed from the corresponding anchors, allowing the fragments to be virtually mended. A global constraint induced by the surface of revolution (the basis shape) is used to decide how all pairs of mended fragments come together into one globally mended vessel. The accuracy of mending is measured using a distance error map metric. The method was tested on a set of 3D-scanned fragments (313 pieces) from 19 broken vessels. 80% of the pieces were properly mended, with alignment errors at the scanner resolution level. The method took 59 seconds to mend the pieces, plus 60 minutes for the 3D scans, compared to 12 hours for manual stitching.
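The step of computing a rigid transformation from corresponding anchor points is conventionally solved in closed form with the Kabsch/Procrustes method. The sketch below uses synthetic anchor data and is an illustration of that standard technique, not the authors' code.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t (both (N, 3) arrays)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)        # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic check: rotate/translate anchor points, then recover the motion.
rng = np.random.default_rng(1)
src = rng.random((5, 3))                       # anchors on fragment A
angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
dst = src @ Rz.T + np.array([0.1, -0.2, 0.05])  # same anchors on fragment B
R, t = rigid_transform(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True
```

With noisy anchors the same formula gives the least-squares best fit, and the residual plays the role of the distance error map used to score a mend.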

Paper Nr: 111
Title:

An Interactive Appearance-based Document Retrieval System for Historical Newspapers

Authors:

Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Apostolos Antonacopoulos and Josep Lladós

Abstract: In this paper we present a retrieval-based application aimed at assisting a user in semi-automatically segmenting an incoming flow of historical newspaper images by automatically detecting a particular type of page based on its appearance. A visual descriptor is used to assess page similarity, while a relevance feedback process allows the results to be refined iteratively. The application is tested on a large dataset of digitised historical newspapers.

Paper Nr: 113
Title:

Classification of Text and Image Areas in Digitized Documents for Mobile Devices

Authors:

Anne-Sophie Ettl, Axel Zeilner, Ralf Köster and Arjan Kuijper

Abstract: Post-processing and automatic interpretation of images play an increasingly important role in the mobile area. Both for efficient compression and for automatic text evaluation, it is useful to store text content as textual information rather than as graphics. For this purpose, pictures of magazine pages are captured with a smartphone camera and classified into text and image areas. In this work, established desktop procedures are presented and analyzed in terms of their applicability to mobile devices. Based on these methods, an approach for image segmentation and classification on mobile devices is developed, taking into account their limited resources.

Paper Nr: 116
Title:

Robust Descriptors Fusion for Pedestrians’ Re-identification and Tracking Across a Camera Network

Authors:

Ahmed Derbel, Yousra Ben Jemaa, Sylvie Treuillet, Bruno Emile, Raphael Canals and Abdelmajid Ben Hamadou

Abstract: In this paper, we introduce a new approach to re-identify people across multiple cameras based on an AdaBoost descriptor cascade. Given the complexity of this task, we propose a new regional color feature vector based on the fusion of intra- and inter-color histograms to characterize a person across cameras. This descriptor is then integrated into an extensive comparative study with several existing color, texture and shape feature vectors in order to choose the best ones. Through a comparative study with the main existing approaches on the VIPeR dataset, using the Cumulative Matching Characteristic measure, we show that the proposed approach is well suited to identifying a person and provides very satisfactory performance.
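Matching colour histograms across cameras is commonly done with histogram intersection. The sketch below is a simplified stand-in for the descriptor fusion described above: a one-channel histogram and toy colour values, not the authors' regional descriptor.

```python
# Histogram-intersection matching of toy colour descriptors.

def histogram(values, bins=4):
    """Normalised histogram of colour values in [0, 1)."""
    h = [0.0] * bins
    for v in values:
        h[min(bins - 1, int(v * bins))] += 1.0
    total = sum(h) or 1.0
    return [x / total for x in h]

def intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Same person seen by two cameras (similar colours) vs. a different person.
person_cam1 = histogram([0.10, 0.15, 0.12, 0.80, 0.82])
person_cam2 = histogram([0.11, 0.14, 0.13, 0.79, 0.85])
other_person = histogram([0.50, 0.55, 0.60, 0.52, 0.58])

print(intersection(person_cam1, person_cam2) >
      intersection(person_cam1, other_person))  # True
```

A re-identification system ranks gallery candidates by such similarity scores, which is exactly what the Cumulative Matching Characteristic curve then evaluates.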

Paper Nr: 133
Title:

A Machine Vision based Lumber Tracing System

Authors:

Riku Hietaniemi, Sami Varjo and Jari Hannuksela

Abstract: In this paper, we introduce a machine vision system for tracing wooden boards in sawmills. The goal is to match images taken of boards at the beginning and at the end of the manufacturing process in order to track the movement of individual boards. The task is challenging due to the changing appearance of boards during the process: there are changes in color, texture and physical form, and the lighting conditions and camera parameters are unknown and can change between different camera systems inside the sawmill. Before matching, image alignment is carried out using 2D-to-1D projection signals generated from the statistical properties of the grayscale images. Aligned images are then matched using fast and compact local descriptors. The performance of the system was evaluated on over 1000 real-life images captured with visual quality control cameras integrated into the production line. A tracing accuracy of over 95% was achieved, with high confidence in the individual matches.

Paper Nr: 211
Title:

Generic 3D Segmentation in Medicine based on a Self-learning Topological Model

Authors:

Gerald Zwettler and Werner Backfrieder

Abstract: Three-dimensional segmentation of medical image data is crucial in modern diagnostics and still the subject of intensive research efforts. Most fully automated methods, e.g. for segmentation of the hippocampus, are highly specific to certain morphological regions and very sensitive to variations in the input data; they are thus not robust enough to achieve the accuracy required for differential diagnosis. In this work a processing pipeline for robust segmentation is presented. The flexibility of this novel generic segmentation method is based on an entirely parameter-free pre-segmentation. To this end a hybrid modification of the watershed algorithm is developed, employing both gradient and intensity metrics to identify connected regions with similar properties. In a further optimization step, the vast number of small regions is condensed into anatomically meaningful structures by feature-based classification. The core of the classification process is a topographical model of the segmented body region, representing a sufficient number of features from the geometry and texture domains. The model may learn from manual segmentation by experts or from its own results. The novel method is demonstrated on the human brain, based on the reference dataset from BrainWeb. Results show high accuracy and the method proves to be robust. The method is easily extensible to other body regions, and the novel concept shows high potential for introducing generic three-dimensional segmentation into the clinical workflow.

Paper Nr: 221
Title:

Forensic Authentication of Data Bearing Halftones

Authors:

Stephen Pollard, Robert Ulichney, Matthew Gaubatz and Steven Simske

Abstract: This paper introduces a practical system for combining overt, covert and forensic information in a single, small printed feature. The overt “carrier” feature need not be a dedicated security mark such as a 2D or color barcode, but can instead be integrated into a desirable object such as a logo, as part of the aesthetically desired layout, using steganographic halftones (Stegatones). High-resolution imaging, in combination with highly accurate and robust image registration, is used to simultaneously recover a unique identity suitable for associating a print with an on-line database and a unique forensic signature that is both tamper- and copy-sensitive.

Paper Nr: 235
Title:

Evaluation Methodology for Descriptors in Neuroimaging Studies

Authors:

M. Luna, F. Gayá, C. Cáceres, José M. Tormos and E. J. Gómez

Abstract: Automatic identification and localization of brain structures is one of the main stages in processing neuroimaging studies. The proposed approach consists of identifying landmarks in an image. These landmarks must carry both location and intensity-variation information in order to obtain a direct relation between detected landmarks and brain structures. Descriptors are algorithms whose function is to select and store points featuring these two types of information. Since many algorithms exist for computing descriptors, it is necessary to select the one most suitable for the type of images and the context of application, and it is therefore advisable to design and develop an evaluation methodology to identify appropriate algorithms objectively. This paper proposes a new evaluation methodology for descriptors used in neuroimaging studies.

Paper Nr: 245
Title:

Image Processing Supports HCI in Museum Application

Authors:

Niki Martinel, Marco Vernier, Gian Luca Foresti and Elisabetta Lamedica

Abstract: This work introduces a novel information visualization technique for mobile devices based on Augmented Reality (AR). A painting boundary detector and a feature extraction module have been implemented to compute painting signatures. The computed signatures are matched using a linear weighted combination of the extracted features. The detected boundaries and features are then exploited to compute homography transformations, which are used to introduce a novel user interaction technique for AR. Three different user interfaces have been evaluated using standard usability methods.

Paper Nr: 254
Title:

Assessment of 3D Scanners for Modeling Pectus Carinatum Corrective Bar

Authors:

António H. J. Moreira, João Gomes Fonseca, Pedro L. Rodrigues, Jaime C. Fonseca, A. C. M. Pinho, Jorge Correia-Pinto, Nuno F. Rodrigues and João L. Vilaça

Abstract: Pectus Carinatum (PC) is a chest deformity consisting of an anterior protrusion of the sternum and adjacent costal cartilages. Non-operative corrections, such as the orthotic compression brace, require prior information on the patient's chest surface to improve the overall brace fit. This paper focuses on the validation of the Kinect scanner for modelling an orthotic compression brace for the correction of Pectus Carinatum. To this end, a phantom chest wall surface was acquired using two scanner systems – Kinect and Polhemus FastSCAN – and compared through CT. The results show an RMS error of 3.25 mm between the CT data and the surface mesh from the Kinect sensor, and of 1.5 mm for the FastSCAN sensor.
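The RMS error metric used above is simply the root of the mean squared point-to-reference distance. The numbers below are toy values for illustration only, not the study's measurements.

```python
import math

# RMS error between scanned surface samples and reference (CT) samples.

def rms_error(scanned, reference):
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(scanned, reference))
                     / len(scanned))

ct      = [10.0, 12.0, 11.5, 13.0]   # reference depths (mm), invented
scanner = [10.5, 11.0, 12.0, 13.5]   # scanner depths (mm), invented
print(round(rms_error(scanner, ct), 3))  # 0.661
```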

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 78
Title:

Real-time Video-based View Interpolation of Soccer Events using Depth-selective Plane Sweeping

Authors:

Patrik Goorts, Cosmin Ancuti, Maarten Dumont, Sammy Rogmans and Philippe Bekaert

Abstract: In this paper we present a novel technique to synthesize virtual camera viewpoints for soccer events. Our real-time video-based rendering technique does not require a precise estimation of the scene geometry. We initially segment the dynamic parts of the scene and then estimate a depth map of the filtered foreground regions using a plane sweep strategy. The depth map is then segmented into per-player depth information. A second plane sweep is performed, in which the depth sweep is limited to the depth range of each player individually, effectively removing major ghosting artifacts such as third legs or ghost players. The background and shadows are interpolated independently. For maximum performance, our technique is implemented using a combination of NVIDIA's shading language Cg and NVIDIA's general-purpose computing framework CUDA. Rendered results of an actual soccer game demonstrate the usability and accuracy of our framework.

Paper Nr: 101
Title:

Disjunctive Normal Form of Weak Classifiers for Online Learning based Object Tracking

Authors:

Zhu Teng and Dong-Joong Kang

Abstract: The use of a strong classifier composed of an ensemble of weak classifiers is prevalent in tracking, classification, and related tasks. In conventional ensemble tracking, each weak classifier selects a 1D feature, and the strong classifier combines a number of such 1D weak classifiers. In this paper, we present a novel tracking algorithm in which additional weak classifiers are 2D disjunctive normal forms (DNF) of these 1D weak classifiers. The final strong classifier is then a linear combination of weak classifiers and 2D DNF cell classifiers. We treat tracking as a binary classification problem; since a full DNF can express any Boolean function, 2D DNF classifiers have the capacity to represent more complex distributions than the original weak classifiers, strengthening any original weak classifier. We implement the algorithm and evaluate it on several video sequences.
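The gain from a DNF of 1D weak classifiers can be seen in a small sketch: two threshold "stumps" are combined into a 2D DNF cell that represents a region no single 1D threshold can. The features, thresholds and sample points are hypothetical, not the paper's learned classifiers.

```python
# Combining 1D threshold weak classifiers into a 2D DNF cell classifier.

def stump(feature_index, threshold):
    """1D weak classifier: fires when the feature exceeds the threshold."""
    return lambda x: x[feature_index] > threshold

def dnf(*conjunctions):
    """Disjunctive normal form: OR over AND-groups of weak classifiers."""
    return lambda x: any(all(c(x) for c in conj) for conj in conjunctions)

h1 = stump(0, 0.5)   # 1D weak classifier on feature 0
h2 = stump(1, 0.5)   # 1D weak classifier on feature 1
not_ = lambda c: (lambda x: not c(x))

# 2D DNF cell: (h1 AND h2) OR (NOT h1 AND NOT h2) — an XNOR-like region
# that no single 1D threshold classifier can represent.
cell = dnf((h1, h2), (not_(h1), not_(h2)))

print([cell(x) for x in [(0.9, 0.9), (0.1, 0.1), (0.9, 0.1), (0.1, 0.9)]])
# [True, True, False, False]
```

In the paper's setting such cells are added to the boosted pool alongside the original stumps, so the strong classifier can weight both.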

Paper Nr: 125
Title:

A Robust Least Squares Solution to the Relative Pose Problem on Calibrated Cameras with Two Known Orientation Angles

Authors:

Gaku Nakano and Jun Takada

Abstract: This paper proposes a robust least squares solution to the relative pose problem for calibrated cameras with two known orientation angles, based on a physically meaningful optimization. The problem is expressed as minimizing the smallest eigenvalue of a coefficient matrix, and is solved using 3-point correspondences in the minimal case and 4 or more correspondences in the least squares case. To obtain the minimum error, a new cost function based on the determinant of a matrix is proposed instead of solving the eigenvalue problem. The new cost function is not only physically meaningful but also common to the minimal and least squares cases; the proposed least squares solution is therefore a true extension of the minimal-case solution. Experimental results on synthetic data show that the proposed solution is identical to conventional solutions in the minimal case and approximately 3 times more robust to noisy data than the conventional solution in the least squares case.

Paper Nr: 163
Title:

3D Object Reconstruction with a Single RGB-Depth Image

Authors:

Silvia Rodríguez-Jiménez, Nicolas Burrus and Mohamed Abderrahim

Abstract: This paper presents a fast method for acquiring 3D models of unknown objects lying on a table, using a single viewpoint. The proposed algorithm is able to reconstruct a full model from a single RGB + depth image, such as those provided by available low-cost range cameras. It estimates the hidden parts by exploiting the geometrical properties of everyday objects, and combines depth and color information for a better segmentation of the object of interest. A quantitative evaluation on a set of 12 common objects shows that our approach is simple and effective, and that the reconstructed models are accurate enough for tasks such as robotic grasping.

Paper Nr: 205
Title:

Planar Motion and Hand-eye Calibration using Inter-image Homographies from a Planar Scene

Authors:

Mårten Wadenbäck and Anders Heyden

Abstract: In this paper we consider a mobile platform performing partial hand-eye calibration and Simultaneous Localisation and Mapping (SLAM) using images of the floor along with the assumptions of planar motion and constant internal camera parameters. The method used is based on a direct parametrisation of the camera motion, combined with an iterative scheme for determining the motion parameters from inter-image homographies. Experiments are carried out on both real and synthetic data. For the real data, the estimates obtained are compared to measurements by an industrial robot, which serve as ground truth. The results demonstrate that our method produces consistent estimates of the camera position and orientation. We also make some remarks about patterns of motion for which the method fails.

Paper Nr: 207
Title:

Let it Learn - A Curious Vision System for Autonomous Object Learning

Authors:

Pramod Chandrashekhariah, Gabriele Spina and Jochen Triesch

Abstract: We present a “curious” active vision system for a humanoid robot that autonomously explores its environment and learns object representations without any human assistance. Similar to an infant, who is intrinsically motivated to seek out new information, our system is endowed with an attention and learning mechanism designed to search for new information that has not been learned yet. Our method can deal with dynamic changes of object appearance which are incorporated into the object models. Our experiments demonstrate improved learning speed and accuracy through curiosity-driven learning.

Paper Nr: 278
Title:

Direct Depth Recovery from Motion Blur Caused by Random Camera Rotations Imitating Fixational Eye Movements

Authors:

Norio Tagawa, Shoei Koizumi and Kan Okubo

Abstract: It has been reported that the small involuntary vibrations of a human eyeball during fixation, called "fixational eye movements", play a role in image analysis, for example contrast enhancement and edge detection. This mechanism can be interpreted as an instance of stochastic resonance, which is inspired by biology, more specifically by neuron dynamics. A depth recovery method has been proposed that uses many successive image pairs generated by random camera rotations imitating fixational eye movements. This method, however, is not adequate for images with fine texture details because of an aliasing problem. To overcome this problem, we propose a new integral-form method for recovering depth, which uses the motion blur caused by the same camera motions, i.e. many small random camera rotations. As an algorithm, we examine a method that recovers depth directly, without computing a blur function. To confirm the feasibility of our scheme, we perform simulations using artificial images.

Short Papers
Paper Nr: 21
Title:

Anisotropic Median Filtering for Stereo Disparity Map Refinement

Authors:

Nils Einecke and Julian Eggert

Abstract: In this paper we present a novel method for refining stereo disparity maps that is inspired by both simple median filtering and edge-preserving anisotropic filtering. We argue that a combination of these two techniques is particularly effective for reducing the fattening effect that typically occurs for block-matching stereo algorithms. Experiments show that the newly proposed post-refinement can propel simple patch-based algorithms to much higher ranks in the Middlebury stereo benchmark. Furthermore, a comparison to state-of-the-art methods for disparity refinement shows a similar accuracy improvement, but at only a fraction of the computational effort. Hence, this approach can be used in systems with restricted computational power.
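
The combination of median filtering with edge-preserving behaviour can be sketched minimally as follows. This is an illustration only: the restriction of the median window via an intensity threshold on a guidance image stands in for the anisotropy, and is not the authors' exact filter; all names and parameter values are ours.

```python
import numpy as np

def masked_median_refine(disparity, image, radius=2, tau=10.0):
    """Edge-preserving median refinement of a disparity map (sketch).

    The median at each pixel is taken only over window neighbours whose
    guidance-image intensity is within tau of the centre pixel, so the
    filter does not average across object boundaries. The centre pixel
    is always inside the mask, so the median is always defined."""
    h, w = disparity.shape
    out = disparity.copy()
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            win_d = disparity[y - radius:y + radius + 1, x - radius:x + radius + 1]
            win_i = image[y - radius:y + radius + 1, x - radius:x + radius + 1]
            mask = np.abs(win_i - image[y, x]) < tau
            out[y, x] = np.median(win_d[mask])
    return out
```

In a uniform image region this degenerates to a plain median filter, which is what removes isolated disparity outliers.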

Paper Nr: 57
Title:

Image Guided Cost Aggregation for Hierarchical Depth Map Fusion

Authors:

Thilo Borgmann and Thomas Sikora

Abstract: Estimating depth from a video sequence is still a challenging task in computer vision, with numerous applications. Like other authors, we combine two major concepts developed in this field: the hierarchical estimation of depth within an image pyramid, and the fusion of depth maps from different views. We compare the application of various local matching methods within such a combined approach and show the relative performance of local image-guided methods in contrast to the commonly used fixed-window aggregation. Since efficient implementations of these image-guided methods exist and the available hardware is improving rapidly, the disadvantage of their more complex but also parallelizable computation vanishes, and they will become feasible for more applications.

Paper Nr: 63
Title:

3D Reconstruction of Interreflection-affected Surface Concavities using Photometric Stereo

Authors:

Steffen Herbort, Daniel Schugk and Christian Wöhler

Abstract: Image-based reconstruction of 3D shapes is inherently biased under the occurrence of interreflections, since the observed intensity at surface concavities consists of direct and global illumination components. This issue is commonly not considered in a Photometric Stereo (PS) framework. Under the usual assumption of only direct reflections, this corrupts the normal estimation process in concave regions and thus leads to inaccurate results. For this reason, global illumination effects need to be considered for the correct reconstruction of surfaces affected by interreflections. While there is ongoing research in the field of inverse lighting (i.e. the separation of global and direct illumination components), the interreflection aspect oftentimes remains neglected in the field of 3D shape reconstruction. In this study, we present a computationally driven approach for iteratively solving that problem. Initially, we introduce a photometric stereo approach that roughly reconstructs a surface with initially unknown reflectance properties. Then, we show that the initial surface reconstruction can be refined iteratively, taking into account non-distant light sources and, especially, interreflections. The benefit for the reconstruction accuracy is evaluated on real Lambertian surfaces using laser range scanner data as ground truth.

Paper Nr: 64
Title:

Real-Time Estimation of Camera Orientation by Tracking Orthogonal Vanishing Points in Videos

Authors:

Wael Elloumi, Sylvie Treuillet and Rémy Leconge

Abstract: In man-made urban environments, vanishing points are pertinent visual cues for navigation tasks, but estimating the orientation of an embedded camera relies on the ability to find a reliable triplet of orthogonal vanishing points in real-time. Building on previous works, we propose a pipeline that achieves an accurate estimation of the camera orientation while preserving a short processing time. Our pipeline relies on two contributions: a novel sampling strategy among finite and infinite vanishing points extracted with RANSAC-based line clustering, and tracking along the video sequence to improve accuracy and robustness by extracting the three most pertinent orthogonal directions. Experiments on real images and video sequences show that the proposed strategy for selecting the triplet of vanishing points is pertinent, as our algorithm gives better results than the recently published RNS optimal method (Mirzaei, 2011), in particular for the yaw angle, which is essential for navigation tasks.

Paper Nr: 83
Title:

3D Face Pose Tracking using Low Quality Depth Cameras

Authors:

Ahmed Rekik, Achraf Ben-Hamadou and Walid Mahdi

Abstract: This paper presents a new method for 3D face pose tracking in color images and depth data acquired by RGB-D (i.e., color and depth) cameras (e.g., Microsoft Kinect, Canesta, etc.). The method is based on a particle filter formalism, and its main contribution lies in the combination of depth and image data to cope with the poor signal-to-noise ratio of low-quality RGB-D cameras. Moreover, we consider a visibility constraint to handle partial occlusions of the face. We demonstrate the accuracy and the robustness of our method through a set of experiments on the Biwi Kinect head pose database.

Paper Nr: 110
Title:

Human Motion Analysis under Actual Sports Game Situations - Sequential Multi-decay Motion History Image Matching

Authors:

Dan Mikami, Toshitaka Kimura, Koji Kadota, Harumi Kawamura and Akira Kojima

Abstract: This paper proposes sequential multi-decay motion history image matching, with the aim of analyzing human motions captured in actual game situations without subjecting people to any intrusive measures. The motion history image (MHI) is a well-known motion representation method that can be used without foreground detection. In MHIs, pixels on which motion is detected have large values; as time elapses after the latest motion detection, the values decrease according to a decay parameter. Two improvements were made to enable MHI-based template matching to be applied to motion analysis: a template MHI sequence matching process that enables analysis of the temporal development of motions, and an extension of MHIs to multiple decay parameters. The MHI sequence allows a reference motion to cover target motions of various speeds; since the appropriate decay parameter varies with motion speed, no single predefined decay parameter can be the best one. These improvements enable our method to effectively analyze human motions in actual game situations. Experiments carried out indoors with 3D motion capture data and outdoors under real game situations verified the effectiveness of the proposed method.
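
The decay mechanism described in the abstract follows the standard MHI update rule; a minimal sketch is given below (parameter values and the particular set of decay constants are illustrative, not taken from the paper).

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=255.0, decay=16.0):
    """One update step of a motion history image: pixels with detected
    motion are set to the maximum value tau, all other pixels decay
    towards zero by `decay` per frame."""
    return np.where(motion_mask, tau, np.maximum(mhi - decay, 0.0))

# Multiple decay parameters, as in the abstract: keep one history image
# per decay value, since the best decay depends on the unknown motion speed.
h, w = 240, 320
mhis = {decay: np.zeros((h, w)) for decay in (8.0, 16.0, 32.0)}
```

A fast motion then remains visible in the small-decay MHI while a slow motion leaves a longer trail in the large-decay one, which is what makes the multi-decay matching speed-tolerant.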

Paper Nr: 120
Title:

Probabilistic View-based 3D Curve Skeleton Computation on the GPU

Authors:

Jacek Kustra, Andrei Jalba and Alexandru Telea

Abstract: Computing curve skeletons of 3D shapes is a challenging task. Recently, a high-potential technique for this task was proposed, based on integrating medial information obtained from several 2D projections of a 3D shape (Livesu et al., 2012). However effective, the complexity of this technique is strongly influenced by the quality of a so-called skeleton probability volume, which encodes potential 3D curve-skeleton locations. In this paper, we extend the above method to deliver a highly accurate and discriminative curve-skeleton probability volume. To this end, we analyze the error sources of the original technique and propose improvements in terms of accuracy, culling of false positives, and speed. We show that our technique can deliver point-cloud curve-skeletons which are close to the desired locations, even in the absence of complex postprocessing. We demonstrate our technique on several 3D models.

Paper Nr: 135
Title:

RGB-D Tracking and Reconstruction for TV Broadcasts

Authors:

Tommi Tykkälä, Hannu Hartikainen, Andrew I. Comport and Joni-Kristian Kämäräinen

Abstract: In this work, a real-time image-based camera tracking solution is developed for television broadcasting studio environments. An affordable vision-based system is proposed which can compete with expensive matchmoving systems. The system requires merely commodity hardware: a low cost RGB-D sensor and a standard laptop. The main contribution is avoiding time-evolving drift by tracking relative to a pre-recorded keyframe model. Camera tracking is defined as a registration problem between the current RGB-D measurement and the nearest keyframe. The keyframe poses contain only a small error and therefore the proposed method is virtually driftless. Camera tracking precision is compared to KinectFusion, which is a recent method for simultaneous camera tracking and 3D reconstruction. The proposed method is tested in a television broadcasting studio, where it demonstrates driftless and precise camera tracking in real-time.

Paper Nr: 142
Title:

Articulated Object Modeling based on Visual and Haptic Observations

Authors:

Wei Wang, Vasiliki Koropouli, Dongheui Lee and Kolja Kühnlenz

Abstract: The manipulation of articulated objects constitutes an important and hard challenge for robots. This paper proposes an approach to modeling articulated objects by integrating visual and haptic information. Line-shaped skeletonization based on depth image data is used to extract the skeleton of an object in different configurations. From observations of the extracted skeleton topology, the kinematic joints of the object are characterized and localized. Haptic data, in the form of the task-space force required to manipulate the object, are collected by kinesthetic teaching and learned by Gaussian Mixture Regression in the object's joint state space. Following modeling, manipulation of the object is realized by first identifying the current object joint states from visual observations and second generalizing the learned force to accomplish the new task.

Paper Nr: 150
Title:

OpenOF - Framework for Sparse Non-linear Least Squares Optimization on a GPU

Authors:

Cornelius Wefelscheid and Olaf Hellwich

Abstract: In the area of computer vision and robotics, non-linear optimization methods have become an important tool. For instance, all structure from motion approaches apply optimizations such as bundle adjustment (BA). Most often, the structure of the problem is sparse regarding the functional relations of parameters and measurements. The sparsity of the system has to be modeled within the optimization in order to achieve good performance. With OpenOF, a framework is presented which enables developers to design optimizations that are sparse in parameters and measurements and to utilize the parallel power of a GPU. We demonstrate the universality of our framework using BA as an example. The performance and accuracy are compared to published implementations on synthetic and real-world data.

Paper Nr: 159
Title:

Direct Estimation of the Backward Flow

Authors:

Javier Sánchez, Agustin Salgado and Nelson Monzón

Abstract: The aim of this work is to propose a new method for estimating the backward flow directly from the optical flow. We assume that the optical flow has already been computed and we need to estimate the inverse mapping. This mapping is not bijective due to the presence of occlusions and disocclusions, so it is not possible to estimate the inverse function over the whole domain; values in these regions have to be guessed from the available information. We propose an accurate algorithm to calculate the backward flow solely from the optical flow, using a simple relation. Occlusions are filled by selecting the maximum motion, and disocclusions are filled with two different strategies: a min-fill strategy, which fills each disoccluded region with the minimum value around the region; and a restricted min-fill approach, which selects the minimum value in a close neighborhood. In the experimental results, we show the accuracy of the method and compare the results of these two strategies.
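
The inversion idea can be sketched as a forward splatting of the negated flow, with the occlusion and disocclusion rules the abstract names. This is an illustrative reading, not the authors' exact algorithm: the nearest-pixel rounding and the simple iterative min-fill are our simplifications.

```python
import numpy as np

def backward_flow(u, v):
    """Invert a flow field (u, v) by splatting each negated vector to its
    target pixel. Where several vectors land (occlusion) the largest
    motion wins; empty cells (disocclusion) are filled iteratively with
    the minimum valid neighbouring value (a crude min-fill)."""
    h, w = u.shape
    bu = np.full((h, w), np.nan)
    bv = np.full((h, w), np.nan)
    mag = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            ty, tx = int(round(y + v[y, x])), int(round(x + u[y, x]))
            if 0 <= ty < h and 0 <= tx < w:
                m = u[y, x] ** 2 + v[y, x] ** 2
                if m > mag[ty, tx]:          # occlusion: keep maximum motion
                    mag[ty, tx] = m
                    bu[ty, tx], bv[ty, tx] = -u[y, x], -v[y, x]
    while np.isnan(bu).any() or np.isnan(bv).any():
        for a in (bu, bv):                   # disocclusions: min-fill
            hole = np.isnan(a)
            p = np.pad(a, 1, constant_values=np.nan)
            neigh = np.stack([p[1:-1, :-2], p[1:-1, 2:],
                              p[:-2, 1:-1], p[2:, 1:-1]])
            fill = np.nanmin(neigh, axis=0)
            a[hole] = fill[hole]
    return bu, bv
```

For a uniform translation the result is simply the negated flow, with the vacated border strip filled by the min-fill rule.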

Paper Nr: 173
Title:

3D Representation Models Construction through a Volume Geometric Decomposition Method

Authors:

Gisele Simas, Rodrigo de Bem and Silvia Botelho

Abstract: Although 3D motion tracking has been extensively explored in computer vision research, it still faces some relevant challenges, such as tracking objects with little a priori knowledge. In this context, this work presents the Volume Geometric Decomposition method, capable of constructing representation models of distinct and previously unknown objects. The method operates on a probabilistic volumetric reconstruction of the objects of interest. It adjusts the representation to the reconstructed volume, minimizing the amount of empty space enclosed by the model. The representation model is composed of an appearance model and a kinematic model. The former is comprised of ellipsoids and joints, while the latter is implemented through the Loose-Limbed model, a probabilistic graphical model. The performed experiments and the obtained results show that the proposed method successfully constructed representation models for highly distinct and a priori unknown objects.

Paper Nr: 174
Title:

Visual Tracking with Similarity Matching Ratio

Authors:

Aysegul Dundar, Jonghoon Jin and Eugenio Culurciello

Abstract: This paper presents a novel approach to visual tracking: the Similarity Matching Ratio (SMR). The traditional approach to tracking is to minimize some measure of the difference between the template and a patch from the frame. This approach is vulnerable to outliers and drastic appearance changes, and extensive research has focused on making it more tolerant to them; however, this often results in longer, corrective algorithms that do not solve the original problem. This paper proposes a novel formulation of the tracking problem, the SMR, which turns the differences into probability measures: only pixel differences below a threshold count towards deciding the match, and the rest are ignored. This makes the SMR tracker robust to outliers and to points that dramatically change appearance. The SMR tracker is tested on challenging video sequences and achieves state-of-the-art performance.
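
The thresholded-count idea can be written in a few lines. This is a minimal sketch of the principle stated in the abstract; the threshold value, the normalization, and the exhaustive scan are our illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def smr(template, patch, thresh=25.0):
    """Similarity Matching Ratio between a template and a same-sized
    patch: the fraction of pixels whose absolute difference is below
    `thresh`. Larger differences (outliers, drastic appearance changes)
    simply do not count, instead of dominating an SSD-style score."""
    diff = np.abs(template.astype(float) - patch.astype(float))
    return np.mean(diff < thresh)

def track(template, frame, thresh=25.0):
    """Return the top-left corner of the patch maximizing the SMR."""
    th, tw = template.shape
    fh, fw = frame.shape
    best, best_pos = -1.0, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            s = smr(template, frame[y:y + th, x:x + tw], thresh)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos
```

A patch containing a single wildly different pixel still scores highly, which is the robustness property the abstract claims.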

Paper Nr: 182
Title:

3D Invariants from Coded Projection without Explicit Correspondences

Authors:

Kenta Suzuki, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a method for computing stable 3D features for 3D object recognition. The feature is a projective invariant computed from 3D information based on the disparity between two projectors. In our method, the disparity can be estimated directly from image intensity, without obtaining any explicit corresponding points. Thus, we do not need any image matching method to obtain corresponding points, which means we can essentially avoid all the problems that arise from image matching, and can therefore compute 3D invariant features from the 3D information reliably. The experimental results show that our proposed invariant feature is useful for 3D object recognition.

Paper Nr: 189
Title:

Genetic Algorithm for Stereo Correspondence with a Novel Fitness Function and Occlusion Handling

Authors:

Alvaro Arranz, Manuel Alvar, Jaime Boal, Alvaro Sanchez-Miralles and Arturo de la Escalera

Abstract: This paper proposes a genetic algorithm for solving the stereo correspondence problem. Applied to stereo, genetic algorithms are flexible in the cost function and permit global reasoning. The main contributions of this paper are a new crossover operator and a mutation operator which account for occlusion management, and a new fitness function which considers occluded pixels and photometric derivatives. Both the left and right disparity images are analysed in order to classify occluded pixels correctly. The proposed fitness function is compared to the traditional energy function based on the Markov Random Field framework. The results show that a 32% bad-pixel error reduction can be achieved on average using the proposed fitness function. The results have been uploaded to the Middlebury ranking webpage as the first evolutionary algorithm evaluated there.

Paper Nr: 196
Title:

Weighted Joint Bilateral Filter with Slope Depth Compensation Filter for Depth Map Refinement

Authors:

Takuya Matsuo, Norishige Fukushima and Yutaka Ishibashi

Abstract: In this paper, we propose a new refinement filter for depth maps. The filter convolves a depth map with a kernel jointly computed on a natural image and a weight map; we call it the weighted joint bilateral filter. The filter fits the outline of an object in the depth map to its outline in the natural image, and it reduces noise. An additional slope depth compensation filter removes blur across object boundaries. The computational cost of the filter set is low and independent of the depth range, so we can refine depth maps into accurate ones at low cost. In addition, we can apply the filters to various types of depth maps, such as those computed by simple block matching or Markov random field based optimization, and those from depth sensors. Experimental results show that the proposed filter achieves the best improvement in depth map accuracy and can perform refinement in real-time.
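
A weighted joint bilateral filter can be sketched as below: the range kernel is evaluated on the guidance (natural) image rather than on the depth map, and each neighbour is additionally scaled by a per-pixel confidence map. The Gaussian kernels, parameter values, and the untouched border are our illustrative simplifications, not the paper's exact filter.

```python
import numpy as np

def weighted_joint_bilateral(depth, guide, weight, radius=3,
                             sigma_s=3.0, sigma_r=20.0):
    """Weighted joint bilateral filtering of a depth map (sketch).

    The spatial kernel is a Gaussian on pixel distance; the range kernel
    is a Gaussian on guidance-image intensity differences, so depth is
    smoothed only within regions that look uniform in the natural image.
    `weight` down-weights unreliable depth samples. Borders are left as-is."""
    h, w = depth.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    out = depth.astype(float)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            g = guide[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(float)
            d = depth[y - radius:y + radius + 1, x - radius:x + radius + 1]
            c = weight[y - radius:y + radius + 1, x - radius:x + radius + 1]
            rng = np.exp(-(g - guide[y, x]) ** 2 / (2 * sigma_r ** 2))
            k = spatial * rng * c
            out[y, x] = np.sum(k * d) / np.sum(k)
    return out
```

Because the kernel is normalized, a constant depth region passes through unchanged; near a guidance-image edge the range kernel collapses to one side, which is what snaps the depth outline to the image outline.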

Paper Nr: 214
Title:

Stable Keypoint Recognition using Viewpoint Generative Learning

Authors:

Takumi Yoshida, Hideo Saito, Masayoshi Shimizu and Akinori Taguchi

Abstract: We propose a stable keypoint recognition method that is robust to viewpoint changes. Conventional local features such as SIFT, SURF, etc., have scale and rotation invariance but often fail to match points when the camera pose changes significantly. To solve this problem, we adopt viewpoint generative learning: by generating various patterns as seen from different viewpoints and collecting local invariant features, our system can learn feature descriptors under various camera poses for each keypoint before the actual matching. Experimental results comparing against standard local feature matching and a patch classification method show both the robustness and the speed of the learning.

Paper Nr: 229
Title:

Learning Multi-class Topological Mapping using Visual Information

Authors:

Anna Romero and Miguel Cazorla

Abstract: Mapping an unknown environment is an important area within robotics. The obtained map can be used in more complex problems such as localisation, scene recognition, navigation, SLAM, etc. Topological maps, inspired by the human mental description of their surroundings, do not seek accurate measurements but rather a classification of the real environment into areas containing distinctive features that differentiate them from other areas. The use of learning techniques can help us define the different areas of the environment so that the robot can recognise them later. In this paper, we propose the use of the SAMME algorithm, a supervised learning method based on AdaBoost, to select the best visual features describing each area of a topological map.

Paper Nr: 251
Title:

Robust Guided Matching and Multi-layer Feature Detection Applied to High Resolution Spherical Images

Authors:

Christiano Couto Gava, Alain Pagani, Bernd Krolla and Didier Stricker

Abstract: We present a novel, robust guided matching technique. Given a set of calibrated spherical images along with the associated sparse 3D point cloud, our approach consistently finds matches across the images in a multilayer feature detection framework. New feature matches are used to refine existing 3D points or to add reliable ones to the point cloud, therefore improving scene representation. We use real indoor and outdoor scenarios to validate the robustness of the proposed approach. Moreover, we perform a quantitative evaluation of our technique to demonstrate its effectiveness.

Paper Nr: 303
Title:

3D Face Pose Tracking from Monocular Camera via Sparse Representation of Synthesized Faces

Authors:

Ngoc-Trung Tran, Jacques Feldmar, Maurice Charbit, Dijana Petrovska-Delacrétaz and Gérard Chollet

Abstract: This paper presents a new method to track head pose efficiently from a monocular camera via sparse representation of synthesized faces. In our framework, the appearance model is trained using a database of synthesized faces generated from the first video frame. The pose estimation is based on the similarity distance between the observations of landmarks and their reconstructions; the reconstruction is the texture extracted around the landmark, represented as a sparse linear combination of positive training samples obtained by solving an l1-norm minimization problem. The approach finds the positions of new landmarks and the face pose by minimizing an energy function defined as the sum of these distances, while simultaneously constraining the shape by a 3D face model. Our framework gives encouraging pose estimation results on the Boston University Face Tracking (BUFT) dataset.

Paper Nr: 304
Title:

Shape from Multi-view Images based on Image Generation Consistency

Authors:

Kousuke Wakabayashi, Norio Tagawa and Kan Okubo

Abstract: Many and various depth recovery methods have been studied, but the unification of the individual methods has arguably not been discussed enough. In this study, we argue the importance and the necessity of image generation consistency. Various cues, including binocular disparity, motion parallax, texture, shading and so on, can be used effectively for depth recovery when each or some of them apply completely. In general, however, the cues other than shading yield idealized and simplified constraints, and in several cases they are better suited to providing initial depth values for a unification algorithm based on image generation consistency. Shading, on the other hand, reflects the strict characteristics of image generation and should serve as the key principle of the unification. Based on this strategy, as a first step of our scheme, we examine the unification of binocular disparity and shading without explicit disparity detection, through a simple two-view problem, based on image generation consistency, and present simple evaluation results from simulations.

Paper Nr: 313
Title:

Consensus-based Inter-camera Re-identification - Across Non-overlapping Views

Authors:

Fouad Bousetouane, Cina Motamed and Lynda Dib

Abstract: Multi-object re-identification across a camera network with non-overlapping fields of view is a challenging problem. Firstly, the visual signature of the same object might be very different from one camera to another. Secondly, the blind zone between cameras creates discontinuity in the observation of the same object in terms of locations and travelling times. The centralized inference proposed in the literature for inter-camera re-identification becomes insufficient in practice, mostly under the requirements of real-time applications and dynamic camera networks. In this paper we present a completely distributed approach to inter-camera re-identification. The proposed approach is based on distributed inference, where a set of smart cameras collaborate to reach a consensus about the identities of the objects circulating in the network. Local and global visual descriptors are combined in the proposed approach for inter-camera color mapping and invariant object description. Experimental results show improvement in inter-camera re-identification and robustness in recovering from very complex conditions.

Paper Nr: 315
Title:

Model-less 3D Head Pose Estimation using Self-optimized Local Discriminant Embedding

Authors:

F. Dornaika, A. Bosgahzadeh and A. Assoum

Abstract: In this paper, we propose a self-optimized Local Discriminant Embedding and apply it to the problem of model-less 3D head pose estimation. Recently, the Local Discriminant Embedding (LDE) method was proposed to tackle some limitations of the global Linear Discriminant Analysis (LDA) method. To better characterize the discriminant property of the data, LDE builds two adjacency graphs: the within-class adjacency graph and the between-class adjacency graph. However, it is very difficult to set these two graphs in advance. Our proposed self-optimized LDE has two important characteristics: (i) while all graph-based manifold learning techniques (supervised and unsupervised) depend on several parameters that require manual tuning, ours is parameter-free, and (ii) it adaptively estimates the local neighborhood surrounding each sample based on data similarity. The resulting self-optimized LDE approach has been applied to the problem of model-less coarse 3D head pose estimation (person-independent 3D pose estimation). It was tested on two large databases, FacePix and Pointing'04, and compared with other linear techniques. The experimental results confirm that our method outperforms, in general, the existing ones.

Posters
Paper Nr: 6
Title:

Multiple Hypotheses Multiple Levels Object Tracking

Authors:

Ronan Sicre and Henri Nicolas

Abstract: This paper presents an object tracking system. Our goal is to create a real-time object tracker that can handle occlusions and track multiple objects, whether rigid or deformable, in indoor or outdoor sequences. The system is composed of two main modules: motion detection and object tracking. Motion detection is achieved using an improved Gaussian mixture model. Based on multiple hypotheses of object appearance, tracking is achieved on various levels. The core of this module uses local and global region information to match regions over the frame sequence. Higher-level instances then handle uncertainty, such as mismatches, object disappearance, and occlusions. Finally, merges and splits are detected for further occlusion handling.

Paper Nr: 14
Title:

6D Visual Odometry with Dense Probabilistic Egomotion Estimation

Authors:

Hugo Silva, Alexandre Bernardino and Eduardo Silva

Abstract: We present a novel approach to 6D visual odometry for vehicles with calibrated stereo cameras. A dense probabilistic egomotion (5D) method is combined with robust stereo-feature-based approaches and Extended Kalman Filtering (EKF) techniques to provide high-quality estimates of the vehicle's angular and linear velocities. Experimental results show that the proposed method compares favorably with state-of-the-art approaches, mainly in the estimation of the angular velocities, where significant improvements are achieved.

Paper Nr: 20
Title:

Optical Flow Estimation with Consistent Spatio-temporal Coherence Models

Authors:

Javier Sánchez, Agustín Salgado and Nelson Monzón

Abstract: In this work we propose a new variational model for the consistent estimation of motion fields. The aim of this work is to develop appropriate spatio-temporal coherence models. In this sense, we propose two main contributions: a nonlinear flow constancy assumption, similar in spirit to the nonlinear brightness constancy assumption, which conveniently relates flow fields at different time instants; and a nonlinear temporal regularization scheme, which complements the spatial regularization and can cope with piecewise continuous motion fields. These contributions pose a congruent variational model since all the energy terms, except the spatial regularization, are based on nonlinear warpings of the flow field. This model is more general than its spatial counterpart, provides more accurate solutions and preserves the continuity of optical flows in time. In the experimental results, we show that the method attains better results and, in particular, it considerably improves the accuracy in the presence of large displacements.
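
The analogy between the two constancy assumptions can be made explicit. The notation below is ours (the abstract does not give the paper's formulas): $I_t$ is the image and $\mathbf{w}_t$ the flow field at time $t$.

```latex
% Illustrative notation, not necessarily the paper's:
\begin{align}
  I_{t+1}\bigl(\mathbf{x} + \mathbf{w}_t(\mathbf{x})\bigr)
    &= I_t(\mathbf{x})
    && \text{(nonlinear brightness constancy)} \\
  \mathbf{w}_{t+1}\bigl(\mathbf{x} + \mathbf{w}_t(\mathbf{x})\bigr)
    &= \mathbf{w}_t(\mathbf{x})
    && \text{(nonlinear flow constancy)}
\end{align}
```

In both cases the quantity at time $t+1$ is evaluated at the position warped by the current flow, which is what relates the flow fields at different time instants nonlinearly.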

Paper Nr: 22
Title:

Human Detection and Tracking under Complex Activities

Authors:

Brais Cancela, M. Ortega and Manuel G. Penedo

Abstract: Multiple-target tracking is a challenging problem when dealing with complex activities. Situations like partial occlusions during grouping events or sudden changes of target orientation introduce complexity into the detection that is difficult to handle. In particular, when dealing with human beings, the head is often the only visible part. Techniques based on the upper body achieve good results in general, but fail to provide good tracking accuracy in the kinds of situations mentioned before. We present a new methodology that provides a full tracking system under complex activities; a combination of three different techniques is used to overcome the problems mentioned above. Experimental results on sports sequences show both the speed and the performance of this technique.

Paper Nr: 46
Title:

Hybrid Iterated Kalman Particle Filter for Object Tracking Problems

Authors:

Amr M. Nagy, Ali Ahmed and Hala H. Zayed

Abstract: Particle Filters (PFs) are widely used where the system is nonlinear and non-Gaussian. Choosing the importance proposal distribution is a key issue in solving nonlinear filtering problems, and practical object tracking problems encourage researchers to design better candidates for the proposal distribution in order to gain better performance. In this correspondence, a new algorithm referred to as the hybrid iterated Kalman particle filter (HIKPF) is proposed. The proposed algorithm combines the unscented Kalman filter (UKF) and the iterated extended Kalman filter (IEKF) to generate the proposal distribution, which leads to an efficient use of the latest observations and a closer approximation of the posterior probability density. Compared with previously suggested methods (e.g. PF, PF-EKF, PF-UKF, PF-IEKF), our proposed method shows better performance and tracking accuracy. The correctness and validity of the algorithm are demonstrated through numerical simulation and experimental results.

Paper Nr: 99
Title:

Pedestrian Tracking based on 3D Head Point Detection

Authors:

Zhongchuan Zhang and Fernand Cohen

Abstract: In this paper, we introduce a 3D pedestrian tracking method based on 3D head point detection in indoor environments, such as train stations, airports, shopping malls and hotel lobbies, where the ground can be non-flat. We also show that our approach is effective and efficient in capturing close-up facial images using pan-tilt-zoom (PTZ) cameras. We use two horizontally displaced overhead cameras to track pedestrians by estimating the accurate 3D position of their heads. The 3D head point is then tracked using common assumptions on motion direction and velocity. Our method is able to track pedestrians in 3D space whether they are walking on a planar or a non-planar surface. Moreover, we make no assumption about the pedestrians’ heights, nor do we have to generate the full disparity map of the scene. The tracking system architecture allows for real-time capture of high-quality facial images by guiding PTZ cameras. The approach is tested using a publicly available visual-surveillance simulation test bed.
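
For two rectified, horizontally displaced cameras as described above, recovering a single 3D point from a matched pixel pair needs no full disparity map. A textbook triangulation sketch (parameter names are hypothetical, not taken from the paper):

```python
def triangulate_head_point(x_left, x_right, y, f, baseline, cx, cy):
    """Back-project a matched pixel pair into a 3D point.

    Assumes rectified cameras with identical intrinsics:
    f        -- focal length in pixels
    baseline -- horizontal distance between the camera centres
    cx, cy   -- principal point of the left image
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must lie in front of both cameras")
    z = f * baseline / disparity          # depth from disparity
    x = (x_left - cx) * z / f             # lateral offset
    y3d = (y - cy) * z / f                # vertical offset
    return x, y3d, z
```

Because only the detected head points are triangulated, per-pedestrian 3D positions come at a small fraction of the cost of dense stereo.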

Paper Nr: 129
Title:

Continuous Tracking of Structures from an Image Sequence

Authors:

Yann Lepoittevin, Dominique Béréziat, Isabelle Herlin and Nicolas Mercier

Abstract: The paper describes an innovative approach that estimates velocity on an image sequence and simultaneously segments and tracks a given structure. It relies on the equations of the underlying dynamics of the studied physical system. A data assimilation method is applied to solve the evolution equations of image brightness, those of the motion dynamics, and those of the distance maps modelling the tracked structures. Results are first quantified on synthetic data with comparison to ground truth. Then, the method is applied to meteorological satellite acquisitions of a tropical cloud, in order to track this structure over the sequence. The outputs of the approach are continuous estimates of both the motion and the structure’s boundary. The main advantage is that the method relies only on image data and on a rough segmentation of the structure at the initial date.

Paper Nr: 140
Title:

Real-time Multiple Abnormality Detection in Video Data

Authors:

Simon Hartmann Have, Huamin Ren and Thomas B. Moeslund

Abstract: Automatic abnormality detection in video sequences has recently gained increasing attention within the research community. Although progress has been made, there are still limitations in current research. While most systems are designed to detect a specific abnormality, those capable of detecting more than two types of abnormalities rely on heavy computation. We therefore provide a framework for detecting abnormalities in video surveillance by using multiple features and cascade classifiers, while still achieving above real-time processing speed. Experimental results on two datasets show that the proposed framework can reliably detect abnormalities in the video sequence, outperforming current state-of-the-art methods.
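
The speed argument behind a cascade is early termination: cheap features are checked first and expensive stages run only when needed. A minimal sketch of that control flow (the stage features and thresholds here are invented for illustration, not taken from the paper):

```python
def cascade_detect(frame_features, stages):
    """Run a cascade of (feature_name, threshold) stages.

    The first stage whose feature score exceeds its threshold labels
    the frame abnormal; later (typically more expensive) stages are
    then skipped, which keeps the average per-frame cost low.
    """
    for name, threshold in stages:
        if frame_features[name] > threshold:
            return True, name   # abnormal, flagged by this stage
    return False, None          # passed every stage: normal
```

Since most surveillance frames are normal and exit at the cheap early stages, the amortised cost per frame stays far below running every feature on every frame.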

Paper Nr: 144
Title:

Sparse Motion Segmentation using Propagation of Feature Labels

Authors:

Pekka Sangi, Jari Hannuksela, Janne Heikkilä and Olli Silvén

Abstract: The paper considers the problem of extracting background and foreground motions from image sequences based on the estimated displacements of a small set of image blocks. As a novelty, the uncertainty of the local motion estimates is analyzed and exploited in the fitting of parametric object motion models, which is done within a competitive framework. Prediction of patch labels is based on the temporal propagation of labeling information from seed points in spatial proximity. Estimates of local displacements are then used to predict the object motions, which provide a starting point for iterative refinement. Experiments with both synthesized and real image sequences show the potential of the approach as a tool for tracking-based online motion segmentation.
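
Fitting a parametric motion model to block displacements, with per-block confidences as weights, can be sketched as a weighted least-squares problem. This is a generic affine-model fit under assumed inputs, not the paper's competitive framework, which additionally arbitrates between background and foreground models:

```python
import numpy as np

def fit_affine_motion(positions, displacements, weights=None):
    """Weighted least-squares fit of a 2D affine model d = A p + t.

    positions     -- (N, 2) block centres
    displacements -- (N, 2) estimated displacement per block
    weights       -- optional per-block confidence values
    """
    p = np.asarray(positions, float)
    d = np.asarray(displacements, float)
    n = len(p)
    w = np.ones(n) if weights is None else np.asarray(weights, float)
    # Design matrix: each row [px, py, 1] maps to one displacement.
    X = np.hstack([p, np.ones((n, 1))])
    sw = np.sqrt(w)[:, None]                    # weight both sides
    sol, *_ = np.linalg.lstsq(sw * X, sw * d, rcond=None)
    A = sol[:2].T                               # 2x2 linear part
    t = sol[2]                                  # translation
    return A, t
```

Downweighting blocks with uncertain displacement estimates (low texture, aperture problems) is what makes the fitted object motions usable as starting points for refinement.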

Paper Nr: 168
Title:

Dynamic 3D Mapping - Visual Estimation of Independent Motions for 3D Structures in Dynamic Environments

Authors:

Juan Carlos Ramirez and Darius Burschka

Abstract: This paper describes an approach to consistently model and characterize potential object candidates present in non-static scenes. With a stereo camera rig we collect and collate range data from different views around a scene. Three principal procedures support our method: i) segmentation of the captured range images into 3D clusters or blobs, which gives a first gross impression of the spatial structure of the scene; ii) maintenance of a reliable map, obtained by fusing the captured and mapped data, to which we assign a degree of existence (confidence value); iii) visual motion estimation of potential object candidates through the combination of texture and 3D spatial information, which allows us not only to update the state of the actors and perceive their changes in a scene, but also to maintain and refine their individual 3D structures over time. The validation of the visual motion estimation is supported by a dual-layered 3D-mapping framework in which we can store the geometric and abstract properties of the mapped entities or blobs, and determine which entities have moved in order to update the map to the current scene state.
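
A degree-of-existence map is typically maintained by reinforcing blobs that are re-observed and decaying blobs that should be visible but are missing. The update rule and constants below are illustrative assumptions, since the abstract does not specify the fusion weights:

```python
def update_confidence(conf, observed, gain=0.2, decay=0.05, t_remove=0.1):
    """One confidence-value update for a mapped blob (illustrative).

    Re-observed blobs gain confidence (saturating at 1.0); blobs that
    fall inside the current view but are missing decay, and are dropped
    from the map once confidence falls below t_remove.
    Returns the new confidence and whether the blob is kept.
    """
    conf = conf + gain * (1.0 - conf) if observed else conf - decay
    conf = min(max(conf, 0.0), 1.0)
    return conf, conf >= t_remove
```

An asymmetric gain/decay pair like this makes the map robust to single missed detections while still removing blobs that have genuinely moved away.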

Paper Nr: 198
Title:

Real-time Appearance-based Person Re-identification Over Multiple KinectTM Cameras

Authors:

Riccardo Satta, Federico Pala, Giorgio Fumera and Fabio Roli

Abstract: Person re-identification consists of recognizing a person over different cameras, using appearance cues. We investigate the deployment of real-world re-identification systems, by developing and testing a working prototype. We focus on two practical issues: computational complexity, and reliability of segmentation and tracking. The former is addressed by using a recently proposed fast re-identification method, the latter by using Kinect cameras. To our knowledge, this is the first example of a fully-functional re-identification system based on Kinect in the literature. We finally point out possible improvements and future research directions.

Paper Nr: 212
Title:

External Cameras and a Mobile Robot for Enhanced Multi-person Tracking

Authors:

A. A. Mekonnen, F. Lerasle and A. Herbulot

Abstract: In this paper, we present a cooperative multi-person tracking system between external fixed-view wall mounted cameras and a mobile robot. The proposed system fuses visual detections from the external cameras and laser based detections from a mobile robot, in a centralized manner, employing a “tracking-by-detection” approach within a Particle Filtering scheme. The enhanced multi-person tracker’s ability to track targets in the surveilled area distinctively is demonstrated through quantitative experiments.

Paper Nr: 280
Title:

Visual-based Natural Landmark Tracking Method to Support UAV Navigation over Rain Forest Areas

Authors:

Felipe A. Pinagé, José Reginaldo H. Cavalho and José P. Queiroz Neto

Abstract: Field applications of unmanned aerial vehicles (UAVs) have increased in the last decade. One example of a difficult task is long-endurance missions over rain forest, due to the uniform pattern of the ground. In this scenario an embedded vision system plays a critical role. This paper presents a SIFT adaptation towards a vision system able to track landmarks in forest areas, using the wavelet transform to suppress non-relevant features, in order to support UAV navigation. Preliminary results demonstrate that this method can correctly track a sequence of natural landmarks in a time feasible for online applications.

Paper Nr: 282
Title:

Automatic Geometric Projector Calibration - Application to a 3D Real-time Visual Feedback

Authors:

Radhwan Ben Madhkour, Matei Mancas and Bernard Gosselin

Abstract: In this paper, we present a fully automatic method for the geometric calibration of a video projector. The approach is based on Heikkila’s camera calibration algorithm. It combines the projection of Gray-coded structured light patterns with an RGBD camera, and any projection surface can be used. Intrinsic and extrinsic parameters are computed without scale-factor uncertainty and without any prior knowledge of the projector or the projection surface. While the structured light provides pixel-to-pixel correspondences between the projector and the camera, the depth map provides the 3D coordinates of the projected points. Pairs of pixel coordinates and their corresponding 3D coordinates are established and used as input to Heikkila’s algorithm. The projector calibration is used as a basis to augment the scene with information from the RGBD camera in real time.
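
The core of such a calibration is fitting a projection model to 2D-3D correspondences. A minimal direct-linear-transform sketch is shown below; it recovers only a 3x4 projection matrix and ignores the lens distortion that Heikkila’s full model also estimates, so treat it as a simplified stand-in:

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Direct linear transform from 3D-2D correspondences.

    Each correspondence (X,Y,Z) <-> (u,v) contributes two linear
    equations in the 12 entries of the projection matrix P; the
    solution is the right singular vector of the stacked system with
    the smallest singular value. Needs >= 6 non-coplanar points.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 4)
```

Because the RGBD depth map supplies metric 3D coordinates for the decoded projector pixels, the recovered extrinsics carry a true scale, which is why no scale-factor uncertainty remains.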

Paper Nr: 306
Title:

Spatial Image Display using Double-sided Lenticular or Fly’s Eye Lens Sheets

Authors:

Naoki Kira and Kazuhisa Yanaka

Abstract: In this paper, a novel spatial image display system is described in which the 3D image of a real object is displayed as if it were floating at a position considerably distant from the screen. Our system uses double-sided lenticular or fly’s eye lens sheets. The light rays emitted from a point on the object are refracted by the double-sided lens sheets and converge in space, so a real image that appears to be floating in the air is formed. Since our system can be produced from a single material such as transparent plastic, and no corner mirrors are necessary, it is suitable for mass production with metal molds and is therefore much less expensive than existing technologies.

Paper Nr: 319
Title:

3D Gesture Recognition by Superquadrics

Authors:

Ilya Afanasyev and Mariolino De Cecco

Abstract: This paper presents a 3D gesture recognition and localization method based on processing 3D data of hands in color gloves acquired by a 3D depth sensor such as the Microsoft Kinect. The RGB information of every 3D data point is used to segment the 3D point cloud into 12 parts (a forearm, a palm and 10 fingers). The object (a hand with fingers) should be known a priori and anthropometrically modeled by SuperQuadrics (SQ) with certain scaling and shape parameters. The gesture (pose) is estimated hierarchically by a RANSAC object search with least-squares fitting of the segments of the 3D point cloud to the corresponding SQ models: first the pose of the hand (forearm and palm), and then the positions of the fingers. The solution is verified by evaluating the matching score, i.e. the number of inliers whose distances from the SQ surfaces to the 3D data points satisfy an assigned distance threshold.
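
A matching score of this kind is commonly computed with the superquadric inside-outside function, which equals 1 exactly on the surface. The sketch below assumes a superquadric in canonical pose (the paper additionally estimates the rigid pose via RANSAC) and uses a simple threshold on the inside-outside value as a proxy for distance to the surface:

```python
import numpy as np

def superquadric_inlier_count(points, a=(1.0, 1.0, 1.0),
                              e1=1.0, e2=1.0, tol=0.1):
    """Count inliers of an axis-aligned superquadric (illustrative).

    Uses the standard inside-outside function
      F = (|x/a1|^(2/e2) + |y/a2|^(2/e2))^(e2/e1) + |z/a3|^(2/e1),
    which is 1 on the surface; points with |F^(e1/2) - 1| < tol are
    counted as inliers.
    """
    p = np.abs(np.asarray(points, float) / np.asarray(a, float))
    f = (p[:, 0]**(2/e2) + p[:, 1]**(2/e2))**(e2/e1) + p[:, 2]**(2/e1)
    return int(np.sum(np.abs(f**(e1/2) - 1.0) < tol))
```

With e1 = e2 = 1 and equal scales the superquadric reduces to a sphere, which makes the score easy to sanity-check; varying e1, e2 morphs the shape between ellipsoid-, cylinder- and box-like primitives used to model the palm and fingers.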