VISAPP 2015 Abstracts


Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 13
Title:

Combined Bilateral Filter for Enhanced Real-time Upsampling of Depth Images

Authors:

Oliver Wasenmüller, Gabriele Bleser and Didier Stricker

Abstract: Highly accurate depth images at video frame rate are required in many areas of computer vision, such as 3D reconstruction, 3D video capturing or manufacturing. Nowadays, low-cost depth cameras that deliver a high frame rate are widespread, but they suffer from a high level of noise and a low resolution. Thus, a sophisticated real-time upsampling algorithm is strongly required. In this paper we propose a new sensor fusion approach called Combined Bilateral Filter (CBF), together with a new Depth Discontinuity Preservation (DDP) post-processing step, which combine the information of a depth and a color sensor. Thereby we especially focus on two drawbacks that are common in related algorithms, namely texture copying and upsampling without depth discontinuity preservation. The output of our algorithm is a higher-resolution depth image with substantially reduced noise, no aliasing effects, no texture copying and very sharply preserved edges. In a ground-truth comparison our algorithm was able to reduce the mean error by up to 73% within around 30 ms. Furthermore, we compare our method against other state-of-the-art algorithms and obtain superior results.
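The color-guided depth upsampling idea described above can be illustrated with a generic joint bilateral upsampling sketch. This is not the authors' CBF or DDP implementation; the parameters `sigma_s`, `sigma_r` and `radius`, and the nearest-neighbour guide sampling, are illustrative assumptions:

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Upsample a low-resolution depth map using a high-resolution guide image.

    Each high-res pixel is a weighted average of low-res depth samples; the
    weights combine spatial distance (in low-res coordinates) and intensity
    similarity in the high-res guide image, so depth edges follow color edges.
    """
    H, W = guide.shape
    h, w = depth_lr.shape
    scale = H // h  # assume an integer upsampling factor
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale  # position in the low-res grid
            acc, wsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ly = int(round(cy)) + dy
                    lx = int(round(cx)) + dx
                    if 0 <= ly < h and 0 <= lx < w:
                        # spatial weight in low-res coordinates
                        ws = np.exp(-((ly - cy) ** 2 + (lx - cx) ** 2)
                                    / (2 * sigma_s ** 2))
                        # range weight from the high-res guide image
                        g = guide[min(ly * scale, H - 1), min(lx * scale, W - 1)]
                        wr = np.exp(-((guide[y, x] - g) ** 2) / (2 * sigma_r ** 2))
                        acc += ws * wr * depth_lr[ly, lx]
                        wsum += ws * wr
            out[y, x] = acc / wsum
    return out
```

Unlike plain bicubic upsampling, the range term suppresses smoothing across guide-image edges, which is what preserves depth discontinuities.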

Paper Nr: 220
Title:

An Optimum-Rounding 5/3 IWT based on 2-Level Decomposition for Lossless/Lossy Image Compression

Authors:

Somchart Chokchaitam

Abstract: Lifting structures and rounding operations are the main tools used to construct integer wavelet transforms (IWT), which are widely applied in lossless/lossy compression. However, the rounding operation introduces non-linear noise that degrades performance. In this report, we propose a new optimum-rounding 5/3 IWT based on 2-level decomposition for lossless/lossy image compression. Our proposed 5/3 IWT is designed to reduce rounding operations as much as possible. The filter characteristics of our proposed 5/3 IWT are the same as those of the conventional 2-level 2D 5/3 IWT, excluding the rounding effect. The coding performance of the proposed 5/3 IWT is better than that of the conventional 5/3 IWT in lossy compression because of the reduction of rounding effects; in particular, its performance in near-lossless compression is much better than the conventional one, while the lossless performance of the two is almost the same. Simulation results confirm the effectiveness of our proposed 5/3 IWT.

Short Papers
Paper Nr: 52
Title:

Color Restoration for Infrared Cutoff Filter Removed RGBN Multispectral Filter Array Image Sensor

Authors:

Chul Hee Park, Hyun Mook Oh and Moon Gi Kang

Abstract: Imaging systems based on multispectral filter arrays (MSFA) can simultaneously acquire wide spectral information. An MSFA image sensor with R, G, B, and near-infrared (NIR) filters can obtain the mixed spectral information of the visible bands and that of the NIR band. Since the color filter materials used in MSFA sensors are almost transparent in the NIR range, the observed colors of multispectral images are degraded by the additional NIR spectral information. To overcome this color degradation, a new signal processing approach is needed to separate the spectral information of the visible bands from the mixed spectral information. In this paper, a color restoration method for imaging systems based on MSFA sensors is proposed. The proposed method restores the received image by removing NIR-band spectral information from the mixed wide spectral information. To remove the additional NIR spectral information, spectral estimation and spectral decomposition are performed based on the spectral characteristics of the MSFA sensor. The experimental results show that the proposed method restores color information by removing unwanted NIR contributions to the RGB color channels.

Paper Nr: 59
Title:

Least Square based Multi-spectral Color Interpolation Algorithm for RGB-NIR Image Sensors

Authors:

Ji Yong Kwon, Chul Hee Park and Moon Gi Kang

Abstract: The use of the near-infrared (NIR) band gives us additional, invisible information to discriminate objects and enables us to recognize objects more clearly under low-light conditions. To acquire the color and NIR bands together in a single image sensor developed from a conventional color filter array (CFA), we use a multispectral filter array (MSFA) in RGB-NIR sensors and design a color interpolation algorithm to fill in the multi-spectral (MS) band information from the subsampled MSFA image. Aliasing in the MSFA image caused by the subsampled bands is minimized by balancing the energy of the bands. A panchromatic (PAN) image is generated by applying a low-pass kernel to the MSFA image. This PAN image, which is free of chrominance signals and contains most of the high-frequency content of the MSFA image, is used to reconstruct the MS images by solving a least-square cost function between the PAN and MS images. The experiments show that the proposed algorithm estimates the high-resolution MS images very well.

Paper Nr: 95
Title:

Periodic Patterns Recovery for Multicamera Calibration

Authors:

Lorenzo Sorgi and Andrey Bushnevskiy

Abstract: Camera calibration is an essential step for most computer vision applications. This task usually requires the consistent detection of a 2D periodic pattern across multiple views, and in practice one of the main difficulties is the correct localization of the pattern origin and its orientation in case of partial occlusion. To overcome this problem, many calibration tools require full visibility of the calibration pattern, which is not always possible, especially when multicamera systems are used. This paper addresses the specific problem of consistent recovery of the calibration pattern captured by a multicamera system under partial occlusion of the calibration object in several (even all) calibration images. The proposed algorithm is structured in two sequential steps aimed at the removal of the rotational and translational components of the pattern offset transformation, which is essential for a correct calibration. The paper focuses on two common calibration patterns, the checkerboard grid and the bundle of parallel lines; however, the technique can easily be rearranged to cope with other classes of periodic patterns. The algorithm's effectiveness has been successfully proven on simulated data and on two real calibration datasets captured using a fisheye stereo rig.

Paper Nr: 134
Title:

A Comparison between Multi-Layer Perceptrons and Convolutional Neural Networks for Text Image Super-Resolution

Authors:

Clément Peyrard, Franck Mamalet and Christophe Garcia

Abstract: We compare the performance of several Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (ConvNets) for single text image Super-Resolution. We propose an example-based framework for both MLPs and ConvNets, in which a non-linear mapping between pairs of patches and high-frequency pixel values is learned. We then demonstrate that, for equivalent complexity, ConvNets are better than MLPs at predicting missing details in upsampled text images. To evaluate performance, we make use of a recent database (ULR-textSISR-2013a) along with different quality measures. We show that the proposed methods outperform sparse coding-based methods on this database.

Paper Nr: 167
Title:

Round Colour Space for Pentachromacy - Circularity of Multidimensional Hue

Authors:

Alfredo Restrepo Palacios

Abstract: We generalize previous results to dimension 5 and beyond. The geometry of the 5-hypercube [0,1]^5 gives a model for colour vision in the case of 5 photoreceptor types and a colour space corresponding to the combination of five primary lights. In particular, we focus on the (topologically spherical) boundary of the hypercube and on an equatorial sphere within the boundary, roughly orthogonal to the achromatic segment. In the polytopal and double-cone type spaces, we consider a tridimensional hue component; in the round Runge space we consider a 4-dimensional colourfulness component.

Paper Nr: 193
Title:

Hierarchical SNR Scalable Video Coding with Adaptive Quantization for Reduced Drift Error

Authors:

Roya Choupani, Stephan Wong and Mehmet Tolun

Abstract: In video coding, dependencies between frames are exploited to achieve compression by coding only the differences. This dependency can lead to decoding inaccuracies when there is a communication error, or a deliberate quality reduction due to reduced network or receiver capabilities. The dependency can start at the reference frame and progress through a chain of dependent frames within a group of pictures (GOP), resulting in the so-called drift error. Scalable video coding schemes should deal with such drift errors while maximizing the delivered video quality. In this paper, we present a multi-layer hierarchical structure for scalable video coding capable of reducing the drift error. Moreover, we propose an optimization to adaptively determine the quantization step size for the base and enhancement layers. In addition, we address the trade-off between the drift error and the coding efficiency. The improvements in terms of average PSNR values when one frame in a GOP is lost are 3.70 dB when only the base layer is delivered, and 4.78 dB when both the base and the enhancement layers are delivered. The improvements in the presence of burst errors are 3.52 dB when only the base layer is delivered, and 4.50 dB when both base and enhancement layers are delivered.

Paper Nr: 203
Title:

A Perceptual Measure of Illumination Estimation Error

Authors:

Nikola Banić and Sven Lončarić

Abstract: The goal of color constancy is to keep colors invariant to illumination. An important group of color constancy methods are the global illumination estimation methods. Numerous such methods have been proposed, and their accuracy is usually described using statistical descriptors of the illumination estimation angular error. In order to demonstrate some of the fallacies and shortcomings of these descriptors, a very simple learning-based global illumination estimation dummy method is designed for which the values of the statistical descriptors of illumination estimation error can be interpreted in contradictory ways. To resolve the paradox, a new performance measure is proposed that focuses on the perceptual difference between different illumination estimation errors. The effect of the ground-truth illumination distribution of the benchmark datasets on method evaluation is also demonstrated.
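The angular error mentioned above as the standard accuracy descriptor can be computed as follows; the paper's new perceptual measure is not specified in the abstract, so only this baseline is shown:

```python
import numpy as np

def angular_error(est, gt):
    """Angular error (in degrees) between an estimated and a ground-truth
    illumination vector, the usual accuracy measure in color constancy.
    Only the direction of the vectors matters, not their magnitude."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    # clip guards against floating-point values slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Statistical descriptors of a method's accuracy (mean, median, trimean) are then computed over the angular errors of all images in a benchmark dataset.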

Paper Nr: 218
Title:

Real-time Visualization of High-Dynamic-Range Infrared Images based on Human Perception Characteristics - Noise Removal, Image Detail Enhancement and Time Consistency

Authors:

Frederic Garcia, Cedric Schockaert and Bruno Mirbach

Abstract: This paper presents an image detail enhancement and noise removal method that accounts for the limitations of human perception to effectively visualize high-dynamic-range (HDR) infrared (IR) images. In order to represent real-world scenes, IR images typically use an HDR that exceeds the working range of common display devices (8 bits). Therefore, an effective HDR compression that does not lose the perceptibility of details is needed. We herein propose a practical approach to effectively map raw IR images to an 8-bit data representation. To do so, we propose an image processing pipeline based on two main steps. First, the raw IR image is split into base and detail components using the guided filter (GF). The base image corresponds to the resulting edge-preserving smoothed image. The detail image results from the difference between the raw and base images, and is further masked using the linear coefficients of the GF, an indicator of spatial detail. Then, we filter the working range of the HDR over time to avoid global brightness fluctuations in the final 8-bit data representation, which results from combining the detail and base components using a local adaptive gamma correction (LAGC). The latter has been designed according to human vision characteristics. The experimental evaluation shows that the proposed approach significantly enhances image details in addition to improving the contrast of the entire image. Finally, the high performance of the proposed approach makes it suitable for real-world applications.
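The base/detail split described above can be sketched with a self-guided filter (He et al.'s guided filter with the image as its own guide). The window radius `r` and regularizer `eps` below are illustrative assumptions, not the paper's settings, and the masking and LAGC steps are omitted:

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, with edge padding."""
    pad = np.pad(img, r, mode='edge')
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = pad[y:y + 2 * r + 1, x:x + 2 * r + 1].mean()
    return out

def guided_filter_decompose(I, r=2, eps=1e-2):
    """Split I into an edge-preserving base layer and a detail residual.

    The linear coefficient `a` approaches 1 in high-variance (edge/detail)
    windows and 0 in flat ones, which is why it can also serve as a
    spatial-detail indicator, as the abstract notes.
    """
    mean_I = box_mean(I, r)
    mean_II = box_mean(I * I, r)
    var_I = mean_II - mean_I ** 2
    a = var_I / (var_I + eps)
    b = mean_I - a * mean_I
    base = box_mean(a, r) * I + box_mean(b, r)   # edge-preserving smoothing
    detail = I - base                            # residual detail layer
    return base, detail
```

The detail layer can then be amplified and recombined with a tone-mapped base to enhance fine structure without boosting large-scale contrast.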

Paper Nr: 260
Title:

Patch-based Statistical Performance Analysis of Upsampling for Precise Super–Resolution

Authors:

Djamila Aouada, Kassem Al Ismaeil and Björn Ottersten

Abstract: All existing methods for the statistical analysis of super-resolution approaches have stopped at the variance term, not accounting for the bias in the mean square error. In this paper we give an original derivation of the bias term. We propose to use a patch-based method inspired by the work of (Chatterjee and Milanfar, 2009). Our approach, however, is completely new, as we derive a new affine bias model dedicated to the multi-frame super-resolution framework. We apply the proposed statistical performance analysis to the Upsampling for Precise Super-Resolution (UP-SR) algorithm. This algorithm was shown experimentally to be a good solution for enhancing the resolution of depth sequences in both cases of global and local motions. Its performance is herein analyzed theoretically in terms of its approximated mean square error, using the proposed derivation of the bias. This analysis is validated experimentally on simulated static and dynamic depth sequences with a known ground truth. This provides an insightful understanding of the effects of the noise variance, the number of observed low-resolution frames, and the super-resolution factor on the final and intermediate performance of UP-SR. Our conclusion is that increasing the number of frames should improve the performance, while the error is increased by local motions and by the upsampling step that is part of UP-SR.

Paper Nr: 278
Title:

Range and Vision Sensors Fusion for Outdoor 3D Reconstruction

Authors:

Ghina El Natour, Omar Ait Aider, Raphael Rouveure, François Berry and Patrice Faure

Abstract: Awareness of the surrounding environment is an essential task for several applications such as mapping, autonomous navigation and localization. In this paper we are interested in exploiting the complementarity of a panoramic microwave radar and a monocular camera for the 3D reconstruction of large-scale environments. The robustness to environmental conditions and the depth detection ability of the radar on the one hand, and the high spatial resolution of a vision sensor on the other hand, make these two sensors well adapted for large-scale outdoor cartography. Firstly, the system model of the two sensors is presented and a new 3D reconstruction method based on the sensors' geometry is introduced. Secondly, we address the global calibration problem, which consists in finding the exact transformation between the radar and camera coordinate systems. The method is based on the optimization of a non-linear criterion obtained from a set of radar-to-image target correspondences. Both methods have been validated with synthetic and real data.

Paper Nr: 299
Title:

Image-based Location Recognition and Scenario Modelling

Authors:

Carlos Orrite, Juan Soler, Mario Rodriguez, Elías Herrero and Roberto Casas

Abstract: This work presents a significant improvement of the state of the art in intelligent environments developed to support the independent living of users with special needs. By automatically registering all the pictures taken by a wearable camera and using them to reconstruct the living scenario, our proposal allows tracking a subject in the living scenario, recognising the location of new images, and contextually organizing them over time, making feasible the subsequent context-dependent recall. This application ranges from entertainment uses (in the same way we like to look at old pictures) to more serious applications related to cognitive rehabilitation through recall.

Paper Nr: 304
Title:

Sensor Pattern Noise Matching Based on Reliability Map for Source Camera Identification

Authors:

Riccardo Satta

Abstract: Source camera identification using the residual noise pattern left by the sensor, or Sensor Pattern Noise (SPN), has received much attention from the digital image forensics community in recent years. One notable issue in this regard is that high-frequency components of an image (textures, edges) can easily be mistaken for part of the SPN itself, due to the procedure used to extract the SPN, which is based on adaptive low-pass filtering. In this paper, a method to cope with this problem is presented, which estimates an SPN reliability map associating a degree of reliability with each pixel, based on the amount of high-frequency content in its neighbourhood. The reliability map is then used to weight SPN pixels during matching. The technique is tested using a data set of images coming from 27 different cameras; the results show a notable improvement with respect to standard, non-weighted matching.
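Reliability-weighted matching of the kind described above can be sketched as a weighted normalized cross-correlation. This is a generic formulation, not the paper's exact scheme; how the per-pixel reliabilities are derived from high-frequency content is left unspecified here:

```python
import numpy as np

def weighted_ncc(spn_a, spn_b, reliability):
    """Weighted normalized cross-correlation between two noise residuals.

    Pixels whose neighbourhood contains strong high-frequency image content
    get a low reliability and therefore contribute little to the match score.
    Returns a value in [-1, 1]; identical residuals score 1.
    """
    w = reliability / reliability.sum()      # normalize weights to sum to 1
    ma = (w * spn_a).sum()                   # weighted means
    mb = (w * spn_b).sum()
    da, db = spn_a - ma, spn_b - mb
    num = (w * da * db).sum()
    den = np.sqrt((w * da ** 2).sum() * (w * db ** 2).sum())
    return num / den
```

With `reliability` set to all ones, this reduces to the standard non-weighted correlation used as the baseline in the comparison above.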

Paper Nr: 317
Title:

Adaptive Reference Image Selection for Temporal Object Removal from Frontal In-vehicle Camera Image Sequences

Authors:

Toru Kotsuka, Daisuke Deguchi, Ichiro Ide and Hiroshi Murase

Abstract: In recent years, image inpainting has been widely used to remove undesired objects from an image. In particular, the removal of temporal objects, such as pedestrians and vehicles, from street-view databases such as Google Street View has many applications in Intelligent Transportation Systems (ITS). To remove temporal objects, Uchiyama et al. proposed a method that combines multiple image sequences captured along the same route. However, when the spatial alignment inside an image group does not work well, the quality of the output image of this method is often affected. For example, large temporal objects existing in only one image create regions that do not correspond to other images in the group, and the image created from the aligned images becomes distorted. One solution to this problem is to adaptively select a reference image containing only small temporal objects for spatial alignment. Therefore, this paper proposes a method to remove temporal objects by the integration of multiple image sequences with an adaptive reference image selection mechanism.

Paper Nr: 319
Title:

Real-time Material Transformation using Single Frame Surface Orientation Imager

Authors:

Toru Kurihara and Shigeru Ando

Abstract: In this paper, we propose a real-time reflectance transformation system using a correlation image sensor and four LEDs. The reflectance transformation system changes the appearance of an object into that of a different material and displays it on a monitor. We have developed a real-time surface orientation imager to add a specular component, according to the captured normal vector map, for reflectance transformation. The surface orientation of the object is encoded into the amplitude and phase of the reflected light intensity by using phase-shifted blinking LEDs. The correlation image sensor, developed by us, demodulates this amplitude and phase in each pixel during the exposure time. Therefore, the surface orientation is captured in a single frame, so the method can be applied to moving objects. We developed a reflectance transformation system using the surface orientation captured by our real-time surface orientation imager, and demonstrated that it provides relighting and changes reflectance properties in real time.

Paper Nr: 328
Title:

Testing the Validity of Lambert's Law for Micro-scale Photometric Stereo Applied to Paper Substrates

Authors:

Faisal Azhar, Khemraj Emrith, Stephen Pollard, Melvyn Smith, Guy Adams and Steve Simske

Abstract: This paper presents an empirical study investigating the use of photometric stereo (PS) for micro-scale 3D measurement of paper samples. PS estimates per-pixel surface orientation from images of a surface captured from the same viewpoint but under different illumination directions. Specifically, we investigate the surface properties of paper to test whether they are sufficiently well approximated by a Lambertian reflectance model to allow veridical surface reconstruction under PS, and explore the range of conditions for which this model is valid. We present an empirical setup that is used to conduct a series of experiments in order to analyse the applicability of PS at the micro-scale. In addition, we determine the best 4, 6, and 8 light source tilt (illumination) angles with respect to multi-source micro-scale PS. Furthermore, an intensity-based image registration method is used to test the accuracy of the recovery of surface normals. The results demonstrate that at the micro-scale: (a) the Lambertian model represents the data sets well, with a low root mean square (RMS) error between the original and reconstructed images, (b) increasing the number of light sources from 4 to 8 reduces the RMS error, and (c) PS can be used to extract veridical surface normals.

Paper Nr: 335
Title:

Compressive Video Sensing with Adaptive Measurement Allocation for Improving MPEGx Performance

Authors:

George Tzagkarakis, Panagiotis Tsakalides and Jean-Luc Starck

Abstract: Remote imaging systems, such as unmanned aerial vehicles (UAVs) and terrestrial visual sensor networks, have been increasingly used in surveillance and reconnaissance at both the civilian and battlegroup levels. Nevertheless, most existing solutions do not adequately accommodate efficient operation, since limited power, processing and bandwidth resources are a major barrier for unattended visual sensors and for light UAVs, one not well addressed by the MPEGx compression standards. To cope with the growing compression ratios required by all remote imaging applications to minimize payloads, existing MPEGx compression profiles may result in poor image quality. In this paper, the inherent property of compressive sensing of acting simultaneously as a sensing and compression framework is exploited to build a compressive video sensing (CVS) system by modifying the standard MPEGx structure so as to cope with the limitations of a resource-restricted visual sensing system. Furthermore, an adaptive measurement allocation mechanism is introduced and combined with the CVS approach, achieving an improved performance when compared with the basic MPEG-2 standard.

Paper Nr: 342
Title:

Unsupervised Rib Delineation in Chest Radiographs by an Integrative Approach

Authors:

B. Buket Ogul, Emre Sümer and Hasan Ogul

Abstract: We address the problem of segmenting ribs in a chest radiography image as an intermediate step towards eliminating rib shadows for an effective Computer-Aided Diagnosis (CAD) system. To this end, we introduce a complete framework that takes an unprocessed x-ray image and reports the entire rib regions. The system offers a novel strategy that fits a parabola curve to the rib seeds obtained through a log-Gabor filtering approach and extends the center curve by a problem-specific region growing technique to delineate the entire rib, which does not necessarily follow a general parabolic model of the rib cage. Visual examination of the predicted rib delineations on a common dataset demonstrates that the system can achieve a performance good enough to be used in practice.

Paper Nr: 344
Title:

SymPaD: Symbolic Patch Descriptor

Authors:

Sinem Aslan, Ceyhun Burak Akgül, Bülent Sankur and Turhan Tunalı

Abstract: We propose a new local image descriptor named SymPaD for image understanding. SymPaD is a probability vector associated with a given image pixel that represents the attachment of the pixel to a previously designed shape repertoire. As such, the approach is model-driven. The SymPaD descriptor is illumination- and rotation-invariant, and extremely flexible in extending the repertoire with any parametrically generated geometrical shapes and any desired additional transformation types.

Posters
Paper Nr: 47
Title:

Verification Approach of Statechart Models based on Property Statechart

Authors:

Lina Chen, Yu Zhang and Jianmin Zhao

Abstract: In this paper, we introduce a new approach to verify Statechart models. The core of this approach is the property Statechart, a strengthened property specification language. Before verification, we reconstruct the properties into one big property tree by using the before-and-after and concurrency relationships among them. In the process of verification, the state space of the Statechart model is unfolded step by step. To verify a subset of the properties, we only need to search a part of the state space instead of the whole, and check that the states included in this part of the state space satisfy the propositions specified in the corresponding nodes of the property tree. The verification can thus be more efficient. We then discuss how to verify reactive systems using these ideas. Finally, a case study illustrates the presented approach.

Paper Nr: 63
Title:

Fractal Image Compression using Hierarchical Classification of Sub-images

Authors:

Nilavra Bhattacharya, Swalpa Kumar Roy, Utpal Nandi and Soumitro Banerjee

Abstract: In fractal image compression (FIC), an image is divided into sub-images (domains and ranges), and a range is compared with all possible domains for similarity matching. However, this process is extremely time-consuming. In this paper, a novel sub-image classification scheme is proposed to speed up the compression process. The proposed scheme partitions the domain pool hierarchically, and a range is compared only to those domains which belong to the same hierarchical group as the range. Experiments on standard images show that the proposed scheme drastically reduces the compression time compared to baseline fractal image compression (BFIC), and is comparable to other sub-image classification schemes proposed to date. The proposed scheme can compress Lenna (512x512x8) in 1.371 seconds, with 30.6 dB PSNR decoding quality (140x faster than BFIC), without compromising the compression ratio or decoded image quality.

Paper Nr: 68
Title:

Enhancement of Degraded Images by Natural Phenomena

Authors:

Daily Daleno de O. Rodrigues, Anderson G. Fontoura, José R. Hughes Carvalho, José P. de Queiroz Neto and Renato P. Vieira

Abstract: The efficiency of environmental monitoring through imagery data is strongly dependent on the quality of the acquired information, regardless of weather conditions or other uncontrolled degradation factors. This article describes a series of combined image enhancement techniques to partially recover information “lost” due to unfavorable operational conditions or natural phenomena, such as fog, rainstorms, underwater dust (green dust), poor illumination, etc. We base our approach on a process known as homomorphic filtering, which is intrinsically related to the transformation from the spatial to the frequency domain, directly involving Fourier transforms, followed by specific enhancement techniques such as clipping and stretching. Although the use of these techniques separately, without proper adaptation and coupling, can damage the image even further, we developed an efficient sequence of enhancement filtering able to recover most of the affected information. Moreover, the proposed methodology proved to be generally applicable to a large class of images in poor conditions, with a performance comparable to the methodologies used as benchmarks.

Paper Nr: 131
Title:

Simultaneous Frame-rate Up-conversion of Image and Optical Flow Sequences

Authors:

Shun Inagaki, Hayato Itoh and Atsushi Imiya

Abstract: We develop a variational method for the frame-rate up-conversion of optical flow fields, in which we combine motion coherency in an image sequence and the smoothness of the temporal flow field. Since optical flow vectors define the motion of each point in an image, we can construct interframe images from low frame-rate image sequences using flow field vectors. The algorithm produces both interframe images and optical flow fields from a set of successive images in a sequence.

Paper Nr: 133
Title:

Monitoring Accropodes Breakwaters using RGB-D Cameras

Authors:

D. Moltisanti, G. M. Farinella, R. E. Musumeci, E. Foti and S. Battiato

Abstract: Breakwaters are marine structures useful for the safe harbouring of ships, the protection of harbours from sedimentation, and the protection of coasts from erosion. Breakwater monitoring is thus a critical part of coastal engineering research, since it is crucial to know the state of a breakwater at any time in order to evaluate its health in terms of stability and to plan restoration works. In this paper we present a novel breakwater monitoring approach based on the analysis of 3D point clouds acquired with RGB-D cameras. The proposed method is able to estimate the roto-translation movements of the Accropodes composing the armour layer of a breakwater, both below and above the still water level, without any need for human interaction. We tested the proposed monitoring method in several laboratory experiments, which consisted of waves hitting a scale-model barrier in a laboratory tank, aiming to assess the robustness of a particular configuration of the breakwater (that is, the arrangement of the Accropodes composing the structure). During the tests, several 3D depth maps of the barrier were taken with an RGB-D camera. These point clouds were then processed to compute the roto-translation movements, in order to monitor breakwater conditions and estimate damage over time.

Paper Nr: 162
Title:

Adaptive Segmentation by Combinatorial Optimization

Authors:

Lakhdar Grouche and Aissa Belmeguenai

Abstract: In this paper we present an iterative segmentation approach. At the beginning, a stochastic method called Kangaroo is used in order to speed up the construction of the regions. The problem is then formulated as an undirected graph and recast as an integer linear program. Next, we use combinatorial optimization to solve the system in integers. Finally, the impact of this solution becomes apparent in the segmentation, in which the edges are marked in a special manner; the results are very encouraging.

Paper Nr: 169
Title:

The Evolution of Effective Image Denoising Filter using CodeMonkey-GA

Authors:

Reza Etemadi and Nawwaf Kharma

Abstract: CodeMonkey (CM) is a platform that enables experts and novices to quickly generate and execute Evolutionary Algorithm programs. The flexibility and ease of use of CM make it a useful tool for evolving image filters that are adapted to the problem set, rather than generic filters that are indifferent to the underlying patterns. In this paper, we use CM to evolve salt-and-pepper denoising filters, which we then compare against well-known commercial filters to exhibit their superiority.

Paper Nr: 188
Title:

A Novel Stereo-radiation Detection Device Calibration Method using Planar Homography

Authors:

Pathum Rathnayaka, Seung-Hae Baek and Soon-Yong Park

Abstract: A radiation detection device, also known as a particle detector, is a device used to detect, track and identify the presence of radiation sources within a given area or environment. In general, a stereo-radiation detection device consists of two radiation detection devices and is used to accurately estimate 3D distances to radiation sources. As in computer vision, device calibration is essential: to obtain accurate results with such devices, they have to be calibrated first. Many stereo camera calibration methods have been introduced throughout the last few decades, but a proper stereo-radiation device calibration method has not yet been introduced. In this work, we propose a new stereo-radiation detector calibration method using planar homography. The calibrated devices are used to estimate 3D distances to radiation sources, and we obtain very accurate results with an error of less than 6%.

Paper Nr: 196
Title:

Particle Filter using Motion Direction for Vehicle Tracking

Authors:

M. Eren Yildirim, O. Faruk Ince, Jong Kwan Song, Jang Sik Park and Byung Woo Yoon

Abstract: In this paper, a new approach for the particle filter (PF) is presented. Color histogram information is used for defining the state model of the target. Since we deal with the vehicle tracking problem, the direction of motion of the vehicle is important. The algorithm has several steps. The first is obtaining the motion direction of the target object. The second is calculating the angle difference between this direction and that of each candidate sample. In the last step, the probability of each sample is weighted according to its angular distance to the motion direction of the target. Thus, samples moving in a direction similar to the target's receive larger weights, which increases the probability of correctly estimating the state parameters of the target. With this algorithm, the PF becomes more stable and robust against noise and occlusions. We show that the proposed PF increases tracking performance and robustness against noise while decreasing the computational burden.
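The angular weighting step described in the abstract can be sketched as follows; the Gaussian kernel and its width are assumptions, since the abstract does not specify the exact weighting function:

```python
import math

def angular_weight(target_dir, sample_dir, sigma=0.5):
    """Weight a particle by the angular distance between its motion
    direction and the target's motion direction (both in radians).
    A Gaussian kernel over the wrapped angle difference is assumed."""
    diff = abs(target_dir - sample_dir) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)   # wrap to [0, pi]
    return math.exp(-(diff ** 2) / (2 * sigma ** 2))

# Samples moving with the target get larger weights than samples
# moving against it:
w_aligned = angular_weight(0.0, 0.1)
w_opposite = angular_weight(0.0, math.pi)
```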

Paper Nr: 202
Title:

Color Dog - Guiding the Global Illumination Estimation to Better Accuracy

Authors:

Nikola Banic and Sven Loncaric

Abstract: An important part of image enhancement is color constancy, which aims to make image colors invariant to illumination. In this paper the Color Dog (CD), a new learning-based global color constancy method, is proposed. Instead of providing its own estimation, it corrects other methods’ illumination estimations by reducing their scattering in the chromaticity space using a previously learned partition. The proposed method outperforms all other methods on most high-quality benchmark datasets. The results are presented and discussed.
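The correction idea, snapping scattered illumination estimations onto a learned partition of the chromaticity space, can be sketched as a nearest-center lookup; the centers below are illustrative placeholders, not the ones the method actually learns:

```python
def correct_estimation(estimation, centers):
    """Map an illumination estimation (r, g chromaticity) to the
    nearest of the previously learned partition centers
    (squared Euclidean distance in the chromaticity plane)."""
    return min(centers,
               key=lambda c: (c[0] - estimation[0]) ** 2
                           + (c[1] - estimation[1]) ** 2)

learned_centers = [(0.33, 0.33), (0.40, 0.35)]  # hypothetical centers
corrected = correct_estimation((0.41, 0.36), learned_centers)
```

Every scattered estimate that falls inside one cell of the partition maps to the same center, which is exactly the scatter reduction the abstract describes.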

Paper Nr: 226
Title:

Unmixing of Hyperspectral Images with Pure Prior Spectral Pixels

Authors:

Abir Zidi, Julien Marot, Klaus Spinnler and Salah Bourennane

Abstract: In the literature, there are several methods for multilinear source separation, the most popular being nonnegative matrix factorization (NMF) and canonical polyadic decomposition (PARAFAC). In this paper, we solve the hyperspectral unmixing problem with an NMF algorithm. We rely on physical properties to relate the output endmember spectra to the physical properties of the input data. To achieve this, we add a regularization term which enforces the closeness of the output endmembers to automatically selected reference spectra. We then account for these reference spectra and their locations in the initialization matrices. To illustrate our method, we use self-acquired hyperspectral images (HSIs). The first scene is composed of leaves at the macroscopic level; in a controlled environment, we extract the spectra of three pigments. The second scene is acquired from an airplane: we distinguish between vegetation, water, and soil.

Paper Nr: 229
Title:

Random Initial Search Points Prediction for Content Aware Motion Estimation in H.264

Authors:

Vidya N. More, Ajinkya Deshmukh, Dhiraj More and M. S. Sutaone

Abstract: Motion estimation algorithms used in video encoders are based on three important issues: selection of good initial search points, choice of an appropriate search pattern, and effective early termination criteria at different stages of the algorithm. Motion vector prediction is also treated as initial search point prediction, in which the location of a good matching block is predicted. Prediction is based on prior data from co-located and/or adjacent macroblocks of the reference frame or the current video frame, respectively. Different search patterns contribute to achieving near-accurate motion estimation; different types of motion in real-time videos can be tracked using different types of patterns. Early termination criteria at different stages of the algorithm avoid searching further candidate locations pre-decided by the search pattern, which in turn reduces computations and motion estimation time. The proposed algorithm combines two concepts: content awareness and initial point prediction. The content of the video data is expressed in terms of homogeneity coefficients, and initial search point prediction is used to avoid the search becoming trapped in local minima. The algorithm is implemented on the JM18.4 Reference Software of H.264/AVC, revised on 5th May 2011. The results show that the total encoding time and motion estimation time are lower than those of other algorithms for videos of different resolutions.

Paper Nr: 254
Title:

An Efficient Image Registration Method based on Modified NonLocal-Means - Application to Color Business Document Images

Authors:

Louisa Kessi, Frank Lebourgeois and Christophe Garcia

Abstract: Most business documents, in particular invoices, are composed of an existing color template and filled-in text added by the users. Direct layout analysis without separating the preprinted form from the added text is difficult and inefficient. Previous works use both local features and global layout knowledge to separate the pre-printed forms from the added text, but for real applications there is still considerable room for improvement. This paper presents the first pixel-based image registration of color business documents based on the NonLocal-Means (NLM) method. We show that NLM, commonly used for image denoising, can also be adapted to image registration at the pixel level. The idea is to look for a similar neighbourhood of each pixel of the first image I1 in the second image I2, providing both exact image registration with pixel-level precision and noise removal. We show the feasibility of this approach on several color images of various invoices and forms in real situations and its application to layout analysis. Applied to color documents, the proposed algorithm shows the benefits of the NLM in this context.
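The core matching idea, searching for the most similar neighbourhood of a pixel of I1 inside I2, can be sketched as an exhaustive patch search minimizing a sum of squared differences; NLM itself uses exponentially weighted patch similarities, which this simplification omits:

```python
def best_match(I1, I2, y, x, r=1):
    """Find the position in I2 whose (2r+1)x(2r+1) neighbourhood is
    most similar (minimum SSD) to the neighbourhood of (y, x) in I1.
    Exhaustive search over all valid centers of I2."""
    h, w = len(I2), len(I2[0])
    def patch(img, cy, cx):
        return [img[cy + dy][cx + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    ref = patch(I1, y, x)
    best, best_ssd = None, float('inf')
    for cy in range(r, h - r):
        for cx in range(r, w - r):
            cand = patch(I2, cy, cx)
            ssd = sum((a - b) ** 2 for a, b in zip(ref, cand))
            if ssd < best_ssd:
                best, best_ssd = (cy, cx), ssd
    return best

# I2 is I1 shifted by one pixel in each direction; the match
# recovers that displacement:
I1 = [[0,0,0,0,0],[0,9,8,0,0],[0,7,6,0,0],[0,0,0,0,0],[0,0,0,0,0]]
I2 = [[0,0,0,0,0],[0,0,0,0,0],[0,0,9,8,0],[0,0,7,6,0],[0,0,0,0,0]]
```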

Paper Nr: 258
Title:

AColDPS - Robust and Unsupervised Automatic Color Document Processing System

Authors:

Louisa Kessi, Frank Lebourgeois, Christophe Garcia and Jean Duong

Abstract: This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based approach mainly uses color morphology and does not require any training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted to invoices and forms with significant color complexity and dithered backgrounds. The system performs several operations to automatically segment color images, separate text from noise and graphics, and provide information about text color. The contribution of our work is three-fold. Firstly, we use color morphology to simultaneously segment both text and inverted text. Our system processes inverted and non-inverted text automatically using conditional color dilation and erosion, even in cases where the two overlap. Secondly, we extract geodesic measures using morphological convolution in order to separate text, noise and graphical elements. Thirdly, we develop a method to disconnect characters touching or overlapping graphical elements. Our system can separate characters that touch straight lines, split overlapping characters of different colors, and separate characters from graphics if they have different colors. A color analysis stage automatically determines the number of character colors. The proposed system is generic enough to process a wide range of digitized business documents from different origins. It outperforms the classical approach based on binarization of greyscale images.

Paper Nr: 264
Title:

Low Complexity Multi-object Tracking System Dealing with Occlusions

Authors:

Aziz Dziri, Marc Duranton and Roland Chapuis

Abstract: In this paper, we propose a vision tracking system primarily targeted at systems with low computing resources. It is based on the GMPHD filter and can deal with occlusion between objects. The proposed algorithm is designed to work in a node of a camera network where the cost of the computer processing the information is critical. To achieve low computing complexity, a basic background subtraction algorithm combined with connected component analysis is used to detect the objects of interest. The GMPHD filter is improved to detect occlusions between objects and to maintain their identities once the occlusion ends. Occlusion is detected using a low-complexity distance criterion that takes the objects’ bounding boxes into consideration. When an occlusion is detected, the features of the overlapped objects are saved; at the end of the overlap, the saved features are compared to the current features of the objects to perform re-identification. In our experiments, two different features are tested: color histogram features and motion features. The experiments are performed on two datasets, PETS2009 and CAVIAR. The obtained results show that our approach substantially improves on the GMPHD filter while keeping a low computing complexity.

Paper Nr: 283
Title:

Knowledge Bases for Visual Dynamic Scene Understanding

Authors:

Ernst D. Dickmanns

Abstract: In conventional computer vision the actual 3-D state of objects is of primary interest; it is embedded in a temporal sequence analyzed in consecutive pairs. In contrast, in the 4-D approach to machine vision the primary interest is in temporal processes with objects and subjects (defined as objects with the capability of sensing and acting). All perception of 4-D processes is achieved through feedback of prediction errors according to spatiotemporal dynamical models constraining evolution over time. Early jumps to object/subject hypotheses, including capabilities of acting, embed the challenge of dynamic scene understanding in a richer environment, especially when competing alternatives are pursued in parallel from the beginning. Typical action sequences (maneuvers) form an essential part of the knowledge base of subjects. Expectation-based Multi-focal Saccadic (EMS-) vision was developed in the late 1990s to demonstrate the advantages and flexibility of this approach. Based on this experience, the paper advocates knowledge elements integrating action processes of subjects as general elements for the perception and control of temporal changes, dubbed ‘maneuvers’ here. As recently discussed in philosophy, emphasizing individual subjects and temporal processes may avoid the separation into a material and a mental world; EMS-vision quite naturally leads to such a monistic view.

Paper Nr: 306
Title:

Geometry-based Superpixel Segmentation - Introduction of Planar Hypothesis for Superpixel Construction

Authors:

M.-A. Bauda, S. Chambon, P. Gurdjos and V. Charvillat

Abstract: Superpixel segmentation is widely used as a preprocessing step in many applications. Most existing methods are based on a photometric criterion combined with the positions of the pixels. In the same spirit as the Simple Linear Iterative Clustering (SLIC) method, based on k-means segmentation, a new algorithm is introduced. The main contribution lies in the definition of a new distance for the construction of the superpixels. This distance takes into account both the surface normals and a similarity measure between pixels located on the same planar surface. We show that our approach improves on over-segmentation methods such as SLIC, i.e. the proposed method is able to properly segment planar surfaces.

Paper Nr: 345
Title:

Acquisition of Aerial Light Fields

Authors:

Indrajit Kurmi and K. S. Venkatesh

Abstract: Since its inception in the computer graphics community, the light field has drawn a lot of attention and interest. By densely sampling the plenoptic function, light fields present an alternative way to represent and faithfully reconstruct 3D scenes. However, acquisition of densely sampled light fields requires camera arrays, robotic arms or newly developed plenoptic cameras, and the light fields captured using existing technologies are limited to scenes of limited complexity. In this paper we propose to use an unmanned aerial vehicle for acquisition of larger unstructured aerial light fields. We aim to capture light fields of larger objects and scenes that are not possible with traditional light field acquisition setups. We combine the data from an IMU and the state estimated using homography within a Kalman filter framework. Frames which give a minimum error (approximating the free-form camera surface to the traditional parameterization) are selected as perspective images of the light field. A rendering algorithm is devised to support the unstructured camera surface and to avoid rebinning of image data.

Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 24
Title:

Adaptive Segmentation based on a Learned Quality Metric

Authors:

Iuri Frosio and Ed R. Ratner

Abstract: We introduce here a model for the evaluation of the segmentation quality of a color image. The model parameters were learned from a set of examples. To this aim, we first segmented a set of images using a traditional graph-cut algorithm, for different values of the scale parameter. A human observer classified these images into three classes: under-, well- and over-segmented. This classification was employed to learn the parameters of the segmentation quality model. This was used to automatically optimize the scale parameter of the graph-cut segmentation algorithm, even at a local scale. Experimental results show an improved segmentation quality for the adaptive algorithm based on our segmentation quality model, which can be easily applied to a wide class of segmentation algorithms.

Paper Nr: 83
Title:

Spectral Fiber Feature Space Evaluation for Crime Scene Forensics - Traditional Feature Classification vs. BioHash Optimization

Authors:

Christian Arndt, Jana Dittmann and Claus Vielhauer

Abstract: Despite ongoing improvements in the field of digitized crime scene forensics, a lot of analysis work is still done manually by trained experts. In this paper, we derive and define a 2048-dimensional fiber feature space from a spectral scan with a wavelength range of 163 - 844 nm sampled with an FRT thin film reflectometer (FTR). Furthermore, we evaluate seven commonly used classifiers (Naive Bayes, SMO, IBk, Bagging, Rotation Forest, JRip, J48) in combination with a proven concept from the biometric field of user authentication, the Biometric Hash algorithm (BioHash). We perform our evaluation for two well-known forensic examination goals: identification - determining the broad fiber group (e.g. acrylic) - and individualization - finding the concrete textile originator. Our experimental test set considers 50 different fibers, each sampled at four scan resolutions of 100, 50, 20 and 10 μm; overall, 800 digital samples are measured. For both examination goals we show that all classifiers except Naive Bayes exhibit a positive classification tendency (80 - 99%), whereby the BioHash optimization performs best for individualization tasks.

Paper Nr: 171
Title:

Simultaneous Estimation of Spectral Reflectance and Normal from a Small Number of Images

Authors:

Masahiro Kitahara, Takahiro Okabe, Christian Fuchs and Hendrik P. A. Lensch

Abstract: Spectral reflectance is an inherent characteristic of an object surface and is therefore useful not only for computer vision tasks such as material classification but also for computer graphics applications such as relighting. In this study, by integrating multispectral imaging and photometric stereo, we propose a method for simultaneously estimating the spectral reflectance and normal per pixel from a small number of images taken under multispectral and multidirectional light sources. In addition, taking into consideration the attached shadows observed on curved surfaces, we derive the minimum number of images required for the simultaneous estimation and propose a method for selecting the optimal set of light sources. Through a number of experiments using real images, we show that our proposed method can estimate spectral reflectances without the per-pixel scale ambiguity due to unknown normals, and that, when the optimal set of light sources is used, our method performs as well as the straightforward method using a large number of images. Moreover, we demonstrate that estimating both the spectral reflectances and normals is useful for relighting under novel illumination conditions.

Paper Nr: 175
Title:

Bag-of-Features based Activity Classification using Body-joints Data

Authors:

Parul Shukla, K.K. Biswas and Prem K. Kalra

Abstract: In this paper, we propose a Bag-of-Joint-Features model for the classification of human actions from body-joint data acquired using depth sensors such as the Microsoft Kinect. Our method uses novel scale- and translation-invariant features in a spherical coordinate system extracted from the joints. These features also capture the subtle movements of joints relative to the depth axis. The proposed Bag-of-Joint-Features model uses the well-known bag-of-words model in the context of joints for the representation of an action sample. We also propose to augment the Bag-of-Joint-Features model with a hierarchical temporal histogram model to take into account the temporal information of the body-joint sequence. Experiments show that this augmentation improves the classification accuracy. We test our approach on the MSR-Action3D and Cornell activity datasets using a support vector machine.
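A plausible sketch of such scale- and translation-invariant spherical features: express each joint relative to a reference joint, normalize by a body-scale distance, and convert to spherical coordinates. The choice of reference joints is an assumption for illustration, not taken from the paper:

```python
import math

def spherical_feature(joint, ref, scale_joint):
    """Translation invariance: express the joint relative to a
    reference joint (e.g. the torso). Scale invariance: normalize by
    the distance from the reference to a second joint. Then convert
    to spherical coordinates (r, theta, phi)."""
    scale = math.dist(ref, scale_joint)
    x, y, z = ((j - r) / scale for j, r in zip(joint, ref))
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0   # polar angle
    phi = math.atan2(y, x)                       # azimuth
    return r, theta, phi
```

Because the skeleton is re-centered and re-scaled before the conversion, the feature is unchanged when the whole skeleton is translated or uniformly scaled.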

Paper Nr: 185
Title:

Automatic Perceptual Color Quantization of Dermoscopic Images

Authors:

Vittoria Bruni, Giuliana Ramella and Domenico Vitulano

Abstract: The paper presents a novel method for color quantization (CQ) of dermoscopic images. The proposed method consists of an iterative procedure that selects image regions in a hierarchical way, according to the visual importance of their colors. Each region provides a color for the palette which is used for quantization. The method is automatic, image dependent and computationally not demanding. Preliminary results show that the mean square error of quantized dermoscopic images is competitive with existing CQ approaches.
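The final quantization step, once a palette has been selected, maps each pixel to its closest palette color; a minimal sketch with a fixed toy palette (the paper's hierarchical, visual-importance-based palette selection is not reproduced here):

```python
def quantize(pixels, palette):
    """Map each RGB pixel to the nearest palette color
    (squared Euclidean distance in RGB space)."""
    def nearest(p):
        return min(palette,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c)))
    return [nearest(p) for p in pixels]

palette = [(200, 150, 120), (90, 60, 50)]   # toy skin/lesion colors
pixels = [(205, 148, 118), (85, 65, 55), (100, 58, 47)]
```

The mean square error mentioned in the abstract is then simply the average squared RGB distance between each original pixel and its assigned palette color.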

Short Papers
Paper Nr: 4
Title:

Geometric Edge Description and Classification in Point Cloud Data with Application to 3D Object Recognition

Authors:

Troels Bo Jørgensen, Anders Glent Buch and Dirk Kraft

Abstract: This paper addresses the detection of geometric edges on 3D shapes. We investigate the use of local point cloud features and cast the edge detection problem as a learning problem. We show how supervised learning techniques can be applied to an existing shape description in terms of local feature descriptors. We apply our approach to several well-known shape descriptors. As an additional contribution, we develop a novel shape descriptor, termed Equivalent Circumference Surface Angle Descriptor or ECSAD, which is particularly suitable for capturing local surface properties near edges. Our proposed descriptor allows for both fast computation and fast processing by having a low dimension, while still producing highly reliable edge detections. Lastly, we use our features in a 3D object recognition application using a well-established benchmark. We show that our edge features allow for significant speedups while achieving state of the art results.

Paper Nr: 25
Title:

LDA Combined Depth Similarity and Gradient Features for Human Detection using a Time-of-Flight Sensor

Authors:

Alexandros Gavriilidis, Carsten Stahlschmidt, Jörg Velten and Anton Kummert

Abstract: Visual object detection is an important task for many research areas such as driver assistance systems (DASs), industrial automation and various safety applications involving human interaction. Since pedestrian detection is a growing research area, different kinds of visual methods and sensors have been introduced to address this problem. This paper introduces new relational depth similarity features (RDSFs) for pedestrian detection using a Time-of-Flight (ToF) camera sensor. The new features are based on the mean, variance, skewness and kurtosis of local regions inside the depth image generated by the Time-of-Flight sensor. We present an evaluation comparing these new features, existing relational depth similarity features based on depth histograms of local regions, and the well-known histograms of oriented gradients (HOGs), which deliver very good results for pedestrian detection. To incorporate higher-dimensional feature spaces, we also present an existing AdaBoost algorithm that uses linear discriminant analysis (LDA) for feature space reduction and a new combination of already extracted features in the training procedure.

Paper Nr: 32
Title:

Automatic ROI for Remote Photoplethysmography using PPG and Color Features

Authors:

Elisa Calvo-Gallego and Gerard de Haan

Abstract: Remote photoplethysmography (rPPG) enables contact-less monitoring of the blood volume pulse using a regular camera, thus providing valuable information about the cardiovascular system. However, the quality of the acquired rPPG signal is strongly affected by the region of skin where the analysis is carried out and, therefore, to be confident of obtaining valid information, a pre-selection of the region-of-interest (ROI) for the PPG analysis is necessary. In this paper, we propose a method for the automatic extraction of this ROI combining the local characteristics of the PPG-signal with the color information using fuzzy logic. Results of the quality of the ROI extraction and its application on pulse rate detection are provided.

Paper Nr: 40
Title:

Faster Approximations of Shortest Geodesic Paths on Polyhedra Through Adaptive Priority Queue

Authors:

William Robson Schwartz, Pedro Jussieu Rezende and Helio Pedrini

Abstract: Computing shortest geodesic paths is a crucial problem in several application areas, including robotics, medical imaging, terrain navigation and computational geometry. This type of computation on triangular meshes helps to solve different tasks, such as mesh watermarking, shape classification and mesh parametrization. In this work, a priority queue based on a bucketing structure is applied to speed up graph-based methods that approximate shortest geodesic paths on polyhedra. Initially, the problem is stated, some of its properties are discussed and a review of relevant methods is presented. Finally, we describe the proposed method and show several results and comparisons that confirm its benefits.
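The bucketing idea behind such priority queues: with bounded integer edge weights, vertices can be kept in an array of buckets indexed by tentative distance, making queue operations near O(1). A sketch using the standard Dial's algorithm (the paper's adaptive variant may differ in how buckets are sized):

```python
def dial_shortest_path(adj, source, max_weight):
    """Dijkstra with a bucket queue (Dial's algorithm) for integer
    edge weights in [1, max_weight]. adj[v] is a list of (u, w)."""
    n = len(adj)
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    nbuckets = max_weight * (n - 1) + 1   # upper bound on any distance
    buckets = [[] for _ in range(nbuckets)]
    buckets[0].append(source)
    for d in range(nbuckets):
        for v in buckets[d]:
            if dist[v] != d:              # stale entry, skip
                continue
            for u, w in adj[v]:
                if d + w < dist[u]:
                    dist[u] = d + w
                    buckets[d + w].append(u)
    return dist

# 0 -1- 1 -1- 2, plus a heavier direct edge 0 -3- 2
adj = [[(1, 1), (2, 3)], [(0, 1), (2, 1)], [(0, 3), (1, 1)]]
```

Buckets are scanned in increasing distance order, so each vertex is settled exactly when its bucket index equals its final distance; no comparison-based heap is needed.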

Paper Nr: 66
Title:

An eXtended Center-Symmetric Local Binary Pattern for Background Modeling and Subtraction in Videos

Authors:

Caroline Silva, Thierry Bouwmans and Carl Frélicot

Abstract: In this paper, we propose an eXtended Center-Symmetric Local Binary Pattern (XCS-LBP) descriptor for background modeling and subtraction in videos. By combining the strengths of the original LBP and the similar CS ones, it appears to be robust to illumination changes and noise, and produces short histograms, too. The experiments conducted on both synthetic and real videos (from the Background Models Challenge) of outdoor urban scenes under various conditions show that the proposed XCS-LBP outperforms its direct competitors for the background subtraction task.
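For context, the center-symmetric LBP that XCS-LBP builds on compares diagonally opposite neighbours instead of comparing each neighbour to the center, halving the code length; a sketch of the classical CS-LBP on a 3x3 patch (the XCS-LBP extension itself is not reproduced here):

```python
def cs_lbp(patch, t=0.0):
    """Center-symmetric LBP of a 3x3 patch: compare the 4 pairs of
    diametrically opposite neighbours; each comparison yields one
    bit. Returns a 4-bit code in 0..15."""
    # neighbours in circular order around the center pixel
    n = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
         patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i in range(4):
        if n[i] - n[i + 4] > t:   # neighbour vs its opposite
            code |= 1 << i
    return code

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # uniform region
edge = [[9, 9, 9], [1, 1, 1], [0, 0, 0]]   # horizontal edge
```

With 4 bits instead of the 8 of the original LBP, the resulting histograms are 16 bins instead of 256, which is the "short histogram" property the abstract mentions.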

Paper Nr: 77
Title:

Contextual Saliency for Nonrigid Landmark Registration and Recognition of Natural Patterns

Authors:

Luke Palmer and Tilo Burghardt

Abstract: In this paper we develop a method for injecting within-pattern information into the matching of point patterns through utilising the shape context descriptor in a novel manner. In the domain of visual animal biometrics, landmark distributions on animal coats are commonly used as characteristic features in the pursuit of individual identification and are often derived by imaging surface entities such as bifurcations in scales, fur colouring, or skin ridge minutiae. However, many natural distributions of landmarks are quasiregular, a property with which state-of-the-art registration algorithms have difficulty. The method presented here addresses the issue by guiding matching along the most distinctive points within a set based on a measure we term contextual saliency. Experiments on synthetic data are reported which show the contextual saliency measure to be tolerant of many point-set transformations and predictive of correct correspondence. A general point-matching algorithm is then developed which combines contextual saliency information with naturalistic structural constraints in the form of the thin-plate spline. When incorporated as part of a recognition system, the presented algorithm is shown to outperform two widely used point-matching algorithms on a real-world manta ray data set.

Paper Nr: 104
Title:

Color Object Recognition based on Spatial Relations between Image Layers

Authors:

Michaël Clément, Mickaël Garnier, Camille Kurtz and Laurent Wendling

Abstract: The recognition of complex objects from color images is a challenging task, which is considered a key step in image analysis. Classical methods usually rely on structural or statistical descriptions of the object content, summarizing different image features such as the outer contour, inner structure, or texture and color effects. Recently, a descriptor relying on the spatial relations between the regions structuring the objects has been proposed for gray-level images. It integrates both shape information and relative spatial information about image layers in a single homogeneous representation. In this paper, we introduce an extension of this descriptor for color images. Our first contribution is to couple a segmentation algorithm with a clustering strategy to extract the potentially disconnected color layers from the images. Our second contribution is the proposition of new strategies for the comparison of these descriptors, based on structural layer alignments and shape matching. This extension enables the recognition of structured objects extracted from color images. Results obtained on two datasets of color images suggest that our method is effective at recognizing complex objects whose spatial organization is a discriminative feature.

Paper Nr: 119
Title:

A Self-adaptation Method for Human Skin Segmentation based on Seed Growing

Authors:

Anderson Carlos Sousa e Santos and Helio Pedrini

Abstract: Human skin segmentation has several applications in image and video processing, its main purpose being to distinguish skin from non-skin regions. Despite the large number of methods available in the literature, accurate skin segmentation is still a challenging task. Many methods rely on color information, which does not completely discriminate the image regions due to variations in lighting conditions and ambiguity between skin and background color. Therefore, there is still a need to adapt the segmentation to the particular conditions of the images. In contrast to methods that rely on faces, hands or other body-content detectors, we describe a self-contained method for adaptive skin segmentation that makes use of spatial analysis to produce regions from which the overall skin can be estimated. A comparison with state-of-the-art methods on a well-known challenging data set shows that our method provides a significant improvement in skin segmentation.
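The seed-growing idea can be sketched as follows: start from high-confidence skin pixels and expand into 4-connected neighbours whose skin probability is high enough. The fixed threshold is a simplification; the paper adapts the decision to the particular image:

```python
from collections import deque

def grow_region(prob, seeds, thresh):
    """Grow a region from seed pixels: a 4-connected neighbour joins
    the region if its skin probability exceeds thresh."""
    h, w = len(prob), len(prob[0])
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in region
                    and prob[ny][nx] > thresh):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

prob = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.1],
        [0.1, 0.1, 0.1]]
```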

Paper Nr: 120
Title:

Scalable and Iterative Image Super-resolution using DCT Interpolation and Sparse Representation

Authors:

Saulo R. S. Reis and Graça Bressan

Abstract: In scenarios where acquisition systems have limited resources or the available images are of poor quality, super-resolution (SR) techniques are an excellent alternative for improving image quality. Traditional SR methods proposed in the literature are effective for HR image reconstruction up to a magnification factor of 2, while in recent years example-based SR methods have shown excellent results for magnification factors of 3 or more. In this paper, we propose a scalable and iterative algorithm for single-image SR using a two-step strategy combining DCT interpolation and a sparse-representation learning method. The proposed method introduces improvements in dictionary training and in the reconstruction process: a new dictionary is built using an unsharp masking technique for feature extraction, and the learning time is reduced by using two different small dictionaries. The results were compared with other interpolation-based and SR methods and demonstrate the effectiveness of the proposed algorithm in terms of PSNR, SSIM and visual quality.
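The DCT interpolation step can be sketched in 1-D: take the DCT of the low-resolution signal, zero-pad the spectrum, and invert at the larger size (images apply this separably along rows and columns). The orthonormal DCT-II/III pair is written out directly here:

```python
import math

def dct(x):
    """Orthonormal DCT-II."""
    N = len(x)
    return [math.sqrt((1 if k == 0 else 2) / N)
            * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Orthonormal inverse DCT (DCT-III)."""
    N = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / N) * X[k]
                * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N))
            for n in range(N)]

def dct_upsample(x, factor):
    """Upsample by zero-padding the DCT spectrum; the sqrt(M/N)
    factor preserves amplitude under the orthonormal scaling."""
    X = dct(x)
    N, M = len(x), factor * len(x)
    Xp = [c * math.sqrt(M / N) for c in X] + [0.0] * (M - N)
    return idct(Xp)
```

Zero-padding adds no new frequency content, so the interpolated signal is smooth; the sparse-representation step of the paper is what restores the missing high-frequency detail.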

Paper Nr: 129
Title:

Ensemble Learning Optimization for Diabetic Retinopathy Image Analysis

Authors:

Hanan S. Alghamdi, Lilian Tang and Yaochu Jin

Abstract: Ensemble learning has proved to be an effective solution to learning problems. Its success depends mainly on diversity; however, diversity is rarely evaluated and explicitly used to enhance ensemble performance. Automatic detection of Diabetic Retinopathy (DR) is an important application supporting health care services. In this research, existing statistical diversity measures are utilized to optimize ensembles used to detect DR-related signs. An Ant Colony Optimization (ACO) algorithm is adopted to select the ensemble base models using various criteria. This paper evaluates several optimized and non-optimized ensemble structures used for vessel segmentation. The results demonstrate the necessity of adopting ensemble learning and the advantage of ensemble optimization in supporting the detection of DR-related signs.

Paper Nr: 145
Title:

Learning Visual Odometry with a Convolutional Network

Authors:

Kishore Konda and Roland Memisevic

Abstract: We present an approach to predicting velocity and direction changes from visual information (”visual odometry”) using an end-to-end, deep learning-based architecture. The architecture uses a single type of computational module and learning rule to extract visual motion, depth, and finally odometry information from the raw data. Representations of depth and motion are extracted by detecting synchrony across time and stereo channels using network layers with multiplicative interactions. The extracted representations are turned into information about changes in velocity and direction using a convolutional neural network. Preliminary results show that the architecture is capable of learning the resulting mapping from video to egomotion.

Paper Nr: 149
Title:

Conics Detection Method based on Pascal’s Theorem

Authors:

Musfequs Salehin, Lihong Zheng and Junbin Gao

Abstract: This paper presents a novel conic detection method that can be applied to real images. Existing methods usually detect only a circular, elliptical, or parabolic shape in a single operation, and most of them need information about the center, radius, major axis, minor axis, vertex, and more. In our proposed method, the tangents on curve segments, conic parts, and conics are constructed using Pascal’s theorem. The conic parts can be used to detect different types of conic sections in an image. The performance of the proposed method has been tested on sample images selected from the Caltech-256 database; compared to other methods, various types of conic sections can be identified in real images.
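Pascal's theorem, on which the construction relies, states that for six points on a conic, the three intersection points of opposite sides of the hexagon are collinear. This can be checked numerically in homogeneous coordinates, using six points on a unit circle as the test conic:

```python
import math

def cross(a, b):
    """Cross product: the line through two homogeneous points, or
    the intersection point of two homogeneous lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def det3(p, q, r):
    """Determinant testing collinearity of three homogeneous points
    (zero iff the points are collinear)."""
    return (p[0] * (q[1] * r[2] - q[2] * r[1])
          - p[1] * (q[0] * r[2] - q[2] * r[0])
          + p[2] * (q[0] * r[1] - q[1] * r[0]))

# six points on the unit circle (irregular spacing avoids
# parallel opposite sides)
P = [(math.cos(a), math.sin(a), 1.0)
     for a in (0.1, 0.9, 1.7, 2.9, 4.0, 5.3)]
# intersections of the three pairs of opposite hexagon sides
A = cross(cross(P[0], P[1]), cross(P[3], P[4]))
B = cross(cross(P[1], P[2]), cross(P[4], P[5]))
C = cross(cross(P[2], P[3]), cross(P[5], P[0]))
```

By the theorem, A, B and C lie on a single line (the Pascal line), so their determinant vanishes up to floating-point error.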

Paper Nr: 170
Title:

Multifractal Texture Analysis using a Dilation-based Hölder Exponent

Authors:

Joao Batista Florindo, Odemir Martinez Bruno and Gabriel Landini

Abstract: We present an approach to extracting descriptors for the analysis of grey-level textures in images. Similarly to classical multifractal analysis, the method subdivides the texture into regions according to a local Hölder exponent and computes the fractal dimension of each subset. However, instead of estimating such exponents by means of the mass-radius relation, wavelet leaders, etc., we propose using a local version of the Bouligand-Minkowski dimension. At each pixel in the image, this approach provides a scaling relation which fits better to what is expected from a multifractal model than the direct use of the density function. The classification power of the descriptors obtained with this method was tested on the Brodatz image database and compared to other previously published texture classification methods. Our method outperforms these approaches, confirming its potential for texture analysis.
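The fractal dimension estimation underlying such descriptors can be sketched with plain box counting (the paper uses Bouligand-Minkowski dilation volumes, which behave similarly in the log-log fit):

```python
import math

def box_count_dimension(points, sizes):
    """Estimate fractal dimension by box counting: count occupied
    boxes N(s) at each box size s and fit the slope of
    log N(s) versus log(1/s) by least squares."""
    xs, ys = [], []
    for s in sizes:
        boxes = {(x // s, y // s) for x, y in points}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# a digital straight line should have dimension close to 1
line = [(i, i) for i in range(256)]
```

In the multifractal setting, the same fit is applied separately to each subset of pixels sharing a Hölder exponent, yielding one dimension per subset.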

Paper Nr: 177
Title:

Motion Compensated Temporal Image Signature Approach

Authors:

Haroon Qureshi and Markus Ludwig

Abstract: Detecting salient regions in the temporal domain is a challenging problem. The problem gets trickier when there is a moving object in the scene and becomes even more complex in the presence of camera motion. Camera motion can influence saliency detection in two ways: on the one hand, it can provide important information about the location of the moving object; on the other hand, it can lead to wrong estimates of salient regions. It is therefore very important to handle this issue sensibly. This paper provides a solution by combining a saliency detection approach with a motion estimation approach. This extends the Temporal Image Signature (TIS) (Qureshi, 2013) approach to a more complex setting in which not only object motion is considered but the influence of camera motion is also compensated.

Paper Nr: 180
Title:

Fully Automatic Deformable Model Integrating Edge, Texture and Shape - Application to Cardiac Images Segmentation

Authors:

Clément Beitone, Christophe Tilmant and Frederic Chausse

Abstract: This article presents a fully automatic left ventricle (LV) segmentation method for MR images by means of an implicit deformable model (Level Set) in a variational context. For such parametrizations, the degrees of freedom are the initialization and the energy functional. The former is often delegated to the practitioner. To avoid this human intervention, we present an automatic initialisation method based on the Hough transform exploiting spatio-temporal information. Generally, energy functionals integrate edge, region and shape terms. We propose to bundle an edge-based energy computed by feature asymmetry on the monogenic signal, a region-based energy capitalizing on image statistics (Weibull model) and a shape-based energy constrained by the myocardium thickness. The presence of multiple tissues implies data non-stationarity. To best estimate the distribution parameters over the regions with respect to the anatomy, we propose a deformable model maximizing the log-likelihood locally and globally. Finally, we evaluate our method on the MICCAI 09 Challenge data.

Paper Nr: 194
Title:

New Method for Evaluation of Video Segmentation Quality

Authors:

Mahmud Abdulla Mohammad, Ioannis Kaloskampis and Yulia Hicks

Abstract: Segmentation is an important stage in image/video analysis and understanding. There are many different approaches and algorithms for image/video segmentation, hence their evaluation is also important in order to assess the quality of segmentation results. Nonetheless, so far there has been little research aimed specifically at the evaluation of video segmentation quality. In this article, we propose criteria for good video segmentation quality, suitable for the assessment of video segmentations, by including a requirement for temporal region consistency. We also propose a new method for evaluating video segmentation quality on the basis of the proposed criteria. The new method can be used for both supervised and unsupervised evaluation. We designed a test video set specifically for evaluating our method and evaluated the proposed method using both this set and segmentations of real life videos. We compared our method against a state of the art supervised evaluation method. The comparison showed that our method is better at evaluating perceptual qualities of video segmentations as well as at highlighting certain defects of video segmentations.

Paper Nr: 213
Title:

Low Level Features for Quality Assessment of Facial Images

Authors:

Arnaud Lienhard, Patricia Ladret and Alice Caplier

Abstract: An automated system that provides feedback about aesthetic quality of facial pictures could be of great interest for editing or selecting photos. Although image aesthetic quality assessment is a challenging task that requires understanding of subjective notions, the proposed work shows that facial image quality can be estimated by using low-level features only. This paper provides a method that can predict aesthetic quality scores of facial images. 15 features that depict technical aspects of images such as contrast, sharpness or colorfulness are computed on different image regions (face, eyes, mouth) and a machine learning algorithm is used to perform classification and scoring. Relevant features and facial image areas are selected by a feature ranking technique, increasing both classification and regression performance. Results are compared with recent works, and it is shown that by using the proposed low-level feature set, the best state of the art results are obtained.

Paper Nr: 217
Title:

CORE: A COnfusion REduction Algorithm for Keypoints Filtering

Authors:

Emilien Royer, Thibault Lelore and Frédéric Bouchara

Abstract: In computer vision, extracting keypoints and computing associated features is the first step for many applications such as object recognition, image indexing, super-resolution or stereo-vision. In many cases, pre- or post-processing is almost mandatory in order to achieve good results. In this paper, we propose a generic pre-filtering method for floating-point based descriptors which addresses the confusion problem due to repetitive patterns. We sort keypoints by their unicity without taking into account any visual element but the feature vectors' statistical properties, thanks to a kernel density estimation approach. Results show that, even if highly reduced in number, the extracted keypoint subsets are still relevant, and our algorithm can be combined with classical post-processing methods.

Paper Nr: 242
Title:

Pulse Reformation Algorithm for Leakage of Connected Operators

Authors:

Gene Stoltz and Inger Fabris-Rotelli

Abstract: The Discrete Pulse Transform (DPT) is a hierarchical decomposition of a signal in n dimensions, built by iteratively applying the LULU operators. The DPT is a fairly new mathematical framework with minimal application so far and, like most other connected operators, is prone to leakage within the domain. Leakage is the unwanted union of two connected sets and thus provides false connectedness information about the data. The Pulse Reformation Framework (PRF) was developed to address the leakage problem within the DPT. It was tested specifically with circular probes and showed successful extraction of blood cells.
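The LULU operators iterated by the decomposition can be illustrated in one dimension. Below is a minimal sketch (the function names, the list-based implementation and the clamped boundary handling are our own, not the authors' code) of the L_n and U_n smoothers, which remove upward and downward pulses of width at most n:

```python
def lulu_L(x, n=1):
    # L_n: max of window-minima -> removes upward pulses (local maxima)
    # of width <= n; windows are clamped at the signal boundaries
    N = len(x)
    return [max(min(x[max(0, j):min(N, j + n + 1)])
                for j in range(i - n, i + 1))
            for i in range(N)]

def lulu_U(x, n=1):
    # U_n: min of window-maxima -> removes downward pulses (local minima)
    N = len(x)
    return [min(max(x[max(0, j):min(N, j + n + 1)])
                for j in range(i - n, i + 1))
            for i in range(N)]

print(lulu_L([0, 0, 5, 0, 0]))  # → [0, 0, 0, 0, 0]: the width-1 spike is removed
print(lulu_L([0, 5, 5, 0]))     # → [0, 5, 5, 0]: the width-2 pulse survives n=1
```

Applying L_n and U_n for increasing n and recording the pulses removed at each step yields the hierarchical DPT decomposition mentioned above.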

Paper Nr: 259
Title:

Automatic Pharynx Segmentation from MRI Data for Obstructive Sleep Apnea Analysis

Authors:

Muhammad Laiq Ur Rahman Shahid, Teodora Chitiboi, Tatyana Ivanovska, Vladimir Molchanov, Henry Völzke, Horst K. Hahn and Lars Linsen

Abstract: Obstructive sleep apnea (OSA) is a public health problem. Volumetric analysis of the upper airways can help us to understand the pathogenesis of OSA. A reliable pharynx segmentation is the first step in identifying the anatomic risk factors for this sleeping disorder. As manual segmentation is a time-consuming and subjective process, a fully automatic segmentation of pharyngeal structures is required when investigating larger databases such as in cohort studies. We develop a context-based automatic algorithm for segmenting the pharynx from magnetic resonance imaging (MRI) data. It consists of a pipeline of steps including pre-processing (thresholding, connected component analysis) to extract coarse 3D objects, classification of the objects (involving object-based image analysis (OBIA), visual feature space analysis, and silhouette coefficient computation) to segregate the pharynx from other structures automatically, and post-processing to refine the shape of the identified pharynx (including extraction of the oropharynx and propagating results from neighboring slices to slices that are difficult to delineate). Our technique is fast enough to apply to population-based epidemiological studies that provide a high amount of data. Our method needs no user interaction to extract the pharyngeal structure. The approach is quantitatively evaluated on ten datasets resulting in an average of approximately 90% detected volume fraction and a 90% Dice coefficient, which is in the range of the interobserver variation within manual segmentation results.
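For reference, the Dice coefficient reported above measures the overlap between a computed mask A and a manual mask B as 2|A∩B|/(|A|+|B|). A minimal sketch (an illustrative helper, not the authors' evaluation code):

```python
import numpy as np

def dice(a, b):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|) for two binary masks
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# Identical masks give 1.0; disjoint masks give 0.0
print(dice([1, 1, 0, 0], [1, 0, 0, 1]))  # → 0.5
```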

Paper Nr: 281
Title:

Curvature-based Human Body Parts Segmentation in Physiotherapy

Authors:

Francis Deboeverie, Roeland De Geest, Tinne Tuytelaars, Peter Veelaert and Wilfried Philips

Abstract: Analysing human sports activity in computer vision requires reliable segmentation of the human body into meaningful parts, such as arms, torso and legs. Therefore, we present a novel strategy for human body segmentation. Firstly, greyscale images of human bodies are divided into smooth intensity patches with an adaptive region growing algorithm based on low-degree polynomial fitting. Then, the key idea in this paper is that human body parts are approximated by nearly cylindrical surfaces, of which the axes of minimum curvature accurately reconstruct the human body skeleton. Next, human body segmentation is qualitatively evaluated with a line segment distance between reconstructed human body skeletons and ground truth skeletons. When compared with human body parts segmentations based on mean shift, normalized cuts and watersheds, the proposed method achieves more accurate segmentations and better reconstructions of human body skeletons.

Paper Nr: 288
Title:

Robust Interest Point Detection by Local Zernike Moments

Authors:

Gökhan Özbulak and Muhittin Gökmen

Abstract: In this paper, a novel interest point detector based on Local Zernike Moments is presented. The proposed detector, named Robust Local Zernike Moment based Features (R-LZMF), is invariant to scale, rotation and translation changes in images, which makes it robust when detecting interest points across images taken of the same scene under varying viewing conditions such as zoom in/out or rotation. As our experiments on the Inria dataset indicate, R-LZMF outperforms widely used detectors such as SIFT and SURF in terms of repeatability, which is the main criterion for evaluating detector performance.

Paper Nr: 291
Title:

Does Inverse Lighting Work Well under Unknown Response Function?

Authors:

Shuya Ohta and Takahiro Okabe

Abstract: Inverse lighting is a technique for recovering the lighting environment of a scene from a single image of an object. Conventionally, inverse lighting assumes that a pixel value is proportional to the radiance value, i.e., that the response function of the camera is linear. Unfortunately, however, consumer cameras usually have unknown and nonlinear response functions, and therefore conventional inverse lighting does not work well for images taken by those cameras. In this study, we propose a method for simultaneously recovering the lighting environment of a scene and the response function of a camera from a single image. Through a number of experiments using synthetic images, we demonstrate that the performance of our proposed method depends on the lighting distribution, response function, and surface albedo, and address under what conditions the simultaneous recovery of the lighting environment and response function works well.

Paper Nr: 347
Title:

Laplacian Unitary Domain for Texture Morphing

Authors:

Antoni Gurguí, Debora Gil and Enric Martí

Abstract: Deformation of expressive textures is the gateway to realistic computer synthesis of expressions. Owing to their good mathematical properties and flexible formulation on irregular meshes, most texture mappings rely on solutions to the Laplacian in Cartesian space. In the context of facial expression morphing, this approximation can be seen from the opposite point of view by neglecting the metric. In this paper, we use the properties of the Laplacian on manifolds to present a novel approach to warping expressive facial images in order to generate a morphing between them.

Posters
Paper Nr: 23
Title:

Grouping of Isolated Non-directional Cues with Straight Offset Polygons

Authors:

Toshiro Kubota

Abstract: When the boundary of a familiar object is shown as a series of isolated dots, humans can often recognize the object with ease. This ability is sustained even when distracting dots are added around the object. However, such capability has not been reproduced algorithmically on computers. In this paper, we introduce a new algorithm that groups a set of dots into multiple overlapping subsets. It first connects the dots into a spanning tree using the proximity cue. It then applies the straight polygon transformation to an initial polygon derived from the spanning tree. The straight polygon divides the space into polygons recursively, and each polygon can be viewed as a grouping of a subset of the dots. The number of polygons generated is O(n). We used both natural and synthetic images to test the performance of the algorithm. The results are encouraging.
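The first step of the algorithm, connecting the dots into a spanning tree using the proximity cue, can be read as building a Euclidean minimum spanning tree. A sketch via Prim's algorithm (the function name is ours, and the subsequent straight-polygon stage is omitted):

```python
import math

def proximity_spanning_tree(points):
    """Connect 2-D dots into a minimum spanning tree under the
    Euclidean proximity cue (Prim's algorithm). Returns edges as
    (index_in_tree, index_added) pairs."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # pick the shortest edge leaving the current tree
        d, i, j = min((math.dist(points[i], points[j]), i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        edges.append((i, j))
        in_tree.add(j)
    return edges

print(proximity_spanning_tree([(0, 0), (1, 0), (5, 0)]))  # → [(0, 1), (1, 2)]
```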

Paper Nr: 36
Title:

Performance Evaluation of Bit-plane Slicing based Stereo Matching Techniques

Authors:

Chung-Chien Kao and Huei-Yung Lin

Abstract: In this paper, we propose a hierarchical framework for stereo matching. Similar to conventional image pyramids, a series of images with progressively less information is constructed. The objective is to use the bit-plane slicing technique to investigate the feasibility of correspondence matching with fewer bits of intensity information. In the experiments, stereo matching with various bit-rate image pairs is carried out using graph cut, semi-global matching, and non-local aggregation methods. The results are submitted to the Middlebury stereo page for performance evaluation.
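Bit-plane slicing itself is a one-line masking operation; the sketch below (an illustrative helper, not the authors' code) keeps only the `bits` most significant bit-planes of an 8-bit image, producing the reduced-information inputs used for matching:

```python
import numpy as np

def bit_planes(img, bits):
    # Zero out the lowest (8 - bits) bit-planes of an 8-bit image,
    # e.g. bits=3 keeps only the mask 0b11100000
    mask = np.uint8(0xFF << (8 - bits))
    return img & mask

img = np.array([[200, 37], [129, 255]], dtype=np.uint8)
print(bit_planes(img, 3))  # keeps the top 3 bits: 200→192, 37→32, 129→128, 255→224
```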

Paper Nr: 42
Title:

Medial Width of Polygonal and Circular Figures - Approach via Line Segment Voronoi Diagram

Authors:

L. M. Mestetskiy

Abstract: The paper proposes the concept of the so-called medial width function, an integral shape descriptor of figures used in image recognition tasks. The medial width function is determined from the skeleton of the shape and the radial function. An algorithm to compute the medial width function for polygonal figures based on the line segment Voronoi diagram is also presented. A generalized solution for circular figures, obtained by rounding the corners of a polygonal figure, is presented as well. A computational experiment demonstrates the efficiency and effectiveness of the approach on the problem of comparing palm shapes for personal identification.

Paper Nr: 58
Title:

Breast Tissue Characterization in X-Ray and Ultrasound Images using Fuzzy Local Directional Patterns and Support Vector Machines

Authors:

Mohamed Abdel-Nasser, Domenec Puig, Antonio Moreno, Adel Saleh, Joan Marti, Luis Martin and Anna Magarolas

Abstract: Accurate breast mass detection in mammograms is a difficult task, especially with dense tissues. Although ultrasound images can detect breast masses even in dense breasts, they are always corrupted by noise. In this paper, we propose fuzzy local directional patterns for breast mass detection in X-ray as well as ultrasound images. Fuzzy logic is applied to the edge responses of the given pixels to produce a meaningful descriptor. The proposed descriptor can properly discriminate between mass and normal tissues under different conditions such as noise and compression variation. In order to assess the effectiveness of the proposed descriptor, a support vector machine classifier is used to perform mass/normal classification in a set of regions of interest. The proposed method has been validated using the well-known mini-MIAS breast cancer database (X-ray images) as well as an ultrasound breast cancer database. Moreover, quantitative results are shown in terms of the area under the receiver operating characteristic (ROC) curve.

Paper Nr: 78
Title:

A Review of Hough Transform and Line Segment Detection Approaches

Authors:

Payam S. Rahmdel, Richard Comley, Daming Shi and Siobhan McElduff

Abstract: In a wide range of image processing and computer vision problems, line segment detection is one of the most critical challenges. For more than three decades researchers have contributed to building more robust and accurate algorithms with faster performance. In this paper we review the main approaches, and in particular the Hough transform (HT) and its extensions, which are among the most well-known techniques for the detection of straight lines in a digital image. This paper is based on extensive practical research and is organised into two main parts. In the first part, the HT and its major research directions and limitations are discussed. In the second part of the paper, state-of-the-art line segment detection techniques are reviewed and categorized into three main groups with fundamentally distinctive characteristics. Their relative advantages and disadvantages are compared and summarised in a table.

Paper Nr: 100
Title:

Patch Autocorrelation Features for Optical Character Recognition

Authors:

Radu Tudor Ionescu, Andreea-Lavinia Popescu and Dan Popescu

Abstract: The autocorrelation is often used in signal processing as a tool for finding repeating patterns in a signal. In image processing, there are various image analysis techniques that use the autocorrelation of an image for a broad range of applications from texture analysis to grain density estimation. In this paper, a novel approach to capturing the autocorrelation of an image is proposed. More precisely, the autocorrelation is recorded in a set of features obtained by comparing pairs of patches from an image. Each feature stores the Euclidean distance between a particular pair of patches. Although patches contain contextual information and have advantages in terms of generalization, most of the patch-based techniques used in image processing are computationally expensive on current machines. Therefore, patches are selected using a dense grid over the image to reduce the number of features. This approach is termed Patch Autocorrelation Features (PAF). The proposed approach is evaluated in a series of handwritten digit recognition experiments using the popular MNIST data set. The Patch Autocorrelation Features are compared with the Euclidean distance using two classification systems, namely k-Nearest Neighbors and Support Vector Machines. The empirical results show that the feature map proposed in this work is always better than a feature representation based on raw pixel values, in terms of accuracy. Furthermore, the results obtained with PAF are comparable to other state of the art methods.
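The feature construction described above can be sketched as follows (an illustrative reading of PAF with our own names and grid parameters, not the authors' implementation): patches are sampled on a dense grid and the Euclidean distance between every pair of patches becomes one feature.

```python
import numpy as np
from itertools import combinations

def patch_autocorrelation_features(img, patch=2, stride=2):
    # Sample patches on a dense grid, then record the Euclidean
    # distance between every pair of patches as one feature
    h, w = img.shape
    patches = [img[r:r + patch, c:c + patch].ravel().astype(float)
               for r in range(0, h - patch + 1, stride)
               for c in range(0, w - patch + 1, stride)]
    return np.array([np.linalg.norm(a - b)
                     for a, b in combinations(patches, 2)])

feats = patch_autocorrelation_features(np.arange(16, dtype=float).reshape(4, 4))
print(len(feats))  # 4 patches on the grid -> C(4,2) = 6 pairwise features
```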

Paper Nr: 108
Title:

Hand Shape Extraction for Contactless Remote Control

Authors:

J. F. Collumeau, H. Laurent, B. Emile and R. Leconge

Abstract: Contactless computer vision-based interfaces are of particular interest within the specific context of operating rooms (ORs) when remote control of OR equipment is envisaged. In order to avoid the spread of hospital-acquired diseases, drastic measures regulate the use of sterile devices by non-sterile staff and vice versa. While these meet the compulsory objective of asepsis preservation, they also impede direct interaction between surgeons and any non-sterile equipment. To overcome such limitations, we developed an image processing chain aimed at allowing remote control of OR equipment. In this study, we focus on the segmentation step and investigate the capacities of three state-of-the-art segmentation algorithms, namely K-Means classification, Watershed cuts and GrabCut. Three different supervised evaluation criteria (Hafiane and Martin's criteria and the F-score metric) are compared to this end. A wrist extractor, also described in this paper, was developed specifically to supplement this step; it provides consistent inputs to the description methods. This study identifies K-Means as the segmentation method best suited to the hand segmentation problem with respect to both segmentation accuracy and real-time processing.

Paper Nr: 110
Title:

Saliency Detection based on Depth and Sparse Features

Authors:

Gangbiao Chen and Chun Yuan

Abstract: In this paper, we modify the region-based Human Visual System (HVS) model by introducing two features: a sparse feature and a depth feature. The input image is first divided into small regions. Then the contrast, sparse and depth features of each region are extracted, and the center-surround feature differences are calculated for saliency detection; a center-shift method is adopted in this step. In the weighting step, human visual acuity is taken into account. Compared with existing related algorithms, experimental results on a large public database show that the modified method works better and obtains more accurate results.

Paper Nr: 115
Title:

Multiphase Region-based Active Contours for Semi-automatic Segmentation of Brain MRI Images

Authors:

Farhan Akram, Domenec Puig, Miguel Angel Garcia and Adel Saleh

Abstract: Segmenting brain magnetic resonance imaging (MRI) images into white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF) is an important problem in medical image analysis. The study of these regions can be useful for determining different brain disorders, assisting brain surgery and post-surgical analysis, saliency detection, and studying regions of interest. This paper presents a segmentation method that partitions a given brain MRI image into WM, GM and CSF regions through a multiphase region-based active contour method followed by a pixel correction thresholding stage. The region-based active contour method is applied in order to partition the input image into four different regions. Three of those regions within the brain area are then chosen by intersecting a hand-drawn binary mask with the computed contours. Finally, an efficient thresholding-based pixel correction method is applied to the computed WM, GM and CSF regions to increase their accuracy. The segmentation results are compared with ground truths to show the performance of the proposed method.

Paper Nr: 130
Title:

Hybrid Person Detection and Tracking in H.264/AVC Video Streams

Authors:

Philipp Wojaczek, Marcus Laumer, Peter Amon, Andreas Hutter and André Kaup

Abstract: In this paper we present a new hybrid framework for detecting and tracking persons in surveillance video streams compressed according to the H.264/AVC video coding standard. The framework consists of three stages and operates in both the compressed and the pixel domain of the video; this combination of domains constitutes its hybrid character. Its main objective is to significantly reduce the amount of computation required, in particular for frames and image regions with few people present. In its first stage the proposed framework evaluates the header information for each compressed frame in the video sequence, namely the macroblock type information. This results in a coarse binary mask segmenting the frame into foreground and background. Only the foreground regions are processed further in the second stage, which searches for persons in the pixel domain by applying a person detector based on the Implicit Shape Model. The third stage segments each detected person further with a newly developed method that fuses information from the first two stages. This helps obtain a finer segmentation for calculating a color histogram suitable for tracking the person using the mean shift algorithm. The proposed framework was experimentally evaluated on a publicly available test set. The results demonstrate that the proposed framework reliably separates frames with and without persons such that the computational load is significantly reduced while the detection performance is maintained.

Paper Nr: 151
Title:

A Comparison on Supervised Machine Learning Classification Techniques for Semantic Segmentation of Aerial Images of Rain Forest Regions

Authors:

Luiz Carlos A. M. Cavalcanti, Jose Reginaldo Hughes Carvalho and Eulanda Miranda dos Santos

Abstract: Segmentation is one of the most important operations in Computer Vision. Partitioning an image into several domain-independent components is important in many practical machine learning solutions involving visual data. In the specific problem of finding anomalies in aerial images of forest regions, this can be especially important, as a multilevel classification solution may demand that each type of terrain and other components of the image be inspected by different classification algorithms or parameters. This work compares several common classification algorithms and assesses their reliability in segmenting aerial images of rain forest regions as a first step toward a multi-level classification solution. Finally, we draw conclusions from experiments using real images from a publicly available dataset, comparing the results of these classification algorithms for segmenting this kind of image.

Paper Nr: 199
Title:

A Real-time Computer Vision System for Biscuit Defect Inspection

Authors:

Yu Wang, Chenbo Shi, Chun Zhang and Qingmin Liao

Abstract: This paper presents a computer vision system for biscuit defect inspection comprising both hardware and software. Using a system with two cameras, we focus on the detection of partially missing biscuits and cream overflow. For detecting partial deletion, a new algorithm with a membership function for calculating a feature descriptor is proposed; it is convenient and efficient for extracting texton features. For cream overflow detection, a chemical property of enantiomers under polarized light is exploited to distinguish cream from the background. The proposed system has been deployed on the production line. Groups of on-line experiments show that our system achieves accurate defect detection with a low miss rate and few false alarms.

Paper Nr: 211
Title:

Video Segmentation via a Gaussian Switch Background Model and Higher Order Markov Random Fields

Authors:

Martin Radolko and Enrico Gutzeit

Abstract: Foreground-background segmentation in videos is an important low-level task needed for many different applications in computer vision. A great variety of algorithms has therefore been proposed to deal with this problem; however, none delivers satisfactory results in all circumstances. Our approach combines an efficient novel Background Subtraction algorithm with a higher order Markov Random Field (MRF), which can model the spatial relations between the pixels of an image far better than the simple pairwise MRF used in most state of the art methods. Afterwards, a runtime-optimized Belief Propagation algorithm is used to compute an enhanced segmentation based on this model. Lastly, a local Between-Class Variance method is combined with this to enrich the data from the Background Subtraction. The difficult Wallflower data set is used to evaluate the results.

Paper Nr: 215
Title:

Real-time Human Pose Estimation from Body-scanned Point Clouds

Authors:

Jilliam María Díaz Barros, Frederic Garcia and Désiré Sidibé

Abstract: This paper presents a novel approach to estimate the human pose from a body-scanned point cloud. To do so, a predefined skeleton model is first initialized according to both the skeleton base point and its torso limb obtained by Principal Component Analysis (PCA). Then, the body parts are iteratively clustered and the skeleton limb fitting is performed, based on Expectation Maximization (EM). The human pose is given by the location of each skeletal node in the fitted skeleton model. Experimental results show the ability of the method to estimate the human pose from multiple point cloud video sequences representing the external surface of a scanned human body; the method is robust, precise and handles large portions of missing data due to occlusions, acquisition hindrances or registration inaccuracies.

Paper Nr: 219
Title:

Point-wise Diversity Measure and Visualization for Ensemble of Classifiers - With Application to Image Segmentation

Authors:

Ahmed Al-Taie, Horst K. Hahn and Lars Linsen

Abstract: The idea of using ensembles of classifiers is to increase the performance when compared to applying a single classifier. Crucial to the performance improvement is the diversity of the ensemble. A classifier ensemble is considered to be diverse, if the classifiers make no coinciding errors. Several studies discuss the diversity issue and its relation to the ensemble accuracy. Most of them proposed measures that are based on an "Oracle" classification. In this paper, we propose a new probability-based diversity measure for ensembles of unsupervised classifiers, i.e., when no Oracle machine exists. Our measure uses a point-wise definition of diversity, which allows for a distinction of diverse and non-diverse areas. Moreover, we introduce the concept of further categorizing the diverse areas into healthy and unhealthy diversity areas. A diversity area is healthy for the ensemble performance, if there is enough redundancy to compensate for the errors. Then, the performance of the ensemble can be based on two parameters, the non-diversity area, i.e., the size of all regions where the classifiers of the ensemble agree, and the healthy diversity area, i.e., the size of the regions where the diversity is healthy. Furthermore, our point-wise diversity measure allows for an intuitive visualization of the ensemble diversity for visual ensemble performance comparison in the context of image segmentation.

Paper Nr: 240
Title:

Retinal Vessel Segmentation using Deep Neural Networks

Authors:

Martina Melinscak, Pavle Prentasic and Sven Loncaric

Abstract: Automatic segmentation of blood vessels in fundus images is of great importance, as eye diseases as well as some systemic diseases cause observable pathologic modifications. It is a binary classification problem: for each pixel we consider two possible classes (vessel or non-vessel). We use a GPU implementation of deep max-pooling convolutional neural networks to segment blood vessels. We test our method on the publicly available DRIVE dataset and our results demonstrate the high effectiveness of the deep learning approach. Our method achieves an average accuracy and AUC of 0.9466 and 0.9749, respectively.

Paper Nr: 252
Title:

Optimization-based Automatic Segmentation of Organic Objects of Similar Types

Authors:

Enrico Gutzeit, Martin Radolko, Arjan Kuijper and Uwe von Lukas

Abstract: For the segmentation of multiple objects on an unknown background in images, some approaches for specific objects exist. However, no approach is general enough to segment an arbitrary group of organic objects of similar type, like wood logs, apples, or tomatoes. Each approach imposes restrictions on the object shape, texture, color or the image background. Many methods are based on probabilistic inference on Markov Random Fields, summarized in this work as optimization-based segmentation. In this paper, we address the automatic segmentation of organic objects of similar types using optimization-based methods. Based on the result of object detection, a fore- and background model is created, enabling an automatic segmentation of images. Our novel and more general approach for organic objects is a first and important step in a measuring or inspection system. We evaluate and compare our approaches on images of different organic objects on very different backgrounds, which vary in color and texture. We show that the results are very accurate.

Paper Nr: 266
Title:

Saliency Detection using Geometric Context Contrast Inferred from Natural Images

Authors:

Anurag Singh, Chee-Hung Henry Chu and Michael A. Pratt

Abstract: Image saliency detection using region contrast is often based on the premise that the salient region contrasts with the background, which becomes a limiting factor when the colors of the salient object and the background are similar. To overcome this problem of single-image analysis, we propose to collect background regions from a collection of images, where the generative property of, say, natural images ensures that all images are drawn from the same process, negating any bias. Background regions are differentiated based on their geometric context; we use the ground and sky contexts as background. Finally, the aggregated map is generated using the color contrast between the superpixel segments of the image and the collection of background superpixels.

Paper Nr: 267
Title:

Segmentation of Optic Disc and Blood Vessels in Retinal Images using Wavelets, Mathematical Morphology and Hessian-based Multi-scale Filtering

Authors:

Luiz Carlos Rodrigues and Mauricio Marengoni

Abstract: A digitized image captured by a fundus camera provides an effective, inexpensive and non-invasive resource for the assessment of vascular damage caused by diabetes, arterial hypertension, hypercholesterolemia and aging. These unhealthy conditions may have very serious consequences, such as hemorrhages, exudates and branch retinal vein occlusion, leading to the partial or total loss of vision. This study focuses on the computer vision techniques of image segmentation required for a completely automated assessment system for the vascular conditions of the eye. It proposes a new algorithm based on wavelet transforms and mathematical morphology for the segmentation of the optic disc, and a Hessian-based multi-scale filtering to segment the vascular tree in color eye fundus photographs. The optic disc and the vessel tree are both essential to the analysis of the retinal fundus image. The optic disc can be identified as a bright region in the fundus image; for its segmentation, we apply the Haar wavelet transform to obtain a low-frequency representation of the image and then apply mathematical morphology to enhance the segmentation. The vessel tree segmentation is achieved using a Hessian-based multi-scale filter that, based on its second-order derivatives, exploits the tubular shape of a blood vessel to classify each pixel as part of a vessel or not. The proposed method is developed and tested on the DRIVE database, which contains 40 color eye fundus images.
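
The Hessian-based multi-scale filtering mentioned above follows the well-known Frangi vesselness scheme: the eigenvalues of the intensity Hessian reveal tubular structures. The following is a hedged, single-scale 2D sketch of that idea, not the authors' implementation; `beta` and `c` are the usual Frangi constants, chosen here arbitrarily.

```python
import numpy as np

def vesselness(img, beta=0.5, c=0.5):
    """Simplified single-scale 2D Frangi-style vesselness from Hessian eigenvalues."""
    gy, gx = np.gradient(img.astype(float))       # first derivatives
    gyy, gyx = np.gradient(gy)                    # second derivatives
    gxy, gxx = np.gradient(gx)
    # Closed-form eigenvalues of the 2x2 Hessian [[gxx, gxy], [gxy, gyy]]
    tmp = np.sqrt(((gxx - gyy) / 2.0) ** 2 + gxy ** 2)
    mean = (gxx + gyy) / 2.0
    l1, l2 = mean + tmp, mean - tmp
    swap = np.abs(l1) > np.abs(l2)                # order so |lam1| <= |lam2|
    lam1 = np.where(swap, l2, l1)
    lam2 = np.where(swap, l1, l2)
    rb = np.abs(lam1) / (np.abs(lam2) + 1e-12)    # blob-vs-line ratio
    s = np.sqrt(lam1 ** 2 + lam2 ** 2)            # structure strength
    v = np.exp(-rb ** 2 / (2 * beta ** 2)) * (1 - np.exp(-s ** 2 / (2 * c ** 2)))
    return np.where(lam2 < 0, v, 0.0)             # bright vessels: lam2 < 0
```

A bright ridge produces one near-zero and one strongly negative eigenvalue, so the ratio term stays near one while the structure term grows; running such a filter at several scales and keeping the maximum response gives the multi-scale version.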

Paper Nr: 270
Title:

Pre-processing Techniques to Improve the Efficiency of Video Identification for the Pygmy Bluetongue Lizard

Authors:

Damian Tohl, Jimmy Li and C. Michael Bull

Abstract: In the study of the endangered Pygmy Bluetongue Lizard, non-invasive photographic identification is preferred to the current invasive methods, which can be unreliable and cruel. As the lizard is an endangered species, there are restrictions on its handling. The lizard is also in constant motion, and it is therefore difficult to capture a good still image for identification purposes. Hence, video capture is preferred, as a number of images of the lizard at various positions and qualities can be collected in just a few seconds, from which the best image can be selected for identification. With a large number of individual lizards in the database, matching a video sequence of images against each database image for identification renders the process computationally inefficient. Moreover, a large portion of those images are non-identifiable due to motion and optical blur and a body curvature different from that of the reference database image. In this paper, we propose a number of pre-processing techniques for pre-selecting the best image from the video sequence for identification. Using our proposed pre-selection techniques, we show that the computational efficiency can be significantly improved.

Paper Nr: 285
Title:

A Human Perceptive Model for Person Re-identification

Authors:

Angelo Cardellicchio, Tiziana D'Orazio, Tiziano Politi and Vito Renò

Abstract: Person re-identification has increasingly become an interesting task in the computer vision field, especially after the well-known terrorist attacks on the World Trade Center in 2001. Although video surveillance systems have existed since the early 1950s, the third generation of such systems is a relatively modern topic and refers to systems formed by multiple fixed or mobile cameras – geographically referenced or not – whose information has to be handled and processed by an intelligent system. In the last decade, researchers have focused their attention on the person re-identification task because computers (and thus video surveillance systems) can handle a huge amount of data while reducing the time complexity of the algorithms. Moreover, some well-known image processing techniques – e.g. background subtraction – can be embedded directly on cameras, giving modularity and flexibility to the whole system. The aim of this work is to present an appearance-based method for person re-identification that models the chromatic relationship both between different frames and between different areas of the same frame. This approach has been tested on two public benchmark datasets (ViPER and ETHZ), and the experiments demonstrate that person re-identification by means of intra-frame relationships is robust and shows strong results in terms of recognition percentage.

Paper Nr: 314
Title:

Thinning based Antialiasing Approach for Visual Saliency of Digital Images

Authors:

Olivier Rukundo

Abstract: A thinning-based approach for spatial antialiasing (TAA) is proposed for visual saliency of digital images. The TAA approach is based on edge-matting and digital compositing strategies. Prior to edge-matting, the image edges are detected using an ant colony optimization (ACO) algorithm and then thinned using a fast parallel algorithm. After the edge-matting, a composite image is created from the edge-matted and the non-antialiased image. The motivations for adopting the ACO and fast parallel algorithms, in lieu of others found in the literature, are also extensively addressed in this paper. Preliminary TAA experimental outcomes are promising, though the smoothness relative to the original size of the images remains debatable to some extent.

Paper Nr: 332
Title:

Curve-skeleton Extraction from Visual Hull

Authors:

Andrey Zimovnov and Leonid Mestetskiy

Abstract: We present a new algorithm for curve-skeleton extraction from a wide variety of objects. The algorithm uses a visual hull approximation of the object, which allows us to work with the model in its silhouette domain. We propose an efficient algorithm for computing the 3D distance transform of the inner voxels of the visual hull. Using this 3D distance transform, we backproject the continuous medial axes of the visual hull silhouettes, which form a first approximation of the curve-skeleton. We then use a set of filtering techniques to denoise this point cloud and form a thinner approximation. We believe the resulting approximation is useful in its own right. The described method shows a great improvement in computational time compared to existing ones, and shows good extraction results for models with complex geometry and topology. The resulting curve-skeletons conform to most requirements for universal curve-skeletons.

Paper Nr: 337
Title:

Comparison of Statistical and Artificial Neural Networks Classifiers by Adjusted Non Parametric Probability Density Function Estimate

Authors:

Ibtissem Ben Othman, Wissal Drira, Faycel El Ayeb and Faouzi Ghorbel

Abstract: In the industrial field, artificial neural network classifiers are widely used, and they are generally integrated into technological systems that need efficient classifiers. Statistical classifiers have been developed in the same direction, and different combinations and optimization procedures, such as AdaBoost training or the CART algorithm, have been proposed to improve classification performance. However, objective comparison studies between these classifiers remain scarce. In the present work, we evaluate, with a new criterion, the classification stability of neural networks and of some statistical classifiers based on the optimization of the Fisher criterion or on the maximization of the Patrick-Fisher distance orthogonal estimator. The stability comparison is performed by estimating the probability density of the error rate, using the kernel-diffeomorphism plug-in algorithm. The results show that the statistical approaches are more stable than the neural networks.

Paper Nr: 340
Title:

An Experimental Benchmark for Point Set Coarse Matching

Authors:

Ferran Roure, Yago Díez, Xavier Lladó, Josep Forest, Tomislav Pribanic and Joaquim Salvi

Abstract: Coarse matching of point clouds is a fundamental problem in a variety of computer vision applications. While many algorithms have been developed in recent years to address its different aspects, the lack of unified measures and commonly agreed-upon data hampers the comparison of algorithm performance. Additionally, a large number of contributions are tested only with synthetic or processed data. This is a problem, as the resulting scenario is somewhat less challenging and does not always conform to practical application conditions. In this paper, we present a new, publicly available database that aims at overcoming the existing problems, providing researchers with a useful tool to compare new contributions against existing ones, and representing a step towards standardization. The database contains both processed and unprocessed data, with attention to especially challenging datasets. It also includes information on the correct solution, the presence of noise, overlap percentages and additional information that will allow researchers to focus only on specific parts of the matching pipeline.

Paper Nr: 341
Title:

Low Level Statistical Models for Initialization of Interactive 2D/3D Segmentation Algorithms

Authors:

Jan Kolomazník, Jan Horáček and Josef Pelikán

Abstract: In this paper we present two models suitable for initializing interactive segmentation algorithms so as to decrease the amount of user work. The models are used during the initialization step and do not increase the complexity of the segmentation algorithms. The first part of the model describes the spatial distribution of image values and their classification as either foreground or background. The second part of the model is a vector field that constrains the direction of boundary normals. We show how to use these models in a parametric snakes/surfaces framework and in minimal graph-cut based segmentation.

Area 3 - Image and Video Understanding

Full Papers
Paper Nr: 29
Title:

A Group Contextual Model for Activity Recognition in Crowded Scenes

Authors:

Khai N. Tran, Xu Yan, Ioannis A. Kakadiaris and Shishir K. Shah

Abstract: This paper presents an efficient framework for activity recognition based on analyzing group context in crowded scenes. We use a graph-based clustering algorithm to discover interacting groups via a top-down mechanism. Using the discovered interacting groups, we propose a new group-context activity descriptor that captures not only the focal person's activity but also the behaviors of its neighbors. For a high-level understanding of human activities, we propose a random field model to encode activity relationships between people in the scene. We evaluate our approach on two public benchmark datasets. The results of both steps show that our method achieves recognition rates comparable to state-of-the-art methods for activity recognition in crowded scenes.
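
As a toy illustration of the group-discovery step above, interacting groups can be approximated as the connected components of a proximity graph over people's positions. This is only a hedged stand-in for the paper's graph-based clustering; the distance threshold `max_dist` is a hypothetical parameter.

```python
def interacting_groups(positions, max_dist):
    """Cluster people into groups: connect any two people closer than max_dist
    and return the connected components (union-find with path halving)."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = positions[i], positions[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2:
                parent[find(i)] = find(j)   # union the two components
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned list holds the indices of one group; isolated people form singleton groups.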

Paper Nr: 74
Title:

Regularized Latent Least Squares Regression for Unconstrained Still-to-Video Face Recognition

Authors:

Haoyu Wang, Changsong Liu and Xiaoqing Ding

Abstract: In this paper, we present a novel method for the still-to-video face recognition problem in unconstrained environments. Due to variations in head pose, facial expression, lighting conditions and image resolution, it is infeasible to directly match faces from still images and video frames. We regard samples from these two distinct sources as multi-modal or heterogeneous data, and use latent identity vectors in a common subspace to connect the two modalities. Unlike the conventional least squares regression problem, the unknown latent variables are treated as responses to be computed. In addition, several constraint and regularization terms are introduced into the optimization equation. This method is thus called regularized latent least squares regression. We divide the original problem into two sub-problems and develop an alternating optimization algorithm to solve it. Experimental results on two public datasets demonstrate the effectiveness of our method.

Paper Nr: 92
Title:

Using Inertial Data to Enhance Image Segmentation - Knowing Camera Orientation Can Improve Segmentation of Outdoor Scenes

Authors:

Osian Haines, David Bull and J. F. Burn

Abstract: In the context of semantic image segmentation, we show that knowledge of world-centric camera orientation (from an inertial sensor) can be used to improve classification accuracy. This works because certain structural classes (such as the ground) tend to appear in certain positions relative to the viewer. We show that orientation information is useful in conjunction with typical image-based features, and that fusing the two results in substantially better classification accuracy than either alone – we observed an increase from 61% to 71% classification accuracy, over the six classes in our test set, when orientation information was added. The method is applied to segmentation using both points and lines, and we also show that combining points with lines further improves accuracy. This work is done towards our intended goal of visually guided locomotion for either an autonomous robot or human.

Paper Nr: 102
Title:

Implicit Shape Models for 3D Shape Classification with a Continuous Voting Space

Authors:

Viktor Seib, Norman Link and Dietrich Paulus

Abstract: Recently, different adaptations of Implicit Shape Models (ISM) for 3D shape classification have been presented. In this paper we propose a new method with a continuous voting space and keypoint extraction by uniform sampling. We evaluate different sets of typical parameters involved in the ISM algorithm and compare the proposed algorithm on a large public dataset with state of the art approaches.

Paper Nr: 116
Title:

Weakly Supervised Object Localization with Large Fisher Vectors

Authors:

Josip Krapac and Siniša Šegvić

Abstract: We propose a novel method for learning object localization models in a weakly supervised manner, by employing images annotated with object class labels but not with object locations. Given an image, the learned model predicts both the presence of the object class in the image and the bounding box that determines the object location. The main ingredients of our method are a large Fisher vector representation and a sparse classification model enabling efficient evaluation of patch scores. The method is able to reliably detect very small objects with some intra-class variation in reasonable time. Experimental validation has been performed on a public dataset and we report localization performance comparable to strongly supervised approaches.

Paper Nr: 139
Title:

Real-time Curve-skeleton Extraction of Human-scanned Point Clouds - Application in Upright Human Pose Estimation

Authors:

Frederic Garcia and Bjorn Ottersten

Abstract: This paper presents a practical and robust approach for upright human curve-skeleton extraction. Curve-skeletons are object descriptors that represent a simplified version of the geometry and topology of a 3-D object. The curve-skeleton of a human-scanned point set enables the approximation of the underlying skeletal structure and thus the estimation of the body configuration (human pose). In contrast to most curve-skeleton extraction methodologies from the literature, we herein propose a real-time curve-skeleton extraction approach that applies to scanned point clouds, independently of the object's complexity and/or the amount of noise within the depth measurements. The experimental results show the ability of the algorithm to extract a centered curve-skeleton within the 3-D object, with the same topology and with unit thickness. The proposed approach is intended for real-world applications and hence handles large portions of missing data due to occlusions, acquisition hindrances or registration inaccuracies.

Paper Nr: 140
Title:

A Real-time, Automatic Target Detection and Tracking Method for Variable Number of Targets in Airborne Imagery

Authors:

Tunç Alkanat, Emre Tunali and Sinan Öz

Abstract: In this study, a real-time, fully automatic detection and tracking method is introduced that is capable of handling a variable number of targets. The procedure starts with multiple-scale target hypothesis generation, in which the distinctive targets are revealed. To measure distinctiveness, the blobs of interest are first detected based on Canny edge detection with adaptive thresholding, which is achieved by a feedback loop considering the number of target hypotheses in the previous frame. Then, the irrelevant blobs are eliminated by two metrics, namely effective saliency and compactness. To handle missing and noisy observations, the temporal consistency of each target hypothesis is evaluated and outlier observations are eliminated. To merge data from multiple scales, a target likelihood map is generated using kernel density estimation, in which the weights of the observations are determined by temporal consistency and scale factor. Finally, significant targets are selected by an adaptive thresholding scheme, and tracking is achieved by minimizing the spatial distance between the selected targets in consecutive frames.
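
The target likelihood map described above can be illustrated with a weighted Gaussian kernel density estimate over the hypothesis locations. The sketch below is an assumption (the abstract does not specify the kernel), and `bandwidth` is a hypothetical parameter.

```python
import numpy as np

def likelihood_map(shape, observations, weights, bandwidth=5.0):
    """Weighted Gaussian KDE over image coordinates.

    observations: list of (row, col) target hypotheses (possibly from several scales)
    weights:      per-observation weights (e.g. temporal consistency times scale factor)
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape)
    for (r, c), w in zip(observations, weights):
        d2 = (rows - r) ** 2 + (cols - c) ** 2
        density += w * np.exp(-d2 / (2.0 * bandwidth ** 2))
    return density
```

Thresholding this map adaptively would then yield the significant targets.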

Paper Nr: 153
Title:

Linear Discriminant Analysis for Zero-shot Learning Image Retrieval

Authors:

Sovann EN, Frédéric Jurie, Stéphane Nicolas, Caroline Petitjean and Laurent Heutte

Abstract: This paper introduces a new distance function for comparing images in the context of content-based image retrieval. Given a query and a large dataset to be searched, the system has to provide the user – as efficiently as possible – with a list of images ranked according to their distance to the query. Because of computational issues, traditional image search systems are generally based on conventional distance functions such as the Euclidean distance or the dot product, avoiding the use of any training data or expensive online metric learning algorithms. The drawback is that, in this case, the system can hardly cope with the variability of image contents. This paper proposes a simple yet efficient zero-shot learning algorithm that can learn a query-adapted distance function from a single image (the query) or from a few images (e.g. some user-selected images in a relevance feedback iteration), hence improving the quality of the retrieved images. This allows our system to work with any object categories without requiring any training data, and it is hence more applicable in real-world use cases. More interestingly, our system can learn the metric on the fly, at almost no cost, and the cost of the ranking function is as low as that of the dot product. By allowing the system to learn to rank the images, significantly and consistently improved results (over the conventional approaches) have been observed on the Oxford5k, Paris6k and Holiday1k datasets.
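
A classical way to obtain such a query-adapted linear metric at near-zero online cost is exemplar LDA: background statistics are precomputed offline, and each query needs only one linear solve. The sketch below illustrates the general idea, not necessarily the authors' exact formulation; `reg` is a hypothetical regularization constant.

```python
import numpy as np

def lda_query_weights(query, bg_features, reg=1e-3):
    """Query-adapted linear scoring from a single positive example.

    bg_features: (N, D) matrix of background descriptors (computed offline).
    Returns w = Sigma^-1 (query - mu); ranking then costs one dot product per image.
    """
    mu = bg_features.mean(axis=0)
    cov = np.cov(bg_features, rowvar=False)
    cov += reg * np.eye(cov.shape[0])     # regularize for numerical stability
    return np.linalg.solve(cov, query - mu)
```

Images are then ranked by `w @ x`: the higher the score, the closer the image descriptor `x` is to the query relative to the background distribution.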

Paper Nr: 159
Title:

Using Action Objects Contextual Information for a Multichannel SVM in an Action Recognition Approach based on Bag of VisualWords

Authors:

Jordi Bautista-Ballester, Jaume Vergés-Llahí and Domenec Puig

Abstract: Classifying web videos using a Bag of Words (BoW) representation has received increased attention due to its computational simplicity and good performance. The increasing number of categories, including actions with high confusion, and the addition of significant contextual information have led most authors to focus their efforts on the combination of descriptors. In this vein, we propose to use a multikernel Support Vector Machine (SVM) with a contrasted selection of kernels. It is widely accepted that using descriptors that provide different kinds of information tends to increase performance. To this end, our approach introduces contextual information, i.e. objects directly related to the performed action, by pre-selecting a set of points belonging to objects to calculate the codebook. In order to know whether a point is part of an object, the objects are first tracked by matching consecutive frames, and the object bounding box is calculated and labeled. We code the action videos using the BoW representation with the object codewords and introduce them to the SVM as an additional kernel. Experiments have been carried out on two action databases, KTH and HMDB; the results show a significant improvement with respect to other similar approaches.

Paper Nr: 178
Title:

A Unified Framework for Coarse-to-Fine Recognition of Traffic Signs using Bayesian Network and Visual Attributes

Authors:

Hamed Habibi Aghdam, Elnaz Jahani Heravi and Domenec Puig

Abstract: Recently, impressive results have been reported for recognizing traffic signs. Yet, these methods are still far from real-world applications. To the best of our knowledge, all methods in the literature have focused on numerical results rather than applicability. First, they are not able to deal with novel inputs such as the false-positive results of the detection module. In other words, if the input to these methods is a non-traffic-sign image, they will classify it into one of the traffic sign classes. Second, adding a new sign to the system requires retraining the whole system. In this paper, we propose a coarse-to-fine method using visual attributes that is easily scalable and, importantly, is able to detect novel inputs and transfer its knowledge to a newly observed sample. To correct misclassified attributes, we build a Bayesian network capturing the dependencies between the attributes and find their most probable explanation given the observations. Experimental results on the benchmark dataset indicate that our method outperforms the state-of-the-art methods and also possesses three important properties: novelty detection, scalability, and the provision of semantic information.

Paper Nr: 184
Title:

Recognition of Human Actions using Edit Distance on Aclet Strings

Authors:

Luc Brun, Pasquale Foggia, Alessia Saggese and Mario Vento

Abstract: In this paper we propose a novel method for human action recognition based on string edit distance. A two-layer representation is introduced in order to exploit the temporal sequence of events: a first representation layer is obtained using a feature vector computed from depth images. Then, each action is represented as a sequence of symbols, where each symbol, corresponding to an elementary action (aclet), is obtained according to a dictionary previously defined during the learning phase. The similarity between two actions is finally computed in terms of string edit distance, which allows the system to deal with actions of different lengths as well as different temporal scales. The experimentation has been carried out on two widely adopted datasets, namely the MIVIA and MHAD datasets, and the obtained results, compared with state-of-the-art approaches, confirm the effectiveness of the proposed method.
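
The string edit distance at the core of the matching step is the standard Levenshtein distance between aclet strings; a minimal two-row dynamic-programming sketch (illustrative, not the authors' code):

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (e.g. aclet strings)."""
    # prev[j] holds the distance between the prefixes processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

Because insertions and deletions are allowed, two actions of different lengths (or sampled at different temporal scales) can still be compared directly.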

Paper Nr: 186
Title:

Scene Representation and Anomalous Activity Detection using Weighted Region Association Graph

Authors:

D.P. Dogra, R. D. Reddy, K.S. Subramanyam, A. Ahmed and H. Bhaskar

Abstract: In this paper we present a novel method for anomalous activity detection using systematic trajectory analysis. First, the visual scene is segmented into constituent regions by attaching importance to them based on the motion dynamics of targets in the scene. Next, a structured representation of these segmented regions in the form of a region association graph (RAG) is constructed. Finally, anomalous activity is detected by benchmarking a target's trajectory against the RAG. We have evaluated our proposed algorithm and compared it against competent baselines using videos from publicly available as well as in-house datasets. Our results indicate high accuracy in localizing anomalous segments and demonstrate that the proposed algorithm has several compelling advantages when applied to scene analysis in autonomous visual surveillance.

Paper Nr: 187
Title:

Estimation of Human Orientation using Coaxial RGB-Depth Images

Authors:

Fumito Shinmura, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase and Hironobu Fujiyoshi

Abstract: Estimation of human orientation contributes to improving the accuracy of human behavior recognition. However, estimation of human orientation is a challenging task because of the variable appearance of the human body. The wide variety of poses, sizes and clothes, combined with a complicated background, degrades the estimation accuracy. Therefore, we propose a method for estimating human orientation using coaxial RGB-Depth images. This paper proposes a Depth-Weighted Histogram of Oriented Gradients (DWHOG) feature calculated from RGB and depth images. By using a depth image, the outline of the human body and the texture of the background can be easily distinguished. In the proposed method, a region with a large depth gradient is given a large weight; features at the outline of the human body are therefore enhanced, allowing robust estimation even with complex backgrounds. In order to combine RGB and depth images, we utilize a newly available single-chip RGB-ToF camera, which can capture RGB and depth images along the same optical axis. We experimentally confirmed that the proposed method can estimate human orientation robustly against complex backgrounds, compared to a method using conventional HOG features.

Paper Nr: 198
Title:

Semi-automatic Hand Detection - A Case Study on Real Life Mobile Eye-tracker Data

Authors:

Stijn De Beugher, Geert Brône and Toon Goedemé

Abstract: In this paper we present a highly accurate algorithm for the detection of human hands in real-life 2D image sequences. Current state-of-the-art algorithms show relatively poor detection accuracy on unconstrained, challenging images. To overcome this, we introduce a detection scheme in which we combine several well-known detection techniques with an advanced elimination mechanism to reduce false detections. Furthermore, we present a novel (semi-)automatic framework achieving detection rates of up to 100%, with only minimal manual input. This is a useful tool in supervised applications where an error-free detection result is required at the cost of a limited amount of manual effort. As an application, this paper focuses on the analysis of video data of human-human interaction, collected with the scene camera of mobile eye-tracking glasses. This type of data is typically annotated manually for relevant features (e.g. visual fixations on gestures), which is a time-consuming, tedious and error-prone task. The use of our semi-automatic approach reduces the amount of manual analysis dramatically. We also present a new, fully annotated benchmark dataset for this application, which we have made publicly available.

Paper Nr: 205
Title:

Estimating Human Actions Affinities Across Views

Authors:

Nicoletta Noceti, Alessandra Sciutti, Francesco Rea, Francesca Odone and Giulio Sandini

Abstract: This paper deals with the problem of estimating the affinity level between different types of human actions observed from different viewpoints. We analyse simple repetitive upper-body human actions with the goal of producing a view-invariant model from simple motion cues inspired by studies on human perception. We adopt a simple descriptor that summarizes the evolution of the spatio-temporal curvature of the trajectories, which we use to evaluate the similarity between pairs of actions in a multi-level matching. We experimentally verified the presence of semantic connections between actions across views, inferring a relation graph that shows such affinities.

Paper Nr: 214
Title:

Efficient Implementation of a Recognition System using the Cortex Ventral Stream Model

Authors:

Ahmad Bitar, Mohammad M. Mansour and Ali Chehab

Abstract: In this paper, an efficient implementation of a recognition system based on the original HMAX model of the visual cortex is proposed. Various optimizations targeted at increasing accuracy at the so-called layers S1, C1, and S2 of the HMAX model are proposed. At layer S1, unimportant information such as illumination and expression variations is eliminated from the images. Each image is then convolved with 64 separable Gabor filters in the spatial domain. At layer C1, the minimum scale values are embedded into the maximum ones using an additive embedding space. At layer S2, the prototypes are generated more efficiently using the Partitioning Around Medoids (PAM) clustering algorithm. The impact of these optimizations in terms of accuracy and computational complexity was evaluated on the Caltech101 database and compared with the baseline performance using support vector machine (SVM) and nearest neighbor (NN) classifiers. The results show that our model provides a significant improvement in accuracy at the S1 layer, by more than 10%, while the computational complexity is also reduced. The accuracy is slightly increased for both approximations at the C1 and S2 layers.

Paper Nr: 247
Title:

Unsupervised Segmentation Evaluation for Image Annotation

Authors:

Annette Morales-González, Edel García-Reyes and Luis Enrique Sucar

Abstract: Unsupervised segmentation evaluation measures are usually validated against human-generated ground truth. Nevertheless, with the recent growth of image classification methods that use hierarchical segmentation-based representations, it would be desirable to assess the performance of unsupervised segmentation evaluation in selecting the most suitable levels for recognition tasks. Another problem is that unsupervised segmentation evaluation measures use only low-level features, which makes it difficult to evaluate how well an object is outlined. In this paper we propose four semantic measures that, combined with other state-of-the-art measures, improve the evaluation results; we also validate the results of each unsupervised measure against the ground truth of an image annotation algorithm, showing that using measures that try to emulate human behaviour is not necessarily what an automatic recognition algorithm may need. We employed the Stanford Background Dataset to validate an image annotation algorithm that includes segmentation evaluation as a starting point, and the proposed combination of unsupervised measures showed the best annotation accuracy.

Paper Nr: 262
Title:

CURFIL: Random Forests for Image Labeling on GPU

Authors:

Hannes Schulz, Benedikt Waldvogel, Rasha Sheikh and Sven Behnke

Abstract: Random forests are popular classifiers for computer vision tasks such as image labeling or object detection. Learning random forests on large datasets, however, is computationally demanding. Slow learning impedes model selection and scientific research on image features. We present an open-source implementation that significantly accelerates both random forest learning and prediction for image labeling of RGB-D and RGB images on GPU when compared to an optimized multi-core CPU implementation. We use the fast training to conduct hyper-parameter searches, which significantly improves on previous results on the NYU depth v2 dataset. Our prediction runs in real time at VGA resolution on a mobile GPU and has been used as data term in multiple applications.

Short Papers
Paper Nr: 12
Title:

The Gradient Product Transform for Symmetry Detection and Blood Vessel Extraction

Authors:

Christoph Dalitz, Regina Pohle-Fröhlich, Fabian Schmitt and Manuel Jeltsch

Abstract: The "gradient product transform" is a recently proposed image filter that assigns each image point a symmetry score based on scalar products of gradients. In this article, we show that the originally suggested method for finding the radius of the symmetry region is unreliable, and a more robust method is presented. Moreover, we extend the symmetry transform to rectangular symmetry regions, so that it is more robust with respect to skew, and the transform is generalised to also work with three-dimensional image data. We apply the transform to two different problems: detection of objects with rotational symmetry, and blood vessel extraction from medical images. In an experimental comparison with other solutions to these problems, the gradient product transform performs comparably to the best-known algorithm for rotational symmetry detection, and better than the vesselness filter for blood vessel extraction.
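
The transform's core can be sketched directly from its definition: gradients at point pairs mirrored about a symmetry center are anti-parallel, so their negated scalar products accumulate into a positive score. The square-region, fixed-radius version below is a simplified illustration, not the paper's rectangular, automatic-radius variant.

```python
import numpy as np

def symmetry_score(img, p, radius):
    """Symmetry score at pixel p = (row, col): sum of negated scalar products of
    gradients at point pairs mirrored about p, over a square window."""
    gy, gx = np.gradient(img.astype(float))
    r0, c0 = p
    score = 0.0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if (dr, dc) <= (0, 0):          # visit each mirrored pair only once
                continue
            a = (r0 + dr, c0 + dc)
            b = (r0 - dr, c0 - dc)
            score -= gx[a] * gx[b] + gy[a] * gy[b]
    return score
```

For a rotationally symmetric pattern the two gradients in each pair cancel exactly, so every pair contributes its full gradient energy to the score; asymmetric neighbourhoods score near zero or negative.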

Paper Nr: 26
Title:

Self-scaling Kinematic Hand Skeleton for Real-time 3D Hand-finger Pose Estimation

Authors:

Kristian Ehlers and Jan Helge Klüssendorff

Abstract: Since low-cost RGB-D sensors became available, gesture detection has gained more and more interest in the field of human-computer and human-robot interaction. It is possible to navigate through interactive menus by waving the hand and to confirm menu items by pointing at them. Such applications require real-time body or hand-finger pose estimation algorithms. This paper presents a kinematic approach to estimating the full pose of the hand, including the finger joints’ angles. A self-scaling kinematic hand skeleton model is presented and fitted to the 3D data of the hand in real time on standard hardware, at up to 30 frames per second without using a GPU. The approach is based on least-squares minimization and an intelligent choice of the error function. The tracking accuracy is evaluated on a recorded dataset as well as simulated data. Qualitative results are presented, emphasizing the tracking ability under hard conditions such as full hand turning and self-occlusion.

Paper Nr: 38
Title:

A Probabilistic Feature Fusion for Building Detection in Satellite Images

Authors:

Dimitrios Konstantinidis, Tania Stathaki, Vasileios Argyriou and Nikos Grammalidis

Abstract: Building segmentation from 2D images can be a very challenging task due to the variety of objects that appear in an urban environment. Many algorithms that attempt to automatically extract buildings from satellite images face serious problems and limitations. In this paper, we address some of these problems by applying a novel approach that is based on the fusion of Histogram of Oriented Gradients (HOG), Normalized Difference Vegetation Index (NDVI) and Features from Accelerated Segment Test (FAST) features. We will demonstrate that by taking advantage of the multi-spectral nature of a satellite image and by employing a probabilistic fusion of the aforementioned features, we manage to create a novel methodology that increases the performance of a building detector compared to other state-of-the-art methods.

Paper Nr: 45
Title:

Solving Orientation Duality for 3D Circular Features using Monocular Vision

Authors:

Alaa AlZoubi, Tanja K. Kleinhappel, Thomas W. Pike, Bashir Al-Diri and Patrick Dickinson

Abstract: Methods for estimating the 3D orientation of circular features from a single image result in at least two solutions, of which only one corresponds to the actual orientation of the object. In this paper we propose two new methods for solving this “orientation duality” problem using a single image. Our first method estimates the resulting ellipse projections in 2D space for the given solutions, then matches them against the image ellipse to infer the true orientation. The second method compares solutions from two co-planar circle features with different centre points to identify their mutual true orientation. Experimental results show the robustness and effectiveness of our methods for solving the duality problem, and that they perform better than state-of-the-art methods.

Paper Nr: 54
Title:

Improving Quality of Training Samples Through Exhaustless Generation and Effective Selection for Deep Convolutional Neural Networks

Authors:

Takayoshi Yamashita, Taro Watasue, Yuji Yamauchi and Hironobu Fujiyoshi

Abstract: Deep convolutional neural networks require a huge number of data samples to train efficient networks. Although many benchmarks manage to create abundant samples for training, they lack efficiency when trying to train convolutional neural networks to their full potential. Data augmentation is one solution to this problem, but it does not consider the quality of samples, i.e. whether the augmented samples are actually suitable for training. In this paper, we propose a method that allows us to select effective samples from an augmented sample set. The achievements of our method are 1) the ability to generate a large number of augmented samples from images with labeled data and multiple background images; and 2) the ability to select effective samples from the augmented ones through iterations of parameter updating during the training process. We utilized exhaustless sample generation and effective sample selection to perform recognition and segmentation tasks. Our approach obtained the best performance in both tasks when compared to other methods, whether or not they use sample generation and/or selection.

Paper Nr: 76
Title:

Reliable Image Matching using Binarized Gradient Features Obtained with Multi-flash Camera

Authors:

Yasunori Sakuramoto, Yuichi Kanematsu, Shuichi Akizuki, Manabu Hashimoto, Kiyotaka Watanabe and Makito Seki

Abstract: In this paper, we propose an object detection method using features that describe information about the concavo-convex shape of an object, obtained with a small camera that controls the illumination direction. A feature image containing information about the shape of the object is generated by integrating images obtained by turning on, one by one, light-emitting diodes (LEDs) annularly arranged around the camera. Our method can reliably detect a texture-less object by using this feature image in the matching process. Experiments using 200 actual images confirmed that the method achieves a 97.5% recognition success rate and a 4.62 sec processing time.

Paper Nr: 79
Title:

Detection of Low-textured Objects

Authors:

Christopher Bulla and Andreas Weissenburger

Abstract: In this paper, we present a descriptor architecture, SIFText, that combines texture, shape and color information in one descriptor. The respective descriptor parts are weighted according to the underlying image content; thus we are able to detect and locate low-textured objects in images without performance losses for textured objects. We furthermore present a matching strategy, besides the frequently used nearest-neighbor matching, that has been especially designed for the proposed descriptor. Experiments on synthetically generated images show the improvement of our descriptor in comparison to the standard SIFT descriptor. We show that we are able to detect more features in non-textured regions, which facilitates accurate detection of non-textured objects. We further show that the performance of our descriptor is comparable to that of the SIFT descriptor for textured objects.

Paper Nr: 87
Title:

A Shape Consistency Measure for Improving the Generalized Hough Transform - Modified Voting Procedure for Discriminative Generalized Hough Transform based on Random Forest Confidence Measure

Authors:

Ferdinand Hahmann, Gordon Böer, Eric Gabriel, Carsten Meyer and Hauke Schramm

Abstract: The Discriminative Generalized Hough Transform (DGHT) is a general object localization approach. Based on a training corpus with annotated target point locations, it employs a discriminative training technique to generate weighted shape models for use in a standard GHT voting procedure. The method has been shown to successfully cover medium target object variability by aggregating model points, representing the different variants, in a single model. However, due to the independent treatment of model points in the GHT voting, mutually exclusive variations may support the same localization hypothesis, leading to false positives. We address this problem by analyzing the spatial pattern of model points voting for a specific Hough cell, and learning the structural differences between successful and unsuccessful localizations. Random Forests are utilized to rate the regularity of model point patterns and provide the probability of a “regular shape”, indicating a successful localization. The approach is evaluated on a public corpus containing 3830 portrait images with strong head pose variation, with a localization success rate of 99.2% for the irises of both eyes. This is an improvement of 2% compared to the DGHT baseline system, which demonstrates the potential of the novel method to eliminate an important source of mislocalizations.

Paper Nr: 88
Title:

Deep Learning for Facial Keypoints Detection

Authors:

Mikko Haavisto, Arto Kaarna and Lasse Lensu

Abstract: A new area of machine learning research called deep learning has moved machine learning closer to one of its original goals: artificial intelligence and feature learning. Originally, the key idea of training deep networks was to pretrain models in a completely unsupervised way and then fine-tune the parameters for the task at hand using supervised learning. In this study, deep learning is applied to facial keypoint detection. The task is to predict the positions of 15 keypoints on grayscale face images. Each predicted keypoint is specified by a real-valued pair in the space of pixel coordinates. In the experiments, we pretrained a Deep Belief Network (DBN) and then performed discriminative fine-tuning. We varied the depth and size of the network. We tested both deterministic and sampled hidden activations, and the effect of additional unlabeled data on pretraining. The experimental results show that our model provides better results than the publicly available benchmarks for the dataset.

Paper Nr: 121
Title:

Illumination Estimation and Relighting using an RGB-D Camera

Authors:

Yohei Ogura, Takuya Ikeda, Francois de Sorbier and Hideo Saito

Abstract: In this paper, we propose a relighting system combined with an illumination estimation method using an RGB-D camera. Relighting techniques can achieve the photometric registration of composite images. They often need the illumination environments of the scene, which include a target object and the background scene. Some relighting methods obtain the illumination environments beforehand; in this case, they cannot be used under unknown, dynamic illumination environments. Some on-line illumination estimation methods need light probes, which can intrude on the scene geometry. In our method, the illumination environment is estimated on-line from pixel intensity, a normal map and surface reflectance, based on inverse rendering. The normal map of the arbitrary object, which is used in both the illumination estimation part and the relighting part, is calculated from the denoised depth image on each frame. Relighting is achieved by calculating the ratio between the estimated illumination environments of each scene. Thus our implementation can be used for dynamic illumination or a dynamic object.

Paper Nr: 136
Title:

A Recommendation System for Paintings using Bag of Keypoints and Dominant Color Descriptors

Authors:

Ricardo Ribani and Mauricio Marengoni

Abstract: Determining the visual description of a painting is an interesting task that can be used in different applications, such as retrieval, classification and recommendation. A painting can differ from others depending on the time period in which it was painted, the genre, and the art movement in which the author lived. This paper presents an approach for content-based image retrieval applied to art paintings using the concept of bag of keypoints and the SURF detector. A descriptor for dominant color is also used, weighted to improve visual retrieval.

Paper Nr: 138
Title:

Active Perception - Improving Perception Robustness by Reasoning about Context

Authors:

Andreas Hofmann and Paul Robertson

Abstract: Existing machine perception systems are too inflexible, and therefore cannot adapt well to environment uncertainty. We address this problem through a more dynamic approach in which reasoning about context is used to actively and effectively allocate and focus sensing and action resources. This Active Perception approach prioritizes the system’s overall goals, so that perception and situation awareness are well integrated with actions to focus all efforts on these goals in an optimal manner. We use a POMDP (Partially Observable Markov Decision Process) framework, but do not attempt to compute a comprehensive control policy, as this is intractable for practical problems. Instead, we employ Belief State Planning to compute point solutions from an initial state to a goal state set. This approach automatically generates action sequences for sensing operations that reduce uncertainty in the belief state, and ultimately achieve the goal state set.

Paper Nr: 181
Title:

Good Practices on Hand Gestures Recognition for the Design of Customized NUI

Authors:

Damiano Malafronte and Nicoletta Noceti

Abstract: In this paper we consider the problem of recognizing dynamic human gestures in the context of human-machine interaction. We are particularly interested in the so-called Natural User Interfaces, a new modality based on a more natural and intuitive way of interacting with a digital device. In our work, a user can interact with a system by performing a set of encoded hand gestures in front of a webcam. We designed a method that first classifies hand poses guided by a finger detection procedure, and then recognizes known gestures with a syntactic approach. To this end, we collect a sequence of hand poses over time to build a linguistic gesture description. The known gestures are formalized using a generative grammar. Then, at runtime, a parser allows us to perform gesture recognition leveraging the production rules of the grammar. For finger detection, we propose a new method that starts from a distance transform of the hand region and iteratively scans this region according to the distance values, moving from a fingertip to the hand palm. We experimentally validated our approach, showing both the hand pose classification and gesture recognition performance.

Paper Nr: 190
Title:

Natural Scene Character Recognition Without Dependency on Specific Features

Authors:

Muhammad Ali and Hassan Foroosh

Abstract: Current methods in scene character recognition rely heavily on the discriminative power of local features, such as HoG, SIFT, Shape Contexts (SC), Geometric Blur (GB), etc. One problem with this approach is that the local features are rasterized in an ad hoc manner into a single vector, thus perturbing the spatial correlations that carry crucial information. To eliminate this feature dependency and the associated problems, we propose a holistic solution as follows: for each character to be recognized, we stack a set of training images to form a 3-mode tensor. Each training tensor is then decomposed into a linear superposition of ‘k’ rank-1 matrices, whereby the rank-1 matrices form a basis spanning the solution subspace of the character class. For a test image to be classified, we obtain projections onto the pre-computed rank-1 bases of each class, and recognize it as the class for which the inner product of mixing vectors is maximized. We use challenging natural scene character datasets, namely Chars74K, ICDAR2003, and SVT-CHAR. We achieve results better than several baseline methods based on local features (e.g. HoG) and show that leave-random-one-out cross-validation yields even better recognition performance, thus justifying our intuition about the importance of feature independence and the preservation of spatial correlations in recognition.

Paper Nr: 195
Title:

A Relevant Visual Feature Selection Approach for Image Retrieval

Authors:

Olfa Allani, Nedra Mellouli, Hajer Baazaoui Zghal, Herman Akdag and Henda Ben Ghzala

Abstract: Content-Based Image Retrieval approaches have been marked by the semantic gap (inconsistency) between the perception of the user and the visual description of the image. This inconsistency is often linked to the use of predefined visual features, randomly selected and applied regardless of the application domain. In this paper we propose an approach that adapts the selection of visual features to the semantic content, ensuring coherence between them. We first design visual and semantic descriptive ontologies. These ontologies are then explored by association rules aiming to link a semantic descriptor (a concept) to a set of visual features. The obtained feature collections are selected according to the annotated query images. Different strategies have been experimented with, and their results show an improvement of the retrieval task based on relevant feature selection.

Paper Nr: 210
Title:

Fast Rotation Invariant Object Detection with Gradient based Detection Models

Authors:

Floris De Smedt and Toon Goedemé

Abstract: Accurate object detection has been studied thoroughly over the years. Although these techniques have become very precise, they lack the capability to cope with a rotated appearance of the object. In this paper we tackle this problem with a two-step approach. First, we train a specific model for each orientation we want to cover. In addition, we propose the use of a rotation map that contains the predicted orientation information at a specific location, based on the dominant orientation. This helps us reduce the number of models that are evaluated at each location. On 3 datasets, we obtain a high speed-up while still maintaining accurate rotated object detection.

Paper Nr: 237
Title:

Route Segmentation into Speed Limit Categories by using Image Analysis

Authors:

Philippe Foucher, Emmanuel Moebel and Pierre Charbonnier

Abstract: In this contribution, we address the problem of road sequence segmentation into speed limit categories, as perceived by the user. We propose an algorithm that is based on two processing steps. First, the images are classified independently using a standard random forest algorithm. Low-level and high-level approaches are proposed and compared. In the second phase, a sequential smoothing of the results using different filters is applied. An evaluation based on two databases of images with ground truth shows the pros and cons of the methods.

Paper Nr: 238
Title:

Various Fusion Schemes to Recognize Simulated and Spontaneous Emotions

Authors:

Sonia Gharsalli, Hélène Laurent, Bruno Emile and Xavier Desquesnes

Abstract: This paper investigates the performance of combining geometric features and appearance features with various fusion strategies in a facial emotion recognition application. Geometric features are extracted by a distance-based method; appearance features are extracted by a set of Gabor filters. Various fusion methods are proposed from two principal classes, namely early fusion and late fusion. The former combines features in the feature space; the latter fuses both feature types in the decision space by a statistical rule or a classification method. The distance-based method, the Gabor method and the hybrid methods are evaluated on simulated (CK+) and spontaneous (FEEDTUM) databases. The comparison between methods shows that late fusion methods achieve better recognition rates than the early fusion method. Moreover, late fusion methods based on statistical rules perform better than the other hybrid methods for simulated emotion recognition. In the recognition of spontaneous emotions, however, the statistical-based methods improve the recognition of positive emotions, while the classification-based method slightly enhances sadness and disgust recognition. A comparison with hybrid methods from the literature is also made.

Paper Nr: 241
Title:

Upper Body Detection and Feature Set Evaluation for Body Pose Classification

Authors:

Laurent Fitte-Duval, Alhayat Ali Mekonnen and Frédéric Lerasle

Abstract: This work investigates some visual functionalities required in Human-Robot Interaction (HRI) to evaluate the intention of a person to interact with another agent (robot or human). Analyzing the upper part of the human body, which includes the head and the shoulders, we obtain essential cues on the person’s intention. We propose a fast and efficient upper body detector and an approach to estimate the upper body pose in 2D images. The upper body detector, derived from a state-of-the-art pedestrian detector, identifies people using Aggregated Channel Features (ACF) and fast feature pyramids, whereas the upper body pose classifier uses a sparse representation technique to recognize their shoulder orientation. The proposed detector exhibits state-of-the-art results on a public dataset in terms of both detection performance and frame rate. We also present an evaluation of different feature set combinations for pose classification using upper body images and report promising results despite the associated challenges.

Paper Nr: 276
Title:

Automatic Road Segmentation of Traffic Images

Authors:

Chiung-Yao Fang, Han-Ping Chou, Jung-Ming Wang and Sei-Wang Chen

Abstract: Automatic road segmentation plays an important role in many vision-based traffic applications. It provides a priori information for preventing the interferences of irrelevant objects, activities, and events that take place outside road areas. The proposed road segmentation method consists of four major steps: background-shadow model generation and updating, moving object detection and tracking, background pasting, and road location. The full road surface is finally recovered from the preliminary one using a progressive fuzzy theoretic shadowed sets technique. A large number of video sequences of traffic scenes under various conditions have been employed to demonstrate the feasibility of the proposed road segmentation method.

Paper Nr: 280
Title:

Interest Area Localization using Trajectory Analysis in Surveillance Scenes

Authors:

D. P. Dogra, A. Ahmed and H. Bhaskar

Abstract: In this paper, a method for detecting and localizing interest areas in a surveillance scene by analyzing the motion trajectories of multiple interacting targets is proposed. Our method is based on a theoretical model representing the importance distribution of the different areas (represented as rectangular blocks) present in a surveillance scene. The importance of each block is modeled as a function of the total time spent by multiple targets and their relative velocity whilst passing through the blocks. Extensive experimentation and statistical validation with empirical data have shown that the proposed method follows the process of the theoretical model. The accuracy of our method in localizing interest areas has been verified, and its superiority demonstrated against baseline methods using the publicly available CAVIAR and ViSOR datasets and a scenario-specific in-house surveillance dataset.
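The importance model can be illustrated with a minimal sketch in which a block's score accumulates with the time targets spend in it, so slow passage raises importance. The grid size, extent and pure dwell-time weighting are our own simplifications; the paper's model also factors in relative velocity:

```python
import numpy as np

def interest_map(trajectories, grid=(4, 4), extent=(100.0, 100.0)):
    """Toy importance map over rectangular blocks: each trajectory
    point adds one frame of dwell time to the block it falls in,
    and the map is normalised into a distribution."""
    gh, gw = grid
    imp = np.zeros(grid)
    for traj in trajectories:            # traj: sequence of (x, y) per frame
        for x, y in traj:
            i = min(int(y / extent[1] * gh), gh - 1)
            j = min(int(x / extent[0] * gw), gw - 1)
            imp[i, j] += 1.0
    total = imp.sum()
    return imp / total if total else imp
```

A target lingering in one block makes that block the most "interesting" one, which is the qualitative behaviour the abstract describes.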

Paper Nr: 300
Title:

Video-to-video Pose and Expression Invariant Face Recognition using Volumetric Directional Pattern

Authors:

Vijayan Asari and Almabrok E. Essa

Abstract: Face recognition in video has attracted attention as a covert method of human identification in surveillance systems. In this paper, we propose an end-to-end video face recognition system that addresses the difficult problem of identifying human faces in video in the presence of large variations in facial pose and expression and poor video resolution. The proposed descriptor, named Volumetric Directional Pattern (VDP), is an oriented, multi-scale volumetric descriptor that is able to extract and fuse information from multiple frames, temporal (dynamic) information, and multiple poses and expressions of faces in the input video to produce feature vectors, which are used to match against all the videos in the database. To make the approach computationally simple and easy to extend, a key-frame extraction method is employed; therefore, only the frames that contain important information are used for further processing instead of analysing all the frames in the video. The performance evaluation of the proposed VDP algorithm is conducted on a publicly available database (the YouTube celebrities dataset), where we observe promising recognition rates.

Paper Nr: 305
Title:

Robust Human Detection using Bag-of-Words and Segmentation

Authors:

Yuta Tani and Kazuhiro Hotta

Abstract: It has been reported that Bag-of-Words (BoW) is effective for detecting humans with large pose changes and occlusions in still images. BoW can produce a consistent representation even if a human undergoes pose changes and occlusions. However, the conventional method represents all information within a bounding box as positive data. Since the bounding box is a rectangle enclosing a human, the background region is also included in the BoW representation. The background region affects the BoW representation, and the detection accuracy decreases. Thus, in this paper, we propose to segment the human region by GrabCut or Color Names, which reduces the influence of the background and lets us obtain the BoW histogram from the human region only. The effectiveness of our method is demonstrated by comparison with the deformable part model (DPM) and the conventional BoW method.
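The central step, building the BoW histogram only from features that fall inside the segmented human region, can be sketched roughly as follows. This is an illustrative NumPy version with hypothetical inputs, not the authors' pipeline; any real system would supply descriptors from a feature detector and a mask from GrabCut or Color Names:

```python
import numpy as np

def bow_histogram(descriptors, keypoint_xy, mask, codebook):
    """Quantise each descriptor to its nearest codebook word, but only
    for keypoints inside the foreground mask, so background features
    do not pollute the representation."""
    hist = np.zeros(len(codebook))
    for d, (x, y) in zip(descriptors, keypoint_xy):
        if not mask[int(y), int(x)]:
            continue                     # drop background keypoints
        word = np.argmin(np.linalg.norm(codebook - d, axis=1))
        hist[word] += 1
    n = hist.sum()
    return hist / n if n else hist
```

With the mask applied, only descriptors on the human contribute, which is exactly the effect the abstract attributes to the segmentation step.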

Paper Nr: 307
Title:

Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding

Authors:

Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner and Joachim Denzler

Abstract: Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.

Paper Nr: 311
Title:

On the Influence of Superpixel Methods for Image Parsing

Authors:

Johann Strassburg, Rene Grzeszick, Leonard Rothacker and Gernot A. Fink

Abstract: Image parsing describes a very fine-grained analysis of natural scene images, where each pixel is assigned a label describing the object or part of the scene it belongs to. This analysis is a keystone to a wide range of applications that could benefit from detailed scene understanding, such as keyword-based image search, sentence-based image or video descriptions, and even autonomous cars or robots. State-of-the-art approaches in image parsing are data-driven and allow for recognizing arbitrary categories based on a knowledge transfer from similar images. As transferring labels on the pixel level is tedious and noisy, more recent approaches build on the idea of segmenting a scene and transferring the information based on regions. For creating these regions, the most popular approaches rely on over-segmenting the scene into superpixels. In this paper, the influence of different superpixel methods is evaluated within the well-known Superparsing framework. Furthermore, a new method is presented that computes a superpixel-like over-segmentation of an image based on edge-avoiding wavelets. The evaluation on the SIFT Flow and Barcelona datasets shows that the choice of the superpixel method is crucial for the performance of image parsing.

Paper Nr: 312
Title:

Launch These Manhunts! Shaping the Synergy Maps for Multi-camera Detection

Authors:

Muhammad Owais Mehmood, Sébastien Ambellouis and Catherine Achard

Abstract: We present a method for multi-camera people detection based on multi-view geometry. We propose to create a synergy map by projecting the foreground masks from all camera views onto the ground plane and the planes parallel to the ground. This leads to significant values at locations where people are present, and also to a particular shape around these values. Moreover, a well-known ghost phenomenon appears: when the shapes corresponding to different persons are fused, false detections are generated. In this article, the first improvement is the robust detection of candidate detection locations, namely keypoints, from the synergy map based on a watershed transform. Then, in order to reduce the false positives, mainly due to the ghost phenomenon, we check whether the particular shape expected for an ideal person is present or not. This shape, which is different for each location of the synergy map, is generated for each keypoint, assuming the presence of a person, and with knowledge of the scene geometry. Finally, the real shape and the synthetic one are compared using a similarity measure akin to correlation. Another improvement proposed in this article is the use of unsupervised clustering, performed on the measures obtained at all the keypoints. It allows us to automatically find the optimal threshold on the measure, and thus to decide about people detection. We have compared our method to recent state-of-the-art techniques on a publicly available dataset and have shown that it reduces the detection errors.

Paper Nr: 313
Title:

Image Labeling by Integrating Global Information by 7 Patches and Local Information

Authors:

Takuto Omiya, Takahiro Ishida and Kazuhiro Hotta

Abstract: We propose an image labeling method that integrates the probabilities of local and global information. Many conventional methods assign a label to each pixel or region using features extracted from local regions and local contextual relationships between neighboring regions. However, the labeling results tend to depend on a local viewpoint. To overcome this problem, we propose an image labeling method that uses not only local but also global information. The probability given by the global information is estimated by K-Nearest Neighbor. In experiments using the MSRC21 dataset, labeling accuracy is much improved by using global information.
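One simple way to integrate a per-pixel local labeling probability with a global, image-level prior, in the spirit of this abstract, is a weighted product followed by renormalization over labels. The geometric mixture and the weight alpha are our own illustrative choices, not necessarily the paper's combination rule:

```python
import numpy as np

def combine_probs(p_local, p_global, alpha=0.5):
    """Combine local and global label probabilities (last axis = labels)
    with a geometric mixture, then renormalise so each pixel's label
    distribution sums to one."""
    p = p_local ** alpha * p_global ** (1.0 - alpha)
    return p / p.sum(axis=-1, keepdims=True)
```

A strong global prior can overturn a weak local decision, which is how global context corrects labels that look plausible only from a local viewpoint.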

Paper Nr: 327
Title:

Open Framework for Combined Pedestrian Detection

Authors:

Floris De Smedt and Toon Goedemé

Abstract: Pedestrian detection is a topic in computer vision of great interest for many applications. Because of this, a large number of pedestrian detection techniques have been presented in the literature, each improving on previous ones. The accuracy gains of recent pedestrian detectors commonly come with higher computational requirements. Recently, however, a technique was proposed that combines multiple detection algorithms to improve accuracy instead. Since the evaluation speed of this combination depends on the detection algorithms it uses, we provide an open framework that includes multiple pedestrian detection algorithms together with the technique to combine them. We show that our open implementation is superior in speed, accuracy and peak memory use when compared to other publicly available implementations.

Paper Nr: 330
Title:

Bi-modal Face Recognition - How combining 2D and 3D Clues Can Increase the Precision

Authors:

Amel Aissaoui and Jean Martinet

Abstract: This paper introduces a bi-modal face recognition approach. The objective is to study how combining depth and intensity information can increase face recognition precision. In the proposed approach, local features based on LBP (Local Binary Pattern) and DLBP (Depth Local Binary Pattern) are extracted from intensity and depth images respectively. Our approach combines the results of classifiers trained on the extracted intensity and depth cues in order to identify faces. Experiments are performed on three datasets: the Texas 3D face dataset, the BOSPHORUS 3D face dataset and the FRGC 3D face dataset. The obtained results demonstrate the enhanced performance of the proposed method compared to mono-modal (2D or 3D) face recognition. Most processes of the proposed system are performed automatically. This leads to a potential prototype of face recognition using the latest RGB-D sensors, such as the Microsoft Kinect or the Intel RealSense 3D Camera.
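For reference, the LBP operator at the heart of both descriptors assigns each pixel an 8-bit code by thresholding its 3x3 neighbourhood against the centre value; DLBP applies the same scheme to depth images. A minimal sketch (the clockwise bit ordering is one common convention, not necessarily the paper's):

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP code for the centre pixel of a patch: each of the
    8 neighbours contributes one bit, set when it is >= the centre."""
    c = patch[1, 1]
    # neighbours taken clockwise from the top-left corner
    neigh = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
             patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(v >= c) << i for i, v in enumerate(neigh))
```

Histograms of such codes over image regions form the feature vectors that the intensity and depth classifiers are trained on.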

Posters
Paper Nr: 9
Title:

Optimized Background Subtraction for Moving Camera Recordings

Authors:

Kahraman Ayyildiz and Stefan Conrad

Abstract: In this paper we show how state-of-the-art background subtraction models can be optimized for moving camera recordings. During our research we found that none of the commonly used background subtraction models is able to subtract the background accurately when the camera is moving. Camera motion introduces motion areas in the background, whereas only the motion of the foreground object should be detected. For the most part, camera motion produces edges in the background. We therefore developed an iterative approach to detect and remove these edges from the background. Our experiments show the accuracy increase for four chosen state-of-the-art background subtraction models. Moreover, we compare our approach to camera-motion-resistant background subtraction models and optical flow applications.

Paper Nr: 33
Title:

Detection and Classification of Vehicles from Omnidirectional Videos using Temporal Average of Silhouettes

Authors:

Hakki Can Karaimer and Yalin Bastanlar

Abstract: This paper describes an approach to detect and classify vehicles in omnidirectional videos. The proposed classification method is based on the shape (silhouette) of the detected moving object obtained by background subtraction. Different from other shape based classification techniques, we exploit the information available in multiple frames of the video. The silhouettes extracted from a sequence of frames are combined to create an ‘average’ silhouette. This approach eliminates most of the wrong decisions which are caused by a poorly extracted silhouette from a single video frame. The vehicle types that we worked on are motorcycle, car (sedan) and van (minibus). The features extracted from the silhouettes are convexity, elongation, rectangularity, and Hu moments. The decision boundaries in the feature space are determined using a training set, whereas the performance of the proposed classification is measured with a test set. To ensure randomization, the procedure is repeated with the whole dataset split differently into training and testing samples. The results indicate that the proposed method of using average silhouettes performs better than using the silhouettes in a single frame.
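To illustrate the core averaging idea described above, the following minimal sketch (an illustrative simplification, not the authors' implementation; silhouette alignment and size normalization are omitted, and the 0.5 majority threshold is an assumed choice) combines aligned binary silhouettes from a frame sequence so that pixels set in only a few frames are suppressed:

```python
import numpy as np

def average_silhouette(masks, threshold=0.5):
    """Combine binary silhouettes from consecutive frames into one
    'average' silhouette: keep a pixel only if it is foreground in at
    least `threshold` of the frames, suppressing single-frame noise."""
    stack = np.stack([m.astype(np.float64) for m in masks])
    mean = stack.mean(axis=0)              # per-pixel foreground frequency
    return (mean >= threshold).astype(np.uint8)

# A speckle present in only one of three frames is removed.
m1 = np.array([[1, 1], [0, 0]], dtype=np.uint8)
m2 = np.array([[1, 1], [0, 1]], dtype=np.uint8)  # noisy extra pixel
m3 = np.array([[1, 0], [0, 0]], dtype=np.uint8)
avg = average_silhouette([m1, m2, m3])
```

Shape features (convexity, elongation, rectangularity, Hu moments) would then be computed from `avg` rather than from any single-frame silhouette.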

Paper Nr: 51
Title:

On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest

Authors:

Ken Yano, Takeshi Ogawa, Motoaki Kawanabe and Takayuki Suyama

Abstract: Behavior recognition has been one of the hot topics in the field of computer vision and its applications. Popular appearance-based behavior classification methods often utilize sparse spatio-temporal features that capture the salient features and then use a visual word dictionary to construct visual words. Visual word assignments based on K-means clustering are very effective and behave well for general behavior classification. However, these pipelines often demand high computational power for the low-level visual feature extraction and visual word assignment stages, and thus they are not suitable for real-time recognition tasks. To overcome the inefficient processing of K-means and the nearest neighbor approach, an ensemble approach is used for fast processing. For real-time recognition, an ensemble of random trees seems particularly suitable for visual dictionaries owing to its simplicity, speed, and performance. In this paper, we focus on real-time recognition by utilizing a random clustering forest and verify its effectiveness by classifying various hand gestures. In addition, we propose a boosted random clustering forest with which training time can be shortened with minimal negative impact on the recognition rate. As an application, we demonstrate a possible use of real-time gesture recognition by controlling a digital TV with hand gestures.

Paper Nr: 64
Title:

Robust Head-shoulder Detection using Deformable Part-based Models

Authors:

Enes Dayangac, Christian Wiede, Julia Richter and Gangolf Hirtz

Abstract: Conventional person detection algorithms lack robustness, especially when the person is partially occluded. We therefore propose a robust head-shoulder detector for 2-D images using deformable part-based models. This detector can be used in a variety of applications such as people counting and person dwell time measurements. In experiments, we compare the head-shoulder detector with a full-body detector quantitatively and analyze the robustness of the detector in realistic scenarios. The results show that the model learned with our method outperforms other methods proposed in related work on an ambient assisted living application.

Paper Nr: 67
Title:

Comparison of Multi-shot Models for Short-term Re-identification of People using RGB-D Sensors

Authors:

Andreas Møgelmose, Chris Bahnsen and Thomas B. Moeslund

Abstract: This work explores different types of multi-shot descriptors for re-identification in an on-the-fly enrolled environment using RGB-D sensors. We present a full re-identification pipeline complete with detection, segmentation, feature extraction, and re-identification, which expands on previous work by using multi-shot descriptors that model people over a full camera pass instead of single frames with no temporal linking. We compare two different multi-shot models, mean histogram and histogram series, and test each of them in 3 different color spaces. Both histogram descriptors are assisted by a depth-based pruning step where unlikely candidates are filtered away. Tests are run on 3 sequences captured in different circumstances and lighting situations to ensure proper generalization and lighting/environment invariance.

Paper Nr: 71
Title:

Robust Method of Vote Aggregation and Proposition Verification for Invariant Local Features

Authors:

Grzegorz Kurzejamski, Jacek Zawistowski and Grzegorz Sarwas

Abstract: This paper presents a method for analyzing the vote space created by the local feature extraction process in a multi-detection system. The method is opposed to the classic clustering approach and gives a high level of control over the cluster composition for further verification steps. The proposed method comprises a graphical vote space presentation, proposition generation, a two-pass iterative vote aggregation, and cascade filters for verification of the propositions. The cascade filters contain all of the minor algorithms needed for effective verification of object detections. The new approach does not have the drawbacks of classic clustering approaches and gives substantial control over the detection process. The method exhibits an exceptionally high detection rate in conjunction with a low false detection rate in comparison with alternative methods.

Paper Nr: 84
Title:

Pedestrian Re-identification - Metric Learning using Symmetric Ensembles of Categories

Authors:

Sateesh Pedagadi, James Orwell and Boghos Boghossian

Abstract: This paper presents a method for pedestrian re-identification, with two novel contributions. Firstly, each element in the target population is classified into one of n categories, using the expected accuracy of the re-identification estimate for this element. A metric for each category is separately trained using a standard (Local Fisher) method. To process a test set, each element is classified into one of the categories, and the corresponding metric is selected and used. The second contribution is the proposal to use a symmetrised distance measure. A standard procedure is to learn a metric using one set as the probe and the other set as the gallery. This paper generalises that procedure by reversing the labels to learn a different metric, and uses a linear (symmetrised) combination of the two. This can be applied in cases where there are two distinct sets of observations, i.e. from two cameras, as in the VIPeR dataset. Using this publicly available dataset, it is demonstrated how these contributions result in improved re-identification performance.
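The symmetrised combination described above can be sketched as a weighted sum of two learned metrics (a hypothetical illustration assuming Mahalanobis-type matrices `M_pg` and `M_gp` learned with the probe/gallery labels as given and reversed; the paper's actual metrics come from Local Fisher training, and the weight `alpha` is an assumed parameter):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis-type distance under a learned metric matrix M."""
    d = x - y
    return float(d @ M @ d)

def symmetrised_distance(x, y, M_pg, M_gp, alpha=0.5):
    """Linear (symmetrised) combination of the metric learned with one set
    as probe (M_pg) and the metric learned with the labels reversed (M_gp)."""
    return (alpha * mahalanobis_sq(x, y, M_pg)
            + (1 - alpha) * mahalanobis_sq(x, y, M_gp))

# Sanity check: with both metrics equal to the identity, the symmetrised
# distance reduces to the plain squared Euclidean distance.
x = np.array([1.0, 0.0])
y = np.array([0.0, 0.0])
d = symmetrised_distance(x, y, np.eye(2), np.eye(2))
```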

Paper Nr: 94
Title:

Mean BoF per Quadrant - Simple and Effective Way to Embed Spatial Information in Bag of Features

Authors:

Joan Sosa-Garcia and Francesca Odone

Abstract: This paper proposes a new approach for embedding spatial information into a Bag of Features image descriptor, primarily meant for image retrieval. The method is conceptually related to Spatial Pyramids, but instead of requiring fixed and arbitrary sub-regions in which to compute region-based BoF, it relies on an adaptive procedure based on multiple partitionings of the image into four quadrants (the NE, NW, SE, SW regions of the image). To obtain a compact and efficient description, all BoF related to the same quadrant are averaged, obtaining four descriptors which capture the dominant structures of the main areas of the image, which are then concatenated. The computational cost of the method is the same as BoF and the size of the descriptor is comparable to BoF, but the amount of spatial information retained is considerable, as shown in the experimental analysis carried out on benchmarks.
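A minimal single-partition sketch of the quadrant descriptor follows (illustrative only: the paper averages BoF over multiple adaptive partitionings, while this sketch uses one fixed split into NW, NE, SW, SE quadrants; function and parameter names are assumptions):

```python
import numpy as np

def mean_bof_per_quadrant(keypoints_xy, words, image_w, image_h, vocab_size):
    """Build one normalized BoF histogram per quadrant (NW, NE, SW, SE)
    and concatenate them into a descriptor of length 4 * vocab_size."""
    cx, cy = image_w / 2.0, image_h / 2.0
    parts = []
    for in_top in (True, False):          # NW, NE first, then SW, SE
        for in_left in (True, False):
            sel = (((keypoints_xy[:, 1] < cy) == in_top)
                   & ((keypoints_xy[:, 0] < cx) == in_left))
            hist = np.bincount(words[sel], minlength=vocab_size).astype(float)
            if hist.sum() > 0:
                hist /= hist.sum()        # per-quadrant normalization
            parts.append(hist)
    return np.concatenate(parts)

# Two keypoints: word 0 in the NW quadrant, word 1 in the SE quadrant.
kp = np.array([[1.0, 1.0], [9.0, 9.0]])
words = np.array([0, 1])
desc = mean_bof_per_quadrant(kp, words, 10, 10, 2)
```

The resulting descriptor is only four times the plain BoF length, matching the compactness claim in the abstract.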

Paper Nr: 123
Title:

Detection and Characterization of the Sclera - Evaluation of Eye Gestural Reactions to Auditory Stimuli

Authors:

Alba Fernández, Joaquim de Moura, Marcos Ortega and Manuel G. Penedo

Abstract: Hearing assessment becomes a challenge for the audiologists when there are severe difficulties in the communication with the patient. This methodology is aimed at facilitating the audiological evaluation of the patient when cognitive decline, or other communication disorders, complicate the necessary interaction between patient and audiologist for the proper development of the test. In these cases, the audiologist must focus his attention on the detection of spontaneous and unconscious reactions that tend to occur in the eye region of the patient, expressed in most cases as changes in the gaze direction. In this paper, the tracking of the gaze direction is addressed by the study of the sclera, the white area of the eye. The movement is identified and characterized in order to determine whether or not a positive reaction to the auditory stimuli has occurred, so the hearing of the patients can be correctly assessed.

Paper Nr: 142
Title:

Improved Automatic Recognition of Engineered Nanoparticles in Scanning Electron Microscopy Images

Authors:

Stephen Kockentiedt, Klaus Tönnies, Erhardt Gierke, Nico Dziurowitz, Carmen Thim and Sabine Plitzko

Abstract: The amount of engineered nanoparticles produced each year has grown for some time and will grow in the coming years. However, if such particles are inhaled, they can be toxic. Therefore, to ensure the safety of workers, the nanoparticle concentrations at workplaces have to be measured. This is usually done by gathering the particles in the ambient air and then taking images using scanning electron microscopy. The particles in the images are then manually identified and counted. However, this task is very time-consuming. Therefore, we have developed a system to automatically find and classify particles in these images (Kockentiedt et al., 2012). In this paper, we present an improved version of the system with two new classification feature types. The first are Haralick features. The second is a newly developed feature which estimates the count of electrons detected by the scanning electron microscope for each particle. In addition, we have added an algorithm to automatically choose the classifier type and parameters. This way, no expert is needed when the user wants to train the system to recognize a previously unknown particle type. The improved system yields much better results for two types of engineered particles and shows comparable results for a third type.

Paper Nr: 143
Title:

Localization of Visual Codes using Fuzzy Inference System

Authors:

Peter Bodnar and László Nyúl

Abstract: Usage of computer-readable visual codes is common in everyday life. The reading process of visual codes consists of two steps, localization and data decoding. This paper introduces a fast and robust method for the localization of visual codes using Fuzzy Inference Systems based on simplistic, attentive features, which can optionally be extended with cell histograms. Input image properties, the assigned membership functions and the efficiency of the system have been evaluated and discussed, showing that FIS is a viable alternative for rapid QR code recognition in the image domain. The basic approach can also be used with lookup tables, which speeds up image cell evaluation and makes it ideal for embedded systems.

Paper Nr: 156
Title:

Shape Classification based on Skeleton-branch Distances

Authors:

Salih Arda Boluk and M. Fatih Demirci

Abstract: In recent decades, the need for efficient and effective image search from large databases has increased. In this paper, we present a novel shape matching framework based on structures that are likely to exist in similar shapes. After representing shapes as medial axis graphs, where vertices represent skeleton points and edges connect nearby skeleton points, we determine the branches connecting or representing a shape’s different parts. Using the shortest path distance from each vertex (skeleton point) to each of the branches, we effectively retrieve shapes similar to the given query through a transportation-based distance function. A set of shape retrieval experiments, including a comparison with two previous approaches, demonstrates the proposed algorithm’s effectiveness, and perturbation experiments demonstrate its robustness.

Paper Nr: 200
Title:

A Method for Detecting Long Term Left Baggage based on Heat Map

Authors:

Pasquale Foggia, Antonio Greco, Alessia Saggese and Mario Vento

Abstract: In this paper we propose a method able to identify the presence of objects remaining motionless in the scene for a long time by analyzing the videos acquired by surveillance cameras. Our approach combines a background subtraction strategy with an enhanced tracking algorithm. The main contributions of this paper are the following: first, spatio-temporal information is implicitly encoded into a heat map; furthermore, differently from state of the art methodologies, the background is not updated by only evaluating the instantaneous movement of the objects, but instead by taking into account their whole history encoded in the heat map. The experimentation has been carried out over two standard datasets and the obtained results have been compared with state of the art approaches, confirming the effectiveness and the robustness of our system.
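The heat map idea of accumulating per-pixel evidence of motionless foreground can be sketched with a simple per-frame update (a hypothetical rule with an illustrative decay constant, not the authors' exact formulation): heat rises where the foreground mask is set and cools elsewhere, so only long-motionless objects reach a high value.

```python
import numpy as np

def update_heat_map(heat, fg_mask, decay=0.5):
    """One per-frame update: foreground pixels gain heat, background
    pixels lose it, so only long-motionless regions stay hot."""
    heat = heat + fg_mask                              # accumulate evidence
    heat = np.where(fg_mask > 0, heat, heat * decay)   # cool elsewhere
    return heat

# A pixel that stays foreground for 4 frames heats up linearly; a pixel
# that is never foreground stays cold.
heat = np.zeros((1, 2))
fg = np.array([[1, 0]])
for _ in range(4):
    heat = update_heat_map(heat, fg)
```

Thresholding `heat` would then flag candidate left-baggage regions, and the same map could guide a history-aware background update as described in the abstract.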

Paper Nr: 209
Title:

In-plane Rotational Alignment of Faces by Eye and Eye-pair Detection

Authors:

M. F. Karaaba, O. Surinta, L. R. B. Schomaker and M. A. Wiering

Abstract: In face recognition, face rotation alignment is an important part of the recognition process. In this paper, we present a hierarchical detector system using eye and eye-pair detectors combined with a geometrical method for calculating the in-plane angle of a face image. Two feature extraction methods, the restricted Boltzmann machine and the histogram of oriented gradients, are compared to extract feature vectors from a sliding window. Then a support vector machine is used to accurately localize the eyes. After the eye coordinates are obtained through our eye detector, the in-plane angle is estimated by calculating the arc-tangent of horizontal and vertical parts of the distance between left and right eye center points. By using this calculated in-plane angle, the face is subsequently rotationally aligned. We tested our approach on three different face datasets: IMM, Labeled Faces in the Wild (LFW) and FERET. Moreover, to compare the effect of rotational aligning on face recognition performance, we performed experiments using a face recognition method using rotationally aligned and non-aligned face images from the IMM dataset. The results show that our method calculates the in-plane rotation angle with high precision and this leads to a significant gain in face recognition performance.
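The angle estimation step described above reduces to an arc-tangent of the eye-center offsets; a minimal sketch (the eye coordinates are illustrative pixel positions, and the subsequent image rotation is omitted):

```python
import math

def in_plane_angle(left_eye, right_eye):
    """In-plane face rotation (degrees) from the arc-tangent of the
    vertical over the horizontal distance between the eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Level eyes give 0 degrees; a right eye 10 px lower than the left
# (40 px apart horizontally) gives roughly a 14-degree tilt.
angle = in_plane_angle((100, 50), (140, 60))
```

Rotating the face image by `-angle` around its center would then produce the rotationally aligned image used for recognition.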

Paper Nr: 231
Title:

Real-time Human Age Estimation based on Facial Images using Uniform Local Binary Patterns

Authors:

Mohamed Selim, Shekhar Raheja and Didier Stricker

Abstract: This paper summarizes work done on real-time human age-group estimation based on frontal facial images. Our approach relies on detecting visible ageing effects, such as facial skin texture. This information is described using uniform Local Binary Patterns (LBP) and the estimation is done using the K-Nearest Neighbour classifier. In the current work, the system is trained using the FERET dataset. The training data is divided into five main age groups. Facial images captured in real-time using the Microsoft Kinect RGB data are used to classify the subject's age into one of the five age groups. An accuracy of 81% was achieved on the live testing data. In the proposed approach, only facial regions affected by the ageing process are used in the face description. Moreover, the use of uniform Local Binary Patterns is evaluated in the context of facial description and age-group estimation. Results show that the uniform LBP captures most of the facial texture information. This speeds up the entire process, as the feature vector's length is reduced significantly, which makes the approach suitable for real-time applications.
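For reference, the uniformity criterion behind uniform LBP (a standard definition, not specific to this paper) is that a pattern's circular bit string contains at most two 0/1 transitions; for 8 bits this reduces the 256 possible patterns to 58 uniform ones plus one shared bin for the rest, which is what shrinks the feature vector:

```python
def is_uniform(pattern, bits=8):
    """True if the circular binary pattern has at most two 0/1 transitions,
    i.e. it is a 'uniform' LBP code."""
    transitions = 0
    for i in range(bits):
        b = (pattern >> i) & 1
        nxt = (pattern >> ((i + 1) % bits)) & 1
        if b != nxt:
            transitions += 1
    return transitions <= 2

# 00001111 has 2 circular transitions (uniform); 01010101 has 8 (not).
```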

Paper Nr: 239
Title:

A Measure of Texture Directionality

Authors:

Manil Maskey and Timothy Newman

Abstract: Determining the directionality (i.e., orientedness) of textures is considered here. The work has three major components. The first component is a new method that indicates if a texture is directional or not. The new method considers both local and global aspects of a texture’s directionality. Local pixel intensity differences provide most of the local aspect. A frequency domain analysis provides most of the global aspect. The second component is a comparison study (based on the complete set of Brodatz textures) of the method versus the known, competing methods for determining texture directionality. The third component is a user study of the method’s utility.

Paper Nr: 249
Title:

Robust Face Recognition using Key-point Descriptors

Authors:

Soeren Klemm, Yasmina Andreu, Pedro Henriquez and Bogdan J. Matuszewski

Abstract: Key-point based techniques have demonstrated good performance for the recognition of various objects in numerous computer vision applications. This paper investigates the use of some of the most popular key-point descriptors for face recognition. The emphasis is put on the experimental performance evaluation of the key-point based face recognition methods against some of the most popular and best performing techniques, utilising both global (Eigenfaces) and local (LBP, Gabor filters) information extracted from the whole face image. Most of the results reported in the literature so far on the use of key-point descriptors for face recognition concluded that methods based on processing the full face image have somewhat better performance than methods using exclusively key-points. The results reported in this paper suggest that the performance of the key-point based methods can be at least comparable to the leading “whole face” methods, and that they are often better suited to handle face recognition in practical applications, as they do not require face image co-registration and perform well even with significantly occluded faces.

Paper Nr: 253
Title:

Efficient Ground Truth Generation based on Spatio-temporal Properties for Lane Prediction Model

Authors:

Jun Shiwaku and Hiroki Takahashi

Abstract: Automobile safety technology has developed rapidly to prevent traffic accidents, and thanks to these techniques, traffic accidents are decreasing year by year. However, there are still more than 4,000 fatal traffic accidents per year in Japan. Many lane detection systems have been investigated. Those systems should be evaluated precisely, and ground truth is generally used for such evaluations. Ground truth generation, however, is very hard and time-consuming work. In this paper, an efficient ground truth generation method that reduces manual operations is proposed. Firstly, time slice images are obtained from an in-vehicle video. Secondly, meanderings of the vehicle against a lane are corrected by minimizing the sum of squared differences of adjacent rows in the nearest time slice image. Then, lane markers in all time slice images are extracted by propagating lane marker information from the bottom time slice image to the upper ones. Ground truth is generated offline from the contour information of the lane markers. Offline ground truth generation methods are often used for constructing the lane prediction model.

Paper Nr: 257
Title:

Efficient Hand Detection on Client-server Recognition System

Authors:

Victor Chernyshov

Abstract: In this paper, an efficient method for hand detection based on the continuous skeletons approach is presented. It achieves real-time speed and high detection accuracy (3-5% for both FAR and FRR) on a large dataset (50 persons, 80 videos, 2322 frames). This makes the method suitable for use as part of modern hand biometrics systems, including mobile ones. Next, the study shows that the continuous skeletons approach can be used as a prior for object and background color models in segmentation methods with supervised learning (e.g. interactive segmentation with seeds or a bounding box). This was successfully adopted in the developed client-server hand recognition system: both a thumbnailed color frame and the extracted seeds are sent from the Android application to the server, where GrabCut segmentation is performed. As a result, higher quality hand shape features are extracted, which is confirmed by several identification experiments. Finally, it is demonstrated that the hand detection results can be used as a region-of-interest localization routine in the subsequent analysis of the finger knuckle print. Future research will be devoted to extracting features from the dorsal finger surface and developing a multi-modal classifier (hand shape and knuckle print features) for the identification problem.

Paper Nr: 284
Title:

A Generic Probabilistic Graphical Model for Region-based Scene Interpretation

Authors:

Michael Ying Yang

Abstract: The task of semantic scene interpretation is to label the regions of an image and their relations into meaningful classes. Such a task is a key ingredient in many computer vision applications, including object recognition, 3D reconstruction and robotic perception. Images of man-made scenes exhibit strong contextual dependencies in the form of spatial and hierarchical structures. Modeling these structures is central to such an interpretation task. Graphical models provide a consistent framework for statistical modeling. Bayesian networks and random fields are two popular types of graphical models, which are frequently used for capturing such contextual information. Our key contribution is the development of a generic statistical graphical model for scene interpretation, which seamlessly integrates different types of image features with the spatial and hierarchical structural information defined over a multi-scale image segmentation. It unifies the ideas of existing approaches, e.g. conditional random fields and Bayesian networks, and has a clear statistical interpretation as the MAP estimate of a multi-class labeling problem. We demonstrate experimentally the application of the proposed graphical model to the task of multi-class classification of building facade image regions.

Paper Nr: 293
Title:

Challenges and Limitations Concerning Automatic Child Pornography Classification

Authors:

Anton Moser, Marlies Rybnicek and Daniel Haslinger

Abstract: The huge volume of data to be analyzed in the course of child pornography investigations puts special demands on tools and methods for automated classification, often used by law enforcement and prosecution. The need for a clear distinction between pornographic material and inoffensive pictures with a large amount of skin, like people wearing bikinis or underwear, causes problems. Manual evaluation carried out by humans is practically impossible due to the sheer number of assets to be sighted. The main contribution of this paper is an overview of challenges and limitations encountered in the course of automated classification of image data. An introduction to state-of-the-art methods, including face and skin tone detection, face and texture recognition as well as craniofacial growth evaluation, is provided. Based on a prototypical implementation of feasible and promising approaches, their performance is evaluated, as well as their abilities and shortcomings.

Paper Nr: 325
Title:

Automatic Representation and Classifier Optimization for Image-based Object Recognition

Authors:

Fabian Bürger and Josef Pauli

Abstract: The development of image-based object recognition systems with the desired performance is still a challenging task, even for experts. The properties of the object feature representation have a great impact on the performance of any machine learning algorithm. Manifold learning algorithms such as PCA, Isomap or autoencoders have the potential to automatically learn lower-dimensional and more useful features. However, the interplay of features, classifiers and hyperparameters is complex and needs to be carefully tuned for each learning task, which is very time-consuming if done manually. This paper uses a holistic optimization framework with feature selection, multiple manifold learning algorithms, multiple classifier concepts and hyperparameter optimization to automatically generate pipelines for image-based object classification. An evolutionary algorithm is used to efficiently find suitable pipeline configurations for each learning task. Experiments show the effectiveness of the proposed representation and classifier tuning on several high-dimensional object recognition datasets. The proposed system outperforms other state-of-the-art optimization frameworks.

Paper Nr: 331
Title:

Identification of MIR-Flickr Near-duplicate Images - A Benchmark Collection for Near-duplicate Detection

Authors:

Richard Connor, Stewart MacKenzie-Leigh, Franco Alberto Cardillo and Robert Moss

Abstract: There are many contexts where the automated detection of near-duplicate images is important, for example the detection of copyright infringement or images of child abuse. There are many published methods for the detection of similar and near-duplicate images; however it is still uncommon for methods to be objectively compared with each other, probably because of a lack of any good framework in which to do so. Published sets of near-duplicate images exist, but are typically small, specialist, or generated. Here, we give a new test set based on a large, serendipitously selected collection of high quality images. Having observed that the MIR-Flickr 1M image set contains a significant number of near-duplicate images, we have discovered the majority of these. We disclose a set of 1,958 near-duplicate clusters from within the set, and show that this is very likely to contain almost all of the near-duplicate pairs that exist. The main contribution of this publication is the identification of these images, which may then be used by other authors to make comparisons as they see fit. In particular however, near-duplicate classification functions may now be accurately tested for sensitivity and specificity over a general collection of images.

Area 4 - Applications and Services

Full Papers
Paper Nr: 89
Title:

A Comprehensive Approach for Evaluation of Stereo Correspondence Solutions in Augmented Reality

Authors:

Bahar Pourazar and Oscar Meruvia-Pastor

Abstract: This paper suggests a comprehensive approach for the evaluation of stereo correspondence techniques based on the specific requirements of outdoor augmented reality systems. To this end, we present an evaluation model that integrates existing metrics of stereo correspondence algorithms with additional metrics that consider human factors that are relevant in the context of outdoor augmented reality systems. Our model provides modified metrics of stereoacuity, average outliers, disparity error, and processing time. These metrics have been modified to provide more relevant information with respect to the target application. We evaluate our model using two stereo correspondence methods: the OpenCV implementation of semi-global block matching, also known as SGBM, which is a modified version of the semi-global matching by Hirschmüller; and our implementation of the solution by Mei et al., known as ADCensus. To test these methods, we use a sample of fifty-two image pairs selected from the KITTI stereo dataset, which depicts many situations typical of outdoor scenery. Experimental results show that our proposed model can provide a more detailed evaluation of both algorithms. Further, we discuss areas of improvement and suggest directions for future research.

Paper Nr: 106
Title:

Blind Watermarking using QIM and the Quantized SVD Domain based on the q-Logarithm Function

Authors:

Ta Minh Thanh and Keisuke Tanaka

Abstract: We propose a new image blind watermarking method that uses both the quantization index modulation (QIM) technique and a quantized singular value decomposition (SVD) domain based on the q-logarithm function. In order to reduce the distortion of the embedded image, we apply the q-logarithm transform to the Y component of the original image after computing the SVD. We call this domain the q-SVD domain. In our proposed method, the tradeoff between robustness and quality can be controlled by a predefined quantization coefficient Q of QIM and a parameter q of the q-SVD domain. Several experiments are conducted to show the robustness of our proposed method against processing attacks and geometric attacks.
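A minimal sketch of scalar QIM embedding may clarify the role of the quantization coefficient Q (this is generic textbook QIM applied to a single coefficient, not the paper's q-SVD construction): bit 0 quantizes the value to multiples of Q, bit 1 to the lattice shifted by Q/2, and blind extraction picks the nearer lattice.

```python
def qim_embed(value, bit, Q):
    """Quantize `value` to the lattice of step Q, shifted by Q/2 for bit 1."""
    offset = bit * Q / 2.0
    return round((value - offset) / Q) * Q + offset

def qim_extract(value, Q):
    """Blindly recover the bit by picking the nearer of the two lattices."""
    d0 = abs(value - qim_embed(value, 0, Q))
    d1 = abs(value - qim_embed(value, 1, Q))
    return 0 if d0 <= d1 else 1
```

A larger Q tolerates larger distortions of the watermarked coefficient before the bit flips, at the cost of a larger embedding change, which is the robustness/quality tradeoff the abstract refers to.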

Paper Nr: 308
Title:

Handwritten Text Verification on Mobile Devices

Authors:

Nilson Donizete Guerin Júnior, Flávio de Barros Vidal and Bruno Macchiavello

Abstract: In this work we propose an online verification system for both signatures and isolated cursive words. The proposed system is designed to be used on a mobile device with limited computational capability. In the proposed scenario it is assumed that the user will use either a fingertip or a passive pen, therefore no azimuth or inclination information is available. Isolated words have certain desirable traits that can be more useful on a mobile device. Different isolated words can be used to verify the user in different applications, combining a knowledge-based security system (i.e. passwords) with a behavioral biometric verification system. The proposed technique achieves an equal error rate of 4.39% for signatures and 6.5% for isolated words.

Short Papers
Paper Nr: 19
Title:

A Multi-resolution Approach for Combining Visual Information using Nuclei Segmentation and Classification in Histopathological Images

Authors:

Harshita Sharma, Norman Zerbe, Daniel Heim, Stephan Wienert, Hans-Michael Behrens, Olaf Hellwich and Peter Hufnagl

Abstract: This paper describes a multi-resolution technique to combine diagnostically important visual information at different magnifications in H&E whole slide images (WSI) of gastric cancer. The primary goal is to improve the results of nuclei segmentation method for heterogeneous histopathological datasets with variations in stain intensity and malignancy levels. A minimum-model nuclei segmentation method is first applied to tissue images at multiple resolutions, and a comparative evaluation is performed. A comprehensive set of 31 nuclei features based on color, texture and morphology are derived from the nuclei segments. AdaBoost classification method is used to classify these segments into a set of pre-defined classes. Two classification approaches are evaluated for this purpose. A relevance score is assigned to each class and a combined segmentation result is obtained consisting of objects with higher visual significance from individual magnifications, thereby preserving both coarse and fine details in the image. Quantitative and visual assessment of combination results shows that they contain comprehensive and diagnostically more relevant information than in constituent magnifications.

Paper Nr: 61
Title:

Extraction of Homogeneous Regions in Historical Document Images

Authors:

Maroua Mehri, Pierre Héroux, Nabil Sliti, Petra Gomez-Krämer, Najoua Essoukri Ben Amara and Rémy Mullot

Abstract: To meet the objective of ensuring the indexing and retrieval of digitized resources and offering structured access to large sets of cultural heritage documents, rising interest in historical document image segmentation has emerged. In fact, there is a real need for automatic algorithms that identify homogeneous regions or similar groups of pixels sharing visual characteristics in historical documents (i.e. distinguishing graphic types, segmenting graphical regions from textual ones, and discriminating text across a variety of fonts and scales). Indeed, determining graphic regions can help to segment and analyze the graphical part of historical heritage, while finding text zones can serve as a pre-processing stage for character recognition, text line extraction, handwriting recognition, etc. We therefore propose in this article an automatic segmentation method for historical document images based on the extraction of homogeneous or similar content regions. The proposed algorithm uses simple linear iterative clustering (SLIC) superpixels, Gabor filters, multi-scale analysis, a majority voting technique, connected component analysis, color layer separation, and an adaptive run-length smoothing algorithm (ARLSA). It has been evaluated on 1000 pages of historical documents and achieved interesting results.

Paper Nr: 96
Title:

Data Acquisition in Cast Iron Foundries by Image Analysis

Authors:

Bernd Dreier, Florian Blas and Alexander Kostgeld

Abstract: The project IDA - Intelligent Data Acquisition is an interdisciplinary project in the fields of applied informatics and mechanical engineering. Its purpose is to collect process-relevant information in industrial foundry processes such as iron casting with handmade and mechanically made molds. Currently many data sets are collected by hand, but these contain inaccuracies and errors and are not available digitally for further analysis. As a result it is not possible to evaluate them automatically; in particular, it is not possible to trace a defective cast part back to the full set of its production parameters. We develop several procedures to collect these data sets and prepare them for data analysis algorithms. The acquisition of digitally available data in IDA is done mostly with optical sensors. In this paper we describe our approach, especially regarding the marking and recognition of relevant objects. Furthermore, we show first results in environments close to reality.

Paper Nr: 114
Title:

Real-time Detection and Recognition of Machine-Readable Zones with Mobile Devices

Authors:

Andreas Hartl, Clemens Arth and Dieter Schmalstieg

Abstract: Many security documents contain machine-readable zones (MRZ) for automatic inspection. An MRZ is intended to be read by dedicated machinery, which often requires a stationary setup. Although MRZ information can also be read using camera phones, current solutions require the user to align the document, which is rather tedious. We propose a real-time algorithm for MRZ detection and recognition on off-the-shelf mobile devices. In contrast to state-of-the-art solutions, we do not impose position restrictions on the document. Our system can instantly produce robust reading results from a large range of viewpoints, making it suitable for document verification or classification. We evaluate the proposed algorithm using a large synthetic database on a set of off-the-shelf smartphones. The obtained results show that our solution achieves good reading accuracy despite largely unconstrained viewpoints and mobile devices.

Paper Nr: 222
Title:

On the Assessment of Segmentation Methods for Images of Mosaics

Authors:

Gianfranco Fenu, Nikita Jain, Eric Medvet, Felice Andrea Pellegrino and Myriam Pilutti Namer

Abstract: The present paper deals with automatic segmentation of mosaics, whose aim is to obtain a digital representation of the mosaic in which the shape of each tile is recovered. This is an important step, for instance, for preserving ancient mosaics. Using a ground truth consisting of a set of manually annotated mosaics, we objectively compare the performance of some recent segmentation methods, based on a simple error metric taking into account precision, recall and the error on the number of tiles. Moreover, we introduce some mosaic-specific hardness estimators (namely, indexes of how difficult the task of segmenting a particular mosaic image is). The results show that the only segmentation algorithm specifically designed for mosaics performs better than the general-purpose algorithms. However, the problem of mosaic segmentation still appears only partially solved, and further work is needed to exploit the specificity of mosaics in designing new segmentation algorithms.

Paper Nr: 265
Title:

On Line Video Watermarking - A New Robust Approach of Video Watermarking based on Dynamic Multi-sprites Generation

Authors:

Ines Bayoudh, Saoussen Ben Jabra and Ezzeddine Zagrouba

Abstract: With the development of emerging applications, watermarking methods face new types of constraints. In fact, robustness to malicious attacks and reduced processing time have become two important constraints that a watermarking approach must satisfy. In this paper, a new digital watermarking scheme for video security is proposed. This scheme is based on dynamic multiple sprites, which provide robustness against collusion, a dangerous attack on marked video. First, the original video is divided into groups of images, and a sprite is generated from each group. Finally, the signature is inserted in the low bits of the obtained sprite and marked frames are generated from the marked sprites. Experimental results show that the proposed scheme is robust against several attacks such as collusion, compression, frame suppression and transposition, and geometric attacks. Moreover, the processing time of watermarking is reduced.

Paper Nr: 286
Title:

Image-based Ear Biometric Smartphone App for Patient Identification in Field Settings

Authors:

Sarah Adel Bargal, Alexander Welles, Cliff R. Chan, Samuel Howes, Stan Sclaroff, Elizabeth Ragan, Courtney Johnson and Christopher Gill

Abstract: We present work in progress on a computer vision application that could directly impact the delivery of healthcare in underdeveloped countries. We describe the development of an image-based smartphone application prototype for ear biometrics. The application targets the public health problem of managing medical records at on-site medical clinics in less developed countries where many individuals do not hold IDs. The domain presents challenges for an ear biometric system, including varying scale, rotation, and illumination. It was not clear which feature descriptors would work best for the application, so a comparative study of three ear biometric extraction techniques was performed, one of which was used to develop an iOS application prototype to establish the identity of an individual using a smartphone camera image. A pilot study was then conducted on the developed application to test feasibility in naturalistic settings.

Paper Nr: 287
Title:

Copyright Protection for 3D Printing by Embedding Information Inside Real Fabricated Objects

Authors:

Masahiro Suzuki, Piyarat Silapasuphakornwong, Kazutake Uehira, Hiroshi Unno and Youichi Takashima

Abstract: This paper proposes a technique that can protect the copyrights of digital content for 3D printers. It embeds copyright information inside real objects fabricated with 3D printers by forming a fine structure inside the objects as a watermark. The copyright information is included in the content before the data are input into the 3D printer. This paper also presents a technique that can non-destructively read out information from inside real objects using thermography. We conducted experiments in which we formed fine cavities inside the objects whose presence or absence at designated positions expressed binary code. The results obtained from the experiments demonstrated that binary code could be read out successfully when we used micro-cavities with a horizontal size of 2 x 2 mm, and that character information encoded in ASCII could be embedded and read out correctly. These results demonstrate the feasibility of the proposed technique.

Paper Nr: 294
Title:

Automatic Identification of Mycobacterium tuberculosis in Ziehl-Neelsen Stained Sputum Smear Microscopy Images using a Two-stage Classifier

Authors:

Lucas de Assis Soares, Klaus Fabian Coco, Evandro Ottoni Teatini Salles and Saulo Bortolon

Abstract: This paper presents a method for the automatic identification of Mycobacterium tuberculosis in Ziehl-Neelsen stained sputum smear microscopy images, the most common bacilloscopy method in developing countries due to its low cost. The proposed method is divided into two stages: a projection of the original coloured image followed by segmentation and the elimination of large and small segmented structures, and the classification of structures using neural networks and support vector machines. The segmentation of structures presents a bacilli loss of 1.31%, while the elimination of areas increases the loss to 14.39%. The classification of structures is evaluated using cross validation, and a maximum sensitivity of 94.25% is obtained. The presented method has a low computational cost, combining performance and efficiency.

Paper Nr: 315
Title:

Towards a 3D Pipeline for Monitoring and Tracking People in an Indoor Scenario using Multiple RGBD Sensors

Authors:

Konstantinos Amplianitis, Michele Adduci and Ralf Reulke

Abstract: Human monitoring and tracking has been a prominent research area for many scientists around the globe. Several algorithms have been introduced and improved over the years, eliminating false positives and enhancing monitoring quality. While the majority of approaches are restricted to the 2D and 2.5D domain, 3D still remains an unexplored field. Microsoft Kinect is a low cost commodity sensor extensively used by the industry and research community for several indoor applications. Within this framework, an accurate and fast-to-implement pipeline is introduced working in two main directions: pure 3D foreground extraction of moving people in the scene and interpretation of the human movement using an ellipsoid as a mathematical reference model. The proposed work is part of an industrial transportation research project whose aim is to monitor the behavior of people and make a distinction between normal and abnormal behaviors in public train wagons. Ground truth was generated by the OpenNI human skeleton tracker and used for evaluating the performance of the proposed method.

Paper Nr: 322
Title:

Watermark Embedding and Extraction Scheme Design by Two-stage Optimization for Illegal Replication Detection of Two-dimensional Barcodes

Authors:

Satoshi Ono, Kentaro Nakai, Takeru Maehara and Ryo Ikeda

Abstract: Recently, two-dimensional (2D) barcodes displayed on mobile phones are increasingly being used for authentication, such as airplane boarding passes and online payments. Digital watermarking is a promising technology for detecting illegal replication or fabrication of such 2D codes. However, due to geometric distortions and/or interference between the patterns of camera sensors and screen pixels, the watermark may not be reliably extracted from the sub-bands used when embedding it. This paper proposes a two-stage optimization method for designing the watermark embedding and extraction scheme. The proposed method uses different frequency sub-bands for embedding and extraction, whereas general watermarking schemes extract the watermark from the same sub-bands used for embedding. To evaluate the actual image deterioration caused by digital-analogue conversion through a mobile phone screen and camera, the proposed method uses actual mobile phones to obtain real images of valid and replicated 2D codes. Experimental results show that the proposed two-stage optimization of the watermark embedding and extraction schemes improves watermark performance for 2D code replication detection.

Paper Nr: 348
Title:

A Technique for Computerised Brushwork Analysis

Authors:

Dmitry Murashov, Alexey Berezin and Ekaterina Ivanova

Abstract: In this work, the problem of computer-assisted attribution of fine-art paintings based on image analysis methods is considered. A technique for comparing artistic styles is proposed. Textural features, represented by histograms of brushstroke ridge orientation and local neighborhood orientation, are used to characterize a painter's artistic style. The procedures for feature extraction are developed and their parameters are chosen. Paintings are compared using three informative fragments segmented from each image. Selected image fragments are compared by an information-theoretic dissimilarity measure. The technique was tested on images of portraits created in the 17th-19th centuries. The preliminary results of the experiments showed that the difference between portraits painted by the same artist is substantially smaller than that between portraits painted by different authors. The proposed technique may be used as part of the technological description of fine-art paintings for attribution. The unsolved problems are pointed out and directions for further research are outlined.

Posters
Paper Nr: 80
Title:

Depth Camera to Improve Segmenting People in Indoor Environments - Real Time RGB-Depth Video Segmentation

Authors:

Arnaud Boucher, Olivier Martinot and Nicole Vincent

Abstract: The paper addresses the problem of extracting people from a video sequence in a closed context, using colour and depth information. The study is based on the low-cost depth sensors included in products such as Kinect or Asus devices, which contain a pair of cameras, one colour and one depth. Depth cameras lack precision, especially where depth discontinuities occur, and sometimes fail to give an answer. Colour information may be too ambiguous to discriminate between background and foreground. We therefore first use depth information to achieve a coarse segmentation, which is then improved with colour information. Furthermore, colour information is only used when a classification of pixels into the two fore/background classes is clear enough. The developed method provides a reliable and robust segmentation and a natural visual rendering, while maintaining real-time processing.

Paper Nr: 97
Title:

PCB Recognition using Local Features for Recycling Purposes

Authors:

Christopher Pramerdorfer and Martin Kampel

Abstract: We present a method for detecting and classifying Printed Circuit Boards (PCBs) in waste streams for recycling purposes. Our method employs local feature matching and geometric verification to achieve a high open-set recognition performance under practical conditions. In order to assess the suitability of different local features in this context, we perform a comprehensive evaluation of established (SIFT, SURF) and recent (ORB, BRISK, FREAK, AKAZE) keypoint detectors and descriptors in terms of established performance measures. The results show that SIFT and SURF are outperformed by recent alternatives, and that most descriptors benefit from color information in the form of opponent color space. The presented method achieves a recognition rate of up to 100% and is robust with respect to PCB damage, as verified using a comprehensive public dataset.

Paper Nr: 126
Title:

A Posterization Strategy for the Registration of [123I]FP-CIT SPECT Brain Images

Authors:

Diego Salas-Gonzalez, Elmar W. Lang, Juan M. Gorriz and Javier Ramírez

Abstract: A fully automatic procedure to build a [123I]FP-CIT SPECT template in the MNI space using only information from the source images is presented. This approach does not require the acquisition of a patient-specific brain magnetic resonance image. The fully automatic procedure uses, firstly, Otsu's method to outline the source images; secondly, a threshold strategy to posterize the source images and the template; and, lastly, an affine registration algorithm that optimizes a square root of sum of squares cost function.
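The first two preprocessing steps described above, Otsu thresholding followed by posterization, can be sketched as below. This is a generic NumPy illustration under stated assumptions, not the authors' implementation; `otsu_threshold` and `posterize` are hypothetical helper names.

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Otsu's method: pick the threshold maximising between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                    # class-0 cumulative probability
    mu = np.cumsum(p * centers)          # cumulative mean
    mu_t = mu[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return centers[np.argmax(sigma_b)]

def posterize(img, thresholds):
    """Map intensities to a small set of levels given sorted thresholds."""
    return np.digitize(img, sorted(thresholds)).astype(np.uint8)
```

With a single Otsu threshold the posterization degenerates to a binary outline; more thresholds yield the multi-level posterized images used for registration.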

Paper Nr: 161
Title:

Selective Encryption of Medical Images

Authors:

Aissa Belmeguenai, Lakhdar Grouche and Rafik Djemili

Abstract: The transfer of images in the digital world plays a very important role; their security is a key issue, and encryption is one way to ensure it. A few applications, such as medical imaging, need to secure only selected regions of the image. This work proposes a selective encryption approach for medical images based on Grain-128, which facilitates the implementation of selective image encryption and decryption. Several tests are performed to prove the performance of the approach, including visual tests, key sensitivity, entropy analysis and correlation coefficient analysis.
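The core idea of selective encryption, masking only a chosen region with a keystream, can be sketched as follows. Note that the keystream generator here is a seeded PRNG standing in for Grain-128 (which is a dedicated stream cipher), and the ROI convention is a hypothetical choice for illustration.

```python
import numpy as np

def keystream(key, nbytes):
    """Stand-in keystream (NOT Grain-128): a seeded PRNG, illustration only."""
    rng = np.random.default_rng(key)
    return rng.integers(0, 256, size=nbytes, dtype=np.uint8)

def xor_region(img, roi, key):
    """XOR-encrypt (or decrypt) only the region (r0:r1, c0:c1) of a uint8 image."""
    r0, r1, c0, c1 = roi
    out = img.copy()
    block = out[r0:r1, c0:c1]
    ks = keystream(key, block.size).reshape(block.shape)
    out[r0:r1, c0:c1] = block ^ ks       # XOR is its own inverse
    return out
```

Running `xor_region` twice with the same key restores the original image, while pixels outside the selected region are never touched.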

Paper Nr: 164
Title:

Semi-automatic Analysis of Huge Digital Nautical Charts of Coastal Aerial Images

Authors:

Matthias Vahl, Uwe von Lukas, Bodo Urban and Arjan Kuijper

Abstract: Geo-referenced aerial images are available in very high resolution. The automated production and updating of electronic nautical charts (ENC), as well as other products (e.g. thematic maps), from aerial images is a current challenge for hydrographic organizations. Standard vision algorithms are often not reliable enough for robust object detection in natural images. We thus propose a procedure that combines processing steps on three levels, from pixels (low level) via segments (mid level) to semantic information (high level). We combine simple linear iterative clustering (SLIC) as an efficient low-level algorithm with a classification based on texture features by a support vector machine (SVM) and a generalized Hough transform (GHT) for detecting shapes at the mid level. Finally, we show how semantic information can be used in the high-level step to improve results from the earlier processing steps. As standard vision methods are typically much too slow for such huge images, and geographical references must additionally be maintained throughout the complete procedure, we present a solution to overcome these problems.

Paper Nr: 191
Title:

Securing Iris Images with a Robust Watermarking Algorithm based on Discrete Cosine Transform

Authors:

Mohammed A. M. Abdullah, S. S. Dlay and W. L. Woo

Abstract: With the expanded use of biometric systems, the security of biometric traits is becoming increasingly important. When biometric images are transmitted through insecure channels or stored as raw data, they become subject to the risk of being stolen, faked and attacked. Hence, it is imperative that robust and reliable means of protection are implemented. Various methods of data protection are available, and digital watermarking is one such technique. This paper presents a new method for protecting the integrity of iris images using demographic text as a watermark. The watermark text is embedded in the middle-band frequency region of the iris image by interchanging three middle-band coefficient pairs of the Discrete Cosine Transform (DCT). Experimental results show that exchanging more than one pair makes the middle-band scheme more robust against malicious attacks while also making it resistant to image manipulations such as compression. The results also illustrate that our watermarking algorithm does not introduce a discernible decrease in iris image quality or biometric recognition performance.
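The middle-band coefficient-exchange idea can be illustrated on a single 8x8 block as below. This is a minimal generic sketch of one coefficient pair (the paper exchanges three pairs); the coefficient positions and the separation margin are illustrative assumptions, not the paper's values.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

C = dct_matrix()

def embed_bit(block, bit, p1=(3, 1), p2=(1, 3), margin=2.0):
    """Embed one bit in an 8x8 block by ordering two mid-band DCT coefficients."""
    D = C @ block @ C.T                  # forward 2D DCT
    a, b = D[p1], D[p2]
    if bit == 1 and a <= b:              # enforce D[p1] > D[p2] for bit 1
        D[p1], D[p2] = b + margin, a
    elif bit == 0 and a >= b:            # enforce D[p1] < D[p2] for bit 0
        D[p1], D[p2] = b, a + margin
    return C.T @ D @ C                   # inverse 2D DCT

def extract_bit(block, p1=(3, 1), p2=(1, 3)):
    D = C @ block @ C.T
    return 1 if D[p1] > D[p2] else 0
```

Because the bit is carried by the relative order of two mid-band coefficients rather than their exact values, moderate image manipulations that perturb both coefficients similarly tend to preserve the embedded bit.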

Paper Nr: 204
Title:

The “Everywhere Switch” using a Projector and Camera

Authors:

Akira Nozaki and Katsuto Nakajima

Abstract: We propose a virtual remote control interface called the “Everywhere Switch” as an alternative to multiple remote controllers for many computerized home appliances. The interface consists of a group of virtual touch buttons projected near the user from a projector affixed to a pan-tilt mount just below the living-room ceiling. Methods to implement our system, including methods to search for a place to project the virtual touch buttons, to extract finger and shadow regions on the virtual button area and determine their ratio, and to detect touch operations, are described. We evaluated the precisions of the foreground extraction (finger or its shadow) and the segmentation of the finger and its shadow under three different brightness conditions (dim, semi-bright, and bright). The foreground extraction showed an F value of more than 0.97, and the finger/shadow segmentation showed an F value of about 0.8 in all tested brightness conditions.

Paper Nr: 216
Title:

A Simple Real-Time Eye Tracking and Calibration Approach for Autostereoscopic 3D Displays

Authors:

Christian Bailer, José Henriques, Norbert Schmitz and Didier Stricker

Abstract: In this paper a simple eye tracking and calibration approach for glasses-free stereoscopic 3D displays is presented that requires only one ordinary camera for tracking. The approach is designed for use with parallax-barrier-based screens, but should, possibly with limitations, also be usable with other autostereoscopic screens. The robust eye tracking approach can easily be reimplemented and is based on well-chosen search areas and parameters that were determined experimentally. Thanks to efficient eye position prediction, the delays that are normally a problem in real-time systems can be eliminated. The calibration approach allows us to calibrate the eye tracking camera to the 3D screen in a direct and simple way. Direct means that the calibration is realized pixel-wise, and simple means that, in contrast to many other calibration approaches, there is no need for a (complex) camera/screen model; there is not even a need to determine any camera parameter for calibration. The accuracy of the eye tracking approach and the eye position prediction is evaluated experimentally.

Paper Nr: 223
Title:

Computing Corpus Callosum as Biomarker for Degenerative Disorders

Authors:

Thomas Kovac, Sammy Rogmans and Frank Van Reeth

Abstract: The developed framework can automatically extract the plane with minimal corpus callosum area while simultaneously segmenting it. The method, introduced by Ishaq, treats the corpus callosum area as a function of the plane extraction parameters and uses deformable registration to generate a displacement field from which the corpus callosum area is calculated. Our registration framework is accelerated using CUDA, which enables researchers to process large amounts of patient data to test the hypothesis of corpus callosum evolution as a biomarker for multiple degenerative disorders, such as Alzheimer's disease and multiple sclerosis (MS).

Paper Nr: 263
Title:

Robust Watermarking Algorithm for 3D Multiresolution Meshes

Authors:

Ikbel Sayahi, Akram Elkefi, Mohamed Koubaa and Chokri Ben Amar

Abstract: Digital watermarking for 3D meshes is a means of copyright protection. In this paper, we propose a robust watermarking algorithm for 3D meshes. We work on multiresolution, triangular and semi-regular meshes of various sizes. Our algorithm is able to insert a high amount of information in the multiresolution domain. To this end, we apply a uniform scaling and then a wavelet transform to the host mesh. The embedding step consists of modifying the wavelet coefficient vector according to the bit to be inserted. These techniques do not degrade the quality of the mesh despite the large capacity adopted. Tests against various attacks have shown the robustness of our algorithm to rotation, translation, uniform scaling, noise addition, smoothing, simplification and coordinate quantization. A comparison with the literature revealed a remarkable improvement over published results.

Paper Nr: 279
Title:

Automatic Generation of Suitable DWT Sub-band - An Application to Brain MRI Classification

Authors:

Mohamed Mokhtar Bendib, Hayet Farida Merouani and Fatma Diaba

Abstract: This paper addresses the Brain MRI (Magnetic Resonance Imaging) classification problem from a new point of view. Indeed, most of the works reported in the literature follow the same methodology: 1) Discrete Wavelet Transform (DWT) application, 2) sub-band selection, 3) feature extraction, and 4) learning. Consequently, those methods are limited by the information contained in the selected DWT outputs (sub-bands). This paper addresses the possibility of creating new suitable DWT sub-bands (by combining the classical DWT sub-bands) using Genetic Programming (GP) and a Random Forest (RF) classifier. These could be employed to efficiently address different classification scenarios (normal versus pathological, one versus all, and even multi-classification) as well as other automatic tasks.

Paper Nr: 324
Title:

An Augmented Environment for Command and Control Systems

Authors:

Alessandro Zocco, Lucio T. De Paolis, Lorenzo Greco and Cosimo L. Manes

Abstract: In the information age the ability to develop high-level situational awareness is essential for the success of any military operation. The power of network-centric warfare comes from the linking of knowledgeable entities that allows information sharing and collaboration. As the number of commanded platforms increases, the volume of accessible data grows exponentially. When this volume is displayed to an operator, there is a high risk of information overload, and great care must be taken to ensure that what is provided is actually information and not noise. In this paper we propose a novel interaction environment that leverages augmented reality technology to provide a digitally enhanced view of a real command and control table. The operator, equipped with an optical see-through head-mounted display, controls the virtual context while remaining connected to the real world. Technical details of the system are described together with the evaluation method. Twelve users evaluated the usability of the augmented environment, comparing it with a wall-sized stereoscopic human-computer interface and with a multi-screen system. Results showed the effectiveness of the proposed system in understanding complex electronic warfare scenarios and in supporting the decision-making process.

Paper Nr: 326
Title:

Visualization 3D Reconstruction - Volume Rendering of Mucus into Paranasal Sinuses

Authors:

Rodrigo Freitas Lima and Mauricio Marengoni

Abstract: This position paper explains our method for segmentation and volume rendering in computed tomography images. Our application reconstructs craniofacial objects for 3D visualization using the Insight Toolkit and Visualization Toolkit frameworks. We intend to quantify the rendered volume of mucus found in CT images and analyze the data, which is an important tool in sinus disease treatment. Two algorithms were implemented in order to compare the results: an automatic segmentation and a manual method. Both solutions presented some issues that will be discussed in the following sections.

Area 5 - Motion, Tracking and Stereo Vision

Full Papers
Paper Nr: 44
Title:

Salient Parts based Multi-people Tracking

Authors:

Zhi Zhou, Yue Wang and Eam Khwang Teoh

Abstract: The saliency of an object or area is its quality of standing out from its neighborhood; it is an important component of how we observe objects in the real world. Saliency detection has been studied for years and has already been applied in many areas. In this paper, a salient-parts-based framework is proposed for multi-people tracking. The framework follows a tracking-by-detection approach and performs multi-people tracking from frame to frame. Salient parts are detected inside the human body area by finding high contrasts with their local neighborhood. Short-term tracking of salient parts is applied to help locate targets when the association with detections fails, and supporting models are learnt online to indicate the locations of targets based on the tracking results of the salient parts. Experiments are carried out on the PETS09 and Town Center datasets to validate the proposed method. The experimental results show the promising performance of the proposed method, and a comparison with state-of-the-art works is provided.

Paper Nr: 60
Title:

Semantic Segmentation and Model Fitting for Triangulated Surfaces using Random Jumps and Graph Cuts

Authors:

Tilman Wekel and Olaf Hellwich

Abstract: Recent advances in 3D reconstruction allow highly detailed geometry to be acquired from a set of images. The outcome of vision-based reconstruction methods is often oversampled, noisy, and without any higher-level information. Further processing such as object recognition, physical measurement, urban modeling, or rendering requires more advanced representations such as computer-aided design (CAD) models. In this paper, we present a global approach that simultaneously decomposes a triangulated surface into meaningful segments and fits a set of bounded geometric primitives. Using the theory of Markov chain Monte Carlo (MCMC) methods, a random process is derived to find the solution that most likely explains the measured data. A data-driven approach based on the random sample consensus (RANSAC) paradigm is designed to guide the optimization process with respect to efficiency and robustness. It is shown that graph cuts allow model complexity and spatial regularization to be incorporated into the MCMC process. The algorithm has successfully been implemented and tested on various examples.

Paper Nr: 75
Title:

Depth-silhouette-Based Action Recognition for Real-time Interactions

Authors:

Long Meng, Chun Zhang and Li He

Abstract: In this paper we propose a novel and robust depth-silhouette-based approach to recognition of human actions for real-time interactions, efficiently exploiting rich depth information from depth map sequences. In the proposed approach, we introduce 3D shape context to describe the features of postures, use Gaussian mixtures to approximate the probability distributions of postures, develop action graphs to model actions, and consequently apply the Viterbi decoding algorithm to the recognition of actions. We show that the new 3D shape context, invariant to scale and locomotion, can mitigate normalisation errors. Our experiments demonstrate that the new approach outperforms the state-of-the-art algorithm on two datasets of real-time interaction actions.

Paper Nr: 90
Title:

Fast Adaptive Frame Preprocessing for 3D Reconstruction

Authors:

Fabio Bellavia, Marco Fanfani and Carlo Colombo

Abstract: This paper presents a new online preprocessing strategy to detect and discard bad frames in video sequences as they arrive. These include frames where accurate localization of corresponding points is difficult, such as blurred frames, or frames which do not provide relevant information with respect to the previous frames in terms of texture, image contrast and non-flat areas. Unlike keyframe selectors and deblurring methods, the proposed approach is a fast preprocessing step working on a simple gradient statistic: it requires neither complex, time-consuming image processing, such as the computation of image feature keypoints, previous poses and 3D structure, nor a priori knowledge of the input sequence. The presented method provides a fast and useful frame pre-analysis which can be used to improve further image analysis tasks, including keyframe selection and blur detection, or to directly filter the video sequence as shown in the paper, improving the final 3D reconstruction by discarding noisy frames and decreasing the final computation time by removing redundant frames. The scheme is adaptive, fast, and works at runtime by exploiting the image gradient statistics of the last few frames of the video sequence. Experimental results show that the proposed frame selection strategy is robust and improves the final 3D reconstruction both in terms of the number of obtained 3D points and the reprojection error, while also reducing the computational time.

Paper Nr: 107
Title:

TextTrail - A Robust Text Tracking Algorithm In Wild Environments

Authors:

Myriam Robert-Seidowsky, Jonathan Fabrizio and Séverine Dubuisson

Abstract: In this paper, we propose TextTrail, a new robust algorithm dedicated to text tracking in uncontrolled environments (strong motion of camera and objects, partial occlusions, blur, etc.). It is based on a particle filter framework whose correction step has been improved. First, we compare several likelihood functions and introduce a new one which integrates the tangent distance. We show that this likelihood has a strong influence on text tracking performance. Second, we compare our tracker with a similar one, and finally an example application is presented. TextTrail has been tested on real video sequences and has proven its efficiency. In particular, it can track texts in complex situations starting from only one detection step, without needing another one to reinitialize the model.

Paper Nr: 122
Title:

A Rolling Shutter Compliant Method for Localisation and Reconstruction

Authors:

Gaspard Duchamp, Omar Ait-Aider, Eric Royer and Jean-Marc Lavest

Abstract: Nowadays, rolling shutter CMOS cameras are embedded in many devices. This type of camera does not expose its retina simultaneously but line by line. The resulting distortions affect structure-from-motion methods developed for global shutter cameras, such as CCD cameras. The bundle adjustment method presented in this paper deals with rolling shutter cameras. We use a projection model which considers both pose and velocity and needs six more parameters per view compared to the global shutter model. We propose a simplified model which only considers distortions due to rotational speed. We compare it to the global shutter model and to the full rolling shutter one. The model does not impose any condition on the inter-frame motion, so it can be applied to fully independent views, even to global shutter images, which are equivalent to a null velocity. Results with both synthetic and real images show that the simplified model is a good compromise between a correct geometrical modelling of rolling shutter effects and a reduced number of extra parameters.

Paper Nr: 165
Title:

SPHERA - A Unifying Structure from Motion Framework for Central Projection Cameras

Authors:

Christiano Couto Gava and Didier Stricker

Abstract: As multi-view reconstruction techniques evolve, they succeed in reconstructing ever larger environments. This is possible due to the availability of vast image collections of the target scenes. Within the next years it will be necessary to exploit all available sources of visual information to supply future 3D reconstruction approaches. Accordingly, Structure from Motion (SfM) algorithms will need to handle a variety of image sources, i.e. perspective, wide-angle or spherical images. Although SfM for perspective and spherical images as well as for catadioptric systems has already been studied, state-of-the-art algorithms are not able to deal with these images simultaneously. To close this gap, we developed SPHERA, a unifying SfM framework designed for central projection cameras. It uses a sphere as the underlying model, allowing single effective viewpoint vision systems to be treated in a unified way. We validate our framework with quantitative evaluations on synthetic spherical as well as real perspective, spherical and hybrid image datasets. Results show that SPHERA is a powerful framework to support upcoming algorithms and applications in large-scale 3D reconstruction.

Paper Nr: 166
Title:

TVL1 Shape Approximation from Scattered 3D Data

Authors:

Eugen Funk, Laurence S. Dooley and Anko Boerner

Abstract: With the emergence of 3D sensors such as laser scanners and 3D cameras, large 3D point clouds can now be sampled from physical objects within a scene. The raw 3D samples delivered by these sensors, however, do not contain any information about the environment the objects exist in, which means that further geometrical high-level modelling is essential. In addition, issues like sparse data measurements, noise, missing samples due to occlusion, and the inherently huge datasets involved in such representations make this task extremely challenging. This paper addresses these issues by presenting a new 3D shape modelling framework for samples acquired from 3D sensors. Motivated by the success of nonlinear kernel-based approximation techniques in the statistics domain, existing methods using radial basis functions are applied to 3D object shape approximation. The task is framed as an optimization problem and is extended using non-smooth L1 total variation regularization. Appropriate convex energy functionals are constructed and solved by applying the Alternating Direction Method of Multipliers approach, which is then extended using Gauss-Seidel iterations. This significantly lowers the computational complexity involved in generating 3D shape from 3D samples, while both numerical and qualitative analysis confirm the superior shape modelling performance of this new framework compared with existing 3D shape reconstruction techniques.

Paper Nr: 179
Title:

Optimal Surface Normal from Affine Transformation

Authors:

Barath Daniel, Jozsef Molnar and Levente Hajder

Abstract: This paper deals with surface normal estimation from calibrated stereo images. We show how the affine transformation between two projections determines the surface normal of a 3D planar patch. We give a formula that describes the relationship between surface normals, camera projections, and affine transformations. This formula is general, since it works for every kind of camera. We propose novel methods for estimating the normal of a surface patch if the affine transformation between two perspective images is known. We show that the normal vector can be optimally estimated if the projective depth of the patch is known. Other, non-optimal methods are also introduced for the problem. The proposed methods are tested both on synthesized data and on images of real-world 3D objects.

Paper Nr: 197
Title:

A Convex Framework for High Resolution 3D Reconstruction

Authors:

Min Li, Changyu Diao, Song Lv and Dongming Lu

Abstract: We present a convex framework to acquire high-resolution surfaces. It is typical to couple a structured-light setup with a photometric method to reconstruct a high-resolution 3D surface. Previous methods often get stuck in a local minimum due to the appearance of occasional outliers. To address this issue, we develop a convex variational model by incorporating a total variation (TV) regularization term with a data term to generate the surface. By relaxing the model to an equivalent high-dimensional variational problem, we obtain a global minimizer of the proposed problem. Results on both synthetic and real-world data show the excellent performance of our convex variational model.

Paper Nr: 212
Title:

Real-time Accurate Pedestrian Detection and Tracking in Challenging Surveillance Videos

Authors:

Kristof Van Beeck and Toon Goedemé

Abstract: This paper proposes a novel approach for real-time robust pedestrian tracking in surveillance images. Such images are challenging to analyse since the overall image quality is low (e.g. low resolution and high compression). Furthermore, bird's-eye viewpoint wide-angle lenses are often used to achieve maximum coverage with a minimal number of cameras. These specific viewpoints make it difficult, or even infeasible, to directly apply existing pedestrian detection techniques. Moreover, real-time processing speeds are required. To overcome these problems we introduce a pedestrian detection and tracking framework which exploits and integrates these scene constraints to achieve excellent accuracy. We performed extensive experiments on challenging real-life video sequences concerning both speed and accuracy. We show that our approach achieves excellent accuracy results while still meeting the stringent real-time demands of these surveillance applications, using only a single-core CPU implementation.

Paper Nr: 230
Title:

Bio-inspired Model for Motion Estimation using an Address-event Representation

Authors:

Luma Issa Abdul-Kreem and Heiko Neumann

Abstract: In this paper, we propose a new bio-inspired approach for motion estimation using a Dynamic Vision Sensor (DVS) (Lichtsteiner et al., 2008), where an event-based temporal window accumulation is introduced. This format accumulates the activity of the pixels over a short time, i.e. several μs. The optic flow is estimated by a new neural model mechanism which is inspired by the motion pathway of the visual system and is consistent with the vision sensor functionality, for which new temporal filters are proposed. Since the DVS already generates temporal derivatives of the input signal, we suggest a smoothing temporal filter instead of the biphasic temporal filters introduced by (Adelson and Bergen, 1985). Our model extracts motion information via a spatiotemporal energy mechanism which is oriented in the space-time domain and tuned in spatial frequency. To balance the activities of individual cells against the neighborhood activities, a normalization process is carried out. We tested our model using different kinds of stimuli that were moved via translatory and rotatory motions. The results show accurate flow estimation compared with synthetic ground truth. To demonstrate the robustness of our model, we probed it with synthetically generated ground-truth stimuli as well as realistic complex motions, e.g. biological motions and a bouncing ball, with satisfactory results.

Paper Nr: 261
Title:

Improving the Egomotion Estimation by Correcting the Calibration Bias

Authors:

Ivan Krešo and Siniša Šegvić

Abstract: We present a novel approach for improving the accuracy of the egomotion recovered from rectified stereoscopic video. The main idea of the proposed approach is to correct the camera calibration by exploiting the known ground-truth motion. The correction is described by a discrete deformation field over a rectangular superpixel lattice covering the whole image. The deformation field is recovered by optimizing the reprojection error of point feature correspondences in neighboring stereo frames under the ground-truth motion. We evaluate the proposed approach by performing leave-one-out experiments on a collection of KITTI sequences with common calibration parameters, comparing the accuracy of stereoscopic visual odometry with the original and the corrected calibration parameters. The results suggest a clear and significant advantage of the proposed approach. Our best algorithm outperforms all other approaches based on two-frame correspondences on the KITTI odometry benchmark.

Paper Nr: 271
Title:

Evaluation of 3D Analysis Error Caused by SVP Approximation of Fisheye Lens

Authors:

Nobuyuki Kita

Abstract: We have been researching visual SLAM, 3D measurement and robot control using images obtained through fisheye lenses. Although a fisheye lens is a non-single viewpoint (NSVP) system, we established 3D analysis methods based on a single viewpoint (SVP) model. In this paper, we call this substitution of SVP for NSVP the “SVP approximation” and evaluate the 3D analysis errors it causes for two different types of fisheye lenses.

Paper Nr: 321
Title:

Edge based Foreground Background Estimation with Interior/Exterior Classification

Authors:

Gianni Allebosch, David Van Hamme, Francis Deboeverie, Peter Veelaert and Wilfried Philips

Abstract: Foreground/background estimation is an essential task in many video analysis applications. Considerable improvements are still possible, especially concerning invariance to lighting conditions. In this paper, we propose a novel algorithm which addresses this requirement. We use modified Local Ternary Pattern (LTP) descriptors to find likely strong and stable “foreground gradient” locations. The proposed algorithm then classifies pixels as interior or exterior using a shortest path algorithm, which proves to be robust against contour gaps.
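As an illustration of the descriptor family this paper builds on (not the authors' modified variant, whose details are not given in the abstract), a minimal Local Ternary Pattern computation might look like the following sketch; the threshold `t` and the base-3 packing of the eight ternary digits are illustrative choices:

```python
def ltp_codes(image, t=5):
    """Compute Local Ternary Pattern codes for the interior pixels of a
    grayscale image given as a list of lists of intensities.

    Each of the 8 neighbours is compared against the centre pixel c:
    +1 if neighbour > c + t, -1 if neighbour < c - t, 0 otherwise.
    The 8 ternary digits are then packed into a single base-3 integer.
    """
    h, w = len(image), len(image[0])
    # Clockwise neighbourhood starting at the top-left neighbour.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = image[y][x]
            code = 0
            for dy, dx in offsets:
                n = image[y + dy][x + dx]
                digit = 1 if n > c + t else (-1 if n < c - t else 0)
                code = code * 3 + (digit + 1)  # map {-1, 0, 1} -> {0, 1, 2}
            codes[y][x] = code
    return codes
```

A flat 3x3 patch produces the all-zero ternary word, so stable "foreground gradient" locations would be those whose codes deviate strongly and consistently from it.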

Paper Nr: 329
Title:

How to Choose the Best Embedded Processing Platform for on-Board UAV Image Processing ?

Authors:

Dries Hulens, Jon Verbeke and Toon Goedemé

Abstract: For a variety of tasks, complex image processing algorithms are a necessity to make UAVs more autonomous. Often, the processing of the on-board camera images is performed on a ground station, which severely limits the operating range of the UAV. Offline processing is frequently used because it is difficult to find a suitable hardware platform to run a specific vision algorithm on-board the UAV. First, it is very hard to find a good trade-off between the speed, power consumption and weight of a specific hardware platform; second, due to the variety of hardware platforms, it is difficult to select a suitable one and to estimate the speed at which the user’s algorithm will run on it. In this paper we tackle these problems by presenting a framework that automatically determines the most suitable hardware platform for an arbitrarily complex vision algorithm. Additionally, our framework estimates the speed, power consumption and flight time of this algorithm for a variety of hardware platforms on a specific UAV. We demonstrate this methodology on two real-life cases and give an overview of the current top CPU-based platforms for on-board UAV image processing.

Short Papers
Paper Nr: 21
Title:

Discrete Optimal View Path Planning

Authors:

Sebastian Haner and Anders Heyden

Abstract: This paper presents a discrete model of a sensor path planning problem, with a long-term planning horizon. The goal is to minimize the covariance of the reconstructed structures while meeting constraints on the length of the traversed path of the sensor. The sensor is restricted to move on a graph representing a discrete set of configurations, and additional constraints can be incorporated by altering the graph connectivity. This combinatorial problem is formulated as an integer semi-definite program, the relaxation of which provides both a lower bound on the objective cost and input to a proposed genetic algorithm for solving the original problem. An evaluation on synthetic data indicates good performance.

Paper Nr: 39
Title:

An Online Vision System for Understanding Complex Assembly Tasks

Authors:

Thiusius Rajeeth Savarimuthu, Jeremie Papon, Anders Glent Buch, Eren Erdal Aksoy, Wail Mustafa, Florentin Wörgötter and Norbert Krüger

Abstract: We present an integrated system for the recognition, pose estimation and simultaneous tracking of multiple objects in 3D scenes. Our target application is a complete semantic representation of dynamic scenes which requires three essential steps; recognition of objects, tracking their movements, and identification of interactions between them. We address this challenge with a complete system which uses object recognition and pose estimation to initiate object models and trajectories, a dynamic sequential octree structure to allow for full 6DOF tracking through occlusions, and a graph-based semantic representation to distil interactions. We evaluate the proposed method on real scenarios by comparing tracked outputs to ground truth part trajectories and compare the results to Iterative Closest Point and Particle Filter based trackers.

Paper Nr: 72
Title:

High-Speed and Robust Monocular Tracking

Authors:

Henning Tjaden, Ulrich Schwanecke, Frédéric Stein and Elmar Schömer

Abstract: In this paper, we present a system for high-speed robust monocular tracking (HSRM-Tracking) of active markers. The proposed algorithm robustly and accurately tracks multiple markers at full framerate of current high-speed cameras. For this, we have developed a novel, nearly co-planar marker pattern that can be identified without initialization or incremental tracking. The pattern also encodes a unique ID to identify different markers. The individual markers are calibrated semi-automatically, thus no time-consuming and error-prone manual measurement is needed. Finally we show that the minimal spatial structure of the marker can be used to robustly avoid pose ambiguities even at large distances to the camera. This allows us to measure the pose of each individual marker with high accuracy in a vast area.

Paper Nr: 86
Title:

Improvement of Phase Unwrapping Algorithms by Epipolar Constraints

Authors:

Johannes Köhler, Jan C. Peters, Tobias Nöll and Didier Stricker

Abstract: Phase unwrapping remains a challenging problem in the context of fast 3D reconstruction based on structured light, in particular for objects with complex geometry. In this paper we suggest supporting phase unwrapping algorithms with additional constraints induced by the scanning setup. This is possible when at least two cameras are used, a likely case in practice. The constraints are generalized for two or more cameras by introducing the concept of a candidate map. We claim that this greatly reduces the complexity for any subsequent unwrapping algorithm, whose performance is thereby strongly increased. We demonstrate this by integrating the candidate map into a local path-following and a global minimum-norm unwrapping method as examples.

Paper Nr: 117
Title:

Adaptive Tracking via Multiple Appearance Models and Multiple Linear Searches

Authors:

Tuan Nguyen and Tony Pridmore

Abstract: We introduce a unified tracker, the feature-based multiple model tracker (FMM), which adapts to changes in target appearance by combining two popular generative models, templates and histograms, maintaining multiple instances of each in an appearance pool, and which enhances prediction by utilising multiple linear searches. These search directions are sparse estimates of motion direction derived from local features stored in a feature pool. Given only an initial template representation of the target, the proposed tracker can learn appearance changes in a supervised manner and generate appropriate target motions without knowing the target movement in advance. During tracking, it automatically switches between models in response to variations in target appearance, exploiting the strengths of each model component. New models are added automatically as necessary. The effectiveness of the approach is demonstrated on a variety of challenging video sequences. Results show that this framework outperforms existing appearance-based tracking frameworks.

Paper Nr: 135
Title:

Estimating the Best Reference Homography for Planar Mosaics From Videos

Authors:

Fabio Bellavia and Carlo Colombo

Abstract: This paper proposes a novel strategy to find the best reference homography in mosaics from video sequences. The reference homography globally minimizes the distortions induced on each image frame by the mosaic homography itself. This method is designed for planar mosaics on which a bad choice of the first reference image frame can lead to severe distortions after concatenating several successive homographies. This often happens in the case of underwater mosaics with non-flat seabed and no georeferential information available. Given a video sequence of an almost planar surface, sub-mosaics with low distortions of temporally close image frames are computed and successively merged according to a hierarchical clustering procedure. A robust and effective feature tracker using an approximated global position map between image frames allows us to build the mosaic also between locally close but not temporally consecutive frames. Sub-mosaics are successively merged by concatenating their relative homographies with another reference homography which minimizes the distortion on each frame of the fused image. Experimental results on challenging real underwater videos show the validity of the proposed method.

Paper Nr: 144
Title:

Robust and Fast Teat Detection and Tracking in Low-resolution Videos for Automatic Milking Devices

Authors:

Matthew van der Zwan and Alexandru Telea

Abstract: We present a system for the detection and tracking of cow teats, as part of the construction of automatic milking devices (AMDs) in the dairy industry. We detail algorithmic solutions for the robust detection and tracking of teat tips in low-resolution video streams produced by embedded time-of-flight cameras, using a combination of depth images and point-cloud data. We present a visual analysis tool for the validation and optimization of the proposed techniques. Compared to existing state-of-the-art solutions, our method can robustly handle occlusions and variable poses and geometries of the tracked shape, and yields a correct tracking rate of over 90% in tests involving real-world images obtained from an industrial AMD robot.

Paper Nr: 150
Title:

Horizontal Stereoscopic Display based on Homologous Points

Authors:

Bruno Eduardo Madeira, Carlos Frederico de Sá Volotão, Paulo Fernando Ferreira Rosa and Luiz Velho

Abstract: In this paper we establish the relation between camera calibration and the generation of horizontal stereoscopic images. We then introduce a new method that handles the problem of generating stereoscopic pairs without using calibration patterns; instead, we use the correspondence of homologous points. The method is based on the optimization of a measure that we call the Three-dimensional Interpretability Error, which has a simple geometric interpretation. We also prove that this optimization problem has four global minima, one of which corresponds to the desired solution. We then present techniques to initialize the problem that avoid convergence to a wrong global minimum. Finally, we present some experimental results.

Paper Nr: 201
Title:

Omni-directional Reconstruction of Human Figures from Depth Data using Mirrors

Authors:

Tanwi Mallick, Rishabh Agrawal, Partha Pratim Das and Arun Kumar Majumdar

Abstract: In this paper we present a method for omni-directional 3D reconstruction of a human figure using a single Kinect, with two mirrors providing the 360° view. We obtain three views from a single depth (and corresponding RGB) frame: one is the real view of the human and the other two are virtual views generated through the mirrors. Using these three views, our proposed system reconstructs a 360° view of the human. The reconstruction system is robust, as it can reconstruct the 360° view of any object (though it is particularly designed for human figures) from a single depth and RGB image. The system overcomes the synchronization difficulties and interference noise of multi-Kinect setups. The methodology can also be applied to non-Kinect RGB-D cameras and can be improved in several ways in future work.

Paper Nr: 221
Title:

MAPTrack - A Probabilistic Real Time Tracking Framework by Integrating Motion, Appearance and Position Models

Authors:

Saikat Basu, Manohar Karki, Malcolm Stagg, Robert DiBiano, Sangram Ganguly and Supratik Mukhopadhyay

Abstract: In this paper, we present MAPTrack - a robust tracking framework that uses a probabilistic scheme to combine a motion model of an object with a model of its appearance and an estimate of its position. The motion of the object is modelled using the Gaussian Mixture Background Subtraction algorithm, the appearance of the tracked object is described using a color histogram, and the projected location of the tracked object in the image space/frame sequence is computed by applying a Gaussian to the region of interest. Our tracking framework is robust to abrupt changes in lighting conditions, can follow an object through occlusions, and can simultaneously track multiple moving foreground objects of different types (e.g., vehicles, humans, etc.) even when they are closely spaced. It is able to start tracks automatically based on a spatio-temporal filtering algorithm. A "dynamic" integration of the framework with optical flow allows us to track videos with significant camera motion. A C++ implementation of the framework has outperformed existing visual tracking algorithms on most videos in the Video Image Retrieval and Analysis Tool (VIRAT), TUD, and Tracking-Learning-Detection (TLD) datasets.
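The probabilistic combination of motion, appearance and position scores described in this abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the score functions, the candidate format and the independence assumption behind the multiplicative fusion are all simplifications introduced here.

```python
import math

def gaussian_position_score(cx, cy, px, py, sigma):
    """Score of a candidate centred at (cx, cy) under a Gaussian prior
    on the predicted position (px, py)."""
    d2 = (cx - px) ** 2 + (cy - py) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def histogram_similarity(h1, h2):
    """Bhattacharyya-style similarity between two normalised colour
    histograms (lists of equal length summing to 1)."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))

def combined_score(motion, appearance, position):
    """Fuse the three per-candidate scores; treating them as independent
    likelihoods, the joint score is simply their product."""
    return motion * appearance * position

candidates = [
    # (motion score, appearance histogram, centre x, centre y)
    (0.9, [0.5, 0.3, 0.2], 52.0, 40.0),
    (0.4, [0.1, 0.2, 0.7], 10.0, 80.0),
]
model_hist = [0.45, 0.35, 0.20]   # appearance model of the tracked object
predicted = (50.0, 42.0)          # projected position from the previous frame

best = max(
    candidates,
    key=lambda c: combined_score(
        c[0],
        histogram_similarity(c[1], model_hist),
        gaussian_position_score(c[2], c[3], *predicted, sigma=15.0),
    ),
)
```

In this toy example the first candidate wins: its appearance matches the model histogram and it lies near the predicted position, so the product of the three scores dominates even though the design choices (Bhattacharyya similarity, multiplicative fusion) are only assumptions.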

Paper Nr: 225
Title:

Fast and Accurate Refinement Method for 3D Reconstruction from Stereo Spherical Images

Authors:

Marek Solony, Evren Imre, Viorela Ila, Lukas Polok, Hansung Kim and Pavel Zemcik

Abstract: Realistic 3D models of the environment are beneficial in many fields, from natural or man-made structure inspection and volumetric analysis to movie-making, in particular the integration of special effects into natural scenes. Spherical cameras are becoming popular in environment modelling because they capture the full surrounding scene visible from the camera location as a consistent, seamless image at once. In this paper, we propose a novel pipeline to obtain fast and accurate 3D reconstructions from spherical images. In order to obtain a better estimate of the structure, the system integrates a joint camera pose and structure refinement step. This strategy proves to be much faster, yet equally accurate, when compared to the conventional method, registration of a dense point cloud via iterative closest point (ICP). Both methods require an initial estimate for successful convergence. The initial positions of the 3D points are obtained from stereo processing of a pair of spherical images with known baseline. The initial positions of the cameras are obtained from a robust wide-baseline matching procedure. The performance and accuracy of the 3D reconstruction pipeline are analysed through extensive tests on several indoor and outdoor datasets.

Paper Nr: 228
Title:

Coherent Selection of Independent Trackers for Real-time Object Tracking

Authors:

Salma Moujtahid, Stefan Duffner and Atilla Baskurt

Abstract: This paper presents a new method for combining several independent and heterogeneous tracking algorithms for the task of online single-object tracking. The proposed algorithm runs several trackers in parallel, where each of them relies on a different set of complementary low-level features. Only one tracker is selected at a given frame, and the choice is based on a spatio-temporal coherence criterion and normalised confidence estimates. The key idea is that the individual trackers are kept completely independent, which reduces the risk of drift in situations where, for example, a tracker with an inaccurate or inappropriate appearance model would negatively impact the performance of the others. Moreover, the proposed approach is able to switch between different tracking methods when the scene conditions or the object appearance rapidly change. We experimentally show with a set of Online Adaboost-based trackers that this formulation of multiple trackers improves the tracking results in comparison to more classical combinations of trackers. We further improve the overall performance and computational efficiency by introducing a selective update step in the tracking framework.

Paper Nr: 298
Title:

Tracking your Detector Performance - How to Grow an Effective Training Set in Tracking-by-Detection Methods

Authors:

Liliana Lo Presti and Marco La Cascia

Abstract: In many tracking-by-detection approaches, a self-learning strategy is adopted to augment the training set with new positive and negative instances, and to refine the classifier weights. Previous works focus mainly on the learning algorithm and assume the detector is never wrong while classifying samples at the current frame; the most confident sample is chosen as the target, and the training set is augmented with samples selected in its surrounding area. A wrong choice of such samples may degrade the classifier parameters and cause drifting during tracking. In this paper, the focus is on how samples are chosen while retraining the classifier. A particle filtering framework is used to infer what sample set to add to the training set until some evidence about its correctness becomes available. In preliminary experiments, a simple learning algorithm together with the proposed method to build the training set outperforms our baseline tracking-by-detection algorithm.

Paper Nr: 346
Title:

KinFu MOT: KinectFusion with Moving Objects Tracking

Authors:

Michael Korn and Josef Pauli

Abstract: Using a depth camera, the KinectFusion algorithm permits tracking the camera poses and building a dense 3D reconstruction of the environment simultaneously in real-time. We present an extension to this algorithm that additionally allows the concurrent tracking and reconstruction of several moving objects within the perceived environment. This is achieved through an expansion of the GPU processing pipeline by several new functionalities. Our system detects moving objects from the registration results and creates a separate storage volume for such objects. Each object and the background are tracked and reconstructed individually. Since the size of an object is uncertain at the moment of detection, the storage volume grows dynamically. Moreover, a sliding reduction method stabilizes the tracking of objects with ambiguous registrations. We provide experimental results showing the effects of our modified matching strategy. Furthermore, we demonstrate the system’s ability to deal with three different challenging situations containing a moving robot.

Paper Nr: 351
Title:

Time-to-Contact in Scattering Media

Authors:

Wooseong Jeong, Laksmita Rahadianti, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a method for estimating time-to-contact in scattering media, such as fog. Images taken in scattering media are unclear, and thus we cannot extract the geometric information from images needed for computing 3D structure. In this paper, we therefore consider not geometric but photometric information, such as the observed intensity. Our method can eliminate the effect of the scattering medium and estimate the time-to-contact with objects without any prior knowledge.

Posters
Paper Nr: 2
Title:

Improved Confidence Measures for Variational Optical Flow

Authors:

Maren Brumm, Jan Marek Marcinczak and Rolf-Rainer Grigat

Abstract: In the last decades, variational optical flow algorithms have been intensively studied by the computer vision community. However, relatively little effort has been made to obtain robust confidence measures for the estimated flow field. As many applications do not require the whole flow field, it would be helpful to identify the parts of the field where the flow is most accurate. We propose a confidence measure based on the energy functional that is minimized during the optical flow calculation and analyze the performance of different data terms. For evaluation, 7 datasets of the Middlebury benchmark are used. The results show that the accuracy of the flow field can be improved by 53.3 % if points are selected according to the proposed confidence measure. The suggested method leads to an improvement of 35.2 % compared to classical confidence measures.
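The idea of selecting only the most accurate parts of a flow field by a per-pixel confidence value can be sketched as below. Treating a lower per-pixel energy value as higher confidence is an assumption made for illustration, since the abstract does not specify how the energy functional is mapped to a confidence score.

```python
def select_confident_flow(flow, energy, fraction=0.5):
    """Keep the given fraction of flow vectors with the lowest per-pixel
    energy, using the energy of the variational functional as an
    (assumed) inverse confidence measure.

    flow   -- list of flow vectors, e.g. (u, v) tuples
    energy -- per-pixel energy values, same length as flow
    Returns a list of (index, flow_vector) pairs in index order.
    """
    ranked = sorted(range(len(flow)), key=lambda i: energy[i])
    keep = ranked[:max(1, int(len(flow) * fraction))]
    return [(i, flow[i]) for i in sorted(keep)]
```

For example, with four flow vectors and `fraction=0.5`, the two vectors whose energy is lowest are retained and the rest are discarded, which mirrors the paper's evaluation of accuracy on the selected subset of the field.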

Paper Nr: 15
Title:

Temporal Selection of Images for a Fast Algorithm for Depth-map Extraction in Multi-baseline Configurations

Authors:

Dimitri Bulatov

Abstract: Obtaining accurate depth maps from multi-view configurations is an essential component of dense scene reconstruction from images and videos. In the first part of this paper, a plane sweep algorithm for sampling an energy function for every depth label and a dense set of points is presented. The distinctive features of this algorithm are 1) that, despite a flexible model choice for the underlying geometry and radiometry, the energy function is evaluated using only image-level operations instead of pixel-wise computations, and 2) that it can easily be augmented with additional terms, such as a triangle-based smoothing term, or post-processed by one of the numerous state-of-the-art non-local energy minimization algorithms. The second contribution of this paper is a search for optimal ways to aggregate multiple observations in order to make the cost function more robust near the image border and in occlusion areas. Experiments with different data sets show the relevance of the proposed research, emphasize the potential of the algorithm, and provide ideas for future work.

Paper Nr: 16
Title:

Interest Point Detection based on the Extended Structure Tensor with a Scale Space Parameter

Authors:

Anders Hast

Abstract: Feature extraction is generally based on some kind of interest point detector, such as Harris, the determinant of the Hessian, or the difference of Gaussians, to mention just a few. The first two are based on tensors, while the latter computes the difference of two images in scale space. It is proposed herein to combine the structure tensor with a scale space parameter, yielding a 3×3 structure tensor. The determinant of this tensor can be simplified, and it will be shown how two rather different detectors can be obtained from this new formulation. It is shown under what conditions they will be less invariant to scale and rotation than previous approaches. It will also be shown that they find different points, why this can be useful for making the matching faster, and how the subsequent RANSAC could be implemented in parallel, working on different sets of matches.

Paper Nr: 30
Title:

Towards Human Pose Semantic Synthesis in 3D based on Query Keywords

Authors:

Mo'taz Al-Hami and Rolf Lakaemper

Abstract: The work presented in this paper is part of a project to enable humanoid robots to build a semantic understanding of their environment using unsupervised self-learning techniques. Here, we propose an approach to learn 3-dimensional human-pose conformations, i.e. structural arrangements of a (simplified) human skeleton model, given only a minimal verbal description of a human posture (e.g. "sitting", "standing", "tree pose"). The only tools given to the robot are knowledge of the skeleton model and a connection to the labeled image database Google Images. Hence, the main contribution of this work is to filter relevant results from an image database, given human-pose-specific query words, and to transform the information in these (2D) images into the 3D pose that is most likely to fit the human understanding of the keywords. Steps to achieve this goal integrate available 2D human-pose estimators for still images, clustering techniques to extract representative 2D human skeleton poses, and 3D-pose estimation from 2D poses. We evaluate the approach using different query keywords representing different postures.

Paper Nr: 31
Title:

BarvEye - Bifocal Active Gaze Control for Autonomous Driving

Authors:

Ernst Dieter Dickmanns

Abstract: As the capability of autonomous driving for road vehicles comes closer to market introduction, a critical look is taken at the design parameters of the vision systems currently under investigation. These are chosen for relatively simple applications on smooth surfaces. In the paper, this is contrasted with more demanding tasks that human drivers will expect autonomous systems to handle in the longer run. Visual ranges of more than 200 m and simultaneous fields of view of at least 100° seem to be minimal requirements; potential viewing angles of more than 200° are desirable at road crossings and traffic circles. As in human vision, regions of high resolution may be kept small if corresponding gaze control is available. Highly dynamic active gaze control would also allow the suppression of angular perturbations during braking or driving on rough ground. A 'Bifocal active road vehicle Eye' (BarvEye) is discussed as an efficient compromise for achieving these capabilities. To approach human levels of performance, larger knowledge bases on separate levels for a) image features, b) objects / subjects, and c) situations in application domains have to be developed, in connection with the capability of learning on all levels.

Paper Nr: 34
Title:

An Image-based Ensemble Kalman Filter for Motion Estimation

Authors:

Yann Lepoittevin, Isabelle Herlin and Dominique Béréziat

Abstract: This paper designs an Image-based Ensemble Kalman Filter (IEnKF), whose components are defined only from image properties, to estimate motion on image sequences. The key elements of this filter are, first, the construction of the initial ensemble, and second, the propagation in time of this ensemble over the studied temporal interval. Both are analyzed in the paper, and their impact on the results is discussed with synthetic and real data experiments. The initial ensemble is obtained by adding a Gaussian vector field to an estimate of motion on the first two frames. The standard deviation of this normal law is computed from motion results given by a set of optical flow methods from the literature. It describes the uncertainty on the motion value at the initial time. The propagation in time of the ensemble members relies on the following evolution laws: transport of the image brightness function by the velocity field, and the Euler equations for the motion function. Shrinking of the ensemble is avoided thanks to a localization method and the use of observation ensembles, both techniques being defined from image characteristics. This Image-based Ensemble Kalman Filter is quantitatively evaluated in synthetic experiments and applied to traffic and meteorological images.

Paper Nr: 35
Title:

A Self-adaptive Likelihood Function for Tracking with Particle Filter

Authors:

Séverine Dubuisson, Myriam Robert-Seidowsky and Jonathan Fabrizio

Abstract: The particle filter is known to be efficient for visual tracking. However, its parameters are empirically fixed, depending on the target application, the video sequences and the context. In this paper, we introduce a new algorithm which automatically adjusts two major ones online: the correction and propagation parameters. Our purpose is to determine, for each frame of a video, the optimal value of the correction parameter and to adjust the propagation parameter to improve tracking performance. On the one hand, our experimental results show that the common settings of the particle filter are sub-optimal. On the other hand, we show that our approach achieves a lower tracking error without the need to tune these parameters. Our adaptive method makes it possible to track objects in complex conditions (illumination changes, cluttered background, etc.) without adding any computational cost compared to the common usage with fixed parameters.
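To illustrate the kind of online adaptation the abstract describes, the sketch below shows one correction step of a generic bootstrap particle filter in which a likelihood sharpness parameter is relaxed until the effective sample size stays reasonable. The adaptation rule, thresholds, and all names are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def adaptive_correction(particles, weights, observation, measure, lam0=20.0):
    """One particle-filter correction step with an adaptively flattened
    likelihood. `measure(p, observation)` returns a non-negative error;
    `lam` controls how sharply errors are penalized (illustrative sketch)."""
    errors = np.array([measure(p, observation) for p in particles])
    lam = lam0
    while True:
        w = weights * np.exp(-lam * errors)   # likelihood-weighted update
        w_sum = w.sum()
        if w_sum > 0:
            w = w / w_sum
            ess = 1.0 / np.sum(w ** 2)        # effective sample size
            # Accept once enough particles carry weight, or lam bottoms out.
            if ess > 0.1 * len(particles) or lam < 1e-3:
                return w, lam
        lam *= 0.5                             # flatten the likelihood, retry
```

A fixed `lam` corresponds to the empirically tuned correction parameter the paper criticizes; the loop is a crude stand-in for choosing it per frame.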

Paper Nr: 103
Title:

AR Visualization of Thermal 3D Model by Hand-held Cameras

Authors:

Kazuki Matsumoto, Wataru Nakagawa, Hideo Saito, Maki Sugimoto, Takashi Shibata and Shoji Yachida

Abstract: In this paper, we propose a system for AR visualization of the thermal distribution of an environment. Our system is based on a colored 3D model and a thermal 3D model of the target scene, generated by KinectFusion using a thermal camera coupled with an RGB-D camera. In the off-line phase, Viewpoint Generative Learning (VGL) is applied to the colored 3D model to collect stable keypoint descriptors. These descriptors are used for camera pose initialization at the start of the on-line phase. After that, our proposed camera tracking, which combines frame-to-frame tracking with VGL-based tracking, is performed for accurate estimation of the camera pose. From the estimated camera pose, the thermal 3D model is finally superimposed onto the current mobile camera view. As a result, we can observe a wide-area thermal map from any viewpoint. Our system is applied to a temperature change visualization system with a thermal camera coupled with an RGB-D camera, and it also enables a smartphone to interactively display the thermal distribution of a given scene.

Paper Nr: 118
Title:

WaPT - Surface Normal Estimation for Improved Template Matching in Visual Tracking

Authors:

Nagore Barrena, Jairo Roberto Sánchez and Alejandro García-Alonso

Abstract: This paper presents an algorithm that improves the template matching technique. The main goal of the algorithm is to match 3D points with their corresponding 2D points in the images. In the presented method, each 3D point is enriched with a normal vector that approximates the orientation of the surface on which the 3D point lies. This normal improves the patch transfer process, providing more precise warped patches because perspective deformation is taken into account. The results obtained with the proposed transfer method confirm that matching is more accurate than with traditional approaches.

Paper Nr: 127
Title:

SIFT-EST - A SIFT-based Feature Matching Algorithm using Homography Estimation

Authors:

Arash Shahbaz Badr, Luh Prapitasari and Rolf-Rainer Grigat

Abstract: In this paper, a new feature matching algorithm is proposed and evaluated. The method uses features extracted by SIFT and aims at reducing the processing time of the matching phase of SIFT. The idea behind the method is to use the information obtained from already detected matches to restrict the range of possible correspondences in subsequent matching attempts. For this purpose, a few initial matches are used to estimate the homography that relates the two images. Based on this homography, the expected location of each reference-image feature after transformation into the test image can be computed. This information is used to specify a small set of possible matches for each reference feature based on their distance to the estimated location. The restriction of possible matches reduces the processing time, since the quadratic complexity of one-to-one matching is circumvented. Due to the limitations of 2D homographies, this method can only be applied to images related by purely rotational transformations or to images of planar objects.
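The restriction step described above — project reference keypoints through an estimated homography and match each one only against spatially nearby candidates — can be sketched as follows. This is a simplified illustration, not the paper's code; the search radius, data layout, and function name are assumptions.

```python
import numpy as np

def restricted_matches(H, kp_ref, kp_test, desc_ref, desc_test, radius=20.0):
    """Match reference features only against test features that lie near
    their homography-predicted location (illustrative sketch).
    kp_*: (N, 2) keypoint coordinates; desc_*: (N, D) descriptors."""
    # Project reference keypoints into the test image via homography H.
    pts = np.hstack([kp_ref, np.ones((len(kp_ref), 1))])
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]         # dehomogenize
    matches = []
    for i, p in enumerate(proj):
        d = np.linalg.norm(kp_test - p, axis=1)
        cand = np.where(d < radius)[0]        # spatially plausible candidates
        if cand.size == 0:
            continue
        # Descriptor comparison restricted to the candidate set only.
        dist = np.linalg.norm(desc_test[cand] - desc_ref[i], axis=1)
        matches.append((i, int(cand[np.argmin(dist)])))
    return matches
```

Because each reference feature is compared against a small candidate set instead of all test features, the cost drops from quadratic toward linear in the number of keypoints, which is the speed-up the abstract claims.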

Paper Nr: 173
Title:

UAV Autonomous Motion Estimation Methodologies

Authors:

Anand Abhishek and K. S. Venkatesh

Abstract: Unmanned aerial vehicles (UAVs) are widely used for commercial and military purposes. Various computer vision based methodologies are used to aid autonomous navigation. We present an implicit extended square-root Kalman filter based approach to estimate the states of a UAV using only the onboard camera; the estimates can either be used alone or assimilated with the IMU output to enable reliable, accurate and robust navigation. The onboard camera presents an information-rich sensor alternative for obtaining useful information about the craft, with the added benefits of being lightweight and small while adding no extra payload. The craft's system model is based on the differential epipolar constraint with a planar constraint, assuming the scene is moving slowly. The optimal state is then estimated using the current measurement and the defined system model. Pitch and roll are also estimated from the above formulation. The algorithm's results are compared with real-time data collected from the IMU.

Paper Nr: 192
Title:

Accurate 3D Reconstruction from Naturally Swaying Cameras

Authors:

Yasunori Nishioka, Fumihiko Sakaue, Jun Sato, Kazuhisa Ishimaru, Naoki Kawasaki and Noriaki Shirai

Abstract: In this paper, we propose a method for reconstructing 3D structure accurately from images taken by unintentionally swaying cameras. In this method, image super-resolution and 3D reconstruction are achieved simultaneously by using a series of motion-blurred images. In addition, we utilize coded exposure in order to achieve stable super-resolution. Furthermore, we show an efficient stereo camera arrangement for stable 3D reconstruction from swaying cameras. The experimental results show that the proposed method can reconstruct 3D shape very accurately.

Paper Nr: 255
Title:

Detecting Objects Thrown over Fence in Outdoor Scenes

Authors:

Róbert Csordás, László Havasi and Tamás Szirányi

Abstract: We present a new technique for detecting objects thrown over a critical area of interest in a video sequence captured by a monocular camera. Our method was developed to run in real time in an outdoor surveillance system. Unlike others, we use an optical flow based motion detection and tracking system to detect object trajectories and to search for parabolic paths. The system successfully detects thrown objects of various sizes and is unaffected by the rotation of the objects.
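The parabolic path search mentioned above can be illustrated by fitting a quadratic to a tracked trajectory and thresholding the fit residual. A minimal sketch, with an illustrative residual threshold that is not taken from the paper:

```python
import numpy as np

def is_parabolic(track, max_residual=2.0):
    """Decide whether a trajectory follows a ballistic (parabolic) path.
    `track` is an (N, 2) array of (t, y) samples from a tracker; the
    threshold is illustrative, not the paper's value."""
    t, y = track[:, 0], track[:, 1]
    coeffs = np.polyfit(t, y, 2)               # least-squares parabola fit
    residual = np.sqrt(np.mean((np.polyval(coeffs, t) - y) ** 2))
    # Gravity pulls the object down; since image y grows downward,
    # a ballistic track needs a positive quadratic coefficient.
    return coeffs[0] > 0 and residual < max_residual
```

A real detector would apply such a test to candidate tracks produced by the optical flow stage and only raise an alarm for tracks that also cross the fence line.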

Paper Nr: 268
Title:

Vehicle Tracking and Origin-destination Counting System for Urban Environment

Authors:

Jean Carlo Mendes, Andrea Gomes Campos Bianchi and Álvaro R. Pereira Júnior

Abstract: Automatic counting of vehicles and estimation of origin-destination tables have become potential applications for traffic surveillance in urban areas. In this work we propose an alternative to Optical Flow tracking to segment and track vehicles with scale/size variation during movement, known as the adaptive-size tracking problem. The performance evaluation of our proposed framework has been carried out on both public and private data sets. We show that our approach produces better origin-destination tables for urban traffic than the Optical Flow method, which is used as the baseline.

Paper Nr: 296
Title:

An Experimental Study of Visual Tracking in Surgical Applications

Authors:

Jiawei Zhou and Shahram Payandeh

Abstract: Tracking surgical tools in mono-endoscopic surgery can offer a conventional (non-robotic) application of this type of procedure a versatile surgeon-computer interface. For example, tracking the surgical tools can enable the surgeon to interact with an overlaid menu, giving access to the patient's medical information. Another example is that such tracking can enable the surgeon, through the surgical tool, to manually register pre-operative images of the patient onto the surgical site. This paper presents the results of some of the tracking schemes which we have explored and analysed as part of our studies. Tracking approaches based on both Gaussian and non-Gaussian frameworks are explored and compared. Although the majority of the approaches offer robust performance when used in a real surgical scene, the method based on the Particle Filter is found to have a better success rate. Based on these experimental results, the paper also offers some discussion and suggestions for future research.

Paper Nr: 301
Title:

Structure from Motion in the Context of Active Scanning

Authors:

Johannes Köhler, Tobias Nöll, Norbert Schmitz, Bernd Krolla and Didier Stricker

Abstract: In this paper, we discuss global device calibration based on Structure from Motion (SfM) (Hartley and Zisserman, 2004) in the context of active scanning systems. Currently, such systems are usually pre-calibrated once, and partial, unaligned scans are then registered using mostly variants of the Iterative Closest Point (ICP) algorithm (Besl and McKay, 1992). We demonstrate that SfM-based registration from visual features yields a significantly higher precision. Moreover, we present a novel matching strategy that reduces the influence of an object's visual features, which can be of low quality, and introduce novel hardware that makes it possible to apply SfM to untextured objects without visual features.

Paper Nr: 309
Title:

Crowd Event Detection in Surveillance Video - An Approach based on Optical Flow High-frequency Feature Analysis

Authors:

Ana Paula G. S. de Almeida, Vitor de Azevedo Faria and Flavio de Barros Vidal

Abstract: Many real-world actions occur in crowded and dynamic environments. Video surveillance applications use crowd analysis for the automatic detection of anomalies and the raising of alarms. In this position paper, we propose a crowd event detection technique based on optical flow high-frequency feature analysis to build a robust and stable descriptor. The proposed system is designed for the automatic detection of violent acts in surveillance videos. Preliminary results show that the proposed methodology performs the detection process successfully and enables the development of an efficient recognition stage in future work.

Paper Nr: 310
Title:

Shape-from-Silhouettes Algorithm with Built-in Occlusion Detection and Removal

Authors:

Maarten Slembrouck, Dimitri Van Cauwelaert, Peter Veelaert and Wilfried Philips

Abstract: Occlusion and inferior foreground/background segmentation still pose a big problem for 3D reconstruction from a set of images in a multi-camera system, because they can severely degrade the reconstruction if one or more of the cameras do not see the object properly. We propose a method to obtain a 3D reconstruction that takes the possibility of occlusion into account by combining the information from all cameras in the multi-camera setup. The proposed algorithm tries to find a consensus of geometrical predicates that most cameras can agree on. The results show a performance with an average error below 2 cm on the centroid of a person in the case of perfect input silhouettes. We also show that tracking results are significantly improved in a room with a lot of occlusion.

Paper Nr: 334
Title:

A New 2-Point Absolute Pose Estimation Algorithm under Planar Motion

Authors:

Sung-In Choi and Soon-Yong Park

Abstract: Several motion estimation algorithms, such as n-point and perspective n-point (PnP) have been introduced over the last few decades to solve relative and absolute pose estimation problems. Since the n-point algorithms cannot decide the real scale of robot motion, the PnP algorithms are often addressed to find the absolute scale of motion. This paper introduces a new PnP algorithm which uses only two 3D-2D correspondences by considering only planar motion. Experiment results prove that the proposed algorithm solves the absolute motion in real scale with high accuracy and less computational time compared to previous algorithms.