Institut für Raumbezogene Informations- und Messtechnik
Hochschule Mainz - University of Applied Sciences

Knowledge-based object Detection in Image and Point cloud (KnowDIP)

The KnowDIP project aims at the conception of a framework for an automatic object detection in unstructured and heterogeneous data. This framework uses a representation of human knowledge in order to improve the flexibility, the accuracy, and the efficiency of the processing.
Motivation and goals:

Object recognition is a vast field of research that is applied to different types of data, such as images, point clouds, and videos. Many strategies and tools have been developed to achieve object recognition. However, existing systems are mainly specialized in one type of data. One reason for this is that the choice and implementation of algorithms depend on the type of data. For example, the RANSAC algorithm can be applied to images and point clouds, but its implementation differs between a 2D and a 3D application.
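As a minimal illustration of this dependence, the following pure-Python sketch runs one generic RANSAC loop for a 2D line and for a 3D plane. All function names, thresholds, and test points are illustrative assumptions and are not taken from KnowDIP. Only the minimal sample size and the model-fitting step change between the two cases.

```python
import random

def fit_line(sample):
    """Line through 2 points as (a, b, c) with a*x + b*y + c = 0 and unit normal."""
    (x1, y1), (x2, y2) = sample
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample (identical points)
    a, b = a / norm, b / norm
    return (a, b, -(a * x1 + b * y1))

def fit_plane(sample):
    """Plane through 3 points as (a, b, c, d) with unit normal (a, b, c)."""
    p, q, r = sample
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = sum(c * c for c in n) ** 0.5
    if norm == 0:
        return None  # degenerate sample (collinear points)
    n = tuple(c / norm for c in n)
    return n + (-sum(n[i] * p[i] for i in range(3)),)

def distance(model, point):
    """Point-to-model distance; the model is a unit normal followed by an offset."""
    return abs(sum(m * x for m, x in zip(model, point)) + model[-1])

def ransac(points, fit, sample_size, threshold, iterations=200, seed=0):
    """Generic RANSAC loop: keep the model with the largest inlier set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit(rng.sample(points, sample_size))
        if model is None:
            continue
        inliers = [p for p in points if distance(model, p) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# 2D case: 20 points on the line y = x plus 3 outliers; 2 points define a line.
points_2d = [(float(i), float(i)) for i in range(20)]
points_2d += [(0.0, 10.0), (5.0, -7.0), (12.0, 30.0)]
line, line_inliers = ransac(points_2d, fit_line, sample_size=2, threshold=0.1)

# 3D case: 25 points on the plane z = 0 plus 2 outliers; 3 points define a plane.
points_3d = [(float(i), float(j), 0.0) for i in range(5) for j in range(5)]
points_3d += [(1.0, 1.0, 5.0), (2.0, 3.0, -4.0)]
plane, plane_inliers = ransac(points_3d, fit_plane, sample_size=3, threshold=0.1)
```

The loop itself is shared, but a caller still has to know which model, sample size, and threshold fit the data at hand.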

In addition, object recognition is still difficult in large-scale data because of internal heterogeneity (such as non-uniform density in a point cloud), which requires adapting the algorithm and its parameters. Therefore, the KnowDIP project aims at automatically and dynamically adapting the object detection process.
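One simple way to picture such a parameter adaptation is a search radius that follows the local point spacing. The sketch below is an assumption made for illustration only (the function names, the brute-force neighbour search, and the scale factor are hypothetical and not part of KnowDIP): it derives the radius from the distance to the k-th nearest neighbour.

```python
import math

def knn_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbour (brute force)."""
    p = points[index]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != index)
    return dists[k - 1]

def adaptive_radius(points, index, k=3, scale=2.0):
    """Search radius proportional to the local point spacing."""
    return scale * knn_distance(points, index, k)

# A dense region (spacing 0.1) followed by a sparse region (spacing 1.0).
points = [(0.1 * i, 0.0, 0.0) for i in range(10)]
points += [(10.0 + i, 0.0, 0.0) for i in range(10)]

r_dense = adaptive_radius(points, 5, k=2)    # point inside the dense region
r_sparse = adaptive_radius(points, 15, k=2)  # point inside the sparse region
print(r_dense, r_sparse)
```

With a fixed radius, one of the two regions would be either over- or under-sampled; the adaptive radius is about ten times larger in the sparse region.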

Such adaptation must consider the type and specificity of the data as well as the characteristics of the target object. The objective is to use knowledge, that is, meaning added to data, objects, and algorithms, to enable this adaptation. The knowledge is expressed with semantic technologies as an ontology, which allows human knowledge to be represented efficiently.

In the field of object recognition, the necessary knowledge comprises the characteristics of the data, of the objects, and of the algorithms that can be applied to the data.

For example, to process a point cloud, it is necessary to know that it is 3D data and to know the algorithms that can be applied to the 3D data and that are relevant to the target object.
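The following sketch shows this kind of lookup in plain Python. The toolbox entries, object descriptions, and the function `select_algorithms` are hypothetical and only stand in for what an ontology would describe in KnowDIP: an algorithm is selected when it accepts the data type and can find at least one geometric characteristic of the target object.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Algorithm:
    name: str
    data_types: frozenset  # data types the algorithm accepts
    detects: frozenset     # geometric characteristics it can find

# Hypothetical toolbox description.
TOOLBOX = [
    Algorithm("ransac_plane_3d", frozenset({"point_cloud"}), frozenset({"planar"})),
    Algorithm("ransac_line_2d", frozenset({"image"}), frozenset({"linear"})),
    Algorithm("region_growing_3d", frozenset({"point_cloud"}),
              frozenset({"planar", "curved"})),
    Algorithm("hough_circle_2d", frozenset({"image"}), frozenset({"circular"})),
]

# Hypothetical object descriptions: object name -> geometric characteristics.
OBJECTS = {"wall": {"planar"}, "floor": {"planar"}, "column": {"curved"}}

def select_algorithms(data_type, target_object):
    """Names of algorithms that accept the data type and suit the target object."""
    traits = OBJECTS[target_object]
    return [a.name for a in TOOLBOX
            if data_type in a.data_types and traits & a.detects]

print(select_algorithms("point_cloud", "wall"))
print(select_algorithms("image", "wall"))
```

In this toy description no 2D algorithm detects planar objects, so the second call returns an empty list.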

Therefore, the project aims at creating a framework that can understand and use knowledge to guide the detection of the object to improve its accuracy and adaptability.


The KnowDIP framework consists of five modules: a knowledge module, a reasoning module, a self-learning module, an algorithm toolbox, and a bridge between the knowledge base and the algorithm toolbox. The knowledge module is composed of a SPARQL interpreter and an ontology that represents knowledge about objects, data, algorithms, and the acquisition process (see Figure 1).

The toolbox contains the algorithms used for object recognition. The reasoning module uses the vocabulary and information defined in the knowledge module to determine which algorithms from the toolbox the requested task requires (depending mainly on the target object and the data provided by the user) and how to use and combine them. It uses knowledge of data and objects to select and configure the algorithms efficiently. The algorithms are then executed through built-ins in SPARQL queries, which act as a bridge between the knowledge base and the algorithm toolbox. The results of the algorithms in turn enrich the knowledge base. Knowledge-based reasoning enhances the logical descriptions of the objects, which then allows the objects in the data to be identified and classified. In addition, the system contains a learning module that adapts processing to the diversity of objects and data characteristics. Figure 2 illustrates the structure of the framework.
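The select, execute, and enrich cycle described above can be pictured with a toy triple store in plain Python. This is not SPARQL and not the KnowDIP implementation: the triples, the predicate names, the `match` helper, and the placeholder algorithm are all assumptions made for illustration.

```python
# Knowledge base as a set of (subject, predicate, object) triples.
kb = {
    ("cloud1", "type", "PointCloud"),
    ("Floor", "detectedBy", "detect_horizontal_plane"),
}

def match(kb, pattern):
    """Return the triples that match a pattern; None acts as a wildcard."""
    return [t for t in kb
            if all(p is None or p == v for p, v in zip(pattern, t))]

def detect_horizontal_plane(data_id):
    """Hypothetical placeholder for a real algorithm; returns segment ids."""
    return [f"{data_id}_segment0"]

# Toolbox: maps the algorithm names used in the knowledge base to functions.
TOOLBOX = {"detect_horizontal_plane": detect_horizontal_plane}

def run_detection(kb, target, data_id):
    """Look up the algorithms for a target, run them, add results as triples."""
    for _, _, algo_name in sorted(match(kb, (target, "detectedBy", None))):
        for segment in TOOLBOX[algo_name](data_id):
            kb.add((segment, "type", target))
            kb.add((segment, "partOf", data_id))
    return kb

run_detection(kb, "Floor", "cloud1")
print(sorted(match(kb, (None, "type", "Floor"))))
```

After the run, the knowledge base contains the detected segment as a new instance of the target object, so later reasoning steps can query it like any other fact.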

The framework was used to detect objects in a cultural heritage point cloud of Ephesos provided by the Austrian Archaeological Institute (ÖAI) and the Römisch-Germanisches Zentralmuseum (RGZM). This application case aimed at detecting a watermill in the point cloud (see Figure 3). The detection process starts with the identification of the floor, followed by the detection of walls. The results for these two objects are added to the knowledge base and are then used to detect rooms.
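The ordering of these detection steps can be sketched as a dependency graph that is processed in topological order. The detector functions and the dependency table below are hypothetical; in particular, making the wall step depend on the floor is an assumption drawn from the order given above.

```python
from graphlib import TopologicalSorter

# Each object lists the objects whose detection results it needs.
DEPENDS_ON = {"floor": set(), "wall": {"floor"}, "room": {"floor", "wall"}}

# Hypothetical detectors: each receives the results of its dependencies.
DETECTORS = {
    "floor": lambda inputs: "floor_plane",
    "wall": lambda inputs: f"walls_above({inputs['floor']})",
    "room": lambda inputs: f"room_from({inputs['floor']}, {inputs['wall']})",
}

def run_pipeline(detectors, depends_on):
    """Run the detectors so that every dependency is available before use."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        inputs = {dep: results[dep] for dep in depends_on[name]}
        results[name] = detectors[name](inputs)
    return results

results = run_pipeline(DETECTORS, DEPENDS_ON)
print(list(results))
print(results["room"])
```

Declaring dependencies instead of hard-coding the sequence keeps the order derivable from the descriptions of the objects, which is the role the knowledge base plays in the framework.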

The main activity of the year 2019 was the creation of the knowledge-based self-learning process. The complete framework has also been applied in further application contexts. It has been used to detect rooms in the Stanford "2D-3D-Semantic dataset" [1] (see Figure 4). This application case made it possible to compare the performance of the framework with other approaches and to quantify the improvement brought by the knowledge-based self-learning. The results have been published in the article entitled “Automatic Detection of Objects in 3D Point Clouds Based on Exclusively Semantic Guided Processes” [2].

1. [Armeni et al., 2017] Armeni, I., Sax, A., Zamir, A. R., and Savarese, S. Joint 2D-3D-Semantic Data for Indoor Scene Understanding. ArXiv e-prints 1702.01105 (2017). Visited on 2019-08-1.

2. Ponciano, J.-J.; Trémeau, A.; Boochs, F. Automatic Detection of Objects in 3D Point Clouds Based on Exclusively Semantic Guided Processes. ISPRS Int. J. Geo-Inf. 2019, 8, 442.


Through the reasoning process and the analysis of topological links between the detected objects, the results obtained go beyond those of machine learning approaches [3, 4] (see Figure 5). The results underline the relevance of the framework for object detection in various contexts. In addition, the learning process clearly improves the quality of the results, as shown in Figure 6.

3. [Armeni et al., 2016] Armeni, Iro, Sener, Ozan, Zamir, Amir R., Jiang, Helen, Brilakis, Ioannis, Fischer, Martin, and Savarese, Silvio. 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2016).

4. [Bobkov et al., 2017] Bobkov, Dmytro, Kiechle, Martin, Hilsenbeck, Sebastian, and Steinbach, Eckehard. Room segmentation in 3D point clouds using anisotropic potential fields. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pages 727–732. IEEE (2017).


Period: 01.05.2016 - 30.09.2019
  • a) 3D Point Cloud, b) Object detection


The defense of his dissertation in mid-November at the University of Saint-Etienne marked, for Jean-Jacques Ponciano, the successful completion of his ...

i3mainz was represented with two talks at the 18th Oldenburger 3D-Tage on 6 and 7 February 2018.

Jean-Jacques Ponciano presented some ...

Related projects

As part of the special exhibition „Die Bilderwelt der Kelten“, small Celtic bronze sculptures were, by means of high-precision optical 3D measurement technology in combination with high-resolution ...

As part of the publication and analysis of the 3D data of an undisturbed tomb complex at Qatna, the provision of the data via a platform-independent web application is ...


Automatic Detection of Objects in 3D Point Clouds Based on Exclusively Semantic Guided Processes


J.J. Ponciano,
A. Trémeau,
F. Boochs


ISPRS International Journal of Geo-Information

In the domain of computer vision, object recognition aims at detecting and classifying objects in data sets. Model-driven approaches are typically constrained through their focus on either a specific type of data, a context (indoor, outdoor) or a set of objects. Machine learning-based approaches are more flexible but also constrained as they need annotated data sets to train the learning process. That leads to problems when this data is not available through the specialty of the application field, like archaeology, for example. In order to overcome such constraints, we present a fully semantic-guided approach. The role of semantics is to express all relevant knowledge of the representation of the objects inside the data sets and of the algorithms which address this representation. In addition, the approach contains a learning stage since it adapts the processing according to the diversity of the objects and data characteristics. The semantic is expressed via an ontological model and uses standard web technology like SPARQL queries, providing great flexibility. The ontological model describes the object, the data and the algorithms. It allows the selection and execution of algorithms adapted to the data and objects dynamically. Similarly, processing results are dynamically classified and allow for enriching the ontological model using SPARQL construct queries. The semantic formulated through SPARQL also acts as a bridge between the knowledge contained within the ontological model and the processing branch, which executes algorithms. It provides the capability to adapt the sequence of algorithms to an individual state of the processing chain and makes the solution robust and flexible. The comparison of this approach with others on the same use case shows the efficiency and improvement this approach brings.

Connected Semantic Concepts as a Base for Optimal Recording and Computer-Based Modelling of Cultural Heritage Objects


J.J. Ponciano,
A. Karmacharya,
S. Wefers,
P. Atorf,


Structural Analysis of Historical Constructions

3D and spectral digital recording of cultural heritage monuments is a common activity for their documentation, preservation, conservation management, and reconstruction. Recent developments in 3D and spectral technologies have provided enough flexibility in selecting one technology over another, depending on the data content and quality demands of the data application. Each technology has its own pros/cons, suited perfectly to some situations and not to others. They are mostly unknown to humanities experts, besides having a limited understanding of the data requirements demanded by the research question. These are often left to technical experts who again have a limited understanding of cultural heritage requirements. A common point of view has to be achieved through interdisciplinary discussions. Such agreements need to be documented for their future references and re-uses. We present a method based on semantic concepts that not only documents the semantic essence of such discussions, but also uses it to infer a guidance mechanism that recommends technologies/technical process to generate the required data based on individual needs. Experts' knowledge is represented explicitly through a knowledge representation that allows machines to manage and infer recommendations. First, descriptive semantics guide end users to select the optimal technology/technologies for recording data. Second, structured knowledge controls the processing chain extracting and classifying objects contained in the acquired data. Circumstantial situations during object recording and the behaviour of the technologies in that situation are taken into account. We will explain the approach as such and give results from tests at a CH object.

Identification and classification of objects in 3D point clouds based on a semantic concept


J.J. Ponciano,
F. Boochs,
A. Trémeau



Knowledge-based object recognition in point clouds and image data sets


J.J. Ponciano,
A. Trémeau