Knowledge-based Object Detection in Image and Point Cloud (KnowDIP)

KnowDIP, i3mainz, CC BY-SA 4.0

The KnowDIP project aims to design a framework for automatic object detection in unstructured and heterogeneous data. The framework uses a representation of human knowledge to improve the flexibility, accuracy, and efficiency of the processing.


Object recognition is a vast field of research that covers different types of data, such as images, point clouds, and videos. Many strategies and tools have been developed for it. However, existing systems are mainly specialized in one type of data, partly because the choice and implementation of algorithms depend on the data type. For example, the RANSAC algorithm can be applied to both images and point clouds, but its implementation differs between a 2D and a 3D application.
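To illustrate the RANSAC example, here is a minimal sketch in pure Python (not code from the KnowDIP toolbox). It assumes the common formulation in which the random-sampling loop is identical in 2D and 3D, while the minimal sample size, the model fitting, and the point-to-model distance change: a 2D line is fitted from two points, a 3D plane from three. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_line_2d(sample: Sequence[Point]):
    """Line a*x + b*y + c = 0 through two 2D points, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = sample
    a, b = y2 - y1, x1 - x2  # normal vector of the direction (x2 - x1, y2 - y1)
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample (identical points)
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def dist_line_2d(model, p: Point) -> float:
    a, b, c = model
    return abs(a * p[0] + b * p[1] + c)


def fit_plane_3d(sample: Sequence[Point]):
    """Plane n . p + d = 0 through three 3D points, with n of unit length."""
    p1, p2, p3 = sample
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Cross product u x v gives the plane normal
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        return None  # degenerate sample (collinear points)
    n = tuple(c / norm for c in n)
    return n, -sum(n[i] * p1[i] for i in range(3))


def dist_plane_3d(model, p: Point) -> float:
    n, d = model
    return abs(sum(n[i] * p[i] for i in range(3)) + d)


def ransac(points: List[Point], sample_size: int, fit: Callable, dist: Callable,
           threshold: float, iterations: int = 200, seed: int = 0):
    """Dimension-agnostic RANSAC loop: only sample_size, fit and dist change."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit(rng.sample(points, sample_size))
        if model is None:
            continue
        inliers = [p for p in points if dist(model, p) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# 2D: ten points on y = 2x + 1 plus two outliers
pts_2d = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -5.0)]
_, inliers_2d = ransac(pts_2d, 2, fit_line_2d, dist_line_2d, threshold=0.01)

# 3D: a 5 x 5 grid on the plane z = 0 plus two outliers
pts_3d = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
pts_3d += [(1.0, 1.0, 5.0), (2.0, 3.0, -4.0)]
_, inliers_3d = ransac(pts_3d, 3, fit_plane_3d, dist_plane_3d, threshold=0.01)
print(len(inliers_2d), len(inliers_3d))  # 10 25
```

Passing the fitting and distance functions as parameters is one way to keep a single loop for both dimensions. A knowledge-based system can then choose the right pair from the data type.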

Moreover, object recognition remains difficult in large-scale data because of internal heterogeneity (such as non-uniform density in a point cloud), which requires adapting the algorithm and its parameters. The KnowDIP project therefore aims to adapt the object detection process automatically and dynamically.
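As an illustration of why non-uniform density calls for parameter adaptation, the following sketch derives a separate distance threshold for each grid cell of a point cloud from the local point spacing. The rule "threshold = factor × mean nearest-neighbour distance", the factor of 2.0, and the brute-force neighbour search are assumptions chosen for clarity; they are not taken from the KnowDIP framework.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def mean_nn_distance(points: Sequence[Point3]) -> float:
    """Mean distance from each point to its nearest neighbour (brute force, O(n^2))."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def thresholds_per_cell(points: Sequence[Point3], cell_size: float,
                        factor: float = 2.0) -> Dict[Tuple[int, int], float]:
    """Hypothetical rule: per XY grid cell, threshold = factor * local point spacing."""
    cells: Dict[Tuple[int, int], List[Point3]] = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))].append(p)
    # Cells with a single point have no defined spacing and are skipped
    return {key: factor * mean_nn_distance(pts)
            for key, pts in cells.items() if len(pts) >= 2}


# A dense strip (spacing 0.1) and a sparse strip (spacing 0.5) in one cloud
cloud = ([(0.1 * i, 0.0, 0.0) for i in range(10)]
         + [(10.0 + 0.5 * i, 0.0, 0.0) for i in range(10)])
print(thresholds_per_cell(cloud, cell_size=5.0))  # approximately {(0, 0): 0.2, (2, 0): 1.0}
```

A fixed threshold of 0.2 would fragment the sparse strip, while 1.0 would merge unrelated structures in the dense one. Deriving the value per region avoids both problems.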

Such adaptation must take into account not only the type and specific features of the data but also the characteristics of the target object. The objective is to enable this adaptation by using knowledge and adding meaning (semantics) to the data. The knowledge is expressed with semantic technologies, in the form of an ontology, which allows human knowledge to be represented efficiently.

In the field of object recognition, the necessary knowledge covers the characteristics of the data, of the objects, and of the algorithms that can be applied to the data.

For example, processing a point cloud requires knowing that it is 3D data, which algorithms can be applied to 3D data, and which of them are relevant to the target object.
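The kind of reasoning described in this example can be sketched with a tiny in-memory set of subject-predicate-object triples. The class and property names below (`hasDataType`, `appliesTo`, `relevantFor`, the algorithm names) are hypothetical and do not reproduce the KnowDIP ontology. The selection function performs by hand the join that a SPARQL query would express as a graph pattern.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mini knowledge base; names are illustrative, not the KnowDIP ontology.
KB: Set[Triple] = {
    ("cloud_01", "hasDataType", "PointCloud3D"),
    ("photo_01", "hasDataType", "Image2D"),
    ("RANSAC_Plane3D", "appliesTo", "PointCloud3D"),
    ("RegionGrowing3D", "appliesTo", "PointCloud3D"),
    ("RANSAC_Line2D", "appliesTo", "Image2D"),
    ("RANSAC_Plane3D", "relevantFor", "Wall"),
    ("RANSAC_Line2D", "relevantFor", "Wall"),
    ("RegionGrowing3D", "relevantFor", "Floor"),
}


def objects(kb: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All o such that (subject, predicate, o) is in the knowledge base."""
    return {o for (s, p, o) in kb if s == subject and p == predicate}


def select_algorithms(kb: Set[Triple], dataset: str, target: str) -> List[str]:
    """Algorithms applicable to the dataset's type AND relevant for the target.

    Roughly the SPARQL pattern:
      ?dataset :hasDataType ?t . ?alg :appliesTo ?t . ?alg :relevantFor ?target .
    """
    data_types = objects(kb, dataset, "hasDataType")
    return sorted(alg for (alg, p, t) in kb
                  if p == "appliesTo" and t in data_types
                  and target in objects(kb, alg, "relevantFor"))


print(select_algorithms(KB, "cloud_01", "Wall"))  # ['RANSAC_Plane3D']
print(select_algorithms(KB, "photo_01", "Wall"))  # ['RANSAC_Line2D']
```

The same target ("Wall") leads to a different algorithm depending on whether the data is 2D or 3D, which is the adaptation the paragraph describes.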

Therefore, the project aims to create a framework that can understand and use knowledge to guide object detection, improving its accuracy and adaptability.


The KnowDIP framework consists of five modules: a knowledge module, a reasoning module, a self-learning module, an algorithm toolbox, and a bridge between the knowledge base and the algorithm toolbox. The knowledge module comprises a SPARQL interpreter and an ontology that represents knowledge about objects, data, algorithms, and the acquisition process (see Figure 1).

The toolbox contains the algorithms used for object recognition. The reasoning module uses the vocabulary and information defined in the knowledge module to select from the toolbox the algorithms required for the requested task (depending mainly on the target object and on the data provided by the user) and to determine how to use and combine them. To select and configure the algorithms efficiently, it relies on knowledge about the data and the objects. The algorithms are then executed through built-in functions in SPARQL queries, which act as a bridge between the knowledge base and the algorithm toolbox. The results of this execution enrich the knowledge base. Knowledge-based reasoning then enhances the logical descriptions of the objects, which makes it possible to identify and classify the objects in the data. In addition, the system contains a self-learning module that adapts the processing to the diversity of objects and data characteristics. Figure 2 illustrates the structure of the framework.
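The execute-then-enrich cycle can be sketched as follows. In KnowDIP, the bridge consists of built-in functions called from SPARQL queries; here, as a simplifying assumption, it is a plain Python function that looks up an algorithm in a toolbox registry, runs it, and writes each result back into the knowledge base as triples. The toolbox entry, the property names, and the toy "plane detection" (which simply returns pre-computed segments) are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def plane_detection(data: dict) -> List[dict]:
    """Toy stand-in for a toolbox algorithm: returns pre-computed planar segments."""
    return data["segments"]


# Hypothetical toolbox: algorithm names used in the ontology -> Python callables.
TOOLBOX: Dict[str, Callable[[dict], List[dict]]] = {"PlaneDetection": plane_detection}


def execute_and_enrich(kb: Set[Triple], algorithm: str,
                       dataset_id: str, data: dict) -> List[str]:
    """Bridge sketch: run a toolbox algorithm, then assert its results as triples."""
    new_ids = []
    for k, segment in enumerate(TOOLBOX[algorithm](data)):
        seg_id = f"{dataset_id}_seg{k}"
        kb.add((seg_id, "rdf:type", "Segment"))
        kb.add((seg_id, "partOf", dataset_id))
        kb.add((seg_id, "producedBy", algorithm))  # provenance of the result
        kb.add((seg_id, "orientation", segment["orientation"]))
        new_ids.append(seg_id)
    return new_ids


kb: Set[Triple] = {("cloud_01", "rdf:type", "PointCloud")}
ids = execute_and_enrich(kb, "PlaneDetection", "cloud_01",
                         {"segments": [{"orientation": "horizontal"},
                                       {"orientation": "vertical"}]})
print(ids)  # ['cloud_01_seg0', 'cloud_01_seg1']
```

Once the segments exist as facts, reasoning rules can classify them (for instance, a horizontal segment as a floor candidate) without knowing which algorithm produced them.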

The framework was used to detect objects in a cultural heritage point cloud of Ephesos provided by the Austrian Archaeological Institute (ÖAI) and the Römisch-Germanisches Zentralmuseum (RGZM). The aim of this application case was to detect a watermill in the point cloud (see Figure 3). The detection process starts by identifying the floor and then detects the walls. The results of these two detections are added to the knowledge base and used to detect rooms.
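The first two steps of this sequence can be sketched as rules over planar segments. The segment attributes (unit normal, minimum and maximum height), the orientation thresholds 0.9 and 0.1, and the 5 cm contact tolerance are assumptions for illustration. The function returns new facts, including a topological relation (`standsOn`) that a later rule could use to detect rooms. It is not the rule set used in KnowDIP.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def classify_floor_and_walls(segments: Dict[str, dict], tol: float = 0.05) -> Set[Triple]:
    """Step 1: floor = lowest near-horizontal segment.
    Step 2: walls = near-vertical segments whose base touches the floor.
    Returns the new facts to add to the knowledge base."""
    facts: Set[Triple] = set()
    horizontal = {k: s for k, s in segments.items() if abs(s["normal"][2]) > 0.9}
    if not horizontal:
        return facts
    floor_id = min(horizontal, key=lambda k: horizontal[k]["z_min"])
    floor_top = segments[floor_id]["z_max"]
    facts.add((floor_id, "rdf:type", "Floor"))
    for k, s in segments.items():
        is_vertical = abs(s["normal"][2]) < 0.1
        if is_vertical and abs(s["z_min"] - floor_top) <= tol:
            facts.add((k, "rdf:type", "Wall"))
            facts.add((k, "standsOn", floor_id))  # topological relation
    return facts


segments = {
    "s0": {"normal": (0.0, 0.0, 1.0), "z_min": 0.00, "z_max": 0.02},  # floor
    "s1": {"normal": (0.0, 0.0, 1.0), "z_min": 2.80, "z_max": 2.82},  # ceiling
    "s2": {"normal": (1.0, 0.0, 0.0), "z_min": 0.03, "z_max": 2.80},  # wall
    "s3": {"normal": (0.0, 1.0, 0.0), "z_min": 0.80, "z_max": 1.60},  # raised vertical face
}
print(sorted(classify_floor_and_walls(segments)))
# [('s0', 'rdf:type', 'Floor'), ('s2', 'rdf:type', 'Wall'), ('s2', 'standsOn', 's0')]
```

Detecting the floor first provides the reference height that the wall rule needs, which is one reason for ordering the steps this way.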

The main activity in 2019 was the creation of the knowledge-based self-learning process. The complete framework has been applied to further application contexts. In particular, it has been used to detect rooms in the Stanford “2D-3D-Semantic dataset” [1] (see Figure 4). This application case made it possible to compare the performance of the framework with that of other approaches and to quantify the improvement brought by knowledge-based self-learning. The results were published in the article “Automatic Detection of Objects in 3D Point Clouds Based on Exclusively Semantic Guided Processes” [2].

  1. Dataset: [Armeni et al., 2017] Armeni, I., Sax, A., Zamir, A. R., and Savarese, S. Joint 2D-3D-Semantic Data for Indoor Scene Understanding. ArXiv e-prints (2017), arXiv:1702.01105. Visited on 2019-08-1.
  2. [Ponciano et al., 2019] Ponciano, J.-J., Trémeau, A., and Boochs, F. Automatic Detection of Objects in 3D Point Clouds Based on Exclusively Semantic Guided Processes. ISPRS International Journal of Geo-Information 8, 442 (2019).


Thanks to the reasoning process and the analysis of topological relations between the detected objects, the results obtained surpass those of machine learning approaches [3, 4] (see Figure 5). These results underline the relevance of the framework for object detection in various contexts. In addition, the self-learning process clearly improves the quality of the results, as shown in Figure 6.

  3. [Armeni et al., 2016] Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M., and Savarese, S. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
  4. [Bobkov et al., 2017] Bobkov, D., Kiechle, M., Hilsenbeck, S., and Steinbach, E. Room Segmentation in 3D Point Clouds Using Anisotropic Potential Fields. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pages 727–732. IEEE (2017).