Intelligent data acquisition, storage and provision at public authorities

Finished
Linked open data as a basis for the automated, intelligent creation of maps (image: Vanessa Liebler for i3mainz, CC BY-SA 4.0)

The aim of the project is to set up and integrate a linked data infrastructure at the German Federal Agency for Cartography and Geodesy (BKG) on the basis of a number of selected datasets. Ontologies for data standards will be defined and best practices for semantic integration will be tested in practice.

Motivation

The Federal Agency for Cartography and Geodesy (BKG) is the central authority for the provision of geodata in the Federal Republic of Germany. Currently, this includes typical geodata formats such as GML (Geography Markup Language, a standard of the Open Geospatial Consortium, OGC) or shapefiles, provided either as direct downloads or as OGC web services (WMS, WFS), and either as open data or as paid products. Standardization of geodata is also being pursued by the European Commission, for example through its INSPIRE initiative, which will be implemented within the next few years. The aim of this initiative is to harmonize European data formats on a syntactic level. However, the trend, not only within public authorities but also in various other communities, is to provide geodata as linked data under the principles of the 5-Star Open Data model.

As such, the aim of the project is to support the BKG in setting up a linked data infrastructure and to develop the integration of various types of data with other linked data repositories. Based on use cases with specific data managed by the BKG, a semantic integration of geodata will be performed as an example. Here, reconversion to the respective source formats is to be guaranteed, and it must be possible to visualize the results of the conversion on a map. Possible enhancements of the integrated geodata are to be checked and, marked accordingly, also made available to the end user. This should enable the BKG to develop an integration platform for the provision of linked data and to demonstrate the advantages of semantic integration on the basis of sample maps and sample services.

Activities

After the state and federal authorities had selected suitable data for semantic integration in 2019, the data was analyzed in the spring of 2020, procedures for its integration were developed and the integration was performed. Ontologies extracted from XML schemas for further use were submitted to the architecture working group for evaluation, with the aim of making them available in the future as a standard in the Spatial Data Infrastructure Germany (GDI-DE) for various state authorities.

The data was integrated into a triplestore provided at the BKG, which is intended to be used for the publication of all data converted into linked data at the BKG. So far, conventional software for the provision of geodata offers no connections to triplestores, and this will probably not change in the foreseeable future due to the differences between the technologies. Consequently, one of the project’s objectives was to develop appropriate software for this purpose.

After the aforementioned datasets had been integrated and made available, it was decided to expand the project to include work on capturing temporal information for geodata, managing their metadata, enhancing and correcting the datasets, improving the user interface and having external users test the new software infrastructure in a hackathon.

The second phase of the project started in September 2020 and will end in August 2021. The initial focus was on the integration and semantic management of metadata, and management and storage of schemas used to export RDF data to geospatial formats. Metadata management was approached by developing a web service based on the recommendations of the OGC API Records and a web interface that allows access to this service and the manipulation of the metadata. The interface was then expanded to store data schemas and semantically link them to the display of the corresponding data. Finally, work began on integrating time in the context of geospatial data, with the expansion of the ontological model to include the concepts of time and versions. This work on spatiotemporal data will continue in 2021 to develop a web interface that will enable visualization and comparison of this data.
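
The idea of extending a data model with time and versions can be illustrated with a minimal sketch. It is not the project's ontological model: the class `FeatureVersion`, its fields and the sample data are hypothetical, and validity is assumed to be a half-open interval (start included, end excluded).

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class FeatureVersion:
    """One version of a geographic feature (hypothetical model)."""
    feature_id: str
    version: int
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    name: str


def version_at(versions: Iterable[FeatureVersion], feature_id: str,
               when: date) -> Optional[FeatureVersion]:
    """Return the version of feature_id that is valid on the given day."""
    for v in versions:
        if v.feature_id != feature_id:
            continue
        # Half-open interval: valid_from <= when < valid_to
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v
    return None


# Invented sample data: one feature that was renamed in 2020.
SAMPLE_VERSIONS = [
    FeatureVersion("f1", 1, date(2015, 1, 1), date(2020, 1, 1), "Old name"),
    FeatureVersion("f1", 2, date(2020, 1, 1), None, "New name"),
]
```

Comparing two points in time then amounts to calling `version_at` twice with different dates and comparing the returned versions.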

Results

Results of the first project phase (Fig. 2):

  • SemanticWFS: A Java web application that makes content from triplestores available as feature collections. The application implements the OGC API - Features interface as well as the earlier OGC Web Feature Service (WFS) standard. This allows the BKG to define its own feature collections on linked data and make them available in a way that can be displayed and processed by traditional GIS software. Additionally, the web application provides a standardized web interface for the web browser.
  • GeoPubby: A linked data frontend for displaying instances in the BKG’s linked data graph. The linked data frontend from a previous Pubby project was expanded to provide export capabilities for more than 15 geospatial data formats, along with the ability to download instances in various coordinate reference systems.
  • SPARQLUnicorn QGIS Plugin: A QGIS plugin to run SPARQL queries on linked data graphs, enrich geodata layers and convert data to RDF. The plugin assists users in creating queries, shows them existing geo-related concepts, and is capable of preparing data for integration into a linked data repository.
  • SemanticImporter: An importer tool for geospatial data into the BKG’s triplestore. The tool, equipped with a rudimentary web interface, enables the uploading of geodata, the definition of mappings to linked data vocabularies and the saving of these mappings. The mappings – if exported – can also be used in the SPARQLUnicorn QGIS plugin to prepare geodata for linked data. The importer also adds provenance information and other vocabularies needed to describe metadata, so that they can be taken into account by the aforementioned tools.
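
To illustrate the general idea behind serving triplestore content as feature collections, the following minimal sketch turns a SPARQL query result into a GeoJSON FeatureCollection. It is not the SemanticWFS implementation: the function names, the variable names `item`, `label` and `wkt`, and the sample data are assumptions made for this example. The input follows the W3C SPARQL 1.1 Query Results JSON Format, and only plain WKT points without a CRS prefix are handled.

```python
import re

# Hypothetical query result; the URI and label are invented sample data.
SAMPLE_RESULT = {
    "head": {"vars": ["item", "label", "wkt"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://example.org/feature/1"},
                "label": {"type": "literal", "value": "Mainz"},
                "wkt": {
                    "type": "literal",
                    "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral",
                    "value": "POINT(8.2473 49.9929)",
                },
            }
        ]
    },
}

NUMBER = r"(-?\d+(?:\.\d+)?)"
POINT_PATTERN = re.compile(
    r"^\s*POINT\s*\(\s*" + NUMBER + r"\s+" + NUMBER + r"\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geometry(wkt):
    """Convert a plain WKT point to a GeoJSON geometry, or None if unsupported."""
    match = POINT_PATTERN.match(wkt)
    if match is None:
        return None
    x, y = (float(value) for value in match.groups())
    return {"type": "Point", "coordinates": [x, y]}


def bindings_to_feature_collection(result, id_var="item", geom_var="wkt"):
    """Build a GeoJSON FeatureCollection from SPARQL JSON result bindings."""
    features = []
    for row in result["results"]["bindings"]:
        geometry = None
        if geom_var in row:
            geometry = wkt_point_to_geometry(row[geom_var]["value"])
        # Every remaining variable becomes a feature property.
        properties = {
            name: cell["value"]
            for name, cell in row.items()
            if name not in (id_var, geom_var)
        }
        features.append({
            "type": "Feature",
            "id": row[id_var]["value"],
            "geometry": geometry,
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}
```

Calling `bindings_to_feature_collection(SAMPLE_RESULT)` yields a collection with one point feature that GIS software reading GeoJSON could display. A production service would additionally have to handle other geometry types, coordinate reference systems and paging.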

Results of the second project phase: 

  • GeoTime Web Service: A Spring-based Java web application that provides access to and manipulation of information related to the spatiotemporal data contained in the triplestores. This application is designed to provide several services. The first available service follows the recommendations of the ‘OGC API - Records - Part 1: Core’, which is currently being standardized to provide a more modern service than the catalog web services. It is based on the current web architecture and best practices for geodata on the web. It allows the BKG to manage the metadata describing the feature collections provided by SemanticWFS and make it available for viewing and editing using conventional GIS software. The second service enables the storage and use of schemas associated with feature collections. Lastly, the third service, currently under development, aims to manipulate spatiotemporal data.

 The web application also provides a standardized interface for the web browser:

  • GeoTime Frontend: Using this web interface, users can import, view, edit and export metadata, as well as create links to feature collections. The frontend also serves as a user interface to the OGC API Records service. It allows users to import and save schemas after identifying them in GitLab. Furthermore, it makes it possible to verify whether an export of geodata conforms to the corresponding schema.
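
The schema check mentioned above can be pictured with a simplified sketch. It does not reproduce the GeoTime implementation or its schema format: here a schema is assumed to be a plain mapping from property names to expected Python types, and `check_feature`, `SAMPLE_SCHEMA` and the sample features are hypothetical.

```python
def check_feature(feature, schema):
    """Return a list of problems found when comparing a feature to a schema."""
    problems = []
    properties = feature.get("properties") or {}
    for name, expected_type in schema.items():
        if name not in properties:
            problems.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected_type):
            actual = type(properties[name]).__name__
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, got {actual}"
            )
    # Properties that the schema does not know are reported as well.
    for name in properties:
        if name not in schema:
            problems.append(f"unexpected property: {name}")
    return problems


# Hypothetical schema and exported features, invented for illustration.
SAMPLE_SCHEMA = {"name": str, "population": int}

GOOD_FEATURE = {
    "type": "Feature",
    "geometry": None,
    "properties": {"name": "Mainz", "population": 217000},
}
BAD_FEATURE = {
    "type": "Feature",
    "geometry": None,
    "properties": {"name": "Mainz", "population": "many", "note": "draft"},
}
```

An export would be considered conformant under this sketch if `check_feature` returns an empty list for every feature in it.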