The integration of heterogeneous data from various sources is an important issue in the geographic information science (GIS) domain. In such practical problems as data interoperability, spatial database updating, and change detection, it plays a significant role not only in the derivation of combined abstracted information but also in the extraction of differences among data. To update information, for instance, the latest data are integrated into an existing database, and a large-scale database is compared with a small-scale database using map generalization. In this "big data" era, with the rapid development of computational power and the Internet, an increasing number of spatial data infrastructures have been completed. The necessity to share spatial information over the Internet has increased the need for data integration (Ruiz et al. 2011). In particular, the integration of official data and volunteered geographic information (VGI) data (e.g., OpenStreetMap) is becoming a new and important means of detecting data changes, enriching data descriptions (Du et al. 2012; Yang, Zhang, and Luan 2013), and enhancing data quality (Koukoletsos, Haklay, and Ellul 2012).
Due to differences in abstraction levels, semantic hierarchies, accuracy descriptions, or other properties, spatial data from various sources may conflict in their semantic descriptions, geometric representations, or topological relationships during the integration process. As logical consistency is a key aspect of spatial data quality (Goodchild 1991), maintaining consistency is an important issue in the fields of spatial database construction and integration. Consistency maintenance gives rise to matching problems, some at the schema level and some at the data level (Devogele, Parent, and Spaccapietra 1998). Schema matching aims to find corresponding concepts in different data models, whereas data matching deals with the identification of related features in different databases.
Several methods have been developed to address the data matching problem. The earliest work was done by Saalfeld (1985, 1988), who consolidated digital maps from the US Geological Survey and the US Census Bureau. The proposed method first builds corresponding relationships between road nodes from different databases and then uses those counterpart points as references to align features by rubber-sheet transformation. This conflation concept was defined by Cobb (1998) as involving two steps: (1) feature matching, that is, finding corresponding features between two data sets, and (2) feature adjustment, that is, detecting differences based on matching relationships and eliminating inconsistencies using semantic or geometric transformations. This two-step method was applied in subsequent studies under different scenarios. Some studies focused on data of different scales in the same region (e.g., Kieler et al. 2009; Mustiere and Devogele 2008; Zhang et al. 2014), while others compared data with similar detail but from different domains (e.g., Walter and Fritsch 1999; Huh, Yu, and Heo 2011; Song et al. 2011; Safra et al. 2013; Ai et al. 2013) or from different times (Masuyama 2006). In addition, many studies focused on matching factors such as distance and shape (Ai et al. 2013).
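The two-step conflation idea described above can be illustrated with a minimal sketch. It is not the procedure of Saalfeld or Cobb: nearest-neighbor node pairing and an inverse-distance-weighted displacement are assumed here as simple stand-ins for the node matching and the rubber-sheet transformation, which in practice is usually a piecewise transformation over a triangulation of the control points. The function names `match_nodes` and `adjust_point` and the parameters `tolerance` and `power` are hypothetical.

```python
import math

def match_nodes(source, target, tolerance):
    """Step 1 (feature matching): pair each source node with its nearest
    target node, keeping the pair only if it lies within `tolerance`."""
    pairs = []
    for s in source:
        nearest = min(target, key=lambda t: math.dist(s, t))
        if math.dist(s, nearest) <= tolerance:
            pairs.append((s, nearest))
    return pairs

def adjust_point(p, pairs, power=2.0):
    """Step 2 (feature adjustment): shift p by the inverse-distance-weighted
    mean of the matched-node displacements (assumed stand-in for rubber-sheeting)."""
    weighted_dx = weighted_dy = total_weight = 0.0
    for s, t in pairs:
        d = math.dist(p, s)
        if d == 0.0:
            return t  # p is itself a matched node: move it onto its counterpart
        w = 1.0 / d ** power
        weighted_dx += w * (t[0] - s[0])
        weighted_dy += w * (t[1] - s[1])
        total_weight += w
    if total_weight == 0.0:
        return p  # no control points available: leave the point unchanged
    return (p[0] + weighted_dx / total_weight, p[1] + weighted_dy / total_weight)

# Illustrative data only: two road nodes offset by one unit in x.
source_nodes = [(0.0, 0.0), (10.0, 0.0)]
target_nodes = [(1.0, 0.0), (11.0, 0.0)]
control_pairs = match_nodes(source_nodes, target_nodes, tolerance=2.0)
moved = adjust_point((5.0, 0.0), control_pairs)
```

Matched nodes act as control points; every other vertex of a feature is moved by a blend of the control-point displacements, so nearby control points dominate the adjustment.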
Matching of homogeneous features, such as the matching of road networks with different scales, dates, or viewpoints, is the subject of most studies in the literature. However, little research has been done on matching methods or inconsistency corrections for heterogeneous features, such as hydrographic and terrain data, transportation and built-up area data, and vegetation and land-use data. These features usually have some co-location relationships.
Compared with the matching of homogeneous features, matching of heterogeneous features faces more challenges. First, criteria to establish the corresponding links between heterogeneous features and qualify their difference are lacking. Methods based on proximity of features' properties (e.g., distance measures, geometry, topology, attributes) to construct corresponding relationships are not suitable for heterogeneous features. In addition, the control relationships in the adjustment process are more complex because of constraints between different feature classes.
When matching heterogeneous features, the intercontext background of the features involved should be considered (Rodriguez 2005). A typical case was provided by Yang, Zhang, and Lu (2014), who proposed an approach for integrating VGI POIs (points of interest) and professional road networks. The key step in the matching process is the extraction of distribution patterns of POIs and roads. In this study, we investigate the matching problem of hydrographic and terrain data and propose an approach for inconsistency detection and correction based on spatial constraint knowledge. The spatial constraint knowledge used concerns the distribution relationships between contours and river networks, that is, the river should flow into the implied talweg in the terrain representation. Based on this implicit knowledge, we are able to derive the correct logical relationships between river and contour features and to detect and correct inconsistencies.
The remainder of this article is organized as follows: The next section analyzes the possible inconsistencies between river and contour data and presents the main idea of this study. The following section elaborates the spatial knowledge-based approach for inconsistency detection and adjustment. The experimental studies for demonstrating the effectiveness of the proposed approach are discussed next. The final section presents an outline of future work.
Problems arising from matching river network and contour data
Causes and classification of inconsistencies
Hydrographic and terrain features are two important types of natural objects in the physical world that typically act as references for other semantic features in spatial representations. In topographic databases, these two types of features have usually been collected, generalized, and maintained using separate methods or at different times. This may result in spatial conflicts during their integration. The following types of inconsistency are usually present.
(1) Contours fall into watercourses. A typical example is shown in Figure 1(a), where a contour line intersects one side of a double-line river twice. This phenomenon implies that the river surface slopes in the direction perpendicular to the water flow direction.
(2) Rivers climb up slopes. Figure 1(b) shows a river partially flowing from lower to higher contours. This clearly violates the principle that water should flow from higher to lower places.
(3) Rivers deviate from the corresponding talwegs. The talweg derived from the contour data is a valley line that acts as the channel along which water flows. A river should flow into the talweg implied in the contour representation. An example of this type of inconsistency is shown in Figure 1(c), where the river segment and its talweg do not coincide with each other.
The third type of inconsistency occurs more frequently than the first two types. It can be created not only by differences in accuracy between river and contour data but also by operations such as simplification in map generalization. Chen et al. (2007) defined a set of rules for the determination of spatial conflicts between river and contour features and proposed an approach for automatic detection of spatial conflicts by comparing the calculated line-line relationships against predefined rules. That approach, however, does not explicitly establish corresponding relationships between river and contour features, and it does not address inconsistency correction. In this study, we propose a complete solution for detecting and correcting the third type of inconsistency listed earlier.
Spatial knowledge and its application in consistency matching
According to Tobler's first law of geography (1970), all entities on a geographic surface are related to each other, and some spatial distribution patterns or dependent relationships will be formed between associated entities under the influence of natural or human forces. In topographic databases, these types of impacts and interactions can be represented by spatial knowledge in the form of a series of constraints. The following types of constraints have been identified:
* Intra-layer constraints. This type of constraint exists within the same geographic class and relates to topological consistency. For example, a road segment should connect with other road segments in the vicinity; a linear river should be connected to the neighboring polygonal lakes.
* Inter-layer constraints. These constraints are usually implicit in the distribution of features from different geographic classes and indicate the spatial correlation characteristics between different semantic features. Typical situations include location-sharing constraints (e.g., a bridge's location should be on the corresponding road segment), structure-associating constraints (e.g., the distribution of houses should be aligned with the road network), area-inclusion constraints (e.g., the distribution area of a certain vegetation class is...