Model generalization of two different drainage patterns by self-organizing maps.

Author: Sen, Alper


Because problems in updating digital geographic databases are a major impediment to the effective use of geographic data in production environments, multiple representation databases have become an important research topic in digital cartography. In addition, in many geographic information system (GIS) applications, users need to visualize and inspect data at different scales, which requires different representations to be stored at different levels of detail. The flexibility of a multiple representation database lies in its ability to derive different types of maps using generalization methods. In this respect, automated spatial data generalization techniques are as important as data modeling, management, and distribution (Kilpalainen 1997; Sarjakoski 2007).

Generalization is a process used for reducing the data volume of a spatial dataset while preserving its important structures (Sester 2005). Map generalization operations concerned with the abstraction of a database come under the heading of "model generalization", while operations concerned with the optimal visualization of the selected data are known as "cartographic generalization". Model generalization is also relevant to activities other than visualization, in particular data mining (Mackaness 2007). The term "selection and elimination" is often used interchangeably with "model generalization" and "database abstraction".

As data become easier to collect, techniques that can efficiently handle massive datasets with a large number of variables become essential. However, in some cases, as in generalization, it is not possible to build a function that represents the data well and is also useful for prediction. In such cases, artificial neural networks (ANNs) are excellent exploratory tools (Ratle et al. 2008, 95-96). The self-organizing map (SOM) is an ANN method for clustering, which is a form of unsupervised learning.
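To make the idea of SOM clustering concrete, the following is a minimal sketch of a one-dimensional SOM written in plain Python. It is an illustration only, not the implementation used in this study: the function names, the linear decay of the learning rate and neighbourhood width, the default parameters, and the sample vectors are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - s) ** 2 for w, s in zip(weights[j], sample)),
    )


def train_som(data, n_nodes=2, epochs=100, lr0=0.5, seed=0):
    """Train a one-dimensional SOM and return its node weight vectors."""
    rng = random.Random(seed)
    # Start each node at a distinct, randomly chosen sample.
    weights = [list(v) for v in rng.sample(data, n_nodes)]
    sigma0 = n_nodes / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):  # one shuffled pass
            remaining = 1.0 - step / total_steps
            lr = lr0 * remaining                    # learning rate decays to 0
            sigma = max(sigma0 * remaining, 0.01)   # neighbourhood shrinks
            bmu = best_matching_unit(weights, sample)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of node indices.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (sample[d] - w[d])
            step += 1
    return weights


# Hypothetical, already normalised two-attribute vectors (not the study's data).
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.9), (0.8, 0.9)]
weights = train_som(features)
labels = [best_matching_unit(weights, f) for f in features]
print(labels)  # cluster index (best matching unit) of each vector
```

Each input vector is assigned to the node that best matches it, so the node indices act as cluster labels. No class labels are supplied during training, which is what makes the approach unsupervised.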

The goal of the proposed method is to use an unsupervised classification approach (SOM) to select the objects that should be present at the target scale in two different drainage patterns: a dendritic pattern and a modified basic pattern that is mostly trellis and partially rectangular. The data were obtained from the United States Geological Survey (USGS) National Hydrography Dataset (NHD) and are described below, in the Data source section.

Related work

Selection and elimination operations have primary importance in map generalization. Competition for map space is a fundamental principle of map design: unnecessary features are eliminated while important features are retained at smaller scales. The completeness of a map is therefore affected by the elimination of features during generalization. Robinson et al. (1995, 450-457) define selection as a model generalization operation rather than as a part of cartographic generalization, which comprises classification, simplification, exaggeration, and symbolization.

Topfer and Pillewiser (1966) suggested a mathematical formula, the Principle of Selection, also known as the Radical Law, which relates the number of occurrences of a particular feature at a source and at a derived map scale. Topfer's law is the only quantitative rule in the selection of features and as such yields the number of features to be displayed, but it does not reveal which of the features should be chosen. The principle can be expressed in its simplest form as:

$$n_f = n_a \sqrt{\frac{M_a}{M_f}} \qquad (1)$$

where $n_f$ is the number of features that can be shown at the derived scale, $n_a$ is the number of features shown on the source map, and $M_a$ and $M_f$ are the scale denominators of the source and the derived map, respectively. Topfer generalized this rule to include multiplicative constants and an exponent, such that the general form of the law is:

$$n_f = n_a \, C \sqrt{\left(\frac{M_a}{M_f}\right)^{x}} \qquad (2)$$

where C denotes the multiplicative constants and x is the exponent applied to the scale ratio, whose value is specified separately for point, linear, and areal features. For linear features, an exponent of two (x = 2) should be applied to the ratio (Topfer and Pillewiser 1966; Topfer 1974; Dutton 1999; Stanislawski 2009).
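As a worked illustration of Equations (1) and (2), the short sketch below evaluates the Radical Law for a reduction from 1:24,000 to 1:100,000. The function name, the default C = 1, and the source count of 1000 features are hypothetical values chosen for the example. They are not figures from this study.

```python
import math


def radical_law(n_source, m_source, m_target, c=1.0, x=1.0):
    """Number of features to show at the derived scale (Radical Law).

    n_source: number of features on the source map (n_a)
    m_source, m_target: scale denominators of source (M_a) and derived map (M_f)
    c: multiplicative constant C; x: exponent applied to the scale ratio
    """
    if m_source <= 0 or m_target <= 0:
        raise ValueError("scale denominators must be positive")
    return n_source * c * math.sqrt((m_source / m_target) ** x)


# Hypothetical count: 1000 stream features at 1:24,000, derived scale 1:100,000.
basic = radical_law(1000, 24_000, 100_000)        # Equation (1): C = 1, x = 1
linear = radical_law(1000, 24_000, 100_000, x=2)  # Equation (2) with x = 2
print(round(basic), round(linear))  # 490 240
```

With x = 2 the square root cancels the exponent, so the retained count is simply proportional to the ratio of the scale denominators (here 0.24). As noted above, the result says how many features to keep, not which ones.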

The goal of selection is to reduce the number of objects but to preserve the original characteristics of the objects, such as their density and distribution. In the case of network structures such as hydrographic or road networks, the primary aim is to preserve the connectivity of features.

In many studies, selection is performed in accordance with a hierarchy known as "stream order", which was developed by Horton (1945), Strahler (1957), and Shreve (1966) and is assigned to the components of a stream network. Wolf (1988) suggested using weighted network data for hydrographic generalization. Richardson (1994) presented a method to select streams by creating a fine hierarchy and filtering. Thomson and Brooks (2000) used the principle of good continuation to build strokes in stream networks in order to perform selection. Ai, Liu, and Chen (2006) presented a method to select a stream network using a watershed area threshold based on watershed hierarchical partitioning. Touya (2007) focused on model generalization and used the principle of good continuation to enrich the database with stream strokes. In Touya's study, the main criterion used in the selection of strokes was a hierarchical organization of strokes. Stanislawski (2009) used a pruning algorithm considering the upstream drainage area. Stanislawski and Savino (2011) compared two pruning approaches, stratified pruning and pruning of the length and density of hydrographic networks.
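The stream-order hierarchy can be illustrated with a small sketch of Strahler ordering: headwater segments receive order 1, and a segment's order increases by one only where two or more tributaries of the same highest order meet. The dictionary layout, the function name, and the five-segment network are hypothetical, and the order-based filter at the end is only one simple way such a hierarchy could drive selection.

```python
def strahler_orders(upstream):
    """Return {segment: Strahler order} for a tree-shaped stream network.

    upstream maps a segment id to the list of segments that flow into it.
    Headwater segments may be omitted from the keys.
    """
    orders = {}

    def visit(segment):
        child_orders = [visit(child) for child in upstream.get(segment, [])]
        if not child_orders:
            orders[segment] = 1  # headwater segment
        else:
            top = max(child_orders)
            # Order rises only when the highest order occurs at least twice.
            orders[segment] = top + 1 if child_orders.count(top) >= 2 else top
        return orders[segment]

    # Outlets are segments that do not flow into any other segment.
    tributaries = {c for children in upstream.values() for c in children}
    for segment in upstream:
        if segment not in tributaries:
            visit(segment)
    return orders


# Hypothetical network: a1 and a2 join to form a; a and b join at the outlet.
network = {"outlet": ["a", "b"], "a": ["a1", "a2"]}
orders = strahler_orders(network)
selected = sorted(s for s, order in orders.items() if order >= 2)
print(orders)
print(selected)  # ['a', 'outlet']
```

Filtering on a minimum order keeps the network connected from the outlet upward, which matches the aim of preserving connectivity stated above.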

Jiang and Harrie (2004) introduced centrality measures (i.e., degree, betweenness, and closeness) computed on the connectivity graph to characterize the structural properties of an urban road network and select important roads. In Jiang and Harrie's study, the roads were clustered by SOM, but the clusters that included important roads were selected by the user. This approach is thus a semi-automated approach for model generalization of road networks. In other research, Sester (2008) proposed using SOM for the typification of buildings. Sen and Gokgoz (2012) were the first to test SOM and k-means clustering for the selection of hydrographic objects.
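Two of the centrality measures mentioned above, degree and closeness, can be sketched for a small unweighted graph as follows. This is a generic illustration rather than Jiang and Harrie's implementation. The sketch assumes that each road is a node and that an edge joins two roads that intersect. The road names are hypothetical, and betweenness is omitted to keep the example short.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path lengths), per node."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in hops.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical connectivity graph of four roads.
roads = {
    "main": ["a", "b", "c"],
    "a": ["main"],
    "b": ["main", "c"],
    "c": ["main", "b"],
}
degree = degree_centrality(roads)
closeness = closeness_centrality(roads)
print(degree)
print(closeness)
```

Vectors of such measures, one per road, are the kind of input that can then be clustered, for example by SOM.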


Before introducing the method we developed to select streams in drainage areas, it is beneficial to give basic definitions related to the method.

Morphometric parameters

The quantitative description and analysis of the geometric characteristics of the landscape is defined as geomorphometry. It extracts surface parameters and characteristics using a set of numerical measures usually derived from digital elevation models (DEMs). Drainage basin morphometric parameters (Table 1) are useful for comparing the basins (Ferentinou et al. 2011).
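Two morphometric parameters that appear later in this article, drainage density and basin relief, can be computed as sketched below. Table 1 is not reproduced here, so the definitions used are the common ones: total stream length divided by basin area, and highest minus lowest elevation. They are an assumption about, not a copy of, the formulas the author applied. All numeric values are hypothetical.

```python
def drainage_density(total_stream_length_km, basin_area_km2):
    """Total stream length per unit basin area, in km per square km."""
    if basin_area_km2 <= 0:
        raise ValueError("basin area must be positive")
    return total_stream_length_km / basin_area_km2


def basin_relief(elevations_m):
    """Difference between the highest and lowest elevation, in metres."""
    return max(elevations_m) - min(elevations_m)


# Hypothetical basin: 1200 km of streams in 800 square km, four DEM samples.
density = drainage_density(1200.0, 800.0)
relief = basin_relief([250.0, 310.0, 402.5, 275.0])
print(density, relief)  # 1.5 152.5
```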

Drainage patterns

A drainage pattern is the pattern formed by a stream network and refers to the connectivity of the stream tributaries in an area, regardless of whether the stream tributaries are occupied by perennial streams. Drainage patterns can be subdivided into basic and modified basic patterns. Basic patterns are dendritic, parallel, trellis, radial, centripetal, annular, deranged, and rectangular. Modified basic patterns, such as subdendritic, pinnate, recurved trellis, centripetal and so on, although usually recognized as belonging to one of the basic types, differ in certain regional characteristics (Howard 1967; DeBarry 2004, 98-100).

Streams showing a dendritic pattern form a treelike, or dendritic, arrangement of small streams or tributaries in headwaters (branches) that flow in a variety of directions and continually join to eventually form the "major" stream or river. Streams with a trellis pattern typically have a main stream flowing parallel to bedrock structure, with the tributaries flowing into the main stem of the stream at right angles toward each other from opposite sides. In a rectangular or gridlike drainage pattern, streams form angular, near 90° turns, due primarily to following the fissures, tectonic faults, or joints in the bedrock (DeBarry 2004). Sample drainage patterns are shown in Figure 1.

Stream types

There are two stream types. Perennial streams are those that flow continuously, whereas intermittent streams appear to "dry up" when the flow has the potential of being totally absorbed by the bed and underlying material. Intermittent streams may flow continuously during "wet" years. Streams are often intermittent in the headwaters, and become perennial as the watershed drainage area supplies enough base flow to support continued flow (DeBarry 2004, 105).

Data source

The National Hydrography Dataset (NHD) developed by USGS is the data source in this study. The database has the following subdivisions: region, subregion, accounting unit, and subbasin watershed areas. Two high-resolution NHDs (1:24,000-scale or 24K) and two corresponding medium-resolution NHDs (1:100,000-scale or 100K) were used as the source and target datasets, respectively, together with six blocks of 1/3 arc-second DEM (approx. 10 m grid spacing and 2.44 m root mean square error) for computing the drainage basin morphometric parameters in Table 1. The datasets contain two subbasins, namely, the hilly and humid Pomme de Terre (PT) on the Ozark plateau in the interior highlands and the mountainous humid South Branch Potomac (SBP) in the Appalachian highlands (Stanislawski et al. 2007; Brewer, Buttenfield, and Usery 2009). The Hydrologic Unit Code (HUC) for PT is 10290107 and that of SBP is 02070001. PT has 20 subunits and SBP has 40 subunits.

SBP is larger and more rugged, and several of its parameters (such as the total length of the streams and of the 10 m interval contour lines on the 24K map, drainage density, basin relief, and basin slope) are greater than those of PT.

PT represents the dendritic pattern, while SBP represents a modified...
