How can I associate a consistent polygon area with a set of attributes using a topology?

Consider the following topology.

SELECT CreateTopology('topotestschema', 0, 0.01);
CREATE TABLE topotest (id SERIAL PRIMARY KEY, data TEXT);
SELECT AddTopoGeometryColumn('topotestschema', 'public', 'topotest', 'topogeom', 'polygon');
INSERT INTO topotest (data, topogeom)
VALUES ('outer', toTopoGeom('POLYGON((0 0,0 10,10 10,10 0,0 0))'::geometry, 'topotestschema', 1, 0.01));

When we look at what's in the topology:

SELECT *, ST_AsText(topogeom::geometry) FROM topotest;

we get

╔════╦═══════╦═══════════╦═══════════════════════════════════════════╗
║ id ║ data  ║ topogeom  ║ st_astext                                 ║
╠════╬═══════╬═══════════╬═══════════════════════════════════════════╣
║ 1  ║ outer ║ (3,1,1,3) ║ MULTIPOLYGON(((0 0,0 10,10 10,10 0,0 0))) ║
╚════╩═══════╩═══════════╩═══════════════════════════════════════════╝

But this gets changed around when I add more shapes:

INSERT INTO topotest (data, topogeom)
VALUES
  ('right', toTopoGeom('POLYGON((9 1,11 1,11 9,9 9,9 1))'::geometry, 'topotestschema', 1, 0.01)),
  ('inner', toTopoGeom('POLYGON((3 3,3 7,7 7,7 3,3 3))'::geometry, 'topotestschema', 1, 0.01));

Now the layer is:

╔════╦═══════╦═══════════╦══════════════════════════════════════════════════════╗
║ id ║ data  ║ topogeom  ║ st_astext                                            ║
╠════╬═══════╬═══════════╬══════════════════════════════════════════════════════╣
║ 1  ║ outer ║ (3,1,1,3) ║ MULTIPOLYGON(((10 9,10 1,10 0,0 0,0 10,10 10,10 9))) ║
║ 2  ║ right ║ (3,1,2,3) ║ MULTIPOLYGON(((10 9,11 9,11 1,10 1,9 1,9 9,10 9)))   ║
║ 3  ║ inner ║ (3,1,3,3) ║ MULTIPOLYGON(((3 3,3 7,7 7,7 3,3 3)))                ║
╚════╩═══════╩═══════════╩══════════════════════════════════════════════════════╝

How can I keep it from modifying the polygon associated with existing rows when I add new ones?

Please feel free to say something if it seems like I'm on entirely the wrong track here. I'm basically experimenting with a topology and trying to figure out if it's useful for what I'm doing.
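One way to see what actually changed between the two outputs is to compare ring areas with the shoelace formula. The Python sketch below is an illustration outside PostGIS; `ring_area` and `parse_ring` are names made up for it, and it uses only the coordinates printed in the two result tables. In these outputs, at least, the outer ring encloses the same area before and after; the difference is the extra vertices at (10 9) and (10 1), where the 'right' polygon's boundary crosses it.

```python
# Shoelace-formula area check on rings copied from the two query outputs above.

def parse_ring(wkt_ring):
    """Parse 'x y,x y,...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pt.split()) for pt in wkt_ring.split(",")]

def ring_area(coords):
    """Absolute area of a closed ring (first point repeated at the end)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

outer_before = parse_ring("0 0,0 10,10 10,10 0,0 0")
outer_after = parse_ring("10 9,10 1,10 0,0 0,0 10,10 10,10 9")
right_after = parse_ring("10 9,11 9,11 1,10 1,9 1,9 9,10 9")

print(ring_area(outer_before), ring_area(outer_after), ring_area(right_after))
```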


Theme Keywords

Thesaurus | Keyword
Global Change Master Directory (GCMD) Science Keywords | Earth Science > Human Dimensions > Environmental Impacts > Oil Spills
ISO 19115 Topic Category | biota
ISO 19115 Topic Category | environment
None | Coastal resources
None | Coastal Zone Management
None | Environmental Monitoring
None | ESI
None | Human use resources
None | Managed areas
None | Management areas
None | Oil spill planning
None | Sensitivity maps
None | Socioeconomic resources

Spatial Keywords

Thesaurus | Keyword
Global Change Master Directory (GCMD) Location Keywords | Continent > North America > United States Of America > Maine
Global Change Master Directory (GCMD) Location Keywords | Continent > North America > United States Of America > New Hampshire

Dataset languages: English (UNITED STATES)
Dataset character set: utf8 - 8 bit UCS Transfer Format

Status: completed
Spatial representation type: vector

Supplemental information
* Processing environment: Microsoft Windows 7 Version 6.1 (Build 7601) Service Pack 1; Esri ArcGIS

Credits
ArcGIS item properties
* Name: GISVIEW.MEIFW.Twwh
* Location,1633 Database=GISVIEW User=meifw Version=dbo.DEFAULT
* Access protocol: ArcSDE Connection

Vector Data Models Structures

Vector data models can be structured in many different ways. We will examine two of the more common data structures here. The simplest vector data structure is called the spaghetti data model (Dangermond 1982). In the spaghetti model, each point, line, and/or polygon feature is represented as a string of X, Y coordinate pairs (or as a single X, Y coordinate pair in the case of a vector image with a single point) with no inherent structure (Figure 4.9 "Spaghetti Data Model"). One could envision each line in this model as a single strand of spaghetti that is formed into complex shapes by the addition of more and more strands. Notably, in this model any polygons that lie adjacent to each other must be made up of their own lines, or strands of spaghetti. In other words, each polygon must be uniquely defined by its own set of X, Y coordinate pairs, even if adjacent polygons share exactly the same boundary. This creates redundancy within the data model and therefore reduces efficiency.

Dangermond, J. 1982. “A Classification of Software Components Commonly Used in Geographic Information Systems.” In Proceedings of the U.S.-Australia Workshop on the Design and Implementation of Computer-Based Geographic Information Systems, 70-91. Honolulu, HI.

Figure 4.9 Spaghetti Data Model

Despite the location designations associated with each line, or strand of spaghetti, spatial relationships are not explicitly encoded within the spaghetti model; rather, they are implied by location. This lack of topological information is problematic when the user attempts measurement or analysis, and the computational requirements are therefore very steep if any advanced analytical techniques are applied to vector files structured this way. Nevertheless, the simple structure of the spaghetti data model allows for efficient reproduction of maps and graphics, as topological information is unnecessary for plotting and printing.
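To make the cost of implied relationships concrete, here is a minimal Python sketch under stated assumptions: two hypothetical adjacent squares are stored spaghetti-style as independent coordinate lists, and the only way to discover that they touch is to compare every boundary segment of one against the other. The function names are illustrative, and the comparison only succeeds when the duplicated vertices match exactly.

```python
# Spaghetti-model sketch: each polygon is its own list of (x, y) pairs.
# The two squares are hypothetical and share the boundary x = 1, which is
# therefore stored twice, once in each polygon.
poly_a = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
poly_b = [(1, 0), (1, 1), (2, 1), (2, 0), (1, 0)]

def segments(ring):
    """Each boundary segment as an order-independent frozenset of its endpoints."""
    return [frozenset(pair) for pair in zip(ring, ring[1:])]

def shared_segments(ring1, ring2):
    """Brute-force adjacency: test every segment of ring1 against all of ring2."""
    segs2 = segments(ring2)
    return [s for s in segments(ring1) if s in segs2]

shared = shared_segments(poly_a, poly_b)
print(len(shared), shared)
```

With n polygons this pairwise search grows roughly with the square of the number of segments, which is the computational burden the paragraph describes.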

In contrast to the spaghetti data model, the topological data model is characterized by the inclusion of topological information within the dataset, as the name implies. Topology is a set of rules that model the relationships between neighboring points, lines, and polygons and determines how they share geometry. For example, consider two adjacent polygons. In the spaghetti model, the shared boundary of two neighboring polygons is defined as two separate, identical lines. The inclusion of topology into the data model allows for a single line to represent this shared boundary with an explicit reference to denote which side of the line belongs with which polygon. Topology is also concerned with preserving spatial properties when the forms are bent, stretched, or placed under similar geometric transformations, which allows for more efficient projection and reprojection of map files.
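For contrast, a minimal sketch of the topological idea applied to the same two hypothetical squares: the shared boundary is one arc, and each arc records which polygon lies on its left and right (relative to its direction of travel). The dictionary layout, the arc numbers, and the label "U" for the area outside both squares are assumptions of this illustration, not a specific product's schema.

```python
# Topological sketch: squares A (0..1 x 0..1) and B (1..2 x 0..1).
# The shared boundary is stored once, as arc 1; "U" is the area outside both.
ARCS = {
    1: {"coords": [(1, 0), (1, 1)], "left": "A", "right": "B"},  # shared boundary
    2: {"coords": [(1, 1), (0, 1), (0, 0), (1, 0)], "left": "A", "right": "U"},
    3: {"coords": [(1, 0), (2, 0), (2, 1), (1, 1)], "left": "B", "right": "U"},
}

def boundary_arcs(polygon):
    """IDs of the arcs that bound `polygon`, found through their left/right fields."""
    return sorted(aid for aid, arc in ARCS.items() if polygon in (arc["left"], arc["right"]))

def side_of(arc_id, polygon):
    """Which side of the arc `polygon` lies on ('left' or 'right'), or None."""
    arc = ARCS[arc_id]
    if arc["left"] == polygon:
        return "left"
    if arc["right"] == polygon:
        return "right"
    return None

print(boundary_arcs("A"), boundary_arcs("B"))  # arc 1 is listed for both polygons
print(side_of(1, "A"), side_of(1, "B"))
```

Because both polygons reference arc 1, editing its coordinates changes both boundaries at once, so the two sides cannot drift apart.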

Three basic topological precepts that are necessary to understand the topological data model are outlined here. First, connectivity describes the arc-node topology for the feature dataset. As discussed previously, nodes are more than simple points. In the topological data model, nodes are the intersection points where two or more arcs meet. In the case of arc-node topology, arcs have both a from-node (i.e., starting node) indicating where the arc begins and a to-node (i.e., ending node) indicating where the arc ends (Figure 4.10 "Arc-Node Topology"). In addition, between each node pair is a line segment, sometimes called a link, which has its own identification number and references both its from-node and to-node. In Figure 4.10 "Arc-Node Topology", arcs 1, 2, and 3 all intersect because they share node 11. Therefore, the computer can determine that it is possible to move along arc 1 and turn onto arc 3, while it is not possible to move from arc 1 to arc 5, as they do not share a common node.

Figure 4.10 Arc-Node Topology
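The connectivity rule above can be sketched in a few lines of Python. The arc and node numbers are hypothetical: they only mirror the text's claim that arcs 1 and 3 share node 11 while arcs 1 and 5 share nothing, and they are not read off Figure 4.10.

```python
# Arc-node sketch: each arc records (from_node, to_node).
# IDs are hypothetical, chosen to mirror the example in the text.
ARCS = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    5: (14, 15),
}

def connected(arc_a, arc_b):
    """Two arcs connect if they share at least one node."""
    return bool(set(ARCS[arc_a]) & set(ARCS[arc_b]))

print(connected(1, 3))  # True: both arcs touch node 11
print(connected(1, 5))  # False: no node in common
```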

The second basic topological precept is area definition. Area definition states that arcs that connect to enclose an area define a polygon, which is also called polygon-arc topology. In polygon-arc topology, arcs are used to construct polygons, and each arc is stored only once (Figure 4.11 "Polygon-Arc Topology"). This reduces the amount of data stored and ensures that adjacent polygon boundaries do not overlap. In Figure 4.11 "Polygon-Arc Topology", the polygon-arc topology makes it clear that polygon F is made up of arcs 8, 9, and 10.

Figure 4.11 Polygon-Arc Topology
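A minimal sketch of polygon-arc topology, assuming a simple dictionary layout: a polygon is just a list of arc IDs, and the coordinates live only in the arc table. The arc numbers 8, 9, and 10 follow the text, but the coordinates are hypothetical (a 3-4-5 right triangle), so a derived measure such as the perimeter is easy to check by hand.

```python
import math

# Each arc's coordinates are stored exactly once (hypothetical values).
ARC_COORDS = {
    8: [(0, 0), (3, 0)],
    9: [(3, 0), (3, 4)],
    10: [(3, 4), (0, 0)],
}
# Polygons refer to arcs by ID instead of repeating coordinates.
POLYGON_ARCS = {"F": [8, 9, 10]}

def arc_length(arc_id):
    pts = ARC_COORDS[arc_id]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def perimeter(polygon):
    """Sum the lengths of the arcs that make up `polygon`."""
    return sum(arc_length(a) for a in POLYGON_ARCS[polygon])

print(perimeter("F"))  # 3 + 4 + 5 = 12.0
```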

Contiguity, the third topological precept, is based on the concept that polygons that share a boundary are deemed adjacent. Specifically, polygon topology requires that all arcs in a polygon have a direction (a from-node and a to-node), which allows adjacency information to be determined (Figure 4.12 "Polygon Topology"). Polygons that share an arc are deemed adjacent, or contiguous, and therefore the “left” and “right” side of each arc can be defined. This left and right polygon information is stored explicitly within the attribute information of the topological data model. The “universe polygon” is an essential component of polygon topology that represents the external area located outside of the study area. Figure 4.12 "Polygon Topology" shows that arc 6 is bounded on the left by polygon B and on the right by polygon C. Polygon A, the universe polygon, is to the left of arcs 1, 2, and 3.

Figure 4.12 Polygon Topology
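Because every arc carries its left and right polygon, adjacency can be read straight off the arc table without any geometry. The sketch below assumes a hypothetical arc list loosely modeled on the description above ("A" as the universe polygon, arc 6 between B and C); it is not a transcription of Figure 4.12.

```python
from collections import defaultdict

# (arc_id, left_polygon, right_polygon); hypothetical values.
ARCS = [
    (1, "A", "B"),
    (2, "A", "C"),
    (3, "A", "D"),
    (6, "B", "C"),
    (7, "C", "D"),
]
UNIVERSE = "A"

def adjacency(arcs, universe=UNIVERSE):
    """Map each polygon to the polygons it shares an arc with (universe excluded)."""
    adj = defaultdict(set)
    for _, left, right in arcs:
        if universe in (left, right):
            continue  # an edge of the study area, not a neighbour relationship
        adj[left].add(right)
        adj[right].add(left)
    return dict(adj)

def touches_universe(arcs, universe=UNIVERSE):
    """Polygons that lie on the outer edge of the study area."""
    return sorted({p for _, left, right in arcs if universe in (left, right)
                   for p in (left, right) if p != universe})

neighbours = adjacency(ARCS)
print(sorted(neighbours["C"]))   # ['B', 'D']
print(touches_universe(ARCS))    # ['B', 'C', 'D']
```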

Topology allows the computer to rapidly determine and analyze the spatial relationships of all its included features. In addition, topological information is important because it allows for efficient error detection within a vector dataset. In the case of polygon features, open or unclosed polygons, which occur when an arc does not completely loop back upon itself, and unlabeled polygons, which occur when an area does not contain any attribute information, violate polygon-arc topology rules. Another topological error found with polygon features is the sliver. Slivers occur when the shared boundaries of two polygons do not meet exactly (Figure 4.13 "Common Topological Errors").

In the case of line features, topological errors occur when two lines do not meet perfectly at a node. This error is called an “undershoot” when the lines do not extend far enough to meet each other and an “overshoot” when the line extends beyond the feature it should connect to (Figure 4.13 "Common Topological Errors"). The result of overshoots and undershoots is a “dangling node” at the end of the line. Dangling nodes aren’t always an error, however, as they occur in the case of dead-end streets on a road map.

Figure 4.13 Common Topological Errors
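Dangling nodes are easy to find once arcs record their from- and to-nodes: count how many arc ends meet at each node and flag nodes touched only once. This is a minimal sketch on a hypothetical network (a closed triangle plus one spur). As noted above, a flagged node still needs review, because a dead-end street is legitimate.

```python
from collections import Counter

# arc: (from_node, to_node); hypothetical network.
ARCS = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (3, 1),
    "d": (3, 4),  # spur: an overshoot, an undershoot, or a genuine dead end
}

def dangling_nodes(arcs):
    """Nodes that exactly one arc end touches."""
    degree = Counter()
    for from_node, to_node in arcs.values():
        degree[from_node] += 1
        degree[to_node] += 1
    return sorted(node for node, count in degree.items() if count == 1)

print(dangling_nodes(ARCS))  # [4]
```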

Many types of spatial analysis require the degree of organization offered by topologically explicit data models. In particular, network analysis (e.g., finding the best route from one location to another) and measurement (e.g., finding the length of a river segment) rely heavily on the concept of to- and from-nodes and use this information, along with attribute information, to calculate distances, shortest routes, quickest routes, and so forth. Topology also allows for sophisticated neighborhood analysis such as determining adjacency, clustering, nearest neighbors, and so forth.
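As an illustration of how from- and to-nodes support route finding, here is a minimal Python sketch of Dijkstra's shortest-path algorithm over an arc table. The network, node names, and lengths are hypothetical, and every arc is treated as two-way; a real street network would also carry one-way rules and travel-time attributes.

```python
import heapq

# (from_node, to_node, length); hypothetical network, all arcs two-way.
ARCS = [
    ("n1", "n2", 4.0),
    ("n1", "n3", 1.0),
    ("n3", "n2", 2.0),
    ("n2", "n4", 5.0),
    ("n3", "n4", 8.0),
]

def shortest_route(arcs, start, goal):
    """Return (total_length, node_path) using Dijkstra's algorithm."""
    graph = {}
    for a, b, length in arcs:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, length in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return float("inf"), []

print(shortest_route(ARCS, "n1", "n4"))  # (8.0, ['n1', 'n3', 'n2', 'n4'])
```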

Now that the basic concepts of topology have been outlined, we can better understand the topological data model. In this model, the node acts as more than just a simple point along a line or polygon: it represents the point of intersection for two or more arcs. Arcs may or may not be looped into polygons. Regardless, all nodes, arcs, and polygons are individually numbered, which allows for quick and easy reference within the data model.

Compiled with assistance from David H. Douglas, University of Ottawa



    previous units have been concerned with specifying and transforming locations

  • objects (points, lines and areas)
  • attributes associated with objects
  • relationships between objects

  • many alternatives exist for structuring spatial data within a digital store
  • here we review some of the most common which have been proven useful by years of experience and application

  • spatial objects - points, lines, areas - can be coded as x,y coordinate pairs:
    • point: (x,y)
    • line: (x1,y1), (x2,y2), ..., (xn,yn)
    • area: (x1,y1), (x2,y2), ..., (xn,yn)
    • note that the digital representation of the three spatial objects is identical (a list of n coordinate pairs), with n=1 for a point
    • note the convention used throughout this unit:
      • the name of the record type, followed by a colon, then the items forming the record

        attributes of objects can be stored as tables

      • the data structure usually consists of two parts:
        • coordinates in one file, each set representing a single object identified by a unique ID
        • attributes in a table with one attribute identifying the objects to which each is linked
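The two-part structure described above (coordinates keyed by object ID, attributes in a table that carries the same ID) can be sketched as follows. All IDs, coordinates, and attribute values are hypothetical, and closing the area by repeating its first pair is a convention assumed for this sketch.

```python
# Part 1: coordinates, one list per object, keyed by a unique ID.
COORDINATES = {
    101: [(2.0, 3.0)],                                      # point (n = 1)
    102: [(0.0, 0.0), (4.0, 1.0), (6.0, 5.0)],              # line
    103: [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (0.0, 0.0)],  # area (assumed closed)
}
# Part 2: attribute table; the "id" column links each row to its object.
ATTRIBUTES = [
    {"id": 101, "kind": "point", "name": "well"},
    {"id": 102, "kind": "line", "name": "road"},
    {"id": 103, "kind": "area", "name": "field"},
]

def join_features(coordinates, attributes):
    """Attach to each attribute row the coordinate list that has the same ID."""
    return [{**row, "coords": coordinates[row["id"]]} for row in attributes]

for feature in join_features(COORDINATES, ATTRIBUTES):
    print(feature["name"], feature["kind"], len(feature["coords"]))
```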

          many common packages for mapping use this structure
            SAS/GRAPH and ATLAS (from Strategic Locations Planning) are examples

            the key to a GIS data structure, as distinct from cartographic databases, is the emphasis on the coding of relationships between objects
              in GIS, the term topology is used to refer to these relationships between objects

              topological properties are those which are preserved when an object is stretched or distorted, and are therefore distinct from geometrical properties
                e.g. a circle can be stretched to form any shape of polygon, but no amount of distortion will make it into a cube

              • relationships in networks
              • relationships between areas

              • networks consist of two types of objects:
                • lines, also known as links, edges or arcs
                • nodes, also known as intersections or junctions

                • 1. arc coordinates: (x1,y1), (x2,y2), ..., (xn,yn)
                • 2. arc attributes: to-node, from-node, length, attributes

                • the DIME datasets created by the Bureau of the Census for the 1970 Census used this concept to code US street networks
                  • each node or intersection was given a unique ID

                  • this could be done by adding a third type of record:
                  • 3. node: (x,y), adjacent arcs (positive for to-node, negative for from-node)
                    • see overhead

                    position  arc      node  position
                    1         1        a     1
                    2         -5       b     3
                    3         3        c     6
                    4         2        d     9
                    5         -1
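The layout in the table above, where one list holds signed arc numbers and each node records the position at which its entries start, can be built automatically from arc records. This Python sketch assumes three hypothetical arcs rather than the overhead's data, and uses the sign convention from the node record: +arc when the node is that arc's to-node, -arc when it is the from-node.

```python
# arc: (from_node, to_node); hypothetical arcs.
ARCS = {
    1: ("b", "a"),
    2: ("d", "c"),
    3: ("a", "c"),
}

def build_node_index(arcs):
    """Pack each node's signed adjacent arcs into one list; record 1-based start positions."""
    adjacent = {}
    for arc, (from_node, to_node) in sorted(arcs.items()):
        adjacent.setdefault(from_node, []).append(-arc)
        adjacent.setdefault(to_node, []).append(arc)
    positions, start = [], {}
    for node in sorted(adjacent):
        start[node] = len(positions) + 1  # 1-based, as in the table above
        positions.extend(adjacent[node])
    return positions, start

def arcs_at(node, positions, start):
    """A node's entries run up to the next node's starting position."""
    nodes = sorted(start)
    i = nodes.index(node)
    end = start[nodes[i + 1]] - 1 if i + 1 < len(nodes) else len(positions)
    return positions[start[node] - 1:end]

positions, start = build_node_index(ARCS)
print(positions)                       # [1, -3, -1, 2, 3, -2]
print(start)                           # {'a': 1, 'b': 3, 'c': 4, 'd': 6}
print(arcs_at("c", positions, start))  # [2, 3]
```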

                      knowing adjacency is important when working with area objects
                        many programs are more efficient if we know which areas share common boundaries

                      • duplication in digitizing
                      • problems which arise when the two versions of each common boundary do not coincide

                      • overhead/handout - Relationships between areas
                      • a polygon attribute table
                      • an arc attribute table
                      • a set of (x,y) pairs representing the arc geometry
                      • note: in ARC/INFO these are referred to as the .PAT, .AAT and .ARC files respectively

                      • to construct polygons, must search for arcs with correct polygon IDs and then match node numbers
                        • for polygon B above, the result would be arcs 3, 4 and 5, with 5 in reverse order
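The two-step procedure above (select arcs by polygon ID, then chain them by matching node numbers) can be sketched in Python. The arc records are hypothetical, arranged so that polygon B comes out as arcs 3, 4, and 5 with 5 reversed, echoing the text. Polygon C is deliberately incomplete here, so asking for it exercises the "does not close" error, the open-polygon case that topology makes detectable.

```python
# arc: (from_node, to_node, left_polygon, right_polygon); hypothetical records.
ARCS = {
    3: (1, 2, "A", "B"),
    4: (2, 3, "A", "B"),
    5: (1, 3, "B", "C"),
    6: (3, 4, "A", "C"),
}

def build_polygon(polygon, arcs):
    """Return [(arc_id, direction), ...] around `polygon`, or raise if it does not close."""
    pending = {a: rec for a, rec in arcs.items() if polygon in rec[2:]}
    if not pending:
        raise ValueError(f"no arcs reference polygon {polygon!r}")
    first = min(pending)
    start_node, current = pending.pop(first)[:2]
    chain = [(first, "forward")]
    while pending:
        for arc, (from_node, to_node, _, _) in sorted(pending.items()):
            if from_node == current:
                chain.append((arc, "forward"))
                current = to_node
                break
            if to_node == current:
                chain.append((arc, "reversed"))
                current = from_node
                break
        else:
            raise ValueError(f"polygon {polygon!r} does not close")
        del pending[arc]
    if current != start_node:
        raise ValueError(f"polygon {polygon!r} does not close")
    return chain

print(build_polygon("B", ARCS))  # [(3, 'forward'), (4, 'forward'), (5, 'reversed')]
```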

                          an example of a more fully developed data structure is the database of the Canadian Soil Information System (CanSIS)
                            developed by the Canadian Department of Agriculture in the 1970s

                            • soil types would be coded as objects
                            • an object can describe many discontiguous polygons sharing the same attributes
                            2. Polygon: object ID, next-polygon, first-arc, last-arc
                            • here "object" is the object of which the polygon is a part
                            3. Arc: R-polygon, L-polygon, next-R-arc, next-L-arc, previous-R-arc, previous-L-arc, first-point, last-point
                            • the arc pointers are to the next arcs around the left and right polygons
                            • first-point and last-point identify the first and last (x,y) pairs of this arc in the point data below
                            4. Point: (x,y)
                            • the points owned by each arc are stored in sequence in this dataset

                                areas do not always exhaust the space
                                  the method may be inefficient for coding data sets which consist of isolated polygons, e.g. woodlots in an agricultural area, various types of land use, house footprints on an urban map

                                  a database of old burns in a forest contains polygons which may overlap and do not exhaust the space, so there are few if any common boundaries

                                  the network and area data structures discussed above reflect common practice in existing GIS, but are far from comprehensive

                                  • arcs are more efficient than polygons for many operations
                                  2. accurate modeling of reality
                                  • objects are abstractions of reality; the conditions imposed, e.g. non-overlapping polygons, will affect the accuracy of the abstraction

                                  • a simple feature such as a point can be part of several complex features
                                  • this idea is useful in utilities applications, where it may be necessary to group together several objects, such as a house, land parcel, pipe, shutoff valve and gas meter, into a complex object ("account")

                                  Example: analysts of spatial information must often deal with the fact that reporting zones, such as counties, change from time to time

                                  • Great American History Project - to analyze the spatial distribution of the US population by county since 1800 requires a database which can present the user with different views of the set of counties at different times, as boundaries change
                                  • one solution is to define a common set of arcs, but to build them selectively into area objects at each time period
                                    • the arcs list contains every line which has ever been a part of a US county boundary
                                    • the boundaries of objects (counties) are defined differently at each time period
                                    • an arc is part of the network of boundaries at time period t if the polygon IDs on its right and left belong to different objects at time t
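The time-period rule in the last bullet translates directly into a filter over the arcs list. In this minimal sketch the arcs, polygon IDs, years, and county names are all hypothetical: each year maps the permanent polygon IDs to the county (object) they belonged to, and an arc is drawn only when its two sides fall in different counties.

```python
# arc: (left_polygon, right_polygon); hypothetical data.
ARCS = {
    "e1": ("p1", "p2"),
    "e2": ("p2", "p3"),
    "e3": ("p1", "p3"),
}
# Which county each polygon belonged to in a given year (hypothetical).
COUNTY_OF = {
    1850: {"p1": "Adams", "p2": "Adams", "p3": "Baker"},
    1900: {"p1": "Adams", "p2": "Clark", "p3": "Baker"},
}

def boundary_arcs(year):
    """Arcs whose left and right polygons belong to different counties in `year`."""
    counties = COUNTY_OF[year]
    return sorted(arc for arc, (left, right) in ARCS.items()
                  if counties[left] != counties[right])

print(boundary_arcs(1850))  # ['e2', 'e3']: e1 is interior to Adams
print(boundary_arcs(1900))  # ['e1', 'e2', 'e3']
```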

                                    Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See Chapter 2.

                                    Haralick, R.M., 1980. "A Spatial Data Structure for Geographic Information Systems," in H. Freeman and G.G. Pieroni, eds., Map Data Processing, Academic Press, New York.

                                    Peuker, T.K., and N. Chrisman, 1975. "Geographic Data Structures," American Cartographer 2(1):55-69.

                                    van Roessel, J.W., and E.A. Fosnight, 1984. "A relational approach to vector data structure conversion," Proceedings, International Symposium on Spatial Data Handling, Zurich, pp. 78-95.

                                    1. Make a list of the kinds of relationships which can exist between pairs of spatial objects, for each pair of points, lines and areas, e.g. point to point, point to line, area to point etc. Are there any examples of relationships between triples of objects, e.g. point-point-point?

                                    2. Write out the CanSIS data structure for a simple map of three or four polygons, forming an equal or smaller number of objects (include the x,y coordinate pairs) (need to include a drawn example).

                                    3. The GIS industry has traditionally provided data models which assume that within any one layer of the database, polygon objects do not overlap, and exhaust the space available. Comment on the degree to which this assumption has limited the application of GIS databases in specific areas. Are these sufficiently significant to warrant a change of data models in the future?

                                    4. Discuss areas of application in which the concept of a complex feature type would be useful. What operations would you want to perform on complex and simple features respectively?

                                    Please send comments regarding content to: Brian Klinkenberg
                                    Please send comments regarding web-site problems to: The Techmaster
                                    Last Updated: August 30, 1997.

                                    4. Shapefiles

                                    Since 2007, TIGER/Line extracts from the MAF/TIGER database have been distributed in shapefile format. Esri introduced shapefiles in the early 1990s as the native digital vector data format of its ArcView software product. The shapefile format is proprietary but open: its technical specifications are published and can be implemented and used freely. Largely as a result of ArcView’s popularity, the shapefile has become a de facto standard for the creation and interchange of vector geospatial data. The Census Bureau’s adoption of the shapefile as a distribution format is therefore consistent with its overall strategy of conformance with mainstream information technology practices.

                                    Elements of a Shapefile Data Set

                                    The first thing GIS pros need to know about shapefiles is that every shapefile data set includes a minimum of three files. One of the three required files stores the geometry of the digital features as sets of vector coordinates. A second required file holds an index that, much like the index in a book, allows quick access to the spatial features and therefore speeds processing of a given operation involving a subset of features. The third required file stores attribute data in dBASE© format, one of the earliest and most widely-used digital database management system formats. All of the files that make up a Shapefile data set have the same root or prefix name, followed by a three-letter suffix or file extension. The list below shows the names of the three required files making up a shapefile data set named “counties.” Take note of the file extensions:

                                    • counties.shp: the main shape file, containing vector coordinate data
                                    • counties.shx: the index file
                                    • counties.dbf: the dBASE table
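Because the three required files share a root name and differ only in extension, checking that a data set is complete is a simple set operation. This Python sketch works on file names only (no disk access); `missing_parts` is a hypothetical helper, and the extension comparison is case-insensitive while the root name must match exactly.

```python
from pathlib import PurePath

# The three files every shapefile data set needs, per the list above.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}

def missing_parts(filenames, root):
    """Return the required extensions that have no file named <root><extension>."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == root
    }
    return sorted(REQUIRED_EXTENSIONS - present)

print(missing_parts(["counties.shp", "counties.shx", "counties.dbf", "counties.prj"], "counties"))  # []
print(missing_parts(["counties.shp", "counties.dbf"], "counties"))  # ['.shx']
```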

                                    Esri lists twelve additional optional files, and practitioners are able to include still others. Two of the most important optional files are the “.prj” file, which includes the coordinate system definition, and “.xml”, which stores metadata. (Why do you suppose that something as essential as a coordinate system definition is considered “optional”?)

                                    Try This!

                                    Downloading and viewing a TIGER/Line Shapefile

                                    In this Try This! (the second of 3 dealing with TIGER/Line Shapefiles), you will download a TIGER/Line Shapefile dataset, investigate the file structure of a typical Esri shapefile, and view it in GIS software.

                                    You can use a free software application called Global Mapper (originally known as dlgv32 Pro) to investigate TIGER/Line shapefiles. Originally developed by the staff of the USGS Mapping Division at Rolla, Missouri as a data viewer for USGS data, Global Mapper has since been commercialized but is available in a free trial version. The instructions below will guide you through the process of installing the software and opening the TIGER/Line data.

                                    1. Downloading TIGER/Line Shapefiles: You are going to use the 2010 TIGER/Line Shapefiles.
                                      • Return to the 2010 TIGER/Line Shapefiles download page.
                                      • From the Select a layer type pick list, under Features, choose All Lines, and click submit. (You are welcome to download and investigate any TIGER/Line Shapefile(s), but we will use an All Lines dataset in the geocoding Try This later in the chapter, so your downloading one here will make you more familiar with the content.)
                                      • From the All Lines pick list, select a state or territory, and click Submit.
                                      • Select a County from the next pick list that appears, and click Download.
                                      • Save the file to your computer.
                                        The file you download should have a name like The root name of this file, tl_2010_42027_edges in this example, will also be the name of the shapefile dataset. The 42027 is a federal code that represents Pennsylvania (state 42) and Centre County (county 027). The five-digit code in your file name will depend on which state and county you selected.
                                      • The data are compressed in a .zip archive. Extract the data to a new named folder in a known location. (Within the file hierarchy that is extracted, there may be a second .zip file that needs to be uncompressed.)
                                    2. Investigating the shapefile data set:
                                      • Navigate to the folder in which you stored your uncompressed TIGER/Line Shapefile dataset.
                                      • Notice the multiple files which make up the shapefile dataset, including:
                                        • tl_2010_42027_edges.shp, containing the vector coordinate data
                                        • tl_2010_42027_edges.shp.xml, containing metadata
                                        • tl_2010_42027_edges.shx, the index file
                                        • tl_2010_42027_edges.dbf, the dBASE file
                                        • tl_2010_42027_edges.prj, containing the projection/spatial reference
                                      • All of the files work in concert to store the necessary components of the Esri shapefile data set. You may be familiar with some of the individual file types. The contents of three of them can be easily viewed. Let's open those three. You can double-click on the file and then select “from a list of installed programs,” or you may need to run the suggested application and open the file from within it.
                                        • Open the .dbf file using Microsoft Excel.
                                          Note the typical row-column structure of a flat-file database. Can you find the four columns, or fields, that hold the address range information? Look for LFROMADD, etc. The field name LFROMADD is shorthand for Left From Address. The 10-character length of the field name points to one of the constraints of the dBASE format: field names are limited to 10 characters.
                                        • Open the .xml file using your web browser.
                                          You should see the metadata information bracketed by tags contained within directional brackets < >. XML stands for Extensible Markup Language and is a common set of rules for encoding documents. Can you locate the portion of the document having to do with horizontal spatial accuracy? (Spatial accuracy metadata is available when you've chosen the All Lines file as your candidate shapefile.)
                                        • Open the .prj file using Notepad, or any vanilla text editor.
                                          There are five pieces of information in this file, separated by commas. What are they? They should reinforce some of what you learned in Chapter 2 regarding what defines a geographic coordinate system.
                                        • The .shp and .shx files are proprietary and specific to the functionality of the shapefile data set.
                                      • Note that one should not alter the contents of any of these files with any application other than a GIS program that is designed for that task.
                                    3. Viewing the shapefile dataset in Global Mapper:
                                      • Download and install the Global Mapper software:
                                        1. Navigate to the Blue Marble Global Mapper site.
                                        2. Download the trial version of the software.
                                        3. Double-click on the setup file you downloaded to install the program.
                                        4. Launch the Global Mapper program.
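Two details from steps 1 and 2 lend themselves to a quick script: decoding the root file name and spotting field names that break the 10-character dBASE limit. In this Python sketch, the regular expression is an assumption generalized from the single example name `tl_2010_42027_edges` (year, 2-digit state code, 3-digit county code, layer), and "LEFT_FROM_ADDRESS" is a hypothetical over-long field name.

```python
import re

# Assumed pattern, based only on the example name in step 1.
TIGER_NAME = re.compile(r"^tl_(\d{4})_(\d{2})(\d{3})_([a-z]+)$")

def parse_tiger_root(root):
    """Split a TIGER/Line root name into year, state code, county code and layer."""
    match = TIGER_NAME.match(root)
    if match is None:
        raise ValueError(f"unrecognised TIGER/Line root name: {root!r}")
    year, state, county, layer = match.groups()
    return {"year": int(year), "state": state, "county": county, "layer": layer}

def too_long_for_dbase(field_names, limit=10):
    """Field names longer than the dBASE limit noted in step 2."""
    return [name for name in field_names if len(name) > limit]

print(parse_tiger_root("tl_2010_42027_edges"))
print(too_long_for_dbase(["LFROMADD", "LEFT_FROM_ADDRESS"]))  # ['LEFT_FROM_ADDRESS']
```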

                                        Shapefile Primitives

                                        A single shapefile data set can contain one of three types of spatial data primitives, or features – points, lines or polygons (areas). The technical specification defines these as follows:

                                        • Points: A point consists of a pair of double-precision coordinates in the order X,Y.
                                        • Lines: More specifically, a polyline is an ordered set of points, or vertices, that consists of one or more parts. A part is a connected sequence of two or more points. Parts may or may not be connected to one another. Parts may or may not intersect one another.
                                        • Polygons: A polygon consists of one or more rings. A ring is a connected sequence of four or more points, or vertices, that form a closed, non-self-intersecting loop.
                                        • Other: M (measured route data) and Z (3D vertical datum) versions of point, polyline, and polygon Shapefile data sets can be created, but are not included in the TIGER/Line Shapefile extracts.
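The structural rules quoted above can be checked mechanically. This minimal Python sketch tests only the point-count rules and ring closure (first point equal to last); it deliberately does not test the "non-self-intersecting" condition, which needs a segment-intersection check. The function names are illustrative.

```python
def is_valid_part(points):
    """A polyline part needs two or more points."""
    return len(points) >= 2

def is_valid_ring(points):
    """A polygon ring needs four or more points and must close on itself.
    Self-intersection is NOT checked in this sketch."""
    return len(points) >= 4 and points[0] == points[-1]

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
print(is_valid_ring(triangle))        # True: four points, closed
print(is_valid_ring(triangle[:-1]))   # False: not closed
```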

                                        At left in the figure above, a polygon Shapefile data set holds the Census blocks in which the edges from the MAF/TIGER database have been combined to form two distinct polygons, P1 and P2. The diagram shows the two polygons separated to emphasize the fact that what is the single E12 edge in the MAF/TIGER database (see Figure 4.4.1 on page 4) is now present in each of the Census block polygon features.

                                        In the middle of the illustration, above, a polyline Shapefile data set holds seven line features (L1-7) that correspond to the seven edges in the MAF/TIGER database. The directionality of the line features that represent streets corresponds to address range attributes in the associated dBASE© table. Vertices define the shape of a polygon or a line, and the Start and End Nodes from the MAF/TIGER database are now First and Last Vertices.

                                        Finally, at right in the illustration above, a point Shapefile data set holds the three isolated nodes from the MAF/TIGER database.

                                        8. Representation Strategies for Mapping

                                        Recall that data consist of symbols that represent measurements. Digital geographic data are encoded as alphanumeric symbols that represent locations and attributes of locations measured at or near Earth's surface. No geographic data set represents every possible location, of course. The Earth is too big, and the number of unique locations is too great. In much the same way that public opinion is measured through polls, geographic data are constructed by measuring representative samples of locations. And just as serious opinion polls are based on sound principles of statistical sampling, so, too, do geographic data represent reality by measuring carefully chosen samples of locations. Vector and raster data are, in essence, two distinct sampling strategies.

                                        The vector approach involves sampling locations at intervals along the length of linear entities (like roads), or around the perimeter of areal entities (like property parcels). When they are connected by lines, the sampled points form line features and polygon features that approximate the shapes of their real-world counterparts.
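The idea that sampled points approximate a real shape can be checked numerically. The Python sketch below is an illustration, not from the source. It samples n evenly spaced vertices around a circle of radius r, joins them with straight segments, and compares the resulting perimeter (which works out to 2nr·sin(π/n)) with the true circumference 2πr. `sample_circle` and `perimeter` are hypothetical names.

```python
import math


def sample_circle(radius, n):
    """Sample n evenly spaced vertices on a circle and close the ring."""
    pts = [(radius * math.cos(2 * math.pi * k / n),
            radius * math.sin(2 * math.pi * k / n)) for k in range(n)]
    return pts + [pts[0]]


def perimeter(ring):
    """Sum of the straight-line segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(ring[:-1], ring[1:]))


for n in (4, 8, 32, 128):
    approx = perimeter(sample_circle(1.0, n))
    # Columns: vertex count, approximate perimeter, shortfall versus 2*pi.
    print(n, round(approx, 4), round(2 * math.pi - approx, 4))
```

Denser sampling shrinks the error, at the cost of storing more vertices.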

                                        Try This!

                                        Click the graphic above (Figure 1.9.1) to download and view the animation file (vector.avi, 1.6 Mb) in a separate Microsoft Media Player window.

                                        The aerial photograph above (Figure 1.9.1) shows two entities, a reservoir and a highway. The graphic above right illustrates how the entities might be represented with vector data. The small squares are nodes: point locations specified by latitude and longitude coordinates. Line segments connect nodes to form line features. In this case, the line feature colored red represents the highway. Series of line segments that begin and end at the same node form polygon features. In this case, two polygons (filled with blue) represent the reservoir.

                                        The vector data model is consistent with how surveyors measure locations at intervals as they traverse a property boundary. Computer-aided drafting (CAD) software used by surveyors, engineers, and others stores data in vector form. CAD operators encode the locations and extents of entities by tracing maps mounted on electronic drafting tables, or by key-entering location coordinates, angles, and distances. Instead of graphic features, CAD data consist of digital features, each of which is composed of a set of point locations.

                                        The vector strategy is well suited to mapping entities with well-defined edges, such as highways or pipelines or property parcels. Many of the features shown on paper maps, including contour lines, transportation routes, and political boundaries, can be represented effectively in digital form using the vector data model.

                                        The raster approach involves sampling attributes at fixed intervals. Each sample represents one cell in a checkerboard-shaped grid.

                                        Try This!

                                        Click the graphic above (Figure 1.9.2) to download and view the animation file (raster.avi, 0.8 Mb) in a separate Microsoft Media Player window.

                                        The graphic above (Figure 1.9.2) illustrates a raster representation of the same reservoir and highway as shown in the vector representation. The area covered by the aerial photograph has been divided into a grid. Every grid cell that overlaps one of the two selected entities is encoded with an attribute that associates it with the entity it represents. Actual raster data would not consist of a picture of red and blue grid cells, of course; they would consist of a list of numbers, one number for each grid cell, each number representing an entity. For example, grid cells that represent the highway might be coded with the number "1" and grid cells representing the reservoir might be coded with the number "2."
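The coding scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the highway and reservoir are replaced by made-up shapes (a horizontal band and a disc), a cell is assigned to an entity when the entity covers the cell's centre point, and `rasterize` is a hypothetical function name. Other cell-assignment rules exist.

```python
import math


def rasterize(n_rows, n_cols, cell_size, features):
    """Encode features into a grid of integer codes (0 = no entity).

    features is a list of (code, covers) pairs, where covers(x, y) says
    whether the feature covers a cell's centre point. Later features in
    the list overwrite earlier ones where they overlap.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            x, y = (c + 0.5) * cell_size, (r + 0.5) * cell_size
            for code, covers in features:
                if covers(x, y):
                    grid[r][c] = code
    return grid


HIGHWAY, RESERVOIR = 1, 2
features = [
    # Hypothetical highway: a band one unit wide along 5 <= y < 6.
    (HIGHWAY, lambda x, y: 5 <= y < 6),
    # Hypothetical reservoir: a disc of radius 1.6 centred at (2.5, 2.5).
    (RESERVOIR, lambda x, y: math.hypot(x - 2.5, y - 2.5) <= 1.6),
]
grid = rasterize(6, 8, 1.0, features)
for row in grid:
    print(" ".join(str(v) for v in row))
```

Row 0 here is the bottom (southernmost) row of cells; reverse the list of rows to print it north-up.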

                                        The raster strategy is a smart choice for representing phenomena that lack clear-cut boundaries, such as terrain elevation, vegetation, and precipitation. Digital airborne imaging systems, which are replacing photographic cameras as primary sources of detailed geographic data, produce raster data by scanning the Earth's surface pixel by pixel and row by row.

                                        Both the vector and raster approaches accomplish the same thing: they allow us to caricature the Earth's surface with a limited number of locations. What distinguishes the two is the sampling strategies they embody. The vector approach is like creating a picture of a landscape with shards of stained glass cut to various shapes and sizes. The raster approach, by contrast, is more like creating a mosaic with tiles of uniform size. Neither is well suited to all applications, however. Several variations on the vector and raster themes are in use for specialized applications, and the development of new object-oriented approaches is underway.

                                        Geographic Information

                                        In GIS, geographic information is made up of spatial location data and attribute data.

                                        Spatial location data is data with a geographic component. It answers the question of where something is.

                                        Attribute data answers the question of what something is.

                                        The table that holds this attribute information, one row per feature, is called an attribute table; it can be used to symbolise features, make queries and analyse the data.
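A minimal Python sketch of the where/what split follows, using hypothetical well features rather than the Internet-users data. Geometry and attributes are kept in separate structures and linked by a feature ID (`fid`). A query selects rows from the attribute table, and a hypothetical `symbol_class` function assigns each value to a class, as a graduated legend would.

```python
from bisect import bisect_right

# Hypothetical well features. Geometry ("where") and attributes ("what")
# are stored separately and linked by a shared feature ID.
geometry = {1: (2.0, 1.0), 2: (5.5, 3.0), 3: (8.0, 7.5)}  # fid -> (x, y)
attribute_table = [
    {"fid": 1, "name": "Well A", "depth_m": 40},
    {"fid": 2, "name": "Well B", "depth_m": 120},
    {"fid": 3, "name": "Well C", "depth_m": 75},
]


def select(table, predicate):
    """Query: return the attribute rows that satisfy a condition."""
    return [row for row in table if predicate(row)]


def symbol_class(value, breaks):
    """Symbolise: return the class index of value given ascending class breaks."""
    return bisect_right(breaks, value)


# "Which wells are deeper than 50 m, and where are they?"
for row in select(attribute_table, lambda r: r["depth_m"] > 50):
    print(row["name"], geometry[row["fid"]])

# Class indices for a legend with breaks at 50 m and 100 m.
print([symbol_class(r["depth_m"], [50, 100]) for r in attribute_table])  # [0, 2, 1]
```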

                                        The example below shows a choropleth map of Internet users (per 100 people) in 2016 and its associated attribute table.

                                        World map of Internet users (per 100 people) in 2016

                                        Screenshot of the associated attribute table


                                        Data Model Definition: an abstraction of real world entities and their relationships into structures that can be implemented with a computer language.

                                        1. Definition
                                        2. Requirements for a DBMS
                                        3. Terminology
                                        4. Entity-Relationship conceptual model
                                        5. Hierarchical logical model
                                        6. Network logical model
                                        7. Relational logical model
                                        8. Integration of DBMS with spatial data models
                                        1. Introduction
                                        2. Raster conceptual data model
                                        3. Vector conceptual data model
                                        4. Object-oriented data model
                                          : referencing of an entity to a coordinate system (e.g. UTM, State Plane, etc.).
                                      • Data description:
                                        1. data as entities: geographic data are often described in terms of phenomenological concepts such as roads, towns, rivers, floodplains, ecotypes, soil associations, etc. These concepts are often referred to as entities.
                                        2. entity hierarchy: data are often hierarchical in form (e.g. country > state > county > village; forest > deciduous/coniferous > upland/lowland).
                                        3. gradients between entities: separations between some entities are not always clear-cut, and there may be a transitional zone between entities (ecotones).
                                        4. Geographic data can be represented using three basic topological concepts: the point, the line and the area. In principle, every geographical phenomenon can be represented by these three concepts plus a label or attribute that defines it.
                                        5. What is a map? "A map is a set of points, lines and areas that are defined by their spatial location with respect to a coordinate system and by their non-spatial attributes" (Burrough 1986). A map legend links the non-spatial attributes to the spatial attributes.
                                        1. a record is a data structure containing information about an entity that can be manipulated as a unit.
                                        2. a pointer is a memory address that references the start of data in RAM.
                                        1. simple lists: unstructured data; each new item is placed at the end of the list, and the average search time is (n + 1)/2 comparisons.
                                        2. ordered sequential files: additions are placed in their proper position (insertion). Binary search is possible, reducing the search time to about log2(n + 1) comparisons; e.g. any item in a set of 65,535 items can be found in at most 16 tries, since log2(65,535 + 1) = 16.
                                        3. indexed files:
                                          : Rapid data retrieval according to a key attribute (e.g. looking up a word's spelling in a dictionary: key attribute + additional information). In direct files, the data items themselves provide the means of ordering (e.g. soil series names, with an index to the location of the first name beginning with each letter). Inverted files: we may have soil profiles ordered by series name but want information on soil depth, drainage, pH, texture or erosion. If the poorly drained soils need to be identified, we must use a linear search unless we invert the file. An inverted file is initially built with one linear search through the records. An example of an inverted file is a topic index in a book.
                                          1. limitations of simple structures: file modification is difficult; a new record is added to the end of the file, then the index must be updated. Data can be accessed only via a key contained in the index, while other types of information require a sequential search.
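The search costs quoted for lists and ordered files, and the idea of an inverted file, can be checked with a short Python sketch. The function names and the soil records are hypothetical. Linear search over n items averages (n + 1)/2 comparisons, and binary search over 65,535 sorted items never needs more than 16 probes. `invert` builds the inverted file in one pass so that poorly drained profiles can be looked up without scanning every record.

```python
import math
from collections import defaultdict


def linear_search(items, target):
    """Scan from the start of a list; return (index, comparisons)."""
    for i, item in enumerate(items):
        if item == target:
            return i, i + 1
    return -1, len(items)


def binary_search(items, target):
    """Halve a sorted list at each probe; return (index, probes)."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def invert(records, attribute):
    """Build an inverted file: attribute value -> keys of matching records.

    Building it takes one linear pass; afterwards, lookups on that
    attribute no longer require scanning every record.
    """
    index = defaultdict(list)
    for key, record in records.items():
        index[record[attribute]].append(key)
    return dict(index)


small = list(range(101))
avg_linear = sum(linear_search(small, t)[1] for t in small) / len(small)
print(avg_linear)  # (101 + 1) / 2 = 51.0

n = 65_535
items = list(range(n))
worst = max(binary_search(items, t)[1] for t in items)
print(worst, math.ceil(math.log2(n + 1)))  # 16 16

soils = {  # hypothetical soil profiles keyed by profile ID
    "profile-1": {"series": "Alpha", "drainage": "poor"},
    "profile-2": {"series": "Beta", "drainage": "well"},
    "profile-3": {"series": "Gamma", "drainage": "poor"},
}
by_drainage = invert(soils, "drainage")
print(by_drainage["poor"])  # ['profile-1', 'profile-3']
```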

                                          These data structures provide very efficient access to information pertaining to a single entity. But we need more. We need to relate different entities.

                                          II. Data Models (Laurini and Thompson, 1992 and ANSI/X3/SPARC, 1978)

                                          As data management became more complex, a framework was needed to understand the transformation of real-world systems and processes into structures that could be implemented in a computer.

                                          1. External model: provides the basis for understanding the real world (e.g. non-spatial: a set of entities; spatial: the world as a continuously varying surface, as a discrete set of objects in space, or as a set of thematic layers).

                                          2. Conceptual data model: provides the organizing principles that translate the external data models into functional descriptions of how data objects are related to one another (e.g. non-spatial: E-R model; spatial: raster, vector, object representation).

                                          3. Logical data model: provides the explicit forms that the conceptual models can take and is the first step in computing (e.g. non-spatial: hierarchical, network, relational; spatial: 2-D matrix, map file, location list, point dictionary, arc/nodes).

                                          4. Internal data model: low-level data structures, records, pointers, etc.

                                          III. Data Base Management Systems (DBMS)

                                          1- Definition:

                                          Data Base Management Systems: A system used to organize, access, maintain and manipulate object or entity data. A DBMS controls input, output, storage and retrieval of entity data. Essential features of a data base are fast access and cross-referencing of entities.

                                          2- Requirements for a DBMS

                                          A DBMS should provide:

                                          1. Data Independence: the data base can change with little or no impact on user programs.
                                          2. Data Sharing: coordinated simultaneous access, provided by a concurrency control mechanism.
                                          3. Maintenance of Data Integrity: the DBMS helps enforce consistency constraints (e.g. a coordinate has both latitude and longitude; the number of seats sold on an airplane <= the number of seats on the plane).
                                          4. Security: mechanisms for security and authorization that protect data from disclosure or destruction.
                                          5. Centrality of Control: a data base administrator resolves conflicts and meets user requirements.
                                          6. Reduced Application Development Time
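The data-integrity requirement can be illustrated with Python's built-in `sqlite3` module, used here only as a convenient example DBMS. The table and column names are hypothetical. The `CHECK` and `NOT NULL` constraints encode the two example rules from item 3, and the DBMS rejects any insert that violates them with `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight (
    flight_no  TEXT PRIMARY KEY,
    capacity   INTEGER NOT NULL,
    seats_sold INTEGER NOT NULL,
    CHECK (seats_sold <= capacity)  -- seats sold <= seats on the plane
);
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    lat REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),  -- both coordinates required
    lon REAL NOT NULL CHECK (lon BETWEEN -180 AND 180)
);
""")


def try_insert(sql, params):
    """Run one INSERT; return False if the DBMS rejects it as inconsistent."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("INSERT INTO flight VALUES (?, ?, ?)", ("XY100", 180, 150))
oversold = try_insert("INSERT INTO flight VALUES (?, ?, ?)", ("XY101", 180, 181))
no_lon = try_insert("INSERT INTO site (lat, lon) VALUES (?, ?)", (45.0, None))
print(ok, oversold, no_lon)  # True False False
```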

                                          3- E-R data model: a conceptual data model in which information is represented by entities and relationships between entities

                                          a. entity - a distinguishable object in the real world (people, forest stand, watershed, etc.)
                                          b. relationship - a correspondence or association between two or more entities.
                                          c. attributes - the properties which describe an entity.
                                          d. functionality - how many entities from one entity set can be associated with another set
                                          e. primary key - main key for entity identification, one record per indexed attribute.
                                          f. secondary key - may have multiple record occurrences per index attribute.

                                          5- Hierarchical logical model:

                                          + easy to update and expand.
                                          + easy data access for keys.
                                          + ideal for data that is inherently hierarchical.
                                          - poor access for associated attributes.
                                          - restrictive paths.
                                          - limited to one-to-many relationships.

                                          6- Network logical model:

                                          + reduces redundancy.
                                          + more flexible paths to data.
                                          + very fast.
                                          - pointers are expensive and difficult to update when inserting and deleting records.

                                          7- Relational data model: data are stored as records, known as tuples, grouped together in two-dimensional tables known as relations. Whereas hierarchical structures rely on the hierarchy and networks depend on pointers to associate entities, the relational model uses data redundancy in the form of unique keys that identify the records in each table. This simplifies data maintenance because the data for an entity type are stored in simple tables. Relational joins cross-reference entities using a primary key in one table and a foreign key in another; thus, to perform a relational join, the tables being related must have at least one column in common.
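A relational join can be sketched with Python's `sqlite3` module. The tables (`watershed`, `forest_stand`) and their rows are hypothetical. `watershed_id` is the primary key of one table and a foreign key in the other, so it is the column the two tables have in common. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE watershed (
    watershed_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE forest_stand (
    stand_id     INTEGER PRIMARY KEY,
    species      TEXT NOT NULL,
    watershed_id INTEGER NOT NULL REFERENCES watershed (watershed_id)
);
INSERT INTO watershed VALUES (1, 'North Creek'), (2, 'South Fork');
INSERT INTO forest_stand VALUES (10, 'oak', 1), (11, 'pine', 1), (12, 'maple', 2);
""")

# Cross-reference the two entity types through the shared key column.
rows = conn.execute("""
    SELECT s.stand_id, s.species, w.name
    FROM forest_stand AS s
    JOIN watershed AS w ON w.watershed_id = s.watershed_id
    ORDER BY s.stand_id
""").fetchall()
for row in rows:
    print(row)

# A foreign key that matches no primary key is rejected.
try:
    with conn:
        conn.execute("INSERT INTO forest_stand VALUES (13, 'birch', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan stand rejected:", rejected)
```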

                                          The relational model is designed to reduce redundancy of data wherever possible. A set of rules called the normal forms was developed by Codd (1970) to guide this process.

                                          + structures very flexible.
                                          + boolean logic and math operations.
                                          + insert and delete easy.
                                          - often use sequential search unless previously sorted.

                                          8- Integration of DBMS with the spatial data models

                                          IV. Data Models for Spatial Data

                                          Data structures are complex for GIS because they must include information about entities with respect to position, topological relationships, and attributes. It is the topological and spatial aspects of GIS that distinguish it from other types of data bases.

                                          1. Introduction: There are presently three types of representations for geographic data: raster, vector, and object.

                                      • raster – a set of cells on a grid that represents an entity (entity --> symbol/color --> cells).
                                      • vector – an entity is represented by nodes and their connecting arcs or line segments (entity --> points, lines or areas --> connectivity).
                                      • object – an entity is represented by an object that has spatial information as one of its attributes.
                                        2. Raster conceptual data model

                                        Definition: realization of the external model which sees the world as a continuously varying surface (field) through the use of 2-D Cartesian arrays forming sets of thematic layers. Space is discretized into a set of connected two-dimensional units called a tessellation.

                                        1. Each overlay is a 2-D matrix of points carrying the value of a single attribute.
                                        2. Each point is represented by a vertical array in which each array position carries the value of the attribute associated with one overlay.
                                        3. Each mapping unit has the coordinates of each cell in which it occurs (greater structure, many-to-one relationship).

                                        The vertical array is not conducive to compact data coding because it references different entities in sequence and lacks a many-to-one relationship. The third structure references a set of points for a region (or mapping unit) and allows for compaction.
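The three raster structures can be contrasted in a short Python sketch. The soil and land-cover overlays are hypothetical, and the first structure is simply each 2-D list itself. `vertical_arrays` and `cells_by_unit` are hypothetical names for the second and third structures.

```python
# Hypothetical 3 x 4 overlays; each cell holds one attribute value.
soil = [
    ["A", "A", "B", "B"],
    ["A", "C", "C", "B"],
    ["C", "C", "C", "B"],
]
cover = [
    ["f", "f", "g", "g"],
    ["f", "f", "g", "g"],
    ["w", "w", "g", "g"],
]


def vertical_arrays(overlays):
    """Structure 2: each cell carries one value per overlay, in overlay order."""
    n_rows, n_cols = len(overlays[0]), len(overlays[0][0])
    return {(r, c): [ov[r][c] for ov in overlays]
            for r in range(n_rows) for c in range(n_cols)}


def cells_by_unit(overlay):
    """Structure 3: each mapping unit lists the (row, col) cells it occupies.

    Many cells refer to one unit, so the unit's description is stored once
    rather than repeated for every cell.
    """
    units = {}
    for r, row in enumerate(overlay):
        for c, code in enumerate(row):
            units.setdefault(code, []).append((r, c))
    return units


print(vertical_arrays([soil, cover])[(1, 1)])  # ['C', 'f']
print(cells_by_unit(soil)["B"])  # [(0, 2), (0, 3), (1, 3), (2, 3)]
```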

                                        + reduced storage.
                                        + area, perimeter, and shape estimation.
                                        - overlay difficult.

                                        + reduced storage.
                                        - overlay difficult.

                                        + reduced storage.
                                        + union and intersection of regions easy.

                                        3. Vector conceptual data model

                                        Definition: realization of the discrete model of the real world using structures for storing and relating points, lines and polygons in sets of thematic layers.

                                          1. represents an entity as exactly as possible.
                                          2. coordinate space is continuous (not quantized like raster).
                                          3. structured as a set of thematic layers:
                                            1. Point entities: geographic entities positioned by a single x,y coordinate (e.g. historic sites, wells, rare flora). The data record consists of x,y plus attributes.
                                            2. Line entities: (rivers, roads, rail)
                                              1. all linear features are made up of line segments.
                                              2. a simple line has 2 (x,y) coordinates.
                                              3. an arc, chain or string is a set of n (x,y) coordinate pairs that describe a continuous line. The shorter the line segments, the closer the chain will approximate a continuous curve. Data record: n(x,y).
                                              4. a line network gives information about connectivity between line segments in the form of pointers or relations contained in the data structure. Pointers are often built into nodes to define connections, with angles indicating the orientation of connections (this fully defines the topology).

                                              3. Area entities: data structures for storing regions. Data types: land cover, soils, geology, land tenure, census tracts, etc.

                                                1. Cartographic spaghetti or "connect the dots". Early development in automated cartography, a substitute for mechanical drawing. Numerical storage, spatial structure evident only after plotting, not in file.
                                                  • describe each entity by specifying coordinates around its perimeter.
                                                  • lines shared between polygons are stored twice, once for each polygon.
                                                  • polygon sliver problems.
                                                  • no topology (neighbor and island problems).
                                                  • error checking a problem.
                                        • points are unique for the entire file (which eliminates the sliver problem); as in location lists, there is no sharing of lines, and other problems remain.
                                        • expensive searches to construct polygons.

                                        d. DIME Files (Dual Independent Map Encoding)

                                          • designed to represent the points, lines and areas that form a city through a complete representation of the network of streets and other linear features.
                                          • allowed for topologically based verification.
                                          • no system of directories linking segments together (maintenance problem).
                                            • same topological principles as the DIME system.
                                            • DIME is defined by line segments, whereas chains are based on records of uncrossed boundary lines (curved roads are a problem for DIME).
                                            • chains or boundaries serve the topological function of connecting two end points, called nodes, and separating two zones.
                                            • intermediate points along a chain are required cartographically, not topologically (generalization possible).
                                            • solves problems discussed above (neighbor, dead ends, weird polygons).
                                            • can treat data input and structure independently.
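The arc/node idea, in which each boundary chain is stored once with references to its end nodes and to the zones on its left and right, can be sketched in Python. This is a simplified illustration, not the record layout of DIME or of any particular GIS. The `Arc` class, its field names, and the two-square layer are hypothetical. Neighbours and polygon boundaries are found from the zone references alone, without any geometric search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """A boundary chain stored once, together with its topology.

    left/right are the zones on either side when the arc is followed from
    from_node to to_node; zone 0 stands for the area outside the map.
    The vertices give the drawn shape only; the topology is carried by the
    node and zone references.
    """
    arc_id: int
    from_node: str
    to_node: str
    left: int
    right: int
    vertices: tuple = ()


# Hypothetical layer: two unit squares side by side; nodes n1 = (1, 0), n2 = (1, 1).
arcs = [
    Arc(1, "n1", "n2", left=0, right=1, vertices=((1, 0), (0, 0), (0, 1), (1, 1))),
    Arc(2, "n1", "n2", left=1, right=2, vertices=((1, 0), (1, 1))),
    Arc(3, "n1", "n2", left=2, right=0, vertices=((1, 0), (2, 0), (2, 1), (1, 1))),
]


def boundary(arcs, zone):
    """Arcs that bound a zone, found from the zone references alone."""
    return [a for a in arcs if zone in (a.left, a.right)]


def neighbours(arcs, zone):
    """Zones across each bounding arc (0 means the zone touches the outside)."""
    return {a.right if a.left == zone else a.left for a in boundary(arcs, zone)}


print(neighbours(arcs, 1))  # {0, 2}
```

The boundary between polygons 1 and 2 exists as a single record (arc 2), so there is no second copy that could drift out of alignment.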

                                            4. Object-oriented data model

                                            Definition: realization of the discrete model of the real world using an object-centered approach in which an object has both physical (attribute) and geometric characteristics. Different types of objects can interact because they are not confined to separate layers.

                                            The biggest single difference between the object-oriented conceptual model and the layer-based vector conceptual model for representing geographic information is that in the object model, the real-world object is the basis for abstraction, not its geometry. In other words, the objects, not the geometric components of layers, are the "units" for modeling and interactions.
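To make the contrast concrete, the Python sketch below treats features as objects whose geometry is just one attribute among others. A road object can therefore be queried directly against well objects without first placing them in separate layers. Python dataclasses are only an analogy here, not the API of any object-oriented GIS. The `Road` and `Well` classes and their data are hypothetical.

```python
import math
from dataclasses import dataclass


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


@dataclass
class Well:
    well_id: str
    depth_m: float
    location: tuple  # geometry is one attribute among others: (x, y)


@dataclass
class Road:
    name: str
    lanes: int
    path: list  # geometry: polyline vertices [(x, y), ...]

    def distance_to(self, point):
        """Shortest distance from a point to any segment of this road."""
        return min(point_segment_distance(point, a, b)
                   for a, b in zip(self.path[:-1], self.path[1:]))


road = Road("Route 1", 2, [(0, 0), (10, 0)])
wells = [Well("W1", 40, (3, 4)), Well("W2", 90, (15, 3))]
# Objects of different types interact directly; no layer overlay is needed.
near = [w.well_id for w in wells if road.distance_to(w.location) <= 5]
print(near)  # ['W1']
```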