PointStacker - Random locations at different zoom level
I'm applying PointStacker transformation on one of my layers using Geoserver 2.5.2 and OpenLayers 2.
I know that I have 7 data points at exactly the same location. However, when I change the zoom level, the stacked point moves around randomly. I would have thought that the stacked point's location is calculated as the mean of the vector points being stacked. In other words, if they are all at the same location, the stacked point shouldn't move when the zoom level changes.
What am I doing wrong?
[Screenshots at zoom levels 7, 8, 9, 10 and 11]
Your assumption about how PointStacker calculates the cluster point's location is incorrect. It doesn't use centre of mass in the way you'd expect.
The bounds of the image request are broken up into a grid, and a cluster is created for each cell of that grid. So if you have what looks to you like a natural cluster, e.g. 3 points close to each other, they will only be clustered together if all 3 are in the same grid cell. If they fall into 3 grid cells, they will be assigned to 3 separate clusters.
As you zoom in, the points do not land in the same grid cell as in the previous image, and therefore the cluster is given a new location relative to the image's width and height.
This is not the most intuitive behaviour but it does make for a much simpler clustering algorithm. You can see the code here: https://github.com/geotools/geotools/blob/master/modules/unsupported/process-feature/src/main/java/org/geotools/process/vector/PointStackerProcess.java
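To make the grid behaviour concrete, here is a minimal Python sketch of grid-based bucketing. It is an illustration of the idea described above, not the GeoTools code: the function name `grid_clusters`, the `origin` argument and the sample coordinates are all made up for the example.

```python
from collections import defaultdict


def grid_clusters(points, origin, cell_size):
    """Group points by the grid cell they fall in.

    points    -- iterable of (x, y) tuples, e.g. pixel coordinates (hypothetical data)
    origin    -- (x, y) of the grid's corner; it moves with each image request
    cell_size -- width and height of a square cell, in the same units as the points
    """
    ox, oy = origin
    cells = defaultdict(list)
    for x, y in points:
        # The cell index is the integer number of whole cells from the origin.
        key = (int((x - ox) // cell_size), int((y - oy) // cell_size))
        cells[key].append((x, y))
    return dict(cells)


# Three points that look like one natural cluster.
points = [(9.0, 5.0), (10.5, 5.0), (11.0, 5.0)]

# With this grid origin a cell boundary at x = 10 splits them into two clusters.
split = grid_clusters(points, origin=(0.0, 0.0), cell_size=10.0)

# Shift the origin (as a new map request would) and they share one cell.
together = grid_clusters(points, origin=(5.0, 0.0), cell_size=10.0)

print(len(split), len(together))
```

The same three points end up in two clusters or one depending only on where the grid happens to start, which is why the result can change from one request to the next.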
I believe that this is caused by the cellSize parameter.
If you set this parameter to 1 (the locations you want to aggregate are exactly the same), then the marker should be displayed at that location.
If you increase the cellSize, the marker will be placed somewhat randomly inside a buffer defined by the cellSize in pixels.
A Place-Oriented, Mixed-Level Regionalization Method for Constructing Geographic Areas in Health Data Dissemination and Analysis
Similar geographic areas often have great variations in population size. In health data management and analysis, it is desirable to obtain regions of comparable population by decomposing areas of large population (to gain more spatial variability) and merging areas of small population (to mask privacy of data). Based on the Peano curve algorithm and modified scale-space clustering, this research proposes a mixed-level regionalization (MLR) method to construct geographic areas with comparable population. The method accounts for spatial connectivity and compactness, attributive homogeneity, and exogenous criteria such as minimum (and approximately equal) population or disease counts. A case study using Louisiana cancer data illustrates the MLR method and its strengths and limitations. A major benefit of the method is that most upper level geographic boundaries can be preserved to increase familiarity of constructed areas. Therefore, the MLR method is more human-oriented and place-based than computer-oriented and space-based.
Spatial clustering or regionalization methods are commonly used in geographic information systems (GIS) and public health for confirmatory or exploratory purposes (Cromley and McLafferty 2012b). Clustering has two different definitions and both are well accepted: partitioning, which assigns a unique cluster membership to any location in the study area, and nonpartitioning (i.e., identifying cluster centers), which does not have an inclusive requirement for all places (Neuberger and Lynch 1982; Hanson and Wieczorek 2002; Szwarcwald, Andrade, and Bastos 2002; Oliver et al. 2006; Schootman et al. 2007; Shishehbor et al. 2008; Moore et al. 2009; Nelson et al. 2009). This article focuses on regionalization, but some discussions also use the term clustering as a convention in the literature. A challenge for many of these methods is not the development of algorithms, computation, or technical implementation but, rather, making sense of or interpreting the findings. Meaningful results are not just about the size and shape of clusters but the clusters’ alignment with existing zonings, particularly boundaries of major geographic units. A fundamental purpose of regionalization is to group and simplify data, not to introduce further complexity by adding more boundaries that are not recognizable by administrators, public practitioners, or the general public.
“Place is security, space is freedom” (Tuan 1977, 3). Tuan's (1974, 1977, 2012) humanist geography approach has influenced generations of geographers by clarifying the relationship between place and space. Tuan illustrated the functions of boundary as bounding place to space such as an Eskimo's sense (or attachment) of trading locations and hunting space (Carpenter, Varley, and Flaherty 1959), and identified space as place with familiar landmarks and paths that are often seen as boundaries. Our regionalization method is inspired by this conceptualization of “place = space + identity + attachment” by geographers (Tuan 1974, 1977; Sack 1980, 2003; Adams, Hoelscher, and Till 2001). Yiannakoulias (2011) advocated a “place-focused” or “place-informed” approach to incorporate locally relevant factors in all aspects of human activities into forming places or regions for meaningful public health surveillance of spatial aberrations. Space is more general and abstract, and place is more attached to people and the environment. Although many regionalization methods are space-oriented, this research is designed to develop a place-oriented regionalization or clustering method that preserves major geopolitical boundaries as a key element of identity and attachment.
Boundaries are important for maintaining the familiarity and hierarchy in a map (Lloyd and Steinke 1986). Geographic, cartographic, and psychological research has shown that map readers organize and process their spatial memory hierarchically in clusters, and rely on familiar features to interpret and understand map contents (McNamara, Hardy, and Hirtle 1989; Rittschof et al. 1996; Fotheringham and Curtis 1999; Jones et al. 2004) and spatial characteristics of the environment (Hirtle and Jonides 1985). Boundary plays an interrelated role in psychological and geographical compartmentalization (Sack 2003). Boundaries and bordering are also discussed in the context of calculable space, place, security, and territory (Rose-Redwood 2012). Geographic data are provided in a hierarchical way using units of state, county, census tract, and others, and boundaries of these units serve as an essential reference to familiarity. In addition to geopolitical units, it is also important to keep other geographic boundaries, within which underlying forces and processes under study differ. For example, in F. Wang, Guo, and McLafferty (2012), a regionalization method is applied to areas of distinctive urbanicity categories separately to preserve their boundaries.
Population size usually varies substantially across areas at the same level. In public health data analysis and dissemination, it is often desirable to obtain regions of comparable population (F. Wang, Guo, and McLafferty 2012). Areas of large population need to be decomposed to gain more spatial variability, and areas of small population need to be merged to protect geoprivacy. Would keeping upper level geographic boundaries make a regionalization method more place-oriented? For example, if the data are available at the census tract level, should county boundaries be preserved as much as possible in regionalization? This research proposes a place-oriented, mixed-level regionalization (MLR) or spatial clustering method. Specifically, the conceptualization of “place = space + identity + attachment” is addressed twofold. As boundary serves as an important identifier for places, our method aims to preserve the boundaries of upper level geographic units and minimize operations at the lower level. Attachment is accounted for by imposing a constraint of attributive similarity on the regionalization method. By doing so, the resulting regions still look familiar or recognizable.
When working with health data, geoprivacy is a common concern that leads to aggregating individual data to area units. The overall objective of this research is to develop a regionalization method for disseminating and analyzing health data accounting for not only commonly considered spatial compactness and attributive homogeneity but also familiarity and geoprivacy. This description serves as an overarching problem statement, and detailed settings are illustrated by a case study of health (specifically cancer) data. It can certainly benefit any studies that involve the small population problem, including crime analysis (F. Wang and O'Brien 2005).
This is a practical guide for georeferencing. It describes the protocols to determine the shapes of features and how to use them as the basis for georeferencing with the point-radius georeferencing method (Wieczorek 2001, Wieczorek et al. 2004, Chapman & Wieczorek 2020) using the Georeferencing Calculator (Wieczorek & Wieczorek 2020), and its associated Georeferencing Calculator Manual (Bloom et al. 2020), maps, gazetteers, and other resources from which coordinates and spatial boundaries for places can be found. This document is a citable georeferencing protocol. If a derived protocol is used, a new document with attribution to this Guide should be made publicly available and cited.
This Guide is based on a first version of the Guide (Wieczorek et al. 2012), which was in turn an adaptation of Georeferencing for Dummies (Spencer et al. 2008). It explains the recommended georeferencing procedures for the most commonly encountered types of localities. This Guide should be used in parallel with the Georeferencing Best Practices (Chapman & Wieczorek 2020), which contains the theoretical background and more detailed information about concepts used here.
Underlined terms throughout this document (e.g. accuracy) link to definitions in the Glossary. Terms in italics (e.g. Input Latitude ) refer to fields and/or labels in the Georeferencing Calculator (Wieczorek & Wieczorek 2020) (hereafter referred to as 'the Calculator'). Darwin Core terms are displayed in monospace (e.g. dwc:georeferenceRemarks) in all GBIF digital documentation.
At the end of this document is a Georeference Quick Reference Guide Key to Locality Types, which contains a quick summary of the protocols for the most common locality types, described in detail in the sections of this guide.
This document provides guidance on how to georeference using the point-radius method. This Guide also provides the methods for determining the boundaries of features, which form the basis of the shape georeferencing method.
1.2. Target Audience
This document is a practical guide for anyone who needs to georeference textual locality descriptions so that they can be used in spatial filtering or analysis in research, education, or the maintenance of biological collections data.
This document is one of three that cover recommended requirements and methods to georeference locations. It provides a practical how-to guide for putting the theory of the point-radius georeferencing method into practice.
The Guide relies on the Georeferencing Best Practices (Chapman & Wieczorek 2020) for background, definitions, and more detailed explanations of the theory behind the methods and calculations found here and in the Calculator.
These documents DO NOT provide guidance on georectifying images or geocoding street addresses – distinct operations that are sometimes called "georeferencing".
1.4. Changes from Previous Version
This version of the Guide is a complete remake of its previous edition, reorganized and augmented to include graphical examples of each type of location and detailed steps for how to georeference them. There have been a few changes in terminology since the previous edition, which include:
Extent in the previous version has been changed to radial. Extent, where retained, is used in a more traditional way to mean the entire space within a location.
"Named place" has been replaced with feature.
Where the geographic center was recommended in the past, the corrected center based on the geographic radial is now used. This is an important change because the geographic center did not necessarily yield the smallest uncertainty due to the extent of a feature, whereas the corrected center and geographic radial do.
1.5. Using Darwin Core
Georeferences using the methods in this Guide will be of greatest value if as much information as possible is captured about and during the georeferencing process. The Darwin Core standard (Darwin Core Maintenance Group 2020) defines all of the fields recommended for the capture of reproducible georeferences, as follows:
dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum
The combination of these fields provides the reference for the center of the point-radius representation of the georeference.
dwc:coordinateUncertaintyInMeters
The horizontal distance in meters from the given decimalLatitude and decimalLongitude that describes the smallest enclosing circle that contains the whole of the location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term. This term corresponds with the geographic radial of the final georeference.
dwc:georeferencedBy, dwc:georeferencedDate
The individual(s) who last modified the georeference and when that happened. These correspond to the final authority on the georeference in its current state, regardless of who might have worked on previous versions of the georeference.
dwc:georeferenceProtocol
A description or reference to the methods used to determine the shape using the shape georeferencing method, or the coordinates and uncertainty using the point-radius method. If the protocol in this Guide is used unaltered, then the georeferenceProtocol should be the citation for this document.
dwc:georeferenceSources
A list (concatenated and separated) of maps, gazetteers, or other resources used to georeference the location, described specifically enough to allow anyone in the future to use the same resources.
Examples:
USGS 1:24000 Florence Montana Quad 1967
Terrametrics 2008 Google Earth
Wieczorek C & J Wieczorek (2020) Georeferencing Calculator, version yyyymmdd. Available: http://georeferencing.org/georefcalculator/gc.html.
dwc:georeferenceVerificationStatus
A categorical description of the extent to which the georeference has been verified to represent the best possible spatial description. Recommended best practice is to use a controlled vocabulary. Note that this verification could only be performed in relation to the occurrence or event that is being recorded.
Examples:
verified by collector
verified by curator
dwc:georeferenceRemarks
Notes or comments out of the ordinary about the georeference, explaining assumptions made in addition or opposition to those formalized in the method referred to in georeferenceProtocol.
Example:
assumed distance by road (Hwy. 101)
dwc:locationRemarks
Notes or comments of interest about the location (not the georeference of the location, which go in georeferenceRemarks).
Example:
Villa Epecuen was inundated in November 1985 and ceased to be inhabited until 2009
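As an illustration of how these Darwin Core fields might be captured together, the following Python sketch builds one georeference record and checks the rule stated above that zero is not a valid uncertainty. The coordinate and uncertainty values are hypothetical, and the helper `check_georeference` is not part of Darwin Core or the Calculator; it is only an assumed convenience for this example.

```python
def check_georeference(record):
    """Return a list of problems found in a Darwin Core style record (a dict)."""
    problems = []
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    unc = record.get("coordinateUncertaintyInMeters")

    has_coords = lat not in (None, "") and lon not in (None, "")
    if has_coords and not -90 <= float(lat) <= 90:
        problems.append("decimalLatitude out of range")
    if has_coords and not -180 <= float(lon) <= 180:
        problems.append("decimalLongitude out of range")

    if unc not in (None, ""):
        # Zero is not a valid value; leave the field empty when unknown.
        if float(unc) <= 0:
            problems.append("coordinateUncertaintyInMeters must be greater than zero")
        # Uncertainty is not applicable when there are no coordinates.
        if not has_coords:
            problems.append("coordinateUncertaintyInMeters given without coordinates")
    return problems


# Hypothetical record; the source and remark strings reuse the examples above.
record = {
    "decimalLatitude": 46.6330,       # made-up value
    "decimalLongitude": -114.0790,    # made-up value
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": 1530,  # made-up value
    "georeferenceSources": "USGS 1:24000 Florence Montana Quad 1967",
    "georeferenceVerificationStatus": "verified by curator",
    "georeferenceRemarks": "assumed distance by road (Hwy. 101)",
}

print(check_georeference(record))
print(check_georeference({**record, "coordinateUncertaintyInMeters": 0}))
```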
1.6. Georeferencing Concepts
One of the goals of georeferencing following best practices is to be sure that enough information is provided in the output so that the georeference is repeatable (see Principles of Best Practice in Georeferencing Best Practices (Chapman & Wieczorek 2020)). To that end, this document provides a set of recipes for georeferencing various locality types using the Georeferencing Calculator. The Calculator allows you to make distinct kinds of calculations based on the locality type (§1.6.1). When the locality type is chosen from the predefined list, the Calculator presents input boxes for all of the parameters needed for that type of calculation. Note that the locality type is for the most specific clause in the locality description (see Parsing the Locality Description in Georeferencing Best Practices (Chapman & Wieczorek 2020)). However, there may be information for other clauses or other parts of the location record that help to constrain the location and come into play when a feature boundary is determined. Many Calculator parameters are used for more than one locality type. Rather than repeat the explanation for each locality type, they are collected here for common reference. Some locality types require specific parameters, for which the corresponding explanations are included in each subsection of §2. Refer to the Georeferencing Calculator Manual (Bloom et al. 2020) for details about the Calculator not answered in this document.
1.6.1. Locality Type
The locality type refers to the pattern of the most specific part of a locality description to be georeferenced – the one that determines which calculation method to use. The Calculator has options to compute georeferences for six basic locality types:
Distance along orthogonal directions
Selecting a locality type will configure the Calculator to show all of the parameters that need to be set to perform the georeference calculation. This Guide gives specific instructions for how to set the parameters for many different examples of each of the locality types.
1.6.2. Corrected Center
The corrected center is the point within a location, or on its boundary, that minimizes the geographic radial (see §1.6.3). This point is obtained by finding the smallest enclosing circle that contains the entire feature, and then taking the center of that circle (Figure 1A). If that center does not fall on or inside the boundaries of the feature, find the smallest enclosing circle that contains the entire feature, but has its center on the boundary of the feature (Figure 1B). Note that in the corrected case, the new circle, and hence the radial, will always be larger than the uncorrected one. In the Calculator, the coordinates corresponding to the corrected center are labelled as Input Latitude and Input Longitude . See Appendix B: Methods to Find the Corrected Center and Geographic Radial for techniques to determine the corrected center.
1.6.3. Radial of Feature
A feature is a place in the locality description that has an extent and can be delimited by a boundary. The geographic radial of the feature (shown as Radial of Feature in the Calculator) is the distance from the corrected center of the feature to the furthest point on the geographic boundary of that feature (see Figure 1 and Extent of a Location in Georeferencing Best Practices (Chapman & Wieczorek 2020)). Note that the radial was called "extent" in earlier documents and versions of the Calculator. See Appendix B: Methods to Find the Corrected Center and Geographic Radial for techniques to determine the geographic radial.
|The final georeference will have a geographic radial distinct from the geographic radial of any of the features in the locality description (because it will also encompass all sources of uncertainty), and once the calculation is performed, this will be displayed in the output from the Calculator in the Uncertainty field.|
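The definition of the geographic radial of a feature (the distance from the corrected center to the furthest boundary point) can be sketched in Python as follows. This is a rough illustration only: it assumes the boundary is available as a list of vertices, and it uses a spherical haversine distance with a mean earth radius, whereas the Calculator bases its distances on the ellipsoid of the chosen datum. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; a spherical simplification


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geographic_radial_m(center, boundary):
    """Distance from the corrected center to the furthest boundary vertex.

    center   -- (lat, lon) of the corrected center, assumed already determined
    boundary -- list of (lat, lon) vertices approximating the feature boundary
    """
    if not boundary:
        raise ValueError("boundary must contain at least one vertex")
    return max(haversine_m(center[0], center[1], lat, lon) for lat, lon in boundary)


# Made-up feature: the furthest vertex is one degree of longitude away on the equator.
radial = geographic_radial_m((0.0, 0.0), [(0.0, 1.0), (0.5, 0.0), (0.0, -0.5)])
print(round(radial))
```

Only vertices are tested here, which is sufficient for a polygon with straight edges because the furthest point from any center lies at a vertex.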
1.6.4. Latitude
Labelled as Input Latitude in the Calculator. The geographic coordinate north or south of the equator (where latitude is 0) that represents the starting point for a georeference calculation and depends on the locality type.
Latitudes in decimal degrees north of the equator are positive by convention, while latitudes to the south are negative. The Calculator supports three degree-based geographic coordinate formats for latitude and longitude: decimal degrees (e.g. −41.0570673), degrees decimal minutes (e.g. 41° 3.424') and degrees minutes and seconds (e.g. 41° 3' 25.44" S).
1.6.5. Longitude
Labelled as Input Longitude in the Calculator. The geographic coordinate east or west of the prime meridian (an arc between the north and south poles where longitude is 0) that represents the starting point for a georeference calculation and depends on the locality type.
Longitudes in decimal degrees east of the prime meridian are positive by convention, while longitudes to the west are negative. The Calculator supports three degree-based geographic coordinate formats for latitude and longitude: decimal degrees (e.g. −71.5246934), degrees decimal minutes (e.g. 71° 31.482') and degrees minutes and seconds (e.g. 71° 31' 28.90" W).
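The relationship between the three coordinate formats and the sign convention can be illustrated with a short Python sketch. The function `dms_to_decimal` is a hypothetical helper, not part of the Calculator; rounding to seven decimal places is chosen here only to mirror the precision at which the Calculator is said to store coordinates.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds plus a hemisphere letter to decimal degrees.

    Degrees decimal minutes are handled by passing fractional minutes and
    leaving seconds at zero. South and west are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return round(sign * value, 7)


lat_from_dms = dms_to_decimal(41, 3, 25.44, "S")   # degrees minutes seconds
lat_from_ddm = dms_to_decimal(41, 3.424, 0, "S")   # degrees decimal minutes
lon_from_dms = dms_to_decimal(71, 31, 28.90, "W")

print(lat_from_dms, lat_from_ddm, lon_from_dms)
```

Both latitude inputs give the same decimal value, since 3.424 minutes is 3 minutes and 25.44 seconds.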
1.6.6. Coordinate Source
The Coordinate Source is the type of resource (map type, GPS, gazetteer, locality description) from which the starting Input Latitude and Longitude were derived.
|More often than not, the original coordinates are used to find the general vicinity of the location on a map, after which the process of determining the corrected center provides the new coordinates. The Coordinate Source to use in the Calculator in this case is the map from which the corrected center was determined, not the original source used to determine the general vicinity on the map. For example, suppose the original coordinates came from a gazetteer, but the boundary and corrected center of the feature were determined from Google Maps, the Coordinate Source would be "Google Earth/Maps 2008", not "gazetteer".|
This term is related to, but NOT the same as, the Darwin Core term georeferenceSources, which requires the specific resources used rather than their type. Note that the uncertainties from the two sources gazetteer and locality description can not be anticipated universally, and therefore do not contribute to the global uncertainty in the calculations. If the error characteristics of these sources are known, they can be added in the Measurement Error field before calculating. If the source GPS is selected, the label for Measurement Error will change to GPS Accuracy , which is where the accuracy of the GPS (see Using a GPS in Georeferencing Best Practices (Chapman & Wieczorek 2020)) at the time the coordinates were taken should be entered.
1.6.7. Coordinate Format
The Coordinate Format in the Calculator defines the representation of the original geographic coordinates (decimal degrees, degrees minutes and seconds (DMS) or degrees decimal minutes) of the coordinate source.
|When the calculation type is not “Coordinates only”, the original coordinates are often used to find the general vicinity of the location on a map, after which the process of determining the corrected center provides the new coordinates. The Coordinate Format to use in the Calculator in this case is the coordinate format on the map from which the corrected center was determined, not the coordinate format of the original source used to determine the general vicinity on the map. For example, suppose the original coordinates came from a gazetteer in DMS, but the boundary and corrected center of the feature were determined from Google Maps, the Coordinate Format would be decimal degrees, not DMS.|
This term is only equivalent to the Darwin Core term verbatimCoordinateSystem if no conversions had to be performed from the original source to the format used in the Input Latitude and Input Longitude (e.g. if the original coordinates were UTM and you had to convert them to DMS, then the Coordinate Format in the Calculator will be DMS, but the verbatimCoordinateSystem will be UTM). Selecting the original coordinate format allows the coordinates to be entered in their native format and forces the Calculator to present appropriate options for coordinate precision. Changing the coordinate format will automatically reset the coordinate precision value to nearest degree. Be sure to correct this for the actual coordinate precision. The Calculator stores coordinates in decimal degrees to seven decimal places. This is to preserve the correct coordinates in all formats regardless of how many coordinate transformations are done.
1.6.8. Coordinate Precision
Labelled in the Calculator as Precision in the first column of input parameters, this drop-down list is populated with levels of precision in keeping with the coordinate format chosen. For example, with a Coordinate Format of degrees minutes seconds , an Input Latitude of 35° 22' 24" N and an Input Longitude of 105° 22' 28" W, the Coordinate Precision would be nearest second . A value of exact is any level of precision higher than the otherwise highest precision given on the list. Sources of coordinate precision may include paper or digital maps, digital imagery, GPS, gazetteers, or locality descriptions.
|The Coordinate Precision to use in the Calculator is the coordinate precision of the source from which the corrected center was determined, not the coordinate precision of the original source used to determine the general vicinity on the map. For example, suppose the original coordinates came from a gazetteer, but the boundary and corrected center of the feature were determined from Google Maps, the Coordinate Precision would be determined by the number of digits of decimal degrees you captured from the corrected center on Google Maps, not the Coordinate Precision of the coordinates from the original gazetteer entry. If you use all of the digits provided on Google Maps, the Coordinate Precision would be "exact".|
|This term is similar to, but NOT the same as, the Darwin Core term coordinatePrecision, which applies to the output coordinates.|
1.6.9. Datum
Defines the position of the origin and orientation of an ellipsoid upon which the coordinates are based for the given Input Latitude and Longitude (see coordinate reference system).
|The Datum to use in the Calculator is the datum (or ellipsoid) of the source from which the corrected center was determined. For example, suppose the original coordinates came from a gazetteer with an unknown datum, but the boundary and corrected center of the feature were determined from Google Maps, the Datum would be "WGS84", not "datum not recorded."|
The term Datum in the Calculator is equivalent to the Darwin Core term geodeticDatum. The Calculator includes ellipsoids on the Datum drop-down list, as sometimes that is all that coordinate source shows. The choice of datum in the Calculator has two important effects. The first is the contribution to uncertainty if the datum of the input coordinates is not known. If the datum and ellipsoid are not known, datum not recorded must be selected. Uncertainty due to an unknown datum can be severe and varies geographically in a complex way, with a worst-case contribution of 5359 m (see Coordinate Reference System in Georeferencing Best Practices (Chapman & Wieczorek 2020)). The second important effect of the datum selection is to provide the characteristics of the ellipsoid model of the earth, on which the distance calculations depend.
1.6.10. Direction
The Direction in the Georeferencing Calculator is the heading given in the locality description, either as a standard compass point (see Boxing the compass) or as a number of degrees in the clockwise direction from north. True North is not the same as Magnetic North (see Headings in Georeferencing Best Practices (Chapman & Wieczorek 2020)). If a heading is known to be a magnetic heading, it will have to be converted into a true heading (see NOAA’s Magnetic Field Calculator) before it can be used in the Calculator. If degrees from N is selected, a text box will appear to the right of the selection, into which the degree heading should be entered.
|Some marine locality descriptions reference a direction (azimuth) toward a landmark rather than a heading from the current location (e.g. "327° to Nubble Lighthouse"). To make a Distance at a heading calculation for such a locality description, use the compass point 180 degrees from the one given in the locality description (147° in the example above) as the Direction.|
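The reversal of an azimuth toward a landmark into a heading from that landmark is simple modular arithmetic. A minimal Python sketch follows, with `reverse_heading` as a hypothetical helper name:

```python
def reverse_heading(azimuth_deg):
    """Return the heading 180 degrees opposite the given azimuth, in [0, 360)."""
    return (azimuth_deg + 180.0) % 360.0


# "327° to Nubble Lighthouse" becomes a heading of 147° from the lighthouse.
print(reverse_heading(327))
```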
1.6.11. Offset Distance
The Offset Distance in the Calculator is the linear surface distance from a point of origin. Offsets are used for the Locality Types Distance at a heading and Distance only . If the Locality Type Distance along orthogonal directions is selected, there are two distinct offsets:
North or South Offset Distance
The distance to the north or south (set with the selection box to the right of the distance text box) of the Input Latitude .
East or West Offset Distance
The distance to the east or west (set with the selection box to the right of the distance text box) of the Input Longitude .
1.6.12. Distance Units
The Distance Units selection denotes the real world units used in the locality description. It is important to select the original units as given in the description. This is needed to incorporate the uncertainty from Distance Precision properly. If the locality description does not include distance units, use the distance units of the map from which measurements are derived.
select mi for "10 mi E (by air) Bakersfield"
select km for "3.2 km SE of Lisbon"
select km for measurements in Google Maps where the distance units are set to km.
|All distances used in a given calculation must use the same units. For example, if an offset distance was given in miles in the locality description, when entering the radial value, you must do so in miles.|
1.6.13. Distance Precision
The Distance Precision , labelled in the Calculator as Precision in the second column of input parameters, refers to the precision with which a distance was described in a locality (see Uncertainty Related to Offset Precision in Georeferencing Best Practices (Chapman & Wieczorek 2020)). This drop-down list is populated based on the Distance Units chosen and contains powers of ten and simple fractions to indicate the precision demonstrated in the verbatim original offset.
select 1 mi for "6 mi NE of Davis"
select ¼ km for "3.75 km W of Hamilton"
1.6.14. Measurement Error
The Measurement Error accounts for error associated with the ability to distinguish one point from another using any measuring tool, such as rulers on paper maps or the measuring tools on Google Maps or Google Earth. The units of measurement must be the same as those in the locality description as captured in Distance Units (see §1.6.12). The Distance Converter at the bottom of the Calculator is provided to aid in changing a measurement to the locality description units. For example, a reasonable value for measurement error on a map is 1 mm, which on a map of 1:24,000 scale would be 24 m.
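The map example above is a straightforward scale conversion, sketched here in Python. The helper names and the small unit table are assumptions for illustration; in practice the Distance Converter in the Calculator serves the same purpose.

```python
# Meters per unit, for converting the result into the locality description's units.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}


def map_measurement_error_m(error_mm, scale_denominator):
    """Ground distance in meters covered by a measurement error on a paper map.

    error_mm          -- error on the map itself, in millimeters
    scale_denominator -- e.g. 24000 for a 1:24,000 map
    """
    return error_mm * scale_denominator / 1000.0


def convert_from_m(distance_m, units):
    """Convert meters to the given distance units."""
    return distance_m / METERS_PER_UNIT[units]


error_m = map_measurement_error_m(1, 24000)  # 1 mm on a 1:24,000 map
error_mi = convert_from_m(error_m, "mi")     # same error when the locality uses miles

print(error_m, round(error_mi, 4))
```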
1.6.15. GPS Accuracy
When GPS is selected from the Coordinate Source drop-down list, the label for the Measurement Error text box changes to GPS Accuracy . We recommend entering a value that is at least twice the value given by the GPS at the time the coordinates were captured (see Uncertainty due to GPS in Georeferencing Best Practices (Chapman & Wieczorek 2020)). If GPS Accuracy is not known, enter 100 m for standard hand-held GPS coordinates taken before 1 May 2000, when Selective Availability was discontinued. After that, use 30 m as a conservative default value.
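The recommendation above can be written as a small decision rule. The following Python sketch is one possible reading of it: the function name is hypothetical, the doubling uses exactly twice the reported value (the minimum recommended), and coordinates taken on 1 May 2000 itself are treated as post-Selective Availability, which is an assumption.

```python
from datetime import date

SELECTIVE_AVAILABILITY_OFF = date(2000, 5, 1)


def gps_accuracy_m(reported_m=None, captured_on=None):
    """Suggest a GPS Accuracy value in meters.

    reported_m  -- accuracy shown by the GPS when the coordinates were captured
    captured_on -- datetime.date of capture, needed only when reported_m is unknown
    """
    if reported_m is not None:
        # At least twice the value given by the GPS.
        return 2.0 * reported_m
    if captured_on is None:
        raise ValueError("need either the reported accuracy or the capture date")
    if captured_on < SELECTIVE_AVAILABILITY_OFF:
        return 100.0
    return 30.0


print(gps_accuracy_m(reported_m=5))
print(gps_accuracy_m(captured_on=date(1999, 6, 1)))
print(gps_accuracy_m(captured_on=date(2015, 3, 12)))
```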
1.6.16. Uncertainty
The Uncertainty in the Calculator is the calculated result of the combination of all sources of uncertainty (coordinate precision, unknown datum, data source, GPS accuracy, measurement error, feature extent, distance precision and heading precision) expressed as a linear distance – the geographic radial of the georeference and the radius in the point-radius method (Wieczorek et al. 2004). Along with the Output Latitude , Output Longitude , and Datum , the radius defines a circle containing all of the possible places a locality description could mean. In the Calculator the Uncertainty is given in meters.
Geospatial functionality in Atlas: integration of AEGIS #649
Motivation: During the 2018 OHDSI Symposium, Washington DC USA - J Cho, SC You, K Kim, Y Soh, D Kim, RW Park - presented a software demonstration called 'Application for Epidemiological Geographic Information System (AEGIS) - An open source spatial analysis tool based on CDM'. See AEGIS under related work.
AEGIS was built to support version 5.x of the OMOP CDM. This version did not have latitude and longitude in the location table. AEGIS developers used the observation and fact_relationship tables to design AEGIS using CDM 5.x. The OMOP CDM 6+ (released October 2018) has two location tables (location and location_history). The location table has fields for latitude and longitude. These new fields may be used to represent the precise location of persons, providers or care_sites.
During the presentation, a decision was made to upgrade AEGIS to support CDM 6.x with new location table, and to evaluate if it was possible to integrate AEGIS like functionality, and the work of the OHDSI GIS workgroup into ATLAS.
Background: Spatial epidemiology is the description, analysis or surveillance of a populations health related factors such as medical service, diseases, in relation to other person level or area level factors like demographic, environmental exposure, behavioral determinants, socio-economic indicators, genetic and infectious risk factors. Two types of spatial epidemiology are discussed below.
Descriptive mapping, widely used in spatial epidemiology, is useful for establishing initial hypotheses about the patterns of incidence/prevalence in an area, or the correlation between exposure to specific factors and disease.
Cluster detection is a more advanced statistical method that may reveal geographic clusters, based on patterns and spatial correlation.
Lidar Base Specification v. 2.1: Glossary
(h) is equal to the ellipsoid height and
(N) is equal to the geoid height.
(n) is the rank of the observation that contains the Pth percentile,
(P) is the proportion (of 100) at which the percentile is desired (for example, 95 for 95th percentile), and
(N) is the number of observations in the sample dataset.
Once the rank of the observation is determined, the percentile (Qp) can then be interpolated from the upper and lower observations using the following equation:
Qp is the Pth percentile, the value at rank n
A is the array of the absolute values of the samples, indexed in ascending order from 1 to N
A[i] is the sample value of array A at index i (for example, nw or nd). i must be an integer between 1 and N
n is the rank of the observation that contains the Pth percentile
nw is the whole number component of n (for example, 3 of 3.14) and
nd is the decimal component of n (for example, 0.14 of 3.14).
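The equations these definitions belong to are not reproduced in this text. The sketch below shows one way to turn the definitions into a computation, assuming the rank is n = (P/100) × (N − 1) + 1 and that Qp is linearly interpolated as A[nw] + nd × (A[nw + 1] − A[nw]). Both formulas are assumptions here and should be checked against the specification.

```python
import math
from typing import Sequence


def percentile_abs(samples: Sequence[float], p: float) -> float:
    """Pth percentile of the absolute values of the samples.

    Assumed rank formula: n = (P / 100) * (N - 1) + 1, with 1-based ranks.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    # A: absolute values in ascending order (index 1 in the text is a[0] here).
    a = sorted(abs(v) for v in samples)
    big_n = len(a)

    n = (p / 100.0) * (big_n - 1) + 1   # rank containing the Pth percentile
    nw = math.floor(n)                  # whole-number component of n
    nd = n - nw                         # decimal component of n

    lower = a[nw - 1]                   # A[nw] in 1-based notation
    if nw >= big_n:                     # rank is the last observation
        return lower
    upper = a[nw]                       # A[nw + 1] in 1-based notation
    return lower + nd * (upper - lower)
```

With the five samples 1 through 5, the 95th percentile has rank 4.8, so the result is interpolated 80% of the way from the fourth value to the fifth.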
RMSEx is the RMSE in the x direction, and
RMSEy is the RMSE in the y direction.
xn is the set of N x coordinates being evaluated,
x'n is the corresponding set of check point x coordinates for the points being evaluated,
N is the number of x coordinate check points, and
n is the identification number of each check point from 1 through N.
yn is the set of N y coordinates being evaluated,
y'n is the corresponding set of check point y coordinates for the points being evaluated,
N is the number of y coordinate check points, and
n is the identification number of each check point from 1 through N.
zn is the set of N z values (elevations) being evaluated,
z'n is the corresponding set of check point elevations for the points being evaluated,
N is the number of z check points, and
n is the identification number of each check point from 1 through N.
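The RMSE equations themselves are missing from this text. As a minimal sketch, assuming the standard root-mean-square-error definition (the square root of the mean squared difference between each evaluated value and its check point), the same function covers the x, y and z cases:

```python
import math
from typing import Sequence


def rmse(evaluated: Sequence[float], check_points: Sequence[float]) -> float:
    """Root mean square error between evaluated values and check points.

    Assumes the standard definition sqrt(sum((v_n - v'_n)^2) / N);
    the function name is illustrative.
    """
    if len(evaluated) != len(check_points):
        raise ValueError("both sequences must have the same length N")
    if not evaluated:
        raise ValueError("at least one check point is required")

    squared_errors = [(v - c) ** 2 for v, c in zip(evaluated, check_points)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Passing the x coordinates and their check points gives RMSEx, and likewise for y and z.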
A handful of the many changes resulting from the Affordable Care Act underscore the need for a geographic understanding of existing and prospective member communities. Health exchanges require that health provider networks are geographically accessible to underserved populations, and nonprofit hospitals nationwide are required to conduct community health needs assessments every three years. Beyond these requirements, health care providers are using maps and spatial analysis to better address health outcomes that are related in complex ways to social and economic factors.
Kaiser Permanente is applying geographic information systems, with spatial analytics and map-based visualizations, to data sourced from its electronic medical records and from publicly and commercially available datasets. The results are helping to shape an understanding of the health needs of Kaiser Permanente members in the context of their communities. This understanding is part of a strategy to inform partnerships and interventions in and beyond traditional care delivery settings.
During the past decade, the use of geographic information systems (GIS) for mapping and spatial analytics has evolved at Kaiser Permanente (KP). With roots in care delivery facilities planning, GIS next became an important part of KP's effort to illuminate disparities in care and improve quality of care. More recently, the Patient Protection and Affordable Care Act (ACA) 1 is reinforcing the need for a geographic understanding of existing and prospective member communities, including health status and outcomes, access to care, and cultural preferences. 2 For example, state and federal health exchanges require evidence that health provider networks are geographically accessible to underserved populations. 3,4 The ACA also mandates that nonprofit hospitals conduct a community health needs assessment every three years. 5 Other health systems have similarly recognized the utility of GIS to understand primary care needs at the community level 6 and to galvanize multisector collaborations to better address health outcomes that are related in complex ways to social and economic factors. 7
This article highlights two recent projects required by the ACA in which GIS played an important role: 1) measuring network adequacy and 2) conducting community health needs assessments. We also outline a GIS-based approach that uses data from KP's electronic health record (EHR) to identify neighborhood-level spatial variation in the prevalence of chronic conditions. Developed as a complement to the community health needs assessment process, the resulting hot spot maps protect patient/member confidentiality, while showing that the variation in health outcomes is often spatially correlated with social determinants across the community. Last, we discuss other uses for hot spot mapping, geospatial analytics, and the evolving role of GIS in targeting community-based disease prevention and management efforts.
In health care organizations, great care must be taken when working with protected health information using any technology. The use of GIS technology is no exception, for reasons ranging from compliance with the Health Insurance Portability and Accountability Act (HIPAA) to preventing unethical targeting of groups on the basis of race, ethnicity, or sociodemographics. For these reasons, much of our efforts focus on protecting individual confidentiality when working with data from KP members' EHRs.
Measuring Network Adequacy and Accessibility
Health exchanges are an important vehicle for making health insurance available via the ACA. The application process requires health plans to report network adequacy in geographically specific ways. For example, the Qualified Health Plan application for California's Health Benefit Exchange required time (30 minutes) and distance (15 miles) calculations from low-income populations (≤ 200% of federal poverty level) to primary care physicians across all counties where the Health Plan would offer insurance. GIS tools were used to measure accessibility via the street network between the low-income population and KP care delivery locations. Although not originally required, KP's internal project team requested maps, which were ultimately submitted as part of the application. As an example, the map for San Diego County (Figure 1, enlarged, full-color version is available online at: www.thepermanentejournal.org/images/Spring2014/GIS1.jpg) indicates that very few low-income residents live beyond a 30-minute drive to KP primary care locations.
Similarly, the federal requirements measure access to care providers by focusing on high-need zip codes. These zip codes have been designated as a Health Professional Shortage Area by the US Department of Health and Human Services Health Resources and Services Administration or have a high percentage (≥ 30%) of the population living at or below 200% of the federal poverty level. The number of primary care physicians in the Health Plan who practice in or adjacent to these high-need zip codes are compared with the number of Essential Community Providers, as defined by the Health Resources and Services Administration. 8 This measure ensures that the Health Plan provides at-risk populations with sufficient geographic access to care providers, and GIS analysis was necessary to answer the question of zip code adjacency. Although measures of network adequacy may evolve in the face of more virtual access to care (eg, telemedicine, care coordination, and broadband access in rural areas), geographically based measures of network adequacy will continue to require GIS technology for accurate measurement and reporting.
Supporting Community Health Needs Assessment
Since 1994, the state of California has required that nonprofit hospitals develop and implement community health needs assessments. 9 Starting in 2013, the ACA requires community health needs assessments for nonprofit hospitals nationwide to be repeated every three years to identify changes in health needs. 1 This requirement aligns well with KP's mission to provide high-quality, affordable health care services and to improve the health of our members and the communities we serve.
Building on years of experience with community health needs assessments in California and inspired by the ACA mandate, KP conducted a project to support the community health needs assessment process. A crossfunctional team from KP identified indicators and benchmarks, developed toolkits to outline workflows, and partnered with the Institute for People, Place & Possibility in Columbia, MO, and the Center for Applied Research and Environmental Systems at the University of Missouri, Columbia, to build a Web-based reporting and mapping tool. The resulting data platform (www.CHNA.org/KP) streamlines access to a broad set of data indicators, helping planners to explore and to learn about the health needs of a community, and to produce tables, charts, interactive maps, and reports to communicate their findings. 10 The community health needs assessment indicators are organized into categories: demographics, social and economic factors (eg, crime, education, poverty), physical environment (eg, fast food, parks, and air quality), clinical care (eg, access to preventive care), health behaviors (eg, eating fruits and vegetables), and health outcomes (eg, diabetes prevalence). Together these indicators provide insight on health outcomes and clinical care as well as upstream factors that also have an impact on health. In partnership with the Centers for Disease Control and Prevention in Atlanta, GA, the Institute for People, Place & Possibility, and the Center for Applied Research and Environmental Systems, KP has provided the CHNA.org platform as a free GIS community asset to support community health needs assessment efforts nationwide. 11
Data challenges still exist, however. Although many states are recognizing the limitations of publicly available health data and taking initial steps to address these limitations (eg, All Payer All Claims Database in Oregon and California's Free the Data initiative), many important public health statistics are still reported only at the state or county levels. From a national perspective, these statistics provide useful benchmarks, as they can be trended over time and indicate regional variation. However, overaggregation can mask underlying disparities, 12 limiting efforts to target interventions and detect changes at the local level.
Mapping Neighborhood-Level Geographic Variation in Health Outcomes
In Summer 2012, we piloted an internal project to address the lack of neighborhood-level insight regarding health outcomes across seven KP Regions in eight states (CA, CO, GA, HI, MD, OR, VA, and WA) and the District of Columbia. We used data derived from KP's EHR to produce neighborhood-level hot spot maps of disease prevalence in KP member communities for high-impact chronic conditions: adult and child obesity, asthma, diabetes, heart disease, and hypertension. We also analyzed self-reported physical activity measures, referred to as "Exercise as a Vital Sign," for several Regions. To protect member privacy while providing actionable insights, we scored neighborhoods by how their prevalence rate compared with the regional KP average rate, but no absolute rates were communicated and no member-level data were presented.
Using GIS tools, we geocoded each member's home address and aggregated member-level health outcomes to the census tract, providing an initial level of protection for member/patient identifiable information. Regions of KP range in size from a few hundred thousand to more than 3 million members, representing up to 30% or more of the total population in some census tracts. Table 1 lists the number of tracts by Region. Although perhaps imperfect for our purposes, census tracts are intended to be socioeconomically homogeneous, and they have origins in public health applications. 13 This level of aggregation provided a balance between detailed geographic measurement, adequate sample size, and individual privacy. 12 After aggregating member-level chronic conditions data into census tract rates, we used a documented approach with origins in analysis of medication adherence 14 to determine 1) whether individual tract rates stood out compared with other tracts in the Region and then 2) whether there were entire neighborhood rates that stood out compared with the KP Region.
The analysis standardized rates across census tracts to account for variability in KP member density. The resulting tract-level standardized rates (Z scores) incorporate the number of members in each tract along with the rate to indicate how many standard deviations each tract rate is from the regional rate. This highlights individual tract rates that are statistically significantly different from the overall regional rate.
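The article does not give the standardization formula. One common way to obtain a Z score that reflects both the tract rate and the number of members is to divide the difference from the regional rate by the binomial standard error for that tract. The sketch below assumes that approach; it may differ from the authors' method.

```python
import math


def tract_z_score(cases: int, members: int, regional_rate: float) -> float:
    """Standard deviations between a tract rate and the regional rate.

    Assumption: the standard error is sqrt(R * (1 - R) / n), where R is
    the regional rate and n is the number of members in the tract.
    """
    if members <= 0:
        raise ValueError("members must be positive")
    if not 0 < regional_rate < 1:
        raise ValueError("regional_rate must be strictly between 0 and 1")

    tract_rate = cases / members
    standard_error = math.sqrt(regional_rate * (1 - regional_rate) / members)
    return (tract_rate - regional_rate) / standard_error
```

Under this assumption, a tract with the same rate but four times as many members gets a Z score twice as large, which is how member density enters the score.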
To determine if there were entire neighborhoods, or groups of census tracts, with significantly higher or lower rates, it was important to first define neighborhoods. Neighborhoods were ultimately defined around each census tract as either 1) all additional census tracts within two miles, for densely populated urban areas, or 2) the two additional closest census tracts, measured from the centroid, for sparsely populated rural areas. On the basis of these neighborhood definitions, we created models in ArcGIS for Desktop (Esri, Redlands, CA) to run the Hot Spot Analysis (Getis-Ord Gi* statistic) on the standardized tract rates for each chronic condition for each Region. The Getis-Ord Gi* statistic is a Z score identifying statistically significant spatial clustering of higher and lower values. When applied to the standardized chronic condition prevalence rates, the results identify multitract hot spots where neighborhood rates significantly differ from the overall regional rate. To further protect member confidentiality, we then recategorized the hot spot Z scores into a hot spot index value for each census tract, as specified in Figure 2. Each hot spot index value corresponds to a standard deviation and confidence interval. These classifications allowed us to share actionable relative prevalence data and maps, while completely masking the actual rates.
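The authors ran the statistic in ArcGIS. As an illustration of what the Getis-Ord Gi* Z score measures, here is a minimal pure-Python sketch of the textbook formula with binary weights, where each tract's neighborhood includes the tract itself. The neighbor lists are hypothetical inputs and would come from the distance or nearest-neighbor rules described above.

```python
import math
from typing import Dict, List


def getis_ord_gi_star(values: List[float],
                      neighbors: Dict[int, List[int]]) -> List[float]:
    """Gi* Z score per location, using binary (0/1) spatial weights."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation as used in the Gi* formula.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    if s == 0:
        raise ValueError("all values are identical; Gi* is undefined")

    scores = []
    for i in range(n):
        # Gi* (with the star) counts location i as part of its own neighborhood.
        members = set(neighbors.get(i, [])) | {i}
        k = len(members)  # with binary weights, sum(w) == sum(w^2) == k
        local_sum = sum(values[j] for j in members)
        denominator = s * math.sqrt((n * k - k * k) / (n - 1))
        if denominator == 0:
            # Neighborhood covers every location; reported as 0.0 here.
            scores.append(0.0)
            continue
        scores.append((local_sum - mean * k) / denominator)
    return scores
```

Positive scores mark neighborhoods whose combined values are higher than expected from the regional mean, and negative scores mark lower ones.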
This method revealed neighborhoods with significant spatial clustering of similar tract rates, which were either significantly higher or lower than the regional average. These results indicated that the variation in some chronic conditions across KP member communities mirrored key drivers (eg, obesity and low educational attainment; see Figure 3, enlarged version available online at: www.thepermanentejournal.org/images/Spring2014/GIS3.jpg). Although some of these associations have been previously described in published studies, 15-17 this approach systematically identified and quantified the geographic variation and generated compelling visualizations that both protected individual member data and were easily understood by nontechnical audiences.
Additional Cases Using Geographic Information Systems
The hot spot modeling method and maps described earlier were initially developed to complement a robust ACA-mandated community health needs assessment process, but KP clinicians are finding new uses for them. We have recently used this approach to 1) inform planning efforts for prediabetes interventions in Georgia and the Northwest, 2) support the case for investment in an at-home healthy meals delivery program for patients with heart failure after discharge in the San Francisco Bay Area, and 3) identify KP communities where fewer people get a flu vaccine to target efforts to increase vaccination rates in Southern California.
In the future, GIS could play a vital role in improving clinical operations. In the spirit of the work done at the University of Florida Family Data Center, we are mapping heart attack risk in KP member communities across San Diego to help target deployment of a mobile health van. 7 In addition, an early prototype has indicated some value for using GIS-based route planning tools to help optimize the work of home health care providers. Although this application is nascent at KP, related work has documented benefits such as reduced cost via reduced travel time for providers as well as improved patient satisfaction. 18
Care transformation is likely to happen on multiple scales, from the clinical care team to the community. Insight and information based on GIS could help by supplementing decision support for care teams, informing partnerships with planning and public health agencies, and empowering communities to improve their health collectively.
GIS can supplement decision support for clinical care teams. Care teams are increasingly prescribing walking as a therapy for chronic conditions. After-visit summaries or patient-facing tools could include suggestions for walking routes or other healthy lifestyle resources near the patient's home or work. GIS also have been used to investigate patterns of community-acquired methicillin-resistant Staphylococcus aureus, for which geographic area proved to be a significant risk factor for children presenting with this infection. 19 The authors suggested that this information could guide antibiotic selection before culture results are available. 19
GIS maps and analyses support a common language that can inform partnerships with local planning and public health agencies and affect policy change. Health in All Policies: A Guide for State and Local Governments outlines ways in which decisions made in sectors such as transportation, education, and economic development affect health. The policy suggests that "better health can support the goals of these multiple sectors." 20 Regional Equity Atlases, such as those available for Portland, OR; Denver, CO; and Atlanta, GA, provide another example of the use of GIS to communicate interrelationships between planning sectors, social determinants, and health outcomes that can help galvanize policy change.
Finally, and perhaps most important, GIS can help empower community members to improve their health collectively. Learning what is already working in some neighborhoods can inform strategies in neighborhoods that face similar social determinants. Increasingly, crowdsourcing is used to allow people to vote online, in a geographically specific way, on investments that are important to them. Portland Bike Share is one example. 21 The same could be done for understanding which neighborhood-level investments would help people become or stay healthy, be it a grocery store, improved park, or better access to transportation.
Use of GIS at KP has evolved over the years and has recently become important for regulatory aspects of health care reform related to network adequacy and community health needs assessment. As part of these efforts, we identified systemic variation in the prevalence of chronic conditions across KP member communities at the census tract and neighborhood levels. This geographic variation is not random, suggesting that geographically informed interventions may be part of a multifaceted solution. Furthermore, these results are generating interest in other parts of KP to understand the effects of place and to respond accordingly. These findings reinforce Ethan Berke, MD's call for "place as a vital sign." 22 GIS make it possible to give geographic context to data from an EHR, understand individual health in the context of community health, and begin to assess the importance of place as a vital sign. Within KP, the use of GIS is growing, results are compelling, and engagement is high.
The author(s) have no conflicts of interest to disclose.
The authors would like to thank Pamela Schwartz, MPH, and Jean Nudelman, MPH, for leadership in developing the Community Health Needs Assessment platform and for providing constructive feedback during refinement of the hot spot methodology.
Kathleen Louden, ELS, of Louden Health Communications provided editorial assistance.
1. The Patient Protection and Affordable Care Act of 2010. Public Law 111-148, 111th Congress, 124 Stat 119, HR 3590, enacted 2010 Mar 23.
2. Kehoe B. Mapping out care delivery with an assist from GIS. Hosp Health Netw 2011 Jan;85(1):16-7.
3. California health benefit exchange [Internet]. Sacramento, CA: National Business Coalition on Health; 2012 Nov 16 [amended 2012 Dec 28; cited 2013 Aug 27]. Available from: www.healthexchange.ca.gov/Solicitations/Documents/FINAL%20SOLICITATION%2011-16-12%20updated%2012-28-12.pdf. p 25.
4. Supplementary response: inclusion of essential community providers [Internet]. Baltimore, MD: Centers for Medicare & Medicaid Services; 2013 Mar 8 [cited 2013 Aug 27]. Available from: www.cms.gov/CCIIO/Programs-and-Initiatives/Files/Downloads/ecp_supplemental_response_Form_03_08_13.pdf. p 3.
5. New requirements for 501(c)(3) hospitals under the Affordable Care Act [Internet]. Washington, DC: Internal Revenue Service; updated 2013 Nov 7 [cited 2013 Aug 27]. Available from: www.irs.gov/Charities-&-Non-Profits/Charitable-Organizations/New-Requirements-for-501%28c%29%283%29-Hospitals-Under-the-Affordable-Care-Act.
6. Dulin MF, Ludden TM, Tapp H, et al. Using Geographic Information Systems (GIS) to understand a community's primary care needs. J Am Board Fam Med 2010 Jan-Feb;23(1):13-21. DOI: http://dx.doi.org/10.3122/jabfm.2010.01.090135.
7. Hardt NS, Muhamed S, Das R, Estrella R, Roth J. Neighborhood-level hot spot maps to inform delivery of primary care and allocation of social resources. Perm J 2013 Winter;17(1):4-9. DOI: https://doi.org/10.7812/TPP/12-090.
8. Essential community providers [Internet]. Rockville, MD: Health Resources and Services Administration, HIV/AIDS Programs; 2013 [cited 2013 Dec 5]. Available from: http://hab.hrsa.gov/affordablecareact/ecp.html.
9. Not-for-profit hospital community benefit legislation. SB 697, CA Stat of 1994 ch 812, §127340-127365.
10. CHNA data platform: background [Internet]. Oakland, CA: Kaiser Permanente; 2013 [cited 2013 Aug 27]. Available from: http://assessment.communitycommons.org/KP/Background.aspx.
11. Resources for implementing the community health needs assessment process [Internet]. Atlanta, GA: Centers for Disease Control and Prevention, Office of the Associate Director for Policy; 2013 Sep 11 [cited 2013 Aug 27]. Available from: www.cdc.gov/policy/chna/.
12. Krieger N, Chen JT, Waterman PD, Soobader MJ, Subramanian SV, Carson R. Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter?: the Public Health Disparities Geocoding Project. Am J Epidemiol 2002 Sep 1;156(5):471-82. DOI: https://doi.org/10.1093/aje/kwf068.
13. Krieger N. A century of census tracts: health & the body politic (1906-2006). J Urban Health 2006 May;83(3):355-61. DOI: https://doi.org/10.1007/s11524-006-9040-y.
14. Hoang C, Kolenic G, Kline-Rogers E, Eagle KA, Erickson SR. Mapping geographic areas of high and low drug adherence in patients prescribed continuing treatment for acute coronary syndrome after discharge. Pharmacotherapy 2011 Oct;31(10):927-33. DOI: https://doi.org/10.1592/phco.31.10.927.
15. Link BG, Phelan J. Social conditions as fundamental causes of disease. J Health Soc Behav 1995;Spec No:80-94.
Semantics
Tools and technologies emerging from the World Wide Web Consortium's Semantic Web are proving useful for data integration problems in information systems.
Correspondingly, such technologies have been proposed as a means to facilitate interoperability and data reuse among GIS applications and also to enable new analysis mechanisms.
Ontologies are a key component of this semantic approach as they allow a formal, machine-readable specification of the concepts and relationships in a given domain.
This in turn allows a GIS to focus on the intended meaning of data rather than its syntax or structure.
For example, reasoning that a land cover type classified as deciduous needleleaf trees in one dataset is a specialization or subset of land cover type forest in another more roughly classified dataset can help a GIS automatically merge the two datasets under the more general land cover classification.
Tentative ontologies have been developed in areas related to GIS applications, for example the hydrology ontology developed by the Ordnance Survey in the United Kingdom and the SWEET ontologies developed by NASA's Jet Propulsion Laboratory.
Also, simpler ontologies and semantic metadata standards are being proposed by the W3C Geo Incubator Group to represent geospatial data on the web.
GeoSPARQL is a standard developed by the Ordnance Survey, United States Geological Survey, Natural Resources Canada, Australia's Commonwealth Scientific and Industrial Research Organisation and others to support ontology creation and reasoning using well-understood OGC literals (GML, WKT), topological relationships (Simple Features, RCC8, DE-9IM), RDF and the SPARQL database query protocols.
Recent research results in this area can be seen in the International Conference on Geospatial Semantics and the Terra Cognita – Directions to the Geospatial Semantic Web workshop at the International Semantic Web Conference.
To assess the impact of geographic health services factors on the timely diagnosis of autism.
Children residing in central North Carolina were identified by records-based surveillance as meeting a standardized case definition for autism. Individual-level geographic access to health services was measured by the density of providers likely to diagnose autism, distance to early intervention service agencies and medical schools, and residence within a Health Professional Shortage Area. We compared the presence of an autism diagnosis by age 8 and timing of first diagnosis across level of accessibility, using Poisson regression and Cox proportional hazards regression and adjusting for family and neighborhood characteristics.
Of 206 identified cases, 23% had no previous documented diagnosis of autism. Most adjusted estimates had confidence limits including the null. Point estimates across analyses suggested that younger age at diagnosis was found for areas with many neurologists and psychiatrists and proximal to a medical school but not areas with many primary care physicians or proximal to early intervention services agencies.
Further study of the distribution of medical specialists diagnosing autism may suggest interventions to promote the early diagnosis, and initiation of targeted services, for children with autism spectrum disorders.
Normal Hotspots vs Clumpy Ones in Raleigh
The open data I use for Raleigh, North Carolina for the NIBRS dataset goes back to June 2014, and has data updated in the beginning of March 2021. I pull out larcenies from motor vehicles, and for the historical train dataset use car larcenies from 2014 through 2019 (n = 17,681). For the test dataset I use car larcenies in 2020 and what is available so far in 2021 (n = 3,376). Again these are grid cells generated over the city boundaries at 500 by 500 foot intervals. For illustration I grab out the top 1% of the city (209 grid cells). I use a train/test dataset as out of sample test data will typically result in reduced predictions. Here are the PAI stats for train vs test when selecting the top 1%.
For all subsequent selections I always use the historical training data to select the hot spots, and the test dataset to evaluate the PAI.
If we do the typical approach of just taking the highest crime grid cells based on the historical data, here are the results both for the PAI and the CI (clumpy index). For those not familiar, PAI is % Crime Capture/% Area , so if the denominator is 1%, and the PAI (for the test data) is 17, that means the hot spots capture 17% of the total thefts from vehicles. The CI ranges from -1 (spread apart) to 1 (entirely clustered). Here it is just over 0, suggesting these are basically randomly distributed in terms of clustering.
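The PAI arithmetic described above is short enough to write out. This sketch assumes equal-sized grid cells, so the share of area is just the share of cells; the counts in the example are made up for illustration.

```python
def pai(crimes_in_hot_spots: int, total_crimes: int,
        hot_spot_cells: int, total_cells: int) -> float:
    """Predictive Accuracy Index: % crime captured divided by % area covered."""
    pct_crime = 100.0 * crimes_in_hot_spots / total_crimes
    # Equal-sized grid cells assumed, so cell share equals area share.
    pct_area = 100.0 * hot_spot_cells / total_cells
    return pct_crime / pct_area


# Hypothetical counts: hot spots cover 1% of cells and capture 17% of crimes.
example = pai(crimes_in_hot_spots=170, total_crimes=1000,
              hot_spot_cells=10, total_cells=1000)
```

Here `example` is the 17 from the text: 17% of crime captured in 1% of the area.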
You may think that near spatial randomness in terms of clumping seems at odds with the idea that crime clusters! But it is not really. A consistent finding about crime hot spots is that they are intensely localized, and often you can go down the street and be in a low crime area (Harries, 2006). The same idea applies when people say high crime neighborhoods are often spotty in their interior: they tend to have mostly low crime areas and just a few specific hot spots.
OK, so now to show off my linear program. So what happens if we use theta=0.9 ?
The total crime numbers are here for the historical data, and it ends up capturing the exact same number of crimes as the select top 1% does (3,664). But, it switches the selection of one of the areas. So what happens here is that we have ties – even with basically little weight assigned to the interior connections, it will prioritize tied crime areas to be connected to other chosen hot spots (whereas before the ties are just random in the way I chose the top 1%). So if you have many ties at the threshold for your hot spot, this is a great way to prioritize particular tied areas.
What happens if we turn down theta to 0.5? So this is saying you would trade off one for one – one interior edge is equal to one crime.
You can see that it changed the selections slightly more here, trading off 24 areas compared to the original rank-only solution. Let's check out the map and the CI:
The CI value is now 0.17 (up from 0.08). You can see some larger blobs, but it is still pretty spread apart. But the reduction in the total number of crimes captured is pretty small, going from a PAI of 17 to now a PAI of 16. How about if we crank down theta even more to 0.2?
This trades off a much larger number of areas and total amount of crime – over half of the chosen grid cells are flipped in this scenario. In the subsequent map you can see the hot spots are much more clumpy now, and have a CI of 0.64.
The PAI of 12.6 is a bit of a hit as well, but is not too shabby still. I typically take a PAI of 10 to be the ballpark of what is reasonable based on Weisburd’s Law of Crime Concentration – 5% of the areas contain 50% of the crime (which is a PAI of 10).
So this shows one linear programming approach to trade off clumpy chosen areas vs disconnected speckles over the map. It may be the case though that other approaches are more reasonable, such as using some type of clustering to begin with. E.g. I could use DBSCAN on the gridded predicted values (Wheeler & Reuter, 2020) and see how clumpy those hot spots are. This approach is nice, though, if you have a fixed area you want to cover.
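To make the theta trade-off concrete, here is a toy sketch. It is not the linear program used for the maps: it assumes the objective is theta × (crimes captured) + (1 − theta) × (interior edges between chosen cells), which matches the "one interior edge is equal to one crime" reading at theta = 0.5, and it simply brute-forces a four-cell strip. All names and numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, column) of a grid cell


def interior_edges(selected: Iterable[Cell]) -> int:
    """Count sides shared by two selected cells (each side counted once)."""
    chosen = set(selected)
    # Looking only right and down avoids double counting.
    return sum(((r, c + 1) in chosen) + ((r + 1, c) in chosen)
               for r, c in chosen)


def objective(selected: Iterable[Cell], crime: Dict[Cell, int],
              theta: float) -> float:
    """Assumed objective: weighted crime captured plus weighted clumpiness."""
    cells = list(selected)
    captured = sum(crime[cell] for cell in cells)
    return theta * captured + (1 - theta) * interior_edges(cells)


def best_selection(crime: Dict[Cell, int], k: int,
                   theta: float) -> Tuple[Cell, ...]:
    """Brute-force the best k cells; only feasible for tiny toy grids."""
    return max(combinations(sorted(crime), k),
               key=lambda cells: objective(cells, crime, theta))


# Toy strip of four cells with made-up crime counts.
toy = {(0, 0): 5, (0, 1): 4, (0, 2): 0, (0, 3): 5}
spread = best_selection(toy, k=2, theta=0.9)  # favors raw crime counts
clumpy = best_selection(toy, k=2, theta=0.2)  # favors adjacent cells
```

With theta = 0.9 the two highest-crime cells at opposite ends are chosen; with theta = 0.2 the selection flips to two adjacent cells, giving up one crime to gain one interior edge.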
Revisiting the estimations of PM2.5-attributable mortality with advancements in PM2.5 mapping and mortality statistics
With the advancement of geospatial technologies, geospatial datasets of fine particulate matter (PM2.5) and mortality statistics are increasingly used to examine the health effects of PM2.5. The choice among these datasets, which have different geographic characteristics (e.g., accuracy, scales, and variations), can significantly impact the results of disease burden studies. The objective of this study is to revisit the estimations of PM2.5-attributable mortality by taking advantage of recent advancements in high-resolution mapping of PM2.5 concentrations and fine-scale mortality statistics, and to explore the impacts of new data sources, geographic scales, and spatial variations of input datasets on mortality estimations. We estimate the PM2.5-attributable mortality for the years 2000, 2005, 2010 and 2015 using three PM2.5 concentration datasets [Chemical Transport Model (CTM), random forests-based regression kriging (RFRK), and geographically weighted regression (GWR)] at two resolutions (i.e., 10 km and 1 km) and mortality rates at two geographic scales (i.e., regional-level and county-level). The results show that the estimated PM2.5-attributable mortality from the 10 km CTM-derived PM2.5 dataset tends to be smaller than the estimations from the 1 km RFRK- and GWR-derived PM2.5 datasets. The estimates based on regional-level mortality rates are similar to those based on county-level rates, while large deviations exist when zoomed into small geographic regions (e.g., county). In a scenario analysis exploring the possible benefits of reducing PM2.5 concentrations, the use of the two newly developed 1 km resolution PM2.5 datasets (RFRK and GWR) leads to discrepant results. Furthermore, we found that the change in PM2.5 concentration is the primary factor behind the decrease in PM2.5-attributable mortality from 2000 to 2015.
The above results highlight the impact of adopting input datasets from new sources with varied geographic characteristics on PM2.5-attributable mortality estimations and demonstrate the necessity of accounting for these impacts in future disease burden studies.
We revisited the estimations of PM2.5-attributable mortality with advancements in PM2.5 mapping and mortality statistics, and demonstrated the impact of geographic characteristics of geospatial datasets on mortality estimations.