Exporting GeoTIFFs from Data Driven Pages for use on a tablet?

I'm a beginner at ArcGIS 10.2 and have encountered a small problem that I couldn't solve by googling. I'm trying to export a set of raster and vector data for use on a tablet. The aim is to find LiDAR features in the field using the GPS from the tablet and I want to be able to switch between various visualizations and the polygons and lines of the interpretation.

For this purpose I am trying to batch-create GeoTIFFs with various visualizations of the data and export them using Data Driven Pages, since it lets you tile a map very conveniently and I wanted to create a hard-copy mapbook of the data anyway. I found an ArcPy script that batch exports Data Driven Pages to TIFFs, and this works fine, with the major problem that it is not possible to include spatial information when exporting from a layout. Since I'm generating a very large number of tiles (over 600) and would like to do so with various visualizations, manually georeferencing them is not an option.

A workaround would be to use the ExportToTIFF method within the Data Driven Pages loop and export the data frame rather than the layout. Here is a code example:

    import arcpy

    mxd = arcpy.mapping.MapDocument("CURRENT")
    df = arcpy.mapping.ListDataFrames(mxd, "Layers")[0]

    # Step through every Data Driven Page and export the data frame (not the layout),
    # so that GeoTIFF tags can be written.
    for pageNum in range(1, mxd.dataDrivenPages.pageCount + 1):
        mxd.dataDrivenPages.currentPageID = pageNum
        arcpy.mapping.ExportToTIFF(mxd, r"C:\Temp\image_{0}.tif".format(pageNum), df,
                                   df_export_width=600, df_export_height=400,
                                   geoTIFF_tags=True)
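The fixed 600 by 400 pixel size in the snippet above does not take the ground extent of each page into account. As a minimal sketch, assuming a constant ground resolution per tile is wanted, the export size could be derived from the data frame extent (which in arcpy would presumably be read from df.extent.width and df.extent.height, in map units) and a chosen cell size. The function name export_size_px and the 0.5 m cell size are hypothetical, and the helper is plain Python so it can be checked without ArcGIS.

```python
import math


def export_size_px(extent_width, extent_height, cell_size):
    """Return (width_px, height_px) for an extent given in map units.

    cell_size is the desired ground size of one pixel, in the same map units.
    Values are rounded up so the whole extent is covered.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    width_px = int(math.ceil(extent_width / float(cell_size)))
    height_px = int(math.ceil(extent_height / float(cell_size)))
    return width_px, height_px


# Example: a 1200 m x 800 m page extent at a hypothetical 0.5 m per pixel.
print(export_size_px(1200.0, 800.0, 0.5))
```

Inside the export loop, the two returned values could be passed as df_export_width and df_export_height in place of the hard-coded numbers.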

Unfortunately I cannot see a Data Driven Pages solution for exporting layouts to GeoTIFF becoming available until that is possible to do in the core product without DDP being involved.

Ideas for workarounds can be found at:

Exporting MXD layout view to GeoTIFF?

At the same time I think you will be wise to look for or create an ArcGIS Idea to have this implemented.


M. Sultan, K. Chouinard, in Climate Vulnerability, 2013


Integrated studies (hydrogeology, geochemistry, remote sensing, geographic information systems, geophysics, and hydrologic modeling) were conducted to investigate the hydrologic setting of the Nubian Sandstone Fossil Aquifer of northeast Africa and to assess the response of the system to climatic and anthropic forcing parameters. Results indicate: (1) the Nubian Aquifer System is formed of discrete subbasins; (2) paleo- and modern-recharge areas were delineated, and recharge from modern precipitation and from Lake Nasser was simulated; (3) previously unrecognized natural discharge locations were identified; and (4) analysis of temporal gravity solutions indicated declining water supplies in Egypt. Recommendations for sustainable management of the aquifer include: (1) construction of local retention structures in areas of relatively high precipitation, (2) channeling excess Lake Nasser water across the western plateau to recharge the aquifer, and (3) construction of transient numerical flow models to account for our observations and represent the complex flow system of the Nubian Aquifer.

An integrated study using multiple approaches (surficial geology, hydrogeology, geochemistry, remote sensing, geographic information systems, geophysics, and hydrologic modeling) was conducted to accomplish the following: (1) gain insights into the hydrologic setting of the Nubian Sandstone Fossil Aquifer of northeast Africa (Egypt, Sudan, Libya, and Chad), (2) investigate the response and vulnerabilities of the system to a variety of climatic and anthropic forcing parameters over a range of time scales up to a million years, and (3) identify solutions aimed at sustainable management of the Nubian Aquifer resource. Findings included: (1) the Nubian Aquifer System is more likely to be formed of a number of discrete subbasins that are largely disconnected from one another, and within each of these postulated subbasins the ages of the groundwater will increase along the flow direction; (2) potential paleo-recharge areas (buried channels) were delineated from Shuttle Radar Topography Mission data (SRTM) and Spaceborne Imaging Radar-C/Synthetic Aperture Radar (SIR-C) data; (3) areas receiving modern natural recharge were identified (Tropical Rainfall Measuring Mission [TRMM] distribution of Nubian outcrops) on regional scales in the south (northern Sudan and northeast Chad) and locally in the north (e.g., central and southern Sinai, where recharge was estimated at 13.0 × 10⁶ m³ by using a continuous rainfall–runoff model); (4) the total recharge (10¹¹ m³) from Lake Nasser to the aquifer was simulated by using a calibrated groundwater flow model for periods of high lake levels (1975–83: 6 × 10¹⁰ m³; 1993–2001: 4 × 10¹⁰ m³ yr⁻¹); (5) previously unrecognized natural discharge locations were identified by remote sensing, geophysics, and geochemistry, and quantified with hydrologic models along the River Nile basin and the Gulf of Suez fault complexes; and (6) analysis of interannual mass variations (from monthly Gravity Recovery and Climate Experiment [GRACE] solutions that span the period from April 2002 through November 2010) indicated near steady-state solutions in the south (Sudan: 0.8 mm yr⁻¹; Chad: –0.9 mm yr⁻¹) and in Libya (–1.1 mm yr⁻¹) and declining water supplies in Egypt (–3.5 mm yr⁻¹), largely related to progressive increase in extraction rates with time.

Recommendations for sustainable management of the Nubian Aquifer include: (1) construction of local retention structures in areas of relatively high precipitation to enhance recharge and to decrease losses to runoff, (2) channeling of encroaching Lake Nasser water in high flood years across the western plateau to facilitate the recharge of the Nubian Aquifer via the permeable Nubian Sandstone outcrops that cover the lowlands to the west and to minimize losses to evaporation, and (3) updating and refinement of conceptual models and construction of transient numerical flow models to represent the complex flow system of the Nubian Aquifer. These models should account for the geological, hydrogeologic, geochemical, geochronologic, geophysical, and remote sensing-related findings, simulate the observed groundwater age distribution, and predict the responses of the hydrologic system to climate change, anthropogenic inputs, and water management practices.

Visualization tools


Views are map settings that can be saved to and retrieved from any geodatabase. They allow you to manage the properties of a data frame and layers so they can be reapplied at any time while allowing you to decide which current map properties to keep and which to override.

The Production Symbology toolbar has tools you can use to create and manage views.

Visual specifications

  • Calculated representations—Symbols used to represent features
  • Calculated fields—Text strings used to display map text

Calculated refers to the use of combinations of fields and SQL where clauses to group features for symbolization. You can also include attributes from other feature classes and tables. With this functionality, you can create highly specific symbology.

Visual specifications are stored in a geodatabase. This fosters reuse across your organization.

Layout elements

Layout elements are surround elements in a page layout. These can include scale bars, north arrows, legends, logos, text, graphics, and data frames. Production Mapping also includes the graphic table element, an element used to create different types of tables. Production Mapping provides the capability to store these layout elements in a geodatabase for distribution, management, and reuse in map production cycles.

Layout window

You can use the Layout window to manage elements in the page layout. This window has a toolbar, an element list, and element searching capabilities. The toolbar exposes Production Mapping functionality such as the Measure Layout tool, and Layout and Data Frame Rules. The element list displays all surround elements on the current page layout. Right-clicking an element in this list displays the Layout window context menu.

The Layout window is a dockable window opened by the Layout window tool on the Production Cartography toolbar. When opened, it displays a list of all elements in a page layout.

Database elements

Database elements are page layout elements that can be stored in any geodatabase. This facilitates element reuse and standardization across multiple maps and charts.

Database elements include any element used in a map's page layout. These can be surround elements, such as scale bars, north arrows, and legends, or they can be logos, text, or other graphics that are created for specific purposes in a map. Layout elements can also be data frames.

The Save Element dialog box allows you to store an element in a geodatabase.

The Database Element dialog box allows you to manage elements stored in a geodatabase and insert them into a page layout. You can access this dialog box from the ArcMap main menu by clicking Insert > Database Element.

Graphic table elements

The graphic table element is a generic surround element that is used to create different types of tables that can appear on a map. It is accessed from the Insert menu. The tables that are created can be used inside a Data Driven Pages map document, an Atlas map series, or a standard map document. This tool is only available in layout view.

Using Technologies for Data Collection and Management

Technologies and surveillance systems play an integral, increasing, and evolving role in supporting public health responses to outbreaks or other urgent public health events. The functions supported might include event detection, event characterization, enhanced surveillance, situational awareness, formal epidemiologic investigations, identification and management of exposed persons, and monitoring of the response itself and its effectiveness. In any field investigation, decisions need to be made early and strategically regarding methods, data sources, systems, and technologies. Skillful initial selection of optimal tools and approaches improves the investigation.

To the extent possible, anticipate whether an investigation will be low-profile and localized or might result in a large, possibly multicentric investigation of considerable public health importance and public interest. This early forecast guides system and technology selections. Anticipate that methods or technologies can change or evolve during the investigation as the scope or direction of the investigation changes. Plan regular reviews of the adequacy of the methods in use, and, if needed, make a transition from one data collection platform or process to another.

Previously, all components of a field investigation were likely to be performed in the field itself. Developments in information systems, data integration, and system interoperability have now made it possible, and sometimes desirable, for some components (e.g., data collection, data cleaning, data analysis) of “field” investigations to be performed off-site (e.g., by central office staff). Access by both office and field staff to systematically collected data, often simultaneously or in near-real time, improves support of the field investigation. Broader investments in health information technology (IT) and widespread adoption of electronic health records (EHRs), spurred by the Health Information Technology for Economic and Clinical Health Act in the United States, enacted as part of the American Recovery and Reinvestment Act of 2009, have expanded the role technology can play in supporting a public health response (1).

For the purposes of this chapter, the terms outbreak and field investigation represent any acute public health problem requiring urgent epidemiologic investigation, including

  • Infectious disease outbreaks
  • Clusters of cancers, birth defects, or poisonings
  • Environmental exposures
  • Diseases or conditions of unknown etiology
  • Natural disasters, or
  • Threats arising from events elsewhere in the world.

During larger and higher profile investigations, the field response most likely will occur in the context of the country’s organized approach to emergency management: for example, in the US, the National Incident Management System, or the country’s equivalent approach to emergency management.

In this chapter, the term technology refers broadly to

  • Computers,
  • Software applications,
  • Mobile devices,
  • Personal health status monitoring devices,
  • Laboratory equipment,
  • Environmental monitors and sensors, and
  • EHRs.

Technology is also used in regard to

  • Public health surveillance systems
  • Ongoing public health databases
  • Purpose-built databases for specific investigations, and
  • Technologies that enable storing, managing, and querying data and sharing data among these devices and databases.

Emergency situations typically create increased demands for epidemiologic and laboratory resources. Important factors that affect data collection and management during an event response, compared with business as usual, include time constraints; immediate pressure to both collect and instantaneously summarize substantial amounts of data, typically in fewer than 24 hours; limited human resources; often insufficient data preparedness infrastructure; and unfamiliar field deployment locations and logistics (see also Chapter 2).

Two guiding principles for selecting and using technologies during a field response are:

  • Technologies for data collection and management should streamline and directly support the workflow of field investigations rather than disrupt or divert resources and staff time away from epidemiologic investigations and related laboratory testing activities (2).
  • Technologies should facilitate more time for epidemiologists to be epidemiologists (to find better data, acquire them, clean them, and use data to better characterize the event, monitor its progress, or monitor the implementation or effectiveness of control measures) and more time for laboratorians to perform testing.

The choice of technology platforms should be driven by the

  • Goals of the investigation
  • Training and skills of available staff
  • Existing infrastructure for gathering and managing case reports and other surveillance data
  • Number of geographically distinct data collection sites or teams expected and the number of jurisdictions involved
  • Speed and frequency with which interim summaries or situation reports are needed
  • Types of formal or analytic epidemiologic investigations expected (e.g., surveys, longitudinal studies, or additional human or environmental laboratory testing), and
  • Other factors that will be evident in the situation.

The chosen technologies and systems should be subjected to periodic review as the investigation continues.

Technologic devices (e.g., mobile and smart devices, personal monitoring devices), EHRs, social media and other apps, automated information systems, and improved public health informatics practices have opened exciting opportunities for more effective and efficient public health surveillance. They are transforming how field teams approach the collection, management, and sharing of data during a field response.

Traditionally in field investigations, a public health agency deploys personnel to the geographic area where the investigation is centered, and the investigation is largely led and managed in the field, with periodic reports sent to headquarters. Although site visits are necessary to identify crucial information and establish relationships necessary for the investigation, a shift is occurring to a new normal in which field response data collection is integrated with existing infrastructure, uses jurisdictional surveillance and informatics staff, and uses or builds on existing surveillance systems, tools, and technologies.

Field data collection can be supported by management and analysis performed off-site or by others not part of the on-site team. Data collection, management, and analysis procedures often can be performed by highly skilled staff without spending the additional resources for them to be on-site. For example, active case finding by using queries in an established syndromic surveillance system (e.g., ESSENCE, which is part of the National Syndromic Surveillance Program) or reviewing and entering case and laboratory data in a state electronic reportable disease surveillance system can be performed from any location where a computer or smart device and Internet connectivity are available. Data collected in the field electronically can be uploaded to central information systems. When data are collected by using paper forms, these forms can be scanned and sent to a separate data entry location where they can be digitized and rapidly integrated into a surveillance information system.

This approach enables the field team to focus on establishing relationships necessary for supporting epidemiologic investigation and data collection activities or on laboratory specimen collection that can only be accomplished on-site. Specialized staff can be assigned to the team; these staff remain at their desks to collect, manage, or analyze data in support of the field investigation. Staff might include data entry operators, medical record abstractors, data analysts, or statistical programmers. Implementing coordinated field and technology teams also enables more and highly skilled staff across multiple levels (local, state, or federal) to contribute effectively to an investigation. How to coordinate data activities in multiple locations needs to be planned for early in the response.

Field investigations often are led by personnel with extensive epidemiologic, disease, and scientific subject-matter expertise who are not necessarily expert in informatics and surveillance strategies. From a data perspective, such leadership can result in the establishment of ineffective data collection and management strategies. To support effective data collection and management, for all outbreaks, field investigators should

  • Identify a role (e.g., chief data scientist or chief surveillance and informatics officer) that reports to a position at a senior level in the incident command structure (e.g., incident commander or planning section chief), and
  • Identify and establish the role at the start of the response.

Whatever title is assigned to the role, the person filling the role should have clearly delineated duties and responsibilities, including

  • Coordinating the full spectrum of data collection and management processes and systems used during the response
  • Being familiar with existing surveillance systems, processes, procedures, and infrastructure and how they are used currently
  • Identifying when and where existing systems can be modified to support the response or if temporary systems or processes need to be established
  • Anticipating that data collection methods or technologies might need to evolve or change during the investigation as the scope or direction of the investigation changes
  • Preventing creation of divergent, one-off, or disconnected data collection, management, and storage
  • Meeting regularly with response staff to identify additional system needs or modifications and ensuring that data collection and management activities support the progressing response
  • Regularly reviewing the adequacy of the surveillance systems, methods, and technology in use during the response and, if needed, planning for and implementing a transition from one data collection strategy or platform to another, and
  • Communicating surveillance system needs to the incident commander so that decisions and adequate resources for supporting surveillance efforts can be secured.

When preparing and packing for field deployments, two technology items are essential for each investigator: a portable laptop-style computer and a smartphone (essentially a pocket computer providing access to a camera, video, geolocating and mapping services, and data collection capacities). Depending on power availability and Wi-Fi or network connectivity, extra batteries or battery packs/mobile charging stations capable of charging multiple devices, such as laptops and phones, can be crucial. A mobile hotspot device to create an ad hoc wireless access point, separate from the smartphone, can be useful in certain situations. For example, after Hurricane Irma made landfall in Florida during September 2017, widespread power losses lasted for days. Deployed epidemiology staff were housed in locations without consistent power and had to travel to established command centers to charge phones, laptops, and rechargeable batteries once a day. Portable printers or scanners are other optional items to consider. Car-chargers for laptops and phones are also useful, although gas shortages can be a constraint and make car-chargers less optimal during certain types of responses.

As far as possible, responders should be deployed with items similar to ones they have been using on a regular basis. This will ensure that the investigator is familiar with the equipment and how it functions in different settings (e.g., how it accesses the Internet and device battery life), that it has all the expected and necessary software installed, and, perhaps most importantly, that it can be connected to the network (e.g., how it accesses the deploying agency’s intranet). Equipment caches of laptops, tablets, or other devices purchased only for use during events can lead to considerable deployment problems (e.g., lack of training in how to use the specialized equipment, network compatibility, or obsolescence of either hardware or software).

Field investigators responding to out-of-jurisdiction locations most likely will need to be issued temporary laptops from within the response jurisdiction to ensure network and software compatibility, connectivity, and adherence to jurisdictional security requirements. Temporary access (log-ins and passwords) to key surveillance system applications or updates to an investigator’s existing role-based access to these applications will also need to be considered.

Data often move at the speed of trust. A field team should establish strong working relationships at the start of the response with those who invited the epidemiologic assistance. On-site visit time should be used to ensure that the relationship will, among other tasks, facilitate gathering data and meet the needs of local authorities. Plans need to be made at the outset for sharing regular, timely data summaries and reports with local partners.

Upon initial arrival, the field team should assess existing surveillance systems and the processes for data submission to these systems. The assessment should address

  • Data types already collected and available,
  • Data timeliness,
  • Data completeness,
  • How easily and rapidly systems or processes can be modified or changed,
  • Equipment available (e.g., laptops and phones),
  • Available surveillance system staffing, and
  • Known or anticipated problems and concerns with data quality, availability, and timeliness.

If the team is deploying out of its own jurisdiction, the team leader should seek assistance and consultation from someone at the jurisdictional level who fills a role like that of the chief surveillance and informatics officer (see previous section).

An outbreak investigation and response has defined steps and phases (see Chapter 3), and each has specific technology and information needs. In recent years, public health agencies have benefitted from technologic advances that support outbreak detection, whether the outbreak is caused by a known or unknown agent. For example, to detect reportable disease clusters effectively, the New York City Department of Health and Mental Hygiene each day prospectively applies automated spatiotemporal algorithms to reportable disease data by using SaTScan (Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA). This system enabled detection of the second largest US outbreak of community-acquired legionellosis by identifying a cluster of eight cases centered in the South Bronx days before any human public health monitor noticed it and before healthcare providers recognized the increase in cases (3). The identification led to an extensive epidemiologic, environmental, and laboratory investigation to identify the source (a water cooling tower) and then implement measures to remediate it. Although technology is revolutionizing approaches to cluster detection, this chapter assumes the field team will be responding after a known event or outbreak has been detected; thus, the following discussion focuses on using technologies for conducting initial characterization, active case finding, enhanced surveillance, supporting and evaluating control measures, and situational awareness, and for monitoring the response and its effectiveness.

Conducting Initial Characterization, Active Case Finding, and Monitoring

In an outbreak setting, routine data management often changes because of new stressors or novel circumstances, particularly the need to almost immediately gather data, produce reports, and inform decision makers and the public (see also Chapters 2 and 3). To assess population groups at highest risk, geographic extent, and upward or downward trends of disease incidence throughout a confirmed outbreak, investigators can use existing surveillance mechanisms. However, such mechanisms might need to be enhanced; for example, investigators might need to

  • Create a new syndrome or add new queries to an existing syndromic surveillance system
  • Ask physicians and laboratorians to report suspected and probable, as well as confirmed, cases
  • Conduct active case finding, and
  • Provide laboratories with diagnostic direction or reagents or ask them to send specimens meeting certain criteria to the state public health laboratory.

Regardless of whether case detection is enhanced, the technology used should support production of a line-listing for tracking cases that are part of the investigation. The system should also document what changes are made to individual cases and when those changes are made, including changes that result from new information gathered or learned or from epidemiologic findings. The system should ensure that laboratory data are easily made relational (Box 5.1). Even if investigation data are collected entirely or partially on paper, those data usually are keypunched into electronic data systems for further analysis, and the paper forms are scanned and stored electronically. As stated in a review of the 2003 severe acute respiratory syndrome (SARS) outbreak in Toronto, an important step in achieving seamless outbreak management is “uniform adoption of highly flexible and interoperable data platforms that enable sharing of public health information, capture of clinical information from hospitals, and integration into an outbreak management database platform” (5).
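As an illustration of what "relational" laboratory data can mean in practice, the following is a minimal sketch, assuming a simple two-table design in which each laboratory result points back to one case. The table names, column names, and sample rows are hypothetical and are not taken from any surveillance system named in this chapter; SQLite is used only because it ships with Python.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a cases table and a linked results table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cases (
            case_id INTEGER PRIMARY KEY,
            status  TEXT NOT NULL
        );
        CREATE TABLE lab_results (
            result_id INTEGER PRIMARY KEY,
            case_id   INTEGER NOT NULL REFERENCES cases(case_id),
            specimen  TEXT NOT NULL,
            result    TEXT NOT NULL
        );
    """)
    # Hypothetical sample data: case 1 has two specimens, case 2 has none yet.
    conn.executemany("INSERT INTO cases VALUES (?, ?)",
                     [(1, "confirmed"), (2, "suspected")])
    conn.executemany(
        "INSERT INTO lab_results (case_id, specimen, result) VALUES (?, ?, ?)",
        [(1, "serum", "positive"), (1, "sputum", "positive")])
    return conn


def line_listing(conn):
    """Return one row per case with a count of its linked laboratory results."""
    query = """
        SELECT c.case_id, c.status, COUNT(l.result_id)
        FROM cases AS c
        LEFT JOIN lab_results AS l ON l.case_id = c.case_id
        GROUP BY c.case_id, c.status
        ORDER BY c.case_id
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
for row in line_listing(conn):
    print(row)
```

Keeping results in their own table lets a case accumulate any number of specimens over time, while the LEFT JOIN keeps cases that have no results yet on the line listing.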

On May 9, 2014, the Florida Department of Health in Orange County received notification from a hospital infection preventionist about a man with suspected Middle East respiratory syndrome (MERS). Specimens tested at CDC confirmed this was the second reported, confirmed US MERS case (4).

The investigation determined that the patient possibly exposed others in four general settings: airplanes during travel from Saudi Arabia to Orlando, Florida; at home (household contacts and visiting friends); a hospital outpatient waiting room while accompanying a relative for an unrelated medical reason; and later, an emergency department waiting room where he sought care for his illness. Multiple levels of contacts were tracked by the four exposure settings and risk for exposure (e.g., healthcare workers at high or low risk depending on procedures performed). The Epi Info 7 (CDC, Atlanta, Georgia) database that was created supported easy generation of line listings for tracking contacts and linking contact and laboratory information, including associated exposure settings, tracking isolation periods, contact method, attempts, signs and symptoms, final outcome, persons who should provide a clinical specimen, number and types of specimens collected (multiple and over time), whether specimens were received for testing, and laboratory results. The novel nature of the investigation required that additional data fields be captured as the scope of the investigation shifted. Because field investigators can control Epi Info 7 database management, these needed shifts were able to be met rapidly with no technical support. As a result, the progress of the contact investigation was able to be monitored in real time to identify priorities, optimally use personnel resources, and ensure leadership had current information on which to base decisions. Because of the ongoing reported MERS cases in the Middle East, CDC used BioMosaic, a big-data analytics application, to analyze International Air Transport Association travel volume data to assess potential high-exposure areas in the United States on the basis of US-bound travel. Effective database management and linking of epidemiologic and laboratory information in a single location supported the investigation.

ITs can be used to improve the quality, completeness, and speed of information obtained in a field investigation and the speed and sophistication of reports that can be generated from that information at the individual or aggregate level. To ensure that the full benefits of these technologies are realized, investigators need to perform the following actions:

  • Begin with the type of output desired, create mock reports, and work backward to define the necessary input elements, ideally at the outset before any data collection begins; however, in reality this process often is iterative.
  • Test the data export features and ensure the analytic software can easily access the necessary data.
  • Carefully consider the questions leadership will need to have answered and ensure that the collected data elements answer the overarching questions. Completing this step may directly affect the underlying table structure of the database.
  • Develop a flowchart detailing the steps associated with data gathering, information sharing, data management, and data technology. This flow chart will also help to identify processes that must be or should be manual and to identify and remove duplications of data transfer and entry (Figure 5.1).
  • Recognize that field setups typically need rapid creation and modification of the database and to allow for creation of case records from laboratory results and for addition of multiple laboratory results to case records.
  • Ensure the database can track both cumulative data (total cases) and temporal data changes (what occurred during the previous 24 hours or previous week). Collect status changes (i.e., change-history status and date-time stamps when the data changed) for priority data elements to ensure accurate reporting of information that changed and when (e.g., the number of new cases, the number of cases that changed from probable to confirmed, or the number of suspected cases that have been ruled out).
  • Ensure the database supports user-defined data extraction and query capabilities. Do not underestimate the need for easy access to the data by the field investigators for data entry and rapid data summary and for planning the next day’s field operations (e.g., completed interviews, number of houses to revisit, number of non-English interviewers, number of persons with specimens collected, and number of specimen collection containers or other laboratory supplies on hand and needed). Field investigators should not have to be experts in formulating relational database queries.
  • Plan to test field data collection equipment and applications (see also Chapter 2). For example, if interviews and data collection are to be performed in a door-to-door sampling effort, are the laptop computers too heavy to hold while completing an interview? Can the screens be viewed in direct sunlight? Does the system screen navigation match the flow of the interview? Will Internet connectivity be available?

ITs can be used to improve the quality, completeness, and speed of information obtained in a field investigation and the speed and sophistication of reports that can be generated from that information at the individual or aggregate level. To ensure that the full benefits of these technologies are realized, investigators need to perform the following actions:

  • Begin with the type of output desired, create mock reports, and work backward to define the necessary input elements, ideally at the outset before any data collection begins; however, in reality this process often is iterative.
  • Test the data export features and ensure the analytic software can easily access the necessary data.
  • Carefully consider the questions leadership will need to have answered and ensure that the collected data elements answer the overarching questions. Completing this step may directly affect the underlying table structure of the database.
  • Develop a flowchart detailing the steps associated with data gathering, information sharing, data management, and data technology. This flow chart will also help to identify processes that must be or should be manual and to identify and remove duplications of data transfer and entry (Figure 5.1).
  • Recognize that field setups typically need rapid creation and modification of the database and to allow for creation of case records from laboratory results and for addition of multiple laboratory results to case records.
  • Ensure the database can track both cumulative data (total cases) and temporal data changes (what occurred during the previous 24 hours or previous week). Collect status changes (i.e., change-history status and date-time stamps when the data changed) for priority data elements to ensure accurate reporting of information that changed and when (e.g., the number of new cases, the number of cases that changed from probable to confirmed, or the number of suspected cases that have been ruled out).
  • Ensure the database supports user-defined data extraction and query capabilities. Do not underestimate the need for easy access to the data by the field investigators for data entry and rapid data summary and for planning the next day's field operations (e.g., completed interviews, number of houses to revisit, number of non-English interviewers, number of persons with specimens collected, and number of specimen collection containers or other laboratory supplies on hand and needed). Field investigators should not have to be experts in formulating relational database queries.
  • Plan to test field data collection equipment and applications (see also Chapter 2). For example, if interviews and data collection are to be performed in a door-to-door sampling effort, are the laptop computers too heavy to hold while completing an interview? Can the screens be viewed in direct sunlight? Does the system screen navigation match the flow of the interview? Will Internet connectivity be available?
  • Use programmed data quality and validity checks to identify and resolve discrepancies at the time of data collection. For example, date fields should only accept valid dates within a given range, or pregnancy should be available as a valid value for women only.
  • Be aware that, typically, the more complex the data entry checks or programmed skip patterns in place, the more time is needed to set up the form itself; during field responses, setup time can be an important tradeoff against other uses of investigators' time and against data quality concerns.
  • Recognize that structured data collection techniques and standardization processes can minimize data quality problems, although even highly structured data collection techniques do not eliminate data errors. The standardization process that facilitates computer-readable data forms risks losing the richness of information identified within unstructured documents (i.e., clinicians' notes or field observations). How data elements are collected (e.g., structured drop-down lists, free text, check boxes when multiple selections are possible, or radio buttons for single-choice selections) dictates data storage format and table structures and can dictate how the data can be analyzed (e.g., symptoms being reported by interviewees can be stored in a comma-delimited string, or each symptom can be stored as yes/no choices in separate columns).
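
The data quality checks and the two symptom storage formats described in the list above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the field names (`onset_date`, `sex`, `pregnant`), the earliest allowed date, and the symptom names are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical rule: the earliest onset date this investigation accepts.
EARLIEST_ONSET = date(2016, 1, 1)


def validate_record(record, today):
    """Return a list of problems found in one interview record (empty if none)."""
    problems = []
    onset = record.get("onset_date")
    if not isinstance(onset, date):
        problems.append("onset_date is missing or not a date")
    elif not (EARLIEST_ONSET <= onset <= today):
        problems.append("onset_date is outside the allowed range")
    # Pregnancy should be a valid value for women only.
    if record.get("pregnant") == "yes" and record.get("sex") != "female":
        problems.append("pregnant=yes is only valid when sex is female")
    return problems


def symptoms_to_columns(symptom_string, known_symptoms):
    """Expand a comma-delimited symptom string into one yes/no value per symptom."""
    reported = {s.strip().lower() for s in symptom_string.split(",") if s.strip()}
    return {name: ("yes" if name in reported else "no") for name in known_symptoms}
```

Running `validate_record` when the record is entered, rather than at analysis time, is what lets discrepancies be resolved while the interviewee is still available. `symptoms_to_columns` shows how the comma-delimited format can be converted to the column-per-symptom format when the analysis requires it.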

Note: DMAT: Disaster Medical Assistance Team; ED: Emergency Department
Source: Reference 6.

Using Routine Surveillance Data and Systems

The value and use of routine surveillance data systems should not be underestimated during outbreak investigations and ideally will be managed within a data preparedness framework. Many state reportable disease surveillance systems (both commercially available and state- or in-house-designed) now have outbreak management components (7). To avoid duplicating efforts or processes, field investigators should understand and assess existing surveillance systems that support outbreak management before determining which technologies to use (Box 5.2).

In addition to public health electronic disease surveillance systems supporting outbreak management components, reportable disease electronic laboratory reporting (ELR) is now a mainstay of reportable disease surveillance. Every state health department has operational ELR systems (9). Although ELR was designed for supporting individual identification and reporting of disease events, it can also be used to support outbreak response activities. Using existing surveillance systems, including ELR processes, supports outbreak detection, characterization, outbreak identification, and control measure evaluation (Box 5.3).

Following the identification of the initial case of Zika virus infection attributed to likely local mosquito-borne transmission in Florida, the Florida Department of Health conducted active surveillance in selected areas of the state to identify locally acquired Zika virus infections and to assess whether ongoing transmission was occurring (8). Data collected during these field surveys were managed in the outbreak module (OM) of the state health department-developed reportable disease surveillance application, Merlin. Three types of OM events were used: index, cases and their contacts; urosurvey (i.e., survey administration paired with urine sample collection), participants of residential, business, or clinic urosurveys; and other, nonindex cases. Data regarding residential urosurveys (persons within a 150-meter radius of a locally acquired index case), business urosurveys (employees at a business or worksite), and clinic urosurveys (persons who lived or worked in the area of interest) were collected and analyzed. For each urosurvey OM event, an event-specific survey was generated in real time and used to capture the collected data. In 2016, door-to-door survey data were collected on a simple paper form. Surveys were faxed nightly to the central office staff in Tallahassee, where existing reportable disease data entry staff entered all the survey data collected; the digitized information was made available to local- and state-level investigators within 24 hours.

In 2016, 87 OM events (49 index, 32 urosurvey, and 6 other) were initiated. These events comprised approximately 2,400 persons, of whom approximately 2,200 (92%) had participated in any urosurvey event. Managing the data within the Merlin system OM was also useful for immediately linking the laboratory data received electronically from the state public health laboratory information system via the state's existing electronic laboratory reporting (ELR) infrastructure. Modifications were made to the ELR feed (e.g., new Zika virus test codes were added), and these changes were completed before the first urosurvey was launched, thus ensuring rapid data receipt. For positive laboratory results, case records were created immediately, and those records were linked between the case record and separate survey data collection areas. Merlin continued to support routine case reporting, and the OM facilitated flexible group-specific, event-level investigations. Event surveys comprised core questions and site- or setting-specific questions. Use of core questions enabled comparability within and between urosurvey event data, improving the ability to conduct ad hoc analysis. Managing data electronically within Merlin and OM facilitated easy access to data for export, event-specific analysis, and linking for mapping. The seamless management of case and survey data eliminated duplicate data entry. During the response, modifications were made to automate sending reportable disease case data from Merlin to CDC for national reporting in the ArboNet database (replacing a previously manual data entry process).

Identification of influenza outbreaks can be challenging, often relying on a healthcare provider to recognize and report that information to the jurisdiction's health department. Influenza infections among certain populations at high risk (e.g., older persons, particularly those in nursing homes or other long-term care facilities) can have more severe outcomes, especially if there are delays implementing appropriate antiviral treatment or chemoprophylaxis. To more effectively identify outbreaks in this setting, the Florida Department of Health (FDOH) implemented regulations to require reporting of influenza results through electronic laboratory reporting (ELR).

Following an approach first described by the New York Department of Health and Mental Hygiene (10), FDOH obtained a list of all licensed nursing homes and other long-term care facilities from the state licensing agency. Addresses of these facilities were then matched to the patient address received on the ELR form to determine whether a person in these facilities had an influenza-positive specimen. A single positive result within these high-risk settings triggers an outbreak investigation. Previously unreported outbreaks have been identified through this approach.
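
The address-matching step can be illustrated with a short sketch. It is not the FDOH or New York implementation: the record layout (`name`, `address`, `patient_address`, `test`, `result`, `patient_id`) and the crude text normalization are assumptions for the example, and production matching would more likely rely on geocoding or a dedicated address standardization tool.

```python
import re

# Hypothetical abbreviations applied during normalization.
STREET_WORDS = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}


def normalize_address(address):
    """Lowercase, drop punctuation, collapse spaces, and shorten street-type words."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(STREET_WORDS.get(word, word) for word in text.split())


def flag_facility_results(facilities, elr_results):
    """Return influenza-positive ELR results whose patient address matches a facility."""
    by_address = {normalize_address(f["address"]): f["name"] for f in facilities}
    flagged = []
    for result in elr_results:
        if result["test"] != "influenza" or result["result"] != "positive":
            continue
        facility = by_address.get(normalize_address(result["patient_address"]))
        if facility is not None:
            flagged.append({"facility": facility, "patient_id": result["patient_id"]})
    return flagged
```

Because a single positive result in a high-risk setting triggers an investigation, every entry returned by `flag_facility_results` would be treated as a signal for follow-up. Exact string matching after normalization will miss addresses written very differently, so a sketch like this is a starting point only.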

As another example, during the FDOH's 2016 response to locally acquired Zika virus infections (8), ELR was vital for evaluating public health recommendations. In Miami-Dade County, FDOH recommended that all pregnant women be tested for Zika virus infection after active local transmission was identified. Laboratories obtaining testing capacity for Zika virus were asked to send all Zika laboratory test results to FDOH. The FDOH birth defects program determined the estimated number of live births and pregnant women living in Miami-Dade, and the ELR data (negative and positive results) were used to assess what proportion of pregnant women had actually been tested and where the public or healthcare providers needed additional outreach or education. With the high volume of testing performed (approximately 65,000 results in 2016), using technology (established ELR processes and advanced analytic software) made such an approach feasible.

Syndromic surveillance uses data about symptoms or health behaviors (e.g., substantial increases in over-the-counter medication sales) and statistical tools to detect, monitor, and characterize unusual activity for further public health investigation or response and situational awareness. The most recognized and largest syndromic surveillance data source is patient encounter data from emergency departments and urgent care centers. These data can be monitored in near-real time as potential indicators of an event, a disease, or an outbreak of public health significance or to provide event characterization and monitoring after initial detection. ESSENCE, an established syndromic surveillance system, was used to quickly facilitate active case finding when Zika virus was introduced in the US in 2016 and 2017 (8) (Box 5.4).

Zika virus disease (Zika) became a widespread public health problem in Brazil in 2015 and quickly spread to other South and Central American countries and eventually to the United States. Zika is associated with increased probability of severe birth defects in babies when their mothers are infected with the virus during pregnancy. Zika also has been associated with Guillain-Barré syndrome.

The primary vector for Zika is the Aedes aegypti mosquito, which is present in Florida. With large numbers of tourists visiting Florida annually, including from many of the countries with Zika outbreaks, Florida instituted measures to minimize introduction of this disease into the state (8). Identification of persons infected with Zika early in the course of their illness allows for a twofold public health intervention: (1) patient education about how to avoid mosquito bites while viremic to help prevent spread to others and (2) mosquito control efforts that are targeted to the areas where the patient has been (e.g., home or work).

Florida's syndromic surveillance system (ESSENCE-FL) has nearly complete coverage in hospitals that have emergency departments (245/250 hospitals). Queries were created to search the chief complaint, discharge diagnosis, and triage notes fields for Zika terms (including misspellings of the words Zika and microcephaly) and clusters of symptoms (e.g., rash, fever, conjunctivitis, or joint pain) in individuals who had traveled to countries of concern. Dashboards were created by state-level staff and shared with county epidemiologists to facilitate daily review of emergency department visits for which Zika was suspected.
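
A free-text query of this kind might look roughly like the following sketch. The patterns are illustrative guesses and are not the actual ESSENCE-FL queries; the misspellings, the two-symptom threshold, and the simple travel keywords standing in for "travel to countries of concern" are all assumptions.

```python
import re

# Hypothetical term lists; a real query would be tuned against local data.
ZIKA_TERMS = re.compile(
    r"\b(zika|zica|zeka|zikka|microcephaly|microcephally|microcephaley)\b",
    re.IGNORECASE,
)
SYMPTOMS = {
    "rash": re.compile(r"\brash\b", re.IGNORECASE),
    "fever": re.compile(r"\b(fever|febrile)\b", re.IGNORECASE),
    "conjunctivitis": re.compile(r"\b(conjunctivitis|pink ?eye)\b", re.IGNORECASE),
    "joint pain": re.compile(r"\b(joint pain|arthralgia)\b", re.IGNORECASE),
}
TRAVEL = re.compile(r"\b(travel|traveled|travelled|returned from)\b", re.IGNORECASE)


def visit_of_interest(text, min_symptoms=2):
    """Flag a visit if it names Zika directly, or pairs a symptom cluster with travel."""
    if ZIKA_TERMS.search(text):
        return True
    matched = [name for name, pattern in SYMPTOMS.items() if pattern.search(text)]
    return len(matched) >= min_symptoms and bool(TRAVEL.search(text))
```

The function would be applied to the concatenated chief complaint, discharge diagnosis, and triage notes for each visit. Keyword matching does not understand negation (for example, "no travel" still matches the travel pattern), which is one reason flagged visits still need daily human review.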

A total of 19 Zika cases (10 in 2016, 7 in 2017, and 2 in 2018) were identified by using ESSENCE-FL. These visits were not reported to public health by using traditional reporting mechanisms and would not have been identified without active case finding using ESSENCE-FL by the public health agency. These identifications were completed by using an existing surveillance system and helped to reduce the probability of introducing locally spread Zika in Florida.

Building New Surveillance Systems Versus Modifying Existing Systems

There is a danger that data management in the context of a field investigation can create more, rather than better, data systems. Condition-specific, event-specific, or stand-alone systems that are not integrated or interoperable require burdensome, post hoc coordination that is difficult and time-consuming, if not impossible.

Rather than setting up new stand-alone systems,

  • Work to modify existing systems. Making an urgent system modification is typical, and modifying systems often is more sustainable than designing and developing separate, nonintegrated data management approaches.
  • Consider stand-alone systems only when no other options are available. If used, immediately implement a plan to retrieve and share the data with other systems.
  • Look for opportunities in which the event response can help catalyze surveillance system modifications that will strengthen future surveillance activities.

With broad implementation of EHRs, opportunities exist for improving links between healthcare providers and public health departments, making data collection during field investigations more effective and timely (11). Increasingly, public health agencies have been able to establish agreements with healthcare facilities, often at the local level, to support remote access to EHRs for day-to-day surveillance activities. With such access to EHRs, staff can review medical records remotely to gather additional clinical, exposure, or demographic data about a case whose case report has been received through other channels.

Routine use of such access by local or state health department staff before an event can reduce public health learning curves when EHRs need to be accessed during a response event. Even without routine access, field investigators have been able to get time-limited system-specific EHR access during such response events, as happened during the response to the multistate outbreak of fungal meningitis in 2012 (Box 5.5). This benefitted the outbreak team as they conducted active case finding, completed case abstraction after case identification, and characterized the cases. Medical records abstraction can be done remotely by technical experts who are not on the deployed field team. Familiarity with EHR systems and direct contact with vendors can be helpful. Healthcare provider office staff might be knowledgeable about conducting record-level retrieval in the EHR product, but they might be less skilled at producing system extracts or querying across records (e.g., all persons receiving a specific procedure during a specific time frame) in ways that clinical users of the EHR have little occasion to do.

On September 18, 2012, a clinician alerted the Tennessee Department of Health about a patient with culture-confirmed Aspergillus fumigatus meningitis diagnosed after epidural steroid injection. This case was the first in a multistate outbreak of fungal infections linked to methylprednisolone acetate injections produced by the New England Compounding Center (Framingham, MA) (12). Three lots of methylprednisolone acetate distributed to 75 medical facilities in 23 states were implicated. Medical record abstraction is a common practice during outbreak investigations, but it typically requires on-site abstraction. The Tennessee Department of Health used remote desktop access to electronic health records (EHRs) to review data regarding known affected patients and identify the background rate of adverse events from the procedures of concern. Remote EHR access enabled abstraction of past, current, and follow-up visits and review of medical histories, clinical course of the disease, laboratory test results, imaging results, and treatment data. This was critically important to inform the real-time development and dissemination of CDC guidelines for patient care that evolved with the constantly changing clinical manifestations. Remote EHR access saved health department and facility staff time, enabled staff to return to their offices to complete case ascertainment, and supported multiple highly skilled staff working simultaneously. Assistance from facility information technology staff was needed in certain instances to obtain remote desktop access and provide guidance on using the EHR.

During the investigation, public health authorities needed a substantial amount of information quickly on an ongoing basis and from multiple, disparate institutions, and traveling to obtain the information was impractical. To remedy the challenges of accessing EHRs remotely, areas for improvement include better understanding of privacy policies, increased capability for data sharing, and links between jurisdictions to alleviate data entry duplication.

When data to support an event response might be in an EHR, field teams should

  • Use on-site time to establish necessary relationships and agreements to support remote or desk EHR access
  • Have a low threshold for requesting remote EHR access
  • Elevate resolution of any barriers to EHR access that are encountered to jurisdictional leadership and request assistance from privacy and legal teams (see also Chapter 13)
  • Expand the response team to include experts in medical data abstraction who can support the response remotely
  • Contact EHR vendors or use health department surveillance and informatics staff to facilitate coordination with vendors and to help with gaining remote access or performing data extractions or queries across records; and
  • Ensure quality EHR data integration into existing surveillance systems.

Using EHRs is new to some public health workers and can present challenges. For example, public health users require time to learn how to access, connect, and navigate systems. Where in the EHR the needed data are stored depends in part on how healthcare facilities use their EHRs; for example, data ideally stored as coded elements or in available system-designated fields might instead be located in free-text boxes. The more system users exist, the more likely the same data element is recorded in different ways or in different places. Data important to the response might even be stored on paper outside the EHR system. Ideally, public health personnel have access to an institution's entire EHR system, but some facilities still require that those personnel request specific records, to whom the facility assigns specific record access; the latter approach slows the process. The benefits of timely data and data access have proved to be worth the effort to overcome these challenges.

Improving Analysis, Visualization, and Reporting

During outbreaks and response events in recent years, demand has increased for rapid turnaround of easily consumable information. This demand is in part driven by cultural changes and expectations, where people now have powerful computers in their pockets (smartphones) and easy access to social media, the Internet, and 24-hour news cycles. The field team must meaningfully summarize the data and produce reports rapidly, turning collected data into information useful for driving public health action.

Regardless of collection method, after data are digitized, analytic and statistical software can be used to manipulate the data set in multiple ways to answer diverse questions. Additionally, advanced analytic software enables use of other types of data (e.g., electronic real-time data about air or water quality or data acquisition or remote sensing systems, such as continual or automated collection and transmission). Combining these data with geographic information system data can facilitate overlay of environmental and person-centric information by time and place (11).

The following principles apply to facilitating effective analysis and visualization:

  • Data must be easily exportable to other systems for analysis; often this process can be automated. Even when data collection occurs in one primary database, completing data analyses may require use of other, more sophisticated tools or merger of outbreak data with data from other sources.
  • Establish a report schedule (e.g., every day at 9 am) early during the investigation. In larger outbreaks for which data input is managed in multiple or disparate locations, communicating explicitly when data should be entered or updated in the system and what time the daily report will be run is imperative for ensuring that the most up-to-date information is available for analysis. Reports summarizing the cumulative information known, as well as daily or even twice-daily data summaries (i.e., situation reports) (Handout 5.1), might be necessary.
  • Use software to automate report production to run at specific times. This function is useful during larger events where situation reports might be needed multiple times each day.
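
Automated report production of the kind described in the list above can be sketched as a function that turns exported case records into summary points. The record fields (`reported_at`, `pregnant`, `hospitalized`) and the wording of the output lines are hypothetical; the scheduling itself (e.g., a daily 9 am run) would be handled by whatever job scheduler the agency already uses.

```python
from datetime import datetime, timedelta


def summarize_cases(cases, as_of):
    """Count cumulative cases and cases newly reported in the previous 24 hours."""
    cutoff = as_of - timedelta(hours=24)
    reported = [c for c in cases if c["reported_at"] <= as_of]
    return {
        "total": len(reported),
        "new": sum(1 for c in reported if c["reported_at"] > cutoff),
        "pregnant": sum(1 for c in reported if c.get("pregnant")),
        "hospitalized": sum(1 for c in reported if c.get("hospitalized")),
    }


def format_report(counts, as_of):
    """Render the counts as plain-text summary points for a situation report."""
    stamp = as_of.strftime("%H:%M on %m/%d/%y")
    return "\n".join([
        f"- {counts['total']} cases have been reported as of {stamp}",
        f"- {counts['new']} cases are new in the previous 24 hours",
        f"- {counts['pregnant']} cases have been in pregnant women",
        f"- {counts['hospitalized']} cases have been hospitalized during their illness",
    ])
```

Passing the report time in as `as_of`, rather than reading the clock inside the function, makes the report reproducible: rerunning it later with the same cutoff gives the same numbers, provided records carry the date-time stamps at which they were entered or changed.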

Transitioning from Field Investigations to Ongoing Surveillance

New systems or processes at the local, state, and federal levels often have been developed for supporting outbreak responses. Because of time and resource constraints in outbreak settings, surveillance systems or processes initiated during outbreaks can partially duplicate other processes. They may be time-consuming or staff-intensive in ways that are acceptable during the response but not as part of a routine system and may present integration problems when the outbreak is over. To minimize this potential, field investigators should ensure that processes for reviewing data collection are strictly followed throughout the outbreak. Field investigators should begin transition planning for sustainability with the goal of transitioning as soon as possible to existing mechanisms, keeping in mind related data collection activities that may be needed in future, long-term records management and storage, and continued analyses.

Determining Security, Standards, and Database Backups

Data security is paramount in any use of technology in a field response. Computers, tablets, and other mobile devices taken into the field must be protected against data loss and unauthorized access. Determinations must be made regarding what types of equipment can interact with the public health agency's internal network. Storing confidential data on a local machine should be discouraged; if it is unavoidable, address the need early and through the public health agency's privacy and security standards (see also Chapter 13).

Before data collection or device selection,

  • Understand at a high level the public health agency's privacy and security standards;
  • Assess whether data security in the field meets the jurisdiction's standards, which might require meeting with the health department's IT director;
  • Determine how data collection (and mobile or off-site data collection) will interact with potential firewall and network problems;
  • For field deployments where Internet connections or department of health network availability might not be consistently accessible (as was the situation during deployments after Hurricane Irma struck Florida in 2017 and during the Zika response in Miami-Dade County where door-to-door surveying was done inside large apartment buildings), ensure strict security standards are followed; and
  • Implement effective database management and rigor by establishing regular and automated backup procedures.
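
As one illustration of a regular, automated backup procedure, the sketch below copies a database to a timestamped file. It assumes, purely for the example, that the field database is a single SQLite file; the function name and file-naming scheme are hypothetical, and any backup location would still need to meet the agency's privacy and security standards.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir, now=None):
    """Copy a SQLite database to a timestamped file and return the new path."""
    now = now or datetime.now()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{Path(db_path).stem}_{now:%Y%m%d_%H%M%S}.sqlite"
    source = sqlite3.connect(str(db_path))
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup (Python 3.7+) copies the database page by page,
        # so it is safe to run while the source is in use.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

A scheduler (cron, Windows Task Scheduler, or similar) could call `backup_database` at fixed intervals; timestamped file names keep earlier backups from being overwritten.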

A situation report summarizing epidemiologic activities was produced daily by the Florida Department of Health and provided to the incident commander for decision-making and resource prioritization purposes. Data were extracted from the state reportable disease surveillance system into analytic software, where the report production was automated.

Zika Fever Summary Points

- 112 Zika fever cases have been reported in Florida as of 2:30 pm on 05/12/16
- No cases are new since 2:30 pm on 05/11/16
- 7 cases have been in pregnant women
- 0 cases were acquired in Florida
- 13 cases have been hospitalized during their illness
- 4 cases are currently ill
- 0 cases have been associated with microcephaly, fetal intracranial calcifications, or poor fetal outcomes (after 1st trimester)
- 1 case has been associated with Guillain-Barré syndrome
- 1117 people have been tested by the Bureau of Public Health Laboratories for Zika virus (634 were pregnant women)

Visualization tools


Views are map settings that can be saved to and retrieved from any geodatabase. They allow you to manage the properties of a data frame and layers so they can be reapplied at any time while allowing you to decide which current map properties to keep and which to override.

The Production Symbology toolbar has tools you can use to create and manage views.

Visual specifications

  • Calculated representations—Symbols used to represent features
  • Calculated fields—Text strings used to display map text

Calculated refers to the use of combinations of fields and SQL where clauses to group features for symbolization. You can also include attributes from other feature classes and tables. With this functionality, you can create highly specific symbology.

Visual specifications are stored in a geodatabase. This fosters reuse across your organization.

Layout elements

Layout elements are surround elements in a page layout. These can include scale bars, north arrows, legends, logos, text, graphics, and data frames. Production Mapping also includes the graphic table element, an element used to create different types of tables. Production Mapping provides the capability to store these layout elements in a geodatabase for distribution, management, and reuse in map production cycles.

Layout window

You can use the Layout window to manage elements in the page layout. This window has a toolbar, an element list, and element searching capabilities. The toolbar exposes Production Mapping functionality such as the Measure Layout tool, and Layout and Data Frame Rules. The element list displays all surround elements on the current page layout. Right-clicking an element in this list displays the Layout window context menu.

The Layout window is a dockable window opened by the Layout window tool on the Production Cartography toolbar. When opened, it displays a list of all elements in a page layout.

Database elements

Database elements are page layout elements that can be stored in any geodatabase. This facilitates element reuse and standardization across multiple maps and charts.

Database elements include any element used in a map's page layout. These can be surround elements, such as scale bars, north arrows, and legends, or they can be logos, text, or other graphics that are created for specific purposes in a map. Layout elements can also be data frames.

The Save Element dialog box allows you to store an element in a geodatabase.

The Database Element dialog box allows you to manage elements stored in a geodatabase and insert them into a page layout. You can access this dialog box from the ArcMap main menu by clicking Insert > Database Element.

Graphic table elements

The graphic table element is a generic surround element that is used to create different types of tables that can appear on a map. It is accessed from the Insert menu. The tables that are created can be used inside a Data Driven Pages map document, an Atlas map series, or a standard map document. This tool is only available in layout view.

Signed exchanges are now supported in Google Search for any content

Today we also want to announce that signed exchanges (SXG) in Google Search are now available for all web pages. Previously, this capability was supported only for pages built with AMP.

SXG lets Google Search apply privacy-preserving prefetching in compatible browsers and makes browsing sites more comfortable. With SXG, Google Search can load key resources (HTML, JavaScript, CSS) before navigation, so that pages display faster in the browser.

Nikkei, a large Japanese company, tested signed exchanges on its Nikkei Style site. Largest Contentful Paint (LCP) improved by 300 ms. In addition, user engagement rose by 12%, and page views per session in Chrome for Android rose by 9%. To implement SXG, Nikkei chose nginx-sxg-module, an open source extension for nginx servers.

See the additional information on how SXG works. Instructions for setting up signed exchanges are available on this page.


Agnovi Corporation | Investigative Case Management and Criminal Intelligence Database Software. Agnovi is a leading provider of case management software solutions for all levels of law enforcement & investigative agencies. View our solutions now!


Law Enforcement and Public Safety Software | eFORCE Software. Optimize processes and work more efficiently using secure law enforcement and public safety software solutions. eFORCE Software has the tools you need.


Numerica Corporation. Our world-class scientists, engineers and software architects deliver state-of-the-art technology solutions to our government and industry customers. We provide new levels of actionable information to decision makers, and our advanced algorithms and software are deployed in systems around the world.


SmartDraw is the Smartest Way to Draw Anything. See why SmartDraw is the smartest way to draw any type of chart, diagram, or floor plan.


Law enforcement software | Diverse Computing DCI | eAgent. DCI provides the innovative line of eAgent law enforcement software solutions, NCIC & Nlets Access, Advanced Authentication, and CJIS Compliance Consulting.


Public Safety Software Solutions | Spillman Technologies. Spillman Technologies has created software for safer communities for 30+ years. Learn more about our public safety and private security software solutions.


FormDocs Electronic Forms Software. The number one forms-management software for your PC or network. Design, fill-in, e-sign, e-mail, share, search. Free trial.

Computer Information Systems. Computer Information Systems’ mission is to develop, market, deliver and maintain seamlessly integrated, mission critical public safety software systems that will be continually enhanced to keep current with emerging Windows Technology and protect the agency’s investment.


Digital Design Group, Inc. False Alarm Reduction Software.


The Leader in SOC and CSIRT Orchestration and Automation | DFLabs. DFLabs is an ISO 9001 certified Company, recognized as a global leader in cyber incident response automation and orchestration. IncMan – Cyber Incidents Under Control – is the flagship product, adopted by Fortune 500 and Global 2000 organizations worldwide. DFLabs has operations in Europe, North America, Middle East, and Asia with US headquarters in Boston, MA and World headquarters in Milano, Italy.


SysTools Official Website – Software for Data Recovery, Forensics, Migration & Backup. SysTools – Simplifying Technology – Trusted by Millions of users for 160+ products in the range of Data Recovery, Digital Forensics Freeware & Cloud Backup.


R.S. Technologies Police Report Writing Software.


Valor IMS. Valor Systems, Inc. is a privately held software development company dedicated to providing emergency response and records management applications to Public Safety Agencies and Corporate Security operations on a global scale. Corporate headquarters is located at 50 S. Main Street, Suite 200, Naperville, IL 60540.


Intergraph Corporation | Process, Power and Marine | Hexagon Safety & Infrastructure. Intergraph is the leading global provider of engineering and geospatial software that enables customers to build and operate more efficient plants and ships, create intelligent maps, and protect critical infrastructure and millions of people around the world. Intergraph is part of Hexagon.


Employee Scheduling Software | Mobile Scheduling Software – VCS Software. VCS Software’s Scheduling and Time & Attendance solutions revolutionize workforce management for every business, organization, and government facility.

Urban Remote Sensing

Driven by advances in technology and societal needs, the next frontier in remote sensing is urban areas. With the advent of high-resolution imagery and more capable techniques, the question has become "Now that we have the technology, how do we use it?" The need for a definitive resource that explores the technology of remote sensing and the issues it can resolve in an urban setting has never been more acute.

Containing contributions from world renowned experts, Urban Remote Sensing provides a review of basic concepts, methodologies, and case studies. Each chapter demonstrates how to apply up-to-date techniques to the problems identified and how to analyze research results.

  • Focuses on data, sensors, and systems considerations as well as algorithms for urban feature extraction
  • Analyzes urban landscapes in terms of composition and structure, especially using sub-pixel analysis techniques
  • Presents methods for monitoring, analyzing, and modeling urban growth
  • Illustrates various approaches to urban planning and socio-economic applications of urban remote sensing
  • Assesses the progress made to date, identifies the existing problems and challenges, and demonstrates new developments and trends in urban remote sensing

This book is ideal for upper-division undergraduate and graduate students; however, it can also serve as a reference for researchers and others in academia and the governmental and commercial sectors who are interested in the remote sensing of cities. Urban Remote Sensing examines how to apply remote sensing technology to urban and suburban areas.

Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years

The preservation of digital objects involves a variety of challenges, including policy questions, institutional roles and relationships, legal issues, intellectual property rights, and metadata. But behind or perhaps beneath such issues, there are substantial challenges at the empirical level. What does it mean to preserve digital objects? What purposes are served by preserving them? What are the real possibilities for successful preservation? What are the problems encountered in trying to exploit these possibilities? Can we articulate a framework or an overall architecture for digital preservation that allows us to discriminate and select possibilities?

To address any of these challenges, we must first answer the simple question: What are digital objects? We could try to answer this question by examining the types of digital objects that have been and are being created. Many types of digital information can and do exist in other forms. In fact, many types of digital information are rather straightforward transcriptions of traditional documents, such as books, reports, correspondence, and lists. Other types of digital information are variations of traditional forms. But many forms of digital information cannot be expressed in traditional hard-copy or analog media: for example, interactive Web pages, geographic information systems, and virtual reality models. One benefit of an extensive review of the variety of types of digital information is that it forces one to come to grips with this variety, which is growing both in terms of the number of types of digital objects and in terms of their complexity. In fact, the diversity of digital information exists not only among types but also within types. Consider one application class, documents. There is no single definition or model of a digital document that would be valid in all cases. Information technologists model digital documents in very different ways: a digital document can be a sequence of expressions in natural language characters or a sequence of scanned page images, a directed graph whose nodes are pages, what appears in a Web page, and so on. How documents are managed, and therefore how they are preserved, depend on the model that is applied.

The variety and complexity of digital information objects engender a basic criterion for evaluating possible digital preservation methods, namely, they must address this variety and complexity. Does that necessarily mean that we must preserve the variety and complexity? It is tempting to respond that the variety and complexity must indeed be preserved because if we change the characteristics of digital objects we are obviously not preserving them. However, that response is simplistic. For example, in support of the argument that emulation is the best method for digital preservation (because it allows us to keep digital objects in their original digital formats), the example of the periodic table of the elements has been offered. The information conveyed by the periodic table depends on the spatial layout of the data contained in it. The layout can be corrupted or obliterated by using the wrong software, or even by changing the font. However, to argue that any software or digital format is necessary to preserve the periodic table is patently absurd. The periodic table was created a century before computers, and it has survived very well in analog form. Thus we cannot say without qualification that the variety and complexity of digital objects must always be preserved. In cases such as that of the periodic table, it is the essential character of the information object, not the way it happens to be encoded digitally, that must be preserved. For objects such as the periodic table, one essential characteristic is the arrangement of the content in a two-dimensional grid. As long as we preserve that structure, we can use a variety of digital fonts and type sizes, or no fonts at all, as in the case of ASCII or a page-image format.

We can generalize this insight and assert that the preservation of a digital information object does not necessarily entail maintaining all of its digital attributes. In fact, it is common to change digital attributes substantially to ensure that the essential attributes of an information object are preserved when the object is transmitted to different platforms. For example, to ensure that written documents retain their original appearance, authors translate them from the word processing format in which they were created to Adobe’s PDF format. Fundamentally, the transmission of information objects across technological boundaries (such as platforms, operating systems, and applications) is the same, whether the boundaries exist in space or time.

Are there basic or generic properties that are true of all digital objects? From a survey of types such as those just described, one could derive an intensive definition of digital objects: a digital object is an information object, of any type of information or any format, that is expressed in digital form. That definition may appear too generic to be of any use in addressing the challenge of digital preservation. But if we examine what it means for information to be expressed in digital form, we quickly come to recognize a basic characteristic of digital objects that has important consequences for their preservation. All digital objects are entities with multiple inheritance; that is, the properties of any digital object are inherited from three classes. Every digital object is a physical object, a logical object, and a conceptual object, and its properties at each of those levels can be significantly different. A physical object is simply an inscription of signs on some physical medium. A logical object is an object that is recognized and processed by software. The conceptual object is the object as it is recognized and understood by a person, or in some cases recognized and processed by a computer application capable of executing business transactions.
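The three levels can be made concrete with a minimal sketch. It models them by composition rather than literal class inheritance, and every class, field, and value below is hypothetical, chosen only to mirror the physical, logical, and conceptual levels just described.

```python
from dataclasses import dataclass


@dataclass
class PhysicalObject:
    """Signs inscribed on a medium; the bits carry no meaning at this level."""
    medium: str
    bits: bytes


@dataclass
class LogicalObject:
    """A unit that software can process because its data type is known."""
    data_type: str              # here simply a text encoding name
    stored_as: PhysicalObject

    def process(self) -> str:
        # Interpret the stored bits according to the declared data type.
        return self.stored_as.bits.decode(self.data_type)


@dataclass
class ConceptualObject:
    """What a person recognizes: a titled document with readable content."""
    title: str
    represented_by: LogicalObject

    def render(self) -> str:
        return f"{self.title}: {self.represented_by.process()}"


disk_file = PhysicalObject(medium="magnetic disk", bits=b"Dear colleague")
letter = ConceptualObject("Letter", LogicalObject("ascii", disk_file))
print(letter.render())
```

The same `ConceptualObject` could be backed by a different encoding or medium without changing what `render()` shows, which is the point the following sections develop.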

Physical Objects: Signs Inscribed on a Medium

As a physical object, a digital object is simply an inscription of signs on a medium. Conventions define the interface between a system of signs, that is, a way of representing data, and the physical medium suitable for storing binary inscriptions. Those conventions vary with the physical medium: there are obvious physical differences between recording on magnetic disks and on optical disks. The conventions for recording digital data also vary within media types; for example, data can be recorded on magnetic tape with different densities, different block sizes, and a different orientation with respect to the length and width of the tape.

Basically, the physical level deals with physical files that are identified and managed by some storage system. The physical inscription is independent of the meaning of the inscribed bits. At the level of physical storage, the computer system does not know what the bits mean, that is, whether they comprise a natural language document, a photograph, or anything else. Physical inscription does not entail morphology, syntax, or semantics.

Concern for physical preservation often focuses on the fact that digital media are not durable over long periods of time (Task Force 1996). This problem can be addressed through copying digital information to new media, but that “solution” entails another type of problem: media refreshment or migration adds to the cost of digital preservation. However, this additional cost element may in fact reduce total costs. Thanks to the continuing operation of Moore’s law, digital storage densities increase while costs decrease. So, repeated copying of digital data to new media over time reduces per-unit costs. Historically, storage densities have doubled and costs decreased by half on a scale of approximately two years. At this rate, media migration can yield a net reduction, not an increase, in operational costs: twice the volume of data can be stored for half the cost (Moore et al. 2000). In this context, the durability of the medium is only one variable in the cost equation: the medium needs to be reliable only for the length of time that it is economically advantageous to keep the data on it. For example, if the medium is reliable for only three years, but storage costs can be reduced by 50 percent at the end of two years, then the medium is sufficiently durable in a preservation strategy that takes advantage of the decreasing costs by replacing media after two years.
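The cost argument can be checked with a short sketch. It assumes, as the paragraph does, that per-unit storage cost halves every two years; the data volume, starting price, six-year horizon, and function names are hypothetical, and the cost of performing the migration itself is deliberately left out.

```python
def unit_cost(initial_cost, years, halving_period=2.0):
    """Per-unit storage cost after `years`, assuming it halves every `halving_period` years."""
    return initial_cost * 0.5 ** (years / halving_period)


def total_storage_cost(volume, initial_cost, horizon_years, migrate_every):
    """Sum the yearly storage cost when media are replaced every `migrate_every` years.

    Between migrations the price paid stays at the level of the last purchase;
    each migration re-buys storage at the then-current unit price.
    """
    total = 0.0
    for year in range(horizon_years):
        last_purchase_year = (year // migrate_every) * migrate_every
        total += volume * unit_cost(initial_cost, last_purchase_year)
    return total


# Hypothetical figures: 100 units of data at 1.0 cost unit per data unit per year.
never_migrate = total_storage_cost(100, 1.0, horizon_years=6, migrate_every=6)
migrate_2y = total_storage_cost(100, 1.0, horizon_years=6, migrate_every=2)
print(never_migrate, migrate_2y)  # 600.0 350.0
```

Under these assumptions, migrating every two years costs 350 units over six years against 600 for staying on the original media, so migration pays for itself as long as the work of copying costs less than the difference.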

The physical preservation strategy must also include a reliable method for maintaining data integrity in storage and in any change to storage, including any updating of the storage system, moving data from inactive storage to a server or from a server to a client system, or delivering information to a customer over the Internet, as well as in any media migration or media refreshment.
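The text does not name a particular integrity method. One common approach, shown here purely as an assumption, is to record a cryptographic digest at ingest and recompute it after every copy or transfer. The function names and sample bytes are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that serves as the recorded fixity value."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded: str) -> bool:
    """True if the bytes still match the digest recorded at ingest."""
    return fingerprint(data) == recorded


original = b"stored inscription"
recorded = fingerprint(original)        # computed once, at ingest

migrated = bytes(original)              # a faithful copy to new media
corrupted = bytearray(original)
corrupted[0] ^= 0x01                    # flip a single bit during "transfer"

print(is_intact(migrated, recorded))          # True
print(is_intact(bytes(corrupted), recorded))  # False
```

The check works at the physical level only: it confirms that the bits are unchanged without knowing anything about what they mean.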

Obviously, we have to preserve digital objects as physical inscriptions, but that is insufficient.

Logical Objects: Processable Units

A digital information object is a logical object according to the logic of some application software. The rules that govern the logical object are independent of how the data are written on a physical medium. Whereas, at the storage level, the bits are insignificant (i.e., their interpretation is not defined), at the logical level the grammar is independent of physical inscription. Once data are read into memory, the type of medium and the way the data were inscribed on the medium are of no consequence. The rules that apply at the logical level determine how information is encoded in bits and how different encodings are translated to other formats; notably, how the input stream is transformed into the system’s memory and output for presentation.

A logical object is a unit recognized by some application software. This recognition is typically based on data type. A set of rules for digitally representing information defines a data type. A data type can be primitive, such as ASCII or integer numbers, or it can be composite, that is, a data type composed of other data types that themselves might be composite. The so-called “native formats” produced by desktop application software are composite data types that include ASCII and special codes related to the type of information objects the software produces; for example, font, indentation, and style codes for word processing files. A string of data that all conform to the same data type is a logical object. However, the converse is not necessarily true: logical objects may be composite, i.e., they may contain other logical objects.

The logical string must be stored in a physical object. It may be congruent with a physical object (for example, a word processing document may be stored as a single physical file that contains nothing but that document), but this is not necessarily the case. Composite logical objects are an obvious exception, but there are other exceptions as well. A large word processing document can be divided into subdocuments, with each subdocument, and another object that defines how the subdocuments should be combined, stored as separate physical files. For storage efficiency, many logical objects may be combined in a large physical file, such as a UNIX TAR file. Furthermore, the mapping of logical to physical objects can be changed with no significance at the logical level. Logical objects that had been stored as units within a composite logical object can be extracted and stored separately as distinct physical files, with only a link to those files remaining in the composite object. The way they are stored is irrelevant at the logical level, as long as the contained objects are in the appropriate places when the information is output. This requires that every logical object have its own persistent identifier, and that the location or locations where each object is stored be specified. More important, to preserve digital information as logical objects, we have to know the requirements for correct processing of each object’s data type and what software can perform correct processing.
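The remapping described above can be sketched with Python's standard `tarfile` module: several logical objects are combined into one physical TAR byte stream and later extracted again, keyed by identifier. The identifiers and payloads are hypothetical, and using the archive member name as the persistent identifier is an assumption made only for illustration.

```python
import io
import tarfile


def pack(objects: dict) -> bytes:
    """Combine many logical objects into one physical TAR byte stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for identifier, payload in objects.items():
            info = tarfile.TarInfo(name=identifier)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack(physical: bytes) -> dict:
    """Recover each logical object from the physical file by its identifier."""
    with tarfile.open(fileobj=io.BytesIO(physical), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
        }


# Hypothetical master document and two subdocuments.
logical_objects = {
    "report/master": b"include part-1, part-2",
    "report/part-1": b"Introduction",
    "report/part-2": b"Findings",
}

one_physical_file = pack(logical_objects)
recovered = unpack(one_physical_file)
print(recovered == logical_objects)  # True
```

Three logical objects become one physical object and then three again, and nothing at the logical level changes, provided each identifier still resolves to the right bytes.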

Conceptual Objects: What We Deal with in the Real World

The conceptual object is the object we deal with in the real world: it is an entity we would recognize as a meaningful unit of information, such as a book, a contract, a map, or a photograph. In the digital realm, a conceptual object may also be one recognized by a business application, that is, a computer application that executes business transactions. For example, when you withdraw money from an ATM machine, you conceive of the transaction as an event that puts money in your hands and simultaneously reduces the balance of your bank account by an equal amount. For this transaction to occur, the bank’s system that tracks your account also needs to recognize the withdrawal, because there is no human involved at that end. We could say that in such cases the business application is the surrogate or agent for the persons involved in the business transaction.

The properties of conceptual objects are those that are significant in the real world. A cash withdrawal has an account, an account owner, an amount, a date, and a bank. A report has an author, a title, an intended audience, and a defined subject and scope. A contract has provisions, contracting parties, and an effective date. The content and structure of a conceptual object must be contained somehow in the logical object or objects that represent that object in digital form. However, the same conceptual content can be represented in very different digital encodings, and the conceptual structure may differ substantially from the structure of the logical object. The content of a document, for example, may be encoded digitally as a page image or in a character-oriented word processing document. The conceptual structure of a report (e.g., title, author, date, introduction) may be reflected only in digital codes indicating differences in presentation features such as type size or underscoring, or they could be matched by markup tags that correspond to each of these elements. The term “unstructured data” is often used to characterize digital objects that do not contain defined structural codes or marks or that have structural indicators that do not correspond to the structure of the conceptual object.

Consider this paper. What you see is the conceptual object. Then consider the two images below. Each displays the hexadecimal values of the bytes that encode the beginning of the document. Neither looks like the conceptual object (the “real” document). Neither is the exact equivalent of the conceptual document. Both contain the title of the article, but otherwise they differ substantially. Thus, they are two different logical representations of the same conceptual object.

Fig. 1. Hexadecimal Dump of MS Word

Fig. 2. Hexadecimal Dump of PDF

Is there any sense in which we could say that one of these digital formats is the true or correct logical representation of the document? An objective test would be whether the digital format preserves the document exactly as created. The most basic criterion is whether the document that is produced when the digital file is processed by the right software is identical to the original. In fact, each of these encodings, when processed by software that recognizes its data type, will display or print the document in the format in which it was created. So if the requirement is to maintain the content, structure, and visual appearance of the original document, either digital format is suitable. The two images are of Microsoft Word and Adobe PDF versions of the document. Other variants, such as WordPerfect, HTML, and even a scanned image of the printed document, would also satisfy the test of outputting the correct content in the original format.
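The same point can be illustrated at a much smaller scale. The sketch below does not reproduce the Word or PDF dumps in the figures; as a stand-in, it encodes one hypothetical title in two text encodings (UTF-8 and UTF-16) and shows that the bytes differ while the decoded content is identical.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset-prefixed rows of hexadecimal values."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:08x}  {chunk.hex(' ')}")
    return "\n".join(rows)


title = "Digital Preservation"          # hypothetical conceptual content

utf8_bytes = title.encode("utf-8")      # one logical representation
utf16_bytes = title.encode("utf-16-le") # a second, different representation

print(hex_dump(utf8_bytes))
print()
print(hex_dump(utf16_bytes))

# Different bytes, same content once each is processed by the right decoder.
print(utf8_bytes != utf16_bytes)
print(utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le"))
```

Neither byte sequence is the "true" one; each passes the test of yielding the original content when processed by software that recognizes its data type.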

This example reveals two important aspects of digital objects, each of which has significant implications for their preservation. The first is that there can be different digital encodings of the same conceptual object and that different encodings can preserve the essential characteristics of the conceptual object. The second relates to the basic concept of digital preservation.

With respect to the first of these implications, the possibility of encoding the same conceptual object in a variety of digital formats that are equally suitable for preserving the conceptual object can be extended to more complex types of objects and even to cases where the conceptual object is not presented to a human but is found only at the interface of two business applications. Consider the example of the cash withdrawal from an ATM. The essential record of that transaction consists of information identifying the account from which the cash is withdrawn, the amount withdrawn, and the date and time of the transaction. For the transaction to be carried out, there must be an interface between the system that manages the ATM and the system that manages the account. The information about the transaction presented at the interface, in the format specified for that interface, is the conceptual object that corresponds to the withdrawal slip that would have been used to record the transaction between the account holder and a human teller. The two systems must share that interface object and, in any subsequent actions related to that withdrawal, must present the same information; however, there is no need for the two systems to use identical databases to store the information.

Before considering the implications for the nature of digital preservation, we should examine more fully the relationships among physical, logical, and conceptual objects.

Relationships: Where Things Get Interesting

The complex nature of a digital object having distinct physical, logical, and conceptual properties gives rise to some interesting considerations for digital preservation, especially in the relationships among the properties of any object at these three levels. The relationship between any two levels can be simple. It can be one-to-one; for example, a textual document saved as a Windows word processing file is a single object at all three levels. But a long textual report could be broken down into a master and three subdocuments in word processing format, leaving one conceptual object stored as four logical objects: a one-to-many relationship. If the word processing files relied on external font libraries, additional digital objects would be needed to reproduce the document. Initially, the master and subdocuments would probably be stored in as many physical files, but they might also be combined into a zip file or a Java ARchive (JAR) file. In this case, the relationship between conceptual and logical objects is one-to-many, and the relationship between logical and physical could be either one-to-one or many-to-one. To access the report, it would be necessary to recombine the master and subdocuments, but this amalgamation might occur only during processing and not affect the retention of the logical or physical objects.

Relationships may even be many-to-many. This often occurs in databases where the data supporting an application are commonly stored in multiple tables. Any form, report, or stored view defined in the application is a logical object that defines the content, structure, and perhaps the appearance of a class of conceptual objects, such as an order form or a monthly report. Each instance of such a conceptual object consists of a specific subset of data drawn from different tables, rows, and columns in the database, with the tables and columns specified by the form or report and the rows determined in the first instance by the case, entity, event, or other scope specified at the conceptual level, e.g., order number “x,” or monthly report for customer “y” or product “z.” In any instance, such as a given order, there is a one-to-many relationship between the conceptual and the logical levels, but the same set of logical objects (order form specification, tables) is used in every instance of an order, so the relationship between conceptual and logical objects is in fact many-to-many. In cases such as databases and geographic information systems, such relationships are based on the database model, but many-to-many relationships can also be established on an ad hoc basis, such as through hyperlinks to a set of Web pages or attachments to e-mail messages. Many-to-many relationships can also exist between logical and physical levels; for example, many e-mail messages may be stored in a single file, but attachments to messages might be stored in other files.
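The database case can be sketched with Python's built-in `sqlite3` module. The schema, sample rows, and the `render_order` helper are all hypothetical. The idea is that one stored query (the "order form" specification) and one set of tables serve every order, while each order is assembled from rows scattered across those tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders (number INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_lines (order_number INTEGER, product_id INTEGER, quantity INTEGER);

INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO products VALUES (1, 'Map'), (2, 'Atlas');
INSERT INTO orders VALUES (100, 1), (101, 2);
INSERT INTO order_lines VALUES (100, 1, 2), (100, 2, 1), (101, 1, 5);
""")

# The form specification: a logical object shared by every order instance.
ORDER_FORM = """
SELECT c.name, p.title, l.quantity
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_lines l ON l.order_number = o.number
JOIN products p ON p.id = l.product_id
WHERE o.number = ?
ORDER BY p.title
"""


def render_order(number):
    """Reconstitute one conceptual object (an order) from several tables."""
    return conn.execute(ORDER_FORM, (number,)).fetchall()


print(render_order(100))  # [('Ada', 'Atlas', 1), ('Ada', 'Map', 2)]
print(render_order(101))  # [('Grace', 'Map', 5)]
```

Neither order exists anywhere as a single stored unit; retrieving one requires only the order number and knowledge of the logical structure, not the physical location of any row.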

To preserve a digital object, the relationships between levels must be known or knowable. To retrieve a report stored as a master and several subdocuments, we must know that it is stored in this fashion and we must know the identities of all the logical components. To retrieve a specific order from a sales application, we do not need to know where all or any of the data for that order are stored in the database; we only need to know how to locate the relevant data, given the logical structure of the database.

We can generalize from these observations to state that, in order to preserve a digital object, we must be able to identify and retrieve all its digital components. The digital components of an object are the logical and physical objects that are necessary to reconstitute the conceptual object. These components are not necessarily limited to the objects that contain the contents of a document. Digital components may contain data necessary for the structure or presentation of the conceptual object. For example, font libraries for character-based documents and style sheets for HTML pages are necessary to preserve the appearance of the document. Report and form specifications in a database application are necessary to structure the content of documents.

In addition to identifying and retrieving the digital components, it is necessary to process them correctly. To access any digital document, stored bit sequences must be interpreted as logical objects and presented as conceptual objects. So digital preservation is not a simple process of preserving physical objects but one of preserving the ability to reproduce the objects. The process of digital preservation, then, is inseparable from accessing the object. You cannot prove that you have preserved the object until you have re-created it in some form that is appropriate for human use or for computer system applications.

To preserve a digital object, is it necessary to preserve its physical and logical components and their interrelationship, without any alteration? The answer, perhaps surprisingly, is no. It is possible to change the way a conceptual object is encoded in one or more logical objects and stored in one or more physical objects without having any negative impact on its preservation. For example, a textual report may contain a digital photograph. The photograph may have been captured initially as a JPEG file and included in the report only by means of a link inserted in the word processing file, pointing to the image file. However, the JPEG file could be embedded in the word processing file without altering the report as such. We have seen another example of this in the different formats that can be used to store and reproduce this article. In fact, it may be beneficial or even necessary to change logical or physical characteristics to preserve an object. Authors often transform documents that they create as word processing documents into PDF format to increase the likelihood that the documents will retain their original appearance and to prevent users from altering their contents. An even simpler case is that of media migration. Digital media become obsolete. Physical files must be migrated to new media; if not, they will become inaccessible and will eventually suffer from the physical deterioration of the older media. Migration changes the way the data are physically inscribed, and it may improve preservation because, for example, error detection and correction methods for physical inscription on digital media have improved over time.

Normally, we would say that changing something directly conflicts with preserving it. The possibility of preserving a digital object while changing its logical encoding or physical inscription appears paradoxical and is compounded by the fact that it may be beneficial or even necessary to make such changes. How can we determine what changes are permissible and what changes are most beneficial or necessary for preservation? Technology creates the possibilities for change, but it cannot determine what changes are permissible, beneficial, necessary, or harmful. To make such determinations, we have to consider the purpose of preservation.

The Ultimate Outcome: Authentic Preserved Documents

What is the goal of digital preservation? For archives, libraries, data centers, or any other organizations that need to preserve information objects over time, the ultimate outcome of the preservation process should be authentic preserved objects; that is, the outputs of a preservation process ought to be identical, in all essential respects, to what went into that process. The emphasis has to be on the identity, but the qualifier of “all essential respects” is important.

The ideal preservation system would be a neutral communications channel for transmitting information to the future. This channel should not corrupt or change the messages transmitted in any way. You could conceive of a digital preservation system as a black box into which you can put bit streams and from which you can withdraw them at any time in the future. If the system is trustworthy, any document or other digital object preserved in and retrieved from the system will be authentic. In abstract terms, we would like to be able to assert that, if $X_{t_0}$ was an object put into the box at time $t_0$, and $X_{t_n}$ is the same object retrieved from the box at a later time $t_n$, then $X_{t_n} = X_{t_0}$.

However, the analysis of the previous sections shows that this cannot be the case for digital objects. The process of preserving digital objects is fundamentally different from that of preserving physical objects such as traditional books or documents on paper. To access any digital object, we have to retrieve the stored data, reconstituting, if necessary, the logical components by extracting or combining the bit strings from physical files, reestablishing any relationships among logical components, interpreting any syntactic or presentation marks or codes, and outputting the object in a form appropriate for use by a person or a business application. Thus, it is impossible to preserve a digital document as a physical object. One can only preserve the ability to reproduce the document. Whatever exists in digital storage is not in the form that makes sense to a person or to a business application. The preservation of an information object in digital form is complete only when the object is successfully output. The real object is not so much retrieved as it is reproduced by processing the physical and logical components using software that recognizes and properly handles the files and data types (InterPARES Preservation Task Force 2001). So, the black box for digital preservation is not just a storage container: it includes a process for ingesting objects into storage and a process for retrieving them from storage and delivering them to customers. These processes, for digital objects, inevitably involve transformations; therefore, the equation $X_{t_n} = X_{t_0}$ cannot be true for digital objects.

In fact, it can be argued that practically, this equation is never absolutely true, even in the preservation of physical objects. Paper degrades, ink fades; even the Rosetta Stone is broken. Moreover, in most cases we are not able to assert with complete assurance that no substitution or alteration of the object has occurred over time. As Clifford Lynch has cogently argued, authentication of preserved objects is ultimately a matter of trust. There are ways to reduce the risk entailed by trusting someone, but ultimately, you need to trust some person, some organization, or some system or method that exercises control over the transmission of information over space, time, or technological boundaries. Even in the case of highly durable physical objects such as clay tablets, you have to trust that nobody substituted forgeries over time (Lynch 2000). So the equation for preservation needs to be reformulated as $X_{t_n} = X_{t_0} + \Delta(X)$, where $\Delta(X)$ is the net effect of changes in $X$ over time.

But can an object change and still remain authentic? Common sense suggests that something either is or is not authentic, but authenticity is not absolute. Jeff Rothenberg has argued that authenticity depends on use (Rothenberg 2000). More precisely, the criteria for authenticity depend on the intended use of the object. You can only say something is authentic with respect to some standard or criterion or model for what X is.

Consider the simple example shown in figure 3. It shows a letter, preserved in the National Archives, concerning the disposition of Thomas Jefferson’s papers as President of the United States (Jefferson 1801). Is this an authentic copy of Thomas Jefferson’s writing? To answer that question, we would compare it to other known cases of Thomas Jefferson’s handwriting. The criteria for authentication would relate to the visual appearance of the text. But what if, by “Jefferson’s writing,” we do not mean his handwriting but his thoughts? In that case, the handwriting becomes irrelevant: Jefferson’s secretary may have written the document, or it could even be a printed version. Conversely, a document known to be in Jefferson’s handwriting, but containing text he copied from a book, does not reveal his thoughts. Authenticating Jefferson’s writing in this sense relates to the content and style, not to the appearance of the text. So authenticating something as Jefferson’s writing depends on how we define that concept.

There are contexts in which the intended use of preserved information objects is well-known. For example, many corporations preserve records for very long times for the purpose of protecting their property rights. In such cases, the model or standard that governs the preservation process is that of a record that will withstand attacks on its reliability and authenticity in litigation. Institutions such as libraries and public archives, however, usually cannot prescribe or predict the uses that will be made of their holdings. Such institutions generally maintain their collections for access by anyone, for whatever reason. Where the intentions of users are not known in advance, one must take an “aboriginal” approach to authenticity; that is, one must assume that any valid intended use must be somehow consonant with the original nature and use of the object. Nonetheless, given that a digital information object is not something that is preserved as an inscription on a physical medium, but something that can only be constructed, or reconstructed, by using software to process stored inscriptions, it is necessary to have an explicit model or standard that is independent of the stored object and that provides a criterion, or at least a benchmark, for assessing the authenticity of the reconstructed object.

Ways to Go: Selecting Methods

What are the possibilities for preserving authentic digital information objects? Among these possibilities, how can we select the best option or options? Four criteria apply in all cases: any method chosen for preservation must be feasible, sustainable, practicable, and appropriate. Feasibility requires hardware and software capable of implementing the method. Sustainability means either that the method can be applied indefinitely into the future or that there are credible grounds for asserting that another path will offer a logical sequel to the method, should it cease being sustainable. The sustainability of any given method has internal and external components: internally, the method must be immune or isolated from the effects of technological obsolescence; externally, it must be capable of interfacing with other methods, such as for discovery and delivery, which will continue to change. Practicability requires that implementation be within reasonable limits of difficulty and expense. Appropriateness depends on the types of objects to be preserved and on the specific objectives of preservation. With respect to the types of objects to be preserved, we can define a spectrum of possibilities running from preserving technology itself to preserving objects that were produced using information technology (IT). Methods can be aligned across this spectrum because the appropriateness of any preservation method depends on the specific objectives for preservation in any given case. As discussed earlier, the purposes served by preservation can vary widely. Considering where different methods fall across this spectrum will provide a basis for evaluating their appropriateness for any given purpose.

To show the rationale of the spectrum, consider examples at each end. On the “preserve technology” end, one would place preserving artifacts of technology, such as computer games. Games are meant to be played. To play a computer game entails keeping the program that is needed to play the game operational or substituting an equivalent program, for example, through reverse engineering, if the original becomes obsolete. On the “preserve objects” end, one would place preserving digital photographs. What is most important is that a photograph present the same image 50 or 100 years from now as it does today. It does not really matter what happens to the bits in the background if the same image can be retrieved reliably. Conversely, if a digital photograph is stored in a physical file and that file is maintained perfectly intact, but it becomes impossible to output the original image in the future (for example, because a compression algorithm used to create the file was either lossy or lost), we would not say the photograph was preserved satisfactorily.

But these illustrations are not completely valid. Many computer games have no parallels in the analog world. Clearly they must be preserved as artifacts of IT. But there are many games now played on computers that existed long before computers were invented. The card game solitaire is one example. Obviously, it could be preserved without any computer. In fact, the most assured method for preserving solitaire probably would be simply to preserve the rules of the game, including the rules that define a deck of cards. So the most appropriate method for preserving a game depends on whether we consider it to be essentially an instance of a particular technology, where “game” is inseparable from “computer,” or a form of play according to specified rules; that is, a member of a class of objects whose essential characteristics are independent of the technology used to produce or implement them. We have to preserve a computer game in digital form only if there is some essential aspect of the digital form that cannot be materialized in any other form or if we wish to be able to display, and perhaps play, a specific version of the computer game.

The same analysis can be applied to digital photographs. With traditional photographs, one would say that altering the image that had been captured on film was contrary to preserving it. But there are several types of digital photographs where the possibilities of displaying different images of the same picture are valuable. For example, a traditional chest X-ray produced three pieces of film, and, therefore, three fixed images. But a computerized axial tomography (CAT) scan of the chest can produce scores of different images, making it a more flexible and incisive tool for diagnosis. How should CAT scans be preserved? It depends on our conception or model of what a CAT scan is. If we wanted to preserve the richest source of data about the state of a particular person’s body at a given time, we would have to preserve the CAT scan as an instance of a specific type of technology. But if we needed to preserve a record of the specific image that was the basis for a diagnosis or treatment decision, we would have to preserve it as a specific image whose visual appearance remains invariant over time. In the first case, we must preserve CAT scanning technology, or at least that portion of it necessary to produce different images from the stored bit file. It is at least worth considering, in the latter case, that the best preservation method, taking feasibility and sustainability into account, would be to output the image on archival quality photographic film.

Here, in the practical context of selecting preservation methods, we see the operational importance of the principle articulated in discussing the authenticity of preserved objects: we can determine what is needed for preservation only on the basis of a specific concept or definition of the essential characteristics of the object to be preserved. The intended use of the preserved objects is enabled by the articulation of the essential characteristics of those objects, and that articulation enables us not only to evaluate the appropriateness of specific preservation methods but also to determine how they should be applied in any case. Applying the criterion of appropriateness, we can align various preservation methods across the spectrum from “preserve technology” to “preserve objects.”

More than a Spectrum: A Two-Way Grid

For any institution that intends or needs to preserve digital information objects, selection of preservation methods involves another dimension: the range of applicability of the methods with respect to the quantity and variety of objects to be preserved. Preservation methods vary greatly in terms of their applicability. Some methods apply only to specific hardware or software platforms, others only to individual data types. Still others are very general, applicable to an open-ended variety and quantity of digital objects. The range of applicability is another basis for evaluating preservation methods. Organizations that need to preserve only a limited variety of objects can select methods that are optimal for those objects. In contrast, organizations responsible for preserving a wide variety must select methods with broad applicability. Combining the two discriminants of appropriateness for preservation objectives and range of applicability defines a two-dimensional grid in which we can place different preservation methods and enrich our ability to evaluate them.

Figure 4 shows this grid, with a number of different methods positioned in it. Two general remarks about the methods displayed in this grid are in order. On the one hand, the methods included in it do not include all those that have been proposed or tried for digital preservation. In particular, methods that focus on metadata are not included. Rather, the emphasis is on showing a variety of ways of overcoming technological obsolescence. Even here, the cases included are not exhaustive; they are only illustrative of the range of possibilities. On the other hand, some methods are included that have not been explicitly or prominently mentioned as preservation methods. There is a triple purpose for this. The first purpose is to show the robustness of the grid as a framework for characterizing and evaluating preservation methods. The second is to emphasize that those of us who are concerned with digital preservation need to be open to the possibilities that IT is constantly creating. The third purpose is to reflect the fact that, in the digital environment, preservation is not limited to transmitting digital information over time. The same factors are in play in transmitting digital information across boundaries in space, technology, and institutions. Therefore, methods developed to enable reliable and authentic transmission across one of these types of boundaries can be applicable across others (Thibodeau 1997).

Fig. 4. Digital Preservation Methods

Sorting IT Out

Discussions of digital preservation over the last several years have focused on two techniques: emulation and migration. Emulation strives to maintain the ability to execute the software needed to process data stored in its “original” encodings, whereas migration changes the encodings over time so that we can access the preserved objects using state-of-the-art software in the future. Taking a broader perspective, IT and computer science are offering an increasing variety of methods that might be useful for long-term preservation. These possibilities do not fit nicely into the simple bifurcation of emulation versus migration. We can position candidate methods across the preservation spectrum according to the following principles:

  • On the “preserve technology” end of the spectrum, methods that attempt to keep data in specific logical or physical formats and to use technology originally associated with those formats to access the data and reproduce the objects.
  • In the middle of the spectrum, methods that migrate data formats as technology changes, enabling use of state-of-the-art technology for discovery, access, and reproduction.
  • On the “preserve objects” end of the spectrum, methods that focus on preserving essential characteristics of objects that are defined explicitly and independently of specific hardware or software.

There are various ways one can go about all these options. For example, if we focus on the “preserve technology” end, we start with maintaining original technology, an approach that will work for some limited time. Even for preservation purposes, it can be argued that this approach is often the only one that can be used.

Preserving Technology: The Numbers Add Up, and Then Some

The starting point for all digital preservation is the technology and data formats used to create and store the objects. Digital information objects can be preserved using this “original” technology for 5 to 10 years, but eventually the hardware, software, and formats become obsolete. Trying to preserve specific hardware and software becomes increasingly difficult and expensive over time, with both factors compounded by the variety of artifacts that need to be preserved. Over the long term, keeping original technology is not practicable and may not be feasible.

Enter the Emulator

Various approaches can be used to simplify the problem while still keeping data in their original encodings. The best-known approach is emulation. Emulation uses a special type of software, called an emulator, to translate instructions from original software to execute on new platforms. The old software is said to run “in emulation” on newer platforms. This method attempts to simplify digital preservation by eliminating the need to keep old hardware working. Emulators could work at different levels. They could be designed to translate application software to run on new operating systems, or they could translate old operating system commands to run on new operating systems. The latter approach is simpler in that the former would require a different emulator for every application, and potentially for every version of an application, while the latter should enable all applications that run on a given version of an operating system to execute using the same emulator.

While proponents of emulation argue that it is better than migration because at every data migration there is a risk of change, emulation entails a form of migration. Emulators themselves become obsolete; therefore, it becomes necessary either to replace the old emulator with a new one or to create a new emulator that allows the old emulator to work on new platforms. In fact, if you get into an emulation strategy, you have bought into a migration strategy. Either strategy adds complexity over time.

Emulation is founded on the principle that all computers are Turing machines and that any command that can run on one Turing machine can run on any other Turing machine. There is, however, evidence that this principle breaks down at an empirical level. For example, basic differences such as different numbers of registers or different interrupt schemes make emulation unreliable, if not impossible (IEEE 2001).

Reincarnation for Old Machines

Another technique that keeps old software running takes the opposite approach from emulation: it relies on a special type of hardware, rather than software emulators. It does this by re-creating an old computer on a configurable chip. An entire computer system could be reincarnated by being programmed on a new, configurable chip. The configurable chip constitutes a single point of failure, but that can readily be offset. If the chip begins to fail or becomes obsolete, the old system could simply be programmed on a newer chip. Intuitively, configurable chips seem like a simpler approach than emulation.

Compound Disinterest

While emulation and configurable chips take opposite directions, they present some common problems. First, current technology is not perfect. There are anomalies and bugs. Any preservation strategy that relies on specific software is carrying all the problems associated with those products into the future. Not all these problems get fixed. For example, it is not always possible to figure out what causes a problem such as a general protection fault, because there are too many variables involved. Furthermore, fixes can increase the complexity of preservation strategies that rely on keeping old software running, because they increase the number of versions of software that are released. Logically, if the authenticity of digital information depends on preserving original data formats and using them with the original software, each format should be processed with the version of the software used to produce it.

Software defects aside, the combinatorics entailed by strategies that involve preserving ever-increasing varieties of data formats, application software, and operating systems are frightening. With new versions being released every 18 to 24 months, over 25 years or longer, one would need to support thousands of combinations of applications, utilities, operating systems, and formats.
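The arithmetic behind "thousands of combinations" can be sketched roughly as follows. The release cadence and the 25-year horizon come from the text; the number of applications and operating systems (5 and 3) are purely illustrative assumptions, and pairing every application version with every operating system version gives an upper bound rather than a count of combinations that actually occurred.

```python
import math

def versions_released(years: float, months_between_releases: float) -> int:
    """Versions of one product over a period, counting the initial release."""
    return math.floor(years * 12 / months_between_releases) + 1

def combinations_to_support(years: float, months_between_releases: float,
                            n_applications: int, n_operating_systems: int) -> int:
    """Upper bound: every application version paired with every OS version."""
    per_product = versions_released(years, months_between_releases)
    application_versions = n_applications * per_product
    os_versions = n_operating_systems * per_product
    return application_versions * os_versions

# Hypothetical holdings: 5 applications and 3 operating systems over 25 years.
for cadence_months in (18, 24):
    print(cadence_months,
          versions_released(25, cadence_months),
          combinations_to_support(25, cadence_months, 5, 3))
```

Even with these small assumed numbers the bound runs into the thousands, and it grows with the square of the number of versions per product, before utilities and distinct data formats are counted at all.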

The viability of these strategies gets much more complex when the focus shifts from a single system to combinations of systems, which is the norm today. Emulation and programmable chips might be viable strategies if all we had to cope with were the products of desktop PCs, but not in today’s world, where the objects to be preserved often involve a diverse palette of technologies, such as various client-server applications where the servers use different operating systems, distributed applications running on heterogeneous platforms, and virtual machines such as Java. Providing technical support for operations of such a daunting variety of makes, models, and versions may be neither feasible nor affordable, because you would have to get all these applications running in emulation at the same time.

Complexity also increases in the case of collections of documents accumulated over time. Most government records, for example, are accumulated over many years, often many decades. Following the most fundamental principles of archival science—respect for provenance and for original order—we cannot segregate records according to their digital formats. We must preserve and provide access to aggregates of records established by their creators. Under a strategy of preserving technology, doing research in such series would entail using all the different software products used to produce the records.

Even if it were technically and financially possible to keep the technologies operative, staffing a help desk to support end users is inconceivable, especially since most users in the future will never have encountered—not to mention learned how to use—most of the products they will need to access the preserved information objects. Even if it were possible to provide adequate support to a user perusing, for example, a single case file accumulated over 20 years, it is not obvious that this would be deemed an acceptable level of support, because it would cut users off from the possibility of using more advanced technologies for discovery, delivery, and analysis.

Scenarios pegged on preserving specific technology, maintaining the links between specific software and specific data formats, run counter to the major direction of information technology. E-commerce and e-government require that the information objects created and used in these activities be transportable among the parties involved, independent of the hardware and the software each party uses at any time. Neither e-commerce nor e-government would be possible if the necessary information had to be accessed in original formats using obsolete technologies. Preserve technology strategies will depend on niche technologies and cannot expect widespread support in the IT market.

In this approach, one also encounters some interesting issues of intellectual property rights—not only the usual issues of copyright but also the ownership that the software companies assert over their formats even when they do not own the content.

A View Toward Further Simplification

Various software-engineering methods provide simpler ways of keeping obsolete formats accessible by concentrating on specific requirements.

One such method focuses on documents, a class of objects in which the functionality that has to be preserved is simply the ability to present them visually on a screen or printed page. For such objects, the only specific software needed for preservation is software that reliably renders the content with its original look and feel. This approach is being used in the Victorian Electronic Records System (VERS) developed for the Public Record Office of the State of Victoria, Australia. The system stipulates converting documents created in desktop computing environments to Adobe’s PDF format. Instead of attempting to run versions of Acrobat Reader for PDF indefinitely into the future, the VERS project conducted an experiment to demonstrate that it is possible to construct a viewer from the published specifications of the PDF format. The VERS approach combines format migration, in that the various formats in which records are originally created must be translated to PDF, with software reengineering. Similar approaches could be applied to other data types whose essential functionality is presentation in page image.

Finding Virtue in Abstraction

Another application of software engineering involves developing virtual machines that can execute essential functions on a variety of platforms. The Java virtual machine is an example, although it was not developed for purposes of preservation. The virtual machine approach avoids the need for emulator software by providing required functionality in a virtual machine that, in principle, can be implemented on a great variety of computing platforms indefinitely into the future. Raymond Lorie of the IBM-Almaden Research Center has launched an effort to develop a Universal Virtual Computer (UVC) that would provide essential functionality for an unlimited variety of data types. Following this strategy, objects would be preserved in their original formats, along with the rules for encoding and decoding data in those formats. The rules are written in a machine language that is completely and unambiguously specified. The language is so simple that it can be interpreted to run on any computer in the future. When the UVC program executes, the preserved data are interpreted according to a logical schema for the appropriate data type and output, and each data element bears a semantic tag defined in the logical schema. This approach avoids much of the complexity of emulation and configurable chips, but there are some trade-offs. The UVC only provides a limited set of basic functions. It also sacrifices performance: software that can run on any platform is not optimized for any one of them (Lorie 2000).
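The idea of preserving data together with decoding rules written for a very simple machine can be illustrated with a toy interpreter. This is only a sketch of the general principle, not Lorie's UVC design: the two-instruction set (`UINT`, `TEXT`), the record layout, and the tag names are all hypothetical.

```python
def run_decoder(program, data: bytes):
    """Interpret a decoding program against raw bytes.

    Each instruction is (operation, byte_length, semantic_tag). The result
    is a list of (tag, value) pairs, so every element carries its tag.
    """
    position = 0
    output = []
    for operation, length, tag in program:
        chunk = data[position:position + length]
        if len(chunk) < length:
            raise ValueError(f"data ended while reading {tag!r}")
        position += length
        if operation == "UINT":      # big-endian unsigned integer
            output.append((tag, int.from_bytes(chunk, "big")))
        elif operation == "TEXT":    # plain ASCII text
            output.append((tag, chunk.decode("ascii")))
        else:
            raise ValueError(f"unknown instruction {operation!r}")
    return output

# Hypothetical preserved record and the decoding rules stored alongside it.
record = b"\x07\xd1" + b"\x00\x10" + b"Hello"
program = [("UINT", 2, "year"), ("UINT", 2, "width"), ("TEXT", 5, "title")]

print(run_decoder(program, record))
```

Only the small interpreter would need to be reimplemented on a future platform; the stored record and its decoding program stay untouched. The trade-off noted in the text shows up here as well: the machine offers very few functions and makes no attempt at speed.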

Accepting Change: Migration Strategies

In the middle of the spectrum fall data migration approaches that abandon the effort to keep old technology working or to create substitutes that emulate or imitate it. Instead, these approaches rely on changing the digital encoding of the objects to be preserved to make it possible to access those objects using state-of-the-art technology after the original hardware and software become obsolete. There are a variety of migration strategies.

Simple Version Migration

The most direct path for format migration, and one used very commonly, is simple version migration within the same family of products or data types. Successive versions of given formats, such as Corel WordPerfect’s WPD or Microsoft Excel’s XLS, define linear migration paths for files stored in those formats. Software vendors usually supply conversion routines that enable newer versions of their product to read older versions of the data format and save them in the current version.

Version migration sets up a chain that must be extended over time, because every format will eventually become obsolete. One problem with this approach is that using more recent versions of software, even with the original formats, may present the preserved documents with characteristics they did not, and perhaps could not, have had. For example, any document created with a word processor in the early 1990s, before “WYSIWYG” display was available, would have appeared on screen with a black background and green letters. If one were to open such a document with a word processor today, it would look much like a printed page.

Software vendors control this process of version migration. Their conversion utilities are designed to migrate data types and do not provide for explicit or specific control according to attributes defined at the conceptual level. Each successive migration will accumulate any alterations introduced previously. Another potential problem is that, over time, product lines, and with them the migration path, may be terminated.

Format Standardization

An alternative to the uncertainties of version migration is format standardization, whereby a variety of data types are transformed to a single, standard type. For example, a textual document, such as a WordPerfect document, could be reduced to plain ASCII. Obviously, there would be some loss if font, type size, and formatting were significant. But this conversion is eminently practicable, and it would be appropriate in cases where the essential characteristics to be preserved are the textual content and the grammatical structure. Where typeface and font attributes are important, richer formats, such as PDF or RTF, could be adopted as standards. The low common denominator provides a high guarantee that the format will be successful, at least for preserving appearance. For types of objects where visual presentation is essential, bit-mapped page images and hard copy might be acceptable: 100 years from now, IT systems will be able to read microfilm. In fact, according to companies such as Kodak and Fuji, it can be done today.
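What reduction to plain ASCII keeps and what it discards can be shown with a small sketch. It does not parse real WordPerfect files; the "rich" document here is a hypothetical list of (text, style) runs standing in for any formatted source, and accented characters are transliterated using Python's standard `unicodedata` module.

```python
import unicodedata

def to_plain_ascii(runs) -> str:
    """Flatten styled text runs to plain ASCII, discarding all formatting."""
    text = "".join(run_text for run_text, _style in runs)
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# Hypothetical formatted document: the styles are lost, the words survive.
document = [
    ("Résumé", {"bold": True, "size": 14}),
    (" of holdings", {"italic": True}),
]

print(to_plain_ascii(document))
```

The output keeps the textual content but none of the typeface, size, or emphasis information, which is acceptable only where those attributes are not among the essential characteristics to be preserved.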

For socioeconomic and other data sets created to enable a variety of analyses, the data structure can often be preserved in a canonical form, such as arrays or relational tables, independently of specific software. Such formats are either simple enough or so unambiguously defined that it is reasonable to assume that information systems in the future will be able to implement the structures and process the data appropriately.

In principle, the standard format should be a superclass of the original data types—one that embodies all essential attributes and methods of the original formats. This is not necessarily the case, so there may be significant changes in standardization, just as with version migration. Moreover, standards themselves evolve and become obsolete. So, except for the simplest formats, there is a likely need for repeated migrations from one standard format to another, with consequent accumulation of changes.

Typed Object Model Conversion

Another approach to migrating data formats into the future is Typed Object Model (TOM) Conversion. The TOM approach starts out with the recognition that all digital data are objects, that is, they have specified attributes, specified methods or operations, and specific semantics. All digital objects belong to one or another type of digital object, where “type” is defined by given values of attributes, methods, or semantics for that class of objects. A Microsoft Word 6 document, for example, is a type of digital object defined by its logical encoding. An e-mail is a type of digital object defined, at the conceptual and logical levels, by essential data elements, e.g., “To,” “From,” “Subject,” or “Date.”

Any digital object is a byte sequence and has a format, i.e., a specified encoding of that object for its type. Byte sequences can be converted from one format to another, as shown in the earlier example of this document encoded in Microsoft Word and PDF formats. But within that range of possible conversion, the essential properties of a type or class of objects define “respectful conversions,” that is, conversions whose results cannot be distinguished when viewed through an interface of that type. The content and appearance of the document in this example remain identical whether it is stored as a Word or PDF file; therefore, conversion between those two formats is respectful for classes of objects whose essential properties are content and appearance (Wing and Ockerbloom 2000). There is a TOM conversion service available online that is capable of doing respectful conversions of user-submitted files in some 200 formats.
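The notion of a respectful conversion can be made concrete with a small check: decode the object in both formats, look at each through the interface of its type, and compare. The sketch below is not the TOM software; the two formats (JSON and "key: value" lines), the function names, and the sample message are hypothetical, with the e-mail interface limited to the essential elements named above.

```python
import json

ESSENTIAL_EMAIL_FIELDS = ("To", "From", "Subject", "Date")

def decode_json(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))

def decode_lines(raw: bytes) -> dict:
    """Decode 'key: value' lines, a hypothetical second format."""
    fields = {}
    for line in raw.decode("utf-8").splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields

def email_view(fields: dict) -> tuple:
    """Interface of the e-mail type: only the essential data elements."""
    return tuple(fields.get(name) for name in ESSENTIAL_EMAIL_FIELDS)

def convert_json_to_lines(raw: bytes) -> bytes:
    fields = decode_json(raw)
    return "\n".join(f"{k}: {v}" for k, v in fields.items()).encode("utf-8")

def is_respectful(original, converted, decode_original, decode_converted, view) -> bool:
    """True if both encodings look the same through the type's interface."""
    return view(decode_original(original)) == view(decode_converted(converted))

original = json.dumps({
    "To": "a@example.org", "From": "b@example.org",
    "Subject": "Transfer", "Date": "2001-06-01",
}).encode("utf-8")
converted = convert_json_to_lines(original)
truncated = b"To: a@example.org"   # a conversion that lost three elements

print(is_respectful(original, converted, decode_json, decode_lines, email_view))
print(is_respectful(original, truncated, decode_json, decode_lines, email_view))
```

The byte sequences differ completely between the two formats, yet the first conversion is respectful for the e-mail type because the view is unchanged; the truncated version fails the same test.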

Rosetta Stones Translation

Another migration approach under development is called Rosetta Stones. Arcot Rajasekar of the San Diego Supercomputer Center is developing this approach. Like TOM, this approach starts with data types, but rather than articulating the essential properties of each type, it constructs a representative sample of objects of that type. It adds a parallel sample of the same objects in another, fully specified type, and retains both. For example, if one wanted to preserve textual documents that had been created in WordPerfect 6, one would create a sample of files in version 6 of the WPD format that embodies all the significant features of this format. Then one would duplicate each of the documents in this sample in another format that might be human-readable computer output microfilm (COM) or paper, because we know that we will always be able to read those human-readable versions. This second sample constitutes a reference set, like the Greek in the original Rosetta Stone. The triad of samples in the original data type, the reference set, and the target type constitutes a digital Rosetta Stone from which rules for translating from the original to the target encoding can be derived.

Given the reference sample—e.g., the printed version of documents—and the rules for encoding in a target format that is current at any time in the future, we can create a third version of the sample in the target format. By comparing the target sample with the original sample, we can deduce the rules for translating from the original to the target format and apply these rules to convert preserved documents from the original to the target format. This approach avoids the need for repeated migrations over time. Even though the target formats can be expected to become obsolete, migration to subsequent formats will be from the original format, not from the earlier migration. Important to the success of this approach is the ability to construct a parallel sample in a well-characterized and highly durable type. It is not evident that it will be possible to do this for all data types, especially more complex types that do not have analog equivalents, but research on this approach is relatively recent.
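Deriving translation rules by comparing parallel samples can be sketched in a few lines. This is a toy illustration under strong assumptions, not the San Diego Supercomputer Center implementation: documents are modeled as token lists, the old format's control codes (such as `\b1`) and the target markup (such as `<b>`) are invented, and the target sample is assumed to have been re-created from the human-readable reference set so that it aligns token for token with the original sample.

```python
def is_control_code(token: str) -> bool:
    """In this toy original format, control codes start with a backslash."""
    return token.startswith("\\")

def derive_rules(original_sample, target_sample) -> dict:
    """Compare aligned documents and collect code-to-code translation rules."""
    rules = {}
    for original_doc, target_doc in zip(original_sample, target_sample):
        if len(original_doc) != len(target_doc):
            raise ValueError("parallel documents must align token for token")
        for old, new in zip(original_doc, target_doc):
            if is_control_code(old):
                if rules.setdefault(old, new) != new:
                    raise ValueError(f"inconsistent translation for {old!r}")
            elif old != new:
                raise ValueError(f"content differs: {old!r} vs {new!r}")
    return rules

def translate(document, rules):
    """Apply derived rules to a preserved document in the original format."""
    return [rules[tok] if is_control_code(tok) else tok for tok in document]

original_sample = [[r"\b1", "Annual", r"\b0", "report"], [r"\i1", "Draft", r"\i0"]]
target_sample = [["<b>", "Annual", "</b>", "report"], ["<i>", "Draft", "</i>"]]

rules = derive_rules(original_sample, target_sample)
preserved = [r"\i1", "Memo", r"\i0", r"\b1", "urgent", r"\b0"]
print(translate(preserved, rules))
```

A control code that never appears in the sample raises a `KeyError` in `translate`, which mirrors the requirement that the sample embody all the significant features of the original format.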

Object Interchange Format

Another approach enables migration through an object interchange format defined at the conceptual level. This type of approach is being widely adopted for e-commerce and e-government where participants in a process or activity have their own internal systems, which cannot readily interact with systems in other organizations. Rather than trying to make the systems directly interoperable, developers are focusing on the information objects that need to be exchanged to do business or otherwise interact. These objects are formally specified according to essential characteristics at the conceptual level, and those specifications are articulated in logical models. The logical models or schemas define interchange formats. To collaborate or interact, the systems on each side of a transaction need to be able to export information in the interchange format and to import objects in this format from other systems. While it was designed for internal markup of documents, the XML family of standards has emerged as a major vehicle for exchange of digital information between and among different platforms.

A significant example of this approach concerns financial reports. There are several types of financial reports that essentially all corporations produce and share with their business partners and with government agencies around the world. The Extensible Business Reporting Language (XBRL) is an initiative to enable exchange of these reports, regardless of the characteristics of systems used either to produce or receive the reports. The initiative comprises major professional organizations of accountants from the United States, Canada, the United Kingdom, Australia, and several non-English-speaking countries, major accounting firms, major corporations, IT companies, and government agencies. XBRL defines a single XML schema that covers all standard financial reports. The schema defines an interchange format. Any system that can export and import data in that format can exchange financial reports and data with any other system with XBRL I/O capability, regardless of the hardware or software used in either case. At the logical level, the XBRL schema is impressively simple. That simplicity is enabled by an extensive ontology of accounting terms at the conceptual level. This approach is obviously driven by short-term business needs, but a method that allows reliable exchange of important financial data across heterogeneous computing platforms around the world can probably facilitate transmission of information over generations of technology. Given that XML schemas and tags are constructed using plain ASCII and can be interpreted by humans, it is likely that future computer systems will be able to process them correctly. Thus, the object interchange method can become a preservation method simply by retaining the objects in the interchange format and, on an as-needed basis, building interpreters to enable target systems in the future to import objects in such formats.

To some extent, object interchange formats have the same purpose as do samples in well-known data types in the Rosetta Stones method: they serve as a bridge between heterogeneous systems and data types. While the Rosetta Stones method is more generic, object interchange specifications have a significant advantage in that the essential properties of the objects are defined by experts who have substantial knowledge of their creation and use. Thus, unlike all the other approaches considered so far, object interchange formats embed domain knowledge in the transmission of information objects across space, time, and technologies. The object interchange model lies close to the “preserve objects” end of the preservation spectrum. It could be said to lie midway between specific and general in its applicability because it provides a single method that potentially could be applied to a great variety of objects and data types, but addresses only the persistence of content and form across technological boundaries.

Preserving Objects: Persistent Archives

A promising approach, persistent archives, has been articulated over the last four years, primarily at the San Diego Supercomputer Center in research sponsored by the Defense Advanced Research Projects Agency, the National Science Foundation, and the National Archives and Records Administration. It has many elements in common with other approaches described in this paper, but it is also markedly different than these other strategies. Like the UVC, it relies on a high level of abstraction to achieve very broad applicability. Like TOM and Rosetta Stones, it addresses the specific characteristics of logical data types. Like object interchange formats and the UVC, it tags objects to ensure the persistence of syntactic, semantic, and presentation elements. Like migration, it transforms the logical encoding of objects, but unlike migration, the transformations are controlled not by target encodings into which objects will be transformed but by the explicitly defined characteristics of the objects themselves. It implements a highly standardized approach, but unlike migration to standard format, it does not standardize on logical data types, but at a higher level of abstraction: on the method used to express important properties, such as context, structure, semantics, and presentation.

The most important difference between persistent archives and the other approaches described is that the former strategy is comprehensive. It is based on an information management architecture that not only addresses the problem of obsolescence but also provides the functionality required for long-term preservation, as stipulated in the OAIS standard. Furthermore, it provides a coherent means of addressing the physical, logical, and conceptual properties of the objects being preserved through the data, information, and knowledge levels of the architecture. Persistence is achieved through two basic routes: one involving the objects themselves, the other the architecture. Objects are preserved in persistent object format, which is relatively immune to the continuing evolution of IT. The architecture enables any component of hardware or software to be replaced with minimum impact on the archival system as a whole. The architecture is notional. It does not prescribe a specific implementation.

The cornerstone of the persistent archives approach is the articulation of the essential characteristics of the objects to be preserved (collections as well as individual objects) in a manner that is independent of any specific hardware or software. This articulation is expressed at the data level by tags that identify every byte sequence that must be controlled to ensure preservation. In effect, tags delimit atomic preservation units in physical storage. The granularity of these data units can vary greatly, depending on requirements articulated at the information and knowledge levels. Every tag is linked to one or more higher-level constructs, such as data models, data element definitions, document type definitions, and style sheets defined at the information level, and ontologies, taxonomies, thesauri, topic maps, rules, and textbooks at the knowledge level. In research tests on a wide variety of data types, conceptual objects, and collections, it has been shown that simple, persistent ASCII tags can be defined to identify, characterize, and control all data units. The research has shown that XML is currently the best method for tagging and articulating requirements at the information level and, to some extent, at the knowledge level. However, it would be wrong to conclude that persistent archives are based or dependent on XML. Rather, persistent archives currently use XML, but there is nothing in the architecture that would preclude using other implementation methods should they prove superior.
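As a rough illustration of tagging at the data level, the sketch below wraps each byte sequence of a fixed-width record in an ASCII tag and links the tag to an information-level definition. The record layout, tag names, and definitions are all hypothetical; they are not taken from the persistent archives research.

```python
import xml.etree.ElementTree as ET

# Information-level definitions (hypothetical): each tag names a data element.
DATA_ELEMENTS = {
    "date": "Calendar date, YYYYMMDD",
    "surname": "Family name, space padded",
    "amount": "Integer amount in cents, zero padded",
}

# Data-level layout (hypothetical): tag, byte offset, byte length.
LAYOUT = [("date", 0, 8), ("surname", 8, 10), ("amount", 18, 7)]


def covers_every_byte(total_length: int) -> bool:
    """Check that the tags account for every byte, with no gaps or overlaps."""
    position = 0
    for _tag, offset, length in LAYOUT:
        if offset != position:
            return False
        position += length
    return position == total_length


def tag_record(raw: bytes) -> ET.Element:
    """Delimit each data unit with a tag that points at its definition."""
    if not covers_every_byte(len(raw)):
        raise ValueError("layout does not account for every byte")
    record = ET.Element("record")
    for tag, offset, length in LAYOUT:
        unit = ET.SubElement(record, tag, definition=DATA_ELEMENTS[tag])
        unit.text = raw[offset:offset + length].decode("ascii")
    return record


raw = b"19990412" + b"SMITH".ljust(10) + b"0004200"
record = tag_record(raw)
print(ET.tostring(record, encoding="unicode"))
```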

The architecture is structured to execute the three basic processes required in the Open Archival Information System (OAIS) standard: ingest, for bringing objects into the system; management, for retaining them over time; and access, for disseminating them to consumers. In ingest, objects in obsolescent formats are transformed into persistent format, through parsing and tagging of data units as described earlier, or, if they are already in persistent format, by verifying that fact at the data, information, and knowledge levels. Over time, data units are maintained in storage, and the metadata and domain knowledge that are necessary to retrieve, use, and understand the data are maintained in models, dictionaries, and knowledge bases. When access to a preserved object is desired, the data are retrieved from storage and the object is materialized in a target technology current at the time. This materialization requires translating from the persistent form to the native form of the target technology. If the three basic processes are conceived as columns and the three levels (data, information, knowledge) as rows, the persistent archives architecture can be depicted in a 3-by-3 grid (Moore et al. 2000).
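The three processes can be sketched in a few lines. Everything here is an assumption made for illustration: the class and function names are hypothetical, the legacy format is a toy "key=value" record, and JSON text stands in for the persistent form purely to keep the example short.

```python
import json


class PersistentArchive:
    """Toy archive with ingest, management (storage), and access."""

    def __init__(self):
        self._storage = {}  # management: data units retained over time

    def ingest(self, object_id: str, legacy: str) -> None:
        """Parse a legacy 'key=value;key=value' record into tagged data units."""
        units = dict(pair.split("=", 1) for pair in legacy.split(";") if pair)
        self._storage[object_id] = json.dumps(units, sort_keys=True)

    def access(self, object_id: str, materialize):
        """Retrieve the data units and materialize them in a target technology."""
        units = json.loads(self._storage[object_id])
        return materialize(units)


def as_text(units: dict) -> str:
    """One possible target technology: plain text."""
    return "\n".join(f"{key}: {value}" for key, value in units.items())


def as_html(units: dict) -> str:
    """Another target technology, chosen at access time rather than at ingest."""
    rows = "".join(f"<tr><th>{k}</th><td>{v}</td></tr>" for k, v in units.items())
    return f"<table>{rows}</table>"


archive = PersistentArchive()
archive.ingest("report-1", "title=Annual Report;year=1999")
print(archive.access("report-1", as_text))
print(archive.access("report-1", as_html))
```

The stored form never changes when a new renderer is added, which is the sense in which materialization is deferred to a technology current at the time of access.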

The persistent archives architecture is independent of the technology infrastructure in which it is implemented at any time. It achieves this independence through loose coupling of its basic building blocks, using software mediators to link each pair of adjacent blocks. Interactions are between adjacent blocks vertically and horizontally, but not diagonally. Over time, as the components used to implement any block are updated, there is no need to change any of the other blocks, only the mediators.
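The coupling rule can be made concrete by enumerating the grid. In this sketch the block and mediator representations are hypothetical; it only shows which pairs of blocks would need a mediator if interactions are limited to vertical and horizontal neighbors.

```python
from itertools import product

PROCESSES = ("ingest", "management", "access")   # columns of the grid
LEVELS = ("data", "information", "knowledge")    # rows of the grid

# Each block is identified by its (level, process) position.
blocks = list(product(LEVELS, PROCESSES))


def adjacent(a, b) -> bool:
    """True when two blocks share an edge; diagonal neighbors do not count."""
    row_gap = abs(LEVELS.index(a[0]) - LEVELS.index(b[0]))
    col_gap = abs(PROCESSES.index(a[1]) - PROCESSES.index(b[1]))
    return row_gap + col_gap == 1


# One mediator per unordered pair of adjacent blocks.
mediators = {frozenset((a, b)) for a in blocks for b in blocks if adjacent(a, b)}


def mediators_touching(block):
    """Mediators that would need attention if this block were reimplemented."""
    return [m for m in mediators if block in m]


print(len(blocks), "blocks,", len(mediators), "mediators")
print(len(mediators_touching(("information", "management"))))
```

Replacing the implementation of one block touches at most four mediators (for the center block) and leaves the other eight blocks unchanged.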

Conclusion: The Open End

There is an inherent paradox in digital preservation. On the one hand, it aims to deliver the past to the future in an unaltered, authentic state. On the other hand, doing so inevitably requires some alteration. All the methods described in this paper entail altering or replacing hardware, software, or data, and sometimes more than one of these. This paradox is compounded by the fact that in the future, as today, people will want to use the best available technology (or at least technologies they know how to use) for discovery, retrieval, processing, and delivery of preserved information. There is a danger that, to the degree that preservation solutions keep things unaltered, they will create barriers to satisfying this basic user requirement. Adding to this the recognition that the problem of digital preservation is not static, that it will continue to evolve as information technology and its application in the production of valuable information change, reinforces the paradox to the point that any solution to the challenge of digital preservation must be inherently evolutionary. If the preservation solution cannot grow and adapt to continuing changes in the nature of the problem and continuing escalation of user demands, the “solution” itself will in short order become part of the problem; that is, it will itself become obsolete.

This paradox can be resolved only through the elaboration of a basic conceptual framework for digital preservation—a framework that allows us to identify and analyze all that is involved in the process of digital preservation and to understand how different facets of that process affect each other. Fortunately, such a framework has been articulated over the last few years and has become an international standard. It is the OAIS reference model. While the OAIS model was developed for the space science community, its articulation was, from the beginning, both international and multidisciplinary. As a result, the model has broad applicability. The OAIS model provides a frame of reference in which we can balance the need for preserving digital objects unaltered and the need to keep pace with changing IT, both to encompass new classes of digital objects and to capitalize on technological advances to improve preservation services (ISO 2002).

However, the OAIS model is too generalized to suffice for implementation. It needs to be refined and extended to be useful in specific domains. One example of such refinement has been articulated for the domain of records. The International Research on Permanent Authentic Records in Electronic Systems (InterPARES) project is a multinational, multidisciplinary research collaboration whose name reflects its basic objective. To fine-tune the OAIS framework for the specific goal of preserving authentic records, the InterPARES Project developed a formal Integrated DEFinition (IDEF) process model for what is required to preserve authentic digital records. This “Preserve Electronic Records” model retains the functions of an OAIS but adds specific archival requirements and archival knowledge. Archival requirements act as specific controls on the preservation process, and archival knowledge was the basis for further refinement of the preservation process. In turn, the process of developing the archival model led to advances in archival knowledge: specifically, to clarification of the characteristics of electronic records at the physical, logical, and conceptual levels, and to improvements in our understanding of what it means to preserve electronic records. The InterPARES Preserve Electronic Records model includes specific paths for accommodating new classes of electronic records over time and for taking advantage of improvements in IT (Preservation Task Force in press).

The InterPARES model illustrates how, starting from the OAIS reference model, one can construct an open-ended approach to digital preservation and effectively address the paradoxical challenge of digital preservation. This case can serve as an example for other domains. There is undeniably a pressing need for technological methods to address the challenge of digital preservation. There is a more basic need for an appropriate method of evaluating alternative methods, such as the two-way grid described in this paper. Finally, there is an overriding need to select and implement preservation methods in an open-ended system capable of evolving in response to changing needs and demands.


1 An earlier version of this paper appeared as “Digital Preservation Techniques: Evaluating the Options” in Archivi & Computer: Automazione e Beni Culturali 10 (2/01): 101-109.

2 Each image displays, in hexadecimal, (1) in the leftmost column, the position of the first byte in that row relative to the start of the file, and (2) the numeric values of the 16 bytes starting with the numbered one. In the rightmost column, it also shows the printable ASCII characters, or a ‘.’ for unprintable bytes.
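For readers who want to reproduce a display of this kind, the following is a minimal sketch. The function name and the exact column spacing are assumptions; only the three-column layout described in the note is taken from the text.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hexadecimal values, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in row)
        # Printable ASCII is shown as-is; everything else becomes a period.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


print(hex_dump(b"Digital preservation\x00\x01\x02 sample"))
```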


All URLs were valid as of July 10, 2002.

Institute of Electrical and Electronics Engineers, Inc. (IEEE). June 2001. Transactions on Computers—Special Issue on Dynamic Optimization.

International Standards Organization (ISO). 2002. Open Archival Information System—Reference Model. The draft international standard is available at

InterPARES Project Preservation Task Force. 2001. How to Preserve Electronic Records. Available at

InterPARES Project Preservation Task Force. In press. Report of the Preservation Task Force. A preliminary version of the report is available at

Jefferson, Thomas. 1801. Note from Thomas Jefferson regarding the disposition of his Presidential papers, December 29, 1801. Washington, D.C.: National Archives, General Records of the Department of State. RG 59. Available at

Lorie, Raymond A. 2000. The Long-Term Preservation of Digital Information. Available at

Lynch, Clifford. 2000. Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust. In Authenticity in a Digital Environment. Washington, D.C.: Council on Library and Information Resources. Available at

Moore, Reagan et al. 2000. Collection-Based Persistent Digital Archives—Part 1. D-Lib Magazine 6(3). Available at

Rajasekar, Arcot, Reagan Moore, and Michael Wan. Syntactic and Semantic Replicas: Rosetta Stones for Long-Term Digital Preservation. Unpublished manuscript.

Rothenberg, Jeff. 2000. Preserving Authentic Digital Information. In Authenticity in a Digital Environment. Washington, D.C.: Council on Library and Information Resources. Available at

Task Force on Archiving of Digital Information. 1996. Preserving Digital Information. Report of the Task Force on Archiving of Digital Information. Washington, D.C.: Commission on Preservation and Access, and Mountain View, Calif.: Research Libraries Group. Available at

Thibodeau, Kenneth. 1997. Boundaries and Transformations: An Object-Oriented Strategy for Preservation of Electronic Records. European Commission. In INSAR Supplement II: the Proceedings of the DLM-Forum on Electronic Records. Luxembourg: Office for Official Publications of the European Communities.

Wing, Jeannette M., and John Ockerbloom. 2000. Respectful Type Converters. IEEE Transactions on Software Engineering 26(7): 579-93.

Troubleshooting the bq command-line tool

This section shows you how to resolve issues with the bq command-line tool.

Keep your Cloud SDK up to date

If you are using the bq command-line tool from the Cloud SDK, then make sure that you have the latest functionality and fixes for the bq command-line tool by keeping your Cloud SDK installation up to date. To see whether you are running the latest version of the Cloud SDK, enter the following command in Cloud Shell:

The first two lines of the output display the version number of your current Cloud SDK installation and the version number of the most recent Cloud SDK. If you discover that your version is out of date, then you can update your Cloud SDK installation to the most recent version by entering the following command in Cloud Shell:


You can enter the following commands to debug the bq command-line tool:

See requests sent and received. Add the --apilog=PATH_TO_FILE flag to save a log of operations to a local file. Replace PATH_TO_FILE with the path that you want to save the log to. The bq command-line tool works by making standard REST-based API calls, which can be useful to see. It's also useful to attach this log when you're reporting issues. Using - or stdout instead of a path prints the log to the Google Cloud Console. Setting --apilog to stderr outputs to the standard error file.

Troubleshoot errors. Enter the --format=prettyjson flag when getting a job's status or when viewing detailed information about resources such as tables and datasets. Using this flag outputs the response in JSON format, including the reason property. You can use the reason property to look up troubleshooting steps.
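Once the JSON response has been captured, the reason property can be pulled out programmatically. The sketch below is an assumption-laden illustration: the payload is hand-written to resemble a job resource and is not real tool output, the helper name is hypothetical, and the exact shape of a given response should be checked against the actual output of the tool.

```python
import json
from typing import Optional

# Illustrative, hand-written payload shaped like a job resource (not real output).
SAMPLE_RESPONSE = """
{
  "status": {
    "state": "DONE",
    "errorResult": {
      "reason": "invalidQuery",
      "message": "Syntax error (illustrative message)"
    }
  }
}
"""


def error_reason(job_json: str) -> Optional[str]:
    """Return the reason from a job's error result, or None if there is no error."""
    status = json.loads(job_json).get("status", {})
    return status.get("errorResult", {}).get("reason")


print(error_reason(SAMPLE_RESPONSE))
```

In practice the input would be the saved output of a command run with the --format=prettyjson flag rather than a literal string.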

