PEGI Project
Preservation of Electronic Government Information


Project updates and related notes

PEGI DLC Meeting Table Notes - Scott Matheson

After hearing a bit about the state of electronic government information, the group at table 3 set out to identify gaps where information might get lost, which information is most at risk, and what threats might lead to its loss.

We identified a large group of materials where preservation is a priority and where we felt good faith efforts to preserve the content were underway. This group included publications in GPO’s FDSys, materials designated in the Presidential Records Act, and congressional committee records held by the Center for Legislative Archives.

We also identified some materials we felt were at risk for various reasons, including agency materials in innovative or non-standard formats. Examples included older CD-ROMs (which were innovative when distributed) or today’s social media channels. Some information production or dissemination is privatized – done outside of federal agencies – and could be at higher risk. Finally, we noted that there is risk that time-series data will no longer be produced going forward, making the existing data less useful, though we acknowledged there is no real remedy to this issue in the library/archives space.

We identified several risk factors along with ideas for mitigating them. One is a lack of capacity to deal quickly with disbanding agencies or commissions; a “records SWAT team” was suggested. Similarly, there is not always an “exit strategy” for cloud-hosted data when a project, contract, or commission ends. These contracts should include transfers to NARA or another appropriate repository. A “federal information impact statement,” similar to an environmental impact statement, was suggested as one way to raise awareness of these issues.

Other risks involved the appropriate scheduling of new types of records or forms of information, and gaps in continuous access when records are transferred to NARA but not made publicly available before they are removed from the hosting web site.

Our group identified born-digital information as most at risk. We noted that information digitized from paper sources was a lower preservation priority than born-digital materials because it could be re-digitized if the electronic versions were lost or became inaccessible (provided sufficient paper copies are preserved).

In the second discussion session, we talked about what is important to preservation efforts: how we should work on it, whether there are groups or processes we can leverage to accomplish the task, and how the work aligns with the missions of our organizations.

We discussed the interaction of preservation and access and noted that it might make sense to only collect things where access is (or will eventually be) possible. The group repeatedly noted that materials should be useful and usable, including creating metadata to provide access to patrons. We also noted a need for metadata to provide an inventory to ensure we know what information has been preserved and what has not.

Existing infrastructure could be leveraged to preserve and provide access to born-digital information. We identified existing collaborative repository models like HathiTrust, CRL, or LLMC Digital, which could gather information from existing systems like FDSys or NARA’s ERA, but where the community could also add other digital government information (which would be out of scope for the existing systems). Such a system or network would be built for “our” end-users, where that definition might vary by group. Different systems might be developed for different communities, offering different features, like change tracking or full-text indexing for natural language processing research. We noted the importance of keeping the public domain public, so that these repositories could interoperate.

Finally, we brainstormed other groups who are stakeholders in the preservation of electronic government information. These include scientists, historians, policy researchers, think tanks, librarians (of all types), archivists, records managers, and preservationists. We hope to engage all these data users and researchers over time and incorporate their needs and activities into the PEGI report.

The group at table 3 identifying issues with preservation of electronic government information.

Scott Matheson | mini-forum