In October 2016, I began working at the MIT Libraries as a CLIR/DLF Postdoctoral Fellow in Software Curation. CLIR began offering postdoctoral fellowships in data curation in 2012; however, three others and I were part of the first cohort conducting research in the area of software curation. At our fellowship seminar and training this summer, the four of us joked about not having any idea what we would be doing (and Google wasn’t much help). Indeed, despite years of involvement in digital curation, I was unsure of what it might mean to curate software. As has been well documented in the library and archival science community, curation of data means many different things to many different people. Add in the term “software” and the complexity only increases.
At MIT Libraries, I have had the good fortune of working with two distinguished experts in library research: Nancy McGovern, the Director of the Digital Preservation Program, and Micah Altman, the Director of Research. This blog post describes the first phase of our work together in defining a research agenda for software curation as an institutional asset.
As we began to suss out possible research objectives and assorted activities, we found ourselves circling back to four central questions – which themselves split into associated sub-questions.
- What is software? What is the purpose and function of software? What does it mean to curate software? How do these practices differ from preservation?
- When do we curate software? Is it at the time of creation? Or when it is acquired by an institution?
- Why do institutions and researchers curate software?
- Who is institutionally responsible for curating software and for whom are we curating software?
Developing Focus and Purpose
We also began to outline the types of exploratory research questions we might ask depending on the specific purpose and entities we were creating a model for (see Table 1 below). Of course, these are only some of the entities that we could focus on; we could also broaden our scope to include research questions of interest to software publishers, software journals, or funding agencies interested in software curation.
| Entity | All libraries/archives | MIT Libraries |
|---|---|---|
| Research library | What does a library need to safeguard and preserve software as an asset? How are other institutions handling this? How are funding agencies considering research on software curation? | What are the MIT Libraries’ existing and future needs related to software curation? |
| Software creator | What are the best practices software creators should adopt when creating software? How are software creators depositing their software and how are journals recommending they do this? | What are the individual needs and existing practices of software creators served by the MIT Libraries? |
| Software user | What are the different reasons people use software? What are the conditions for use? What are the specific curation practices we should implement to make software usable for this community? | What do individuals served by the MIT Libraries need to be able to reuse software? |
Table 1: Research questions by entity and intended audience
Importantly, we wanted to adopt an agile research approach that considered software as an artifact, rather than (simply) as an outcome to be preserved and made accessible. Curation in this sense might seek to answer ontological questions about software as an entity with significant characteristics at different levels of representation. Certainly, digital object management approaches that emphasize documentation of significant properties or characteristics are long-standing in the literature. At the same time, we wanted our approach to address essential curatorial activities (what Abby Smith termed “interventions”) that help ensure digital files remain accessible and usable. We returned to our shared research vision: to devise a set of conceptual models for software curation strategies to assist research outcomes that rely on the creation, use, reuse, and study of software.
Statement of Research Objectives and Working Definitions
Given the preponderance of definitions for curation and the wide-ranging implications of curating for different purposes and audiences, we thought it would be essential for us to identify and make clear our particular interests. We developed the following statement to best describe our goals and objectives:
Libraries and archives are increasingly tasked with responsibilities related to the effective long-term preservation and curation of software. The purpose of our work is to investigate and make recommendations for strategies that institutions can adopt for managing software as complex digital objects across generations of technology.
We also developed the following working definition of software curation for use in our research:
Software curation encompasses the active practices related to the creation, acquisition, appraisal and selection, description, transformation, preservation, storage, and dissemination/access/reuse of software over short and long periods of time.
The next phase of our research involves formalizing our research approach through the evaluation, selection, and application of relevant models (such as the OAIS Reference Model) and ontologies (such as the Software Ontology, SWO). We are also developing different data curation profiles to flesh out the activities, roles, and relationships that are bound up in software creation, use, and reuse. In addition to reports on the status of our project, you can expect blog posts about both the philosophical and practical implications of curating software in an academic research library setting.
 As Abby Smith notes, “We have to intervene continually to keep digital files alive. We cannot put a digital file on a shelf and decide later about preservation intervention. Storage means active intervention.” See: Smith, Abby (2000). Authenticity in Perspective. In Authenticity in a Digital Environment, Council on Library and Information Resources.
A version of this post first appeared on MIT’s Program for Information Science blog.