Data Representation in the DARPA SD2 Program

Nicholas Roehner, Jacob Beal, Bryan Bartley, Richard Markeloff, Tom Mitchell, Tramy Nguyen, Daniel Sumorok, Nicholas Walczak, Chris J. Myers, Zach Zundel, James Scholz, Benjamin Hatch, Mark Weston, John Colonna-Romano

2021

Modern scientific enterprises are often highly complex and multidisciplinary, particularly in areas like synthetic biology, where the subject at hand is itself inherently complex and multidisciplinary. Collaboration across many organizations is necessary to tackle such problems efficiently [6, 15], but remains difficult. The challenge is amplified by automation, which increases the pace at which new information can be produced, and it is especially acute in fundamental research, where concepts and definitions are inherently fluid and may change rapidly as an investigation evolves [7]. The DARPA program Synergistic Discovery and Design (SD2) aimed to address these challenges by organizing the development of data-driven methods to accelerate discovery and improve design robustness, with synthetic biology as one of the key domains under study. The program was deliberately organized so that teams provided complementary types of expertise and resources and no team held a dominant organizational position, so subject-matter investigations necessarily required peer-level collaboration across multiple team boundaries. With more than 100 researchers across more than 20 organizations, several of which ran experimental facilities with high-throughput automation, participants were forced to confront the challenges of effective data sharing. The default architecture for scientific collaboration is essentially one of anarchy, with ad-hoc bilateral relations between pairs of collaborators or experimental phases (Figure 1(a)). This was by necessity the case during the early phases of the SD2 program as well: incorporating new tools into pipelines was ad-hoc and time-consuming, and data was generally disconnected from genetic designs and experimental plans.
The other typical approach for collaboration is one of "command and control", in which a dominant organization determines the data sharing content and format for all participants (Figure 1(b)). This can be efficient, but tends to be limited in flexibility and extensibility, rendering it unsuitable for research collaboration, as indeed was found when we attempted this approach during the first year of the SD2 program. We addressed these problems with the application of distributed standards to create a "flexible rendezvous" model of collaboration (Figure 1(c)), enabling information flow to track evolving collaborative relationships, improving the sharing and utility of information across the community and supporting accelerated rates of experimentation.

Figure 1: Architectures for data sharing: bilateral relations (a), command and control (b), and flexible rendezvous (c).
