Collaborative Librarians

Data don't tell the whole story.

CI Article: Synergizing in Cyberinfrastructure Development January 9, 2012

Filed under: CI Article,Coordinating Centers,Cyberinfrastructure,eScience — Betsy Rolland @ 10:53 am

Bietz, M. J., E. P. S. Baumer, C. P. Lee. (2010). “Synergizing in Cyberinfrastructure Development.” Computer Supported Cooperative Work, 19(3-4): 3-4.

Bietz et al. studied a nascent marine metagenomics collaboration called Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), focusing on the work of the developers in creating infrastructure for the group. This paper takes the authors’ earlier work on human infrastructure (Lee et al., 2006) and expands it to include notions of synergizing, leveraging and aligning. They define synergizing as the “active, strategic work of managing multiple relationships for infrastructure development” (p. 251) and relate it to the concept of the embeddedness of the developers as both a constraint and a resource. Developers, defined as anyone involved in the development of a new infrastructure, are required to work within the rules and limitations of the various infrastructures in which they are already embedded (e.g., a university, a development team, an academic discipline), while they are able to take advantage of the relationships they have at their disposal thanks to those infrastructures (e.g., coworkers from former development projects, existing technology transfer agreements with other universities). Developers leverage existing relationships and technologies in service of their goals, while also aligning themselves with others to get work done.

The bottom line here is that CI cannot be fully understood without taking into account both the social and technological issues inherent in building new infrastructure. For example, the authors demonstrate how some tech decisions are made for social reasons, such as choosing the software the university already supports even if it’s not the most robust or sharing server space with collaborators rather than purchasing one’s own.

As with Lee et al.’s original human infrastructure paper, I find this work very useful for my own research on coordinating centers because of its focus on the messiness of science. I think it’s a myth that it’s possible to implement scientific research according to a 5-year plan; the very raison d’etre of science is exploring something we don’t fully understand. In fact, it would be an interesting study to compare the timeline proposed in grant proposals with what actually happened in the project! A research project needs to retain enough flexibility to respond to changes in not only the science and technology but also the people involved. Can we embrace the messiness of science instead of trying to control it with arbitrary schedules and deadlines?

Lee, C. P., Dourish, P., & Mark, G. (2006). The human infrastructure of cyberinfrastructure. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 483–492). New York: ACM.

 

CI Article: “Coordination costs and project outcomes in multi-university collaborations” February 3, 2011

Filed under: CI Article,Coordinating Centers,Information Problems,Librarians — Betsy Rolland @ 7:00 am

Cummings, J. N. and S. Kiesler (2007). “Coordination costs and project outcomes in multi-university collaborations.” Research Policy 36(10): 1620-1634.

I stumbled across this article while making my way through last year’s Science of Team Science Conference. I listened to Jonathon Cummings present an overview of some of what’s in this paper. I highly recommend getting a copy of the paper if you’re at all interested in supporting collaborative research. While I agree with the authors that their results aren’t necessarily generalizable to all domains (they focused on a single grant program in the area of interdisciplinary IT research and education), I appreciate the focus on coordination. It seems as though funding agencies and even the institutions themselves underestimate the difficulties inherent in multi-institutional collaborative research. Add in the complexity of interdisciplinarity, and coordination gets even more difficult.

This is one of the first articles I’ve seen that correlates successful research to specific activities of the collaboration, such as co-authorship, student exchanges, having a web portal and email lists, etc. I think the suggestion that Cummings and Kiesler make at the end, that perhaps all large collaborations should first have a small exploratory grant to support the development of the collaboration, is an excellent one. Such support would allow groups to work together to develop trust and establish a group identity. Cummings and Kiesler also suggest that funding agencies invest in training scientists on *how* to collaborate and coordinate large research projects. I would argue that this would be an excellent task for the institutions themselves to take on, in coordination with funding agencies. I would also argue that this is yet another area where librarians, with expertise in user needs assessment and community development, could make a huge impact.

 

CI Article: “New Knowledge from Old Data : The Role of Standards in the Sharing and Reuse of Ecological Data” February 1, 2011

Filed under: CI Article,Curation,Cyberinfrastructure,Data,eScience — Betsy Rolland @ 7:00 am

Zimmerman, A. (2008). “New Knowledge from Old Data.” Science, Technology, & Human Values 33(5): 631-652.

Zimmerman interviewed 13 ecologists about their use of secondary data (i.e., data they did not collect themselves) in order to tease out the role standards might play in the process of re-using data for new analyses. She found that the primary determinant in an ecologist’s decision to use the data was the researcher’s own ability to understand the data. This understanding was heavily contingent upon the researcher’s field experience and knowledge of collecting similar data. If the ecologist considered the data to be generally difficult data to collect or the kind of data that was frequently poorly understood, the data were not used. A second consideration was the reputation of the data collectors themselves or a personal relationship with the data collectors.

Zimmerman concludes that standards, while potentially useful, would be difficult to develop because the collection of data is so context-dependent. In short, the research questions determine how the data are collected and which data points are important. It would be a staggering task to try to develop standards that would cover every context and approach. Even if that were possible, science moves so quickly that the standards would likely be obsolete by the time they were approved.

There was no mention in this article about the potential for others to help with the curation or development of understanding of the data. Does the individual investigator need to be involved or is this a question that can be delegated to graduate students or a data manager? Was it a collective decision or one made by the lead researcher? The participants described a process of repeatedly going back to the journal article where the secondary data are described. I would have liked to know more about what types of information they were looking for when they did that. Which types of contextual information were most important to them? Could they even tell us or is that another form of tacit knowledge they find difficult to articulate?

 

CI Article: Data at Work: Supporting Sharing in Science and Engineering January 29, 2011

Filed under: CI Article,Cyberinfrastructure,Data,eScience — Betsy Rolland @ 3:07 pm

Birnholtz, J. P. and M. J. Bietz (2003). Data at work: supporting sharing in science and engineering. Proceedings of the 2003 international ACM SIGGROUP conference on Supporting group work. Sanibel Island, Florida, USA, ACM: 339-348.

Recent calls for open science and data sharing suggest that funding agencies believe that groundbreaking scientific research requires more data sharing among scientists. Even if we provide the technical means to move data from one lab to another, however, there may be social barriers to effectively using this data in practice. To design technologies that truly support the conduct of science, and not just the sharing of a data set, we argue that the designer must understand both the scientific role that data play in producing knowledge, and the social role that data play in the conduct of scientific work. (p. 340)

In this article on sharing scientific data, Birnholtz & Bietz discuss the social nature of data in collaborative research, describing some of the problems inherent in trying to share data between and among researchers, as well as what CSCW researchers can learn by thinking of data in this way.

The authors sort the difficulties of sharing data into three categories: “1) willingness to share, 2) locating shared data, and 3) using shared data.” Data are often the end result of years of hard work and represent a scientist’s work product. Giving that hard work away to someone doesn’t make sense. Finding data is a huge challenge, given the lack of a central registry. In my experience, scientists use a variety of strategies, including journal papers, government data repositories and colleagues. Once data have been located and received (a process with its own set of issues), actually using the data is fraught with difficulties. Assessing quality and trustworthiness is especially difficult if an investigator doesn’t have access to the original data collectors. Simply knowing what the data actually represent is also a challenge. Many data sets, especially older, legacy data, may not have a relevant data dictionary or anyone who remembers what a specific variable meant. As Birnholtz & Bietz point out, “Even if documentation is provided, however, it is often the case that much of the knowledge needed to make sense of data sets is tacit.” How can we capture that information?

The authors go on to discuss how data sharing and data practices vary among the three fields they study: earthquake engineering, space physics and HIV/AIDS research. They finish with a set of recommendations for CSCW researchers.

An intriguing thread not further developed in this article is the idea that the level of task uncertainty in a given field affects or influences the frequency or types of data sharing that occur. This makes intuitive sense to me, as I think of my experience in cancer epidemiology. While epidemiological data aren’t standardized, by any stretch, they seem to describe a finite universe — characteristics of people and their environments and habits. Physical activity, smoking, diseases, environment are known concepts about which to collect data. A more fluid, less established field may have more variation in the data collected. I would really like to see this area further developed, as I think it has the potential to really help us think about data sharing.

I really appreciate this article’s emphasis on data and science as socially constructed, because I think it gives us the opportunity to think of supporting science in ways that lie outside of technological solutions. It’s not enough to construct a database that combines two disparate data sets if the context and tacit knowledge inherent in the data sets aren’t taken into consideration. Without a true understanding of the data, harmonization fails and, worse, leads to bad science.

 

CI Article: Tensions across the scales: Planning infrastructure for the long-term January 17, 2011

Filed under: CI Article,Cyberinfrastructure,eScience — Betsy Rolland @ 12:05 pm

Ribes, D., & Finholt, T. A. (2007). Tensions across the scales: Planning infrastructure for the long-term. In Proceedings of the 2007 International ACM Conference on Supporting Group Work (pp. 229–238). New York: ACM.

Ribes & Finholt describe nine tensions inherent in the move from short-term to long-term infrastructure for science. These tensions are the intersection of three “concerns of actors” and three “scales of infrastructure.” Their aim is not to prescribe how to build infrastructure for the long-term, as no one yet knows how to do that, but to define a set of researchable questions around this topic so that we can begin to get an idea of what to pay attention to.

The first tension Ribes & Finholt discuss is “Project vs. facility,” noting that most CI endeavors are funded as projects, with finite timelines and scopes and no clear path to renewal of funding. This discourages the kind of long-term planning and thinking that could add stability to CI and most likely leads to wasted money. Rather than investing in one CI project for a domain community, funding agencies fund smaller projects, each of which builds its own CI.

Ribes & Finholt’s second tension speaks to “Individual vs. community interests.” This is a common theme in discussions of CI: building large infrastructure projects to support science requires not only computer scientists but also domain experts. Yet the reward system for scientists doesn’t give credit for that type of work. If only a domain expert can generate appropriate metadata for a database of genetic structures but the time s/he spends on that task doesn’t help in the race toward tenure, the expert won’t be able to justify the time spent. But then the whole community loses out. This same argument applies to proactively preparing data to share, submitting to open access journals that aren’t yet valued by the community, etc. Some of these issues are also explored in the tension “Research vs. development.”

After describing the other tensions, Ribes & Finholt conclude with an emphasis on the human side of infrastructure, drawing upon the Lee et al. paper on human infrastructure (reference below). Ribes & Finholt note: “[h]owever, while the work of design and development is ‘human,’ the challenges are more comprehensively described as technical, organizational and institutional. In considering design and enactment of infrastructure it is best to address ‘hard and soft’ foundations hand-in-hand, they are usually more intimately entwined than any raw distinction would suggest (236)” (emphasis in original).

One of the things I like about this article is that Ribes & Finholt focus not only on the domain scientists and computer scientists themselves but on the project managers as well. This group is often hidden or forgotten in the writing on CI but is critical to the success or failure of a project.

 

Lee, C. P., Dourish, P., & Mark, G. (2006). The human infrastructure of cyberinfrastructure. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 483–492). New York: ACM.

 

 