Collaborative Librarians

Data don't tell the whole story.

CI Article: Synergizing in Cyberinfrastructure Development January 9, 2012

Filed under: CI Article,Coordinating Centers,Cyberinfrastructure,eScience — Betsy Rolland @ 10:53 am

Bietz, M. J., E. P. S. Baumer, C. P. Lee. (2010). “Synergizing in Cyberinfrastructure Development.” Computer Supported Cooperative Work, 19(3-4): 245-281.

Bietz et al. studied a nascent marine metagenomics collaboration called Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), focusing on the work of the developers in creating infrastructure for the group. This paper takes the authors’ earlier work on human infrastructure (Lee et al., 2006) and expands it to include notions of synergizing, leveraging and aligning. They define synergizing as the “active, strategic work of managing multiple relationships for infrastructure development” (p. 251) and relate it to the concept of the embeddedness of the developers as both a constraint and a resource. Developers, defined as anyone involved in the development of a new infrastructure, are required to work within the rules and limitations of the various infrastructures in which they are already embedded (e.g., a university, a development team, an academic discipline). At the same time, they are able to take advantage of the relationships they have at their disposal thanks to those infrastructures (e.g., coworkers from former development projects, existing technology transfer agreements with other universities). Developers leverage existing relationships and technologies in service of their goals, while also aligning themselves with others to get work done.

The bottom line here is that CI cannot be fully understood without taking into account both the social and technological issues inherent in building new infrastructure. For example, the authors demonstrate how some tech decisions are made for social reasons, such as choosing the software the university already supports even if it’s not the most robust or sharing server space with collaborators rather than purchasing one’s own.

As with Lee et al.’s original human infrastructure paper, I find this work very useful for my own research on coordinating centers because of its focus on the messiness of science. I think it’s a myth that it’s possible to implement scientific research according to a 5-year plan; the very raison d’être of science is exploring something we don’t fully understand. In fact, it would be an interesting study to compare the timeline proposed in grant proposals with what actually happened in the project! A research project needs to retain enough flexibility to respond to changes in not only the science and technology but also the people involved. Can we embrace the messiness of science instead of trying to control it with arbitrary schedules and deadlines?

Lee, C. P., Dourish, P., & Mark, G. (2006). The human infrastructure of cyberinfrastructure. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (pp. 483–492). New York: ACM.

 

Coordinating Centers in Cancer Epidemiology Research January 6, 2012

Filed under: Coordinating Centers — Betsy Rolland @ 5:01 pm

A couple of very belated announcements here… I’m pleased to share with you all my (somewhat) recently published article from Cancer Epidemiology, Biomarkers & Prevention:

Coordinating Centers in Cancer-Epidemiology Research: The Asia Cohort Consortium Coordinating Center
Cancer Epidemiology, Biomarkers and Prevention. Published OnlineFirst July 29, 2011.
Rolland B, Smith BR, Potter JD.

It was very exciting to see my first first-author paper in print! The article was officially published in the October issue of CEBP.

In related news, I was honored to receive my first grant from NCI, with my co-PI (and PhD advisor) Charlotte Lee. The grant will allow us to continue our research on coordinating centers here at the Fred Hutchinson Cancer Research Center. FHCRC is home to some of the most successful multi-institutional cancer epidemiology collaborations, some of which have been ongoing for more than a decade, like the Women’s Health Initiative. I hope to help codify the lessons they’ve learned to help other coordinating centers work more effectively. Once I finish my PhD, I plan to expand this research to other cancer centers around the country to identify common strategies for success.

 

First collaborative paper March 2, 2011

Filed under: Collaboratories,Coordinating Centers — Betsy Rolland @ 1:36 pm

I am thrilled to share my group’s first paper: The Asia Cohort Consortium’s “Association between Body-Mass Index and Risk of Death in More Than 1 Million Asians,” published last week in the New England Journal of Medicine. This paper is an amazing accomplishment for our group. The analyses (i.e., time for the project manager and statisticians) were funded by the Fred Hutchinson Cancer Research Center, but none of the contributing cohorts received any money for their participation. It is a shining example of scientists participating in collaborative activities for the opportunity to be part of something bigger than themselves. That the project resulted in a publication in a top-tier journal is simply icing on the cake!

Zheng W, McLerran DF, Rolland B, Zhang X, Inoue M, Matsuo K, He J, Gupta PC, Ramadas K, Tsugane S, Irie F, Tamakoshi A, Gao YT, Wang R, Shu XO, Tsuji I, Kuriyama S, Tanaka H, Satoh H, Chen CJ, Yuan JM, Yoo KY, Ahsan H, Pan WH, Gu D, Pednekar MS, Sauvaget C, Sasazuki S, Sairenchi T, Yang G, Xiang YB, Nagai M, Suzuki T, Nishino Y, You SL, Koh WP, Park SK, Chen Y, Shen CY, Thornquist M, Feng Z, Kang D, Boffetta P, Potter JD. (2011). “Association between Body-Mass Index and Risk of Death in More Than 1 Million Asians.” New England Journal of Medicine 364(8): 719-729.

 

Is “Big Science” better? February 4, 2011

Filed under: Uncategorized — Betsy Rolland @ 6:00 am

I attended a seminar on Monday (1/31/11), hosted by Sage Bionetworks, called Establishing a ‘TCP/IP’ for Human Biology: A Summit on Human Data Interoperability. There were several interesting presentations and intriguing ideas presented. But I left feeling vaguely dissatisfied. So much money is being invested in building huge data repositories of related (and sometimes unrelated) data, but is it really the best way forward? Is there evidence that the best way to answer questions about human health is through large-scale genetic analyses? How do we know the science is good when data are stripped of their context and dumped into a repository? Is personalized medicine really achievable and worthwhile?

As the funding situation gets more and more difficult, is spending billions on large data repositories more cost-effective than focusing on smaller projects? Would it be better to focus more on prevention and less on curing preventable diseases? In short, is Big Science really better?

 

CI Article: “Coordination costs and project outcomes in multi-university collaborations” February 3, 2011

Filed under: CI Article,Coordinating Centers,Information Problems,Librarians — Betsy Rolland @ 7:00 am

Cummings, J. N. and S. Kiesler (2007). “Coordination costs and project outcomes in multi-university collaborations.” Research Policy 36(10): 1620-1634.

I stumbled across this article while making my way through last year’s Science of Team Science Conference. I listened to Jonathon Cummings present an overview of some of what’s in this paper. I highly recommend getting a copy of the paper if you’re at all interested in supporting collaborative research. While I agree with the authors that their results aren’t necessarily generalizable to all domains (they focused on a single grant program in the area of interdisciplinary IT research and education), I appreciate the focus on coordination. It seems as though funding agencies and even the institutions themselves underestimate the difficulties inherent in multi-institutional collaborative research. Add in the complexity of interdisciplinarity, and coordination gets even more difficult.

This is one of the first articles I’ve seen that correlates successful research to specific activities of the collaboration, such as co-authorship, student exchanges, having a web portal and email lists, etc. I think the suggestion that Cummings and Kiesler make at the end, that perhaps all large collaborations should first have a small exploratory grant to support the development of the collaboration, is an excellent one. Such support would allow groups to work together to develop trust and establish a group identity. Cummings and Kiesler also suggest that funding agencies invest in training scientists on *how* to collaborate and coordinate large research projects. I would argue that this would be an excellent task for the institutions themselves to take on, in coordination with funding agencies. I would also argue that this is yet another area where librarians, with expertise in user needs assessment and community development, could make a huge impact.

 

Science of Team Science Conference February 2, 2011

I registered recently to attend the 2011 Science of Team Science Conference, hosted by Northwestern University’s NUCATS Institute and its Research Team Support & Development office. I couldn’t be more excited. I wasn’t able to attend last year’s conference because of my heavy travel schedule for the SLA research grant, so I’m thrilled to be able to attend this year.

I’ve been virtually attending last year’s conference via the PPT and MP3 recordings they’ve posted for each session. This is a treasure trove of information and worth perusing. I’ve listened to several presentations so far and have read the minutes, which are well done and really capture the essence of each conversation. They even captured the Q&A sessions!

I think it will be especially interesting to attend in my dual role as social science researcher and practitioner, as this doesn’t seem to be very common. I have to admit, I’ve been a little disappointed about the lack of discussion about libraries, librarians or even information management. I may submit a poster on that topic, just to make sure it makes it onto the radar.

 

CI Article: “New Knowledge from Old Data: The Role of Standards in the Sharing and Reuse of Ecological Data” February 1, 2011

Filed under: CI Article,Curation,Cyberinfrastructure,Data,eScience — Betsy Rolland @ 7:00 am

Zimmerman, A. (2008). “New Knowledge from Old Data: The Role of Standards in the Sharing and Reuse of Ecological Data.” Science, Technology, & Human Values 33(5): 631-652.

Zimmerman interviewed 13 ecologists about their use of secondary data (i.e., data they did not collect themselves) in order to tease out the role standards might play in the process of re-using data for new analyses. She found that the primary determinant in an ecologist’s decision to use the data was the researcher’s own ability to understand the data. This understanding was heavily contingent upon the researcher’s field experience and knowledge of collecting similar data. If the ecologist considered the data to be generally difficult to collect or of a kind that was frequently poorly understood, the data were not used. A second consideration was the reputation of the data collectors themselves or a personal relationship with the data collectors.

Zimmerman concludes that standards, while potentially useful, would be difficult to develop because the collection of data is so context-dependent. In short, the research questions determine how the data are collected and which data points are important. It would be a staggering task to try to develop standards that would cover every context and approach. Even if that were possible, science moves so quickly that the standards would likely be obsolete by the time they were approved.

There was no mention in this article about the potential for others to help with the curation or development of understanding of the data. Does the individual investigator need to be involved or is this a question that can be delegated to graduate students or a data manager? Was it a collective decision or one made by the lead researcher? The participants described a process of repeatedly going back to the journal article where the secondary data are described. I would have liked to know more about what types of information they were looking for when they did that. Which types of contextual information were most important to them? Could they even tell us or is that another form of tacit knowledge they find difficult to articulate?

 

Data sharing plans January 31, 2011

Filed under: Curation,Data,eScience — Betsy Rolland @ 8:00 am

An upcoming commentary in the Lancet (Walport, M. and P. Brest “Sharing research data to improve public health.” The Lancet In Press, Corrected Proof.), signed by the leaders of key funding agencies, made clear that these agencies will join major journals in demanding that data be deposited as a condition of funding or article publication. But then what? What do the agencies plan to do with the data sets they receive? How will they safeguard them, how will they provide and monitor access? What are the plans to protect patient privacy? This is especially crucial in the case of genome-wide association studies (GWAS) where genetic data would be deposited. Theoretically, I understand, it’s possible that a patient could be identified by those data alone.

Assuming such questions get ironed out, who is the intended audience for such data sets? As discussed in many key articles on data sharing, data sets can’t simply be handed over with no further explanation. Absent standards for data curation, it’s difficult to believe many data sets can be downloaded by a new research team and used without an investment of time from the original data collectors. How many researchers will be willing to take the time to help someone else understand the context of the study and even the specific meaning of each variable? Often, especially in cancer epidemiology, data are collected over a period of time, during which the protocol may change, producing a data set with one variable with multiple meanings.

Without contact with the data collection team or investigator, researchers will have a difficult time assessing the trustworthiness, reliability or appropriateness of any given data set. So, what is the goal of the funding agencies and journals in demanding deposit of data sets? Without a focus on the social aspects of data, as discussed by Birnholtz & Bietz, among others, and a greater understanding of how scientists actually use data, it’s hard to see how these data deposit initiatives move science forward.

 

CI Article: Data at Work: Supporting Sharing in Science and Engineering January 29, 2011

Filed under: CI Article,Cyberinfrastructure,Data,eScience — Betsy Rolland @ 3:07 pm

Birnholtz, J. P. and M. J. Bietz (2003). Data at work: supporting sharing in science and engineering. Proceedings of the 2003 international ACM SIGGROUP conference on Supporting group work. Sanibel Island, Florida, USA, ACM: 339-348.

“Recent calls for open science and data sharing suggest that funding agencies believe that groundbreaking scientific research requires more data sharing among scientists. Even if we provide the technical means to move data from one lab to another, however, there may be social barriers to effectively using this data in practice. To design technologies that truly support the conduct of science, and not just the sharing of a data set, we argue that the designer must understand both the scientific role that data play in producing knowledge, and the social role that data play in the conduct of scientific work.” (p. 340)

In this article on sharing scientific data, Birnholtz & Bietz discuss the social nature of data in collaborative research, describing some of the problems inherent in trying to share data between and among researchers, as well as what CSCW researchers can learn by thinking of data in this way.

The authors categorize the difficulties of sharing data into three categories: “1) willingness to share, 2) locating shared data, and 3) using shared data.” Data are often the end result of years of hard work and represent a scientist’s work product. Giving that hard work away to someone doesn’t make sense. Finding data is a huge challenge, given the lack of a central registry. In my experience, scientists use a variety of strategies, including journal papers, government data repositories and colleagues. Once data have been located and received (a process with its own set of issues), actually using the data is fraught with difficulties. Assessing quality and trustworthiness is especially difficult if an investigator doesn’t have access to the original data collectors. Just simply knowing what the data actually represent is also a challenge. Many data sets, especially older, legacy data, may not have a relevant data dictionary or anyone who remembers what a specific variable meant. As Birnholtz & Bietz point out, “Even if documentation is provided, however, it is often the case that much of the knowledge needed to make sense of data sets is tacit.” How can we capture that information?

The authors go on to discuss how data sharing and data practices vary among the three fields they study: earthquake engineering, space physics and HIV/AIDS research. They finish with a set of recommendations for CSCW researchers.

An intriguing thread not further developed in this article is the idea that the level of task uncertainty in a given field affects or influences the frequency or types of data sharing that occur. This makes intuitive sense to me, as I think of my experience in cancer epidemiology. While epidemiological data aren’t standardized, by any stretch, they seem to describe a finite universe — characteristics of people and their environments and habits. Physical activity, smoking, diseases, environment are known concepts about which to collect data. A more fluid, less established field may have more variation in the data collected. I would really like to see this area further developed, as I think it has the potential to really help us think about data sharing.

I really appreciate this article’s emphasis on data and science as socially constructed, because I think it gives us the opportunity to think of supporting science in ways that lie outside of technological solutions. It’s not enough to construct a database that combines two disparate data sets if the context and tacit knowledge inherent in the data sets aren’t taken into consideration. Without a true understanding of the data, harmonization fails and, worse, leads to bad science.

 

Scientists, use your librarians! January 18, 2011

Filed under: Information Problems,Librarians — Betsy Rolland @ 8:00 am

The New York Times online today had an interesting article about lack of citations in clinical trial literature:

Trial in a Vacuum: Study of Studies Shows Few Citations

The article details a study in which the authors from Johns Hopkins University Medical School looked at over 1,500 clinical trial reports and found that “[n]o matter how many randomized clinical trials have been done on a particular topic, about half the clinical trials cite none or only one of them.” While clearly deciding what is relevant to a new trial is a judgment call, this seems impossible to justify.

One possible explanation not noted by the author (problematic in itself) is that, in an age of Google, everyone considers him/herself an expert searcher. In truth, that’s not the case. Librarians have master’s degrees and are trained in searching. My favorite quote from my recent study of librarians in biomedical research was “You don’t do your own statistics, why do you think you can do your own searching?”

At the same time, librarians have, for far too long, waited for their patrons to come to them. It’s my belief that librarians (and those who love them) need to be much more assertive about getting in front of their clients and making arguments for the benefits of their services. Admittedly, this is a challenge. First, what happens if they all take you up on it? Library budgets have been cut dramatically, leaving fewer staff to serve more students and faculty. Second, there is insufficient evidence on the value of library services to researchers. There are descriptions of services offered, of information behaviors and how scientists use the library, but not hard evidence that quantifies the benefits. This research is crucial to justify increasing library budgets, but there are few grant programs available to support this type of research.

Finally, I think this article provides a justification for adding librarians to institutional review boards (IRBs), journal review boards, grant review panels and funding agencies. If each grant or human subjects application underwent an independent search, perhaps lives would be saved as clinical trial PIs were forced to reconsider whether their approach constituted something new or something already disproven.

 

 