On the value of data – routinely vs purposefully

I listen to a bunch of podcasts, and “The Pitch” is one of them. In that podcast, entrepreneurs from start-up companies pitch their ideas to investors. Not only is it amusing to hear some of these crazy business ideas, but the podcast also helps me understand how professional life works outside of science. One thing I learned is that it is OK, if not expected, to oversell by about a factor of 142.

Another thing that I learned is the apparent value of data. The value of data seems to be undisputed in these pitches. In fact, the product or service the company is selling or providing is often only a byproduct: collecting data about their users which subsequently can be leveraged for targeted advertisement seems to be the big play in many start-up companies.

I think this type of “value of data” is what it is: whatever the investors want to pay for that type of data is what it is worth. But it got me thinking about the value of the data that we actually collect in medical research. Let us first take a look at routinely collected data, which can be very cheap to collect. But what is the value of that data? The problem is that routinely collected data is often incomplete, rife with errors and can lead to enormous biases – both information bias and selection bias. Still, some research questions can be answered with routinely collected data – as long as you make some real efforts to think about your design and analyses. So, there is value in routinely collected data, as it can provide a first glance into the matter at hand.

And what is the case for purposefully collected data? The idea behind this is that the data is much more reliable: trained staff collect data in a standardised way, resulting in datasets without many errors or holes. The downside is the “purpose”, which often limits the scope and thereby the amount of data collected per included individual. This is most obvious in randomised clinical trials, in which often millions of euros are spent to answer one single question. Trials often do not have the precision to provide answers to other questions. So it seems that the data can lose its value after answering that single question.

Luckily, many efforts have been made to let purposefully collected data keep some of its value even after it has served its purpose. Standardisation efforts between trials now make it possible to pool the data and thus obtain a higher precision. A good example from the field of stroke research is the VISTA collaboration, i.e. the “Virtual International Stroke Trials Archive”. Here, many trials – and later some observational studies – are combined to answer research questions with a precision that would otherwise never be possible. This way we can answer questions with high-quality, purposefully collected data in numbers otherwise unthinkable.
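
A rough sketch of why pooling helps, assuming a simple binary outcome and independent patients: the standard error of an estimated proportion shrinks with the square root of the sample size. The outcome proportion, trial size and number of trials below are made up for illustration and have nothing to do with the actual VISTA data.

```python
import math


def standard_error_of_proportion(proportion, n):
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(proportion * (1 - proportion) / n)


# Hypothetical numbers: 30% of patients reach the outcome, 500 patients per trial.
assumed_proportion = 0.30
patients_per_trial = 500
number_of_trials = 16

se_single = standard_error_of_proportion(assumed_proportion, patients_per_trial)
se_pooled = standard_error_of_proportion(
    assumed_proportion, patients_per_trial * number_of_trials
)

print(f"SE in a single trial: {se_single:.4f}")
print(f"SE after pooling:     {se_pooled:.4f}")
print(f"Improvement factor:   {se_single / se_pooled:.1f}")
```

Under these assumptions, pooling sixteen trials of the same size cuts the standard error to a quarter. Real pooled analyses also have to deal with differences between trials, which this sketch ignores.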

This brings me to a recent paper we published with data from the VISTA collaboration: “Early in-hospital exposure to statins and outcome after intracerebral haemorrhage”. The underlying question of whether and when statins should be initiated or continued after ICH is clinically relevant but also limited in scope and impact, so is it justified to start a trial? We took the easier and cheaper route and analysed the data from VISTA. We conclude that

… early in-hospital exposure to statins after acute ICH was associated with better functional outcome compared with no statin exposure early after the event. Our data suggest that this association is particularly driven by continuation of pre-existing statin use within the first two days after the event. Thus, our findings provide clinical evidence to support current expert recommendations that prevalent statin use should be continued during the early in-hospital phase.

And this shows the limitations of even well-collected data from RCTs: as long as the exposure of interest is potentially provided to a certain subgroup (i.e. confounding by indication), you can never really be certain about the treatment effects. To solve this, we would really need to break the bond between exposure and any other clinical characteristic, i.e. randomise. That remains the gold standard for intended effects of treatments. Still, our paper provided a piece of the puzzle and gave more insight, from data that retained some of its value due to standardisation and pooling. But there is no dollar value that we can put on medical research data – routinely or purposefully collected alike – as it all depends on the question you are trying to answer.

Our paper, with JD in the lead, was published last year in the European Stroke Journal, and can be found here as well as on my Publons profile and Mendeley profile.

Messy epidemiology: the tale of transient global amnesia and three control groups

Clinical epidemiology is sometimes messy. The methods and data that you might want to use might not be available or just too damn expensive. Does that mean that you should throw in the towel? I do not think so.

I am currently working in a more clinically oriented setting, as the only researcher trained as a clinical epidemiologist. I could tell you about being misunderstood and feeling lonely as the only one who has seen the light, but that would just be lying. The fact is that my position is one of privilege and opportunity, as I work together with many different groups on a wide variety of research questions that have the potential to influence clinical reality directly and bring small but meaningful progress to the field.

Sometimes that work is messy: not the right methods, a difference in interpretation, a p value in table 1… you get the idea. But sometimes something pretty comes out of that mess. That is what happened with this paper, which just got published online (e-pub) in the European Journal of Neurology. The general topic is the heart-brain interaction, and more specifically to what extent damage to the heart actually has a role in transient global amnesia. Now, the idea that there might be a link is due to some previous case series, as well as the clinical experience of some of my colleagues. The next step would of course be to do a formal case-control study, and if you want to estimate true rate ratios, a lot of effort has to go into the collection of data from a population-based control group. We had neither the time nor the money to do so, and upon closer inspection, we also did not really need that clean control group to answer some of the questions that would bring progress to the field.

So instead, we chose three different control groups, perhaps better referred to as reference groups, all three with some neurological disease. Yes, there are selections at play for each of these groups, but we could argue that those selections might be true for all groups. If these selection processes are similar for all groups, strong differences in patient characteristics or biomarkers suggest that other biological systems are at play. The trick is not to hide these limitations but, like a practiced judoka, to leverage these weaknesses and turn them into strengths. Be open about what you did and show the results, so that others can build on that experience.

So that is what we did. Compared with patients with migraine with aura, vestibular neuritis and transient ischemic attack, patients with transient global amnesia are more likely to exhibit signs of myocardial stress. This study was not designed to understand the cause of this link, nor will it even be able to, nor do we pretend that our odds ratios are in fact estimates of rate ratios or something fancy like that. Still, even though many aspects of this study are not “by the book”, it did provide some new insights that help further thinking about and investigations of this debilitating and impactful disease.

The effort was led by EH, and the final paper can be found here on PubMed.

FVIII, Protein C and the Risk of Arterial Thrombosis: More than the Sum of Its Parts.

Image source: https://www.youtube.com/watch?v=jGMRLLySc4w

Peer review is not a pissing contest. Peer review is not about finding the smallest of errors and delaying publication because of them. Peer review is not about being right. Peer review is not about rewriting the paper under review. Peer review is not about asking for yet another experiment.

Peer review is about making sure that the conclusions presented in the paper are justified by the data presented and peer review is about helping the authors get the best report on what they did.

At least that is what I try to remind myself of when I write my peer review report. So what happened when I wrote a peer review about a paper presenting data on the two hemostatic factors protein C and FVIII in relation to arterial thrombosis? These two proteins are known to have a direct interaction with each other. But does this also translate into a situation where the combination of the two risk factors is of the “have both, get extra risk for free” kind?

There are two approaches to test so-called interaction: statistical and biological. The authors presented one approach, while I thought the other approach was better suited to analyze and interpret the data. Did that result in an academic battle of arguments, or perhaps a peer review deadlock? No, the authors were quite civil and entertained my rambling thoughts and comments with additional analyses and results, but convinced me in the end that their approach has more merit in this particular situation. The editor of Thrombosis and Haemostasis saw this all going down and agreed with my suggestion that an accompanying editorial on this topic would help the readers understand what actually happened during the peer review process. The nice thing about this is that the editor asked me to write that editorial, which can be found here; the paper by Zakai et al can be found here.
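
To illustrate how the two approaches can disagree, here is a minimal sketch. It assumes the common convention that “biological” interaction is judged as departure from additivity (for example with the relative excess risk due to interaction, RERI), while “statistical” interaction is usually a product term in a regression model, which for ratio measures means departure from multiplicativity. Which approach the authors took is not spelled out here, and the relative risks and function names below are invented for illustration; they are not results from the paper by Zakai et al.

```python
def additive_interaction(rr_a_only, rr_b_only, rr_both):
    """Relative excess risk due to interaction (RERI); 0 means exactly additive."""
    return rr_both - rr_a_only - rr_b_only + 1


def multiplicative_interaction(rr_a_only, rr_b_only, rr_both):
    """Observed joint effect divided by the product of the separate effects;
    1 means exactly multiplicative."""
    return rr_both / (rr_a_only * rr_b_only)


# Hypothetical relative risks, each versus people with neither risk factor.
rr_high_fviii_only = 1.5
rr_low_protein_c_only = 2.0
rr_both = 3.0

reri = additive_interaction(rr_high_fviii_only, rr_low_protein_c_only, rr_both)
ratio = multiplicative_interaction(rr_high_fviii_only, rr_low_protein_c_only, rr_both)

print(f"RERI (additive scale):             {reri:.2f}")
print(f"Interaction ratio (multiplicative): {ratio:.2f}")
```

With these made-up numbers the joint effect is exactly what you would expect on the multiplicative scale (a ratio of 1), yet larger than the sum of the separate effects (a RERI of 0.5). Whether you see “extra risk for free” depends on the scale you choose.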

All this taught me a thing or two about peer review: cordial peer review is always better (duh!) than a peer review street brawl, and sharing aspects of the peer review process can help readers understand the paper in more detail. Open peer review, especially the parts where peer review is not anonymous and reports are open to readers after publication, is a way to foster both practices. In the meantime, this editorial will have to do.

new paper: pulmonary dysfunction and CVD outcome in the ELSA study

This is a special paper to me, as this is a paper that is 100% the product of my team at the CSB. Well, 100%? Not really. This is the first paper from a series of projects where we work with open data, i.e. data collected by others who subsequently shared it. A lot of people talk about open data, and how all the data created should be made available to other researchers, but not a lot of people talk about using that kind of data. For that reason we have picked a couple of data resources to see how easy it is to work with data that was not initially collected by ourselves.

It is hard, as we have now learned. Even though the studies we have focussed on (the ELSA study and UK Understanding Society) have a good description of their data and methods, understanding this takes time and effort. And even after putting in all the time and effort you might still not know all the little details and idiosyncrasies in this data.

A nice example lies in the exposure that we used in this analysis, pulmonary dysfunction. The data for this exposure was captured in several different datasets, in different variables. Reverse engineering a logical and interpretable concept out of these data points was not easy. This is perhaps also true for data that you do collect yourself, but then at least this thinking is more or less done before data collection starts and no reverse engineering is needed.

So we learned a lot. Not only about the role of pulmonary dysfunction as a cause of CVD (hint, it is limited), or about the different sensitivity analyses that we used to check the influence of missing data on the conclusions of our main analyses (hint, limited again), or the need to update an exposure that progresses over time (hint, relevant), but also about what it is like to use data collected by others (hint, useful but not easy).

The paper, titled “Pulmonary dysfunction and development of different cardiovascular outcomes in the general population” with IP as the first author, can be found here on PubMed or via my Mendeley profile.

predicting DVT with D-dimer in stroke patients: a rebuttal to our letter

Some weeks ago, I reported on a letter to the editor of Thrombosis Research on the question whether D-Dimer indeed does improve DVT risk prediction in stroke patients.

I was going to write a whole story on how one should not use a personal blog to continue the scientific debate. As you can guess, I ended up writing a full paragraph where I did this anyway. So I deleted that paragraph, and I am going to do a thing that requires some action from you. I am just going to leave you with the links to the letters and let you decide whether the issues we bring up, but also the corresponding rebuttal of the authors, help to interpret the results from the original publication.

How to set up a research group

A couple of weeks ago I wrote down some thoughts I had while writing a paper for the JTH series on Early Career Researchers. I was asked to write how one sets up a research group, and the four points I described in my previous post can be recognised in the final paper.

But I also added some reading tips in the paper. Reading on a particular topic helps me not only to learn what is written in the books, but also to get my mind into a certain mindset. So, when I knew that I was going to take over a research group in Berlin, I read a couple of books, both fiction and non-fiction. Some were about Berlin (e.g. Cees Nooteboom’s Berlijn 1989/2009), some were focussed on academic life (e.g. Porterhouse Blue). They help to get my mind in a certain gear and prepare me for what is coming. In that sense, my bookcase says a lot about myself.

Number one on the list of recommended reads is the category of standard management best sellers, as I wrote in the text box:

// Management books. There are many titles that I can mention here; whether it is the best-seller Seven Habits of Highly Effective People or any of the smaller booklets by Ken Blanchard, I am convinced that reading some of these texts can help you in your own development as a group leader. Perhaps you will like some of the techniques and approaches that are proposed and decide to adopt them. Or, like me, you may initially find yourself irritated because you cannot envision the approaches working in the academic setting. If this happens, I encourage you to keep reading because even in these cases, I learned something about how academia works and what my role as a group leader could be through this process of reflection. My absolute top recommendation in this category is Leadership and Self-Deception: a text that initially got on my nerves but in the end taught me a lot.

I really think that is true. You should not only read books that you agree with, or whose story you enjoy. Sometimes you can like a book not for its content but for the way it makes you question your own preexisting beliefs and habits. But it is true that this sometimes makes it difficult to actually finish such a book.

Next to books, I am quite into podcasts so I also wrote

// Start up. Not a book, but a podcast from Gimlet media about “what it’s really like to get a business off the ground.” It is mostly about tech start-ups, but the issues that arise when setting up a business are in many ways similar to those you encounter when you are starting up a research group. I especially enjoyed seasons 1 and 3.

I thought about including the sponsored podcast “open for business” from Gimlet Creative, as it touches upon some very relevant aspects of starting something new. But for me the jury is still out on the “sponsored podcast” concept – it is branded content from Amazon, and I am not sure to what extent I like that. For now, I do not like it enough to include it in the list in my JTH paper.

The paper is not online yet due to the summer break, but I will provide a link asap.

– update 11.10.2016 – here is a link to the paper.

Does d-dimer really improve DVT prediction in stroke?

Good question, and even though thromboprophylaxis is already given according to guidelines in some countries, I can see the added value of a well-discriminating prediction rule. Especially finding those patients with low DVT risk might be useful. But using d-dimer is a whole other question. To answer this, a thorough prediction model needs to be set up both with and without the information of d-dimer, and only a direct comparison of these two models will provide the information we need.
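
To make that comparison concrete, here is a minimal Python sketch of what comparing discrimination with and without d-dimer could look like. The predicted risks are invented for illustration (they are not data from any study), and `auc` is a hypothetical helper that computes the area under the ROC curve as the proportion of case/non-case pairs that a model ranks correctly.

```python
def auc(predicted_risks, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    cases = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    controls = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical data: 1 = developed DVT, 0 = did not.
dvt = [0, 0, 0, 0, 1, 1, 1, 0]
# Hypothetical predicted risks from a model without and with d-dimer.
risk_without_ddimer = [0.10, 0.20, 0.30, 0.15, 0.25, 0.40, 0.12, 0.35]
risk_with_ddimer = [0.08, 0.18, 0.28, 0.12, 0.33, 0.45, 0.20, 0.30]

auc_without = auc(risk_without_ddimer, dvt)
auc_with = auc(risk_with_ddimer, dvt)

print(f"AUC without d-dimer: {auc_without:.2f}")
print(f"AUC with d-dimer:    {auc_with:.2f}")
print(f"Difference:          {auc_with - auc_without:.2f}")
```

In a real analysis the predicted risks would come from two fitted models, ideally with some form of validation and a confidence interval around the difference in AUC.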

In our view, that is not what the paper by Balogun et al did. And after critical appraisal of the tables and text, we found some inconsistencies that prevent the reader from understanding what exactly was done and which results were obtained. In the end, we decided to write a letter to the editor, especially to prevent other readers from mistakenly adopting the conclusion of the authors. This conclusion is that “D-dimer concentration within 48 h of acute stroke is independently associated with development of DVT. This observation would require confirmation in a large study.” Our opinion is that the data from this study need to be analysed properly to justify such a conclusion. One of the key elements in our letter is that the authors never compare the AUC of the model with and without d-dimer. This is needed, as that would provide the bulk of the answer to whether or not d-dimer should be measured. The only clue we have are the ORs of d-dimer, which range between 3 and 4, which is not really impressive when it comes to diagnosis and prediction. For more information on this, please check this paper on the misuse of the OR as a measure of interest for diagnosis/prediction by Pepe et al.
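
Why is an OR of 3 to 4 unimpressive for prediction? A back-of-the-envelope sketch, assuming for simplicity a dichotomised marker: the odds ratio links the true-positive and false-positive fractions through OR = [TPR/(1-TPR)] / [FPR/(1-FPR)], so for a given false-positive fraction you can solve for the sensitivity. The function name and the chosen false-positive fraction of 10% are illustrative assumptions, not numbers from the paper under discussion.

```python
def tpr_from_odds_ratio(odds_ratio, fpr):
    """Sensitivity implied by an odds ratio at a given false-positive fraction."""
    if not 0 < fpr < 1:
        raise ValueError("fpr must be strictly between 0 and 1")
    # Odds of a positive test among cases = OR x odds of a positive test among non-cases.
    odds_positive_in_cases = odds_ratio * fpr / (1 - fpr)
    return odds_positive_in_cases / (1 + odds_positive_in_cases)


for odds_ratio in (3, 4, 16, 36):
    tpr = tpr_from_odds_ratio(odds_ratio, fpr=0.10)
    print(f"OR = {odds_ratio:>2}: sensitivity = {tpr:.2f} at 10% false positives")
```

With an OR of 3, only a quarter of the patients who go on to develop DVT would test positive if we accept 10% false positives; under the same assumptions it takes an OR of 36 to reach a sensitivity of 80%.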

A final thing I want to mention is that our letter was the result of a mini-internship of one of the students at the Master programme of the CSB and was drafted in collaboration with our Virchow scholar HGdH from the Netherlands. Great team work!

The letter can be found on the website of Thrombosis Research as well as on my Mendeley profile.