On the value of data – routinely vs purposefully collected

I listen to a bunch of podcasts, and “The Pitch” is one of them. In that podcast, entrepreneurs from start-up companies pitch their ideas to investors. Not only is it amusing to hear some of these crazy business ideas, but the podcast also helps me understand how professional life works outside of science. One thing I learned is that it is OK, if not expected, to oversell by about a factor of 142.

Another thing that I learned is the apparent value of data. The value of data seems to be undisputed in these pitches. In fact, the product or service the company is selling is often only a byproduct: collecting data about their users, which can subsequently be leveraged for targeted advertising, seems to be the big play in many start-up companies.

I think this type of “value of data” is what it is: whatever the investors want to pay for that type of data is what it is worth. But it got me thinking about the value of the data we actually collect in medical research. Let us first take a look at routinely collected data, which can be very cheap to collect. But what is the value of that data? The problem is that routinely collected data is often incomplete, rife with error, and can lead to enormous biases – both information bias and selection bias. Still, some research questions can be answered with routinely collected data, as long as you make a real effort to think about your design and analyses. So there is value in routinely collected data, as it can provide a first glance at the matter at hand.

And what is the case for purposefully collected data? The idea is that such data is much more reliable: trained staff collect the data in a standardised way, resulting in datasets without many errors or holes. The downside is the “purpose”, which often limits the scope and thereby the amount of data collected per included individual. This is most obvious in randomised clinical trials, in which millions of euros are often spent to answer one single question. Trials often do not have the precision to provide answers to other questions. So it seems that the data can lose its value after answering that single question.

Luckily, many efforts have been made to let purposefully collected data keep some of its value even after it has served its purpose. Standardisation efforts between trials now make it possible to pool the data and thus obtain higher precision. A good example from the field of stroke research is the VISTA collaboration, i.e. the Virtual International Stroke Trials Archive. Here, many trials – and later some observational studies – are combined to answer research questions with a precision that would otherwise never be possible. This way, we can answer questions with high-quality, purposefully collected data in numbers that would otherwise be unthinkable.

This brings me to a recent paper we published with data from the VISTA collaboration: “Early in-hospital exposure to statins and outcome after intracerebral haemorrhage”. The underlying question – whether and when statins should be initiated or continued after ICH – is clinically relevant but also limited in scope and impact, so is it justified to start a trial? We took the easier and cheaper route and analysed the data from VISTA. We conclude that

… early in-hospital exposure to statins after acute ICH was associated with better functional outcome compared with no statin exposure early after the event. Our data suggest that this association is particularly driven by continuation of pre-existing statin use within the first two days after the event. Thus, our findings provide clinical evidence to support current expert recommendations that prevalent statin use should be continued during the early in-hospital phase.


And this shows the limitations of even well-collected data from RCTs: as long as the exposure of interest is preferentially given to a certain subgroup (i.e. confounding by indication), you can never really be certain about the treatment effects. To solve this, we would need to break the bond between the exposure and any other clinical characteristic, i.e. randomise. That remains the gold standard for studying the intended effects of treatments. Still, our paper provides a piece of the puzzle and gives more insight, from data that retained some of its value thanks to standardisation and pooling. But there is no dollar value we can put on medical research data – routinely or purposefully collected alike – as it all depends on the question you are trying to answer.

Our paper, with JD in the lead, was published last year in the European Stroke Journal, and can be found here as well as on my Publons profile and Mendeley profile.

Kuopio Stroke Symposium

Kuopio in summer

Every year, a neurology symposium is organized in the quiet and beautiful town of Kuopio in Finland. Every three years, as was the case this year, the topic is stroke, and for that reason I was invited to be part of the faculty. A true honor, especially if you consider the other speakers on the program, who all delivered excellent talks!

But these symposia are about much more than just the hard, cold science and prestige. They are also about making new friends and reconnecting with old ones. Leave that up to the Finns, whose decision to get us all on a boat and later in a sauna after a long day in the lecture hall proved to be a stroke of genius.

So it was not for nothing that many of the talks boiled down to the idea that the best science is done with friends – in a team. This is true whether you are running a complex international stroke rehabilitation RCT, or investigating the lower risk of CVD morbidity and mortality amongst frequent sauna visitors. Or, in my case, studying the role of hypercoagulability in young stroke – a pdf of my slides can be found here.

New paper: Contribution of Established Stroke Risk Factors to the Burden of Stroke in Young Adults


A relative risk alone is not enough to fully understand the implications of your findings. Sure, if you are an expert in a field, the context of that field will help you assess the RR. But if you are not, the context of the numerator and denominator is often lost. There are several ways to work towards that. If you have a question that revolves around group discrimination (i.e. questions of diagnosis or prediction), the RR needs to be understood in relation to other predictors or diagnostic variables. That combination is best assessed through added discriminatory value, such as the improvement in AUC, or fancier methods like reclassification tables and net benefit indices. But if you are interested in a single factor (e.g. in questions of causality or treatment), a number needed to treat (NNT) or the population attributable fraction (PAF) can be used.
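To make those two single-factor measures concrete, here is a minimal sketch in Python. All numbers are made up purely for illustration; the PAF is computed with Levin's formula.

```python
# Minimal sketch (hypothetical numbers): putting a relative risk into context
# with two single-factor measures, the NNT and the PAF (Levin's formula).
risk_exposed = 0.08    # hypothetical risk of the outcome among the exposed
risk_unexposed = 0.05  # hypothetical risk among the unexposed
p_exposed = 0.30       # hypothetical prevalence of the exposure in the population

rr = risk_exposed / risk_unexposed       # relative risk
risk_difference = risk_exposed - risk_unexposed
nnt = 1 / abs(risk_difference)           # number needed to treat (or to harm)

# Levin's formula: fraction of all cases in the population attributable to the exposure
paf = p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))

print(f"RR = {rr:.2f}, NNT/NNH = {nnt:.0f}, PAF = {paf:.1%}")
# RR = 1.60, NNT/NNH = 33, PAF = 15.3%
```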

The PAF has been the subject of my publications before, for example in these papers where we use the PAF to provide context for the different ORs of markers of hypercoagulability in the RATIO study / in a systematic review. This paper is a more general text, as it is meant to give non-epidemiologists insight into what epidemiology can bring to the field of law. Here, the PAF is an interesting measure, as it is related to the etiological fraction – a number that can be very interesting in tort law. Some of my slides from a law symposium that I attended address these questions and that particular Dutch tort law case.

But the PAF is and remains an epidemiological measure: it tells us what fraction of the cases in the population can be attributed to the exposure of interest. You can combine the PAFs of several factors into a single number (given some assumptions, which basically boil down to the idea that the combined factors work on an exact multiplicative scale, both statistically and biologically). A 2016 Lancet paper that made a huge impact and increased interest in the concept of the PAF was the INTERSTROKE paper. It showed that up to 90% of all stroke cases can be attributed to only 10 factors, all of them modifiable.
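As an aside, this is a minimal sketch of how factor-specific PAFs are combined under that multiplicative assumption. The individual PAF values below are made up and are not the INTERSTROKE estimates.

```python
# Combining factor-specific PAFs into one overall PAF, assuming the factors
# act independently on a multiplicative scale. Values are illustrative only.
pafs = [0.35, 0.25, 0.20, 0.15]

not_attributable = 1.0
for paf in pafs:
    not_attributable *= (1 - paf)  # fraction of cases not explained by this factor

combined_paf = 1 - not_attributable
print(f"Combined PAF = {combined_paf:.1%}")  # ~66.9% in this toy example
```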

We had the question whether this was the same for young stroke patients. After all, the longstanding idea is that young stroke is a different disease from old stroke, one in which traditional CVD risk factors play a less prominent role. The idea is that more exotic causal mechanisms (e.g. hypercoagulability) play a more prominent role in this age group. Boy, were we wrong. In a dataset combining data from the SIFAP and GEDA studies, we found that the bulk of the cases can be attributed to modifiable risk factors (about 80% to just four risk factors). There are some elements of the paper (an age effect even within the young study population, subtype effects, definition effects) that I won't go into here. For those you need to read the paper – published in Stroke – here, or via my Mendeley account. The main work was done by AA and UG. Great job!

Advancing prehospital care of stroke patients in Berlin: a new study to see the impact of STEMO on functional outcome

There are strange ambulances driving around in Berlin. They are the so-called STEMO cars, or Stroke Einsatz Mobile: essentially mobile stroke units. They can make a CT scan on board to rule out bleeds and subsequently start thrombolysis before getting to the hospital. A previous study showed that this decreases time to treatment by ~25 minutes. The question now is whether the patients are indeed better off in terms of functional outcome. For that we are currently running the B_PROUD study, of which we recently published the design here.

The paradox of the BMI paradox


I had the honor to be invited to the PHYSBE research group in Gothenburg, Sweden. I got to talk about the paradox of the BMI paradox. In the announcement abstract I wrote:

“The paradox of the BMI paradox”
Many fields have their own so-called “paradox”, where a risk factor in certain instances suddenly seems to be protective. A good example is the BMI paradox, where high BMI in some studies seems to be protective of mortality. I will argue that these paradoxes can be explained by a form of selection bias. But I will also discuss that these paradoxes have provided researchers with much more than just an erroneous conclusion on the causal link between BMI and mortality.

I first address the problem of BMI as an exposure. Easy stuff. But then we come to index event bias, or collider stratification bias, and how selection matters – in recurrence research paradoxes like PFO & stroke, or in health-status research like BMI – and can introduce confounding into the equation.
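To show what I mean, here is a minimal simulation sketch of collider stratification bias (all numbers are made up): high BMI and a severe disease both make hospitalisation more likely, mortality is driven only by the disease, and yet within the hospitalised patients high BMI suddenly looks protective.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

high_bmi = rng.random(n) < 0.3        # exposure of interest
severe_disease = rng.random(n) < 0.1  # competing reason to end up in hospital

# hospitalisation (the collider) depends on both BMI and the disease
p_hospital = 0.05 + 0.25 * high_bmi + 0.50 * severe_disease
hospitalised = rng.random(n) < p_hospital

# mortality is driven by the disease, not by BMI
death = rng.random(n) < (0.02 + 0.30 * severe_disease)

def mortality_risk(mask):
    return death[mask].mean()

rr_population = mortality_risk(high_bmi) / mortality_risk(~high_bmi)
rr_hospitalised = (mortality_risk(high_bmi & hospitalised) /
                   mortality_risk(~high_bmi & hospitalised))

print(f"RR of death for high BMI, whole population: {rr_population:.2f}")    # ~1.0
print(f"RR of death for high BMI, hospitalised only: {rr_hospitalised:.2f}") # well below 1
```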

I see that this confounding might not be enough to explain all that is observed in observational research, so I continued looking for other reasons why there are such strong feelings about these paradoxes. Do they exist, or don't they? I found that the two sides tend to talk in two different worlds. One side talks about causal research and asks what we can learn from the biological systems that might play a role, whereas the other thinks from a clinical point of view and starts talking about RCTs and the need for weight control programs in patients. But there is a huge difference in study design, research question and interpretation of results between the studies that they cite and interpret. Perhaps part of the paradox can be explained by this misunderstanding.

But the cool thing about the paradox is that through complicated topics, new hypotheses, interesting findings and strong feelings about the existence of paradoxes, I think we can all agree: the field of obesity research has won in the end. And by winning I mean that the methods are now better described, better discussed and better applied. New hypotheses are being generated and confirmed or refuted. All in all, the field makes progress not despite, but because of the paradox. A paradox that doesn't even exist. How is that for a paradox?

All in all an interesting day, and I think I made some friends in Gothenburg. Perhaps we can do some cool science together!

Slides can be found here.

Does d-dimer really improve DVT prediction in stroke?


Good question, and even though thromboprophylaxis is already given according to guidelines in some countries, I can see the added value of a well-discriminating prediction rule. Especially finding those patients with a low DVT risk might be useful. But whether d-dimer should be part of that rule is a whole other question. To answer it, a thorough prediction model needs to be set up both with and without the d-dimer information, and only a direct comparison of these two models will provide the information we need.

In our view, that is not what the paper by Balogun et al. did. And after critical appraisal of the tables and text, we found some inconsistencies that prevent the reader from understanding what exactly was done and which results were obtained. In the end, we decided to write a letter to the editor, especially to prevent other readers from mistakenly taking over the authors' conclusion, namely that “D-dimer concentration within 48 h of acute stroke is independently associated with development of DVT. This observation would require confirmation in a large study.” Our opinion is that the data from this study need to be analysed properly to justify such a conclusion. One of the key elements in our letter is that the authors never compare the AUC of the model with and without d-dimer. This is needed, as that comparison would provide the bulk of the answer to whether or not d-dimer should be measured. The only clue we have are the ORs of d-dimer, which range between 3 and 4 – not really impressive when it comes to diagnosis and prediction. For more information on this, please check this paper by Pepe et al. on the misuse of the OR as a measure of interest for diagnosis/prediction.
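For what it is worth, this is the kind of comparison we had in mind. The sketch below uses simulated data; the variable names and numbers are hypothetical and have nothing to do with the actual study.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2_000

# hypothetical baseline predictors and a d-dimer measurement
df = pd.DataFrame({
    "age": rng.normal(70, 10, n),
    "immobile": rng.binomial(1, 0.4, n),
    "ddimer": rng.lognormal(0, 1, n),
})
# hypothetical DVT outcome, loosely driven by all three predictors
logit = -4 + 0.03 * df["age"] + 0.8 * df["immobile"] + 0.3 * np.log(df["ddimer"])
df["dvt"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

train, test = train_test_split(df, test_size=0.3, random_state=1)

base = ["age", "immobile"]
full = base + ["ddimer"]
model_base = LogisticRegression().fit(train[base], train["dvt"])
model_full = LogisticRegression().fit(train[full], train["dvt"])

auc_base = roc_auc_score(test["dvt"], model_base.predict_proba(test[base])[:, 1])
auc_full = roc_auc_score(test["dvt"], model_full.predict_proba(test[full])[:, 1])
print(f"AUC without d-dimer: {auc_base:.2f}")
print(f"AUC with d-dimer:    {auc_full:.2f}")
```

Only if the second AUC is meaningfully higher than the first does measuring d-dimer add discriminative value, no matter how large its OR is.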

A final thing I want to mention is that our letter was the result of a mini-internship of one of the students in the Master's programme at the CSB, and was drafted in collaboration with our Virchow scholar HGdH from the Netherlands. Great team work!

The letter can be found on the website of Thrombosis Research as well as on my Mendeley profile.

 

The ECTH 2016 in The Hague

My first conference experience (ISTH 2008, Boston) got me hooked on science. All these people doing the same thing, speaking the same language, and looking to show and share their knowledge. This is even more true when you are involved in the organisation. Organising the international soccer match at the Olympic stadium in Amsterdam, linked to the ISTH 2013 to celebrate the 25th anniversary of the NVTH, was fun. And let's not forget the exciting challenge of organising the WEON 2014.

And now, the birth of a new conference: the European Congress of Thrombosis and Hemostasis, which will be held in The Hague, in the Netherlands (28–30 September 2016). I am very excited for several reasons. First of all, this conference will fill the gap in the years between the biennial ISTH conferences. Second, I have the honor to help out as chair of the junior advisory board. Third, The Hague – my old home town!

So, we have 10 months to organise some interesting meetings and activities, primarily focused on young researchers. Time to get started!