On the value of data – routinely vs purposefully

I listen to a bunch of podcasts, and “The Pitch” is one of them. In that podcast, entrepreneurs from start-up companies pitch their ideas to investors. Not only is it amusing to hear some of these crazy business ideas, but the podcast also helps me to understand how professional life works outside of science. One thing I learned is that it is OK, if not expected, to oversell by about a factor of 142.

Another thing that I learned is the apparent value of data. The value of data seems to be undisputed in these pitches. In fact, the product or service the company is selling or providing is often only a byproduct: collecting data about their users which subsequently can be leveraged for targeted advertisement seems to be the big play in many start-up companies.

I think this type of “value of data” is what it is: whatever the investors want to pay for that type of data is what it is worth. But it got me thinking about the value of the data that we actually collect in medicine. Let us first take a look at routinely collected data, which can be very cheap to collect. But what is the value of such data? The problem is that routinely collected data is often incomplete, rife with errors and can lead to enormous biases – both information bias and selection bias. Still, some research questions can be answered with routinely collected data – as long as you make a real effort to think about your design and analyses. So, there is value in routinely collected data, as it can provide a first glance into the matter at hand.

And what is the case for purposefully collected data? The idea behind this is that the data is much more reliable: trained staff collect data in a standardised way, resulting in datasets without many errors or holes. The downside is the “purpose”, which often limits the scope and thereby the amount of data collected per included individual. This is most obvious in randomised clinical trials, in which millions of euros are often spent to answer one single question. Trials often do not have the precision to provide answers to other questions. So it seems that the data can lose its value after answering that single question.

Luckily, many efforts have been made to let purposefully collected data keep some of its value even after it has served its purpose. Standardisation efforts between trials now make it possible to pool the data and thus obtain a higher precision. A good example from the field of stroke research is the VISTA collaboration, i.e. the “Virtual International Stroke Trials Archive”. Here, many trials – and later some observational studies – are combined to answer research questions with a precision that would otherwise never be possible. This way we can answer questions with high-quality, purposefully collected data in numbers otherwise unthinkable.

This brings me to a recent paper we published with data from the VISTA collaboration: “Early in-hospital exposure to statins and outcome after intracerebral haemorrhage”. The underlying question of whether and when statins should be initiated or continued after intracerebral haemorrhage (ICH) is clinically relevant but also limited in scope and impact, so is it justified to start a trial? We took the easier and cheaper solution and analysed the data from VISTA. We conclude that

… early in-hospital exposure to statins after acute ICH was associated with better functional outcome compared with no statin exposure early after the event. Our data suggest that this association is particularly driven by continuation of pre-existing statin use within the first two days after the event. Thus, our findings provide clinical evidence to support current expert recommendations that prevalent statin use should be continued during the early in-hospital phase.


And this shows the limitations of even well-collected data from RCTs: as long as the exposure of interest is potentially provided to a certain subgroup (i.e. confounding by indication), you can never really be certain about the treatment effects. To solve this, we would really need to break the bond between exposure and any other clinical characteristic, i.e. randomize. That remains the gold standard for intended effects of treatments. Still, our paper provided a piece of the puzzle and gave more insight, from data that retained some of its value due to standardisation and pooling. But there is no dollar value that we can put on medical research data – routinely or purposefully collected alike – as it all depends on the question you are trying to answer.

Our paper, with JD in the lead, was published last year in the European Stroke Journal, and can be found here as well as on my Publons profile and Mendeley profile.

Intrinsic Coagulation Pathway, History of Headache, and Risk of Ischemic Stroke: a story about interacting risk factors

Yup, another paper from the long-standing collaboration with Leiden. This time, it was PhD candidate HvO who came up with the idea to take a look at the risk of stroke in relation to two risk factors that independently increase the risk. So what then is the new part of this paper? It is about the interaction between the two.

Migraine is a known risk factor for ischemic stroke in young women. Previous work also indicated that increased levels of the intrinsic coagulation proteins are associated with an increase in ischemic stroke risk. Both roughly double the risk. So what does the combination do?

Let us take a look at the results of the analyses in the RATIO study. High antigen levels of coagulation factor FXI are associated with a relative risk of 1.7. A history of severe headache doubles the risk of ischemic stroke. So what can we expect if both risks just add up? Well, we need to take into account the standard risk that everybody has, which is an RR of 1. Then we add the excess risk, in terms of RR, of each of the two risk factors. For FXI this is (1.7-1=) 0.7. For headache that is (2.0-1=) 1.0. So we would expect an RR of (1+0.7+1.0=) 2.7. However, we found that the women who had both risk factors had a 5-fold increase in risk, more than what can be expected.
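The arithmetic above can be written down as a small sketch. The function names are my own and purely illustrative; the numbers are the rounded relative risks quoted in this post, not the exact estimates from the paper. The last quantity, the observed joint RR minus what additivity predicts, is what epidemiologists usually call the relative excess risk due to interaction (RERI).

```python
def expected_rr_additive(rr_a: float, rr_b: float) -> float:
    """Joint RR expected if the two excess risks simply add up.

    Baseline RR of 1, plus the excess of factor A, plus the excess of factor B.
    """
    return 1.0 + (rr_a - 1.0) + (rr_b - 1.0)


def reri(rr_both: float, rr_a: float, rr_b: float) -> float:
    """Relative excess risk due to interaction: observed minus expected joint RR."""
    return rr_both - expected_rr_additive(rr_a, rr_b)


# Rounded values as quoted in the text (illustrative, not the exact estimates).
rr_fxi = 1.7       # high FXI antigen levels only
rr_headache = 2.0  # history of severe headache only
rr_both = 5.0      # both risk factors present

expected = expected_rr_additive(rr_fxi, rr_headache)
excess = reri(rr_both, rr_fxi, rr_headache)

print(f"Expected joint RR under additivity: {expected:.1f}")
print(f"Excess beyond additivity (RERI):    {excess:.1f}")
```

A RERI above zero means the joint effect is larger than the sum of the separate effects; with these rounded numbers the excess is about 2.3 on the RR scale.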

For those who are keeping track, I am of course talking about additive interaction, sometimes referred to as biological interaction. This concept is quite different from statistical interaction, which – for me – is a useless thing to look at when your underlying research question is of a causal nature.

What does this mean? You could interpret this as meaning that some women only develop the disease because they are exposed to both risk factors. In some way, that combination becomes a third ‘risk entity’ that increases the risk in the population. How that works on a biochemical level cannot be answered with this epidemiological study, but some hints from the literature do exist, as we discuss in our paper.

Of course, some caveats have to be taken into account. In addition to the standard limitations of case-control studies, two things stand out. First, because we study the combination of two risk factors, the precision of our study is relatively low. But then again, what other study is going to answer this question? The absolute risk of ischemic stroke is too low in the general population to perform prospective studies, even when enriched with loads of migraineurs. Second, the questionnaires used do not allow us to conclude that the women who report severe headache actually have migraine. Our assumption is that many, if not most, do. Mixing ‘normal’ headaches with migraines in one group would only lead to an underestimation of the true effect of migraine on stroke risk, but still, we have to be careful and therefore stick to the term ‘headache’.

HvO took the lead in this project, which included two short visits to Berlin supported by our Virchow scholarship. The paper has been published in Stroke and can be seen ahead of print on their website.

medRxiv: the pre-print server for medicine

Pre-print servers are a place to share your academic work before actual peer review and subsequent publication. They are not completely new to academia, as many different disciplines have adopted pre-print servers to quickly share ideas and keep the academic discussion going. Many have praised the informal peer review that you get when you post on pre-print servers, but I primarily like the speed.

But medicine is not one of those disciplines. Up until recently, the medical community had to use bioRxiv, a pre-print server for biology. That was very unsatisfactory, as the fields are just too far apart and the idiosyncrasies of the medical sciences bring some extra requirements (e.g. ethical approval, trial registration, etc.). So here comes medRxiv, from the makers of bioRxiv with support of the BMJ. Let’s take a moment to let the people behind medRxiv explain the concept themselves.

source: https://www.medrxiv.org/content/about-medrxiv

I love it. I am not sure whether it will be adopted by the community at the same pace as in some other disciplines, but doing nothing will never be part of the way forward. Critical participation is the only way.

So, that’s what I did. I wanted to be part of this new thing and convinced my co-authors to use the pre-print concept. I focussed my efforts on the paper in which we describe the BeLOVe study. This is a big cohort we are currently setting up, and in a way it is therefore well-suited for pre-print servers. The pre-print servers allow us to describe what we want at the level of detail of our choice, without restrictions on word count, appendices, or tables and graphs. The speediness is also welcome, as we want to inform the world about our efforts while we are still in the pilot phase and are still able to tweak the design here or there. And that is actually what happened: after being online for a couple of days, our pre-print already sparked some ideas from others.

Now we have to see how much effort it took us, and how much benefit we drew from this extra effort. It would be great if all journals permitted pre-prints (not all do…) and if submitting to a journal were just a “one click” kind of effort after jumping through the hoops for medRxiv.

This is not my first pre-print. For example, the paper that I co-authored on the timely publication of trials from Germany was posted on bioRxiv. But being the guy who actually uploads the manuscript is a whole different feeling.

Messy epidemiology: the tale of transient global amnesia and three control groups

Clinical epidemiology is sometimes messy. The methods and data that you might want to use might not be available or just too damn expensive. Does that mean that you should throw in the towel? I do not think so.

I am currently working in a more clinically oriented setting, as the only researcher trained as a clinical epidemiologist. I could tell you about being misunderstood and feeling lonely as the only one who has seen the light, but that would just be lying. The fact is that my position is one of privilege and opportunity, as I work together with many different groups on a wide variety of research questions that have the potential to influence clinical reality directly and bring small but meaningful progress to the field.

Sometimes that work is messy: not the right methods, a difference in interpretation, a p value in table 1… you get the idea. But sometimes something pretty comes out of that mess. That is what happened with this paper, which just got published online (e-pub) in the European Journal of Neurology. The general topic is the heart-brain interaction, and more specifically to what extent damage to the heart actually has a role in transient global amnesia. Now, the idea that there might be a link is due to some previous case series, as well as the clinical experience of some of my colleagues. The next step would of course be to do a formal case-control study, and if you want to obtain true estimates of rate ratios, a lot of effort has to go into the collection of data from a population-based control group. We had neither the time nor the money to do so, and upon closer inspection, we also did not really need that clean control group to answer some of the questions that would bring progress to the field.

So instead, we chose three different control groups, perhaps better referred to as reference groups, all three with some neurological disease. Yes, there are selections at play for each of these groups, but we could argue that those selections might be true for all groups. If these selection processes are similar for all groups, strong differences in patient characteristics or biomarkers suggest that other biological systems are at play. The trick is not to hide these limitations but, like a practiced judoka, to leverage these weaknesses and turn them into strengths. Be open about what you did and show the results, so that others can build on that experience.

So that is what we did. Compared with patients with migraine with aura, vestibular neuritis or transient ischemic attack, patients with transient global amnesia are more likely to exhibit signs of myocardial stress. This study was not designed to understand the cause of this link, nor would it even be able to, nor do we pretend that our odds ratios are in fact estimates of rate ratios or something fancy like that. Still, even though many aspects of this study are not “by the book”, it did provide some new insights that help further thinking about, and investigation of, this debilitating and impactful disease.

The effort was led by EH, and the final paper can be found here on PubMed.

Impact of your results: Beyond the relative risk

I wrote about this in an earlier topic: JLR and I published a paper in which we explain that a single relative risk, irrespective of its form, is just not enough. Some crucial elements go missing in this dimensionless ratio. The RR could allow us to forget about the size of the denominator, the clinical context, and the crude binary nature of the outcome. So we have provided some methods and ways of thinking to go beyond the RR in a tutorial published in RPTH (now in early view). The content and message are nothing new for those trained in clinical research (one would hope). Even those without formal training will have heard most of these concepts discussed in a talk or on a poster. But with all these concepts in one place, with an explanation of why they provide a tad more insight than the RR alone, we hope that we will trigger young (and older) researchers to think about whether one of these measures would be useful. Not for them, but for the readers of their papers. The paper is open access (CC BY-NC-ND 4.0), and can be downloaded from the website of RPTH, or from my Mendeley profile.
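To make the point about the forgotten denominator concrete, here is a minimal sketch of my own. It is not taken from the tutorial, and the baseline risks are hypothetical numbers chosen only for illustration. It shows how the same relative risk translates into very different absolute risk differences, depending on how common the outcome is among the unexposed.

```python
def risk_difference(baseline_risk: float, rr: float) -> float:
    """Absolute excess risk among the exposed, given the baseline risk and the RR."""
    return baseline_risk * rr - baseline_risk


rr = 2.0  # the same relative risk in both scenarios

# Hypothetical baseline risks among the unexposed (illustration only).
scenarios = {"rare outcome": 1 / 10_000, "common outcome": 1 / 100}

results = {}
for label, baseline in scenarios.items():
    rd = risk_difference(baseline, rr)
    results[label] = rd
    print(f"{label}: RR = {rr}, extra cases per 10,000 exposed = {rd * 10_000:.0f}")
```

A doubling of risk means one extra case per 10,000 exposed people in the first scenario and one hundred in the second, which is exactly the kind of information a lone RR hides.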

Advancing prehospital care of stroke patients in Berlin: a new study to see the impact of STEMO on functional outcome

There are strange ambulances driving around in Berlin. They are the so-called STEMO cars, or Stroke Einsatz Mobile: basically, stroke units on wheels. They have the possibility to make a CT scan to rule out bleeds and subsequently start thrombolysis before getting to the hospital. A previous study showed that this decreases time to treatment by ~25 minutes. The question now is whether the patients are indeed better off in terms of functional outcome. For that we are currently running the B_PROUD study, of which we recently published the design here.

Virchow’s triad and lessons on the causes of ischemic stroke

I wrote a blog post for BMC, the publisher of Thrombosis Journal, to celebrate blood clot awareness month. I took my two favorite subjects, i.e. stroke and coagulation, added some history, and voilà! The BMC version can be found here.

When I look out of the window of my office at the Charité hospital in the middle of Berlin, I see the old pathology building in which Rudolf Virchow used to work. The building is just as monumental as the legacy of this famous pathologist, who gave us what is now known as Virchow’s triad for thrombotic diseases.

In ‘Thrombose und Embolie’, published in 1856, he postulated that the consequences of thrombotic disease can be attributed to one of three categories: phenomena of interrupted blood flow, phenomena associated with irritation of the vessel wall and its vicinity, and phenomena of blood coagulation. This concept has now been modified to describe the causes of thrombosis and has since been a guiding principle for many thrombosis researchers.

The traditional split in interest between arterial thrombosis researchers, who focus primarily on the vessel wall, and venous thrombosis researchers, who focus more on hypercoagulation, might not be justified. Take ischemic stroke for example. Lesions of the vascular wall are definitely a cause of stroke, but perhaps only in the subset of patients who experience a so-called large vessel ischemic stroke. It is also well established that a disturbance of blood flow in atrial fibrillation can cause cardioembolic stroke.

Less well studied, but perhaps not less relevant, is the role of hypercoagulation as a cause of ischemic stroke. It seems that an increased clotting propensity is associated with an increased risk of ischemic stroke, especially in the young, in whom the main cause of the stroke goes undetermined in about a third of cases. Perhaps hypercoagulability plays a much more prominent role than we traditionally assume?

But this ‘one case, one cause’ approach takes Virchow’s efforts to classify thrombosis a bit too strictly. Many diseases can be called multi-causal, which means that no single risk factor in itself is sufficient and only a combination of risk factors working in concert causes the disease. This is certainly true for stroke, and translates to the idea that each different stroke subtype might be the result of a different combination of risk factors.

If we combine Virchow’s work with the idea of multi-causality, and the heterogeneity of stroke subtypes, we can reimagine a new version of Virchow’s Triad (figure 1). In this version, the patient groups or even individuals are scored according to the relative contribution of the three classical categories.

From this figure, one can see that some subtypes of ischemic stroke might be more like some forms of venous thrombosis than other forms of stroke, a concept that could bring new ideas for research and perhaps has consequences for stroke treatment and care.

Figure 1. An example of a gradual classification of ischemic stroke and venous thrombosis according to the three elements of Virchow’s triad.

However, recent developments in the field of stroke treatment and care have been focused on the acute treatment of ischemic stroke. Stroke ambulances that can discriminate between hemorrhagic and ischemic stroke (information needed to start thrombolysis in the ambulance) drive the streets of Cleveland, Gothenburg, Edmonton and Berlin. Other major developments are in the field of mechanical thrombectomy, with wonderful results from many studies such as the Dutch MR CLEAN study. Even though these two new approaches save lives and prevent disability in many, they are ‘too late’ in the sense that they are reactive and do not prevent clot formation.

Therefore, in this blood clot awareness month, I hope that stroke and thrombosis researchers join forces and further develop our understanding of the causes of ischemic stroke so that we can Stop The Clot!