MiRoR project highlight – Technology Assisted Reviews of Diagnostic Test Accuracy
Artificial intelligence has seen huge advances over the last decade or so, and many tasks which once required human intelligence can today be automated. However, these advances have so far seen little to no adoption in systematic reviews, and much of the process remains manual, time-consuming, and expensive.
In this project (ESR 12 – Text mining for the systematic survey of diagnostic tests in published or unpublished literature) we are investigating ways to reduce the workload by providing technological assistance and decision support to systematic review authors. Our intent is not to replace the conventional process with an automated one, nor to render human screeners obsolete, but to provide tools that make the process both more efficient and more effective.
So far the consensus has been that we are stuck with exhaustive manual screening, since we cannot change the process unless the replacement guarantees perfect retrieval. Yet we humans have our weaknesses too. While a machine may harbor biases through its design or conception, it will never tire, grow bored, or get distracted; the same cannot be said of any human screener. Automation is commonly viewed as a trade-off between effort and quality, yet technological assistance, used judiciously, could well improve the quality of future systematic reviews by enabling screeners to work more efficiently.
Our work is not without precedent. In the legal domain, the document discovery phase that precedes trials was once conducted by exhaustive manual review, but has since been moving towards technology-assisted, non-exhaustive review. This can reduce the workload by more than 99%, although it may be accompanied by a reduction in sensitivity of as much as 30%. Missing more than one in four relevant articles might be hard to stomach, and the insights gained in the legal domain might thus not be immediately transferable to systematic reviews in evidence-based medicine unless we first reduce this loss of sensitivity. However, this may be less of a problem for, e.g., living systematic reviews, where missed references can be added later.
Furthermore, the authors of a technology assisted review generally only have to screen a few hundred articles to identify the majority of the relevant studies, while their colleagues doing a conventional review might have to screen tens of thousands. Even if both end up screening all candidate references to ensure that nothing is missed, the technology assisted reviewer could start thinking about the meta-analysis earlier, potentially decreasing the delay from review inception to conclusion.
How does the technology assisted reviewer know when the amount of evidence gathered is sufficient to start the meta-analysis? Here, automated decision support could estimate how many relevant articles remain among the candidates yet to be screened, and the impact the still-missing evidence would have on the review's conclusions.
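To make this concrete, here is a minimal sketch of one way such an estimate could be grounded: screen a small random sample of the remaining candidates and extrapolate its prevalence of relevant articles to the whole unscreened pool. This is an illustrative assumption on my part, not the method used in the project, and the function name and numbers are hypothetical.

```python
# Illustrative sketch (not the project's actual method): estimate how
# many relevant articles remain among unscreened candidates, based on
# the prevalence observed in a small random sample.
import math

def estimate_remaining(sample_labels, n_unscreened, confidence_z=1.96):
    """Extrapolate the prevalence of relevant articles in a random
    sample to the pool of unscreened candidates.

    sample_labels: list of bools (True = relevant) from a random sample
    n_unscreened:  number of candidates not yet screened
    Returns a point estimate and a normal-approximation margin of error.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n          # prevalence in the sample
    estimate = p * n_unscreened         # point estimate of remaining relevant
    se = math.sqrt(p * (1 - p) / n)     # standard error of the prevalence
    margin = confidence_z * se * n_unscreened
    return estimate, margin

# Hypothetical usage: 2 relevant articles found in a random sample of
# 200, with 10,000 candidates still unscreened.
est, moe = estimate_remaining([True] * 2 + [False] * 198, 10_000)
print(f"~{est:.0f} relevant articles remain (margin ±{moe:.0f})")
```

In practice a decision-support tool would use something more robust than a single random sample (relevant articles are typically rare, so the normal approximation is shaky at small counts), but even a crude estimate like this gives the reviewer a defensible stopping signal rather than a gut feeling.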
Our project is multidisciplinary, and I am working closely with experts in text mining and information retrieval as well as in epidemiology and library science. As for me, my main research interest is in applications of text mining to technical and scientific literature, but this is my first exposure to biomedical research.
Through this project I get to practice text mining and computer science in our work on applying computational methods to secondary research, but through my secondments and collaborations I also get to appreciate how secondary research is conventionally performed, and how computational methods could fit into the workflow. This kind of domain knowledge is, I think, necessary for us to construct methods that remain useful and relevant to their intended users.
by Christopher Norman, MiRoR PhD Fellow at the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI) – CNRS (France)