Systematic Review Consultants LTD

Chaptered Recording: The Past, Present, and Futures of Pragmatic and Responsible Adoption of AI in Systematic Reviews

***Timeline of automation of systematic reviews from 1991 to 2026***

Here is the list of sections so you can choose which parts to listen to/see. Select the time tag from below to listen to your preferred section(s).

Thank you for supporting the future content by subscribing to the YouTube channel.

I do my best to use Chaptering for all my YouTube videos so as to save time for ~1000 subscribers.
Some complain about YouTube ads; I found the Adblock Plus and AdBlock extensions, as well as VPNs, to be very good at blocking all ads.
If you have any comments or questions, please add them under the video, and I will definitely reply.

00:00 Introduction by Professor Angèle Gayet-Ageron

03:42 Start of lecture and declaring conflict of interests

04:24 Systematic Reviewing as Process vs Systematic Review as Product

08:24 Classification of (Semi)Automation and AI in Systematic Review Context: Rule-based, machine learning (active learning, classifiers), predictive AI, discriminative AI, generative AI, LLMs, and agentic AI

10:26 Welcomers of Automation: Librarians & Statisticians

11:23 Timeline of automation of systematic reviews from 1991 to 2026

17:28 LEMASyR Map and BIMS-ARINES Newsletter

17:51 Number of papers published on automation of SR and LLMs/GenAI

18:37 Systematic review steps targeted by the automation

19:12 Organizations’ actions when facing the challenge; organizations with released documents related to the use of AI in Evidence Synthesis

19:25 Who reads the documents released by the organizations? From PRISMA and GRADE to the RAISE: Responsible Use of AI in Evidence Synthesis

20:17 LLMs used in the automation of systematic review steps

21:13 Prompting, Prompt Engineering, Prompting Framework, Prompt Libraries, and Fixed Models

22:28 Prompting features and structure in Elicit (Columns) and Nested Knowledge (Tags)

23:55 Prompting Development and Testing Process (Prompt Engineering)

24:13 Large Language Models’ Features and Facets

24:50 Factors affecting the performance of LLMs

25:09 Issues and problems reported in the literature when using large language models in systematic reviews

25:44 Use case of LLMs in Protocol Development: Refining Research Question, and Exploration & Scoping

26:16 Undermind as an example for Refining Research Question

26:54 Classification of Search in Systematic Review Context: Exploratory, Scoping, Supplementary, Systematic Searching (list of databases and synonyms/keywords), and Top-Up and Update Searches

29:39 What information sources are behind LLM-based tools? Knowledge Graphs: Semantic Scholar, OpenAlex, and Lens.org

30:20 Retrieval-Augmented Generation (RAG) + Knowledge Graph = Safety RAG

31:05 Screening: Active Learning (Relevancy Ranking, Priority Screening, and Stopping Rules or Switching Points), Classifiers, PICO Data Extraction in Elicit

32:20 Active Machine Learning can have issues in the relevancy ranking of records without an abstract or non-standard records (Rayyan)

33:06 Active Machine Learning (Relevancy Ranking) in Covidence

33:12 Priority Screening with Active Learning in ASReview and Stopping Rule or Switching Point: Systematic Review vs Scoping Review or Mapping Review

34:00 Stopping Rule Classification (Switching Point and Switch Back)

34:41 Classifiers

34:53 RCT Classifier in Covidence and Classifiers in EPPI-Reviewer

36:01 Classification of LLMs’ Errors in Data Extraction for Systematic Reviews

40:26 Preventing/Detecting LLM Errors in Data Extraction

42:24 High-Risk Data for Data Extraction using LLM

43:17 Explainability (Seeing Is Believing = SIB) in PICO Portal, DistillerSR, Nested Knowledge, Laser AI, and Elicit

45:04 Data Extraction from Images, Figures, and Tables in PICO Portal, Nested Knowledge, and Elicit

45:33 Meta-Analysis in ChatGPT 4.5 vs ChatGPT 5.5 Extended (Function Calling, Chain-of-Thoughts)

46:58 Should I use AI in Systematic Reviews?

48:52 What AI tool to use for systematic reviews? AI (ML, LLM) features within existing SR tools, Assistive Use

49:10 Futures

50:14 Iterative & Living SR in Elicit

51:03 Golden Rule in Using LLMs

51:23 Golden Rule in Evaluation of LLMs

51:34 Golden Rule In Developing New AI Tools

51:53 To Do Homework

52:39 QA Start. Minab 168, Makan Nasiri

53:00 Question 1: Can AI help in reducing the Publication Bias?

53:53 Question 2: What can we learn from the classification of errors in data extraction? Why is it important to know why AI gets it wrong?

56:59 Question 3: EPPI-Reviewer can do automatic update searches using OpenAlex. How about a baseline search?

58:58 Question 4: Going directly from searching to data extraction means you are replacing/skipping the screening. Screening as Classification or Data Extraction Problem

01:03:07 Question 5: Tools highlight PDFs. Not all these PDFs are CC-BY. Is there a problem with uploading copyrighted paywalled PDFs to LLMs?

01:06:16 Question 6: Safety RAG solved the hallucination problem. What about problems of missing studies?

01:08:48 Question 7: You referred to switching points (stopping rules). What tools support it?

01:11:04 Question 8: How effective is it to use Elicit for data extraction?

01:13:36 Question 9: Are there any journal policies on using AI for systematic reviews?

01:15:54 Statement: AI is like a tool or medicine & has to be tested in empirical studies to determine which tool is suitable for which task and what the risks are. Humanities (ethics and law) are two decades behind science & technology.

The Past, Present, and Futures of Pragmatic and Responsible Adoption of AI in Systematic Reviews

This will be Farhad’s only main session this year, and we hope you can make it. Registration and attendance link is below:

Register: https://www.ispm.unibe.ch/continuing_education/the_bern_lectures_in_health_science/the_past_present_and_futures_of_pragmatic_and_responsible_adoption_of_ai_in_systematic_reviews/index_eng.html

Attend: https://tobira.unibe.ch/!v/JMyvgft2TYG

Classification of LLM Errors in Data Extraction for Systematic Reviews and Factors Affecting the LLM-Based Tool’s Performance

Generated by Chatting GPT-5.5 Thinking and OpenAI’s ChatGPT image-generation tool; Generation ID: `77e285ae-b8e3-456a-a0c5-44e16e08bf3b`
Date generated: May 7, 2026

It’s been a long time, but I’m busy, so I decided to write fewer, more useful posts.

LLMs have targeted the most complex task in systematic reviewing: Data Extraction (Data Abstraction, Data Charting).

RAG

Progress in Retrieval-Augmented Generation (RAG) came with two benefits:

RAG enabled agentic search tools, such as Undermind and Elicit, that use LLMs to access Knowledge Graphs such as Semantic Scholar, OpenAlex, and Lens.org. RAG + Knowledge Graph = Safety RAG. With Safety RAG, hallucinations are pushed to zero. This means no made-up references, but expect duplicates and the same reference mentioned and cited as two or three different references. I’m not talking about this here.
LLMs can pay attention to the uploaded/shared document and answer questions or extract data from it, rather than all over the place. This is the feature I focus on here.

Great progress, even exciting, but keep your glasses on!

Factors Affecting the Quality of Data Extraction using LLMs in Systematic Reviews

LLM-based tools have become very good at extracting data, and they keep getting better. Their performance, however we measure it, depends on many factors. Briefly:

Factors related to LLM

Training data quality
Training data quantity
Training data diversity
Training data contamination
Model’s size (parameters)
Architecture of the model
Number of parameters
Context window size
LLM’s version
LLM’s mode (thinking/reasoning)
LLM’s personality
Fine-tuning of the model
Temperature setting
Nucleus sampling
Frequency penalties
GPU/TPU VRAM and bandwidth
Quantisation
Using models specific to medicine/biology
Customisation of the model
Memory enabling
KV caching
Batching
Use of RAG
Use of Model Context Protocol (MCP)
Use of skills
Use of multimodal language models
Choice of evaluation metric
Use of human-in-the-loop
…

Factors related to prompting

Length of prompts
Iterations/testing (iterative prompting)
Clarity and precision
Following a framework
Few-Shot examples
Type of prompts (zero-shot, chain-of-thought, hybrid, negative, etc.)
Use of prompt chaining
Prompt engineer’s experience
…

Factors related to the quality of the report

Report/PDF/File can be uploaded or accessed through API

Length of the file
Quality of the file
Format of the file
Containing image/table/text
The method used for creating the PDF or file format
Copyright or technical limitation set on the file (blocking AI or readability)
Complexity of methods, study design, and data
…

Factors related to the interface

Local version (and local hardware)
API use
Web UI
…

Factors related to the requested data

Data for binary outcome vs continuous outcome
Numerical vs textual data
Export format requested
Source of data: text, table, image, CSV, etc.
Length of data requested
Number of data points requested
…

So, I just tried to give an impression of how complex things can be; however, I’m illiterate in the field of AI, and I cannot expect any systematic reviewer to focus on technical details. What we can do is check the output and detect the errors as Auditors. So, AI Error Vigilance is the skill to have. Here are some of the errors I have detected, and I hope they help you:

Classification of LLM-Based Tools’ Errors in Data Extraction

However you use LLMs, as a chatbot or embedded within an established systematic review automation tool, you must take into account the possibility of errors. Here, I try to classify the types of errors you should have in mind when checking the data:

Made-up (Hallucinations): The data/information does not exist. Some AI tools can now report “Not Available,” “Not Reported,” or “Not Found” to avoid hallucinations.

Missed: The data is there, but for many reasons, AI missed it (Not Retrieved). The quality of the PDF, the readability of the image, or the PDF-to-text or image conversion process or algorithm may contribute to missing data. AI may report them as “Not Available,” “Not Reported,” or “Not Found” by mistake.

Misplaced/Misallocated: The data was collected correctly, but placed in the wrong data-extraction field.

Mislocated: The data has been collected from the wrong section of the paper. Rather than collecting the prevalence of disease from the Results section, the AI has collected it from the Introduction or Discussion sections.

Misreported (Inherited Errors): The data for the same field has been reported poorly, inconsistently, or paradoxically in the original report. AI would usually report the first occurrence of the data, typically from the abstract, and ignore inconsistent or paradoxical reporting of the same data.

Misread/Misconverted: The data is partially collected (incomplete; Selectivity Error), collected incorrectly, or misread due to technical problems, such as readability issues (OCR and Computer Vision) with PDFs or the text converted from the PDF. The risk factors for this error are older files, PDF converted to text, data in tables and figures, or data in image-based tables/figures. This error occurs more with numerical data.

Misinterpreted/Miscalculated/Misleading (The Best Guess Error): AI misinterpreted the data. It is likely that when we ask for change-from-baseline data for an outcome, and it is not readily available, AI may miscalculate it or interpret the wrong data (Endpoint or Baseline) as the correct one. Interpreting SE as SD is also possible. Such calculations may need a protocol/method to follow, and AI may or may not have it. For example, calculating the mean and SD from the median and IQR may follow different methods depending on the sample size. Another example is taking the affiliation country as the country of study.

Misprioritised: If there are multiple formats/calculations for the same data (crude vs adjusted, mean and standard deviation (SD) vs median and IQR), AI may collect the data that is not a priority and ignore the other occurrences.

Miscollected (Multiplicity Error): The risk-of-bias data should be collected from all sections of the paper, not just the methods section. Other data may be collected from the paper, its appendices (in multiple formats, e.g., XLSX, PPT, JPG, CSV, ZIP, etc.), or multiple reports (papers/PDFs) of the same study. Ignoring multiplicity can miss data or lead to incomplete data or risk-of-bias assessment.

Deep Error: The collected data is not from the report in hand, but from a reference cited within the report. I call it Deep Error. It’s a comeback to those tools that claim to have Deep Search and Deep Research capabilities! Sometimes, they go too deep. I’m looking for a better name for this error: Source Hierarchy Error, Secondary Source Error, Indirect Citation Error, or Deep Citation Error.

Adequacy: Whether the entered data is accurate, summative, and informative (not too little, not too much, but just enough).

Standardisation Error: While the prompt may specify a standard format for the field, the AI may not adhere to it.
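The Misinterpreted/Miscalculated item above mentions that an SE can be taken for an SD. As a minimal sketch for auditors, the standard conversion SD = SE × √n can be used to see what the SD would be if the extracted number were really an SE. The function name `sd_from_se` and the example numbers are made up for illustration; this is a checking aid, not a method that any particular tool follows.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation.

    Uses SD = SE * sqrt(n), where n is the sample size of that group.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return se * math.sqrt(n)


# Hypothetical audit: the tool extracted "1.5" into the SD field for a group of 100.
# If the report actually gave an SE, the true SD would be ten times larger.
extracted_value = 1.5
group_size = 100
print(sd_from_se(extracted_value, group_size))
```

If the converted value is far more plausible than the extracted one (for example, compared with the other arm or with similar studies), it is worth opening the report and checking which statistic was reported.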

Conclusion

Does AI Save or Take Time?

I’m not being cynical; most research reports an accuracy of 50% to 90%, so AI is helpful. But sometimes, finding those inaccuracies can take even more time than data extraction itself; spot the irony! If you deal with 10 studies, deal with it, don’t be lazy! After you finish, ask AI to do it, check your work against AI’s and publish it as a blog or a paper.

Explainability or Seeing Is Believing (SIB)

Advice: If you have to use LLMs, either use tools that let you check each piece of collected data by clicking it and seeing its location in the report, or open the report and double-check it. While such action cannot mitigate all risk or resolve all errors, it can help reviewers realise their responsibility and keep them alert to errors.
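When a tool does not offer click-to-source, a rough version of the same check can be scripted. The sketch below assumes you already have the report as plain text (how you get it from the PDF is outside this example) and that the extracted value is expected to appear verbatim; `find_in_report` and `normalise` are hypothetical helper names.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so PDF line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_in_report(extracted: str, report_text: str, context: int = 40):
    """Return a snippet around the first occurrence of the extracted value, or None."""
    haystack = normalise(report_text)
    needle = normalise(extracted)
    if not needle:
        return None
    position = haystack.find(needle)
    if position == -1:
        return None
    start = max(0, position - context)
    end = min(len(haystack), position + len(needle) + context)
    return haystack[start:end]


report = "Methods\nA total of 120\nparticipants were randomised to two groups."
print(find_in_report("120 participants", report))   # snippet to read in context
print(find_in_report("150 participants", report))   # None: go and check the report
```

A hit only shows that the text exists somewhere in the report. It does not tell you whether it came from the right section or the right field, so Mislocated and Misplaced errors still need a human look at the snippet.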

Please share your thoughts and let me improve this post; I will acknowledge any contributions.

If you liked this blog post, please subscribe to our newsletter.

Semi-Automation of Searching and Screening in Systematic Reviews — Recorded and Chaptered on YouTube

I was invited to teach the searching and screening sessions in the High-Efficiency Systematic Reviews (with Ethical AI Use) workshop. The workshop ran for a few days and, naturally, was not free; however, the convenor agreed that I could share the content with a delay on YouTube.

Both videos are chaptered, so you can skip right to the part you want to listen to/see, and I provide a brief outline below:

Session 2: Semi-Automation of Search in Systematic Reviews

AUDIENCE: First-Time Systematic Reviewers, Undergraduate, Post-Graduate, Master’s, and PhD Students, and Doctors, General Practitioners (GPs), Nurses, Paramedical and Allied Health Practitioners (AHPs)

Types of Automation
Types of Search
Systematic Search Steps
Showcasing Undermind.ai
Showcasing PubMed
Showcasing 2DSearch
Showcasing Polyglot in Tera Tools
Structure of Search Strategies
De-Duplication or a typology of duplicates
Updating Searches
Auto-Update in EPPI-Reviewer using Network Graph Search (OpenAlex)
Golden Rule in Search of Systematic Reviews
Systematic Review for Learning vs Systematic Review for Publishing
LEMASyR and BIMS-ARINES

URL for Search Session: https://www.youtube.com/watch?v=oXqJQeHWhUs

Session 3: Semi-Automation of Screening in Systematic Reviews

Record vs Report vs Study
How do we do screening in systematic reviews?
Rule-Based Automation in Search and De-Duplication, Machine Learning in Screening
Supervised Active Learning, Priority Screening, and the relevant tools
How much screening is enough? Screening Progress Graph and Stopping Rules in EPPI-Reviewer, PICO Portal, and ASReview
Tools for Record Screening (Title-Abstract Screening): Rayyan, Covidence, ASReview, EPPI-Reviewer, PICO Portal, and Laser AI
Showcasing Rayyan
Showcasing Covidence — 4 Types of Automation
Priority screening, Tagging/Labelling Overview, and Late PROSPERO Registration
Screening Progress Graph, Priority Screening, and Stopping Rules: Switching from two to one and from one to two reviewers.
Report Screening or Full-Text Screening: PICO Data Extraction Using LLMs and Checking the Output
Golden Rule in Using AI/Automation in Systematic Reviews
Newsletter to Keep Up with Most Recent Publications Regarding Using AI in Systematic Reviews and Evidence Synthesis and LEMASyR Map
BIMS-ARINES and LEMASyR

URL for Screening Session: https://www.youtube.com/watch?v=WQ1WsMyh9vM

Year-End Message — Systematic Review Consultants LTD: 2025

It was just about yesterday that we were concerned about all computers, watches, and machines stopping as soon as we entered the year 2000. Now, we are just a quarter of a century older and freaked out about AI taking over the world. Who would’ve thought?

Brief Company Update

It was an active year for our team.

We delivered 27 projects of a wide variety, from rapid and scoping reviews to Systematic Reviews for Health Technology Assessments and Clinical Practice Guidelines.
Our collaboration with the WHO was expanded.
We delivered two systematic review workshops.
All our traditional clients have continued working with us; some for more than 12 years.
Thanks to the Library and Information Science (LIS) Community, we have two new clients. We should repay the favour to the community with interest.

Bursaries

For the first time, we are funding LIS students and professionals to participate in the Search Solutions conference. Currently, we fund the conference registration fee for two LIS students and two professionals; however, we are considering extending this offer to other conferences and covering the travel and/or accommodation costs. We owe a lot to the LIS community, as they are at the forefront of the defence against misinformation and disinformation about AI-based technologies.

Pro bono Project

This year, for the first time, we have decided to have a pro bono project. A single project with real-world impact and no funding will benefit from our full services.

Free Access to Automation Tools

By paying the subscription fee, we continue to support projects that need access to fee-based automation tools, even if they are not our projects.

GAJET List

We will keep this list alive to serve open science. This list now appears on the websites of Harvard, Yale, and McGill University Libraries, among others. Thanks to all LIS professionals for their support.

Free Webinars, Recordings, and Oral History Series

We continue to post webinars and their recordings about using AI in evidence synthesis on our YouTube channel. We have not received sufficient support, which has delayed the expansion of the oral history series. When we started, we expected organizations such as Cochrane, JBI, and Campbell to at least link to this Oral History series. We want to continue these series, which are very time-consuming and require funding; however, we hope to receive support from the evidence synthesis community to promote them.

Newsletter to Keep You Up with AI Use in Evidence Synthesis

This newsletter (BIMS-ARINES) now has 116 subscribers. I can’t see subscribers’ names or email addresses, and I can’t use this information for marketing or spamming — how unfortunate. All I can do is ensure subscribers start their Sunday or Monday mornings with the most important publications on the use of AI in evidence synthesis in their inboxes. I have done it for 66 weeks and hope to continue.

The Surprise I promised last year.

Living Evidence Map for Automation of Systematic Reviews (LEMASyR) is there for you, free, ad-free, and login-free. Enjoy. Should we have another surprise for the next year?

The Year Ahead

You and I are the last hope.

Ultimately, humanity and ethics are our only remaining hopes in a world run by bullies for national interests. All ideologies, ancient and modern, landed on one golden rule of ethics, which is the most precious human wisdom lived across thousands of years:

Treat others the way you want to be treated.

Those who have the power and can and should act in accordance with this golden rule have no interest in humanity or ethics. 2025 showed us how we are the only hope left in the world. The choice is yours. Be neutral or make a difference.

We were privileged to find new friends and colleagues every year, and we thank them for trusting us beyond the formalities of the projects. People are irreplaceable treasure. But we also lost a friend, Professor Lelia Duley, who dedicated her life to producing game-changing primary and secondary evidence in Pregnancy and Childbirth. While we were scheduling to interview her for the Oral History series, we were unable to find a mutually convenient time and missed the opportunity. We hope someone writes a worthy obituary in an academic journal, but till then, I wrote a brief one.

AI is coming for us, and there is no escape.

In 2025, humanity was affected not only by politics but also by technology. AI started expanding with almost no regulation in place. As I always say:

Humanities (Ethics and Law) are two decades behind technology.

It has already taken many jobs, with AI companies not compensating for the resulting job losses. No training, we are on our own. It is just like when the Internet or Google came along, and everyone said, “Who needs a library?” Now, once more, the librarians and information professionals have started doing what they have been doing for thousands of years: evaluating the tools, resources, and claims, writing reviews of tools, educating the users, and providing guidance to the lost but excited user community on how to use AI responsibly; just like what Jean Armour Polly, another librarian, did 25 years ago.

2026 and Wealth

I read somewhere about five types of wealth: Financial, Social, Physical, Mental, and Time. I wish you could invest in all types of wealth, and the new year brings you whatever you wish for.

One Book

If there is one book I’d recommend you read, it would be

This Is for Everyone: The Unfinished Story of the World Wide Web, by Tim Berners-Lee, the inventor of the World Wide Web

It is a timely read, given the rise of AI, misinformation, and social media’s role in manipulating public opinion.

One Advice

Be kind regardless of the outcome!

Throw the fish back to the sea; even if the fish doesn’t acknowledge, the creator of that fish will notice. Azerbaijani-Persian-Turkish Proverb

Happy 2026

Signing off, caffeinated at 2 AM — Farhad

SRC Bursary for Library and Information Science – Search Solutions 2025

Living Evidence Map for Automation of Systematic Reviews (LEMASyR)

After spending some of my free time, I could finally collect and put all the studies and publications related to the automation of systematic reviews in one place.

👍 ALL Studies in one place
👍 Updated DAILY
👍 FREE
👍 No login needed
👍 Private
👍 No Ads
👍 No Cookies
👍 Link to Map: https://nested-knowledge.com/nest/qualitative/21035
👍 Full Training will be released soon; Brief Training: https://youtu.be/bp7n1-IoR7E?t=3326

Webinar: How to Stay Up to Date in Your Field

I put this on social media but didn’t want this blog’s wonderful supporters to miss this free event.

📣 Overcome the Fear of Missing Out (FOMO)

📣 Learn about methods and techniques for effortless updating

📣 Learn about the tools (old & new) that make life easier

📣 All and more for free, and guess by whom?

Registration Link: https://SystematicReview.info/

Recording will be available for the subscribers to our YouTube Channel:

https://www.youtube.com/@SystematicReview?sub_confirmation=1

Screening the Search Results for Systematic Reviews: An Evolution of Semi-Automation Methods

We follow a sensitive approach to searching during systematic reviews to avoid missing relevant results. Since the searches are sensitive, it is expected that the majority of the search results will be irrelevant. After removing the duplicate records, it is time to separate the relevant results from the irrelevant results; it is inevitable to ‘screen’ them. Screening has two or three stages: 1. Title and abstract screening 2. Full-text screening. Some reviewers prefer to consider the title and abstract screening as two separate stages.

Screening in any of these stages involves two steps: 1. Decision: Finding the relevant or irrelevant record 2. Action: Assigning the relevant or irrelevant record to their folder/group/label.

For decades, the only way to identify the relevancy of the records was to read the titles and abstracts one by one. As a result, the librarians and information specialists tried to use the following semi-automation methods to help the reviewers identify the records much faster:

Method A: Find and Replace in Notepad Using CAPITAL Letters

When people did not have any other word processors such as Microsoft Word, Open Office, Libre Office, or Pages or could not afford them, using Notepad was one of the best options.

We used Find and Replace to find words related to each of the main inclusion criteria (for example, ‘randomized’) and replace them with their All Cap form (for example, RANDOMIZED). This way, the reviewers could see the terms faster, and they did not have to spend time looking for them.
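The same find-and-replace idea takes only a few lines of script today. This is a minimal sketch rather than a reconstruction of what we did in Notepad; the function name `capitalise_terms` and the example terms are illustrative.

```python
import re


def capitalise_terms(text: str, terms: list[str]) -> str:
    """Upper-case every word that starts with one of the given terms (case-insensitive)."""
    for term in terms:
        # \b anchors the match to the start of a word; \w* also covers endings
        # such as "randomized" or "randomly" when the term is "random".
        pattern = re.compile(r"\b" + re.escape(term) + r"\w*", re.IGNORECASE)
        text = pattern.sub(lambda match: match.group(0).upper(), text)
    return text


abstract = "A randomized, placebo-controlled trial."
print(capitalise_terms(abstract, ["randomized", "placebo"]))
# → A RANDOMIZED, PLACEBO-controlled trial.
```

Using a stem such as "random" instead of "randomized" catches both the American and British spellings in one pass.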

Method B: Importing Results with CAPITAL Letters into Citation Manager

When reference or citation managers became popular, we used to open the saved search results in Notepad or Word, capitalize the terms, and then import them into EndNote, Zotero, RefWorks, Mendeley, Citavi or any other reference manager. The reviewer could still see the capital terms and save time in finding them.

Method C: Find and Replace and Color-Coding in Word Processor

Beyond capitalization, word processors such as Open Office and Microsoft Word gave us more formatting freedom than was possible in Notepad. Using find and replace, we could change the colour of words related to one of the inclusion criteria to green, those related to another inclusion criterion to blue, and so on. It is also possible to change the colour of the words relevant to the exclusion criteria to red.

Some information professionals prefer highlighting with different colours to changing the font colour; others like to follow capitalization with more options for Italic, Bold, Underline, and Strikethrough!

It was possible to send the files in Word Processor file format and Annotated export style to the reviewers to see a citation and the abstract. Alternatively, it was possible to send the Word Document in RIS (RefMan or Reference Manager) tagged export style. After screening, it was possible to save the Word document as a text file to import it back to the citation manager because it was an importable RIS file with its tags.

Method D: Title Search/Screening in Citation Managers

Many reviewers are still searching ‘Rats’ or ‘in Rat’ in the title of records using the search feature in citation managers to find and remove animal studies. Others use such a feature in more complex ways and may use it over 30 times to identify and exclude over half of the search results in one hour!
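Outside a citation manager, the title search can be sketched as below. It assumes the records have already been exported and loaded as dictionaries with a `title` key; that structure and the function name `split_by_title` are assumptions for illustration only.

```python
import re


def split_by_title(records, pattern):
    """Split records into (matched, remaining) using a case-insensitive regex on the title."""
    regex = re.compile(pattern, re.IGNORECASE)
    matched, remaining = [], []
    for record in records:
        target = matched if regex.search(record.get("title", "")) else remaining
        target.append(record)
    return matched, remaining


records = [
    {"id": 1, "title": "Effect of drug X in rats with induced diabetes"},
    {"id": 2, "title": "Drug X for adults with type 2 diabetes: a randomized trial"},
    {"id": 3, "title": "Narratives of illness in chronic disease"},
]

# \b keeps "rat" inside "Narratives" from being treated as an animal study.
animal_studies, to_screen = split_by_title(records, r"\brats?\b")
print([r["id"] for r in animal_studies], [r["id"] for r in to_screen])
```

The matched pile should still be skimmed before exclusion, since a title can mention rats while the study itself is in humans.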

Method E: Web-based Software Programs

Nowadays, it is hard to find someone who has not heard of Rayyan or Covidence or hundreds of other computer programs that help the reviewers to manage the title/abstract screening stage efficiently. Many of these programs follow methods similar to Method C and involve lots of colour-coding and filtering by search (Method D).

Method F: Machine Learning

The emergence of machine learning (ML) apps such as Rayyan and EPPI-Reviewer, among others, was game-changing for the speed of screening and called the systematic reviewing process into question. Traditionally, we had to screen every record using the eyeball method (reading each record). However, these ML apps have options that allow you to train the app (machine) so the machine can screen part or the majority of the results! To do so, you need to make include/exclude decisions on between 50 and 200 diverse records, and then you can train the machine (build an algorithm or model) specific to your review and rank/rate/sort the results based on relevancy. Usually, it is possible to stop the screening after going through 30–60% of the results. The machine can accurately detect irrelevant and relevant records, in some cases even better than humans and, needless to say, ‘days’ faster than humans.
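To give a feel for what "training the machine and ranking by relevancy" means, here is a toy sketch. It is not the algorithm used by Rayyan, EPPI-Reviewer, or any other tool; it is a naive Bayes-style scorer that weights words by smoothed log-odds learned from your include/exclude decisions. All names (`train`, `rank`, `tokens`) and the example records are made up.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled):
    """Learn a weight per word from (text, is_relevant) screening decisions."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labelled:
        (relevant if is_relevant else irrelevant).update(set(tokens(text)))
    n_rel = sum(1 for _, is_relevant in labelled if is_relevant)
    n_irr = len(labelled) - n_rel
    vocabulary = set(relevant) | set(irrelevant)
    # Add-one smoothing keeps words seen in only one class from getting infinite weight.
    return {
        word: math.log((relevant[word] + 1) / (n_rel + 2))
        - math.log((irrelevant[word] + 1) / (n_irr + 2))
        for word in vocabulary
    }


def rank(unlabelled, weights):
    """Sort unscreened records so the most likely relevant ones come first."""
    def score(text):
        return sum(weights.get(word, 0.0) for word in set(tokens(text)))
    return sorted(unlabelled, key=score, reverse=True)


decisions = [
    ("randomized controlled trial of aspirin", True),
    ("randomized trial of ibuprofen in adults", True),
    ("case report of a rare tumour", False),
    ("editorial on health policy", False),
]
unscreened = ["editorial on tumour policy", "a randomized trial of paracetamol"]
print(rank(unscreened, train(decisions)))
```

In priority screening, this train-and-rank step is repeated as the reviewer screens more records from the top of the list, so the ranking keeps improving with each batch of decisions.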

Some of these models/algorithms (for example, RCT Classifiers) have been tested and validated. They are so accurate that you can use the existing classifier rather than re-inventing the wheel by training another one. If it makes sense:

Validated machine learning classifiers/algorithms that we use during the screening can be compared to the validated search filters we use during the search for the systematic reviews.

ML apps have such an influence that even the new PRISMA guideline has a flow diagram for the reviews that use machine learning (automation) to encourage people to use and report their systematic review process without panic.

Final Thought

We have built our current methods and skills based on the previous methods and skills. We need to remember and acknowledge the information professionals’ efforts over the last three decades and their contribution to the development of semi-automation methods for the screening step of systematic reviews. Machine learning is a new tool in our toolbox to use. If you were using a screwdriver, now the drill is here with screwdriver bits.

Typology of Duplicate Records in Systematic Review Context

Note: There is no advertisement or marketing component in this post.

Those who conduct systematic reviews are aware that after the search is done in more than one database, it is natural to have duplicate records; however, these are not the only duplicates that the review team deal with. Duplicates occur in several stages of the systematic reviewing process, and dealing with them is usually confusing and requires skills.

Intra-Database (Cross-Database) Duplicate Records

Since the researchers and reviewers are now searching the bibliographic databases to find the literature relevant to their research, many publishers do their best to index their journals in as many relevant databases as possible. Why? Because that is the best way to make their journals more visible.

When we search more than one database (the norm in systematic reviews) with the same or similar search strategies, the same journal papers appear among the search results of several databases. When we export the search results from all databases into a citation manager program such as EndNote, Zotero, Mendeley, or others, we have an option to define, find, and remove these duplicates, so-called de-duplication.
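As a rough illustration of what "defining" a duplicate can look like, the sketch below builds a matching key from the DOI when there is one and from a normalised title plus year otherwise. The record fields (`doi`, `title`, `year`) and the function names are assumptions for this example, not the matching rules of EndNote, Zotero, or Mendeley.

```python
import re


def dedup_key(record: dict) -> str:
    """Build a comparison key: DOI if present, else normalised title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Drop punctuation and case so "Pain: a trial" and "pain - a trial." compare equal.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return f"title:{title}|year:{record.get('year', '')}"


def deduplicate(records):
    """Return (unique_records, duplicates), keeping the first record seen for each key."""
    seen = {}
    duplicates = []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen[key] = record
    return list(seen.values()), duplicates


results = [
    {"source": "MEDLINE", "doi": "10.1000/ABC", "title": "Aspirin for pain.", "year": 2010},
    {"source": "Embase", "doi": "10.1000/abc", "title": "Aspirin for Pain", "year": 2010},
    {"source": "CINAHL", "title": "Yoga for back pain: a trial", "year": 2012},
    {"source": "PsycINFO", "title": "Yoga for back pain - a trial.", "year": 2012},
]
unique, duplicates = deduplicate(results)
print(len(unique), len(duplicates))
```

A key like this will miss pairs where only one record carries a DOI, or where the title was translated or indexed differently, so a manual pass is still needed.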

Some of such duplicate records cannot be recognised in search, de-duplication, or even title-abstract screening stages of the systematic reviews because their details are not usually the same at first sight. For example, non-English records and journal names are being indexed differently in each database. Since there is more than one reviewer involved in the post-search stages of the systematic review, one reviewer cannot see all the records. It is easy to overlook some of these records; reviewers usually identify some duplicates during full-text screening, data extraction, or sometimes after meta-analysis or peer-review stages.

Some databases provide options for you to remove the records from a certain database. For example, CINAHL allows you to exclude MEDLINE records; other than that, we usually have to use manual, automated, or semi-automated methods to find and remove duplicate records.

Inter-Database Duplicate Records

Sometimes, each database may have the same record more than once. It could be a simple double-entry error, or a version-control error could cause it. Many publishers nowadays publish their papers as e-pub, early view, or online first to make them accessible to readers with little or no delay from the acceptance date. These papers usually have a unique DOI number but not a set year of publication, volume, issue, or page numbers. In turn, some of the databases grab such early in-press publications and index them to make them available to their users. What happens is that when the full paper is published in a paginated format with full bibliographic details such as year, volume, issue, and page numbers, the databases may forget to update these details or add the fully published paper again. The same paper title may appear twice, if not more, among the search results of the same database.

Intra-Search (Cross-Search) Duplicate Records

Systematic reviews are only as up to date as their search date. Most important systematic reviews are published within 12 months of the search date, and if there is a delay, the reviewers usually run an ‘update search’. Even after the publication of systematic reviews, there are always reviewers who try to update them.

There are three ways to update a search: auto-alerts, running a full search, or date limitation.

Saving the searches in the database’s user account and setting automatic periodic search alerts to receive the new results in your inbox;
Running the update search from scratch and de-duplicating the new search results against the previous (old) search results;
Running the update search using date limitation options in each database; such limitation could be to Date Published, Date Entered, Date Created, Publication Week, or Publication Year depending on how elaborately a database indexes these details.

Running an update search will also create Intra-Search Duplicate Records. For example, if you run a search in 2010 and then update it in 2015, no matter how accurate your method of updating is, you will find that some records you have already seen in the 2010 search also appear in the 2015 search.

This may happen for several reasons, including but not limited to:

Databases index at different speeds. Database A may index a record a few months or a year after Database B;
The database updates the e-pub records and assigns new dates or year of publication;
The database updates the records for any reason and adds a new date such as ‘date revised’ or ‘date entered’.
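The second approach above, de-duplicating the update search against the earlier search, can be sketched in Python as follows. This is a minimal sketch that assumes both sets of results come from the same database, that each record carries a hypothetical accession_number field, and that the database keeps this identifier unchanged when it revises the dates on a record.

```python
def split_update_results(old_records, new_records, id_field="accession_number"):
    """Split update-search records into already-seen and genuinely new ones."""
    old_ids = {r[id_field] for r in old_records if r.get(id_field)}
    already_seen = [r for r in new_records if r.get(id_field) in old_ids]
    genuinely_new = [r for r in new_records if r.get(id_field) not in old_ids]
    return already_seen, genuinely_new


# Illustrative identifiers only.
search_2010 = [{"accession_number": "111"}, {"accession_number": "222"}]
search_2015 = [{"accession_number": "222"}, {"accession_number": "333"}]
already_seen, genuinely_new = split_update_results(search_2010, search_2015)
print(len(already_seen), "already seen;", len(genuinely_new), "new")
```

Matching on the record identifier rather than on dates avoids the problem that a revised record may carry a new ‘date revised’ or ‘date entered’ value.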

Intra-Method (Cross-Method) Duplicate Records

Systematic searching of the bibliographic databases is the main but not the only way to find the relevant studies for systematic reviews. Other methods include contacting experts, checking the reference lists of the included studies, and tracking the citations to the included studies.

Reviewers are often unsure how to report the duplicates found through these methods in their PRISMA flow diagram, because the main de-duplication is reported immediately after the search stage, whereas checking references and citations happens after the full-text screening stage. So, if there are duplicates between the records from the systematic search and the records from reference checking, it is unclear where to report them. In larger systematic reviews, there are almost always such duplicate records.

Inter-Study Duplicate Records

Once the researchers secure funding for a research project, they try to create as much academic output as possible. It is prevalent in medical sciences that researchers publish the findings in several papers and present them in several conferences. Such dissemination will create conference abstracts that have been presented in different conferences but with the same or very similar title, abstract, and authorship.

While many reviewers consider such abstracts as duplicates or unimportant — they may be right — they are not considered duplicates in a systematic review; rather, they are different reports of the same study.

The best way to deal with them is to keep them under one study name and cite all of them, an approach known as Studification. For example, Jackson et al. 2021 [8–12]. This way of dealing with them has several benefits:

The reader would know that although this is one study, this study has been reported in several papers;
The systematic effort of identifying all the reports has been documented properly and shows how carefully the researchers have checked every single paper;
If you delete them, the aware readers and users will be confused why you have not included this or that paper or conference abstract; by keeping them, you answer their question that those reports all belong to the same study;
Although these duplicate records may not add anything new, when they do, they usually report important missing details or discrepancies. For example, they may report more participants than the original full paper and help you critically appraise the reason for missing those participants.
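Studification as described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the reviewer has already assigned each report a hypothetical study label and a ref citation number; deciding which reports belong to which study remains a manual judgement.

```python
from collections import defaultdict

EN_DASH = "\u2013"


def compress(numbers):
    """Turn a non-empty list such as [8, 9, 10, 11, 12, 15] into '8–12, 15'."""
    numbers = sorted(set(numbers))
    parts = []
    start = prev = numbers[0]
    for n in numbers[1:]:
        if n == prev + 1:
            prev = n
            continue
        parts.append(str(start) if start == prev else f"{start}{EN_DASH}{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}{EN_DASH}{prev}")
    return ", ".join(parts)


def studify(reports):
    """Group reports by reviewer-assigned study name and list their citations."""
    by_study = defaultdict(list)
    for report in reports:
        by_study[report["study"]].append(report["ref"])
    return {study: f"{study} [{compress(refs)}]" for study, refs in by_study.items()}


# Illustrative reports only: one full paper and four conference abstracts.
reports = [{"study": "Jackson et al. 2021", "ref": n} for n in (8, 9, 10, 11, 12)]
reports.append({"study": "Smith et al. 2019", "ref": 13})
labels = studify(reports)
print(labels["Jackson et al. 2021"])
```

Every report stays cited, so readers can see that the abstracts were found and deliberately grouped rather than missed.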

A recent category within Inter-Study Duplicate Records is Intra-Version (Cross-Version) Duplicate Records. With recent movements towards open science, more researchers release their manuscripts earlier or in formats other than journal publication. This release is either through pre-print servers such as arXiv, medRxiv, bioRxiv, or others, or through institutional repositories. These versions are duplicates of the published version of the paper; however, like the above-mentioned conference abstracts, they may help detect discrepancies and important details.

Inter-Dataset Duplicate Data

This is one of the trickiest duplicates to deal with. When a dataset is available, the researchers tend to play with it and publish as many papers as possible. It is possible to identify salami publications of the same research; however, it is always possible that the separate reports of the same study may use the same or similar data and cause Inter-Dataset Duplicate Data.

Inter-Dataset Duplicate Data fall into several categories depending on how the data are released: time-dependent, salami, data-volume-dependent, imprisoned data, and open data.

Time-Dependent Release: Researchers report only part of the results such as primary results in one paper, the final results in another paper, and follow-up results in a third paper;
Salami Release: To increase the number of their publications, they report only one part of the findings per paper;
Data Volume Dependent Release: Since there are a lot of data generated from the research, the researchers have no choice but to report it in several papers because the journals have a limitation of paper length;
Imprisoned Data Release: Since the researchers have access to the private dataset, they publish several papers from those data even a decade after the end of their research. Such publications appear in journals as post hoc analysis or secondary analysis papers;
Open Data Release: The research dataset is open online for the public, and any researcher can access and generate publications out of these data.

The reviewers need to assess the reports and choose the highest-quality and most comprehensive report of the dataset.

Intra-Report (Cross-Report) Duplicate Data

Those who are able to run multiple research studies alongside each other, mainly pharmaceutical companies, also tend to publish the findings from those studies together. They usually publish multiple papers, but each paper reports more than one study. Separating these data in an understandable and analysable way is not always easy, and it is also difficult to identify the unique data per study from these papers. There are almost always overlapping or duplicate data.

Conclusion

Contrary to the simplistic view that finding and removing duplicates is an easy, single step of systematic reviewing, it takes skill to prevent, identify, and remove duplicate and redundant reports and data. Duplication can be detected at any stage of the systematic review.

RAG

Factors Affecting the Quality of Data Extraction using LLMs in Systematic Reviews

Factors related to LLM

Factors related to prompting

Factors related to the quality of the report

Factors related to the interface

Factors related to the requested data

Classification of LLM-Based Tools’ Errors in Data Extraction

Conclusion

Does AI Save or Take Time?

Explainability or Seeing Is Believing (SIB)

Session 2: Semi-Automation of Search in Systematic Reviews

Session 3: Semi-Automation of Screening in Systematic Reviews

Brief Company Update

Bursaries

Pro bono Project

Free Access to Automation Tools

GAJET List

Free Webinars, Recordings, and Oral History Series

Newsletter to Keep You Up with AI Use in Evidence Synthesis

The Surprise I promised last year.

The Year Ahead

You and I are the last hope.

AI is coming for us, and there is no escape.

2026 and Wealth

One Book

One Advice


Method A: Find and Replace in Notepad Using CAPITAL Letters

Method B: Importing Results with CAPITAL Letters into Citation Manager

Method C: Find and Replace and Color-Coding in Word Processor

Method D: Title Search/Screening in Citation Managers

Method E: Web-based Software Programs

Method F: Machine Learning

Final Thought
