Typology of Duplicate Records in Systematic Review Context

Duplicates in Systematic Review

Note: There is no advertisement or marketing component in this post.

Those who conduct systematic reviews know that searching more than one database naturally produces duplicate records; however, these are not the only duplicates that the review team deals with. Duplicates occur at several stages of the systematic review process, and dealing with them is often confusing and requires skill.

Intra-Database (Cross-Database) Duplicate Records

Because researchers and reviewers now search bibliographic databases to find the literature relevant to their research, many publishers do their best to have their journals indexed in as many relevant databases as possible. Why? Because that is the best way to make their journals more visible.

When we search more than one database (the norm in systematic reviews) with the same or similar search strategies, the same journal papers appear among the search results of several databases. When we export the search results from all databases into a citation manager such as EndNote, Zotero, or Mendeley, we have the option to define, find, and remove these duplicates, a step known as de-duplication.

Some of these duplicate records cannot be recognised at the search, de-duplication, or even title-abstract screening stages of a systematic review because their details do not look the same at first sight. For example, non-English records and journal names are indexed differently in each database. In addition, since more than one reviewer is involved in the post-search stages, no single reviewer sees all the records. It is therefore easy to overlook some of these records; reviewers usually identify some duplicates during full-text screening or data extraction, or sometimes after the meta-analysis or peer-review stages.

Some databases let you exclude the records that come from another database. For example, CINAHL allows you to exclude MEDLINE records. Otherwise, we usually have to use manual, automated, or semi-automated methods to find and remove duplicate records.
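As a rough illustration of the automated route, the sketch below matches records on a normalised DOI and, where no DOI is present, on a normalised title plus year. The field names ("doi", "title", "year") and the helper functions are assumptions made for this example; they are not the export format or the matching rules of any particular citation manager.

```python
import re
import unicodedata


def normalise_title(title):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def record_key(record):
    """Prefer the DOI as the matching key; fall back to title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalise_title(record.get("title", "")), record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key; return (unique, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = record_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates
```

A key-based match like this will miss a pair in which only one record carries a DOI, or in which a title has been translated or truncated, which is why the manual and semi-automated checks mentioned above remain necessary.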

Inter-Database Duplicate Records

Sometimes, a single database may contain the same record more than once. This could be a simple double-entry error or a version-control error. Many publishers nowadays publish their papers as e-pub, early view, or online first to make them accessible to readers with little or no delay after the acceptance date. These papers usually have a unique DOI but not a set year of publication, volume, issue, or page numbers. In turn, some databases pick up such early in-press publications and index them to make them available to their users. When the full paper is later published in a paginated format with full bibliographic details such as year, volume, issue, and page numbers, the database may fail to update these details, or it may add the fully published paper as a new record. As a result, the same paper title may appear twice, if not more, among the search results of the same database.
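One way to handle such within-database repeats, sketched here under stated assumptions, is to group the records by DOI and keep the one that carries the most bibliographic detail. The field names and the simple "count the filled-in fields" rule are hypothetical choices for illustration, not a feature of any database or citation manager.

```python
from collections import defaultdict

# Fields that an e-pub record typically lacks and a paginated record has.
BIBLIOGRAPHIC_FIELDS = ("year", "volume", "issue", "pages")


def completeness(record):
    """Count how many bibliographic fields are filled in."""
    return sum(1 for field in BIBLIOGRAPHIC_FIELDS if record.get(field))


def keep_most_complete(records):
    """For each DOI keep the record with the most bibliographic detail."""
    by_doi = defaultdict(list)
    without_doi = []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            by_doi[doi].append(record)
        else:
            without_doi.append(record)  # cannot be grouped; keep for manual checking
    kept = [max(group, key=completeness) for group in by_doi.values()]
    return kept + without_doi
```

Records without a DOI are passed through untouched, so they still need to be checked by title or by hand.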

Intra-Search (Cross-Search) Duplicate Records

Systematic reviews are only as up to date as their search date. Most important systematic reviews are published within 12 months of the search date, and if there is a delay, the reviewers usually run an ‘update search’. Even after the publication of systematic reviews, there are always reviewers who try to update them.

There are three ways to update a search: auto-alerts, running a full search again, or date limitation.

  1. Saving the searches in the database’s user account and setting automatic periodical search alerts to receive the new results in your inbox;
  2. Running the update search from scratch and de-duplicating the new search results against the previous (old) search results;
  3. Running the update search using date limitation options in each database; such limitation could be to Date Published, Date Entered, Date Created, Publication Week, or Publication Year depending on how elaborately a database indexes these details.

Running an update search will also create Intra-Search Duplicate Records. For example, if you run a search in 2010 and update it in 2015, then no matter how accurate your method of updating is, you will find that some records you have already seen in the 2010 search also appear in the 2015 search.

This may happen for several reasons, including but not limited to:

  1. Databases index records at different speeds. Database A may index a record a few months or a year after Database B;
  2. The database updates the e-pub records and assigns new dates or year of publication;
  3. The database updates the records for any reason and adds a new date such as ‘date revised’ or ‘date entered’.
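Whichever update method is used, the new results can be checked against the records that were already screened. The following is a minimal sketch of that check, assuming each record is a dictionary with hypothetical "doi" and "title" fields. The year is deliberately left out of the match because, as noted above, databases may assign new dates to existing records.

```python
def record_id(record):
    """Identify a record by DOI, or by its title reduced to letters and digits."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return doi
    return "".join(ch for ch in record.get("title", "").lower() if ch.isalnum())


def split_update_results(old_records, updated_records):
    """Separate update-search results into new records and already-seen repeats."""
    already_seen = {record_id(record) for record in old_records}
    fresh, repeats = [], []
    for record in updated_records:
        if record_id(record) in already_seen:
            repeats.append(record)
        else:
            fresh.append(record)
    return fresh, repeats
```

Only the records in the first list would then go forward to screening; the second list gives the number of repeats found by the update search.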

Intra-Method (Cross-Method) Duplicate Records

Systematic searching of bibliographic databases is the main, but not the only, way to find relevant studies for systematic reviews. Other methods include contacting experts, checking the reference lists of included studies, and tracking citations to the included studies.

Reviewers are often unsure how to report the duplicates found through these methods in their PRISMA flow diagram, because the main de-duplication is reported immediately after the search stage, whereas checking references and citations takes place after the full-text screening stage. So, if there are duplicates between the records from the systematic search and those from reference checking, it is unclear where to report them. In larger systematic reviews, there are almost always such duplicate records.

Inter-Study Duplicate Records

Once researchers secure funding for a research project, they try to create as much academic output as possible. It is common in the medical sciences for researchers to publish the findings in several papers and present them at several conferences. Such dissemination creates conference abstracts that are presented at different conferences but have the same or very similar title, abstract, and authorship.

While many reviewers consider such abstracts duplicates or unimportant (and they may be right), they are not considered duplicates in a systematic review; rather, they are different reports of the same study.

The best way to deal with them is to keep them under one study name and cite all of them, a practice known as Studification. For example, Jackson et al. 2021 [8–12]. This way of dealing with them has several benefits:

  1. The reader would know that although this is one study, this study has been reported in several papers;
  2. The systematic effort of identifying all the reports has been documented properly and shows how carefully the researchers have checked every single paper;
  3. If you delete them, well-informed readers and users will wonder why you have not included this or that paper or conference abstract; by keeping them, you answer that question by showing that those reports all belong to the same study;
  4. Although these duplicate records may not add anything new, when they do, they usually report important missing details or discrepancies. For example, they may report more participants than the original full paper and prompt you to critically appraise why those participants are missing.
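The grouping of several reports under one study name can be sketched as below. The "study" and "ref" fields, the function names, and the sample data in the usage line are assumptions for illustration only; in practice, deciding which reports belong to the same study is a judgement the reviewers make.

```python
from collections import defaultdict

EN_DASH = "\u2013"


def collapse_ranges(numbers):
    """Turn reference numbers such as 8, 9, 10, 15 into '8–10, 15'."""
    ordered = sorted(set(numbers))
    parts = []
    start = previous = ordered[0]
    for number in ordered[1:]:
        if number == previous + 1:
            previous = number
            continue
        parts.append(str(start) if start == previous else f"{start}{EN_DASH}{previous}")
        start = previous = number
    parts.append(str(start) if start == previous else f"{start}{EN_DASH}{previous}")
    return ", ".join(parts)


def studify(reports):
    """Group reports by study label and build one citation string per study."""
    by_study = defaultdict(list)
    for report in reports:
        by_study[report["study"]].append(report["ref"])
    return {study: f"{study} [{collapse_ranges(refs)}]" for study, refs in by_study.items()}


# Hypothetical usage: five reports of one study, numbered 8 to 12 in the reference list.
reports = [{"study": "Jackson et al. 2021", "ref": number} for number in range(8, 13)]
citations = studify(reports)
```

Keeping every reference number attached to the study label preserves the record of all the reports that were found, which is the point of the benefits listed above.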

One recent category within Inter-Study Duplicate Records is Intra-Version (Cross-Version) Duplicate Records. With recent movements towards open science, more researchers release their manuscripts earlier or in formats other than journal publication, either through pre-print servers such as arXiv, medRxiv, or bioRxiv, or through institutional repositories. These are duplicates of the published version of the paper; however, like the above-mentioned conference abstracts, they may help detect discrepancies and important details.

Inter-Dataset Duplicate Data

This is one of the trickiest duplicates to deal with. When a dataset is available, researchers tend to explore it and publish as many papers as possible. It is possible to identify salami publications of the same research; however, separate reports of the same study may still use the same or similar data and cause Inter-Dataset Duplicate Data.

Inter-Dataset Duplicate Data fall into several categories depending on how the data are released: time-dependent, salami, data-volume-dependent, imprisoned data, and open data.

  1. Time-Dependent Release: Researchers report only part of the results such as primary results in one paper, the final results in another paper, and follow-up results in a third paper;
  2. Salami Release: To increase the number of their publications, researchers report only one part of the findings per paper;
  3. Data Volume Dependent Release: Since the research generates a lot of data, the researchers have no choice but to report it in several papers because journals limit paper length;
  4. Imprisoned Data Release: Since the researchers have access to a private dataset, they publish several papers from those data, even a decade after the end of their research. Such publications appear in journals as post hoc analysis or secondary analysis papers;
  5. Open Data Release: The research dataset is open online for the public, and any researcher can access and generate publications out of these data.

The reviewers need to assess these reports and choose the highest-quality and most comprehensive report of the dataset.

Intra-Report (Cross-Report) Duplicate Data

Those who are able to run multiple research studies alongside each other (mainly pharmaceutical companies) also tend to publish the findings from those studies together. They usually publish multiple papers, but each paper reports more than one study. Separating these data in an understandable and analysable way is not always easy, and it is also difficult to identify the unique data per study from these papers. There is almost always overlapping or duplicate data.

Conclusion

Contrary to the simplistic view that finding and removing duplicates is an easy, single step of a systematic review, it takes skill to prevent, identify, and remove duplicate and redundant reports and data. Duplication can be detected at any stage of the systematic review.

Published by Farhad

Medical Information Scientist
