We Open-Sourced a Dataset of Predatory Conferences — Here’s Why

And how we’re using it to protect researchers on callforpaper.org

Every week, thousands of researchers receive an email like this:

“Dear Distinguished Scholar, we are pleased to invite you to present your groundbreaking research at the International Conference on Advanced Engineering and Applied Sciences, to be held in Dubai / Paris / Singapore / Tokyo…”

If you’ve spent any time in academia, you know the feeling — that mix of fleeting flattery and immediate suspicion. The subject line sounds important. The venue sounds glamorous. The deadline is always next week. And the registration fee, buried three paragraphs down, is somewhere between $400 and $1,200.

These are predatory conferences. And they are, by some estimates, now more numerous than legitimate ones.

Today, we’re publishing an open dataset to help fight them — and we want your help keeping it up to date.

What makes a conference predatory?

The “predatory” label was coined by librarian Jeffrey Beall, who spent years cataloguing what he called “predatory publishers”: outfits that charge researchers to publish, provide no real peer review, and exist primarily to extract money from academics under pressure to publish or perish.

Predatory conferences are the same idea applied to events. Their defining features:

No real peer review. Anyone willing to pay can present. Submissions are “reviewed” in days or hours. The acceptance rate is effectively 100%.

Deceptive credibility signals. They claim indexing in Scopus, Web of Science, or IEEE Xplore — claims that are frequently false or misleading. They list “keynote speakers” who have never agreed to attend. They use names almost identical to legitimate conferences.

Mass production at scale. The most notorious offenders — WASET (World Academy of Science, Engineering and Technology) and OMICS International — don’t run a few conferences a year. They run hundreds simultaneously. WASET has been documented running dozens of unrelated conferences on the same two days at the same hotel, across fields as disparate as Islamic architecture, water treatment, and deep learning.

The money is the point. OMICS International was charged by the US Federal Trade Commission in 2016 and ultimately hit with a $50.1 million judgment for deceptive practices. That number gives you a sense of the scale of the operation.

The damage isn’t just financial. Researchers who unknowingly present at these events can face real consequences — institutions questioning the legitimacy of their conference record, papers locked into predatory proceedings that prevent legitimate publication elsewhere, and reputations attached to venues their peers consider embarrassing.

Early-career researchers and scholars from developing countries are disproportionately targeted. The spam is calibrated to find people who need publications on their CV and may not yet have the field knowledge to immediately recognize the red flags.

The problem with existing resources

Several good resources already exist. Beall’s original list, before it was taken down under legal pressure in 2017, was the gold standard. The archived versions are still referenced everywhere. The stop-predatory-journals project on GitHub maintains a community-updated list of publishers. The Caltech library maintains a “Questionable Conferences” list. The Think. Check. Attend. initiative provides a checklist researchers can use to evaluate any conference themselves.

But none of these are quite what we needed for callforpaper.org.

The existing lists are scattered across blog archives, library guides, and GitHub repos in inconsistent formats. Some focus on publishers rather than conference organizers specifically. None are structured as a clean, machine-readable dataset with a defined schema and evidence attribution. And none are connected to a live platform that can actually surface the information at the moment a researcher is considering submitting to a CFP.

We needed something we could query programmatically, keep updated, and contribute back to as we encounter new cases.

What we built

The callforpaper/predatory-conferences repository on GitHub is a community-maintained, open dataset of predatory conference organizers and flagged conference series.

The current dataset compiles and deduplicates entries from eight sources:

Beall’s List (archived)
stop-predatory-journals (GitHub)
Caltech’s Questionable Conferences list (Dana Roth)
The Dolos List
boytchev/spam (researcher-collected spam metadata)
FTC enforcement action records (OMICS)
WASET documentation
Community submissions

It currently covers 50 known organizers and 28 specific conference series, with each entry carrying a status (confirmed or suspected), a source attribution, and an evidence URL where available.

The schema is deliberately simple:

name, domain, aliases, status, evidence_url, source, added_date, notes

Domain-based matching means we can check any CFP submission against the dataset in milliseconds.
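
As a rough illustration of what domain-based matching can look like, here is a minimal Python sketch. It is not the code running on callforpaper.org. It assumes the dataset is a CSV with a header row using the column names above, the sample rows are made up, and the choice to also match parent domains (so a subdomain of a listed domain counts as a hit) is an assumption of this sketch.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical sample rows using the schema's column names; not real dataset entries.
SAMPLE_CSV = """name,domain,aliases,status,evidence_url,source,added_date,notes
Example Predatory Org,predatory-example.org,EPO,confirmed,,community,2024-01-01,
Another Organizer,suspect-example.com,,suspected,,community,2024-01-02,
"""


def load_lookup(csv_text):
    """Map each lowercase domain to its full row for constant-time lookups."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        domain = (row.get("domain") or "").strip().lower()
        if domain:
            lookup[domain] = row
    return lookup


def match_cfp(url, lookup):
    """Return the dataset row matching the URL's host or a parent domain, else None.

    The URL must include a scheme (https://...), otherwise no host is parsed.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then each parent domain:
    # a.b.example.org -> b.example.org -> example.org.
    # Stop before the bare top-level label so "org" alone is never looked up.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in lookup:
            return lookup[candidate]
    return None


lookup = load_lookup(SAMPLE_CSV)
hit = match_cfp("https://icaeas.predatory-example.org/cfp", lookup)
print(hit["name"] if hit else "no match")
```

Because each check is a handful of dictionary lookups, the cost per submission stays small no matter how many entries the dataset holds.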

Everything is released under CC0 — no restrictions, no attribution required. Use it in your own tools, your institution’s systems, your spam filters, whatever helps.

How it connects to callforpaper.org

On the callforpaper.org side, this dataset feeds into our backend trust-scoring system — the same system that powers our verification badge tiers for conference organizers.

We deliberately chose not to show public “warning” labels on CFP listings. Publicly labeling a conference as predatory creates legal and reputational risk. More practically, it risks deterring legitimate organizers who happen to share a domain pattern with a listed one, or who are early-stage and not yet well known.

Instead, the dataset informs a positive reinforcement model. Conferences from organizers not on this list, with verifiable indexing and committee affiliations, earn higher trust tiers. The predatory dataset acts as a signal that suppresses trust scores rather than triggering a visible warning. Researchers benefit from the protection without the platform needing to make public accusations.
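
To make the "suppress, don't warn" idea concrete, here is a small Python sketch. Every number, field name, and tier name in it is hypothetical and chosen only for illustration; the real scoring system is not described in this post. The point it shows is structural: a dataset match lowers the score, and the only visible outcome is which tier an organizer lands in.

```python
from dataclasses import dataclass
from typing import Optional

# All point values and tier cut-offs are hypothetical, for illustration only.
BASELINE = 20
INDEXING_BONUS = 40
COMMITTEE_BONUS = 40
PENALTY = {"confirmed": 100, "suspected": 50}


@dataclass
class Organizer:
    verified_indexing: bool
    verified_committee: bool
    dataset_status: Optional[str] = None  # "confirmed", "suspected", or None if unlisted


def trust_score(org):
    """Score from 0 to 100: positive signals add points, a dataset match subtracts."""
    score = BASELINE
    if org.verified_indexing:
        score += INDEXING_BONUS
    if org.verified_committee:
        score += COMMITTEE_BONUS
    score -= PENALTY.get(org.dataset_status, 0)
    return max(0, min(100, score))


def tier(score):
    """Map a score to a (hypothetical) badge tier; no tier carries a public warning."""
    if score >= 80:
        return "verified"
    if score >= 50:
        return "established"
    return "unranked"


print(tier(trust_score(Organizer(True, True))))               # unlisted organizer
print(tier(trust_score(Organizer(True, True, "confirmed"))))  # listed organizer
```

In this sketch a listed organizer simply never reaches a badge tier, which is the same outcome an unknown organizer with no verified signals would get.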

The sync is automated. A weekly script fetches the latest CSV from GitHub and updates our backend lookup table. Accepted community contributions are live within days.
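
A sync job of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production script: SQLite stands in for the backend, the table name predatory_lookup is hypothetical, and the CSV URL is left as a parameter because the post does not give one. Scheduling (cron or similar) is out of scope.

```python
import csv
import io
import sqlite3
import urllib.request


def fetch_csv(url):
    """Download the dataset CSV as text. The URL is supplied by the caller."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def refresh_lookup(conn, csv_text):
    """Replace the lookup table with the rows in csv_text; return rows processed."""
    rows = [
        (r["domain"].strip().lower(), r.get("name"), r.get("status"))
        for r in csv.DictReader(io.StringIO(csv_text))
        if (r.get("domain") or "").strip()
    ]
    if not rows:
        # Refuse to wipe the table if the download came back empty or malformed.
        raise ValueError("no usable rows in CSV; keeping existing table")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predatory_lookup "
        "(domain TEXT PRIMARY KEY, name TEXT, status TEXT)"
    )
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM predatory_lookup")
        conn.executemany(
            "INSERT OR REPLACE INTO predatory_lookup VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Replacing the whole table inside one transaction means removals (for example, a successfully disputed entry) propagate as reliably as additions, and a failed run leaves the previous data in place.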

The criteria

We publish our classification criteria openly in the repository. The short version: an organizer is classified as confirmed when there is formal documentation — a government enforcement action, a university advisory, archived Beall’s list inclusion, or multiple corroborating credible sources. Suspected requires meeting three or more of fourteen documented patterns, plus at least one public source.
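
Expressed as code, the rule above might look like the following Python sketch. The evidence labels are hypothetical names for the categories listed in the paragraph, and reading "multiple corroborating credible sources" as two or more is an assumption of this sketch; the authoritative wording is the criteria document in the repository.

```python
# Evidence kinds named above as formal documentation. The string labels are hypothetical.
FORMAL_EVIDENCE = {
    "government_enforcement",
    "university_advisory",
    "beall_archive",
}


def classify(evidence_kinds, credible_sources, patterns_matched, public_sources):
    """Return "confirmed", "suspected", or None when neither bar is met."""
    has_formal = bool(FORMAL_EVIDENCE & set(evidence_kinds))
    # Assumption: "multiple" corroborating credible sources means at least two.
    if has_formal or credible_sources >= 2:
        return "confirmed"
    # Suspected: three or more of the fourteen patterns, plus a public source.
    if patterns_matched >= 3 and public_sources >= 1:
        return "suspected"
    return None


print(classify(["government_enforcement"], 0, 0, 0))
print(classify([], 1, 4, 1))
print(classify([], 0, 2, 1))
```

Returning None for everything below the suspected bar keeps merely low-quality events out of the dataset by default.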

Beyond confirmed and suspected, there is a third status, disputed, for entries under active review, and a defined removal process for anyone who believes an entry is incorrect.

What we explicitly don’t include: conferences that are simply low quality, conferences that rejected someone’s paper, or regional events that are small and poorly organized but operating in good faith. Low quality is not the same as predatory. The dataset is about intentional deception and exploitation, not academic mediocrity.

What we’re asking for

The dataset is only as good as the community that maintains it. The most valuable contributions right now are:

New confirmed entries. If you know of an organizer with a government action, university advisory, or strong documented evidence that isn’t in the dataset, open an issue or submit a PR. The issue template walks you through what we need.

Better evidence URLs for existing entries. Some entries, particularly those compiled from older archived lists, are missing direct evidence links. If you can find primary documentation for any of them, that’s enormously useful.

Field-specific expertise. The current dataset skews toward computer science and engineering because that’s where the most public documentation exists. Medicine, social sciences, humanities — these fields have their own predatory conference ecosystems that are underrepresented. If you work in these areas and have encountered documented cases, we want to know.

Corrections. If anything in the dataset is wrong — an organizer that has genuinely reformed, an entry that was added on insufficient evidence — tell us. The dispute process is open and we take it seriously.

A note on what this isn’t

This dataset is not a legal accusation. Suspected entries reflect documented patterns, not proven wrongdoing. We are a community of researchers and engineers trying to protect other researchers, not a regulatory body.

We’re also not naive about the limitations. New predatory organizers emerge constantly, often under new domain names and slightly different branding. No static list ever stays complete. The dataset is most valuable as a layer in a larger trust-scoring system — one signal among several — not as a definitive blacklist.

The goal is to raise the cost of the deception, lower the information asymmetry between researchers and bad actors, and give tools like callforpaper.org the signal they need to surface trustworthy conferences first.

Get involved

The repository is at github.com/CallForPaper/predatory-conferences.

Star it if you find it useful. Fork it if you want to build on it. Open an issue if you have something to add. And if you work at a university library or research institution that maintains its own predatory conference list, reach out — we’d love to consolidate rather than duplicate.

Predatory conferences thrive on information asymmetry. An open, well-maintained, widely used dataset is one of the most direct ways to reduce it.

callforpaper.org is a platform for discovering and submitting to academic conference calls for papers. The predatory-conferences dataset is maintained as a separate open-source project and is available for use by anyone under CC0.
