Luna Tong

Choosing an Audit Competition: How to Spot Snake Oil

A review of common misleading sales tactics used for audit competitions, and the questions you should ask

Last summer, we acquired Code4rena, the leading audit competitions platform. In this blog post, we’ll review several misleading sales tactics commonly used in the space. We’ll arm you with the important questions to ask when purchasing a competition–including the ones BD teams don’t want you to think about.

What are audit competitions?

Audit competitions are a hybrid between traditional, consultative audits and bug bounties. They combine the time-boxed and fixed-cost nature of an audit with the crowdsourced, “as many eyes as possible” benefits of bounties. During an audit competition, there’s a fixed window where anyone can submit bugs to the protocol. At the end, a fixed prize pool is distributed across everyone who submitted valid findings.

Unlike traditional audits where you might have 2 or 3 auditors reviewing your codebase, competitions attract hundreds of security researchers, each with their own expertise and perspective. This wide range of approaches often leads to the discovery of complex, obscure vulnerabilities that might be missed in a traditional audit setting.

However, not all audit competitions are created equal. It’s important to look beyond sales claims and ask the right questions. Now, let’s go over some of the common misleading narratives used when selling audit competitions.

Misleading submission metrics

One common misleading tactic is the presentation of raw submission metrics without proper context. Some platforms will display charts showing hundreds of “high-severity findings”. However, these numbers often include duplicate submissions of the same issue.

Oftentimes multiple auditors will report a given issue. A platform might count this as 20 different “high-severity findings” in their deck, when it’s really just a single vulnerability. Additionally, some platforms include invalid or disputed issues in their metrics, further inflating the numbers. Both of these create a false impression of thoroughness and effectiveness.

The following is a real example from a sales deck we’ve seen. We believe the number of high and medium findings is not de-duplicated, and that “total submissions” includes invalid findings. It seems impossible for there to be 248 unique high-severity issues in a $60,000 prize pool competition. For reference, a pool of that size usually corresponds to roughly 3,000 SLoC–suggesting a high-severity bug for every 12 lines of code!
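
If you want to pressure-test a claim like this yourself, the arithmetic is simple. Here is a minimal sketch in Python using the figures quoted above; note that the 3,000 SLoC figure is our rough assumption for a pool of that size, not a number provided by the platform:

    # Back-of-the-envelope sanity check on a claimed findings count.
    # The SLoC figure is our rough estimate for a $60,000 pool, not a platform stat.
    claimed_high_findings = 248
    estimated_sloc = 3_000

    lines_per_high = estimated_sloc / claimed_high_findings
    print(f"Implied density: one high-severity bug every {lines_per_high:.1f} lines of code")
    # ~12 lines per "unique" high-severity bug, which is not plausible for valid, de-duplicated issues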

Questions to ask:

  • Do these metrics include invalid and spam issues?
  • Do these metrics count unique, de-duplicated findings, or are duplicates counted separately?
  • How many of these issues were disputed by the sponsor?
  • What are your criteria for assigning High severity?
  • Can I see the audit report for these competitions or ones of similar size?
  • What were the durations of these competitions?

Misleading participation metrics

Another tactic is inflated “participation” numbers. You might see charts comparing auditor participation across platforms, but these comparisons are often flawed due to differing definitions of what counts as a “participant”.

Some platforms count anyone who clicks an “I’m joining” button as a participant, regardless of whether they actually review your code or submit findings. Others might count users who submitted at least one finding, regardless of validity. And some platforms might count only those who submitted at least one valid finding. Thus, a platform might show 200+ “participants” in their competitions when in reality, only 30-40 people actually submitted valid findings.

In the example below, one platform claims that their $1.2M contest attracted 573 “participants”, versus a $1.1M contest that attracted 72. But if you review the competition leaderboard, only 58 of those 573 “participants” actually submitted valid findings.

Here is another example. This tweet claims the competition had 601 participants, but it doesn’t define what counts as a “participant”. If you check the competition leaderboard, only about 50 researchers scored more than 1 point.
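
If you can get a leaderboard export, this kind of check is easy to automate. Here is a minimal sketch, assuming the leaderboard can be exported as rows with a handle, a count of valid findings, and a point total; these field names are hypothetical, not any platform’s actual schema:

    # Compare a claimed participant count against what the leaderboard actually shows.
    # Field names are hypothetical; adapt them to whatever export you can obtain.
    leaderboard = [
        {"handle": "researcher_a", "valid_findings": 3, "points": 41.2},
        {"handle": "researcher_b", "valid_findings": 0, "points": 0.0},
        # ... one row per person the platform counted as a "participant"
    ]
    claimed_participants = 573  # the number quoted in the sales deck

    submitted_valid = [r for r in leaderboard if r["valid_findings"] > 0]
    scored_points = [r for r in leaderboard if r["points"] > 1]

    print(f"Claimed participants:         {claimed_participants}")
    print(f"Submitted >= 1 valid finding: {len(submitted_valid)}")
    print(f"Scored more than 1 point:     {len(scored_points)}")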

Questions to ask:

  • What is your definition of a “participant”?
  • How is this metric measured or obtained?
  • How do you know that these “participants” actually audited the code?
  • How many of these “participants” actually submitted a valid finding?
  • How many of them submitted Highs or Mediums?

Claims about exclusivity

Some platforms promote exclusive access to top security researchers as a key selling point. However, in practice, these exclusivity agreements often boil down to offering researchers incentives–like guaranteed bookings or direct fees–to avoid competing platforms.

This doesn’t guarantee participation in your specific competition, nor does it ensure meaningful time or effort from those researchers. They may split focus across multiple audits and competitions or choose not to engage at all. In many cases, their time is primarily allocated to private audits, not competitions.

As a result, teams may end up paying a premium for the possibility of involvement, without any assurance of actual or thorough participation. Here are examples we’ve seen in the wild:

Questions to ask:

  • Which of these “exclusive” researchers are contractually required to participate in my competition?
  • How many hours per week will they work on my competition?
  • Who will be managing them?
  • How is their performance monitored to ensure they are actually working hard on my competition?
  • Will they be exclusively assigned to my competition for the entire time? If not, why not, and what else will they be assigned to?

Claims about consultative auditors

Some platforms attempt to blur the lines between different service offerings by featuring their consultative auditors in materials promoting their competitions. This creates confusion by implying these auditors will be directly involved in your competition when they might not be.

For example, you might see impressive bios and credentials of senior security researchers who work for a parent or sister company in a pitch deck for an audit competition. However, these auditors may typically work on the platform’s traditional audits and not their competitions.

The reality is that in an audit competition, you’re paying for access to a pool of independent security researchers, not the dedicated attention of the platform’s consultative auditors.

Questions to ask:

  • Will they be exclusively assigned to my competition for the entire time?
  • How many hours per week will they work on my competition?
  • Who will be managing them and overseeing their work? What is the process here?
  • How is their performance monitored to ensure they are actually working hard on my competition?

Claims about social media engagement

Some platforms emphasize their social media engagement as a selling point, showcasing likes, retweets, and comments. While this might seem valuable, it’s important to understand what these metrics actually represent.

Social media engagement is one of the easiest metrics to manipulate. Everyone knows how egregiously and pervasively botted Crypto Twitter (CT) is. Even authentic engagement doesn’t always translate to meaningful adoption; airdrop farming is a classic example of the pitfalls of this kind of “engagement”. Instead, what matters is how many qualified researchers actually participate in your competition and submit valid findings.

While exposure can be beneficial, remember that the primary goal of an audit is to find and fix vulnerabilities in your code. The most valuable engagement comes from researchers who deeply understand your protocol and later become developers or power users, not casual followers on CT.

Anonymous testimonials

Beware of sales materials that feature anonymous testimonials. Phrases like “Leading web3 protocols” or “Major lending platform” are indicators that the quotes could be cherry-picked.

Without knowing who provided the testimonial, you have no way to verify its accuracy or relevancy. You also can’t determine if it represents a common experience or an isolated incident that’s been amplified for marketing purposes. Platforms should be willing to connect you with real, named customers who can share their experiences directly so you can ask follow-up questions and conduct proper reference checks.

This is a real example of one platform bashing another. (Neither platform is ours.)

This testimonial is also problematic for another reason: AI spam is a challenge faced by all platforms. Presenting this as a problem unique to a competitor is disingenuous.

Remember to ask:

  • What is the name and company of the person being quoted?
  • Can you introduce me to them for a reference check? What is their contact info?

Comparisons between individual audits

Finally, another misleading tactic is the direct comparison between two individual audits. These comparisons are almost always flawed and miss important context.

No two audits are directly comparable because of the many variables involved. Different codebases have different complexity levels, vulnerabilities, and maturity. Even multiple audits of the same code may reflect different states of the codebase—the first audit is more likely to find many bugs, while subsequent audits will find fewer because the codebase has already been picked clean. And in general, the scope, duration, and focus areas can vary dramatically between engagements.

Be wary of the differences in severity rating scales and criteria between firms. For example, some platforms do not have “Critical” findings; instead they have “High” and “Medium”. We’ve seen some firms claim that others’ audits didn’t yield any “Critical” issues—when the truth is that they weren’t on the scale to begin with. Alternatively, what may be a “Medium” on one platform may be a “High” on others.

Questions to ask:

  • Is your side of the comparison a competition, or was it actually a consultative audit?
  • Where can I inspect the audit reports for each audit?
  • When did the two audits take place?
  • Did either audit take place after fixes had already been made after a previous audit?
  • Did the two audits have the same scope and same commit hash?
  • Were both auditors instructed to focus on the same areas of the scope?
  • How many lines of code were in scope for each audit?
  • What were the durations and engineer-weeks assigned for each audit?
  • What are your criteria for assigning high severity to an issue?
  • Do either of the reports have multiple issues grouped into a single finding?
  • What did the client fix after each of the audits?
  • Can I have the contact info of the audit customer so I can do a reference check?

General tips when selecting an audit competition

When selecting an audit competition platform, focus on transparency, clear communication, and meaningful metrics. The right platform should be forthcoming about their process, willing to connect you with past customers, and able to clearly explain how they ensure quality and manage their community of researchers.

Always ask about the economics of the engagement beyond just the prize pool:

  • What is the marketplace fee you are charging (in $)?
  • Will this be a conditional pool? If no Highs or Mediums are found, how much of my pool (in $) will be refunded?
  • What other competitions will be running at the same time as mine?
  • (If a slide deck is presented) Can you send me a copy of this deck?

Other frequently asked questions we encounter:

  • Does a conditional pool have a negative effect on turnout? Generally, yes, which is why we advocated against these in the past. The benefit to the customer is that if no High or Medium bugs are found, most or all of the prize pool is refunded (a worked example follows this list). The drawback is that this makes participating in the competition much less attractive to researchers, which can reduce how many bugs ultimately get reported. That said, conditional pools have become more or less an industry standard across the major competitive audit providers, and most platforms will run your competition with or without one, depending on what you ask for.

  • What are the most important metrics when comparing different providers? The most important metrics are # of High/Medium findings reported and # of researchers who submitted valid findings. These are the key metrics that determine the overall value of your audit. They directly measure the security impact on your protocol and the level of assurance (the number of eyes on the code).
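
To make the conditional-pool trade-off concrete, here is a small illustrative calculation. All of the numbers are hypothetical, and real refund terms vary by platform, so always get the exact formula in writing:

    # Illustrative conditional-pool economics. All numbers are hypothetical;
    # the actual refund terms depend on your contract with the platform.
    prize_pool = 100_000     # advertised prize pool, in $
    platform_fee = 20_000    # marketplace fee, in $
    refund_rate = 0.90       # fraction of the pool refunded if no Highs/Mediums are found

    cost_if_hm_found = prize_pool + platform_fee
    cost_if_none_found = prize_pool * (1 - refund_rate) + platform_fee

    print(f"Cost if Highs/Mediums are found: ${cost_if_hm_found:,}")
    print(f"Cost if none are found:          ${cost_if_none_found:,.0f}")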

I’m a researcher. How does this affect me?

Misleading sales tactics by any competition platform damage the relationship between platform and sponsor in the long run. Sponsors who are misled and receive less than they were promised become disillusioned with competitions in general. This harms the entire community of security researchers–not just on the misaligned platform, but across all competitive audit platforms. Over time, this leads to less funding being directed to community security research, like competitive audits and bug bounties. Ultimately the decision is yours, but we recommend considering this as a factor when choosing which platform(s) you compete on.

Conclusion

Choosing the right partner for an audit competition comes down to asking the right questions. Focus on how many high-quality security researchers will actually review your code and how many valid, unique vulnerabilities they’ll submit. Be skeptical and demand transparency about what you’re paying for. The right platform will be happy to answer.