Here on Techdirt, we write a lot about content moderation and even did a whole big series of content moderation case studies. However, here’s an interesting one from a couple of weeks ago that involves Techdirt itself. It’s also a perfect example of Masnick’s Impossibility Theorem in action and a reminder of how the never-ending flood of spam and scams provides cover for bad actors to sneak through abusive reports.
This case should also be a giant red flag to policymakers working on content moderation laws. If your policy assumes everyone reporting content has pure motives, it’s not just naive, it’s negligent. Bad actors will exploit any system that gives them power to take down content, full stop.
Here’s what happened:
We were off on the Friday after Thanksgiving, and I went for a nice hike away from the internet. After getting home that evening, I saw an email saying that when the sender tried to visit Techdirt, they received a warning from Cloudflare that the site had been designated a “phishing” site.
I logged into our Cloudflare account and found that we had been blocked for phishing.
I did have the ability to request a review:
But, this all seemed pretty damn silly. Then I remembered that a couple days earlier, I had received a very odd email from another security provider, Palo Alto Networks, telling me that it had rejected my request to reclassify Techdirt as a phishing site. Somewhat hilariously, it said that the “previous” category was “computer and internet info,” that I had requested it be reclassified as phishing (I had not…), and that it had instead “reclassified” the site back to “computer and internet info.”
It seemed fairly obvious that some jackass was going around to security companies trying to get Techdirt reclassified as a phishing site. It didn’t work with Palo Alto Networks, but somehow it did with Cloudflare. It’s unclear whether the same thing was tried anywhere else, or how well it worked if it was.
Thankfully, Cloudflare was quick to respond and to fix the issue. On top of that, the company was completely open and apologetic about how this happened. There was no hiding the ball at all. In fact, Cloudflare’s CEO Matthew Prince noted to me that this kind of thing might be worth writing about, given that it was a different kind of attack (though one he admitted the company never should have fallen for).
So how did this happen? According to Cloudflare, its trust & safety team was working through a backlog of phishing reports and bulk-processed them without realizing there was a bogus one (for Techdirt!) in the middle.
I understand that some people in my shoes would be pretty mad about this. However, I’ve spent enough time with trust & safety folks to know that this kind of shit happens all the time. And it kind of has to. The vast, vast majority of trust & safety work is processing obviously bad stuff: spam and scams. If you’re dealing with hundreds or thousands of those at once, it’s entirely possible for a bogus report against a legitimate site to slip through the cracks. If a company actually hand-reviewed every single report, the backlog would grow larger and larger, leaving actual spam and scam sites online.
This is the impossible bind that trust & safety teams find themselves in. They obviously feel compelled to remove actual spam and scams relatively quickly to protect users, but going too quickly sometimes means making mistakes.
We were just caught in the crossfire on this one. That’s not to say this kind of nonsense would reliably work against anyone else: Cloudflare tries to review such reports, but sometimes mistakes happen. I mean, we deal with the same thing (on a smaller scale) with our spam filter here at Techdirt. If we get 2,000 spam comments a day (which happens most days) and one legitimate comment gets caught along with them, we might not spot it. We actually have a separate system that tries to catch those mistakes and shunt them to a separate queue, so I think we still find the vast majority of falsely flagged comments, but I’m sure we miss some.
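To make that a bit more concrete, here’s a minimal sketch (in Python, with made-up names and thresholds, not our actual setup) of the kind of two-stage approach I’m describing: an automated filter tosses clear spam, but anything in the gray zone gets shunted to a separate queue a human can skim for false positives.

```python
# Hypothetical sketch of a two-stage comment filter; not Techdirt's actual code.
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    text: str
    spam_score: float  # 0.0 = clearly legit, 1.0 = clearly spam

@dataclass
class Queues:
    published: list = field(default_factory=list)
    spam: list = field(default_factory=list)
    review: list = field(default_factory=list)  # suspected false positives

def triage(comment: Comment, queues: Queues,
           spam_threshold: float = 0.9, gray_zone: float = 0.6) -> None:
    """Route a comment by spam score.

    Scores above spam_threshold are treated as spam outright. Scores in
    the gray zone between gray_zone and spam_threshold go to a separate
    review queue, so a legitimate comment the filter catches isn't
    silently discarded with the other 2,000 spam comments that day.
    """
    if comment.spam_score >= spam_threshold:
        queues.spam.append(comment)
    elif comment.spam_score >= gray_zone:
        queues.review.append(comment)  # possible false positive, for a human to check
    else:
        queues.published.append(comment)

queues = Queues()
triage(Comment("reader", "Great post!", spam_score=0.72), queues)
assert queues.review  # the borderline comment lands in the review queue, not the trash
```

The specific thresholds don’t matter; the point is that the system assumes mistakes up front and routes the borderline cases somewhere a human can catch them.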
This is always going to be a challenge for trust & safety teams, and not something that some new regulation can realistically fix. If the law mandated human review, you’d get problematic results there too. Backlogs would grow. And even with a human, there’s no guarantee they’d have spotted this bogus request, since they’d probably be rapidly reading through hundreds of other similar reports, without the time or capacity to check each site carefully.
Cloudflare told me that the message it received was obvious bullshit. Someone sent a report about Techdirt saying, “There is malware that they spread to their visitors.” The problem was just that, in this case, no human read it. The report against us just got bulk-processed with a bunch of others, most of which I’m sure targeted sites that really were pushing malware or phishing.
Yes, it may be mildly annoying that visitors were warned away from Techdirt for a few hours. But to me, it’s even more fascinating to see someone trying this attack vector and having it work, if only briefly.
It’s a reminder that bad actors will try basically anything to find weaknesses in a system. So many of the content moderation laws around the globe, such as the DSA, seem to assume that basically everyone filing reports is an honest broker and well-meaning when it comes to moderation decisions. But, as we see here, that assumption gives bad actors room to wreak havoc.
As they consider new content moderation laws, policymakers need to start from the premise that some people will abuse any system that lets them take down content. Laws that assume good faith are doomed. There are inherent tradeoffs in any approach, and even with the best system, mistakes are inevitable. The DMCA already teaches us that any system enabling content removal will be abused. Policymakers must factor that in from the start, and yet they almost never acknowledge it.
Anyway, I appreciate Cloudflare’s quick response, apology, and willingness to be quite open about how this happened. And thanks for giving us another interesting content moderation case study at the same time.