mCaptcha: 100% FOSS privacy-respecting captcha system

realaravinth · September 26, 2022, 1:19am

Hello!

I’m the lead developer of mCaptcha, a 100% FOSS CAPTCHA system that uses proof-of-work.

Folks living in oppressive regimes use Tor to access the internet. Using proprietary black box CAPTCHA systems to guard and deny access to such critical resources can be a source of censorship, as the black box can arbitrarily deny access. In order to ensure impartial behavior, we should be able to independently reproduce the validation decisions made by guarding mechanisms. mCaptcha is one such protection mechanism, which is transparent and reproducible.

Demo

https://showcase.mcaptcha.org demonstrates a service that relies on mCaptcha for protection. To compute the PoW (the validation logic), the demo runs JavaScript code. mCaptcha is 100% FOSS, so all source code is freely available at mCaptcha · GitHub.
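
For readers unfamiliar with proof-of-work, here is a minimal sketch of a generic hash-based scheme. It is not mCaptcha's actual algorithm: the use of SHA-256, the way `salt`, `challenge` and `nonce` are concatenated, and the function names are all assumptions made for illustration. The point it shows is that anyone can re-run `verify` and reproduce the decision.

```python
import hashlib
from itertools import count


def pow_target(difficulty: int) -> int:
    """Upper bound (exclusive) a hash must stay under; higher difficulty means a smaller bound."""
    return (1 << 256) // difficulty


def hash_value(salt: str, challenge: str, nonce: int) -> int:
    """SHA-256 of the concatenated inputs, read as a 256-bit integer (illustrative layout)."""
    digest = hashlib.sha256(f"{salt}{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(salt: str, challenge: str, difficulty: int) -> int:
    """Client side: try nonces in order until one hashes below the target."""
    target = pow_target(difficulty)
    for nonce in count():
        if hash_value(salt, challenge, nonce) < target:
            return nonce


def verify(salt: str, challenge: str, difficulty: int, nonce: int) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    return hash_value(salt, challenge, nonce) < pow_target(difficulty)
```

In this sketch the client needs about `difficulty` hash attempts on average, while verification always costs one hash, which is the asymmetry a PoW-based check relies on.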

Requesting feedback

Switching to transparent, libre technologies to protect resources can significantly enhance global access to the internet and privacy in general. So offering impartial, accurate protection for Tor users is very important to mCaptcha, and I would appreciate any feedback you might have.

P.S. I was going to post on here a little later, but I’m applying for funding with the Open Tech Fund and the reviewer suggested that I make contact. The funding journey is published for transparency purposes; please see here to learn more about it.

Vort · September 26, 2022, 6:16am

CAPTCHA is Completely Automated Public Turing test to tell Computers and Humans Apart.
I’m very surprised that hash calculations can be called a Turing test.

realaravinth · September 26, 2022, 7:29am

You are right. mCaptcha’s use of “CAPTCHA” to describe itself is to simplify explaining its functionality, so it isn’t entirely accurate.

Bypassing CAPTCHAs is simple and can be automated with services from captcha farms. So IMHO, the current use cases are very similar for accurate CAPTCHAs like reCAPTCHA/hCaptcha and for mCaptcha: both limit abuse from automation.

Vort · September 26, 2022, 10:32am

I tried to use it several hours ago. The “I’m not a robot” checkbox was working, but pressing Register resulted in some error.
I decided to test it again to see what exact error it returned, but now I see only a “Something wen’t wrong” message when clicking on the checkbox (which looks like it has a grammar error in it).

So the service does not tell computers and humans apart, it just slows down both of them.
Which is not a big problem for bots, but is inconvenient enough for humans.
How long will a real user be willing to wait for registration? 10 seconds?
Let’s say 1 minute. A bot working for just 1 hour will be able to create 60 accounts.
That may not be much load on the server by itself, but registration is not the final goal for a bot. It will then post spam messages on a forum, for example. Will human moderators be able to remove 60 spam posts per hour? I doubt it.

It was the right decision to make a demo service to show how it works.
However, there is one detail: for that, it has to work.
Which is not obvious for now.
Judging from the results I see, I expect that the demo may be not just overloaded, but also hacked.
If you are able to make it withstand attacks, then it may prove that the service can be useful.

realaravinth · September 26, 2022, 12:54pm

Appreciate the feedback

Apologies, the server ran out of memory.

The demo is indeed hacked together, but I manually verified it before posting on here. It’s running out of my bedroom as I can’t afford a more stable location to run it. This is the first time it has run out of memory, so apologies for the inconvenience.

Not necessarily. reCAPTCHA/hCaptcha serve (multiple) challenges by default for visitors from Tor.

mCaptcha implements PoW difficulty scaling, which greatly improves UX. The time taken to generate the PoW when the service is seeing regular traffic will be very low (100-200 ms). The difficulty ramps up when the site is under attack and drops again once the attack is contained.
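
A minimal sketch of what traffic-based difficulty scaling can look like, assuming a sliding window of recent visits and a table of levels. The `Level` and `DifficultyScaler` names, the thresholds and the difficulty values are hypothetical and chosen for illustration; they are not taken from mCaptcha's code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    visitor_threshold: int  # visits inside the window needed to activate this level
    difficulty: int         # PoW difficulty served while this level is active


class DifficultyScaler:
    """Picks a difficulty from the number of visits seen in the last `window_seconds`."""

    def __init__(self, levels: list[Level], window_seconds: float) -> None:
        self.levels = sorted(levels, key=lambda level: level.visitor_threshold)
        self.window = window_seconds
        self.visits: deque[float] = deque()

    def record_visit(self, now: float) -> None:
        self.visits.append(now)

    def current_difficulty(self, now: float) -> int:
        # Forget visits that have aged out of the window, so difficulty relaxes after an attack.
        while self.visits and self.visits[0] <= now - self.window:
            self.visits.popleft()
        difficulty = self.levels[0].difficulty
        for level in self.levels:
            if len(self.visits) >= level.visitor_threshold:
                difficulty = level.difficulty
        return difficulty


# Hypothetical configuration: cheap PoW for normal traffic, expensive PoW under load.
scaler = DifficultyScaler(
    [Level(0, 500), Level(10, 50_000), Level(100, 5_000_000)],
    window_seconds=30,
)
```

Timestamps are passed in explicitly so the behaviour is easy to test; a real deployment would more likely read a clock and share the counter across workers.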

So in essence, the average Tor user will spend more time solving reCAPTCHA/hCaptcha than they will with mCaptcha.

I’ve conducted experiments which prove mCaptcha’s ability to contain attacks. The source code for the experiment is publicly available here. The setup runs an mCaptcha server, a client service and a Locust (locust dot io) DoS client.

In my experiment, I set it all up on a single computer (the funding will allow me to simulate DDoS scenarios). Here are the results as reported by Locust:

With mCaptcha protection:

Attack is detected, and difficulty level is increased. Locust reports 0.6 requests/second.

Thanks for reporting it. This patch fixes it and should be deployed once the CI pipeline finishes.

I’m more than happy to discuss the technical aspects of mCaptcha. But I posted here to learn more about the needs of Tor hidden service sysadmins and to see how I can be of help to them.

Vort · September 26, 2022, 1:33pm

Ok, now I see the correct result for the first time.
No database means nothing to protect, however.

For me, reCAPTCHA asks to complete several challenges to continue with Google search, then, when the challenges are complete (if I’m lucky enough not to hit artificial timeouts), shows that it will not allow me to pass anyway.

I understand how to determine if hardware is overloaded.
But criteria for overloading people (moderators for example) can be harder to formalise.

That’s a protection for computers. Ok.

That’s not me.
Maybe someone else will give such sort of feedback.
However, surprisingly, this forum is not very popular.

RendezvousPoint · September 26, 2022, 1:33pm

One worry with a JS based PoW is that when JIT is disabled (such as with the Medium security setting in the Tor Browser) the performance hit might be way too much. I tested it in this case and it was okish, about 10 seconds.

realaravinth · September 26, 2022, 1:59pm

One worry with a JS based PoW is that when JIT is disabled

The PoW generation uses WebAssembly (WASM) and falls back to a JavaScript implementation to support browsers that can’t run WASM.

But I understand what you mean. The situation is further complicated by the fact that not all devices have the same CPU power. So for slower, older devices, the experience might be worse.

To deal with that, I am working on a survey[0] to benchmark devices on the internet. The results will help webmasters who use mCaptcha to protect their websites choose a difficulty factor that provides a decent experience for their visitors.

[0]: mCaptcha/survey on GitHub. The forum wouldn’t let me post the link

Vort · September 26, 2022, 2:29pm

Do you expect people to build the survey by themselves? O_o
I tried to click the checkbox on https://survey.mcaptcha.org, but something went wrong. This time without ’ )

realaravinth · September 26, 2022, 2:58pm

No, not really. The reason I didn’t share that link was because it is broken, and it has a corny pitch to convince my friends at school to participate in it.

The survey is automated. The participant will only have to click a link. It will compute the PoW at various difficulties with both the WASM and the JavaScript libraries and submit the data (number of threads available and time taken to compute the PoW) to the server.
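
To illustrate the kind of measurement described, here is a rough sketch that times a generic SHA-256 proof-of-work at a few difficulties and collects the thread count alongside the timings. It is written in Python rather than WASM/JavaScript, does not submit anything to a server, and the `solve` and `benchmark` functions, the input strings and the report layout are all assumptions, not the survey's actual code.

```python
import hashlib
import os
import time
from itertools import count


def solve(salt: str, challenge: str, difficulty: int) -> int:
    """Generic PoW stand-in: find the first nonce whose hash falls below the target."""
    target = (1 << 256) // difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{salt}{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce


def benchmark(difficulties: list[int]) -> dict:
    """Time one PoW computation per difficulty and bundle the results with the CPU count."""
    results = []
    for difficulty in difficulties:
        start = time.perf_counter()
        solve("survey-salt", "survey-challenge", difficulty)
        elapsed = time.perf_counter() - start
        results.append({"difficulty": difficulty, "seconds": elapsed})
    # os.cpu_count() reports logical CPUs; it may return None on some platforms.
    return {"threads": os.cpu_count(), "results": results}


report = benchmark([1, 100, 1000])
```

A single run per difficulty is noisy, since the number of attempts needed varies with the challenge; averaging over several challenges would give a steadier picture of a device.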

I think we are going way off-topic here. @Vort, I enjoyed our discussions, but if you are curious, I recommend joining the official Matrix chatroom (the link is available on the project website). I don’t have the privileges to post on here without mod approval, and I believe I am a major pain for the mods by creating unnecessary noise.