Blindfolded: monitoring the dark web without seeing the data

Challenge

Grapnel is the darknet intelligence platform built by Kerala Police’s Cyberdome. It crawls hidden web content and flags material officers need to act on — including images depicting the sexual exploitation of children. We’ve contributed to it since 2021, most recently as Kerala Police’s official AI partner for Hack’P 2025.

The defining constraint: only specifically authorised officers may ever view the sensitive images, and that boundary is enforced strictly — including against us. The engineers building the classifier are never permitted to see a single image the model is trained on. Not anonymised, not redacted, not in aggregate.

That forces two problems most AI teams never have to solve at once. The training pipeline has to be run by people who hold the data but aren’t ML engineers. And the model has to be debugged by a team that can’t look at a single example of what it gets wrong.

Solution

Make the pipeline operable by the people with access. We collapsed training into a single Jupyter notebook structured around two folders: safe/ and unsafe/. To build and validate it we used a stand-in dataset we could see, chosen to be tonally unrelated to the real subject matter: cats as safe, dogs as unsafe. Structurally identical to the production data, emotionally distant from it. Once the notebook worked on cats and dogs, it went to the officers, who swap in their real data and run it unchanged on a GPU machine on police premises that is wiped before and after each session. The output is a model file. They hand it to us. We deploy it. We never see what it was trained on.
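
To make the two-folder contract concrete, here is a minimal sketch of how a notebook cell might turn safe/ and unsafe/ into labelled training and validation lists. It is an illustration, not the production notebook: the function names, the accepted file extensions, the 80/20 split and the placeholder cat/dog files are all assumptions.

```python
import random
import tempfile
from pathlib import Path

LABELS = {"safe": 0, "unsafe": 1}           # folder name -> class id
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def load_labelled_paths(root):
    """Return (path, label) pairs from root/safe and root/unsafe."""
    root = Path(root)
    samples = []
    for folder, label in LABELS.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            raise FileNotFoundError(f"expected folder: {class_dir}")
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def make_stand_in_dataset(root, n_per_class=5):
    """Create empty placeholder files mimicking the cats/dogs stand-in layout."""
    for folder, prefix in (("safe", "cat"), ("unsafe", "dog")):
        class_dir = Path(root) / folder
        class_dir.mkdir(parents=True)
        for i in range(n_per_class):
            (class_dir / f"{prefix}_{i}.jpg").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_stand_in_dataset(tmp)
        train, val = split_train_val(load_labelled_paths(tmp))
        print(len(train), "training files,", len(val), "validation files")
```

Because the code only ever looks at folder names, swapping the stand-in data for the real data changes nothing in the notebook itself. That is the property that lets it be developed on one side of the boundary and run on the other.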

Bake interpretability into the operator workflow. High accuracy isn’t the same as being right for the right reasons, and we couldn’t inspect the failures ourselves. So we built Grad-CAM attribution heatmaps directly into the pipeline — officers generate them over any image themselves, without us. The heatmap shows which regions of an image drove the prediction, turning a black box into something the people with access can interrogate.
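
For readers unfamiliar with Grad-CAM, the core computation is small enough to sketch in plain Python. This is a toy illustration under stated assumptions, not the code in the Grapnel pipeline: in a real model the activations and gradients would be captured from a chosen convolutional layer by the deep-learning framework, whereas here they are passed in as nested lists with made-up numbers.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    activations[k][i][j]: feature map k at spatial position (i, j).
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    Returns an H x W heatmap scaled to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight alpha_k = spatial mean of that channel's gradients.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Normalise so the strongest region is 1.0 (skipped if the map is all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy 2-channel, 2x2 example with made-up numbers.
    acts = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]],      # channel 0 supports the class
             [[-1.0, -1.0], [-1.0, -1.0]]]  # channel 1 argues against it
    for row in grad_cam(acts, grads):
        print(row)
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the image. An operator reads the overlay, not the numbers.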

It paid off immediately. Officers reported ordinary room photos being flagged as unsafe; the heatmaps centred squarely on the bed. The model had latched onto a spurious correlation: beds were overrepresented in the verified unsafe set, so it was right on the training images for entirely the wrong reason and would fail unpredictably elsewhere. The fix was simple once the evidence was visible: rebalance the safe class with more bedrooms and furniture to break the association. None of it required us to see a single sensitive image.

Results

Grapnel has been in active operational use by Kerala Police Cyberdome since 2021, with the image classifier deployed to officer workstations. The system works as designed — and, more importantly, stays correctable without ever widening who can see the data.

The blindfold holds

The team that built and maintains the classifier has never viewed its training data, and never will. Sensitive images stay with authorised officers behind their perimeter; only model files cross the boundary. The data-access surface didn’t grow by a single person to ship a production model.

Evidence officers can act on

Interpretability lives in the operator workflow, not in a research notebook on our side. When the model misbehaves, the officers who can see the images get a heatmap that explains the call and points at the fix — as it did with the bed correlation. Debugging no longer depends on the one team that isn’t allowed to look.

Why it generalises

The shape of this problem (restricted data, expert-but-non-technical operators, a hard wall between engineers and the ground truth) recurs across regulated domains. Healthcare imaging is the one clients most underestimate: the clinical team holds the truth, and a wrong call is measured in patient outcomes. The patterns travel: make the pipeline operable by the domain expert, and put the interpretability where the people with access actually are.