Working on this: https://github.com/KevinXuxuxu/anon_proxy, a sort of anonymization proxy to use with LLM providers. It combines model-based (OpenAI Privacy Filter) and regex PII detection, and swaps the detected values back and forth in API requests and responses. With a locally hosted detection model, no PII leaves your local environment. I find it very useful, especially when you're working on sensitive documents (legal, tax, immigration etc.); hope you find it helpful as well :)
I'm nowhere near as smart as OpenAI of course, but I did build https://tools.nicklothian.com/webner/index.html that uses a BERT-based named-entity-recognition model running in your browser to do a subset of PII redaction.
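To make the back-and-forth replacement concrete, here is a minimal sketch of the idea, not the actual anon_proxy code. It assumes a single hypothetical email regex and `<EMAIL_n>` placeholders; the function names `anonymize` and `deanonymize` are made up for illustration.

```python
import re

# Hypothetical pattern for illustration; real detection needs far more coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def anonymize(text, mapping=None):
    """Replace each detected email with a stable placeholder such as <EMAIL_1>.

    Returns the masked text and the placeholder -> original value mapping.
    """
    mapping = {} if mapping is None else mapping
    reverse = {value: placeholder for placeholder, value in mapping.items()}

    def repl(match):
        value = match.group(0)
        if value not in reverse:
            placeholder = f"<EMAIL_{len(mapping) + 1}>"
            mapping[placeholder] = value
            reverse[value] = placeholder
        return reverse[value]

    return EMAIL.sub(repl, text), mapping

def deanonymize(text, mapping):
    """Put the original values back into a response that uses the placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

The request body would go through `anonymize` before leaving the machine, and the provider's response through `deanonymize` with the same mapping.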
It works pretty well for the use cases I was playing with.
The OpenAI model is small enough that I might enhance my tool to use it.
I just used it on a document, but the number of false positives it generates makes it fairly difficult to use?
I fed it a ~100-line markdown document, it took about 10 seconds, and it decided that "matter" (as in, frontmatter), "end" (as in, frontend), and MCP (as in, mcp server) are organizations.
Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".
Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.
It does work better on plain text than markdown because of casing. I can't see what you used (kinda the point - because it all runs in your browser) but if you can share the markdown as a gist or something I can take a look and comment more concretely.
There are some interesting technical details in this release:
> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.
> The released model has 1.5B total parameters with 50M active parameters.
> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
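For anyone wondering what "constrained Viterbi" span decoding means in practice, here is a toy sketch. It assumes a three-label BIO scheme (`O`, `B-PER`, `I-PER`) and hand-written per-token log-scores; the real model's label taxonomy, scores, and transition constraints are not given in the quotes above, so everything here is illustrative.

```python
import math

# Toy BIO label set; the released taxonomy is larger and not reproduced here.
LABELS = ["O", "B-PER", "I-PER"]

def allowed(prev, cur):
    """BIO constraint: I-X may only follow B-X or I-X (never O or sequence start)."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def viterbi(scores):
    """Return the best valid label path; scores[t][k] is the log-score of LABELS[k]."""
    if not scores:
        return []
    n = len(LABELS)
    best = [scores[0][k] if allowed(None, LABELS[k]) else -math.inf for k in range(n)]
    back = []
    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for k in range(n):
            # Best predecessor among those the constraint permits.
            score, j = max(
                (best[j] if allowed(LABELS[j], LABELS[k]) else -math.inf, j)
                for j in range(n)
            )
            new_best.append(score + scores[t][k])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    k = max(range(n), key=lambda i: best[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [LABELS[k] for k in reversed(path)]

def spans(tags):
    """Collapse BIO tags into (start, end, type) token spans, end exclusive."""
    out, start = [], None
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and not tag.startswith("I-"):
            out.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return out
```

With scores where a per-token argmax would pick `I-PER` right after `O` (an invalid sequence), the constrained decode returns `B-PER` followed by `I-PER` instead, which is the point of decoding spans jointly rather than labelling each token independently.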
Exciting! I took a look through the code and found what appear to be the entity types for future releases - this release (V2 config) supports 8 entity types, but the V4 and V7 taxonomies have >20, mostly more personal ID types. Given this is a preview release, I imagine they'll release these.
This is where stochastic approaches start to feel a bit uncomfortable.
Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.
I built a community tool for exactly this, based on privacy-first principles but around the what. It's workflow-based and not "put your sensitive data into ChatGPT and hope it captures the right stuff". Mostly built for security folks, but anyone can use it.
For my customers I use regexes to block them from potentially publishing personal emails/phone numbers to their websites but I really wouldn't mind running this in addition just for the extra peace of mind. I don't have a GPU on our server, but I hope this is light enough of a model to handle CPU only inference on less than 2k tokens at a time.
It's going to be stochastic in some sense whether you want it to be or not; human error never reaches zero percent. I would bet you a penny you'd get better results doing one two-second automated pass + your usual PII redaction than your PII redaction alone.
The advantage of computers was that they didn't make human errors; they did things repeatedly, quickly, and predictably. If I'm going to accept human error, I'd like it to come from a human.
I think the problem is most secrets aren't stochastic; they're deterministic. When the user types in the wrong password, it should be blocked. Using a probabilistic model suggests an attacker now only needs to be really close, but not correct.
Sure, there's some math that says the difference between really close and exact isn't a big deal; but then you're also saying your secrets don't need to be exact when decoding them, and they absolutely do atm.
Sure looks like a weird privacy veil that sorta might work for some things, like frosted glass, but think of a toilet stall with all frosted glass, are you still comfortable going to the bathroom in there?
I dunno what use case you're thinking this is for.
The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.
Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.
> Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.
Credit card numbers are deterministic. A five year old could write a script to strip out credit card numbers.
As for other PII? You're seriously expecting an LLM to find every instance of every random piece of PII? Worldwide? In multiple languages? I've got an igloo I'd like to sell you...
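For what it's worth, the deterministic credit card script mentioned above could look something like this. It is a sketch under assumptions: the candidate regex (13 to 19 digits with optional single spaces or dashes) and the `[CARD]` replacement token are my own choices, and it only handles numbers written as digits, not spoken out as words in a transcript.

```python
import re

# Assumed candidate shape: 13-19 digits, optionally separated by single spaces or dashes.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_cards(text):
    """Replace digit runs that pass the Luhn check; leave other numbers alone."""
    def repl(match):
        digits = re.sub(r"\D", "", match.group(0))
        return "[CARD]" if luhn_ok(digits) else match.group(0)
    return CANDIDATE.sub(repl, text)
```

The Luhn check is what keeps order numbers and other long digit strings from being redacted by accident.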
For the confused: this link must have gotten revived or something, I posted this comment a few days ago. Looks like it's getting the accolades I claim it deserves now.
It was put into second-chance pool by moderators. I originally submitted this link a few days ago and today got this (semi?)automated email from HN, an excerpt below:
The submission "OpenAI Privacy Filter" that you posted to Hacker News (https://news.ycombinator.com/item?id=47870901) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.
This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there.
50M effective parameters is impressively light. Is there a similarly light model on the prompt injection side? Most of the mainstream ones seem heavier.
SuperagentLM made available on-edge PII redaction models already a few years ago in sizes 20B, 3B, 200M. They still seem to be available via their legacy API - well worth checking out to compare against this one.
https://docs.superagent.sh/legacy/llms/superagent-lm-redact-...
1. Pass the raw text through the filter to obtain the spans.
2. Map all the spans back to the original text.
Now you have all the PII information.
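The two steps above can be sketched as follows. This assumes the filter returns character-offset spans shaped like `{"start": ..., "end": ..., "label": ...}`; that shape and the function names are hypothetical, not the documented output format of the model or of the linked repo.

```python
def extract_pii(text, spans):
    """Step 2: map each span back to the substring of the original text it covers."""
    return [(s["label"], text[s["start"]:s["end"]]) for s in spans]

def redact(text, spans):
    """Replace each span with its label, working right to left so offsets stay valid."""
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[:s["start"]] + f"<{s['label']}>" + text[s["end"]:]
    return text
```

Replacing from the end of the string backwards matters: editing the left-most span first would shift every later offset.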
https://github.com/chiefautism/privacy-parser
Details in my review article here: https://piieraser.ai/blog/openai-privacy-filter. Disclaimer: I also build PII detection systems.
Check it out: https://redact.cabreza.com
For anything touching security or privacy, even small inconsistencies can quickly erode trust.
Bringing back the Open to OpenAI..