#5 Highly customizable spam detection
(No description was provided.)
6 votes
Comments
Why would you want to waste your life customizing spam detection?
If I happen to be a Viagra fan, I may want to adjust things to make sure I receive my favorite newsletters.
If I speak English only, I may want to auto-discard any messages in different character sets.
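A character-set rule like that could be sketched with Python's standard `email` module. This is only an illustration: the `ALLOWED_CHARSETS` set and the `has_unwanted_charset` name are made up for the example, and the check is crude, since UTF-8 can carry any language.

```python
from email import message_from_string

# Hypothetical allow-list for a reader who only wants Western-script mail.
ALLOWED_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1", "windows-1252"}


def has_unwanted_charset(raw_message, allowed=ALLOWED_CHARSETS):
    """Return True if any part of the message declares a charset outside the allow-list."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # get_content_charset() returns the declared charset in lower case,
        # or None when the part declares no charset at all.
        charset = part.get_content_charset()
        if charset is not None and charset not in allowed:
            return True
    return False


english_mail = 'Subject: hi\nContent-Type: text/plain; charset="utf-8"\n\nhello'
other_mail = 'Subject: hi\nContent-Type: text/plain; charset="koi8-r"\n\nprivet'
print(has_unwanted_charset(english_mail))  # False
print(has_unwanted_charset(other_mail))    # True
```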
Multi-level spam filtering is another idea -- maybe we want to auto-delete messages we are dead-sure are spam, and hold back items that are merely suspect for manual review.
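As a rough sketch of that multi-level idea, assuming the filter already produces a spam probability between 0 and 1: the two thresholds below are arbitrary example values that a user could tune, and the `triage` name is invented for the illustration.

```python
def triage(spam_probability, delete_above=0.99, review_above=0.60):
    """Route a message according to how sure the filter is that it is spam."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= delete_above:
        return "delete"   # dead-sure spam: discard without asking
    if spam_probability >= review_above:
        return "review"   # suspect: keep for manual review
    return "inbox"        # probably good mail


print(triage(0.995))  # delete
print(triage(0.70))   # review
print(triage(0.10))   # inbox
```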
Spam is in the eye of the beholder of course -- there's no sure-fire way for a computer to eliminate everything you consider spam and preserve everything you consider "ham".
I'll concede that one man's spam might be another couple's happiness, if you'll concede that how they obtain their supply of Viagra is a nuisance to the rest of us, and that most spam (source: Spamhaus) comes from English-speaking countries, with the USA in first place.
To keep ahead of spammers, the detection process should be hidden from them. That leaves triage filtering as you describe here, which is mostly based on white/black listing. That is not the same as customising spam detection, because you are pre-selecting what you choose to pass on for detection.
How about this: when you move an email from the inbox to the spam folder, the filter 'learns' something? And vice versa: if you have to move an email from junk to the inbox, then the filter 'learns' something. (Granted, if spam is in the inbox and good mail is in a junk folder, the filter could use some improvement.)
Ooh!
What you describe is a machine learning trigger for a "training" process.
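To make the "training" idea concrete, here is a minimal sketch of a word-count (naive Bayes) filter that updates itself each time a message is moved between folders. It is one common approach, not necessarily what any particular mail client does, and the `LearningFilter` class and its method names are invented for the example.

```python
import math
from collections import Counter


class LearningFilter:
    """Toy naive Bayes filter that learns from folder moves."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Call with label 'spam' when mail is moved to junk, 'ham' when moved back."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Add-one smoothing keeps unseen words and empty folders from giving log(0).
            log_score[label] = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                log_score[label] += math.log(
                    (self.word_counts[label][word] + 1)
                    / (total_words + len(vocabulary) + 1)
                )
        # Clamp the difference so math.exp cannot overflow on long messages.
        difference = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(difference))


mail_filter = LearningFilter()
mail_filter.learn("cheap pills buy now", "spam")      # user moved this to junk
mail_filter.learn("meeting agenda for monday", "ham")  # user moved this to the inbox
print(mail_filter.spam_probability("buy cheap pills") > 0.5)  # True
print(mail_filter.spam_probability("monday meeting") < 0.5)   # True
```

With only two training messages the scores mean very little, which is the point made in the following comments about how much vetted training material a real filter needs.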
Humans come with a pre-programmed ability to automatically extract contextual triggers; machines do not. However, features can be extracted and used to map a pattern in an n-dimensional matrix against a given classification, to provide a level of detection comparable to humans, BUT such a system MUST be built by humans who understand the implications of feature extraction and the many other pitfalls. So, years of human learning first.
Additionally, for such a system to be any good it requires vast numbers of vetted positive and negative examples: in this case, significantly more spam than spammers supply to each individual, and significantly more non-spam mail than each individual reads. This is called training material, and there is a correlation between its amount, its accuracy and the theoretically achievable accuracy of the system. Don't underestimate the difficulty or the time-consuming monotony involved.
Moreover, unlearning is not a simple matter of removing a false example, as each example will have had a permanent effect, to one degree or another, on the entire system and its weighted probabilities. If I recall my own work correctly, the order in which examples are presented can make a difference to the effectiveness of the system. With a vast number of examples, how many permutations is that?
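The point about presentation order can be seen in a tiny example. The sketch below uses a single-pass perceptron, chosen only because it is a simple online learner and not because it is the system described above; the two features (has a link, mentions "meeting") and the three training examples are invented.

```python
def train_perceptron(examples):
    """One pass of the perceptron rule: adjust weights only on a mistake."""
    weights = [0, 0]
    for features, label in examples:
        activation = sum(w * x for w, x in zip(weights, features))
        if label * activation <= 0:  # wrong side of (or exactly on) the boundary
            weights = [w + label * x for w, x in zip(weights, features)]
    return weights


# features: [has_link, mentions_meeting]; label +1 = spam, -1 = not spam
examples = [
    ([1, 1], 1),   # spam with a link that mentions a meeting
    ([1, 0], 1),   # spam with a link
    ([0, 1], -1),  # genuine mail about a meeting
]

forward = train_perceptron(examples)
backward = train_perceptron(list(reversed(examples)))
print(forward)   # [1, 0]
print(backward)  # [2, 0]
```

The same three examples leave different weights behind depending on the order, and simply deleting one of them afterwards would not restore what the weights would have been without it.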
The inputs are many and varied, but here the outputs are either spam or not-spam. Attempts to customize an already customized and trained detection process would render it useless and result in lots of spam.
I'm happy to filter through SpamCop. I've just taken a look at their many user options, and none give control over the actual detection process.
Right, no spam detector (currently) can pass a Turing test, i.e. demonstrate human-like intelligence when it comes to distinguishing between spam and good mail. What I had in mind is the "This is spam" button in webmail clients like Yahoo or Gmail that moves spam in the inbox to the spam folder and (I hope) adds the sender to the blacklist. Also, the complementary "This is not spam" button in the spam folder that moves mail to the inbox and (I hope) whitelists the sender.
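A bare-bones version of those two buttons might look like the sketch below. Whether Yahoo or Gmail actually work this way is not known here; the `SenderLists` class and its method names are purely illustrative.

```python
class SenderLists:
    """Per-user blacklist and whitelist driven by two buttons."""

    def __init__(self):
        self.blacklist = set()
        self.whitelist = set()

    @staticmethod
    def _normalise(sender):
        return sender.strip().lower()

    def mark_spam(self, sender):
        """'This is spam' button: blacklist the sender."""
        sender = self._normalise(sender)
        self.whitelist.discard(sender)
        self.blacklist.add(sender)

    def mark_not_spam(self, sender):
        """'This is not spam' button: whitelist the sender."""
        sender = self._normalise(sender)
        self.blacklist.discard(sender)
        self.whitelist.add(sender)

    def folder_for(self, sender):
        sender = self._normalise(sender)
        if sender in self.blacklist:
            return "spam"
        if sender in self.whitelist:
            return "inbox"
        return "unknown"  # leave the decision to the content filter


lists = SenderLists()
lists.mark_spam("Pills@Example.com")
lists.mark_not_spam("friend@example.org")
print(lists.folder_for("pills@example.com"))     # spam
print(lists.folder_for("friend@example.org"))    # inbox
print(lists.folder_for("stranger@example.net"))  # unknown
```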
Some email clients (like Thunderbird) have a feature like the one you describe. You can mark items in your inbox as spam, and it moves the item and "supposedly" learns from your selection patterns.
You can also move items out of the junk folder & into the inbox.