1. AN INTRODUCTION TO SPAM
Admittedly, spam hardly needs an introduction. Even the most nontechnical person on the street will know exactly what you're referring to when you mention spam. If you have an e-mail address, the odds are massively in the spammers' favor that they've found you and unpleasantly surprised you with offers of drugs, women, and buried fortunes. The most basic definition of spam is unsolicited e-mail -- any e-mail sent to you that you did not request. This is deliberately a very broad definition, especially because spam is now a legal term!
A huge percentage of spam is badly written junk that most people would never consider opening. The text is usually poor because spammers compose their e-mails specifically to evade the first line of defense in antispam mechanisms. If the e-mails do make it through, they're usually in a difficult-to-read format, meaning that most spammers rely on either greed or human error to get their e-mails opened and read.
The mechanisms used by spammers to evade antispam systems are covered later in this course.
As you've probably experienced, spammers send huge numbers of identical e-mails to as many e-mail addresses as they can. The reason for this is economies of scale. A spammer can usually expect less than 0.5 percent of the e-mails they send to be opened, regardless of whether they're read or acted on. For reasons covered later in this lesson, if an e-mail is simply opened, it's considered a positive response. Therefore, the more e-mails the spammers send, the more positive responses they can generate. Here's an example using Bob, our fictional spammer who includes syndicated advertisements in his e-mails:
- Bob receives $0.01 for every syndicated advert viewed.
- Bob can expect 0.5 percent positive responses from the spam he sends.
- Bob sends out one million e-mails.
- He will receive 5,000 positive responses.
- He earns $50 for his efforts.
That amount of money for all that trouble doesn't seem like much, but it is. The problem for the rest of us is that it's hardly any work at all for Bob to spam one million e-mail addresses, and it's very easy to keep doing it. All Bob needs to do is generate ten or so e-mail shots a month, and he's earning himself about $500. And that's before greed gets the better of any of the recipients and they sign up for whatever he's selling. It's a good scheme for Bob, especially because his only expense is the Internet connection from which to send the spam.
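The arithmetic behind a single one of Bob's e-mail shots can be checked with a short sketch. The function name and parameters below are hypothetical, chosen only for illustration; the figures are the ones from the example above. Decimal values are used so that the percentages and cents don't pick up floating-point rounding noise.

```python
from decimal import Decimal


def campaign_earnings(emails_sent, response_rate, payout_per_view):
    """Return (positive_responses, earnings) for one e-mail shot.

    Hypothetical helper for illustration; the inputs mirror Bob's example.
    """
    responses = emails_sent * response_rate   # e-mails that get opened
    earnings = responses * payout_per_view    # payout for each advert viewed
    return responses, earnings


# Bob's figures: one million e-mails, 0.5 percent opened, $0.01 per view.
responses, earnings = campaign_earnings(
    1_000_000, Decimal("0.005"), Decimal("0.01")
)
print(f"{responses:,.0f} positive responses, ${earnings:,.2f} earned")
```

Changing the first argument shows how directly the payout scales with volume, which is the economy of scale the example describes.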
Spam is such a growth industry that several companies have started to provide bulk e-mailing services. The ethics and morality of these companies are so ambiguous that they aren't given free publicity by being named in this course; but they do exist and, unfortunately, seem to do quite well.
Normally, spam isn't quite as cut-and-dried as this. Some spam is explicitly malicious, such as e-mail-borne viruses or malware. Some spam is annoying but harmless, such as the e-mails sent out by Amazon.com to let you know about their offers. Finally, some spam is malicious not for technical reasons but because of social engineering, a topic covered later in this course.
2. WHAT'S THE SPAM ISSUE?
Basic spam, the type that simply offers you products and services from a usually reputable retailer, is little more than annoying. Although larger companies (and those with common sense) have realized that sending spam can quickly alienate potential customers, a significant proportion of companies continue to do so. A fair number of the culprits are those advertising free services, such as free online dating, free classified advertisement listings, and free small business directory entries. Because these companies are little more than legal shells, they have little to lose and lots to gain by spamming their wares.
Spam has implications far beyond being annoying. The managed e-mail service company Brightmail released a study in January 2004 that showed more than 60 percent of Internet traffic was spam e-mail. That's an astonishing volume; spam on its own accounted for more data than Web browsing, FTP file downloading, P2P file sharing, and all other Internet services combined. This huge volume of data clogs the available bandwidth on which the Internet runs, making everything else run much slower. The Internet as a whole would be noticeably faster if spam were eradicated right now.
Spam Costs You
Spam also costs you money, as the recipient. It lands in your e-mail inbox, and you have to spend time on your broadband, dial-up, or GPRS (General Packet Radio Service) connection to download it. All of these cost time, bandwidth, and money.
Networking systems are quickly moving to wireless technologies, allowing you to access the Internet from PDAs (personal digital assistants) and mobile phones. Wireless Internet access systems all have one thing in common: they are very, very expensive to use compared to fixed-line connections. Wireless e-mail (as well as regular e-mail) is an extremely useful technology, and it would be a severe setback if the cost of using it increased beyond reasonable levels because of spam.
HTML E-Mail
One of the cleverest ideas someone had to make e-mail user friendly was to allow HTML code in the e-mail body. By adding functionality to e-mail clients to render HTML pages, flat text e-mails were suddenly brought alive with fonts, styles, formatting, and images. This was great for the majority of e-mail users, who suddenly found it was much more convenient and understandable to write their message in six-inch bright red, bold Arial type. As with most new ideas, there was a downside, and the downside to HTML e-mail is significant. To understand why, it's time to take a detour into how programmers write software for Microsoft Windows.
When Microsoft developed Windows 95, it created a system of reusable software components called COM objects. With COM objects, instead of every developer having to code their own engines to perform simple tasks, they could simply reuse Microsoft's COM objects (and their own). One of the available COM objects is the Microsoft Internet Explorer HTML rendering engine -- the piece of code that Internet Explorer uses to display Web pages.
Because all of the code to download and display Web pages was already contained in the Internet Explorer COM objects, nearly every software developer simply reused those objects when they needed that functionality. It's no surprise then that Microsoft reused its own objects when it added HTML e-mail support to Microsoft Outlook. And here's where the problem begins.
As you learned in Lessons 1 and 2, Internet Explorer suffers from a lot of security flaws. To be more specific, the underlying code in the COM objects used to create Internet Explorer's functionality has a lot of security flaws. And because those same COM objects are reused in other applications, such as Outlook, those applications suffer from the same flaws, too. This results in some nasty possibilities malicious spammers can take advantage of.
3. PHISHING
Most concerning is the increasing trend of malicious spam. Customers of almost every major international bank have been targeted by phishing scams of varying sophistication. Phishing is the term given to a specific type of spam that attempts to fool people into supplying confidential information -- the spammers are effectively "fishing" for whatever information they can find.
Essentially, phishing is a type of social engineering -- an attacker attempts to gain the trust of a victim and fool them into taking certain actions.
The normal format of a phishing scam is an e-mail that looks official, is apparently sent from the bank itself, and requests that you enter your personal details (often including credit card number and PIN) for the purposes of account confirmation. These e-mails generally use genuine logos and text styles, often taken directly from the bank's Web site, to try to fool the recipient. Figure 3-1 shows a phishing e-mail as displayed by Outlook:
Ignoring for the moment that no bank would send an e-mail like this, the body of the message looks legitimate. It carries the logo of Nationwide, a major U.K. bank. The From e-mail header says that the message was sent from a nationwide.co.uk e-mail address, and the message text asks you to visit a page on the Nationwide Web site. At first glance, this all looks legitimate and believable.
Under no circumstances should you attempt to access the Web site or follow any directions shown on this lesson page. This phishing investigation is provided purely as an example for learning purposes.
However, all is not as it seems. Because this e-mail has images and text styles apparently included in it, it must be an HTML e-mail. Viewing the HTML source behind it, shown in Figure 3-2, starts alarm bells ringing:
Figure 3-2 shows some HTML code split into blocks for easy reference. You may not be familiar with some of the code, so let's go over it step by step:
- The first code block has the <a href> tag, followed by the correct Nationwide Web site address. When you move your mouse over the HTML message in Outlook, this code ensures that the Nationwide Web site appears as the tool-tip. If it weren't for the rest of the code in the e-mail, you would be taken to the Nationwide site.
- The next block begins with the <map> tag. This effectively places an invisible mask over the entire message so that when you click anywhere in the e-mail, you're immediately taken to the Web site specified in the href section. As you can see, although the Web address starts with http://, strange numbers and percentage signs follow it. You'll come back to this shortly, but it's this piece of code that overrides the Nationwide hyperlink.
- The penultimate block begins with the <img src> tag. This is the HTML command to load an image from a remote server, and here's where the scam becomes clear. What you're actually seeing in the e-mail shown in Figure 3-1 is not a text e-mail with a hyperlink; it's a bitmap image mock-up of an e-mail that, no matter where you click, takes you to a malicious Web site.
- The final block of seemingly random text, highlighted in yellow, is an antispam evasion mechanism. Later in this course Bayesian filtering will be covered, which explains the purpose of this line of text.
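The override described in the list above can be surfaced mechanically by listing every link target in the HTML and comparing the host names. The sketch below uses Python's standard html.parser module. The message body is invented for illustration and is NOT the code from Figure 3-2; the class name and the example host names are likewise hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkTargetLister(HTMLParser):
    """Collect the href of every <a> and <area> tag in an HTML e-mail body."""

    def __init__(self):
        super().__init__()
        self.targets = []  # (tag, href) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            href = dict(attrs).get("href")
            if href:
                self.targets.append((tag, href))


# Hypothetical body: a visible link plus an image map covering the whole image.
SAMPLE_HTML = """
<a href="http://www.bank.example/">
<img src="http://images.example.invalid/mail.gif" usemap="#m" border="0"></a>
<map name="m">
<area shape="rect" coords="0,0,600,400" href="http://phish.example.invalid/login">
</map>
"""

lister = LinkTargetLister()
lister.feed(SAMPLE_HTML)
for tag, href in lister.targets:
    print(f"<{tag}> points to {href}")

# More than one distinct host is a reason to look closer, not proof of fraud.
hosts = {urlsplit(href).hostname for _, href in lister.targets}
if len(hosts) > 1:
    print("Link targets disagree:", sorted(hosts))
```

A real message may hide its destination in other ways, so treat this as a first check, not a complete phishing detector.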
From the analysis of this phishing e-mail, you can now see that no matter where you click in the e-mail, you're taken to a malicious Web site. The Web site in question is defined by the string beginning http://%32%30%33 in Figure 3-2, but this doesn't look like any normal URL; it has actually been obfuscated by hex encoding. To understand why this works, a small detour into computer and operating system architecture is needed.
4. OBFUSCATION BY HEX ENCODING
Hex, short for Hexadecimal, is a base 16 numbering system. In your everyday life, you're used to a base 10 numbering system, beginning at 0 and ending with 9. You're probably aware of binary, which is a base 2 numbering system that only has two values: 0 and 1. Hexadecimal has 16 individual values, going from 0 to 9, and then from A to F. Mathematical operations work exactly the same in hex as they do in decimal; they just use a different numbering system. For example, using decimal, you know that 9 + 1 equals 10. Using hex, 9 + 1 equals A (if you're wondering, F + 1 equals 10). Here's a quick table to help you understand:
| Dec | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Dec | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| Hex | A | B | C | D | E | F | 10 | 11 | 12 | 13 |
Table 3-1: Decimal and hexadecimal values
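If you have Python available, you can check the hex arithmetic and rebuild the values in Table 3-1 yourself. This is only a small sketch using the built-in format function; the variable names are arbitrary.

```python
# The two sums from the text, written with hex literals.
print(format(0x9 + 0x1, "X"))  # A
print(format(0xF + 0x1, "X"))  # 10

# Decimal 0 to 19 paired with the hexadecimal equivalent, as in Table 3-1.
table = [(n, format(n, "X")) for n in range(20)]
for dec, hx in table:
    print(f"{dec:>3}  {hx:>3}")
```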
Don't worry if you're not completely sure of how this works; it's only important that you understand computers can do their math in hex. The next piece of the puzzle is ASCII (American Standard Code for Information Interchange) character codes. Every textual letter, number, and symbol stored in a file or displayed on screen is an ASCII character; even the text you're reading right now is composed of ASCII characters. Each ASCII character has an ASCII code; for example, the letter "A" has a decimal ASCII code of 65, the letter "B" has a code of 66, and so on.
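The character codes mentioned above are easy to look up. The following sketch uses Python's built-in ord and chr functions; the helper name ascii_codes is hypothetical and exists only for this illustration.

```python
def ascii_codes(text):
    """Return (character, decimal code, hex code) for each character."""
    return [(ch, ord(ch), format(ord(ch), "X")) for ch in text]


for ch, dec, hx in ascii_codes("AB"):
    print(f"{ch!r}: decimal {dec}, hex {hx}")

# chr goes the other way, from a code back to its character.
print(chr(65))  # A
```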
Input Options
Because of the way Windows works, programs such as Outlook and Internet Explorer can accept input as either plain text or hexadecimal ASCII codes. This means it's perfectly possible to access a Web site by taking the URL, converting each character to its decimal ASCII code, converting that decimal code into a hexadecimal value, and then supplying the result to Internet Explorer! This example might require a small leap of faith, but it does work. Let's do the conversion with www.cnet.com:
- The first character of the URL is w.
- The decimal ASCII code of w is 119.
- 119 converted to hexadecimal is 77.
- The URL begins with www, so the first three hex codes are 77, 77, 77.
- The next character is a full stop.
- The decimal ASCII code of . is 46.
- 46 converted to hexadecimal is 2E.
- Our hex codes are now 77, 77, 77, 2E.
The process is repeated for all the characters in the URL until the entire string is converted. The final step is to drop the commas and put a percent sign in front of each hex code, and the conversion is complete. The URL www.cnet.com becomes:
%77%77%77%2E%63%6E%65%74%2E%63%6F%6D
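The manual steps above can be automated in a few lines. This is a minimal sketch, assuming the input contains only ASCII characters; the function name hex_encode is hypothetical. It is written by hand because Python's standard urllib.parse.quote deliberately leaves letters and digits unencoded, whereas unquote from the same module reverses the encoding.

```python
from urllib.parse import unquote


def hex_encode(text):
    """Write every character of an ASCII string as a percent sign plus its hex code."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


encoded = hex_encode("www.cnet.com")
print(encoded)           # %77%77%77%2E%63%6E%65%74%2E%63%6F%6D
print(unquote(encoded))  # www.cnet.com
```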
If you type the hex-encoded URL into Internet Explorer's address bar, you're taken to the CNET Web site, as shown in Figure 3-3. Look closely at the URL in the address bar:
Returning to the URL in the phishing e-mail, the point of all this conversion and messing about is nothing other than to confuse the reader and obfuscate its true purpose. Obfuscating URLs in this manner is a common trick used by malware and spyware programmers, too -- you'll often see this type of text in the output of HijackThis or Spybot if you're unlucky enough to be infected.
The URL in the example phishing e-mail is valid and reachable; however, it isn't included in this lesson. If you do decide to decode it, you're very strongly warned not to visit the Web site under any circumstances. It's a live, malicious Web site.
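Decoding works the same way in reverse. The only part of the phishing address quoted in this lesson is its opening, %32%30%33, so the sketch below decodes just those three codes using the standard urllib.parse.unquote function. The rest of the address is deliberately not reconstructed here.

```python
from urllib.parse import unquote

# The three hex codes that open the obfuscated address in the example e-mail.
visible_prefix = "%32%30%33"
decoded_prefix = unquote(visible_prefix)
print(decoded_prefix)  # 203
```

The decoded characters are plain digits, not the start of a bank's domain name, which is the kind of detail the encoding is meant to hide from a casual reader.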
5. AN INTRODUCTION TO SPAM MECHANISMS
If you've had an e-mail address for any length of time, you no doubt noticed a gradual increase in the amount of spam you receive (if not, you're one of the lucky ones!). And if you were unlucky enough to accidentally open spam e-mail, you may have noticed it was quickly followed by more spam. It's no coincidence -- spammers use clever tracking mechanisms to monitor whether their e-mail is deleted, opened, or even forwarded to another person.
The most obvious and basic tracking mechanism is a read receipt. A read receipt is a flag in the e-mail header that tells your e-mail client to return the status of the e-mail to its sender. Through this system, the spammer can obtain basic information about whether you read or deleted the e-mail. Every popular e-mail client (such as Outlook or The Bat!) has an option to deny read receipts for public e-mail received from the Internet. If you use a corporate e-mail system, such as Microsoft Exchange, the system administrator usually has the ability to force your e-mail client to return read receipts, so beware!
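To see what a read receipt request looks like, you can inspect a message's headers. This lesson doesn't name the header involved; the sketch below assumes the standard Disposition-Notification-To header, which is the usual way a sender asks for a receipt. The sample message, its addresses, and the function name are all made up for illustration.

```python
from email import message_from_string

# Hypothetical raw message; the addresses use reserved example domains.
RAW_MESSAGE = """\
From: offers@sender.example.invalid
To: you@recipient.example.invalid
Subject: Special offer
Disposition-Notification-To: tracker@sender.example.invalid

Hello!
"""


def receipt_requested_by(raw):
    """Return the address that asked for a read receipt, or None if nobody did."""
    msg = message_from_string(raw)
    return msg.get("Disposition-Notification-To")


requester = receipt_requested_by(RAW_MESSAGE)
if requester:
    print("Read receipt requested by", requester)
else:
    print("No read receipt requested")
```

Some older mail systems use other, nonstandard headers for the same purpose, so a missing header here doesn't guarantee the message is free of tracking.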
Web Bugs
A more sophisticated tracking system is achieved through Web bugs. In Lessons 1 and 2, you learned how adware systems are used to track your Web browsing and application usage habits. The same principle applies to e-mail. Using HTML e-mail, a spammer can include a reference to a script on their server that's executed every time the e-mail is opened and the content is loaded. To achieve this, a single-pixel, transparent GIF image is included in the e-mail. This GIF is invisible to the reader, but essential to the tracking system. When your e-mail client loads the image, the tracking script on the spammer's server is executed, completing the vicious cycle. This is the reason you should never open an e-mail you have good reason to believe is spam.
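A single-pixel image of the kind described above can often be spotted by scanning the HTML before it is rendered. The following is a rough sketch using Python's standard html.parser module. The class name, the sample message, and the tracking URL are hypothetical, and the check assumes the image declares its size as 1 by 1, which a real Web bug isn't obliged to do.

```python
from html.parser import HTMLParser


class WebBugFinder(HTMLParser):
    """Record the src of every <img> tag declared as 1 pixel by 1 pixel."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("width") == "1" and attributes.get("height") == "1":
            self.suspects.append(attributes.get("src"))


# Hypothetical e-mail body: one ordinary banner and one tracking pixel.
SAMPLE = """
<p>Great offers inside!</p>
<img src="http://images.example.invalid/banner.gif" width="468" height="60">
<img src="http://track.example.invalid/open.gif?id=12345" width="1" height="1">
"""

finder = WebBugFinder()
finder.feed(SAMPLE)
print(finder.suspects)
```

The identifier in the query string stands in for whatever value ties the image request back to one particular recipient.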
If you forward an e-mail with a Web bug in it, the spammer will know exactly who you forwarded it to and retrieve personal information on them. E-mail client security vulnerabilities are an absolute goldmine for spammers, the chief culprit being IFRAMEs, which are discussed in the following section.
6. IFRAMES AND INTERNET EXPLORER
As you've learned, spammers use the facilities in HTML e-mails to further their goals. Because almost every application that displays HTML pages uses the Internet Explorer COM objects, they're vulnerable to most of the same security flaws as Internet Explorer is itself. There have been many serious security vulnerabilities that Microsoft has patched, and many that still remain unpatched and exploitable. Chief among them is the IFRAME issue.
IFRAME, an abbreviation for Inline Frame, is a very simple way Web developers can include a Web page from a different location inside their own page. An in-depth discussion of IFRAMEs is beyond the scope of this lesson; however, the CNET article "Microsoft patches IE, Windows" contains some excellent further reading. Due to a security flaw within the Internet Explorer IFRAME system, a malicious HTML e-mail (or Web site) can bypass the built-in security. Under normal circumstances, the Security Zone system within Internet Explorer will prevent a remote server from gaining local privileges on your computer. Using a flaw in the IFRAME system, a remote server can perform any action on your computer by accessing the Local Zone with you doing nothing more than opening the e-mail or viewing the Web page.
Although Microsoft has released security patches for the IFRAME issue, further flaws still exist in this and other Internet Explorer features. Spammers can obtain even more information about you by exploiting these flaws through specially constructed HTML e-mails.
The moral of the story is, always make sure all your applications are up to date with security patches. Through the magic of COM objects, a security flaw in one product can easily affect others.
Self-Perpetuating Process
Unfortunately, spam is something that will never stop. Companies and individual spammers make huge amounts of money by selling valid e-mail addresses to one another. Once a spammer gets a positive response, they can sell that e-mail address to other spammers as part of a validated e-mail list. Once your address is on one of these lists, it's impossible to remove it. As you may already know, clicking a link in the e-mail that purports to remove you from the spammer's list is a surefire way to receive even more spam.
Moving On
This lesson introduced you to the world of spam, and showed you that the issues are not quite as simple as everyone thinks. You saw how spammers track exactly what you do with their e-mail through the use of Web bugs and software vulnerabilities. You also saw an in-depth analysis of a common phishing e-mail, with all its tricks and hidden traps.
Be sure you understand the topics covered in this lesson, especially the more technical aspects of how spammers obfuscate their intentions: the assignment and quiz will give you some practical experience. The Message Board is always available for you to discuss the issues raised in this lesson and any other spam or spyware questions you have. Your fellow students and instructor are ready and waiting!
In Lesson 4, you'll see practical examples of how to manage and defeat the spam you already get, and avoid getting even more in the future. You'll also see how the spammers attempt to evade antispam systems to keep their spam flowing. Finally, the systems commercial organizations use to defend against spam will be touched on to give you a taste of how the people who get millions of spam e-mails every day deal with the problem.
