
Search Engines, Friend or Foe?

on October 30, 2009

Security is all about identifying threats and mitigating them before an enemy can exploit them. There are so many vectors to cover that it truly is a daunting task.

Input validation, perimeter control, user education, cryptography, physical security, access control; the list goes on and on. Each of these needs its own special considerations, things like: What validations am I going to put in place on my web application to protect my database backend? What kind of input must I discard to ensure stability? What attacks am I to expect?

There is one item, though, that is not generally found on this list of concerns: search engines. There is some awareness of the subject, and even a coined term for the activity, ‘Google hacking’, but I still don’t see it being taken seriously, and not many people know about it either.

In this article I will focus mainly on Google, because it is the most popular search engine and because it offers advanced functionality that helps people find what they want efficiently. That efficiency can be used by you and me but can also be used by someone who has less than good intentions.

How can search engines such as Google be a threat to security?

Well the answer to that is: In many ways.

Confidential information

The obvious threat is the exposure of restricted information. Google can search not only web pages but, in some cases, the text of supported document formats. These include:

  • Adobe Portable Document Format (pdf)
  • Adobe PostScript (ps)
  • Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
  • Lotus WordPro (lwp)
  • MacWrite (mw)
  • Microsoft Excel (xls)
  • Microsoft PowerPoint (ppt)
  • Microsoft Word (doc)
  • Microsoft Works (wks, wps, wdb)
  • Microsoft Write (wri)
  • Rich Text Format (rtf)
  • Shockwave Flash (swf)
  • Text (ans, txt)

This means that if your files are available online, people can use Google to search for text inside them. A malicious person might search for the phrase “social security number”, and if a file on your site contains that term, it may be presented to that person, who can then download it.

Google the Super Computer

While I am sure that Google has plenty of processing power in its infrastructure, I am not referring to taking advantage of that power directly; rather, it is possible to use Google as a supercomputer of sorts. Let’s assume an application stores passwords as MD5 hashes. Lots of applications do, and the reasoning is sound: if a password hash is somehow stolen, it should be no big deal, because it is computationally expensive to recover the password from an MD5 hash, right? It would take over a year to brute-force an 8-letter password, right? WRONG. It would take over a year if you simply brute-forced it, but what if you searched Google instead? Yes, that’s right: take an MD5 hash and search for it on Google. If it is a dictionary word, chances are you will get a hit!

I ran some tests and here are the results:

MD5 hash searched for             | The “password” | Number of hits
1e6947ac7fb3a9529a9726eb692c8cc5  | Secret         | 646
1A19642CD3FE09F72D3859B102298BE3  | Obscure        | 4
c42c37628d81546b28bad6cd8fe18ad8  | Password       | 385
ee0b5d34cc83316018c12dd6f027e1c7  | AFVX           | 2

It should take roughly 300 days to guess “obscure” with a sequential brute-force attack using only the English alphabet, allowing for both lower and upper case. Yet a simple Google search returns the answer in less than a second. For a simple system, an MD5 hash of a password would generally be enough, on the assumption that it protects no data worth years of computational effort to crack, but with search engines you can simply look up the answer in seconds. Even for the MD5 of a random collection of letters such as AFVX, Google found a match. The same was not true for longer random strings and for phrases: searches for the MD5s of complex random character sequences and of phrases did not return any matches. Still, keeping Google in mind and playing it safe, storing a plain MD5 hash of a password is no longer enough; we now need to add a pinch of salt to the mix.
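The salting idea can be sketched in a few lines. This is a minimal illustration of the principle, not a production scheme (a modern system would use a dedicated, deliberately slow password-hashing function rather than MD5); the point is only that a random per-user salt makes the stored digest unique, so searching for it returns nothing:

```python
import hashlib
import os

def hash_password(password):
    """Hash a password with a random per-user salt (minimal sketch).

    The salt makes the stored digest unique even for common passwords,
    so a search-engine or precomputed-table lookup of the digest fails.
    """
    salt = os.urandom(16).hex()  # random salt, stored alongside the digest
    digest = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the salted digest and compare with the stored value."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest() == digest
```

Note that hashing the same password twice now yields two different digests, one per salt, which is exactly what defeats a simple lookup.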

Aiding Malicious Hackers

Another hazard presented by search engines is that they allow malicious hackers to find you and attack you without any warning. An attack can come about in one of two ways: you were either a target of opportunity, or the attack was aimed at you specifically. In both cases a malicious hacker can use Google as one of his tools.

Target of Opportunity

Sometimes a hacker doesn’t have a particular target in mind; instead, he has exploits and wants to use them to gain access to as many machines as possible. The first step is to identify machines with the vulnerabilities he can exploit. In pre-Google days, this step involved scanning the internet for the service he intended to exploit and then trying to identify the version running on each host in the resulting list, a task that could take a very long time. Today, however, a simple search can produce a list of targets in less than a second. Searching for intitle:index.of “Apache/1.3.34 Server at”, for example, returns a huge list (over 3 million results) of domains that are running Apache 1.3.34 and also have directory listing enabled in some of their web folders. And it is not just Apache that can be found this way: IIS, web applications, scripts and even appliances, anything with a web front-end that might be indexed by Google. For example, searching for inurl:hp/device/this.LCDispatcher will return a number of web front-ends for HP LaserJet printers.

Please realize that these searches return real, live systems and printers that people and companies are using. Use these queries to test your own sites, and be aware that accessing printers you do not own might open you up to legal action. These examples are provided only to illustrate the point and the dangers; please do not misuse the knowledge.

Targeted attack

In the previous section we saw how Google can help a malicious hacker find a target of opportunity, but how can Google help a hacker who intends to target a specific person?

When a malicious hacker intends to infiltrate a specific target, the first step is to gather intelligence. The attacker needs to know what he can reach; essentially, he needs to catalogue every service, server and appliance accessible from his location (the internet). Previously he would have achieved this by scanning his target for open ports and fingerprinting the services behind them. That still works today, but it leaves a footprint. Firewalls and log analysers can detect this footprint and alert an administrator that someone is scanning his network, giving the administrator time to prepare and keep a close eye on things; he might even be in a position to track down the attacker. At the very least it leaves a trail back to the attacker, even if he never finds a weakness to exploit and never acts on his intentions. But what if he does his fingerprinting using Google instead?

Using a search query like site:[domain], Google will list all the indexed pages on that site. In those results one might find running services, servers, version numbers, scripts and even appliances. The attacker can then, without exposing himself, devise an attack plan without worrying whether the administrator of the target site is on to him. And should he decide that there is no way he can penetrate his target, he can safely give up without any consequences.

In conclusion: search engines are, of course, very useful in allowing us to find things easily and quickly. Unfortunately, they also allow people with malicious intent to find what they are interested in just as easily. My suggestion is to keep search engines in mind when going through your security tasks, whether during development, system administration, web design or anything else a search engine can affect. Don’t depend on security by obscurity, as that obscurity might not be as obscure as you imagine. Taking simple precautions can help a lot. It is possible to control which parts of your site Google will index and which it will ignore. There are plenty of resources that webmasters can use at http://www.google.com/webmasters/ and http://www.google.com/support/webmasters/
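One simple way to control what gets indexed is a robots.txt file at the root of your site, which tells well-behaved crawlers such as Googlebot which paths to skip. The directory names below are hypothetical examples:

```
User-agent: *
Disallow: /private/
Disallow: /backups/

User-agent: Googlebot
Disallow: /internal-docs/
```

Keep in mind that robots.txt is a request honoured only by compliant crawlers, not an access control; anything genuinely sensitive still needs real authentication in front of it.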

It is also important to disable directory listings unless they are absolutely needed, and to protect them when they are. Appliances such as printers should never be connected to the internet unless absolutely necessary, and when they are, make sure they are secured and cannot be accessed by everyone. Always remember that an appliance can be used just like any other machine to gain a foothold inside your network.

Finally I am curious as to your views on the subject – are search engines something you worry about? Do you think that search engines are a threat to the security of your system, but that maybe it’s a threat that’s mitigated through your normal routines and doesn’t really require any additional steps? Maybe you wouldn’t really consider them a threat at all?

I personally think that they’re a threat that is generally overlooked, though perhaps not a huge one at the end of the day, since the steps needed to protect against it are ultimately best practices that should be followed in the first place.

 
Comments
John Mello November 4, 2009 12:08 am

This is a very enlightening item. Some of the concerns you mention were raised when Google introduced its code search engine (http://www.google.com/codesearch). One of the complaints about the engine, which searches public source code for snippets of code, was that it provided hackers with an efficient tool for finding programming errors that can be exploited by malware. For example, a hacker might search for functions such as strcpy and gets, knowing that if those functions aren’t used properly, they can be exploited to cause buffer overflows, which can in turn be leveraged to execute malicious code.

By the way, I tried searching for intitle:index.of “Apache/1.3.34 Server at” in some of the other search engines and turned up some interesting results. Yahoo found 13,600,000 occurrences of the search term; Bing, 1,270,000; and Ask, 0.

Emmanuel Carabott November 4, 2009 10:34 am

Hi John, thanks a lot for your contribution. You are most certainly right in what you mention: searching for code that could potentially be exploitable if not implemented correctly is definitely an important point that I missed!

Leandro Amore November 20, 2009 10:15 pm

John, Emmanuel;
You are right in describing search engines as dangerous tools, but the internet is a dangerous place. We can’t live without search engines, and from my point of view they really do more good than evil.
Imagine your everyday browsing without the power of search engines. Can we go back to the times when you had to manually register your site with the search engines, or when the only things these crawlers looked at were predefined tags?
This is an old discussion; there is even a book on exploiting this kind of “vulnerability”, called Google Hacks, released in 2003 (http://oreilly.com/catalog/9780596004477).
I think the battle between the white and black hats will never end, but it will help make the internet a more secure place if you play by the rules and keep your stuff safe. This is a big task, and perhaps for IT guys like us it is not that difficult, but we should do our best to train the ordinary user to keep their things safe.
Every year we will find new threats, such as misused social networks or search engines, but we have to keep ourselves trained and put some energy into teaching others how to use these powerful resources without getting into trouble.
Best regards
Leandro

Emmanuel Carabott November 26, 2009 10:16 am

Hi Leandro, thanks for posting. I think you misunderstood me, or maybe I was a bit unclear; I cannot rule that out either :) In any case, what I meant to say in the article was not that we should remove search engines or try to stop their evil reign. I know they are essential in everyday life and that they ultimately do more good than bad. The article was intended more as a warning and a question, really. The warning was that, when designing systems or software, one should consider search engines in one’s security strategy.

If we lived in a world without search engines, then putting a printer on a direct internet connection, say to save money by having all the peripheral offices print to it instead of sending faxes, might be safe enough. Even if the printer had a nasty vulnerability allowing anyone administrative access, without search engines a malicious person would need to scan numerous IPs, identify which of them were printers while discarding false positives, and then filter the results further to find the vulnerable ones. That would take a lot of time, which lowers the risk quite a bit. With search engines, however, a simple search taking less than 5 seconds gives the attacker a comprehensive list of vulnerable printers. This makes the risk a lot higher.

My question, on the other hand, was whether people already consider this threat vector, or whether they have considered it but decided the risk was not worth the investment needed to tackle it.

Bottom line, I guess we’re really in agreement here Leandro :) Thanks again for sharing your ideas with us.