Last week, I posted about the Dark Internet. This post is about something called the “Deep Web.” Unlike the Dark Internet, the servers on the Deep Web are part of the public Internet and can be reached from any connected computer. The Deep Web is the name given to web pages that cannot be indexed by popular search engines such as Bing and Google.
Search engines use programs called bots that crawl through web pages, indexing keywords so those pages can be returned as results in an Internet search. The pages that can be indexed this way are referred to as the “Surface Web.”
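At its core, the indexing step a bot performs amounts to building an inverted index: a map from each keyword to the set of pages that contain it. Here is a minimal sketch of that idea; the URLs and page text are made up for illustration.

```python
# Toy "crawled" pages: URL -> page text (hypothetical examples).
pages = {
    "example.com/a": "deep web pages hidden from crawlers",
    "example.com/b": "surface web pages are easy to crawl",
}

def build_index(pages):
    """Build an inverted index: keyword -> set of URLs containing it."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

index = build_index(pages)
```

Answering a search is then just a dictionary lookup: `index["web"]` returns both URLs, while `index["deep"]` returns only the first.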
Some data sources on the web create their pages dynamically: when a request comes in, code on the server searches a database for the requested information and builds a custom web page as the response. The problem with trying to crawl these pages for keywords with search engine bots is that the pages don’t actually exist until you ask for the data.
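To make that concrete, here is a minimal sketch of such a data source, using an in-memory SQLite table with made-up records. The page is assembled from query results at request time; until `render_page` is called with a keyword, there is no page for a bot to crawl.

```python
import sqlite3

# Hypothetical data source: a small in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("Deep Web", "Pages generated on demand from a database."),
        ("Surface Web", "Pages a crawler can reach and index directly."),
    ],
)

def render_page(keyword):
    """Build a web page on the fly from a database query.

    The page does not exist until someone asks for it, which is
    exactly why a crawling bot never sees it.
    """
    rows = conn.execute(
        "SELECT title, body FROM articles WHERE title LIKE ?",
        (f"%{keyword}%",),
    ).fetchall()
    items = "".join(f"<li><b>{t}</b>: {b}</li>" for t, b in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

print(render_page("Deep"))
```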
Search engines are developing new programs to try to get at the data in the Deep Web. These programs cycle through keywords as inputs to the data sources, prompting the servers to generate web pages which can then be indexed and added to search results. This expansion of search engine capability is very important: some estimates suggest that the Surface Web contains only about 10% as much data as the Deep Web.
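The keyword-cycling idea can be sketched in a few lines. In this toy version, `query_source` stands in for a real site’s search form (a hypothetical stand-in, not any engine’s actual interface): the prober submits candidate keywords one by one and indexes whichever generated pages come back.

```python
def query_source(keyword, records):
    """Stand-in for a dynamic data source's search form.

    A result page is built only when a query arrives; queries
    with no matches return nothing at all.
    """
    hits = [r for r in records if keyword.lower() in r.lower()]
    if not hits:
        return None
    return f"<html><body>{'<br>'.join(hits)}</body></html>"

def probe(records, candidate_keywords):
    """Cycle candidate keywords through the source and index the
    pages its server generates in response."""
    index = {}
    for kw in candidate_keywords:
        page = query_source(kw, records)
        if page is not None:
            index[kw] = page  # the page now exists and can be indexed
    return index

# Made-up records and candidate keywords for illustration.
records = ["Deep Web overview", "Dark Internet history", "Surface Web basics"]
index = probe(records, ["web", "internet", "quantum"])
```

Here “web” and “internet” each coax a page out of the source and end up in the index, while “quantum” matches nothing and produces no page, which is the crux of the approach: the probing program only discovers what its keyword list manages to ask for.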