Deep Web Introduction
Most search engines are designed to search only the surface of the Web, and according to many search engine analysts they deliver less than 10% of the information available on the Internet. We are not demeaning their contribution to the Net by reporting this; we are explaining how and why the public can be better served by receiving information that is completely hidden from these "surface" web engines.
To gather information from the Net, search engines employ robots (also known as crawlers or spiders). Robots work in a structured manner, following links from page to page, and they retrieve only what they are programmed to find.
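As a rough illustration of how such a robot traverses the Web, here is a minimal sketch of a link-following crawler in Python. The seed URL, page limit and simplified error handling are illustrative assumptions for the sake of the example, not a description of any particular engine's software.

```python
# A minimal, illustrative link-following crawler.
# The seed URL and page limit are placeholders, not any real engine's settings.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=10):
    """Visit pages reachable from seed_url by following static hyperlinks only."""
    seen, queue, fetched = {seed_url}, deque([seed_url]), 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # unreachable pages are simply skipped
        fetched += 1
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        print("indexed:", url)


if __name__ == "__main__":
    crawl("https://example.com")  # placeholder seed page
```

Note what the robot can and cannot see: it only ever requests URLs that appear as hyperlinks in pages it has already fetched. Anything that is not statically linked never enters its queue.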
One of the main search engines has convinced almost everyone of the merit of its ranking and indexing system, which is built on the premise that link popularity is the central key to providing the most useful and relevant results. Under this design, many pages effectively do not exist for the engine, because they are created only dynamically, in response to a query; it is static, hyperlinked HTML that enables a robot to navigate from one page to another.
To be seen by these robots, content must be static and linked to other pages. Most search engines cannot see or retrieve deep Web content because those pages are generated dynamically only as the result of a search.
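To make the contrast concrete, here is a hedged sketch of what retrieving a dynamically generated page involves. The search endpoint and field name are invented for illustration; the point is that the result page only comes into existence when a query is submitted, and a link-following robot never submits queries.

```python
# Illustrative only: the search endpoint and parameter name are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

# A deep Web result page is generated on demand in response to a query.
# No static hyperlink points at it, so a robot that only follows links
# will never request it.
query = urlencode({"q": "marine biology journals"}).encode("utf-8")
with urlopen("https://example.org/search", data=query) as response:
    dynamic_page = response.read().decode("utf-8", errors="ignore")

print(dynamic_page[:200])  # this page exists only because we asked for it
```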
ReleSeek is able to assimilate content from the deep Web into a central database, despite a Net structure that inherently serves the surface search engines.
The deep Web, where most search engines cannot tread, is over one hundred times larger than the surface Web. Many unique repositories of topical information, including 300 major libraries and dozens of scientific databases, account for 70% of the deep Web.
E-commerce is growing steadily and now accounts for 10% of the Web's content. The deep Web is 10 times greater than all recorded print publishing, representing over 7,500 terabytes of information. There are over 300,000 deep Web sites, and almost all are freely accessible. The largest 60 deep Web sites are alone 40 times larger than the surface Web.
IT'S A FACT: SURFACE ENGINES RETRIEVE ONLY A SMALL PERCENTAGE OF THE NET'S INFORMATION.
Deep Web information is fresher because topical and vertical sites keep their content more current. Surface robots sometimes take months to add a page to their databases; those databases are also built up over the years without updating, as any set of surface engine results will show: some information dates back to the 1990s!
While search is a technological wonder, you must consider that you only receive what is spidered and placed into the engine's database, so the cliché applies: "Garbage in, garbage out!" The Net is full of outdated and useless information.
Many industry programmers now question link-popularity page ranking and believe it stifles the flow of relevant information from all sources. It certainly creates the problem of being unable to access the entire Web. There are no divergent results, only results from the most popular, heavily linked networks; somewhat akin to an "Old Boys' Club" with its insider connections. The major search engines have attempted to develop advanced searches for the topical databases; however, we believe these are inadequate.
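For readers unfamiliar with the mechanism being questioned, the following is a simplified sketch of link-popularity ranking in the style of a PageRank power iteration. The tiny link graph and damping factor are made-up illustrations, not any engine's actual data or algorithm.

```python
# Simplified link-popularity ranking (PageRank-style power iteration).
# The link graph is a made-up example; real engines rank billions of pages.
links = {
    "A": ["B", "C"],  # page A links to pages B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],       # D links out, but no page links to D
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the scores settle
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

# Heavily linked pages rise to the top; page D, which nothing links to,
# stays near the bottom no matter how relevant its content might be.
print(sorted(rank.items(), key=lambda item: -item[1]))
```

This is exactly the dynamic criticized above: relevance is inferred from who links to whom, so content outside the popular linking networks never surfaces.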
Surface engines return so much duplication because of the popularity system. News media, including newspapers, TV, radio and other reporting outlets, all "mirror" the same published reports because of this popularity ranking method. All of this needless repetition wastes bandwidth.
With ReleSEEK, we compile results from a directed group of specialty and vertical engines and purge duplications and irrelevant information before combining them with results from our directories, including RSS feeds and blogs. This compilation returns the latest, most pertinent information from the heart of the deep Web. In addition, we add pertinent results from the surface Web. The combined results are cleansed of duplication and forwarded to our cluster software for segmentation, and dated information such as old news is purged.
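As a rough sketch of the kind of post-processing described above, and not a description of ReleSEEK's actual software, the following Python merges result lists from several sources, purges duplicate URLs and dated items, and hands the remainder to a simple grouping step standing in for the cluster software. The record fields and the freshness cutoff are illustrative assumptions.

```python
# Hypothetical sketch of federated result post-processing, not ReleSEEK's code.
from datetime import date, timedelta

# Each source (vertical engine, directory, RSS feed, blog, surface engine) is
# assumed to return records like: {"url", "title", "published", "topic"}.
def merge_results(sources, max_age_days=30):
    cutoff = date.today() - timedelta(days=max_age_days)
    seen_urls, merged = set(), []
    for results in sources:
        for record in results:
            if record["url"] in seen_urls:
                continue                      # purge duplication across sources
            if record["published"] < cutoff:
                continue                      # purge dated information
            seen_urls.add(record["url"])
            merged.append(record)
    # Stand-in for the clustering/segmentation step: group by topic label.
    clusters = {}
    for record in merged:
        clusters.setdefault(record["topic"], []).append(record)
    return clusters


if __name__ == "__main__":
    vertical = [{"url": "https://example.org/a", "title": "Report A",
                 "published": date.today(), "topic": "energy"}]
    surface = [{"url": "https://example.org/a", "title": "Report A (mirror)",
                "published": date.today(), "topic": "energy"}]
    print(merge_results([vertical, surface]))  # the mirror copy is dropped
```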