C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
A search engine must return results that are relevant to what the user searched for. Before Google, search engines relied largely on on-page signals such as title tags and keyword meta tags. These could easily be manipulated (stuffed with the keywords you wished to rank for) to push a site onto page 1. Google devised the PageRank algorithm, which scores a page by the quantity and quality of links pointing to it, and made it a major component of its search algorithm.

Assumptions the developers of a search engine must build in include:
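The PageRank idea above can be sketched in a few lines. This is a minimal, hypothetical example (the four-page link graph and function name are invented for illustration, not Google's actual implementation): each page shares its score evenly among the pages it links to, and a damping factor models a surfer occasionally jumping to a random page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        # every page gets a base share from the "random jump"
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                     # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                                # split rank among outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B to C, C to A, D to C.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(graph)
```

Here page C ends up ranked highest because three of the four pages link to it; keyword stuffing on page D would not help D, since nothing links to it.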
- Avoid indexing spam sites (e.g. duplicated/copied content); detect sites that use black-hat techniques and remove them from the index.
- Don't re-crawl static sites (those that do not change) as often; crawl authoritative or frequently updated sites (e.g. news sites with fresh content) more often.
- Respect robots.txt files.
- Identify sites that change on a regular basis and cache/re-index them accordingly.
- The spider should not overload servers by repeatedly hitting the same site.
- The algorithm must be able to avoid spider traps (e.g. infinite loops of dynamically generated links).
- Ignore paid-for links (this can be difficult).
- Ignore exact-match anchor text when it is being used to manipulate rankings for particular keywords/search terms; a backlink profile should look natural to the search engine.
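Two of the crawler assumptions above (respect robots.txt, don't overload servers) can be handled with Python's standard-library `urllib.robotparser`. This sketch parses an inline sample robots.txt rather than fetching a real one; the file contents and the "MyCrawler" user-agent name are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt contents (normally fetched from
# https://example.com/robots.txt before crawling the site).
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check each URL before the spider fetches it.
print(rp.can_fetch("MyCrawler", "https://example.com/index.html"))  # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/x"))   # False

# Politeness: wait at least this many seconds between requests to
# the same host (e.g. pass it to time.sleep between fetches).
delay = rp.crawl_delay("MyCrawler") or 1
```

A well-behaved spider combines this with its own rate limiting per host, so that even sites with no Crawl-delay directive are not hammered.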
What are some of the major metrics used?