A search engine must serve up results that are relevant to what users search for. Before Google, search engines relied mainly on title tags and keyword meta tags, and these could easily be manipulated (stuffed with the keywords you wished to rank for) to get a site onto page 1. Google devised the PageRank algorithm, which scores a page by the links pointing at it, and this played a big part in its overall search algorithm.
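The link-based scoring idea can be sketched with a minimal power-iteration version of PageRank. This is an illustrative toy, not Google's production algorithm; the tiny link graph, damping factor, and iteration count are all assumptions:

```python
# Toy PageRank via power iteration. Each page starts with equal rank;
# on every pass a page shares (damping * its rank) equally among the
# pages it links to, plus a small uniform "teleport" share.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = pagerank(graph)
# "c" receives links from both "a" and "b", so it ends up ranked highest.
```

The key property for spam resistance is that rank flows from *other* sites' links, which is much harder to fake than your own title and keyword tags.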
- Avoid indexing spam sites (duplicated, copied content); detect sites using black-hat techniques and remove them from the index.
- Don't keep re-crawling static sites (ones that never change); crawl authoritative, frequently updated sites (fresh content, e.g. news sites) more often.
- Respect robots.txt files.
- Identify sites that change on a regular basis and cache them.
- The spider should not overload servers by repeatedly hitting the same site.
- The crawler must be able to avoid spider traps (pages that generate an effectively endless series of URLs).
- Ignore paid-for links (this can be difficult).
- Ignore exact-match anchor text when it is being used to rank for keywords/search terms (a backlink profile should look natural to the search engine).
- Use the comments box to add more requirements or to question any of these.
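The robots.txt, politeness, and spider-trap rules above could be sketched as a single pre-fetch check. The robots.txt contents, the 2-second crawl delay, and the depth cap are all illustrative assumptions, not values from any real crawler:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

# Made-up robots.txt for illustration only.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT)

MAX_DEPTH = 10   # guard against spider traps (endless URL spaces)
last_hit = {}    # host -> timestamp of last request to that host
seen = set()     # URLs already fetched (skip duplicates)

def allowed_to_fetch(url, now=None):
    """Return True if url passes robots, duplicate, trap, and rate checks."""
    now = time.monotonic() if now is None else now
    if not parser.can_fetch("*", url):
        return False                    # respect robots.txt
    if url in seen:
        return False                    # already crawled
    if urlparse(url).path.count("/") > MAX_DEPTH:
        return False                    # suspiciously deep: likely a trap
    host = urlparse(url).netloc
    if now - last_hit.get(host, float("-inf")) < 2.0:
        return False                    # politeness delay per host
    last_hit[host] = now
    seen.add(url)
    return True

print(allowed_to_fetch("http://example.com/page"))        # allowed
print(allowed_to_fetch("http://example.com/private/x"))   # robots-blocked
```

A real crawler would queue the URL for later rather than dropping it when the politeness delay has not elapsed, but the checks themselves are the same.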
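The "crawl fresh sites more often, static sites less" rule is often implemented as an adaptive revisit interval. A minimal sketch, where the interval bounds and halving/doubling policy are assumptions chosen for illustration:

```python
# Adaptive recrawl scheduling: if a page changed since the last visit,
# revisit sooner; if it was unchanged, back off and revisit later.

MIN_INTERVAL = 1.0    # hours, floor for fast-changing news sites
MAX_INTERVAL = 720.0  # hours (~30 days), ceiling for static sites

def next_interval(current_interval, content_changed):
    """Halve the revisit interval on change, double it otherwise."""
    if content_changed:
        return max(MIN_INTERVAL, current_interval / 2)
    return min(MAX_INTERVAL, current_interval * 2)

interval = 24.0
interval = next_interval(interval, content_changed=True)   # changed: 12.0
interval = next_interval(interval, content_changed=False)  # stable: 24.0
```

Over many visits this converges naturally: a never-changing brochure site drifts out to the 30-day ceiling, while a news homepage sits at the floor.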