
C.1.1 Distinguish between the Internet and World Wide Web

World Wide Web (www):

Tech: The World Wide Web, also known as “www”, is a part of the internet that uses web browsers to share information across the globe via hyperlinks.

Non-Tech: “WWW” is short for World Wide Web. It is accessed through browsers such as Google Chrome and Firefox, and it is basically the software that allows us to reach information and other people around the world.

The Internet is the global network of networks of computers: computers, cables and wireless connections, governed by the Internet Protocol (IP), which deals with data and packets. The World Wide Web, also known as the Web, is one set of software running on the Internet: a collection of web pages, files and folders connected through hyperlinks and URLs. Put simply, the Internet is the hardware part and the Web is the software part, so the Web relies on the Internet to run, but not vice versa. Besides the WWW, other services such as VoIP and email have their own protocols and also run on the Internet.

The Internet

Tech

a network of networks (the underlying network infrastructure)

Many people label the www and the internet as the same thing. However, the internet connects many different computers together, giving people the ability to exchange data with one another, such as news, pictures or even videos.

An analogy: 1. Internet – the hardware/operator; 2. WWW (World Wide Web) – the operating system running on it. The difference between the Internet and the WWW is that without the Internet there won't be a WWW: the WWW needs the Internet to operate.

C.1.1 Question Section

Q) Distinguish between the internet and the World Wide Web (web). ("Distinguish" = make clear the difference between two or more items/concepts.)

Past Questions

C.1.2 Describe how the web is constantly evolving

The beginnings of the web (Web 1.0 , Web of content)

The World Wide Web started around 1990/91 as a system of servers connected over the internet that deliver static documents. These are formatted as Hypertext Markup Language (HTML) files, which support links to other documents as well as multimedia such as graphics, video or audio. In the beginning of the web, these documents consisted mainly of static information and text; multimedia was added later. Some experts describe this as a “read-only web”, because users mostly searched and read information, while there was little user interaction or content contribution.

Web 2.0 – “Web of the Users”

However, the web started to evolve towards the delivery of more dynamic documents, enabling user interaction or even allowing content contribution. The appearance of blogging platforms such as Blogger in 1999 gives a time mark for the birth of Web 2.0. Continuing the model from before, this was the evolution to a “read-write” web. This opened new possibilities and led to new concepts such as blogs, social networks and video-streaming platforms. Web 2.0 can also be looked at from the perspective of the websites themselves becoming more dynamic and feature-rich. For instance, improved design, JavaScript and dynamic content loading could be considered Web 2.0 features.

Web 3.0 – “Semantic Web”

The internet, and thus the World Wide Web, is constantly developing and evolving in new directions, and while the changes described for Web 2.0 are clear to us today, the definition of Web 3.0 is not definitive yet. Continuing the read to read-write description from earlier, it might be argued that Web 3.0 will be the “read-write-execute” web. One interpretation of this is that the web enables software agents to work with documents by using semantic markup. This allows for smarter searches and the presentation of relevant data that fits the context. This is why Web 3.0 is sometimes called the semantic executive web.

But what does this mean?

It’s about user input becoming more meaningful, more semantic: users give tags or other kinds of data to their documents that allow software agents to work with the input, e.g. to make it more searchable. The idea is to be able to better connect information that is semantically related.

Later developments

However, it might also be argued that Web 3.0 is what some people call the Internet of Things, which is basically connecting everyday devices to the internet to make them smarter. In some ways this also fits the read-write-execute model, as it allows the user to control a real-life action on a device over the internet. Either way, the web keeps evolving.

Video Section

The Web Expansion (TrendOne, 2008)

C.1.4 Identify the characteristics of the following: • uniform resource identifier (URI) • URL

URIs are a standard for identifying documents using a short string of numbers, letters, and symbols. They are defined by RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax. URLs, URNs, and URCs are all types of URI.

URL

  • Contains information about how to fetch a resource from its location. For example:
  • http://example.com/mypage.html
  • ftp://example.com/download.zip
  • mailto:user@example.com
  • file:///home/user/file.txt
  • tel:1-888-555-5555
  • http://example.com/resource?foo=bar#fragment
  • /other/link.html (A relative URL, only useful in the context of another URL)
  • URLs always start with a scheme (such as http) and usually contain information such as the network host name (example.com) and often a document path (/foo/mypage.html). URLs may also have query parameters and fragment identifiers, as the sketch below shows.
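To make these parts concrete, here is a minimal sketch using Python's standard urllib.parse module; the URL is the example one from the list above, not a real resource:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, chosen only to show every component.
url = "http://example.com/resource?foo=bar#fragment"

parts = urlparse(url)
print(parts.scheme)           # 'http'        -> the scheme/protocol
print(parts.netloc)           # 'example.com' -> network host name
print(parts.path)             # '/resource'   -> document path
print(parse_qs(parts.query))  # {'foo': ['bar']} -> query parameters
print(parts.fragment)         # 'fragment'    -> fragment identifier
```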

URN

    Identifies a resource by a unique and persistent name, but doesn't necessarily tell you how to locate it on the internet. It usually starts with the prefix urn:. For example:

  • urn:isbn:0451450523 to identify a book by its ISBN number.
  • urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66 a globally unique identifier
  • urn:publishing:book - An XML namespace that identifies the document as a type of book.
  • URNs can identify ideas and concepts. They are not restricted to identifying documents. When a URN does represent a document, it can be translated into a URL by a "resolver". The document can then be downloaded from the URL.

    C.1.4 Question Section

    Question Section Past papers

    C.1.3 HTTP(S), HTML, URL, XML, XSLT, JS & CSS

    HTTPS – Hypertext Transfer Protocol Secure

  • Based on HTTP
  • Adds an additional security layer of SSL or TLS
  • ensures authentication of website by using digital certificates
  • ensures integrity and confidentiality through encryption of communication
  • it is still possible to track the IP address and port number of the web server (which is why HTTPS websites can still be blocked, e.g. in China)
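As a rough illustration of the authentication and encryption points above, the following Python sketch (standard library only; example.com is just a placeholder host) opens a TLS connection and prints the negotiated protocol version and the subject of the server's certificate:

```python
import socket
import ssl

hostname = "example.com"  # placeholder host for illustration only

# create_default_context() verifies the server certificate against the
# system's trusted certificate authorities (the authentication step).
context = ssl.create_default_context()

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.version())        # e.g. 'TLSv1.3' - the encryption layer in use
        cert = tls.getpeercert()
        print(cert["subject"])      # identity asserted by the digital certificate
```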
HTML – Hypertext Mark-up Language

  • semantic markup language
  • standard language for web documents
  • uses elements enclosed by tags to markup a document
XML – Extensible Mark-up Language

  • markup language with a set of rules defining how to encode a document
  • human-readable
  • similar to HTML in using tags
  • used for representation of arbitrary data structures
XSLT – Extensible Stylesheet Language Transformations

  • markup language used to transform XML documents into other formats, e.g. other XML documents, HTML or plain text
  • an XSLT stylesheet is itself an XML document
  • uses XPath expressions to select the parts of the source document to transform
  • commonly used to present the data in an XML document as a web page
JavaScript

  • interpreted programming language
  • core technology of most websites with HTML and CSS
  • high-level, dynamic and dynamically typed; therefore relatively easy for beginners
  • allows to dynamically manipulate the content of HTML documents
CSS – Cascading Style Sheets

  • style sheet language to describe the presentation of a mark-up document, usually HTML
  • used to create better designed websites
  • intended to separate content (HTML) from presentation (CSS)
  • it uses selectors to target particular elements of a document, and assigns them properties that define anything from font color to page position
C.1.8 Outline the different components of a web page.

A web page can contain a variety of components. The basic structure of an HTML document is:

    head

This is not visible on the page itself, but contains important information about it in the form of metadata.

    title

The title goes inside the head and is usually displayed in the title bar or tab at the top of the web browser.

    meta tags

    There are various types of meta tags, which can give search engines information about the page, but are also used for other purposes, such as to specify the charset used.

    body

    The main part of the page document. This is where all the (visible) content goes in.

    Some other typical components:

    Navigation bar

Usually a collection of links that helps to navigate the website, typically shown at the top of the page or as a hamburger menu on mobile.

    Hyperlinks

    A hyperlink is a reference to another web page.

    Table Of Contents

    Might be contained in a sidebar and is used for navigation and orientation within the website.

    Banner

    Area at the top of a web page linking to other big topic areas.

    Sidebar

    Usually used for a table of contents or navigation bar.

    C.2.1 Define the term search engine

A search engine is a program that allows a user to search for information, normally on the web.

    C.2.2 Distinguish between the surface web and the deep web

    Surface Web

The surface web is the part of the web that can be reached by a search engine. For this, pages need to be static and fixed, so that they can be reached through links from other sites on the surface web. They also need to be accessible without special configuration. Examples include Google, Facebook, YouTube, etc.

  • Pages that are reachable (and indexed) by a search engine
  • Pages that can be reached through links from other sites in the surface web
  • Pages that do not require special access configurations
Deep web

The deep web is the part of the web that is not searchable by normal search engines. Reasons for this include proprietary content that requires authentication or VPN access, e.g. private social media, emails; commercial content that is protected by paywalls, e.g. online newspapers, academic research databases; personal information that is protected, e.g. bank information, health records; and dynamic content. Dynamic content is usually the result of a query, where data are fetched from a database.

  • Pages not reachable by search engines
  • Substantially larger than the surface web
  • Common characteristics:
    • Dynamically generated pages, e.g. through queries, JavaScript, AJAX, Flash
    • Password-protected pages, e.g. emails, private social media
    • Paywalls, e.g. online newspapers, academic research databases
    • Personal information, e.g. health records
    • Pages without any incoming links

C.2.3 Outline the principles of searching algorithms used by search engines

The best-known search algorithms are PageRank and the HITS algorithm, but it is important to know that most search engines include various other factors as well, e.g.

  • the time that a page has existed
  • the frequency of the search keywords on the page
  • other unknown factors (undisclosed)
For the following description the terms “inlinks” and “outlinks” are used. Inlinks are links that point to the page in question, i.e. if page W has an inlink, there is a page Z containing the URL of page W. Outlinks are links that point to a different page than the one in question, i.e. if page W has an outlink, it is a URL of another page, e.g. page Z.

    PageRank algorithm

    PageRank works by counting the number and quality of inlinks of a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

As mentioned, it is important to note that there are many other factors considered. For instance, the anchor text of a link is often far more important than its PageRank score.

    • Pages are given a score (rank)
    • Rank determines the order in which pages appear
    • Incoming links add value to a page
    • The importance of an inlink depends on the PageRank (score) of the linking page (page authority)
    • PageRank counts links per page and determines which pages are most important
    • Links from sites that are relevant carry more weight than links from unrelated sites
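The bullet points above can be turned into a tiny numerical sketch. The code below is a simplified, illustrative PageRank iteration over a hypothetical four-page link graph; it ignores anchor text and all the other factors real engines use:

```python
# Simplified PageRank sketch on a hypothetical link graph.
# Keys are pages, values are the pages they link out to (outlinks).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

damping = 0.85                               # probability of following a link
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}    # start with equal scores

for _ in range(50):                          # iterate until scores settle
    new_rank = {}
    for page in pages:
        # Rank passed on by every page that links here; each linking page
        # shares its own rank equally among its outlinks.
        incoming = sum(rank[q] / len(links[q]) for q in pages if page in links[q])
        new_rank[page] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))   # C ranks highest: it has the most inlinks
```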

    HITS algorithm

    Based on the idea that keywords are not everything that matters; there are sites that might be more relevant even if they don’t contain the most keywords. It introduces the idea of different types of pages, authorities and hubs.

    Authorities: A page is called an authority, if it contains valuable information and if it is truly relevant for the search query. It is assumed that such a page has a high number of in-links.

    Hubs: These are pages that are relevant for finding authorities. They contain useful links towards them. It is therefore assumed that these pages have a high number of out-links.

The algorithm is based on mathematical graph theory, where a page is represented by a vertex and links between pages are represented by (directed) edges.

It attempts to computationally determine hubs and authorities on a particular topic through analysis of a relevant subgraph of the web. It is based on two mutually recursive observations: hubs point to lots of authorities, and authorities are pointed to by lots of hubs.
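A minimal sketch of these mutually recursive hub and authority updates, run on a small hypothetical subgraph with simple normalisation, might look like this (illustrative only, not how any real engine implements it):

```python
# Illustrative HITS iteration on a hypothetical subgraph of the web.
links = {                       # page -> pages it links out to
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["B", "C"],
}

hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}

for _ in range(20):
    # Authority score: sum of hub scores of the pages linking to it.
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    # Hub score: sum of authority scores of the pages it links to.
    hub = {p: sum(auth[q] for q in links[p]) for p in links}
    # Normalise so scores stay comparable between iterations.
    a_norm = sum(v * v for v in auth.values()) ** 0.5
    h_norm = sum(v * v for v in hub.values()) ** 0.5
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

print("authorities:", {p: round(v, 2) for p, v in auth.items()})  # C: most inlinks
print("hubs:", {p: round(v, 2) for p, v in hub.items()})          # A and D: most outlinks
```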

    C.2.4  Describe how a web-crawler functions

    A web crawler, also known as a web spider, web robot or simply bot, is a program that browses the web in a methodical and automated manner. For each page it finds, a copy is downloaded and indexed. In this process it extracts all links from the given page and then repeats the same process for all found links. This way, it tries to find as many pages as possible.

    Limitations:

    • They might look at meta data contained in the head of web pages, but this depends on the crawler
    • A crawler might not be able to read pages with dynamic content as they are very simple programs
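A toy crawler illustrating the "download, extract links, repeat" loop described above could look like the sketch below. It uses only the Python standard library; the seed URL is a placeholder, and a real crawler would also need politeness delays, robots.txt checks and far better error handling:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    queue = deque([seed])          # URLs waiting to be fetched
    visited = set()                # avoid downloading the same page twice
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue               # skip pages that fail to download
        visited.add(url)
        print("indexed:", url)     # a real crawler would store/index the copy here
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))   # resolve relative links
    return visited

# crawl("http://example.com/")    # placeholder seed URL
```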

    Robots.txt

Stops bots from using up bandwidth

Saves bandwidth: crawlers spend less time crawling the site

    Issue: A crawler consumes resources and a page might not wish to be “crawled”. For this reason “robots.txt” files were created, where a page states what should be indexed and what shouldn’t.

    • A file that contains components to specify pages on a website that must not be crawled by search engine bots
    • File is placed in root directory of the site
    • The standard for robots.txt is called “Robots Exclusion Protocol”
    • Can be specific to a special web crawler, or apply to all crawlers
    • Not all bots follow this standard (malicious bots, malware) -> “illegal” bots can ignore robots.txt
    • Still considered to be better to include a robots.txt instead of leaving it out
    • It keeps the bots away from the less “noteworthy” content of a website, so more time is spent indexing the important/relevant content of the site
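The sketch below shows both sides of this: a small hypothetical robots.txt and how a well-behaved crawler can check it with Python's built-in urllib.robotparser before fetching a page (nothing forces a malicious bot to do the same):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (normally served at /robots.txt in the site root).
robots_txt = """
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# A polite crawler asks before fetching each URL.
print(parser.can_fetch("*", "http://example.com/index.html"))      # True  - allowed
print(parser.can_fetch("*", "http://example.com/private/a.html"))  # False - disallowed
print(parser.can_fetch("BadBot", "http://example.com/index.html")) # False - BadBot blocked entirely
```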

    C.2.5 Discuss the relationship between data in a meta tag and how it is accessed by a web-crawler

    Students should be aware that this is not always a transitive relationship.

  • Meta Keywords Attribute - A series of keywords you deem relevant to the page in question.
  • Title Tag - This is the text you'll see at the top of your browser. Search engines view this text as the "title" of your page.
  • Meta Description Attribute - A brief description of the page.
  • Meta Robots Attribute - An indication to search engine crawlers (robots or "bots") as to what they should do with the page.
  • In the past the meta keywords tag could be spammed full of keywords, sometimes not even relevant to the content on the page. This tag is now mostly ignored by search engines. The meta description can sometimes be shown in the results, but it is not a factor in actual ranking.

Robots Meta Tag

Robots meta tag: this is super important and can be used to disallow crawlers from crawling the page; you can specify all crawlers or list the ones that you do not wish to be crawled by.

    Answer depends on different crawlers, but generally speaking:

    • The title tag, not strictly a meta-tag, is what is shown in the results, through the indexer
    • The description meta-tag provides the indexer with a short description of the page and this can also be displayed in the SERPS
    • The keywords meta-tag provides, well, keywords about your page
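As an illustration of how an indexer might read these tags, here is a minimal sketch using Python's html.parser on a made-up page head; real crawlers differ in which tags they honour:

```python
from html.parser import HTMLParser

# Hypothetical page head, only for illustration.
page = """
<html><head>
  <title>Dog Insurance Guide</title>
  <meta name="description" content="An independent guide to insuring your dog.">
  <meta name="keywords" content="dog, insurance, pets">
  <meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>
"""

class MetaReader(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"].lower()] = attrs.get("content", "")
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

reader = MetaReader()
reader.feed(page)
print(reader.title)                    # typically shown in the SERPs
print(reader.meta.get("description"))  # may be shown as the snippet
if "noindex" in reader.meta.get("robots", ""):
    print("Crawler instructed not to index this page")
```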

    C.2.6 Discuss the use of parallel web-crawling

  • The size of the web grows, increasing the time it would take to download pages
  • To make this reasonable, “it becomes imperative to parallelize the crawling process” (Stanford)

Advantages

  • Scalability: as the web grows, a single process cannot handle everything; multithreaded processing can solve the problem
  • Network load dispersion: as the web is geographically dispersed, dispersing crawlers disperses the network load
  • Network load reduction (scalability, efficiency and throughput)

Issues of parallel web crawling

  • Overlapping: parallel web crawlers might index the same page multiple times
  • Quality: if a crawler wants to download “important” pages first, this might not work in a parallel process
  • Communication bandwidth: parallel crawlers need to communicate for the reasons above, which for many processes might take significant communication bandwidth (see “Why search engines take the quality approach” below)
  • If parallel crawlers request the same page frequently over a short time they will overload servers

A crawler is a program that downloads and stores Web pages, often for a Web search engine. Roughly, a crawler starts off by placing an initial set of seed URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache. As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines often run multiple processes in parallel to perform the above task, so that the download rate is maximized (reference: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8408&rep=rep1&type=pdf).
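A minimal sketch of the parallel idea described above (a small pool of worker threads fetching from a shared list of placeholder seed URLs, with a lock-protected visited set standing in for overlap control) might look like this:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen
import threading

seeds = ["http://example.com/", "http://example.org/"]   # placeholder seed URLs
visited = set()
lock = threading.Lock()          # protects the shared 'visited' set

def fetch(url):
    with lock:
        if url in visited:       # avoid overlapping work between workers
            return None
        visited.add(url)
    try:
        return url, len(urlopen(url, timeout=5).read())
    except Exception:
        return None

# Several pages are downloaded in parallel, so the total download rate rises.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch, seeds):
        if result:
            print("fetched %s (%d bytes)" % result)
```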

    Why search engines take the quality approach ( dated )

    According to a study released in October 2000, the directly accessible "surface web" consists of about 2.5 billion pages, while the "deep web" (dynamically generated web pages) consists of about 550 billion pages, 95% of which are publicly accessible [LVDSS00].

    By comparison, the Google index released in June 2000 contained 560 million full-text-indexed pages [Goo00]. In other words, Google — which, according to a recent measurement [HHMN00], has the greatest coverage of all search engines — covers only about 0.1% of the publicly accessible web, and the other major search engines do even worse.

    Increasing the coverage of existing search engines by three orders of magnitude would pose a number of technical challenges, both with respect to their ability to discover, download, and index web pages, as well as their ability to serve queries against an index of that size. (For query engines based on inverted lists, the cost of serving a query is linear to the size of the index.) Therefore, search engines should attempt to download the best pages and include (only) them in their index.

    Mercator is an extensible, multithreaded, high-performance web crawler [HN99Mer00]. It is written in Java and is highly configurable. Its default download strategy is to perform a breadth-first search of the web, with the following three modifications:

  • It downloads multiple pages (typically 500) in parallel. This modification allows us to download about 10 million pages a day; without it, we would download well under 100,000 pages per day.
  • Only a single HTTP connection is opened to any given web server at any given time. This modification is necessary due to the prevalence of relative URLs on the web (about 80% of the links on an average web page refer to the same host), which leads to a high degree of host locality in the crawler's download queue. If we were to download many pages from the same host in parallel, we would overload or even crash that web server.
  • If it took t seconds to download a document from a given web server, then Mercator will wait for 10t seconds before contacting that web server again. This modification is not strictly necessary, but it further eases the load our crawler places on individual servers on the web. We found that this policy reduces the rate of complaints we receive while crawling.

    C.2.7 Outline the purpose of web-indexing in search engines

Search engines index websites in order to respond to search queries with relevant information as quickly as possible. For this reason, they store information about indexed web pages, e.g. keywords, titles or descriptions, in a database. This way search engines can quickly identify pages relevant to a search query.

    Indexing has the additional purpose of giving a page a certain weight, as described in the search algorithms. This way search results can be ranked, after being indexed.
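At the core of this fast lookup is usually an inverted index: a mapping from each keyword to the pages that contain it. A minimal sketch with made-up page contents:

```python
from collections import defaultdict

# Hypothetical pages that a crawler has downloaded and cleaned.
pages = {
    "pageA.html": "dog insurance for puppies",
    "pageB.html": "cheap car insurance quotes",
    "pageC.html": "dog training tips",
}

# Build the inverted index: keyword -> set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

# Answering a query is now a quick lookup plus a set intersection,
# instead of scanning every page at query time.
query = ["dog", "insurance"]
results = set.intersection(*(index[w] for w in query))
print(results)   # {'pageA.html'}
```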

C.2.8 Suggest how developers can create pages that appear more prominently in search engine results


Naturally an overlap exists with what the website developer should do to get the site high in the SERPs (search engine results pages).

    On Page

Relevancy: does your site provide the information the user is searching for? The user experience (UX) is becoming a big part of ranking, as it cannot easily be manipulated, and in the future it will play a much bigger role. UX signals include the time a user stays on the site and the bounce rate, and many factors play a role in it: load speed, easy navigation (no broken links), spelling, quality and factually correct content, a structured layout, use of images/video, page design (colors, images, video, infographics) and formatting, so that it is easy to scan the page for relevant information. The idea is to get the user to stay on your site (“sticky”). If a user lands on your site after doing a search and leaves after a few seconds, or even before the page loads (slow loading), that is a very big signal to Google that it should not have served up that result.

    Off Page

Backlinks from other websites: the more authoritative the site, the better (for example the Huffington Post). The site that links to your site should also be relevant. For example, if you are selling dog insurance, a link from a respected charitable dog website would be a very big boost, while a link from a site that provides car rental would have little impact as it is totally irrelevant.

Social media marketing (Facebook etc.). Be a leader in your field and comment on relevant, authoritative forums or blogs. Other users sharing your content via social bookmarking sites also helps.

This is an area in which you can manipulate the search results. If Google discovers this, your website will be dropped from the index, so you need to ensure any links look natural, e.g. links to an authoritative article or infographic on your website.

    C.2.9 Describe the different metrics used by search engines.


The process of making pages appear more prominently in search engine results is called search engine optimization (SEO). There are many different techniques, considered in section C.2.11. This field is a big aspect of web marketing, as search engines do not disclose exactly how they work, making it hard for developers to perfectly optimise pages.

In order to improve the ranking of a website Google uses many metrics; below are a few of the important ones.

    Top Metrics

    On Page

    • Make sure your site can be crawled and thus indexed: avoid Flash, and provide a sitemap and good website architecture
    • The title: create a title tag with your key phrase near or at the beginning. The title should be crafted to get the user to click on your website when it is displayed in the search results, and it must reflect the content of your site
    • Content will always be important: it must be high quality and any information must be factual; aim for at least 1000 words on the home page
    • Freshness of content
    • Mobile friendly
    • Page load speed under 3 seconds
    • If a link is broken the browser will receive an HTTP 404 response code. This should be detected by the web designer, who should provide a help page with user navigation.
    • Text formatting (use of h1, h2, bold etc.)
    • HTTPS
    • Do keyword research to find what users actually search for and build pages for these terms

These are only a fraction of the metrics that Google uses; more recently Google has given a very slight ranking increase to sites that use HTTPS.

    C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.


The search engine must serve up results that are relevant to what users search for. Google used PageRank; prior to that, search engines just used title tags and keyword tags. These could be easily manipulated (stuffed with keywords that you wish to rank for) to get your site onto page 1. Google devised the PageRank algorithm, which played a big part in its search algorithm.

    • Avoid indexing spam sites (duplicate, copied content). Detect sites that use black hat techniques and remove them from the index
    • Don't re-process static sites (that do not change); crawl authoritative/frequently changing news sites (fresh content) more often
    • Respect robots.txt files
    • Determine sites that change on a regular basis and cache these
    • The spider should not overload servers by continually hitting the same site
    • The algorithm must be able to avoid spider traps
    • Ignore paid-for links (can be difficult)
    • Ignore exact-match anchor text if it is being used to rank for keywords/search terms (a backlink profile should look natural to the search engine)
    • Use the comments box to add more, or to question why these matter

    C.2.11 Discuss the use of white hat and black hat search engine optimization.


    BLACK HAT

Definition: Black hat SEO refers, in simple words, to techniques for getting top positions or higher rankings in major search engines like Google, Yahoo and Bing by breaking the search engines' rules and guidelines. See Google's Webmaster Guidelines for an example.

    Keyword stuffing

This worked at one time. You still need the keywords/search terms in your title and page content, but you need to ensure that you do not overuse the keywords/phrases, as that will trip a search engine filter.

PBN (Private Blog Network)

Google (currently) favors older sites, sites with history. In this approach you buy an expired domain with good metrics, build it up and add links to your sites, giving a boost in ranking. This works, but it is costly to set up and you need to use aliases etc.

    Paid For Links

Similar to PBNs, the aim is to get good quality links from high-authority sites. Have a look at Fiverr, where you can buy such links. This is difficult for Google to detect and it is also very effective.

    Syndicated / Copied Content

Rather than creating good quality content, use content copied from other sites; the content may be changed using automated techniques. Google is now much better at detecting this; please refer to the Panda update.

    Over Use of Key Words in Anchor Text

The anchor text tells Google what your site is about, for example "fleet insurance", but if you overuse it or your backlinks look unnatural you will be penalized; please refer to Penguin. Before Penguin this was very effective for getting ranked.

    Web 2.0 Links

Build a website on Tumblr, for example, for the sole purpose of sending links to your money site.

    WHITE HAT

    Guest Blogging

    The process of writing a blog post for someone else’s blog is called guest blogging

    Link Baiting

Create an amazing article or infographic that other sites may use; if you include a link to your site in the article you get more backlinks as a result (natural acquisition of backlinks as opposed to paid).

    Quality Content

Search engines evaluate the content of a web page, so a page might get a higher ranking with more information. This makes it more valuable in the index, and other web pages might link to your page if its content is of a high standard.

    Site optimization Design 

Good menu navigation. Proper use of title tags and header tags, adding images with keyword alt tags, interlinking again with keyword anchor text. Create a sitemap to get the site crawled, and inform the spiders how often to visit the site.

    A good User Experience (UX)

This is a broad term that overlaps some other areas mentioned, for example page load speed. The purpose is to ensure that if a user clicks through to your site they stay, without clicking back to the SERPs immediately. Google is happy, as this is a quality signal, and its main purpose is to provide the user with relevant results.

    Page Site Load Speed

Fast-loading pages give the user a good experience; aim for under 3 seconds.

    Freshness

    Provide fresh content on a regular basis.

Google (like other search engines) is continually fighting black hat techniques that webmasters employ to rank high in the SERPs. Investigate the two major algorithm updates, Panda and Penguin. A good example of a current black hat practice is the use of PBNs.

Students should investigate PBNs, Panda and Penguin: a quick discussion of these, what Google was targeting, and how PBNs are currently being used effectively to rank sites higher (if caught, you will wake up one morning and your website(s) will have been de-indexed from Google).

C.2.12 Outline future challenges to search engines as the web continues to grow

    Search engines must be fast enough to crawl the exploding volume of new Web pages in order to provide the most up-to-date information. As the number of pages on the Web grows, so will the number of results search engines return. So, it will be increasingly important for search engines to present results in a way that makes it quick and easy for users to find exactly the information they’re looking for. Search engines have to overcome both these challenges.

    • Improvements in the search interface, for example voice search
    • Use of natural language processing will also become more prevalent. Today, the search engine takes a set of keywords as the input and returns a list of rank-sorted links as the output. This will slowly fade and the new search framework will have questions as the input and answers as the output. The nascent form of this new framework is already available in search engines like Google and Bing.
    • Semantic searching by machine learning: see RankBrain. RankBrain is designed to help better interpret queries and effectively translate them, behind the scenes in a way, to find the best pages for the searcher.
    • Personalized search: because mobile is becoming the primary form of consumption, future search engines will try to use powerful sensing technologies like the accelerometer, digital compass, gyroscope and GPS. Google recently bought a company called Behavio, which predicts what a user might do next by using the information acquired from the different sensors on the user's phone.

    C.4.1 Discuss how the web has supported new methods of online interaction such as social networking.

Keywords & Phrases: Web 1.0 and Web 2.0

Web 1.0

Web 2.0

Semantic Web

ubiquitous

Berners-Lee

open protocols (HTML, HTTP)

decentralization

Read Only (Web 1.0) / Read-Write (Web 2.0)

Hyperlinks

Web of linked documents

Successful companies that emerge at each stage of the web's evolution become monopolies, so normal market economics don't apply.

Keywords & Phrases: Semantic Web

The aim of the Semantic Web is to shift the emphasis of associative linking from documents to data.

Abundantly available information can be placed in new contexts and reused in unanticipated ways. This is the dynamic that enabled the WWW to spread, as the value of Web documents was seen to be greater in information-rich contexts (O'Hara & Hall, 2009).

Web of Data

Relational databases

Excel spreadsheets

Web 3.0

ubiquitous

open protocols (HTML, HTTP)

decentralization

Governments are making data available, see https://data.gov.uk/

datasets

Democracy rules: open and free

URL / URI

Read Only / Read-Write

Hyperlinks

Web of linked documents

Successful companies that emerge at each stage of the web's evolution become monopolies, so normal market economics don't apply.

    Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.

Google Maps: is it free?

If your business is missing, can you add it?

Can this information be monetized? How?

Should Google have a monopoly on location information?

Are there alternatives to Google Maps?

Watch the video below, then create an account on OpenStreetMap and add a building (your condo/house, Wells school, etc.). Read the linked post and describe in your own words how OpenStreetMap is different from Google Maps. Post your response to Google Classroom.


    C.4.2 Describe how cloud computing is different from a client-server architecture


    It’s worth noting that this comparison is not about two opposites. Both concepts do not exclude each other and can complement one another.

    Client-server architecture

An application is split into a client side and a server side. The server can be a central communicator between clients (e.g. an email/chat server) or allow different clients to access and manipulate data in a database. A client-server application does not necessarily need to work over the internet; it could be limited to a local network, e.g. for enterprise applications.

    Cloud computing



    Cloud computing still relies on the client-server architecture, but puts the focus on sharing computing resources over the internet. Cloud applications are often offered as a service to individuals and companies - this way companies don’t have to build and maintain their own computing infrastructure in house. Benefits of cloud computing include:

    • Pay per use: elasticity allows the user to only pay for the resources that they actually use.
    • Elasticity: cloud applications can scale up or down depending on current demands. This allows better use of resources and reduces the need for companies to make large investments in a local infrastructure. Wiki quote: "the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible".
    • Self-provisioning: allows the user to set up applications in the cloud without the intervention of the cloud provider
    • A company has the option to use any of SaaS, IaaS or PaaS
    • Using these services offers many advantages over the client-server model: can you think of some?

Azure is Microsoft's cloud service; the other major one is Amazon's (AWS). Watch the introductory video on these.

    C.4.3 Discuss the effects of the use of cloud computing for specified organizations

    To include public and private clouds

Creates an environment conducive to innovative startups and thus the potential for disruptive innovation.

    Private cloud

    In a private cloud model a company owns the data centers that deliver the services to internal users only.

  • Scalability
  • Self-provisioning
  • Direct control
  • Changing computer resources on demand
  • Limited access through firewalls improves security
  • Can you think of any disadvantages?

  • Same high costs for maintenance, staffing, management
  • Additional costs for cloud software
Public cloud

    In a public cloud services are provided by a third party and are usually available to the general public over the Internet.

    Advantages

  • Easy and inexpensive because the provider covers hardware, application and bandwidth costs
  • Scalability to meet needs
  • No wasted resources
  • Costs calculated by resource consumption only
Disadvantages

  • No control over sensitive data
  • Security risks
Hybrid cloud

The idea of a hybrid cloud is to use the best of both private and public clouds by combining the two. Sensitive and critical applications run in a private cloud, while the public cloud is used for applications that require high scalability on demand. As TechTarget explains, the goal of a hybrid cloud is to “create a unified, automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide, while still maintaining control over mission-critical data”.

Summary of obstacles/concerns

  • Service availability
  • Data lock-in: if you wish to change provider, what format will your data be in? It could be very expensive to convert to a new data format
  • The provider could go bust, taking all your data with it
  • Data confidentiality and auditability (security)
  • Data transfer bottlenecks
  • Performance unpredictability
  • Data conversions
  • Bugs in large-scale distributed systems

    C.4.5 Describe the interrelationship between privacy, identification and authentication

    Privacy

    Identification

    Authentication

    C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing

    C.3.2 Compare the major features of: • mobile computing • ubiquitous computing • peer-2-peer network • grid computing

    C.3.3 Distinguish between interoperability and open standards.

    C.3.4 Describe the range of hardware used by distributed networks.

    Students should be aware of developments in mobile technology that have facilitated the growth of distributed networks.

    Compression & Decompression  Week 2

    C.3.6 Distinguish between lossless and lossy compression.

    C.3.7 Evaluate the use of decompression software in the transfer of information.

Compression 1 – Objectives: understand compression techniques and the need for compression


Compression definition: reducing the size of data, i.e. the number of bits used to store it. Most services charge by the number of bits you transmit, so compression also reduces bandwidth use.

Benefits: reduced storage needs and associated costs; better UX through lower latency and faster loading; less bandwidth used.

Possible downsides?

What can we compress?

How?

Types of compression?
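As a quick illustration of lossless compression before the Huffman example below, here is a minimal sketch using Python's built-in zlib module on some made-up, deliberately repetitive text; the data come back bit-for-bit identical after decompression:

```python
import zlib

# Made-up, repetitive text: repetition is exactly what compressors exploit.
original = ("the quick brown fox jumps over the lazy dog. " * 40).encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print("original size:  ", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")   # far smaller
print("lossless:", restored == original)              # True: nothing was lost
```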

Compression 2 – Huffman example. Objective: apply a compression method; intro to binary trees


Text compression definition: reduce the size of data, i.e. the number of bits used to store it.

Compress "Hello World" to 33 bits

Better than 33 bits – how?

How can we compress text?

Huffman

    Why we need compression techniques

    The storage capacity of computers is growing at an unbelievable rate—in the last 25 years, the amount of storage provided on a typical computer has grown about a millionfold—but we still find more to put into our computers. Computers can store whole books or even libraries, and now music and movies too, if only they have the room. Large files are also a problem on the Internet, because they take a long time to download. We also try to make computers smaller—even a cellphone or wristwatch can be expected to store lots of information!

    Text Compression Huffman

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not a prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counter-example. Let there be four characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
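A compact, illustrative Huffman coding sketch in Python (standard library only) that builds the tree with a heap, assigns prefix codes and compares the result with a fixed 3-bit code for "Hello World":

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {character: bit-string} prefix code for the given text."""
    freq = Counter(text)
    # Each heap entry: (frequency, unique tie-breaker, {char: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    if len(heap) == 1:                       # edge case: one distinct character
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        counter += 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
    return heap[0][2]

text = "Hello World"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)                                  # frequent letters get the shortest codes
print(len(encoded), "bits vs", 3 * len(text), "bits with a fixed 3-bit code")
```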

    Question Section

Investigate Lempel-Ziv coding; you just need a high-level understanding of how it works.

Update your blog with responses to the following:

State 2 text, 2 graphic and 2 video compression techniques.

Describe one way that a compressed file may be decompressed.

Can lossy compression be used on text files? Explain your answer.

Explain how compression of data may lead to negative consequences. [3]

Also explain the importance of compression now and in the future.

    HL C.5 Analyzing the web

Reference for this section: note that the definition of a "tube" conflicts between sources; a simpler, more general definition can be found in the second linked paper.


    Past Paper Question

    C.4.7 Explain why the web may be creating unregulated monopolies

In theory the World Wide Web should be a free place where anybody can have a website. However, hosting a website usually comes with a cost: registering a domain name, getting a hosting service or investing in servers oneself, and creating and maintaining the website (which requires technical knowledge or the cost of hiring a web developer). In addition, to reach an audience, further marketing through SEO (see C.2) is usually necessary to get good rankings in search engine results. This means that for the normal individual a traditional website is not the best option. A better alternative is to publish content on an existing platform, e.g. micro-blogging on Twitter, blogging on WordPress or Blogspot, sharing social updates on Facebook, sharing photos on Flickr, etc. This comes with improved comfort for users.

    However, it easily leads to unregulated monopolies in the market because users usually stick to one platform. Tim Berners-Lee describes today’s social networks as centralized silos, which hold all user information in one place. This can be a problem, as such monopolies usually control a large quantity of personal information which could be misused commercially or stolen by hackers. There are certainly many more concerns which won’t fit into the scope of this site.

    C.4.8 Decentralized and democratic web

    A Decentralized Web is free of corporate or government overlords. It is to communication what local farming is to food. With it people can grow their own information

Eric Newton, Innovation Chief, Cronkite News, Arizona State University

    Resources


    https://barefootcas.org.uk/wp-content/uploads/2015/02/KS2-Search-Results-Selection-Activity-Barefoot-Computing.pdf

    https://www.hpe.com/us/en/insights/articles/how-search-worked-before-google-1703.html

    Further Reading 

    http://www.ftsm.ukm.my/ss/Book/EVOLUTION%20OF%20WWW.pdf

    https://eprints.soton.ac.uk/272374/1/evolvingwebfinal.pdf

    http://dig.csail.mit.edu/2007/Papers/AIMagazine/fractal-paper.pdf