Non-Tech: “WWW” is short for World Wide Web, and we use browsers such as Google Chrome and Firefox to access its information online. It’s basically software that allows us to connect to other people around the world.
The Internet is the global network of networks of computers: a collection of computers, cables and wireless connections governed by the Internet Protocol (IP), which deals with data and packets. The World Wide Web, also known as the Web, is one set of software running on the Internet: a collection of webpages, files and folders connected through hyperlinks and URLs. The Internet is the hardware part and the Web is the software part; therefore the Web relies on the Internet to run, but not vice versa. In addition to the WWW, other examples of services running on the Internet include VoIP and email, which have their own protocols.
The Internet: a network of networks (the underlying network infrastructure).
Many people label the WWW and the internet as the same thing. However, the internet connects many different computers together, giving people the ability to exchange data with one another, such as news, pictures, or even videos.
1. Internet: the hardware or operator. 2. WWW (World Wide Web): like the operating system running on it. The difference between the Internet and the WWW is that without the Internet there won't be a WWW; the WWW needs the Internet to operate.
Q) Distinguish between the internet and the World Wide Web (web). (“Distinguish”: make clear the difference between two or more items/concepts.)
A student in the United Kingdom is viewing a page from a newspaper’s website based in South Africa. (a) Using this example, distinguish between the internet and the World Wide Web.
Many newspapers now host an internet version through which users can read the various news stories.
(b) Identify two other electronic ways in which newspapers provide information through the use of the technology brought about by the evolving web. 
The World Wide Web started around 1990/91 as a system of servers connected over the internet that deliver static documents, formatted as Hypertext Markup Language (HTML) files, which support links to other documents as well as multimedia such as graphics, video or audio. In the early days of the web, these documents consisted mainly of static information and text; multimedia were added later. Some experts describe this as a “read-only web”, because users mostly searched and read information, while there was little user interaction or content contribution.
The internet, and thus the World Wide Web, is constantly developing and evolving in new directions, and while the changes described for Web 2.0 are clear to us today, the definition of Web 3.0 is not definitive yet. Continuing the read to read-write description from earlier, it might be argued that Web 3.0 will be the “read-write-execute” web. One interpretation of this is that the web enables software agents to work with documents by using semantic markup. This allows for smarter searches and the presentation of relevant data fitting the context. This is why Web 3.0 is sometimes called the semantic executive web.
But what does this mean?
It’s about user input becoming more meaningful, more semantic: users add tags or other kinds of metadata to their documents, which allow software agents to work with the input, e.g. to make it more searchable. The idea is to be able to better connect information that is semantically related.
However, it might also be argued that Web 3.0 is what some people call the Internet of Things, which is basically connecting everyday devices to the internet to make them smarter. In some ways this also fits the read-write-execute model, as it allows the user to control a real-life action on a device over the internet. Either way, the web keeps evolving, and the following image provides a good overview and an idea of where the web is heading.
URIs are a standard for identifying documents using a short string of numbers, letters, and symbols. They are defined by RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax. URLs, URNs, and URCs are all types of URI.
/other/link.html (a relative URL, only useful in the context of another URL)
URLs always start with a protocol (e.g. http) and usually contain information such as the network host name (example.com) and often a document path (/foo/mypage.html). URLs may have query parameters and fragment identifiers.
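To make these parts concrete, here is a minimal sketch in Python using the standard library's urllib.parse; the example URL is invented for illustration.

```python
from urllib.parse import urlparse

# An invented URL showing protocol, host name, path, query parameters and fragment.
url = "http://example.com/foo/mypage.html?section=news&page=2#headline"

parts = urlparse(url)
print(parts.scheme)    # 'http'               -> the protocol
print(parts.netloc)    # 'example.com'        -> the network host name
print(parts.path)      # '/foo/mypage.html'   -> the document path
print(parts.query)     # 'section=news&page=2' -> query parameters
print(parts.fragment)  # 'headline'           -> the fragment identifier
```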
Identifies a resource by a unique and persistent name, but doesn't necessarily tell you how to locate it on the internet. It usually starts with the prefix urn:. For example:
urn:isbn:0451450523 identifies a book by its ISBN number.
urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66 is a globally unique identifier.
urn:publishing:book is an XML namespace that identifies the document as a type of book.
URNs can identify ideas and concepts. They are not restricted to identifying documents. When a URN does represent a document, it can be translated into a URL by a "resolver". The document can then be downloaded from the URL.
The uniform resource locator (URL) is used to identify a specific internet resource. An example from an international newspaper is
(c) By using the URL example given above, identify three characteristics of a URL. 
A web page can contain a variety of components. The basic structure of an HTML document is:
Head: This is not visible on the page itself, but contains important information about it in the form of metadata.
Title: The title goes inside the head and is usually displayed at the top of the web browser window.
Meta tags: There are various types of meta tags, which can give search engines information about the page, but are also used for other purposes, such as specifying the charset used.
Body: The main part of the page document. This is where all the (visible) content goes.
Some other typical components:
Usually a collection of links that helps to navigate the website, placed at the top of the page or behind a hamburger menu on mobile.
A hyperlink is a reference to another web page.
Might be contained in a sidebar and is used for navigation and orientation within the website.
Area at the top of a web page linking to other big topic areas.
Usually used for a table of contents or navigation bar.
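As a small illustration of how these components look to software, the sketch below uses Python's standard html.parser to pull the title, the meta description and the hyperlinks out of an invented HTML snippet; this is roughly the kind of information a search engine collects from a page.

```python
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Collects the title, meta description and hyperlinks of a page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# An invented document with a head (title, meta tag) and a body with a hyperlink.
page = """<html><head><title>Example page</title>
<meta name="description" content="A short demo page"></head>
<body><nav><a href="/other/link.html">Other page</a></nav></body></html>"""

scanner = PageScanner()
scanner.feed(page)
print(scanner.title, scanner.description, scanner.links)
```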
A search engine is a program that allows a user to search for information, normally on the web.
The surface web is the part of the web that can be reached by a search engine. For this, pages need to be static and fixed, so that they can be reached through links from other sites on the surface web. They also need to be accessible without special configuration. Examples include Google, Facebook, Youtube, etc.
The deep web is the part of the web that is not searchable by normal search engines. Reasons for this include proprietary content that requires authentication or VPN access, e.g. private social media, emails; commercial content that is protected by paywalls, e.g. online newspapers, academic research databases; personal information that is protected, e.g. bank information, health records; and dynamic content. Dynamic content is usually the result of some query, where data are fetched from a database.
The best-known search algorithms are PageRank and the HITS algorithm, but it is important to know that most search engines take various other factors into account as well (some are mentioned below).
For the following description the terms “inlinks” and “outlinks” are used. Inlinks are links that point to the page in question, i.e. if page W has an inlink, there is a page Z containing the URL of page W. Outlinks are links that point from the page in question to a different page, i.e. if page W has an outlink, it contains the URL of another page, e.g. page Z.
PageRank works by counting the number and quality of inlinks of a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
As mentioned, it is important to note that there are many other factors considered. For instance, the anchor text of a link is often far more important than its PageRank score.
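To illustrate the idea, here is a minimal sketch of the iterative PageRank calculation on an invented four-page link graph; the damping factor of 0.85 and the fixed number of iterations are assumptions for illustration, not Google's actual implementation.

```python
# Invented link graph: page -> list of pages it links to (its outlinks).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)    # rank flows along the outlinks
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Pages with more (and better-ranked) inlinks end up with a higher score.
print(pagerank(links))
```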
HITS is based on the idea that keywords are not all that matters: there are sites that might be more relevant even if they don’t contain the most keywords. It introduces the idea of two different types of pages, authorities and hubs.
Authorities: A page is called an authority if it contains valuable information and if it is truly relevant for the search query. It is assumed that such a page has a high number of inlinks.
Hubs: These are pages that are relevant for finding authorities. They contain useful links towards them. It is therefore assumed that these pages have a high number of outlinks.
The algorithm is based on mathematical graph theory, where a page is represented by a vertex and links between pages are represented by directed edges.
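The following sketch shows the basic HITS iteration on an invented link graph: authority scores are built from the hub scores of the pages linking in, and hub scores from the authority scores of the pages linked out to. The normalisation and the fixed iteration count are simplifications for illustration.

```python
# Invented link graph: page -> pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["B", "C"],
}

def hits(links, iterations=20):
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to it (inlinks).
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub score: sum of the authority scores of the pages it links to (outlinks).
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalise so the scores do not grow without bound.
        a_total = sum(auth.values()) or 1
        h_total = sum(hub.values()) or 1
        auth = {p: s / a_total for p, s in auth.items()}
        hub = {p: s / h_total for p, s in hub.items()}
    return hub, auth

hub, auth = hits(links)
print("hubs:", hub)            # A and D link to good authorities
print("authorities:", auth)    # C receives the most inlinks
```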
A web crawler, also known as a web spider, web robot or simply bot, is a program that browses the web in a methodical and automated manner. For each page it finds, a copy is downloaded and indexed. In this process it extracts all links from the given page and then repeats the same process for all found links. This way, it tries to find as many pages as possible.
Stop bots from using up bandwidth.
Save bandwidth: crawlers spend less time crawling the site.
Issue: A crawler consumes resources and a site might not wish to be “crawled”. For this reason “robots.txt” files were created, in which a site states what should be indexed and what shouldn’t.
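A small sketch of how a polite crawler might honour robots.txt, using Python's standard urllib.robotparser; the robots.txt content and the URLs are invented.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: this site asks all crawlers to stay out of /private/.
robots_txt = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks before fetching a page.
print(parser.can_fetch("MyCrawler", "http://example.com/index.html"))      # True
print(parser.can_fetch("MyCrawler", "http://example.com/private/x.html"))  # False
```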
Students should be aware that this is not always a transitive relationship.
In the past the meta keyword tag could be spammed full of keywords, sometimes not even relevant to the content on the page. This tag is now mostly ignored by search engines. The meta description can sometimes be shown in the results, but is not a factor in actual ranking.
Robots meta tag: This is very important and can be used to disallow crawlers from crawling the page; you can target all crawlers or list the specific ones that you do not wish to be crawled by.
The answer depends on the particular crawler, but generally speaking:
A crawler is a program that downloads and stores Web pages, often for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache. As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines run multiple crawling processes in parallel, so that the download rate is maximized. (Reference: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8408&rep=rep1&type=pdf)
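A very simplified sketch of that queue-based process in Python: single-threaded, with naive regular-expression link extraction and no politeness delays or robots.txt handling, so it only shows the queue idea; the seed URL is hypothetical.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

def crawl(seed_urls, max_pages=20):
    """Breadth-first crawl: fetch a page, extract its links, queue them."""
    queue = deque(seed_urls)        # the initial set of URLs (S0)
    visited = set()
    pages = {}                      # url -> downloaded HTML, kept for later indexing
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                # skip pages that fail to download
        pages[url] = html
        # Naive link extraction; a real crawler would use a proper HTML parser.
        for href in re.findall(r'href="([^"]+)"', html):
            queue.append(urljoin(url, href))
    return pages

# Example usage (hypothetical seed URL):
# pages = crawl(["http://example.com/"])
```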
According to a study released in October 2000, the directly accessible "surface web" consists of about 2.5 billion pages, while the "deep web" (dynamically generated web pages) consists of about 550 billion pages, 95% of which are publicly accessible [LVDSS00].
By comparison, the Google index released in June 2000 contained 560 million full-text-indexed pages [Goo00]. In other words, Google — which, according to a recent measurement [HHMN00], has the greatest coverage of all search engines — covers only about 0.1% of the publicly accessible web, and the other major search engines do even worse.
Increasing the coverage of existing search engines by three orders of magnitude would pose a number of technical challenges, both with respect to their ability to discover, download, and index web pages, and with respect to their ability to serve queries against an index of that size. (For query engines based on inverted lists, the cost of serving a query is linear in the size of the index.) Therefore, search engines should attempt to download the best pages and include (only) them in their index.
Mercator is an extensible, multithreaded, high-performance web crawler [HN99, Mer00]. It is written in Java and is highly configurable. Its default download strategy is to perform a breadth-first search of the web, with the following three modifications:
Further Reading Click Here
Search engines index websites in order to respond to search queries with relevant information as quickly as possible. For this reason, a search engine stores information about indexed web pages, e.g. keywords, titles or descriptions, in its database. This way search engines can quickly identify pages relevant to a search query.
Indexing has the additional purpose of giving a page a certain weight, as described in the search algorithms. This way search results can be ranked after being indexed.
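A toy sketch of the core data structure behind this, an inverted index that maps keywords to the pages containing them; the pages and their text are invented.

```python
from collections import defaultdict

# Invented downloaded pages: url -> page text.
pages = {
    "http://example.com/a": "world wide web history",
    "http://example.com/b": "search engines index the web",
    "http://example.com/c": "web crawlers download pages",
}

# Build the inverted index: keyword -> set of URLs that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

# Answering a query is now a fast lookup instead of scanning every page.
print(index["web"])        # all three pages
print(index["crawlers"])   # only page c
```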
Naturally an overlap exists with what the website developer should do to get the site high in the SERPs (search engine results pages).
Relevancy: does your site provide the information the user is searching for? The user experience (UX) is becoming a big part of ranking, as it cannot easily be manipulated, and in the future it will play a much bigger role. UX is measured through signals such as the time a user stays on the site and the bounce rate. Many factors play a role in the user experience: load speed; easy navigation (no broken links); spelling; quality and factually correct content; a structured layout; use of images, video and infographics; and page design (colours, images, formatting) that makes it easy to scan the page for relevant information. The idea is to get the user to stay on your site ("sticky"). If a user lands on your site after doing a search and leaves after a few seconds, or even before the page loads (slow loading), that is a very big signal to Google that it should not have served up that result.
Backlinks from other websites: the more authoritative the site, the better (for example, the Huffington Post). The site that links to your site should also be relevant. For example, if you are selling dog insurance, a link from a respected charitable dog website would be a very big boost, while a link from a site that provides car rental would have little impact, as it is totally irrelevant.
Social media marketing (Facebook etc.): be a leader in your field and comment on relevant authoritative forums or blogs. Other users sharing your content via social bookmarking sites also helps.
This is an area in which you can manipulate the search results. If Google discovers this, your website will be dropped from the index, so you need to ensure any links are natural, e.g. links to an authoritative article or infographic on your website.
The process of making pages appear more prominently in search engine results is called SEO. There are many different techniques, considered in section C.2.11. This field is a big aspect of web marketing, as search engines do not disclose how exactly they work, making it hard for developers to perfectly optimise pages.
In order to rank a website Google uses many, many metrics; below are a few of the important ones.
These are only a fraction of the factors that Google uses; more recently, a very slight boost has been given to sites that use HTTPS.
The search engine must serve up results that are relevant to what users search for. Google used PageRank; prior to that, search engines just used title tags and keyword tags. These could be easily manipulated (stuffed with the keywords that you wish to rank for) to get your site on page 1. Google devised the PageRank algorithm, which played a big part in its overall search algorithm.
Definition: Black hat SEO is, in simple words, a set of techniques for getting top positions or higher rankings in major search engines like Google, Yahoo and Bing by breaking the rules and regulations of the search engines' guidelines. See Google's guidelines for an example: click here.
This worked at one time. You still need the keywords/search terms in your title and page content, but you need to ensure that you do not overuse the keywords/phrases, as that will trip a search engine filter.
Google (currently) favors older sites, sites with history. In this approach (the idea behind private blog networks, PBNs) you buy an expired domain with good metrics, build it up and add links to your sites, giving a boost in ranking. This works, but it is costly to set up and you need to use aliases etc.
Similar to PBNs, the aim is to get good quality links from high-authority sites. Have a look at Fiverr, where you can buy such links. This is difficult for Google to detect and it is also very effective.
Rather than creating good quality content, content is copied from other sites and may be changed using automated techniques. Google is now much better at detecting this; please refer to the Panda update.
The anchor text tells Google what your site is about, for example "fleet insurance", but if you overuse it or your backlinks look unnatural you will be penalized; please refer to the Penguin update. Before Penguin this was very effective in getting ranked.
Build a website on Tumblr, for example, for the sole purpose of sending links to your money site.
The process of writing a blog post for someone else’s blog is called guest blogging.
Create an amazing article or infographic that other sites may use; if you include a link to your site in the article, you get more backlinks as a result (natural acquisition of backlinks, as opposed to paid).
Search engines evaluate the content of a web page, so a page with more useful information might get a higher ranking. This makes it more valuable in the index, and other web pages might link to your page if its content is of a high standard.
Good menu navigation; proper use of title tags and header tags; adding images with keyword alt tags; interlinking, again with keyword anchor text. Create a sitemap to get the site crawled and to inform the spiders how often to visit the site.
This is a broad term and overlaps with some other areas mentioned, for example page load speed. The purpose is to ensure that if a user clicks through to your site they stay, without clicking back to the SERPs immediately. Google is happy, as this is a quality signal and its main purpose is to provide the user with relevant results.
Fast-loading pages give the user a good experience; aim for under 3 seconds.
Provide fresh content on a regular basis.
Google, like the other search engines, is continually fighting black hat techniques that webmasters employ to rank high in the SERPs. Investigate these two major algorithm updates: Panda and Penguin. A good example of a current black hat practice is the use of PBNs.
Students to investigate PBNs, Panda and Penguin: a quick discussion on these, what Google was targeting, and how PBNs are currently being used effectively to rank sites higher (if caught, you will wake up one morning and your website(s) will have been de-indexed from Google).
Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.
It’s worth noting that this comparison is not about two opposites: the two concepts do not exclude each other and can complement one another.
An application is split into a client side and a server side. The server can be a central communicator between clients (e.g. an email or chat server) or allow different clients to access and manipulate data in a database. A client-server application does not necessarily need to work over the internet; it could be limited to a local network, e.g. for enterprise applications.
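A minimal sketch of this split using Python's standard socket module: a tiny server that echoes back what a client sends. The localhost address and port 5000 are arbitrary choices for illustration, and server and client would normally run in separate processes (possibly on different machines on a local network).

```python
import socket

def run_server(host="127.0.0.1", port=5000):
    """Server side: waits for a client connection and echoes the message back."""
    with socket.socket() as server:
        server.bind((host, port))
        server.listen()
        conn, _addr = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"echo: " + data)

def run_client(host="127.0.0.1", port=5000):
    """Client side: connects to the server and sends a request."""
    with socket.socket() as client:
        client.connect((host, port))
        client.sendall(b"hello server")
        print(client.recv(1024))   # b'echo: hello server'

# Run run_server() in one process and run_client() in another.
```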
Cloud computing still relies on the client-server architecture, but puts the focus on sharing computing resources over the internet. Cloud applications are often offered as a service to individuals and companies - this way companies don’t have to build and maintain their own computing infrastructure in house. Benefits of cloud computing include:
Azure is Microsoft's cloud service; the other major one is Amazon's (AWS). Click here and watch the intro video.
To include public and private clouds
*** Creates an environment conducive to innovative startups and thus the potential for disruptive innovation
In a private cloud model a company owns the data centers that deliver the services to internal users only.
In a public cloud services are provided by a third party and are usually available to the general public over the Internet.
The idea of a hybrid cloud is to use the best of both private and public clouds by combining both. Sensitive and critical applications run in a private cloud, while the public cloud is used for applications that require high scalability on demand. As TechTarget explains, the goal of a hybrid cloud is to “create a unified, automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide while still
Data lock-in: if you wish to change provider, what format will your data be in? It could be very expensive to convert to a new data format.
The company could go bust, with all your data.
data confidentiality and auditability ( security)
data transfer bottlenecks
bugs in large-scale distributed systems
Students should be aware of developments in mobile technology that have facilitated the growth of distributed networks.
What’s the original message?
Below is an encoded message. It’s not necessarily a secret message but it does need to be decoded. Study the clues and key to reconstruct the original message.
The storage capacity of computers is growing at an unbelievable rate—in the last 25 years, the amount of storage provided on a typical computer has grown about a millionfold—but we still find more to put into our computers. Computers can store whole books or even libraries, and now music and movies too, if only they have the room. Large files are also a problem on the Internet, because they take a long time to download. We also try to make computers smaller—even a cellphone or wristwatch can be expected to store lots of information!
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
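As a sketch of how this works in practice, the following Python code builds a Huffman code table with the standard heapq module; the sample text is made up, and a full compressor would also need to store the code table (or the tree) alongside the encoded bits.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table (character -> bit string) for the text."""
    freq = Counter(text)
    # Each heap entry: (frequency, tie-breaker, {character: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Prepend 0 to codes in the left subtree and 1 to codes in the right subtree.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

text = "this is an example of huffman coding"   # made-up sample text
codes = huffman_codes(text)
print(codes)                                     # frequent characters get shorter codes
encoded = "".join(codes[ch] for ch in text)
print(len(encoded), "bits, versus", len(text) * 8, "bits uncompressed")
```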
Investigate Lempel-Ziv coding; you just need a high-level understanding of how it works.
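As a starting point, here is a tiny sketch of LZW, one member of the Lempel-Ziv family: it builds a dictionary of phrases it has already seen and outputs dictionary codes instead of repeating the characters. The input string is just a classic demo string, and real implementations pack the codes into bits rather than keeping them as a Python list.

```python
def lzw_compress(text):
    """Very small LZW compressor: returns a list of dictionary codes."""
    # Start with a dictionary containing every single character in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    current = ""
    output = []
    for ch in text:
        if current + ch in dictionary:
            current += ch                       # keep growing the known phrase
        else:
            output.append(dictionary[current])  # emit the code for the known phrase
            dictionary[current + ch] = len(dictionary)  # learn the new, longer phrase
            current = ch
    if current:
        output.append(dictionary[current])
    return output

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(codes)   # repeated phrases such as "TOBEOR" are replaced by single codes
print(len(codes), "codes for", len("TOBEORNOTTOBEORTOBEORNOT"), "characters")
```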
Update your Blog with responses to the following :
State two text, two graphic and two video compression techniques.
Describe one way that a compressed file may be decompressed.
Can lossy compression be used on text files? Explain your answer.
Explain how compression of data may lead to negative consequences. 
Also explain the importance of compression now and in the future.
Past Paper Question