C.1 Creating the web
C.1.1 Distinguish between the Internet and World Wide Web
World Wide Web (www):
The World Wide Web, also known as "www", is a part of the internet, accessed through web browsers to share information across the globe via hyperlinks.
Internet is a network of networks. World Wide Web is a system, a means of accessing information on the Internet.
The WWW makes use of the hypertext transfer protocol (HTTP) to access this information.
The Internet is the global network of networks of computers: computers, cables and wireless connections governed by the Internet Protocol (IP), which deals with data and packets. The World Wide Web, also known as the Web, is one set of software services running on the Internet: a collection of web pages, files and folders connected through hyperlinks and URLs. Put simply, the Internet is the hardware part and the Web is the software part, so the Web relies on the Internet to run, but not vice versa. In addition to the WWW, other examples of services that run on the Internet with their own protocols include VoIP and email.
C.1.2 Describe how the web is constantly evolving
C.1.3 Identify the characteristics of the following: HTTP, HTTPS, HTML, URL, XML, XSLT, JavaScript, CSS
C.1.4 Identify the characteristics of the following: URI, URL
uniform resource identifier (URI)
URL.
The power of a link in the Web is that it can point to any document (or, more generally, resource) of any kind in the universe of information. This requires a global space of identifiers. These Universal Resource Identifiers are the primary element of Web architecture.
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. Basically it is a term for an identifier of a resource. It identifies a ‘thing’, but does not give any information about where to find it.
A URI can be further classified as a locator, a name, or both. The term “Uniform Resource Locator” (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network “location”)
A URL is the global address of documents and other resources on the World Wide Web.
Difference: a URI only identifies the network resource; a URL also helps locate that resource and defines the mechanism for retrieving it over the web.
Note: network resources are files that can be plain Web pages, other text documents, graphics, or programs.
The first part of a URL is called the protocol identifier and indicates to the browser which protocol to use. The second part is called the resource name and it specifies the IP address or domain name of where the resource is located on the World Wide Web.
C.1.5 Describe the purpose of a URL
From above, a URL is the global address of documents and other resources on the World Wide Web. The first part of a URL is called the protocol identifier and indicates to the browser which protocol to use. The second part is called the resource name and it specifies the IP address or domain name of where the resource is located.
The purpose of a URL is to tell the server which web page to display or search for.
The URL contains the name of the protocol to be used to access a file resource
A URL (Uniform Resource Locator), as the name suggests, provides a way to locate a resource on the web
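As a purely illustrative breakdown (the URL below is made up), a URL splits into the following parts:
https://www.example.com:443/products/index.html?id=7#reviews
- https - the protocol identifier, telling the browser which protocol to use (here HTTP over TLS)
- www.example.com - the resource name / domain name, resolved to an IP address by DNS
- 443 - the port number (optional; 443 is the default for HTTPS)
- /products/index.html - the path to the resource on the server
- id=7 - the query string passed to the server
- reviews - the fragment, used by the browser to jump to a part of the page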
C.1.6 Describe how a domain name server functions
C.1.7 Identify the characteristics of: IP, TCP and FTP
C.1.8 Outline the different components of a web page.
C.1.9 Explain the importance of protocols and standards on the web.
C.1.10 Describe the different types of web page
C.1.11 Explain the differences between a static web page and a dynamic web page
C.1.12 Explain the functions of a browser
C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages
Client-side Environment
The client-side environment used to run scripts is usually a browser. The processing takes place on the end user's computer. The source code is transferred from the web server to the user's computer over the internet and run directly in the browser.
The scripting language needs to be enabled on the client computer. Sometimes, if a user is conscious of security risks, they may switch the scripting facility off. When this is the case a message usually pops up to alert the user when a script is attempting to run.
Server-side Environment
The server-side environment that runs a scripting language is a web server. A user's request is fulfilled by running a script directly on the web server to generate dynamic HTML pages. This HTML is then sent to the client browser. It is usually used to provide interactive web sites that interface to databases or other data stores on the server.
This is different from client-side scripting where scripts are run by the viewing web browser, usually in JavaScript. The primary advantage to server-side scripting is the ability to highly customize the response based on the user's requirements, access rights, or queries into data stores.
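As a minimal hedged sketch of the server-side case: the PHP below runs on the web server when the page is requested, and only the generated HTML is sent to the browser, so the viewer never sees the script's source.
<?php
// Executed on the web server for each request.
// The browser only receives the HTML produced by echo.
echo "<p>Page generated at " . date('H:i:s') . " (server time).</p>";
?>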
C.1.14 Describe how web pages can be connected to underlying data sources
HTML is a markup language: basically a set of tags like <html> and <body>, which are used, together with CSS and JavaScript, to present a website. All of this happens on the client's system, i.e. the computer of the user browsing the website.
Connecting to a database, however, happens at another level entirely: on the server, which is where the website is hosted.
So, in order to connect to the database and perform data-related actions, you have to use server-side scripts written in languages such as PHP, JSP or ASP.NET.
Now, let's look at a snippet that opens a connection using the MySQLi extension of PHP:
$db = mysqli_connect('hostname', 'username', 'password', 'databasename');
This single line of code is enough to get you started. You can mix such code with HTML tags to create an HTML page that shows data from the database. For example:
<?php
// Open the database connection (the credentials here are placeholders).
$db = mysqli_connect('hostname', 'username', 'password', 'databasename');
?>
<html>
<body>
<?php
// Fetch every row from the (placeholder) table and loop over the results.
$query  = "SELECT * FROM `mytable`;";
$result = mysqli_query($db, $query);
while ($row = mysqli_fetch_assoc($result)) {
    // Display each row's data on the page here, e.g. by echoing its columns.
}
mysqli_close($db);   // close the connection when finished
?>
</body>
</html>
C.1.15 Describe the function of the common gateway interface (CGI)
CGI is a method used to exchange data between the server and the web browser. CGI is a set of standards where a program or script can send data back to the web server where it can be processed.
Common Gateway Interface (CGI) offers a standard protocol for web servers to execute programs that execute like Console applications (also called Command-line interface programs) running on a server that generates web pages dynamically. Such programs are known as CGI scripts or simply as CGIs. The specifics of how the script is executed by the server are determined by the server. In the common case, a CGI script executes at the time a request is made and generates HTML.
CGI is the part of the Web server that can communicate with other programs running on the server. With CGI, the Web server can call up a program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied using HTML form syntax). The program then processes that data and the server passes the program's response back to the Web browser.
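As a hedged illustration of what a CGI program does (not a production script): the web server passes request data via environment variables such as QUERY_STRING, runs the program, and relays whatever the program writes to standard output - an HTTP header block, a blank line, then the body - back to the browser. Written as a plain PHP script it might look like this:
<?php
// Read the query string the web server would pass to a CGI program,
// e.g. "name=Ada" (the parameter name here is just an example).
$query = getenv('QUERY_STRING') ?: '';
parse_str($query, $params);
$name = $params['name'] ?? 'world';

// A CGI program writes headers, a blank line, then the HTML body to stdout.
echo "Content-Type: text/html\r\n\r\n";
echo "<html><body><p>Hello, " . htmlspecialchars($name) . "!</p></body></html>";
?>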
C.1.16 Evaluate the structure of different types of web pages (examples seen in past paper include blogs, forums, etc.)
Past paper Questions
Describe how the web is constantly evolving
C.2 Searching the Web
C.2.1 Define the term search engine
A search engine is a program that allows a user to search for information normally on the web.
A search engine is accessed through a browser on the user’s computer.
The list of contents/results returned to the user is known as search engine results page (SERP)
C.2.2 Distinguish between the surface web and the deep web
Surface Web
The surface web is the part of the web that can be reached by a search engine. For this, pages need to be static and fixed, so that they can be reached through links from other sites on the surface web. They also need to be accessible without special configuration. Examples include Google, Facebook, Youtube, etc.
Deep web
The deep web is the part of the web that is not searchable by normal search engines. Reasons for this include: proprietary content that requires authentication or VPN access, e.g. private social media or emails; commercial content that is protected by paywalls (paid subscriptions), e.g. online newspapers and academic research databases; personal information that is protected, e.g. bank information and health records; and dynamic content. Dynamic content is usually the result of some query, where data are fetched from a database.
- Dynamically generated pages, e.g. through queries, JavaScript, AJAX, Flash
- Password protected pages, e.g. emails, private social media
- Paywalls, e.g. online newspapers, academic research databases
- Personal information, e.g. health records
- Pages without any incoming links
C.2.3 Outline the principles of searching algorithms used by search engines
The best-known search algorithms are PageRank and the HITS algorithm, but it is important to know that most search engines take various other ranking factors into account as well.
For the following description the terms "inlinks" and "outlinks" are used. Inlinks are links that point to the page in question, i.e. if page W has an inlink, there is a page Z containing the URL of page W. Outlinks are links that point to a different page than the one in question, i.e. if page W has an outlink, it is a URL of another page, e.g. page Z.
PageRank algorithm
PageRank works by counting the number and quality of inlinks of a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
As mentioned, it is important to note that many other factors are considered. For instance, the anchor text of a link is often far more important than its PageRank score. A minimal iteration sketch follows the points below.
- Pages are given a score (rank)
- Rank determines the order in which pages appear
- Incoming links add value to a page
- The importance of an inlink depends on the PageRank (score) of the linking page / page authority
- PageRank counts links per page and determines which pages are most important
- Links from sites that are relevant carry more weight than links from unrelated sites.
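The following is a minimal sketch of the PageRank iteration on a tiny made-up link graph (damping factor 0.85); real search engines work on vastly larger graphs and combine the score with many other factors.
<?php
// $links maps each page to the pages it links out to (hypothetical graph).
$links = [
    'A' => ['B', 'C'],
    'B' => ['C'],
    'C' => ['A'],
    'D' => ['C'],
];

$pages = array_keys($links);
$n     = count($pages);
$d     = 0.85;                              // damping factor
$rank  = array_fill_keys($pages, 1 / $n);   // start with equal ranks

for ($i = 0; $i < 20; $i++) {               // a fixed number of iterations
    $next = array_fill_keys($pages, (1 - $d) / $n);
    foreach ($links as $page => $outlinks) {
        $share = $rank[$page] / count($outlinks);  // a page's rank is split across its outlinks
        foreach ($outlinks as $target) {
            $next[$target] += $d * $share;         // incoming links add value to a page
        }
    }
    $rank = $next;
}

arsort($rank);      // highest-ranked pages first
print_r($rank);
?>
Here page C ends up with the highest rank because it has the most (and best-ranked) inlinks, which matches the bullet points above.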
HITS algorithm
The HITS algorithm is based on the idea that keywords are not all that matters: a site might be more relevant even if it does not contain the most keywords. It introduces the idea of two different types of pages: authorities and hubs.
Authorities: A page is called an authority, if it contains valuable information and if it is truly relevant for the search query. It is assumed that such a page has a high number of in-links.
Hubs: These are pages that are relevant for finding authorities. They contain useful links towards them. It is therefore assumed that these pages have a high number of out-links.
The algorithm is based on mathematical graph theory, where a page is represented by a vertex and links between pages are represented by directed edges.
C.2.4 Describe how a web-crawler functions
A web crawler, also known as a web spider, web robot or simply bot, is a program that browses the web in a methodical and automated manner. For each page it finds, a copy is downloaded and indexed. In this process it extracts all links from the given page and then repeats the same process for all found links. This way, it tries to find as many pages as possible.
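As a very small, hedged sketch of this loop (the seed URL is hypothetical, and real crawlers add politeness rules, robots.txt checks and proper HTML parsing):
<?php
// Tiny crawler sketch: take a URL from the queue, download the page,
// extract its outlinks and queue any URLs not seen before.
$queue   = ['http://example.com/'];        // hypothetical seed URL
$visited = [];

while ($queue && count($visited) < 10) {   // small limit for the sketch
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;
    }
    $visited[$url] = true;

    $html = @file_get_contents($url);      // download a copy of the page
    if ($html === false) {
        continue;
    }
    // A real crawler would now index/store the downloaded copy.

    preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches);
    foreach ($matches[1] as $link) {       // repeat the process for all found links
        if (!isset($visited[$link])) {
            $queue[] = $link;
        }
    }
}

print_r(array_keys($visited));
?>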
Limitations:
- They might look at metadata contained in the head of web pages, but this depends on the crawler
- A crawler might not be able to read pages with dynamic content as they are very simple programs
Robots.txt
Stops bots from using up bandwidth and reduces the time crawlers spend on the site.
Issue: A crawler consumes resources and a page might not wish to be “crawled”. For this reason “robots.txt” files were created, where a page states what should be indexed and what shouldn’t.
- A file that contains components to specify pages on a website that must not be crawled by search engine bots
- File is placed in root directory of the site
- The standard for robots.txt is called “Robots Exclusion Protocol”
- Can be specific to a special web crawler, or apply to all crawlers
- Not all bots follow this standard (malicious bots, malware) -> “illegal” bots can ignore robots.txt
- Still considered to be better to include a robots.txt instead of leaving it out
- It keeps the bots away from the less "noteworthy" content of a website, so more time is spent indexing the important/relevant content of the website (an illustrative robots.txt is shown below)
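As an illustrative example (the paths and crawler name are made up), a simple robots.txt placed in the site's root directory might look like this:
User-agent: *
Disallow: /admin/
Disallow: /search-results/

User-agent: BadBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
The first block applies to all crawlers and keeps them out of two directories; the second blocks one named crawler from the whole site; the Sitemap line points well-behaved bots at the content that should be indexed.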
C.2.5 Discuss the relationship between data in a meta tag and how it is accessed by a web-crawler
Students should be aware that this is not always a transitive relationship.
In the past the meta keywords tag could be spammed full of keywords, sometimes not even relevant to the content on the page. This tag is now mostly ignored by search engines. The meta description can sometimes be shown in the results, but is not a factor in actual ranking.
Robots meta tag
This is super important and can be used to disallow crawlers from indexing the page; you can target all crawlers or list only the ones that you do not wish to be crawled by.
Answer depends on different crawlers, but generally speaking:
- The title tag, not strictly a meta-tag, is what is shown in the results, through the indexer
- The description meta-tag provides the indexer with a short description of the page and this can also be displayed in the SERPs
- The keywords meta-tag provides, well, keywords about your page (an illustrative head section is shown below)
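As an illustrative head section (the title, description and keywords are made up), this is where those tags sit and what a crawler's indexer would read:
<head>
  <title>Dog Insurance Quotes | Example Pet Cover</title>
  <meta name="description" content="Compare dog insurance quotes from a range of providers.">
  <meta name="keywords" content="dog insurance, pet cover, quotes">
  <meta name="robots" content="index, follow">
</head>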
C.2.6 Discuss the use of parallel web-crawling
Advantages
Issues of parallel web crawling
A crawler is a program that downloads and stores Web pages, often for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache. As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines often run multiple crawling processes in parallel to perform the above task, so that the download rate is maximized (reference http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8408&rep=rep1&type=pdf).
Why search engines take the quality approach (dated)
According to a study released in October 2000, the directly accessible "surface web" consists of about 2.5 billion pages, while the "deep web" (dynamically generated web pages) consists of about 550 billion pages, 95% of which are publicly accessible [LVDSS00].
By comparison, the Google index released in June 2000 contained 560 million full-text-indexed pages [Goo00]. In other words, Google — which, according to a recent measurement [HHMN00], has the greatest coverage of all search engines — covers only about 0.1% of the publicly accessible web, and the other major search engines do even worse.
Increasing the coverage of existing search engines by three orders of magnitude would pose a number of technical challenges, both with respect to their ability to discover, download, and index web pages, as well as their ability to serve queries against an index of that size. (For query engines based on inverted lists, the cost of serving a query is linear to the size of the index.) Therefore, search engines should attempt to download the best pages and include (only) them in their index.
Mercator is an extensible, multithreaded, high-performance web crawler [HN99, Mer00]. It is written in Java and is highly configurable. Its default download strategy is to perform a breadth-first search of the web, with the following three modifications:
C.2.7 Outline the purpose of web-indexing in search engines
Search engines index websites in order to respond to search queries with relevant information as quickly as possible. For this reason, they store information about indexed web pages, e.g. keywords, title or description, in their database. This way search engines can quickly identify pages relevant to a search query.
Indexing has the additional purpose of giving a page a certain weight, as described in the search algorithms. This way search results can be ranked, after being indexed.
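A minimal sketch of the idea behind such an index (the URLs and page text below are made up): each keyword maps to the list of pages that contain it, so pages relevant to a query term can be found without re-reading every page.
<?php
// Hypothetical pages and their (already extracted) text content.
$pages = [
    'http://example.com/a' => 'cheap dog insurance quotes',
    'http://example.com/b' => 'dog training tips',
];

// Build the inverted index: keyword => list of URLs containing it.
$index = [];
foreach ($pages as $url => $text) {
    foreach (array_unique(str_word_count(strtolower($text), 1)) as $word) {
        $index[$word][] = $url;
    }
}

print_r($index['dog'] ?? []);   // pages relevant to the query term "dog"
?>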
C.2.8 Suggest how developers can create pages that appear more prominently in search engine results. Describe the different metrics used by search engines
Naturally an overlap exists between this and what the website developer should do to get the site high in the SERPs (search engine results pages).
On Page
Relevancy: does your site provide the information the user is searching for? The user experience (UX) is becoming a big factor, as it cannot easily be manipulated and will play a much bigger role in the future. UX is reflected in how long the user stays on the site and in the bounce rate. Many factors play a role in the user experience: load speed, easy navigation (no broken links), spelling, quality and factually correct content, a structured layout, use of images/video, and page design (colours, images, video, infographics and formatting) so it is easy to scan the page for relevant information. The idea is to get the user to stay on your site ("sticky"). If a user lands on your site after doing a search and leaves after a few seconds, or even before the page loads (slow loading), that is a very big signal to Google that it should not have served up that result.
Off Page
Backlinks from other websites: the more authoritative the linking site, the better (for example the Huffington Post). The site that links to your site should also be relevant. For example, if you are selling dog insurance, a link from a respected charitable dog website would be a very big boost, while a link from a site that provides car rental would have little impact as it is totally irrelevant.
Social media marketing (Facebook etc.): be a leader in your field and comment on relevant authoritative forums or blogs. Other users sharing your content via social bookmarking sites also helps.
This is an area in which you can manipulate the search results. If Google discovers this your website will be dropped from the index, so you need to ensure any links look natural, e.g. links to an authoritative article or infographic on your website.
C.2.9 Describe the different metrics used by search engines.
The process of making pages appear more prominently in search engine results is called SEO. There are many different techniques, considered in section C.2.11. This field is a big aspect of web marketing, as search engines do not disclose how exactly they work, making it hard for developers to perfectly optimise pages.
In order to rank a website Google uses many metrics; below are a few of the important ones.
Top Metrics
On Page
- Make sure your site can be crawled and thus indexed: avoid Flash, provide a sitemap and use good website architecture
- The title: create a title tag with your key phrase near or at the beginning. The title should be crafted to get the user to click on your website when it is displayed in the search results, and it must reflect the content of your site
- Content will always be important: it must be high quality and factual, with at least 1000 words for the home page
- Freshness of content
- Mobile Friendly
- Page load speed under 3 seconds
- If a link is broken the browser will receive an HTTP 404 response code. This should be detected by the web designer, who should provide a help page with navigation for the user.
- Text formatting (use of h1, h2, bold etc.)
- HTTPS
- Do Keyword Research to find what users actually search for and build pages for these terms
These are only a fraction of the metrics Google uses; more recently sites served over HTTPS have been given a very slight boost.
C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
The search engine must serve up results that are relevant to what users search for. Prior to Google's PageRank, search engines relied largely on title tags and keyword tags. These could easily be manipulated (stuffed with keywords that you wished to rank for) to get a site onto page 1. Google devised the PageRank algorithm, which played a big part in its overall search algorithm.
- Avoid indexing spam sites (duplicate/copied content); detect sites that use black hat techniques and remove them from the index
- Don't re-crawl static sites (that do not change) as often; crawl authoritative, frequently changing (fresh content) news sites more often
- Respect robots.txt files
- Determine sites that change on a regular basis and cache these.
- The spider should not overload servers by continually hitting the same site.
- The algorithm must be able to avoid spider traps
- Ignore paid for links ( can be difficult )
- Ignore exact-match anchor text if it is being used to rank for keywords/search terms (a backlink profile should look natural to the search engine)
C.2.11 Discuss the use of white hat and black hat search engine optimization.
BLACK HAT
Definition: black hat SEO is, in simple words, a set of techniques for getting top positions or higher rankings in major search engines like Google, Yahoo and Bing by breaking the rules and regulations of the search engines' guidelines (see, for example, Google's Webmaster Guidelines).
Keyword stuffing
This worked at one time. You still need the keywords/search terms in your title and page content, but you need to ensure that you do not overuse the keywords/phrases, as that will trip a search engine filter.
PBN
Google (currently) favors older sites, sites with history. In this approach (a private blog network) you buy an expired domain with good metrics, build it up and add links from it to your sites, giving them a boost in ranking. This works, but it is costly to set up and you need to use aliases etc.
Paid For Links
Similar to PBNs, the aim is to get good quality links from high authority sites. Have a look at Fiverr, where you can buy such links. This is difficult for Google to detect and it is also very effective.
Syndicated / Copied Content
Rather than creating good quality content, content is copied from other sites, possibly changed using automated techniques. Google is now much better at detecting this; refer to the Panda update.
Overuse of Keywords in Anchor Text
The anchor text tells Google what your site is about, for example "fleet insurance", but if you overuse it or your backlinks look unnatural you will be penalized; refer to the Penguin update. Before Penguin this was very effective in getting ranked.
Web 2.0 Links
Build a website on Tumblr, for example, for the sole purpose of sending links to your money site.
WHITE HAT
Guest Blogging
The process of writing a blog post for someone else’s blog is called guest blogging
Link Baiting
Create an amazing article or infographic that other sites may use; if you include a link to your site in the article you get more backlinks as a result (natural acquisition of backlinks as opposed to paid).
Quality Content
Search engines evaluate the content of a web page, thus a web page might get higher ranking with more information. This will make it more valuable on the index and other web pages might link to your web page if it has a high standard in content.
Site optimization Design
Good menu navigation; proper use of title tags and header tags; adding images with keyword alt tags; interlinking, again with keyword anchor text. Create a sitemap to get the site crawled, and inform the spiders how often to visit the site.
A good User Experience (UX)
This is a broad term and overlaps some other areas mentioned, for example page load speed. The purpose is to ensure that if a user clicks through to your site they stay without clicking back to the SERPs immediately. Google is happy, as this is a quality signal and its main purpose is to provide the user with relevant results.
Page Site Load Speed
Fast-loading pages give the user a good experience; aim for under 3 seconds.
Freshness
Provide fresh content on a regular basis.
Google (like other search engines) is continually fighting black hat techniques that webmasters employ to rank high in the SERPs. Investigate the two major algorithm updates, Panda and Penguin. A good example of a current black hat practice is the use of PBNs.
Students to investigate PBNs, Panda and Penguin: a quick discussion on these, what Google was targeting, and how PBNs are currently being used effectively to rank sites higher (if caught, you will wake up one morning and your website(s) will have been de-indexed from Google).
C.2.12 Discuss future challenges to search engines as the web continues to grow
Search engines must be fast enough to crawl the exploding volume of new Web pages in order to provide the most up-to-date information. As the number of pages on the Web grows, so will the number of results search engines return. So, it will be increasingly important for search engines to present results in a way that makes it quick and easy for users to find exactly the information they’re looking for. Search engines have to overcome both these challenges.
C.3 Distributed approaches to the web
C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing
What is Grid Computing?
Grid computing is the grid that will enable the public to exploit data storage and computing power over the Internet, analogous to the electric power utility (a ubiquitous commodity).
What are the key resources that we can share on a grid network of computers?
Normally, a computer can only operate within the limitations of its own resources. There's an upper limit to how fast it can complete an operation or how much information it can store. Most computers are upgradeable, which means it's possible to add more power or capacity to a single computer, but that's still just an incremental increase in performance.
Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it's as if the user's computer has transformed into a supercomputer.
Grid Computing Lexicon
Interoperability: The ability for software to operate within completely different environments. For example, a computer network might include both PCs and Macintosh computers. Without interoperable software, these computers wouldn't be able to work together because of their different operating systems and architecture.
Open standards: A technique of creating publically available standards. Unlike proprietary standards, which can belong exclusively to a single entity, anyone can adopt and use an open standard. Applications based on the same open standards are easier to integrate than ones built on different proprietary standards.
Control node: at least one computer, usually a server, which handles all the administrative duties for the system. Many people refer to this kind of computer as a control node. Other application and Web servers (both physical and virtual) provide specific services to the system.
Cluster: A group of networked computers sharing the same set of resources.
What is ubiquitous Computing?
C.3.2 Compare the major features of: • mobile computing • ubiquitous computing • peer-2-peer network • grid computing
Mobile computing
Mobile computing is a technology that allows transmission of data, voice and video via a computer or any other wireless-enabled device without having to be connected to a fixed physical link. The main concepts involved are:
- Mobile communication
- Mobile hardware: portable laptops, smartphones, tablet PCs, personal digital assistants
- Mobile software
Characteristics
Portability: the ability to move a device within a learning environment or to different environments with ease.
Social Interactivity: The ability to share data and collaboration between users.
Context Sensitivity: The ability to gather and respond to real or simulated data unique to a current location, environment, or time.
Connectivity: The ability to be digitally connected for the purpose of communication of data in any environment.
Individual: The ability to use the technology to provide scaffolding on difficult activities and lesson customization for individual learners.
Advantages
- Increased productivity: devices can be used out in the field by various companies, reducing time and cost for the client.
- Entertainment: mobile devices can be used for entertainment purposes, both personal and for presentations to people and clients.
- Cloud computing: saving documents on an online server and being able to access them anytime and anywhere you have a connection to the internet.
- Portability: you are not restricted to one location in order to get jobs done or even access email on the go.
Disadvantages
- Quality of connectivity: mobile devices need either Wi-Fi connectivity or a mobile network such as GPRS or 3G
- Security concerns: mobile VPNs can be unsafe to connect to, and syncing devices might also lead to security concerns. Accessing a Wi-Fi network can also be risky because WPA and WEP security can be bypassed easily
- Power consumption, due to reliance on batteries
Ubiquitous computing (pervasive computing)
Definition
Ubiquitous computing is the idea of computing being available everywhere and anytime.
- Idea of invisible computing
- Embedded computing (microprocessors)
- Need for low cost, low power computing with connectivity
- Usually includes a variety of sensors
- Smart designs: different architectures
- Need for standards and protocols
Peer-to-peer computing
Definition
PCs handle data locally instead of relying on servers (each PC becomes both client and server); individual computers connect directly and communicate with each other as equals.
“A peer-to-peer (P2P) network is created when two or more PCs are connected and share resources without going through a separate server computer”
Characteristics:
- Decentralized
- If one peer drops out, the whole network is not affected
- But data recovery from a peer that is shut down is not possible
- Requires independent backup
- Each peer acts as client and server
- Resources and content are shared across all peers, often faster than via client <-> server
- Requires software on each peer to enable this
- Malware can be distributed faster
C.3.3 Distinguish between interoperability and open standards.
Interoperability can be defined as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged”. In order for systems to be able to communicate they need to agree on how to proceed and for this reason standards are necessary. A single company could work on different systems that are interoperable through private standards only known to the company itself. However, for real interoperability between different systems open standards become necessary.
Open standards are standards that follow certain open principles. Definitions vary, but the most common principles are:
- public availability
- collaborative development, usually through some organization such as the World Wide Web Consortium (W3C) or the IEEE
- royalty-free
- voluntary adoption
The need for open standards is described well by W3C director and WWW inventor Tim Berners-Lee who said that “the decision to make the Web an open system was necessary for it to be universal. You can’t propose that something be a universal space and at the same time keep control of it.”
Some examples of open standards include:
- file formats, e.g. HTML, PNG, SVG
- protocols, e.g. IP, TCP
- programming languages, e.g. JavaScript (ECMAScript)
C.3.4 Describe the range of hardware used by distributed networks.
Students should be aware of developments in mobile technology that have facilitated the growth of distributed networks.
This of course depends on the different types of distributed systems, but most generally speaking on a low level multiple CPUs need to be interconnected through some network, while at a higher level processes need to be able to communicate and coordinate. For each approach to distributed system, more specific types of hardware could be used:
- Mobile computing: wearables (e.g. Fitbit), smartphones, tablets, laptops, but also transmitters and other hardware involved in cellular networks
- Ubiquitous computing: embedded devices, IoT devices, mobile computing devices, networking devices
- Peer-to-peer computing: usually PCs, but can include dedicated servers for coordination
- Grid computing: PCs and servers
- Content delivery networks (CDNs) are systems of distributed servers. They can cache content and speed up the delivery of content on a global scale
- Blockchain technologies (e.g. Bitcoin, Ethereum) are decentralized and based on multiple peers, which can be PCs but also server farms
- Botnets can probably be considered a form of distributed computing as well, consisting of hacked devices, such as routers or PCs
C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web
Distributed systems consist of many different nodes that interact with each other. For this reason they are decentralized by design, in contrast to the classic centralized client-server model.

Therefore, the importance of distributed systems for a decentralized web lies in their benefits and disadvantages compared to classic centralized client-server models.
Benefits
- higher fault tolerance
- stability
- scalability
- privacy
- data portability is more likely
- independence from large corporations such as Facebook, Google, Apple or Microsoft
- potential for high performance systems
Disadvantages
- more difficult to maintain
- harder to develop and implement
- increased need for security
Personal conclusion
While some decentralized systems such as Bitcoin are gaining traction and others like Git or BitTorrent have been around for a long time already, most of the web is still centralized, as most web applications follow the client-server model, which is further encouraged by corporations wanting to make a profit. I found this post from Brewster Kahle's blog on the topic very interesting.
Compression & Decompression Week 2
C.3.6 Distinguish between lossless and lossy compression.
Students will not be required to study the detailed compression algorithms
C.3.7 Evaluate the use of decompression software in the transfer of information.
Students can test different compression methods to evaluate their effectiveness.
Compression 1 Objectives: understand compression techniques and the need for compression
Compression definition: reduce the size of data, i.e. the number of bits used to store it. Most services charge by the number of bits you transmit, so compression also reduces bandwidth use
Benefits: reduced storage needed and associated costs; better UX through less latency and faster loading; less bandwidth used
Possible downside?
What can we compress?
How?
Types of compression?
Compression 2 Huffman Example: objective is to apply a compression method; intro to binary trees
Text compression definition: reduce the size of data, i.e. the number of bits used to store it
Compress "Hello World" to 33 bits
Better than 33 bits? How?
How can we compress text?
Huffman
Why we need compression techniques
The storage capacity of computers is growing at an unbelievable rate—in the last 25 years, the amount of storage provided on a typical computer has grown about a millionfold—but we still find more to put into our computers. Computers can store whole books or even libraries, and now music and movies too, if only they have the room. Large files are also a problem on the Internet, because they take a long time to download. We also try to make computers smaller—even a cellphone or wristwatch can be expected to store lots of information!
Video Compression
Videos take up a lot of space. Uncompressed 1080 HD video footage takes up about 10.5 GB of space per minute of video, but can vary with frame rate. If you use a smartphone to shoot your video, 1080p footage at the standard 30 frames per second takes up 130 MB per minute of footage, while 4K video takes up 375 MB of space for each minute of film.
Because videos take up so much space, and because bandwidth is limited, video compression is used with video files to reduce the size of the file. Compression involves packing the file's information into a smaller space. This works through two different kinds of compression: lossy and lossless.
Lossy VIDEO and SOUND Compression Formats
Lossy compression means that the compressed file has less data in it than the original file. Images and sounds that repeat throughout the video might be removed to effectively cut out parts of the video that are seen as unneeded. In some cases, this translates to lower-quality files because information has been lost, hence the designation "lossy."
However, you can lose a relatively large amount of data before you start to notice a difference (think MP3 audio files, which use lossy compression). Lossy compression makes up for the loss in quality by producing comparatively small files. For example, DVDs are compressed using the MPEG-2 format, which can make files 15 to 30 times smaller than the originals, but viewers still perceive DVDs as having high-quality pictures.
Most video files uploaded to the internet use lossy compression to keep the file size small while delivering a relatively high-quality product. If a video were to remain at its (in some cases) extremely high-quality file size, not only would it take forever to upload the content, but users with slow internet connections would have an awful time streaming the video or downloading it to their computers.
Lossless compression formats include Free Lossless Audio Codec (FLAC), Apple Lossless Audio Codec (ALAC), and Windows Media Audio Lossless (WMAL), among others.
Text Compression Huffman
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, lengths of the assigned codes are based on the frequencies of corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is prefix of codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”.
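The sketch below builds such a prefix code for a short string (illustrative only; ties between equally frequent characters are broken arbitrarily): the two least frequent nodes are repeatedly merged into a tree, then each left edge contributes a 0 and each right edge a 1 to a character's code.
<?php
function huffmanCodes($text) {
    // Count how often each byte occurs in the text.
    $nodes = [];
    foreach (count_chars($text, 1) as $byte => $count) {
        $nodes[] = ['char' => chr($byte), 'count' => $count, 'left' => null, 'right' => null];
    }

    // Repeatedly merge the two least frequent nodes into one subtree.
    while (count($nodes) > 1) {
        usort($nodes, fn($a, $b) => $a['count'] <=> $b['count']);
        $left  = array_shift($nodes);
        $right = array_shift($nodes);
        $nodes[] = ['char' => null, 'count' => $left['count'] + $right['count'],
                    'left' => $left, 'right' => $right];
    }

    // Walk the tree: a left edge adds '0' to the code, a right edge adds '1'.
    $codes = [];
    $walk = function ($node, $prefix) use (&$walk, &$codes) {
        if ($node['char'] !== null) {
            $codes[$node['char']] = ($prefix === '') ? '0' : $prefix;
            return;
        }
        $walk($node['left'],  $prefix . '0');
        $walk($node['right'], $prefix . '1');
    };
    $walk($nodes[0], '');
    return $codes;
}

print_r(huffmanCodes('hello world'));   // the most frequent letter ('l') gets the shortest code
?>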
Question Section
Describe one way that a compressed file may be decompressed
Can lossy compression be used on text files ? Explain your answer
Explain how compression of data may lead to negative consequences. [3]
Also explain the importance of compression now and in the future.
C.4 The evolving web
C.4.1 Discuss how the web has supported new methods of online interaction such as social networking.
Keywords & phrases: Web 1.0 and Web 2.0
Web 1.0
Web 2.0
Semantic Web
ubiquitous
Berners-Lee
open protocols HTML HTTP
decentralization
ubiquitous
Read Only
Write Only
Hyperlinks
Web of linked documents
decentralization
Successful companies that emerge at each stage of its evolution become monopolies; market economics don't apply.
Keywords & phrases: Semantic Web
The aim of the Semantic Web is to shift the emphasis of associative linking from documents to data
Abundantly available information can be placed in new contexts and reused in unanticipated ways. This is the dynamic that enabled the WWW to spread, as the value of Web documents was seen to be greater in information rich contexts (O’Hara & Hall, 2009).
WEB of Data
Relational databases
Excel spreadsheets
WEB 3.0
ubiquitous
open protocols HTML HTTP
decentralization
ubiquitous
Governments are making data available, see https://data.gov.uk/
datasets
Democracy rules: open and free
URL / URI
decentralization
Successful companies that emerge at each stage of its evolution become monopolies; market economics don't apply.
Read Only
Write Only
Hyperlinks
Web of linked documents
Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.
Google Maps: is it free?
If your business is missing, can you add it?
Can this information be monetized? How?
Should Google have a monopoly on location information?
Are there Alternatives to Google Maps?
Watch the video below, then create an account on OpenStreetMap and add a building (your condo/house, Wells school, etc.). Read this post and describe in your own words how OpenStreetMap is different from Google Maps. Post your response to Google Classroom.
The beginnings of the web (Web 1.0 , Web of content)
The world wide web started around 1990/91 as a system of servers connected over the internet that deliver static documents, formatted as hypertext markup language (HTML) files, which support links to other documents as well as multimedia such as graphics, video or audio. In the beginnings of the web these documents consisted mainly of static information and text; multimedia was added later. Some experts describe this as a "read-only web", because users mostly searched and read information, while there was little user interaction or content contribution.
Web 2.0 – “Web of the Users”
However, the web started to evolve towards the delivery of more dynamic documents, enabling user interaction or even allowing content contribution. The appearance of blogging platforms such as Blogger in 1999 gives a time mark for the birth of Web 2.0. Continuing the model from before, this was the evolution to a "read-write" web. This opened new possibilities and led to new concepts such as blogs, social networks or video-streaming platforms. Web 2.0 might also be looked at from the perspective of the websites themselves becoming more dynamic and feature-rich. For instance, improved design, JavaScript and dynamic content loading could be considered Web 2.0 features.
Web 3.0 – “Semantic Web”
The internet, and thus the world wide web, is constantly developing and evolving in new directions, and while the changes described for Web 2.0 are clear to us today, the definition of Web 3.0 is not definitive yet. Continuing the read to read-write description from earlier, it might be argued that Web 3.0 would be the "read-write-execute" web. One interpretation of this is that the web enables software agents to work with documents by using semantic markup. This allows for smarter searches and the presentation of relevant data fitting the context. This is why Web 3.0 is sometimes called the semantic executive web.
But what does this mean?
It's about user input becoming more meaningful, more semantic: users add tags or other kinds of data to their documents that allow software agents to work with the input, e.g. to make it more searchable. The idea is to be able to better connect information that is semantically related.
Later developments
However, it might also be argued that Web 3.0 is what some people call the Internet of Things, which is basically connecting everyday devices to the internet to make them smarter. In some ways this also fits the read-write-execute model, as it allows the user to control a real-life action on a device over the internet. Either way, the web keeps evolving.
C.4.2 Describe how cloud computing is different from a client-server architecture
Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.
It's worth noting that this comparison is not between two opposites: the concepts do not exclude each other and can complement one another.
Client-server architecture
An application is split into a client side and a server side. The server can be a central communicator between clients (e.g. an email/chat server) or allow different clients to access and manipulate data in a database. A client-server application also does not necessarily need to work over the internet; it could be limited to a local network, e.g. for enterprise applications.
Cloud computing still relies on the client-server architecture, but puts the focus on sharing computing resources over the internet. Cloud applications are often offered as a service to individuals and companies - this way companies don’t have to build and maintain their own computing infrastructure in house. Benefits of cloud computing include:
- Pay per use: elasticity allows the user to only pay for the resources that they actually use.
- Elasticity: cloud applications can scale up or down depending on current demands. This allows better use of resources and reduces the need for companies to make large investments in local infrastructure. Wikipedia defines elasticity as "the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible".
- Self-provisioning: allows the user to set up applications in the cloud without the intervention of the cloud provider
- A company has the option to use any of SaaS, IaaS or PaaS
- Using these services offers many advantages over the client-server model: can you think of some?
Azure is Microsoft's cloud service; the other major one is Amazon's AWS.
What is the difference between scalability and elasticity?
C.4.3 Discuss the effects of the use of cloud computing for specified organizations
To include public and private clouds
*** Creates an environment conducive to innovative startups and thus the potential for disruptive innovation
Private cloud
In a private cloud model a company owns the data centers that deliver the services to internal users only.
Can you think of any disadvantages?
Public cloud
In a public cloud services are provided by a third party and are usually available to the general public over the Internet.
Advantages
Disadvantages
Hybrid cloud
The idea of a hybrid cloud is to use the best of both private and public clouds by combining both. Sensitive and critical applications run in a private cloud, while the public cloud is used for applications that require high scalability on demand. As TechTarget explains, the goal of a hybrid cloud is to "create a unified, automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide, while still […]"
Summary of obstacles/concerns
- Service availability
- Data lock-in: if you wish to change provider, what format will your data be in? It could be very expensive to convert to a new data format
- The provider could go bust while holding all your data
- Data confidentiality and auditability (security)
- Data transfer bottlenecks
- Performance unpredictability
- Data conversions
- Bugs in large-scale distributed systems
C.4.5 Describe the interrelationship between privacy, identification and authentication
Privacy
Identification
Authentication
C.4.7 Explain why the web may be creating unregulated monopolies
In theory the world wide web should be a free place where anybody can have a website. However, hosting a website usually comes with a cost: registering a domain name, getting a hosting service or investing in servers oneself, and creating and maintaining the website (which requires technical knowledge or the cost of hiring a web developer). In addition, to reach an audience, further marketing through SEO (see C.2) is usually necessary to get good rankings in search engine results. This means that for the normal individual a traditional website is not the best option. A better alternative is to publish content on an existing platform, e.g. microblogging on Twitter, blogging on WordPress or Blogspot, sharing social updates on Facebook, sharing photos on Flickr, etc. This comes with improved comfort for users.
However, it easily leads to unregulated monopolies in the market because users usually stick to one platform. Tim Berners-Lee describes today’s social networks as centralized silos, which hold all user information in one place. This can be a problem, as such monopolies usually control a large quantity of personal information which could be misused commercially or stolen by hackers. There are certainly many more concerns which won’t fit into the scope of this site.
C.4.8 Decentralized and democratic web
Search Bubbles
A Filter Bubble Demonstration - Try this at home!
One way to see how filter bubbles work with search engines that do personalization (like Google) is to take a word that can have multiple meanings in different contexts and build up different search histories using those contexts. Then, when you search for the same word after having built up different search histories, the search engine should return results that look a bit different.
For this demonstration to work, you need to be sure to clear your search history before you start each round. This works even better if you have 2 or 3 people working side by side at different computers. That way you can compare the results more easily.
Try this with the word Tea.
1. Have someone build a search history using names of countries where tea is popular or names of countries where teas originated. Remember, do not use the word "tea" as a search term quite yet. Examples would be England, Japan, China, Latin America, etc.
2. Have another person build a search history using different spices, herbs, and flowers that make up common teas. Examples would be roses, cinnamon, chrysanthemum, lavender, etc.
3. Have a third person search for anything related to politics, such as names of political parties (not the Tea Party just yet, though!), names of political movements, words like "activism," or "conservative" and "liberal."
4. When you are performing these searches, click on some of the results (preferably general ones that might somehow later be connected to tea!). This will contribute to your search history.
5. Finally, have everyone search for the word "Tea." Have fun comparing results!
Note: Your results may still look very similar; the differences may be subtle. Whether or not the filter bubble is really something to be concerned about will be discussed in the next tab.
Who does this?
These are just a few of the websites that tailor results to you and your clicking history:
Amazon, Washington Post, Netflix, Yahoo News, New York Times, Huffington Post
C.5 (HL) Analyzing the web
C.5.1 Describe how the web can be represented as a directed graph.
Reference for this section: please note that the definition of a "tube" varies between sources; a simpler/more general definition can be found in other papers.
Questions on Directed Graphs
(a) Name an edge you could add or delete from the graph in Figure 13.8 so as to increase the size of the largest strongly connected component.
(b) Name an edge you could add or delete from the graph in Figure 13.8 so as to increase the size of the set IN
(c) Name an edge you could add or delete from the graph in Figure 13.8 so as to increase the size of the set OUT.
C.5.2 Outline the difference between the web graph and sub-graphs.
Web graph
- Web graph describes the directed links between web pages in the WWW.
- It is a directed graph with directed edges
- Page A has a hyperlink to Page B, creating a directed edge from Page A to Page B
Sub-Graph
- A sub-graph is a set of pages that are part of the web graph
- Can be a set of pages linked to a specific topic, e.g. Wikipedia: one topic, but with references (and hyperlinks) to other web pages
- Can be a set of pages that deal with part of an organization
C.5.3 Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter.
C.5.4 Explain the role of graph theory in determining the connectivity of the web.
Connectivity
This is just a metric to discuss how well parts of a network connect to each other.
Small world graph
This is a mathematical graph in which not all nodes are direct neighbours, but any given pair of nodes can be reached in a small number of hops, or, put differently, with just a few links. This is due to nodes being interconnected through highly connected hubs.
2 Properties of the small world graph:
Mean shortest-path length will be small
Most pairs of nodes will be connected by at least one short path
many clusters (highly connected subgraphs)
Analogy: airline flights, where you can most likely reach any city in under three flights.
Examples: network of our brain neurons
Maximizes connectivity
Minimizes # of connections
6 degrees of separation
This originates from the idea that any human in the world is connected to any other in some way over 6 or fewer connections (steps). The idea can be generalised to any graph, where any given pair of nodes within the network can be reached in a maximum of 6 steps.
The idea itself can be applied to the web graph, suggesting high connectivity regardless of big size.
Not necessarily small world graph
High connectivity between all nodes
Web diameter (and its importance)
The diameter of the web graph has no standard definition, but is usually considered to be the average distance (as each edge has the same path length, this would be steps) between two random nodes.
This is important because it is an indicator of how quickly one can reach some page from any starting page on average. This is of importance for crawlers, which want to index as many pages as possible in the fewest steps.
average distance between two random nodes (pages)
important for crawlers to have a guide of how many steps it should take to reach a page
a factor to consider is whether the path is directed or undirected
often there is no directed path between nodes (a small breadth-first search sketch follows)
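A small sketch of measuring such distances with a breadth-first search over a made-up directed web graph (adjacency list); averaging this over many random pairs of pages would estimate the diameter described above.
<?php
// Hypothetical directed web graph: page => pages it links to.
$graph = [
    'A' => ['B', 'C'],
    'B' => ['D'],
    'C' => ['D'],
    'D' => [],
];

// Breadth-first search: smallest number of link hops from $start to $goal.
function shortestHops($graph, $start, $goal) {
    $queue   = [[$start, 0]];
    $visited = [$start => true];
    while ($queue) {
        [$page, $hops] = array_shift($queue);
        if ($page === $goal) {
            return $hops;
        }
        foreach ($graph[$page] as $next) {
            if (!isset($visited[$next])) {
                $visited[$next] = true;
                $queue[] = [$next, $hops + 1];
            }
        }
    }
    return null;   // no directed path exists (a common case on the real web)
}

var_dump(shortestHops($graph, 'A', 'D'));   // int(2)
var_dump(shortestHops($graph, 'D', 'A'));   // NULL: the distance depends on link direction
?>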
Importance of hubs and authorities (link to C.2.3)
Hubs and authorities have special characteristics:
Hubs: have a large number of outgoing links
Authorities: have a large number of incoming links
For connectivity, this means that a larger number of hubs improves connectivity, while authorities are more likely to decrease connectivity as they usually do not link to many other pages.
C.5.5 Explain that search engines and web crawling use the web graph to access information.
C.5.6 Discuss whether power laws are appropriate to predict the development of the web.
C.6 (HL)The intelligent web
C.6.1 Define the term semantic web
The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
a proposed development of the World Wide Web in which data in web pages is structured and tagged in such a way that it can be read directly by computers
The Semantic web is an extension of the current web where data in webpages is structured and tagged to give it semantic meaning and make it machine-understandable, allowing computers and people to work in cooperation
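As a small illustrative example of such tagging (the film and person are made up), schema.org microdata attributes give machine-readable meaning to otherwise plain HTML:
<div itemscope itemtype="https://schema.org/Movie">
  <h1 itemprop="name">Example Film</h1>
  <span itemprop="director" itemscope itemtype="https://schema.org/Person">
    Directed by <span itemprop="name">Jane Doe</span>
  </span>
  <span itemprop="genre">Documentary</span>
</div>
A software agent parsing this can extract the structured statement "Example Film is a Movie directed by the Person Jane Doe", which a plain text page would not make explicit.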
C.6.2 Distinguish between the text-web and the multimedia-web
The traditional web is seen as being text-based; the semantic web is multimedia-based. Text-web: the "read-only" web.
Unlike text-web, multimedia web pages use different forms of graphic content
C.6.3 Aims of semantic web
Ultimate aim is to allow computers to do more useful work and allow people and computers to work together
C.6.4 Distinguish between an ontology and folksonomy
[tip: also read C.6.5 in conjunction to better understand C.6.4]
References: http://www.sims.monash.edu.au/subjects/ims2603/resources/week7/7.1.pdf
important to understand very simply classification, folksonomy, and its drawbacks
http://www.ijodls.in/uploads/3/6/0/3/3603729/3_mohmedhanif__29-35_.pdf
tagging and folksonomy
http://www.cl.cam.ac.uk/~aac10/R207/ontology_vs_folksonomy.pdf
Ontology: a collection of names for concepts and relation types (like our musical instruments assignment), organised into types, sub-types, sub-sub-types and so on.
Folksonomy - Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information
Folksonomy is a system of classification derived from the practice and method of collaboratively creating and translating tags to annotate and categorize content. Ontology is hard to implement on a large scale and isn't always web-based; it is key for the semantic web because of its high expressive power. Folksonomy is created by users and is quick and easy to implement. It is used on a large scale for document collections; most of the time it is web-based and important in Web 2.0.
C.6.5 Describe how folksonomies and emergent social structures are changing the web
Folksonomies and social structures are changing the web because they are a system of classification. Folksonomy is created by users as well as being quick and easy to implement. If we look at Facebook, for example, most of the content on there is created by users: they are able to upload images, create statuses and much more. They are changing the web because all the content is determined by the users as opposed to the owners of the companies.
An example of folksonomies is the tag system in image sites or hashtags in social media. Users are defining more and more tags and as the volume of users tagging increases, the accuracy of tags increases such that the web is becoming more and more precise.
Tagging increases user participation in the web while enhancing searching and semantics of the web.
Folksonomy in social structures refers to users tagging media on social sites, for example tagging a picture of a cat as cute and animal. On sites like Flickr when more and more people tag a photo, it helps change the web by introducing semantic meaning to media, meaning created by users themselves. This is only possible through these emerging structures like Facebook, Flickr, etc. that allow us to create folksonomies through tagging
Folksonomy is a key part of Web 2.0 since it allows users to interact with the system and organize the wealth of information online. Most of the Web 2.0 technologies now have the flexibility to allow user to describe the content using keywords, categories, or labels. Note that these keywords, labels, tags, etc. are actually being used in many services on the web now.
This helps in identifying the content from the user's context and helps with future retrieval. Folksonomy is an important part of Web 2.0 services: users index resources themselves with free keywords, which are called tags. There are a lot of services online, especially for indexing bookmarks; Del.icio.us is the most famous one.
C.6.6 Explain why there needs to be a balance between expressivity and usability on the semantic web.
C.6.7 Evaluate methods of searching for information on the web.
C.6.8 Distinguish between ambient intelligence and collective intelligence
C.6.9 Discuss how ambient intelligence can be used to support people.
C.6.10 Explain how collective intelligence can be applied to complex issues.
Past Paper Questions
C.1.1 Distinguish between the internet and World Wide Web (web). [2]
C.1.1 Outline one difference between the internet and the World Wide Web (WWW). [2]
C.1.2 Describe how the web is constantly evolving
C.1.3 Identify the characteristics of the following:
C.1.3 Identify the characteristics of the following:
C.1.3 Identify the characteristics of the following:
C.1.4 Identify the characteristics of the following:
C.1.5 Describe the purpose of a URL
C.1.6 Describe how a domain name server functions.
C.1.7 Identify the characteristics of:
Resources
https://barefootcas.org.uk/wp-content/uploads/2015/02/KS2-Search-Results-Selection-Activity-Barefoot-Computing.pdf
https://www.hpe.com/us/en/insights/articles/how-search-worked-before-google-1703.html
Further Reading
http://www.ftsm.ukm.my/ss/Book/EVOLUTION%20OF%20WWW.pdf
https://eprints.soton.ac.uk/272374/1/evolvingwebfinal.pdf
http://dig.csail.mit.edu/2007/Papers/AIMagazine/fractal-paper.pdf
The Semantic Web
https://www.w3.org/2009/Talks/0615-SanJose-tutorial-IH/Slides.pdf
Ontology http://www.cs.man.ac.uk/~horrocks/Publications/download/2008/HoBe08.pdf
https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf