How The Web Works

TCP/IP Standards = Foundation

TCP/IP standards are the packet foundation, allowing a computer to communicate with another computer on the internet. The services we use are built on top of this basic TCP/IP capability: email, the web, Skype, Google Hangouts, and many different chat services.

The World Wide Web

World Wide Web
1993 Sir Tim Berners-Lee, working at CERN
"The Web" is made of a set of open standards:
--TCP/IP -- underlying networking
--HTML -- web page format
--HTTP -- web connection protocol to get a page
--JPEG, PNG -- image formats
--JavaScript -- web page programming language
Remarkably for such a world-changing invention, there's not a single vendor-specific or proprietary part of it.
It's all open standards. Not a coincidence!

The familiar "web" of connected web pages runs on top of the basic TCP/IP phone system. Chat programs, email, and so on are other services, distinct from the web, which also run on top of the basic connectivity provided by TCP/IP. The web was created by Tim Berners-Lee (now Sir Tim Berners-Lee), working at the physics research facility CERN in Switzerland. Browsers were available in 1993, and the web, URLs etc. were becoming broadly popular by 1995.

Study question: why did something as important as the web not come out of a computer company like IBM or Microsoft or Apple or whatever? The web is a free and open standard (like TCP/IP), and for the most part is not locked-in to any particular vendor, and this freedom is a vital part of the web's success. Openness leads to participation, so the lock-in choices get shunned.

1. A URL

URL Uniform Resource Locator
A URL is the address of some information on the web
e.g. http://web.stanford.edu/class/cs101
http: -- system/scheme to use
web.stanford.edu -- domain name of server computer
-recall "domain name" prev lecture
/class/cs101 -- "path", particular page on that server

http://web.stanford.edu/class/cs101

A visit to a web page begins with a URL (Uniform Resource Locator) that points to that web page. Of course you've seen a million URLs over the years, but we'll look at the parts:

The http at the start is the networking scheme to use, and "http" and its secure variant "https" are by far the most common. In the future, if there were some new networking scheme, the URL syntax could still support it by starting with a different word before the colon.

After the // we have the web.stanford.edu which is the domain name of the computer on the internet that has this web page -- the web server. For the browser to request this web page, it will make a TCP/IP connection to that computer.

After the domain name we have the /class/cs101 path which indicates essentially which directory and file we want specifically from that web server.
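The three parts described above can be pulled out of a URL with a few lines of code. This is a minimal sketch using Python's standard urllib.parse module; the function name url_parts is made up for illustration.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the three parts discussed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # networking scheme, e.g. "http"
        "domain": parts.hostname,  # domain name of the web server
        "path": parts.path,        # which page on that server
    }

print(url_parts("http://web.stanford.edu/class/cs101"))
```

A browser does essentially this split before it connects: the domain tells it which computer to contact, and the path is what it asks that computer for.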

2. Web Browser "Client"

"HTTP" is the protocol of the web
HTTP has 2 parties: client and server
Client is the browser program, e.g. Firefox, Chrome
In the browser, the user types in a URL, hits return
Browser sends a request to server
Browser gets back HTML response, displays it
Basically request/response
Browser keeps history, back-button

The Web Browser is the familiar computer program, such as Firefox, that you run on your local computer to access the web. In short, you type URLs into your browser or click a link, and the browser requests and displays those pages for you. The browser also keeps track of your history of web pages so it can implement the back-button for you.

In networking terminology, the browser is the "client" which makes requests and displays what it gets back. The "server" is the other side of the request/response, servicing requests it gets. This is all done with TCP/IP packets between the browser and the server.

3. Web Server

Web Server: computer on the internet, has content, responds to HTTP requests
Web server program runs on the server computer, handles HTTP requests
-Apache, nginx are popular open source web server programs
Web Server...
-Must be running all the time, be on the internet
-Needs a fixed, known IP address, a domain name
-Stores a bunch of files (HTML, JPEG, ..)
-Gets HTTP requests (from browsers)
-Sends back HTTP responses (HTML, JPEG, ..)

The other side of the conversation is the web server -- a machine which hosts a set of web pages, and waits for requests to come in for those pages. The phrase "web server" can refer to the physical machine, or it can mean the program that responds to requests. Below I'll use the phrases "web server machine" or "web server program" to distinguish those two cases.

The web server machine needs to be switched on, ready, and connected to the internet at all times. It is essentially waiting for an incoming request which could happen at any time. In contrast, you can switch on your laptop, do some browsing, and switch it off.

The web server program runs all the time, handling any incoming requests. For simple web pages, the web server program identifies a directory (aka a folder) as the web-root of the files to serve. The "path" part of the URL maps into the web-root directory. So the URL http://example.com/a.html means to get the file a.html from the web-root directory. The web-root can itself contain directories, so http://example.com/class/cs101/b.html refers to a "class" directory in the web-root directory, in turn containing a "cs101" directory, which contains a b.html file.
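The mapping from URL path to file can be sketched in a few lines. This is only an illustration: the web-root location /var/www and the function name file_for_path are assumptions, and real web server programs do more work here (for example, rejecting paths that use ".." to escape the web-root).

```python
from pathlib import PurePosixPath

def file_for_path(web_root, url_path):
    """Map the path part of a URL to a file under the web-root directory."""
    root = PurePosixPath(web_root)
    # Drop the leading "/" so the path is taken relative to the web-root.
    return root / url_path.lstrip("/")

# "/var/www" is an assumed web-root, chosen only for this example.
print(file_for_path("/var/www", "/a.html"))
print(file_for_path("/var/www", "/class/cs101/b.html"))
```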

Put It Together -- HTTP Request / Response

The HTTP (Hypertext Transfer Protocol) standard describes how a browser makes a request to the web server program. ("Hypertext" is the idea of links within documents pointing to other documents. This idea long predates the web.) If HTML describes the code for a web page, HTTP is the protocol for getting a web page from the server.

You use this all the time!
HTTP request/response system
Browser has url
1. Browser sends request to server named in url
-request includes the path
-e.g. "/class/cs101/syllabus.html"
-TCP/IP provides the packet-service
2. Server gets request
-looks up that path in its resources
-sends back response HTML
3. Browser gets HTML, displays it
Notes:
Server on all the time, has IP addr
Server stores HTML, JPEG etc. data
Server sends back "404" error if no such resource

The server just sends back the HTML or whatever data to your browser. Your browser then "renders" this data into a window. This is why the View Source command works -- it just shows you the HTML of the server response, which is what the browser was using anyway.
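The request/response steps above can be imitated without any networking. In this sketch the PAGES dictionary stands in for the files on the server, and build_request and handle_request are made-up names. The request text is a simplified form of what a browser really sends, and a real server sends back a status line and headers along with the HTML.

```python
# Hypothetical in-memory "web-root": path -> HTML text.
PAGES = {
    "/class/cs101/syllabus.html": "<html><body><h2>Syllabus</h2></body></html>",
}

def build_request(domain, path):
    """Step 1: the text a browser sends, in simplified HTTP/1.1 form."""
    return f"GET {path} HTTP/1.1\r\nHost: {domain}\r\n\r\n"

def handle_request(request_text):
    """Step 2: the server looks up the path and returns (status, body)."""
    request_line = request_text.split("\r\n")[0]  # e.g. "GET /a.html HTTP/1.1"
    method, path, version = request_line.split(" ")
    if path in PAGES:
        return 200, PAGES[path]
    # No such resource: the "404" error mentioned in the notes above.
    return 404, "<html><body>Not Found</body></html>"

# Step 3: the browser gets the HTML back (here we just print it).
request = build_request("web.stanford.edu", "/class/cs101/syllabus.html")
status, html = handle_request(request)
print(status)
print(html)
```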

The approximate appearance of the HTML is specified in the standard, but not the exact details; the appearance can vary with how wide your browser window is, what fonts your machine has etc. If you want to send a document and specify exactly how it looks, where the line breaks are etc., use PDF (Portable Document Format, created by Adobe and now an open ISO standard).

"Dynamic" Web Applications

Simplest case -- server holds unchanging files
"Web application" .. page contents are dynamic
e.g. GMail inbox
A program on the server runs to produce the page

The above HTTP request/response sequence is the major pattern of the web. For the simple case outlined above, we have static, unchanging content. Each web page corresponds basically to an HTML file stored on the server, and the contents of the file do not change quickly.

A more complex web site will have some pages which are "dynamic" -- the HTML for them is computed on the fly. With a static web site, each web page corresponds roughly to a file stored on the server. With a dynamic web site, a web page corresponds to a program on the server. A request for that web page runs the corresponding program code on the server. That code essentially runs a series of print statements (like the print we have used) to dynamically produce the HTML which is sent back as the response. The program could do anything -- look at various data sources, putting together any sort of HTML response page.
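Here is a rough sketch of a page produced by print statements. The function inbox_page and its inputs are invented for illustration and are not how any real mail site works. The printed text is collected into a string, which stands in for the response sent back to the browser.

```python
import io
from html import escape

def inbox_page(user, subjects):
    """Produce the HTML for a tiny inbox page with print statements."""
    out = io.StringIO()  # collects the printed HTML for the response
    print("<html><body>", file=out)
    print(f"<h2>Inbox for {escape(user)}</h2>", file=out)
    for subject in subjects:
        # escape() keeps characters like "<" in the data from being read as tags
        print(f"<p>{escape(subject)}", file=out)
    print("</body></html>", file=out)
    return out.getvalue()

print(inbox_page("alice", ["Lunch?", "Homework 3 posted"]))
```

Calling the function with different data gives a different page, which is the whole idea of a dynamic page: there is no stored file, only the program.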

Dynamic Web Site - Google Trends

Web application example
www.google.com/trends
Type in a word or two: volcano, primary (4 year cycle), oscar (1 year cycle)
HTTP request as usual, "submit" button
Key: server runs a program to compute the HTML on the fly
Program basically uses print as we have seen .. producing HTML

A dynamic web site with a very simple interface is www.google.com/trends -- which graphs the frequency that different words appear in Google searches. The front page shows an HTML form which includes fields where you can type in information. Clicking the button in the form (or sometimes typing return) "submits" the form to the server -- sending a request to the server which includes the values typed into the fields. This runs a program on the server which takes in the inputs from the form, and looks up information stored in files or databases on the server. The program puts all this information together and dynamically produces the HTML and images etc. for the result -- basically using print to produce HTML. Note that this still fits within the request/response pattern, but now the response is a one-off, computed on the fly just for this request.
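One common way for a form to send its field values is to append them to the URL after a "?". The sketch below shows that encoding and the matching decoding on the server side. The URL http://example.com/search and the field name "q" are made up for the example, and are not the actual parameters of Google Trends.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def submit_url(base_url, fields):
    """Browser side: build the request URL carrying the form values."""
    return base_url + "?" + urlencode(fields)

def form_values(url):
    """Server side: recover the form values from the request URL."""
    query = urlsplit(url).query
    # parse_qs gives a list per field; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(query).items()}

url = submit_url("http://example.com/search", {"q": "volcano eruption"})
print(url)
print(form_values(url))
```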

The site sfbay.craigslist.org/ is another nice simple example of a dynamic form/response website. Here's how it works at a very high level. On the craigslist servers are files that store all the current listings .. say 1 million listings. When you submit your search term, code runs on the craigslist server, pulling out the listings that include that word, and printing out HTML to show you the first 100 matches. This program will use a for-loop, if-statement, and print, just as we have been doing.
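The search described above can be sketched with a for-loop, an if-statement, and print. The listings here are invented, and search_page is a hypothetical name; the real craigslist code is certainly more elaborate.

```python
from html import escape

def search_page(listings, term, limit=100):
    """Build an HTML results page for the listings that contain term."""
    matches = []
    for listing in listings:
        if term.lower() in listing.lower():
            matches.append(listing)
    lines = ["<html><body>", f"<h2>Results for {escape(term)}</h2>"]
    for listing in matches[:limit]:  # show only the first `limit` matches
        lines.append(f"<p>{escape(listing)}")
    lines.append("</body></html>")
    return "\n".join(lines)

# Made-up listings standing in for the files on the server.
listings = ["Red bike", "Blue couch", "Mountain bike, like new"]
print(search_page(listings, "bike"))
```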

Later Topics: Tracking and Privacy

In security lectures, we'll talk about logging and blocking HTTP packets.

Web Page - HTML

Here is a simple web page with a few elements...

A Heading

This is the first paragraph.

This is a second paragraph, including a link to the codingbat site

An image is done with "img" tag which includes a "src" url pointing to the image data file, like this

HTML text code describes a web page
You should know a little bit of HTML, and not be intimidated by it
Plain text with "tags" to mark text as bold etc. within brackets < ... >
"h1" or "h2" or "h3" are headings in big font, e.g. <h2>A Heading</h2>
"p" tag introduces a paragraph of text (starting on a new line)
"b" tags to mark bold: <b>like this</b>
"a" tags to mark a link:
<a href="http://codingbat.com">codingbat site</a>
"img" tag to load in image:
<img src="monkey.jpg">
"monkey.jpg" must be sitting in the same folder as the HTML file
Below is the HTML code/tags to produce the page above:

<html> starts the whole thing. The <head> section with <title> sets the title used at the top of the window. Inside <body> is the regular HTML content of the page.




<html>


<head>
<title>PAGE TITLE HERE</title>
<!-- this cryptic bit limits the page to 700 pixels wide -->
<style type="text/css">
body { max-width:700px; }
</style>
</head>

<body>

<h2>A Heading</h2>

<p>This is the first <b>paragraph</b>.

<p>This is a second paragraph, including a link to the
<a href="http://codingbat.com">codingbat site</a>

<p>An image is done with "img" tag which includes a "src" url
pointing to the image data file, like this 
<p><img src="monkey.jpg">

</body>
</html>

A web page is written in a plain text code called HTML (HyperText Markup Language). Basically, HTML adds "markup" commands <...> within plain text. The markup indicates that parts of text should be a heading, or bold, or a link, and so on.

You do not need to know much HTML markup for this course, just a few tags so you understand the basic idea. You should not be intimidated about producing an HTML page to show some information. Creating a basic looking HTML page is not difficult, although of course a complex page like the nytimes.com front page is a lot of work. You can write HTML by hand, just typing in the text including the HTML tags, or use a program that looks more like a word-processor to you, but which then generates HTML for you.

HTML Edit Demo

On his laptop, Nick edits the file network-html-sample.html, then drags the file onto Firefox to display it. You can View Source on this page to see its underlying HTML. Key steps:

In the editor, make a change, save the file
In the browser, click "reload" to see the changes immediately
This is the way to edit html, see results quickly

View Source

When you are visiting any web page, you can use the View Source command in your browser to see the underlying HTML code for the web page you see -- you will see <p> tags for paragraphs, <a href=...> tags for links, and <img src=...> tags for images. Check out the source code of NYTimes.com .. it's quite a mess, but it's just HTML, rendered by your browser.
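Since the page source is just text with tags, a program can read it too. The sketch below uses Python's standard html.parser module to list the link and image URLs in a piece of HTML. The class and function names are made up for illustration, and the sample text is adapted from the example page earlier in these notes.

```python
from html.parser import HTMLParser

class LinkAndImageFinder(HTMLParser):
    """Collect the href of each <a> tag and the src of each <img> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

def find_links_and_images(html_text):
    finder = LinkAndImageFinder()
    finder.feed(html_text)
    return finder.links, finder.images

SAMPLE = """<p>This is a second paragraph, including a link to the
<a href="http://codingbat.com">codingbat site</a>
<p><img src="monkey.jpg">"""

print(find_links_and_images(SAMPLE))
```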

HTML 5

The latest version of HTML, HTML5, is becoming very popular, adding needed features to make better web pages and better dynamic web pages. Older versions of HTML lacked some features, so web pages did not look or work so well, but that's been largely fixed.

Web Design Philosophy

Someday, you will be tasked with organizing some important web content. The most important thing to know about web design is in this comic: XKCD on web design

There are two points of view: the users of your site have interests, common questions. Very often a user visits a site with a question they want answered. The organization creating the web site has a different set of interests. The creators might care about the org-chart of what division is providing what, and who runs it, or just generally promoting how shiny and awesome their organization and their management are. The old joke is that lame web sites end up looking like the org-charts of their producing organizations.

If you want to make a popular web site, concentrate on the questions and interests of your visitors. Sounds obvious, but it's easy to make the other sort of site. It is tempting to treat the visitors to your site as captives, to be shown little graphics or videos that are essentially advertisements or advocacy. Instead, the most important question for web design is: what are the most common questions/interests visitors will have, and how can we make those answers conveniently available?