URLs and Links
Lecture Notes for CS 142
Fall 2010
John Ousterhout
- Readings for this topic: none.
- Hypertext:
- Documents containing fields that are links
- Clicking on a link takes you someplace else (in the same
document or a different document).
- Hypertext has been around since the 1960s:
- Ted Nelson coined the term (early '60s) and started the Xanadu project
- Doug Engelbart: "Mother of all demos" in 1968
- HyperCard for the Macintosh: 1987
URLs
- URLs: Uniform Resource Locators:
- Provide names for Web content
- Example URL:
http://www.company.com:81/a/b/c.html?user=Alice&year=2008#p2
- Scheme (http:): identifies the protocol used to fetch the content.
- http: is the most common scheme; it means use the HTTP
protocol, which we will discuss soon.
- https: is similar to http: except that it
uses SSL/TLS encryption when communicating with the server for
greater security.
- file: means read a file from the local disk.
- There are several other schemes, such as ftp:, but they are used
much less often than http: and https:.
- Host name (//www.company.com): name of a machine running an
HTTP server.
- Server's port number (81): allows multiple servers to run
on the same machine. If the port is omitted, the scheme's default is used
(80 for http:); almost all servers run on the default port.
- Hierarchical portion (/a/b/c.html): used by server to find
content. Server can use this field however it wishes:
- Path name for a static HTML file.
- Path name for file containing code which, when executed, will
generate a page (e.g., foo.php).
- className/method, identifying a particular method
in a particular class, which will generate HTML (e.g., Ruby on Rails).
- When you set up a Web server you provide configuration
information that tells how to interpret the hierarchical
portion of a URL.
- Query info (?user=Alice&year=2008): provides additional parameters that
can be used by the server to select dynamic content. For example:
http://www.company.com/showOrder.php?order=4621047
- Fragment (#p2): selects a particular location in the
resulting page (instead of displaying the top of the page, scroll
the window so the particular fragment appears at the top). Used
on the browser only; not sent to the server.
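The components above can be pulled apart with Python's standard `urllib.parse` module. This is a minimal sketch: `describe` is a hypothetical helper, and the default-port table (80 for http:, 443 for https:) is an assumption added for illustration. It also builds the string the browser sends to the server, which contains the path and query but never the fragment.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed default ports, used only when the URL does not name a port.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe(url):
    """Split a URL into the components described in the notes (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # What the browser asks the server for: path plus query.
    # The fragment is dropped because it is used only by the browser.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # each name maps to a list of values
        "fragment": parts.fragment,
        "request_target": target,
    }

info = describe("http://www.company.com:81/a/b/c.html?user=Alice&year=2008#p2")
for name, value in info.items():
    print(name, value)
```

For the example URL this reports port 81, the query as {'user': ['Alice'], 'year': ['2008']}, and a request target of /a/b/c.html?user=Alice&year=2008, with p2 kept only as the fragment. For http://www.company.com/showOrder.php?order=4621047 the port falls back to 80.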
Links
- Links: content in a page which, when clicked on, causes the
browser to display another page.
- Links are implemented with the <a> tag:
<a href="http://www.company.com/news/2009.html">2009 News</a>
- <a> elements can be used in other ways:
- Relative URL (makes it easier to reorganize a Web site):
<a href="2008/March.html">
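- Go to a different place in the same page:
<a href="#p2">
- Define an anchor point (a position that can be referenced with # notation):
<a name="p2"></a>
(in current HTML, an id attribute on any element also works, e.g. <p id="p2">)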
- Other uses for URLs:
- Loading a page: type the URL into your browser.
- Nested content within a page:
- Images:
<img src="...">
- Stylesheets:
<link rel="stylesheet" type="text/css" href="...">
- Embedded page:
<iframe src="http://www.google.com"></iframe>
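The browser turns a relative URL into an absolute one by resolving it against the URL of the page that contains the link. Python's `urllib.parse.urljoin` follows essentially the same rules, so it is a convenient way to check what a relative href will become. In this sketch the base URL is an assumption borrowed from the "2009 News" link above, and about.html and index.html are hypothetical page names.

```python
from urllib.parse import urljoin

# Assumed URL of the page that contains the links.
base = "http://www.company.com/news/2009.html"

# Relative path: resolved against the directory of the current page.
print(urljoin(base, "2008/March.html"))   # http://www.company.com/news/2008/March.html

# ".." moves up one directory.
print(urljoin(base, "../about.html"))     # http://www.company.com/about.html

# Leading "/": relative to the root of the same host.
print(urljoin(base, "/index.html"))       # http://www.company.com/index.html

# Fragment only: same page, different position within it.
print(urljoin(base, "#p2"))               # http://www.company.com/news/2009.html#p2

# Absolute URL: the base is ignored.
print(urljoin(base, "http://www.google.com"))  # http://www.google.com
```

This is why relative URLs make reorganization easier: if the whole news/ directory moves, links like 2008/March.html still resolve correctly because they depend only on the location of the current page.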
URL Encoding
- What if you want to include a punctuation character in a query value?
http://host.company.com/companyInfo?name=C&H Sugar
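- Within a query value (or any other data placed in a URL), every character
except A-Z, a-z, 0-9, and -_.~ should be represented as %xx, where xx
is the hexadecimal value of the character (a non-ASCII character is first
converted to UTF-8, and each byte is escaped separately):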
http://host.company.com/companyInfo?name=C%26H%20Sugar
- This is an example of escaping, an issue that arises whenever
information is encoded textually:
- Some characters in the encoding have special structural significance,
while other characters are just data.
- What if I want to include a special character in my data?
- Must introduce some escaping (or quoting) mechanism for
handling special characters
in data; typically this involves additional special characters
for the escaping mechanism, such as & in HTML/XML or % in URLs
- The escaping mechanism must also escape the escape
characters themselves (e.g., % becomes %25 in a URL, & becomes &amp; in HTML).
- If you ever receive data whose content is uncontrolled (e.g.,
typed by a user) and you want to incorporate it into a text encoding,
you must check the data for special characters and add appropriate
escaping.
- If you forget to escape special characters, unexpected user data
could change the structure of the encoded value. Malicious users
can capitalize on such mistakes to violate the security of the
system. Example: SQL injection.
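The escaping rule for URLs can be written out in a few lines. This is a minimal sketch: `percent_encode` is a hypothetical helper that escapes everything outside the unreserved set listed above, and the standard library's `urllib.parse.quote(..., safe="")` is used only to cross-check it. It also shows what goes wrong when uncontrolled data is inserted without escaping.

```python
from urllib.parse import parse_qs, quote, unquote

# Characters that may appear in a URL component without escaping.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(value):
    """Escape every character outside the unreserved set as %XX (hypothetical helper)."""
    out = []
    # Encode to UTF-8 first so non-ASCII characters become bytes;
    # each byte that is not unreserved becomes its own %XX escape.
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)

user_input = "C&H Sugar"  # uncontrolled data, e.g. typed by a user

# Unescaped: "&" is read as a parameter separator, so the data changes
# the structure of the query and "H Sugar" is lost.
print(parse_qs("name=" + user_input))     # {'name': ['C']}

# Escaped: the server recovers exactly what the user typed.
encoded = percent_encode(user_input)
print(encoded)                            # C%26H%20Sugar
print(parse_qs("name=" + encoded))        # {'name': ['C&H Sugar']}

# The escape character itself must be escaped too.
print(percent_encode("100%"))             # 100%25

# The standard library's quote() with safe="" gives the same result.
print(quote(user_input, safe="") == encoded)  # True
print(unquote(encoded))                   # C&H Sugar
```

The same pattern applies to other textual encodings: for SQL, the usual defense is to pass user data as query parameters rather than splicing it into the query string, so the database never interprets it as structure.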
Miscellaneous Topics
- A key (and controversial) element of the Web: referential integrity
is not required. A link may point to a page that has moved or no longer
exists; broken links are allowed.
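- URI (Uniform Resource Identifier) vs. URL: a URI is any string that
identifies a resource; a URL is a URI that also says how to locate the
resource (e.g., via a scheme such as http:). Every URL is a URI, but some
URIs, such as URNs (urn:...), name a resource without saying where to find it.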