Tweeter Web Service

In this project you and a teammate will implement a Twitter-like social networking service that allows users to send and read short messages called tweets. The service will be accessed over the Web using the HTTP protocol, but it will not return HTML Web pages for display in a browser. Instead, Tweeter returns data in JSON format (JavaScript Object Notation); this structured form is intended for use by programs such as mobile applications.

Users and Tweets

Each Tweeter user is identified with a 64-bit unique integer. You do not need to worry about how these identifiers are assigned; you can assume that each user knows his or her identifier as well as the identifiers of anyone they wish to follow. Tweeter does not store user information such as name, address, etc.

Tweeter supports friend/follower relationships among users. Any user A can declare that any other user B is their friend. This means that A would like to read any tweets generated by B. If B is A's friend, then A is said to be a follower of B. Each user may follow any number of other users; friendships may be created and deleted at any time.
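
One possible way to picture the friend/follower relationship is as a pair of in-memory indexes, one per direction, so that both "who does A follow" and "who follows B" can be answered without a scan. The sketch below is only an illustration under that assumption; the class name FriendGraph and its method names are hypothetical and not part of the assignment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory friendship index (names are illustrative only).
class FriendGraph {
    // friends.get(a) holds the users that a follows.
    private final Map<Long, Set<Long>> friends = new HashMap<>();
    // followers.get(b) holds the users that follow b.
    private final Map<Long, Set<Long>> followers = new HashMap<>();

    // Makes 'follower' a follower of 'followed'; repeating the call changes nothing.
    void create(long follower, long followed) {
        friends.computeIfAbsent(follower, k -> new HashSet<>()).add(followed);
        followers.computeIfAbsent(followed, k -> new HashSet<>()).add(follower);
    }

    // Deletes the friendship if it exists; otherwise does nothing.
    void destroy(long follower, long followed) {
        Set<Long> followedByUser = friends.get(follower);
        if (followedByUser != null) {
            followedByUser.remove(followed);
        }
        Set<Long> followingUser = followers.get(followed);
        if (followingUser != null) {
            followingUser.remove(follower);
        }
    }

    // Read-only view of the users that 'user' follows.
    Set<Long> friendsOf(long user) {
        return Collections.unmodifiableSet(
                friends.getOrDefault(user, Collections.emptySet()));
    }

    // Read-only view of the users that follow 'user'.
    Set<Long> followersOf(long user) {
        return Collections.unmodifiableSet(
                followers.getOrDefault(user, Collections.emptySet()));
    }
}
```

Keeping both directions costs extra memory, but it lets each lookup touch only the set it needs; whether that trade-off is right for your design is for you to decide.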

A tweet is a short message (140 characters or fewer) generated by a user. Each tweet is identified by a 64-bit unique integer, and ids must be assigned in increasing order: if one tweet is created after another, then it must have a larger id than the other.

Tweeter allows tweets to be read in two ways. First, it provides a mechanism for reading recent tweets generated by a particular user. Second, it provides a mechanism for reading recent tweets generated either by a user or by any of that user's friends. This second mechanism is the one most typically used by interactive applications.

Using HTTP for Requests

Applications interact with Tweeter over the network using the HTTP protocol. See the class lecture notes for basic information on HTTP. Each HTTP request specifies an operation for Tweeter to perform, such as "make user 100 a follower of user 200". This information is encoded using the URL for the request. For example, if Tweeter is listening on port 8080 of the local machine, the preceding request can be invoked with an HTTP POST request that specifies the following URL:

http://localhost:8080/friendships/create?my_id=100&user_id=200

The hierarchical portion of the URL (/friendships/create) specifies an operation to perform, and the query values (user_id and my_id) specify parameters for that operation.

The HTTP GET method is used for requests that retrieve information from the server without making any changes, such as /followers/ids.json, which returns ids for all of the users who are following a particular user. Requests that modify state on the server, such as /friendships/create, use the POST method in HTTP. In these requests, parameters can be specified either as query values in the URL, or in the body of the request. If parameters are specified in the body, they are encoded using the same notation as query values in the URL (e.g., my_id=100&user_id=200).
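
To make the pieces of a request concrete: the first line of an HTTP request carries the method, the request target (path plus optional query), and the protocol version. The following is a minimal sketch of splitting that line, assuming a well-formed line with single spaces between the three parts; the class name RequestLine and its fields are hypothetical, and the query is left undecoded here.

```java
// Hypothetical helper that splits an HTTP request line into its parts.
final class RequestLine {
    final String method; // e.g. "GET" or "POST"
    final String path;   // hierarchical part, e.g. "/friendships/create"
    final String query;  // raw text after '?', or "" if there is none

    private RequestLine(String method, String path, String query) {
        this.method = method;
        this.path = path;
        this.query = query;
    }

    // Parses a line such as "POST /friendships/create?my_id=100 HTTP/1.1".
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        String target = parts[1];
        int q = target.indexOf('?');
        String path = (q < 0) ? target : target.substring(0, q);
        String query = (q < 0) ? "" : target.substring(q + 1);
        return new RequestLine(parts[0], path, query);
    }
}
```

A server could compare the method field against the method each operation requires and report an error on a mismatch.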

Note that if an input parameter contains any characters other than letters, digits, hyphen (-), underscore (_), period (.), or tilde (~), those characters will be escaped using URL encoding. See the lecture notes for details on URL encoding. For example, if the status parameter for a tweet is "I'm on my way home", it will be encoded in the URL like this:

status=I%27m%20on%20my%20way%20home

Your Tweeter code must properly decode these values to extract the original text.
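
As an illustration of the decoding step, here is a hedged sketch that turns %XX escapes back into bytes and then interprets the bytes as UTF-8 (the character set is an assumption; the assignment does not name one). The class name FormDecoder and both method names are hypothetical. The '+' character is left unchanged, since the encoding rule above escapes spaces as %20.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for name=value&name=value text (URL query or POST body).
final class FormDecoder {
    // Replaces each %XX escape with the byte it denotes, then decodes as UTF-8.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape in: " + s);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape in: " + s);
                }
                out.write(hi * 16 + lo);
                i += 3;
            } else {
                // Copy the run of unescaped characters up to the next '%'.
                int j = i;
                while (j < s.length() && s.charAt(j) != '%') {
                    j++;
                }
                byte[] raw = s.substring(i, j).getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i = j;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Splits on '&' and '=', decoding each name and value separately.
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<>();
        if (encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }
}
```

Splitting before decoding matters: an escaped %26 inside a value must not be mistaken for the & that separates parameters.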

Results and JSON

Tweeter returns information back to applications using JSON format. See the lecture notes for details on the format of JSON objects, and see the descriptions of individual requests below for details on the specific values returned by each request. The JSON is returned as the body of the HTTP result; the response must include a Content-type header with value of application/json; this indicates to the recipient that the response is encoded in JSON format.
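
For concreteness, a response carrying a JSON body might be assembled roughly as follows. This is a sketch under assumptions: the 200 status line, the Content-Length and Connection headers, and the UTF-8 body encoding are choices made for this example rather than requirements stated in this document, and the class name JsonResponse is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical builder for a complete HTTP response with a JSON body.
final class JsonResponse {
    // Returns the raw bytes of the status line, headers, blank line, and body.
    static byte[] ok(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: application/json\r\n"
                + "Content-Length: " + body.length + "\r\n" // length in bytes, not chars
                + "Connection: close\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, all, 0, headBytes.length);
        System.arraycopy(body, 0, all, headBytes.length, body.length);
        return all;
    }
}
```

The resulting byte array can be written directly to the output stream of the client socket.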

Each JSON response consists of a single JSON object with zero or more named properties. For example, here is a response containing a single property named ids, whose value is an array of user identifiers:

{"ids": [44, 99, 307, 8216]}

If an error is detected while handling a request, such as a missing parameter, the JSON response contains a single property whose name is error and whose value is a string describing the problem, such as:

{"error": "missing parameter: user_id"}
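
The two response shapes above can be produced with a small hand-written encoder. The sketch below is one possible starting point, not a prescribed design: the class name Json and its methods are hypothetical, and it escapes double quotes, backslashes, and control characters, which JSON strings require.

```java
import java.util.List;

// Hypothetical hand-written JSON helpers (no JSON library involved).
final class Json {
    // Returns s as a JSON string literal, including the surrounding quotes.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds an error response object with a single "error" property.
    static String error(String message) {
        return "{\"error\": " + quote(message) + "}";
    }

    // Builds an object with a single "ids" property holding an array of numbers.
    static String ids(List<Long> ids) {
        StringBuilder sb = new StringBuilder("{\"ids\": [");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(ids.get(i));
        }
        return sb.append("]}").toString();
    }
}
```

For example, Json.error("missing parameter: user_id") yields the error object shown above.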

Tweeter Requests

Here are the specific URLs that your Tweeter server must support, along with their parameters and results. Note that each request must use a specific HTTP method (GET or POST); it is an error for a request to use the wrong method type. These requests are very similar to the requests supported by the real Twitter Web service (https://dev.twitter.com/rest/public).

/friendships/create

Method: POST
Parameters:

my_id	Identifier of a user that will become a follower.
user_id	Identifier of a user that will be followed.

Make user my_id a follower of user user_id; if it was already a follower, leave it that way. Returns an empty JSON object ({}).

/friendships/destroy

Method: POST
Parameters:

my_id	Identifier of the following user.
user_id	Identifier of the followed user.

If my_id is currently a follower of user user_id, then delete that friendship; if my_id is not currently a follower of user_id, do nothing. Returns an empty JSON object ({}).

/followers/ids.json

Method: GET
Parameters:

user_id	Identifier for a user.

Returns identifiers for all of the users who are followers of user_id. The identifiers are returned as an array in a property named ids. Here is an example result:

{"ids": [44, 99, 307, 8216]}

/friends/ids.json

Method: GET
Parameters:

user_id	Identifier for a user.

Returns identifiers for all of the users who are friends of user_id (i.e., all of the users for whom user_id is a follower). The identifiers are returned as an array in a property named ids. Here is an example result:

{"ids": [44, 99, 307, 8216]}

/statuses/update

Method: POST
Parameters:

my_id	Identifier of the user that created the tweet.
status	The contents of the tweet (a message of no more than 140 characters).

Create a new tweet for user my_id with the given message. The tweet must be assigned an identifier higher than the identifier for any tweet created before this one. Returns an empty JSON object ({}).

/statuses/home_timeline.json

Method: GET
Parameters:

my_id	Identifier for a user.
count	Maximum number of tweets to return (optional: defaults to 20).
max_id	Optional: if specified, the returned tweets will have ids no higher than this.

Returns the most recent tweets (i.e. highest tweet ids) created by my_id and all of my_id's friends, subject to the count and max_id parameters. The tweets are returned as an array in a property named tweets, and each tweet is described with four properties: id (the tweet's identifier), user (the identifier for the user that created the tweet), time (the time when the tweet was created), and text (the contents of the tweet message). The returned tweets must be in reverse chronological order (most recent tweet first). Here is an example result:

    {"tweets": [
      {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014",
       "text": "On my way home"},
      {"id": 20007, "user": 18, "time": "Mon Oct 27 17:13:22 PDT 2014",
       "text": "Chillin' by the pool"},
      {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014",
       "text": "Just saw a flying saucer!"}
    ]}
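
One way to assemble such a result is to merge per-user tweet lists from newest to oldest. The sketch below assumes, purely for illustration, that tweets are kept in memory as one list per user in increasing id order; the names Tweet, Timeline, and recent are hypothetical. The caller passes my_id together with my_id's friends (each user once), and Long.MAX_VALUE when max_id is absent.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical tweet record with the four properties returned to clients.
final class Tweet {
    final long id;
    final long user;
    final String time;
    final String text;

    Tweet(long id, long user, String time, String text) {
        this.id = id;
        this.user = user;
        this.time = time;
        this.text = text;
    }
}

final class Timeline {
    // Position within one user's list; moves toward older tweets.
    private static final class Cursor {
        final List<Tweet> tweets;
        int index;

        Cursor(List<Tweet> tweets, int index) {
            this.tweets = tweets;
            this.index = index;
        }

        Tweet current() {
            return tweets.get(index);
        }
    }

    // Returns up to 'count' tweets by 'users' with id <= maxId, newest first.
    static List<Tweet> recent(Map<Long, List<Tweet>> byUser,
                              Collection<Long> users, int count, long maxId) {
        // Max-heap on the id of the tweet each cursor currently points to.
        PriorityQueue<Cursor> heads = new PriorityQueue<Cursor>(
                (a, b) -> Long.compare(b.current().id, a.current().id));
        for (Long user : users) {
            List<Tweet> tweets = byUser.get(user);
            if (tweets == null) {
                continue;
            }
            int i = tweets.size() - 1;
            // Skip tweets newer than max_id (a binary search would also work).
            while (i >= 0 && tweets.get(i).id > maxId) {
                i--;
            }
            if (i >= 0) {
                heads.add(new Cursor(tweets, i));
            }
        }
        List<Tweet> result = new ArrayList<>();
        while (result.size() < count && !heads.isEmpty()) {
            Cursor c = heads.poll();
            result.add(c.current());
            if (--c.index >= 0) {
                heads.add(c); // re-queue with this user's next older tweet
            }
        }
        return result;
    }
}
```

With this layout the work per request grows with the number of friends and the requested count rather than with the total number of stored tweets.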

/statuses/user_timeline.json

Method: GET
Parameters:

my_id	Identifier for a user.
count	Maximum number of tweets to return (optional: defaults to 20).
max_id	Optional: if specified, the returned tweets will have ids no higher than this.

This request is similar to /statuses/home_timeline.json except that only tweets created by user my_id are returned (my_id's friends are not considered). Here is an example result:

    {"tweets": [
      {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014",
       "text": "On my way home"},
      {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014",
       "text": "Just saw a flying saucer!"}
    ]}

Command-Line Options

Your server must support at least the following command-line options, which may be specified when the server is started in order to configure it:

-port p	Port number on which the server should listen for incoming requests (default: 80).
-workspace path	Path to a directory that Tweeter can use to store its data in files. If this directory already contains information when Tweeter starts up, Tweeter should assume that this is old state left behind when a previous server crashed; the new server should use this information to initialize itself.
-help	If this option is specified (no value needed), Tweeter should print out a help message describing the command-line arguments, then it should exit without doing anything else.
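
A hand-rolled parser for these three options can be quite small. The following sketch is only one possibility; the class name Options and its fields are hypothetical, and leaving workspace as null when the option is absent is an assumption, since no default is specified above.

```java
// Hypothetical holder for the parsed command-line options.
final class Options {
    int port = 80;           // default listed for -port
    String workspace = null; // assumption: no default is specified
    boolean help = false;

    // Parses arguments such as: -port 8080 -workspace /some/dir
    static Options parse(String[] args) {
        Options o = new Options();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-port":
                    // NumberFormatException is a subclass of IllegalArgumentException.
                    o.port = Integer.parseInt(value(args, ++i, "-port"));
                    break;
                case "-workspace":
                    o.workspace = value(args, ++i, "-workspace");
                    break;
                case "-help":
                    o.help = true;
                    break;
                default:
                    throw new IllegalArgumentException("unknown option: " + args[i]);
            }
        }
        return o;
    }

    // Returns args[i], or reports that the named option lacks its value.
    private static String value(String[] args, int i, String name) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value for " + name);
        }
        return args[i];
    }
}
```

A main method could catch IllegalArgumentException, print the help message, and exit with a nonzero status.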

Durability

Your server must provide durable storage for both tweets and friendship information, so that information is not lost if a server crashes and restarts. This means that you must store information in files on disk and reuse the saved information when a server restarts. You may assume that servers do not crash in the middle of executing an operation; the only way a server crashes is for it to stop execution after completing an operation and returning the response. You may assume that the underlying operating system and storage device(s) are perfect and never crash or corrupt data.

It is up to you to decide how to represent information in files, but you must use ordinary files: do not use a database such as SQLite. You may find Java's serialization facilities useful for moving information to and from files.
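
As one illustration of file-based durability (an alternative to serialization, not a recommendation over it), the sketch below appends each tweet to a binary log and, on startup, replays the log to recover the last id used, so that ids keep increasing across restarts. The class name TweetLog and the record layout (id, user, text) are assumptions made for this example; a real design would also store the creation time and rebuild its in-memory state during replay.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical append-only tweet log stored in an ordinary file.
final class TweetLog implements AutoCloseable {
    private final DataOutputStream out;
    private long lastId = 0; // highest tweet id written so far

    // Opens (or creates) the log; existing records are replayed to find lastId.
    TweetLog(File file) throws IOException {
        if (file.exists()) {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(file)))) {
                while (true) {
                    long id = in.readLong();
                    in.readLong(); // user id (unused in this sketch)
                    in.readUTF();  // tweet text (unused in this sketch)
                    lastId = id;
                }
            } catch (EOFException endOfLog) {
                // Normal end of replay: every complete record has been read.
            }
        }
        out = new DataOutputStream(new FileOutputStream(file, true)); // append mode
    }

    // Appends one tweet and returns its newly assigned id.
    long append(long user, String text) throws IOException {
        long id = lastId + 1;
        out.writeLong(id);
        out.writeLong(user);
        out.writeUTF(text);
        out.flush(); // hand the bytes to the OS before the response is sent
        lastId = id;
        return id;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

The sketch relies on the assumptions stated above: no crash in the middle of an operation, and a perfect operating system and storage device, so a flush is treated as sufficient.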

Scalability and Performance

The real Twitter service is implemented by clusters of hundreds or thousands of servers running in large datacenters, with request handling divided among the servers and special-purpose storage servers to make the data durable. For your Tweeter project you will use only a single server that stores all of the data locally. However, you must take reasonable steps in your design to handle a large workload:

Your system must be capable of handling millions of users and millions of tweets per day (as of 2014, the real Twitter handles about 500 million tweets per day).
Your system must be able to handle large numbers of followers for a popular user such as Barack Obama or Lady Gaga, but you can assume that individual users don't have large numbers of friends (e.g. a user might have several hundred friends, but probably not several million).
Your system must keep large amounts of data in DRAM to improve performance. You may assume that a server machine running your Tweeter service will have 50-100 GB of DRAM. This should be enough to keep all of the basic user data in DRAM, as well as recent tweets (for example, one million tweets will probably occupy about 200-400 MB of DRAM, depending on how much information you keep for each tweet). Your design must allow the most common requests to be serviced entirely from information in DRAM without having to read information from disk.

Implementation Notes

Your implementation of Tweeter must satisfy the following requirements:

You must implement the Tweeter server in Java. I recommend using the Eclipse IDE, but you may use a different development environment if you prefer.
You must work in teams of two. This is important for three reasons: first, by working in teams you can attack a larger project, which leads to more interesting code structuring issues; second, the team approach means that you have someone with whom you can discuss your designs; and third, it reduces the number of projects that I have to read, which permits a larger class size.
Your most important goal is to create a clean, simple, and elegant code structure. Although I expect your code to (mostly) work, and it must implement the features described here, I will be judging it primarily on its structure; it's better to spend time cleaning up the structure and documentation than fixing every last bug.
You may not use any existing library package for generating JSON output: you must write your own module for this.
You may not use any existing library package for processing HTTP requests, extracting data from them, or generating HTTP responses: you must write your own code for this (designing these interfaces is part of the project). You may use only basic Java I/O libraries such as the Socket and InputStreamReader classes. If in doubt about using a particular class, check with me.
When designing your interfaces, I ask that you do this from scratch, without consulting any existing code that offers similar functionality. This is important so that you can think things through for yourself. In addition, many existing packages have bad interfaces, so it may not be good to use them for ideas.
Friend/follower relationships can come and go, but tweets are never deleted.
You do not need to implement access control: for example, anyone can create and delete friendships on behalf of any user.
Your implementation must do a "reasonable" job of error handling. For example, your code must handle gracefully any errors coming from the user, such as an improper implementation of HTTP, an unknown URL, or missing parameters. Your code must also handle errors coming from underlying classes such as the I/O system, at least to the point of logging a message and exiting cleanly. If a system error can be handled so that the system can continue operation, and this can be done without significant additional complexity, you should do so (this will be a qualitative value judgment). Your implementation must never die silently.
You do not need to use multi-threading in your implementation, and I recommend against it, since it will add quite a bit of additional complexity.

Testing

There are several different ways you can test your Tweeter implementation. One approach is to use a Web browser and type URLs into the URL bar. You may be able to use this for many things, but it will only generate GET HTTP requests.

Another option is to use the curl command-line tool to issue requests. Curl offers a large number of arguments that can be used to invoke almost any imaginable HTTP request. Here are a few examples of curl commands:

curl http://localhost:8080/friends/ids.json?user_id=100

This command issues a GET request for the specified URL and prints the body of the response.

curl --data "my_id=99&status=Simple%20update" http://localhost:8080/statuses/update

This command issues a POST request for the specified URL and includes the value of the --data option as the body of the request. You must make sure that the value of the --data option is properly URL-encoded.

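curl --data "my_id=99" --data-urlencode "status=Simple update" http://localhost:8080/statuses/update

This example is similar to the previous one, except that curl automatically escapes the value of the status parameter given with --data-urlencode. Note that --data-urlencode escapes everything after the first = in its argument, including any & characters, so each parameter that needs escaping should be passed in its own option; curl joins the options with &.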

I strongly recommend that you use logging to record information when interesting events happen; this will make debugging much easier. Logging can be as simple as printing messages to standard output; if you are really ambitious you can learn how to use log4j, but that is not necessary for this class. Real Web services use logging extensively. In some cases they may log every single request, or even multiple log messages per request. A good rule of thumb for logging is to log everything you can possibly afford to log (you can't afford to log something if it would make your log file too large to store on disk, or if the logging requests would have a significant impact on performance).

Development Environment

I recommend using the Eclipse development environment for the projects in this class, but if there is some other development environment that you prefer, that's fine too. Please configure your development environment so that indent widths are 4 spaces, and only space characters are stored in files, not tabs (this will make it easier for me to review your code: tab characters and/or 8-space indents result in very long lines in the code review tool).

Here's how you can configure Eclipse for this:

Go to Window->Preferences->Java->Code Style->Formatter
Select "Edit..."
Set "Indentation size" to 4, "Tab size" to 4, and "Tab policy" to "Spaces only", then save this (you may need to create a new named profile to save this information).
If you have already created some files using different indentation, you can reformat them by selecting the files in the Package Explorer, then right-clicking one of the files and selecting Source->Format.

Checklist

Here is a list of features to check in your project. I recommend going over this list several times as you design and build your project, to make sure you have included all the required elements.

Can parameters be specified in the body of POST requests, as well as in the URL for either GET or POST requests?
Do you handle escaping properly, both in parameters in URLs (or in a POST body) and in JSON output?
Do you properly escape strings in your JSON results? For example, what if a message contains double-quotes, or a newline character?
If a server crashes and restarts, do you make sure that tweet identifiers don't accidentally get reused?
If a server crashes and restarts, are recent changes still available, such as new friendships or tweets?

Submitting Your Project

You will submit your project by creating an issue on a Web-based code review tool. For this class we will use the Rietveld code review tool, which is hosted on Google App Engine (this tool has a few features that make it more convenient for this class than the GitHub code review facilities). Here is how to submit your project:

Create a git tag named project1 for the commit that you are submitting for the project. Push this tag to GitHub.
Download the script upload.py, which you will use to submit issues to the code review tool.
You will need a Google account to submit a code review, so create one now if you don't already have one.
Invoke the following command:
```
python upload.py --rev XXX..project1
```
where XXX is the git identifier for the very first (initial) commit for your repo.
When asked for "New issue subject", type "AAA/BBB Project 1", where AAA and BBB are the last names of the team members, in alphabetical order (use AAA/BBB/CCC if your team has three people).
Provide your Google account information when asked for email and password. Note: if you have two-step authentication enabled for your Google account, your password will be rejected; go to this page to generate a special app password to use for upload.py.

Once you have created the code review, check to make sure it is visible at cs190codereview.appspot.com.

If you are planning to use late days for this project (or any project) please send me an email before the project deadline so that I know your plans. Send me another email once you eventually upload your code review.

CS 190: Software Design Studio (Spring 2015)
