Tweeter Web Service

In this project you and a teammate will implement a Twitter-like social networking service that allows users to send and read short messages called tweets. The service will be accessed over the Web using the HTTP protocol, but it will not return HTML Web pages for display in a browser. Instead, Tweeter returns data in JSON format (JavaScript Object Notation); this structured form is intended for use by programs such as mobile applications.

Users and Tweets

Each Tweeter user is identified by a unique 64-bit integer. You do not need to worry about how these identifiers are assigned; you can assume that each user knows his or her identifier as well as the identifiers of anyone they wish to follow. Tweeter does not store user information such as name, address, etc.

Tweeter supports friend/follower relationships among users. Any user A can declare that any other user B is their friend. This means that A would like to read any tweets generated by B. If B is A's friend, then A is said to be a follower of B. Each user may follow any number of other users; friendships may be created and deleted at any time.
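
One way to hold friendships in memory is to record both directions of every relationship, so that a user's friends and a user's followers can each be found with a single lookup. The following is a minimal sketch using plain Java collections; the class and method names (FriendGraph, follow, unfollow, friendsOf, followersOf) are hypothetical, and making every method synchronized is simply the easiest way to tolerate concurrent request threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory friendship store (a sketch, not a required design).
// Both directions are stored so friends and followers are each one lookup.
class FriendGraph {
    // friends.get(a) = users that a has declared as friends (a follows them)
    private final Map<Long, Set<Long>> friends = new HashMap<>();
    // followers.get(b) = users that follow b
    private final Map<Long, Set<Long>> followers = new HashMap<>();

    // User a declares user b a friend, i.e. a becomes a follower of b.
    // Adding an existing friendship leaves the sets unchanged.
    synchronized void follow(long a, long b) {
        friends.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        followers.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Delete the friendship if it exists; otherwise nothing changes.
    synchronized void unfollow(long a, long b) {
        Set<Long> f = friends.get(a);
        if (f != null) f.remove(b);
        Set<Long> g = followers.get(b);
        if (g != null) g.remove(a);
    }

    // Return copies so callers can iterate without holding the lock.
    synchronized Set<Long> friendsOf(long user) {
        return new TreeSet<>(friends.getOrDefault(user, Set.of()));
    }

    synchronized Set<Long> followersOf(long user) {
        return new TreeSet<>(followers.getOrDefault(user, Set.of()));
    }
}
```

With this structure, creating and destroying a friendship reduce to follow(my_id, user_id) and unfollow(my_id, user_id), and both are naturally idempotent.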

A tweet is a short message (140 characters or less) generated by a user. Each tweet is identified by a 64-bit unique integer, and ids must be assigned in increasing order: if one tweet is created after another, then it must have a larger id than the other.
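
A minimal sketch of increasing tweet ids, assuming requests may be handled by several threads at once: an AtomicLong hands out each id exactly once and in increasing order. The class name TweetIdAllocator is hypothetical, and the starting value is assumed to come from whatever state you recover from disk after a restart.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical id allocator: ids strictly increase in allocation order,
// even when several request threads create tweets concurrently.
class TweetIdAllocator {
    private final AtomicLong last;

    // lastUsed is the highest id already assigned (for example, recovered
    // from disk after a restart), or 0 for a brand-new workspace.
    TweetIdAllocator(long lastUsed) {
        this.last = new AtomicLong(lastUsed);
    }

    // Atomically returns lastUsed + 1, lastUsed + 2, ... on successive calls.
    long next() {
        return last.incrementAndGet();
    }
}
```

If tweets are also inserted into per-user lists, allocating the id and inserting the tweet under the same lock prevents a reader from seeing tweet 43 before tweet 42 is visible.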

Tweeter allows tweets to be read in two ways. First, it provides a mechanism for reading recent tweets generated by a particular user. Second, it provides a mechanism for reading recent tweets generated either by a user or by any of that user's friends. This second mechanism is the one most typically used by interactive applications.

Using HTTP for Requests

Applications interact with Tweeter over the network using the HTTP protocol. See the class lecture notes for basic information on HTTP. Each HTTP request specifies an operation for Tweeter to perform, such as "make user 100 a follower of user 200". This information is encoded using the URL for the request. For example, if Tweeter is listening on port 8080 of the local machine, the preceding request can be invoked with an HTTP POST request that specifies the following URL:

http://localhost:8080/friendships/create?my_id=100&user_id=200

The hierarchical portion of the URL (/friendships/create) specifies an operation to perform, and the query values (user_id and my_id) specify parameters for that operation.

The HTTP GET method is used for requests that retrieve information from the server without making any changes, such as /followers/ids.json, which returns ids for all of the users who are following a particular user. Requests that modify state on the server, such as /friendships/create, use the POST method in HTTP. In these requests, parameters can be specified either as query values in the URL, or in the body of the request. If parameters are specified in the body, they are encoded using the same notation as query values in the URL (e.g., my_id=100&user_id=200).

Note that if an input parameter contains any characters other than letters, digits, hyphen (-), underscore (_), period (.), or tilde (~), those characters will be escaped using URL encoding. See the lecture notes for details on URL encoding. For example, if the status parameter for a tweet is actually "I'm on my way home", it will be encoded in the URL like this:

status=I%27m%20on%20my%20way%20home

Your Tweeter code must properly decode these values to extract the original text.
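
A sketch of extracting parameters, assuming the JDK's built-in com.sun.net.httpserver server (where HttpExchange.getRequestURI().getRawQuery() returns the still-encoded query string) and Java 10 or later for URLDecoder.decode(String, Charset). The same routine handles both the query string and a POST body, since both use the name=value&name=value notation. The class name Params is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decode "name=value&name=value" text, which appears
// both as the raw query part of a URL and as the body of a POST request.
class Params {
    static Map<String, String> parse(String encoded, Map<String, String> into) {
        if (encoded == null || encoded.isEmpty()) return into;
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            // Split BEFORE decoding, so an escaped '&' or '=' (%26, %3D)
            // inside a value is not mistaken for a separator.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // decode throws IllegalArgumentException for malformed escapes
            // such as "%zz"; a handler can turn that into an error response.
            into.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                     URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return into;
    }

    // Merge query values and body values into one map (body wins on conflict).
    static Map<String, String> fromRequest(String rawQuery, String body) {
        Map<String, String> params = new HashMap<>();
        parse(rawQuery, params);
        parse(body, params);
        return params;
    }
}
```

Be careful to pass the raw query: java.net.URI.getQuery() returns the already-decoded text, which would break the split on & whenever a value contains %26. URLDecoder also turns + into a space, following the application/x-www-form-urlencoded convention.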

Results and JSON

Tweeter returns information back to applications using JSON format. See the lecture notes for details on the format of JSON objects, and see the descriptions of individual requests below for details on the specific values returned by each request. The JSON is returned as the body of the HTTP result; the response must include a Content-type header with value of application/json; this indicates to the recipient that the response is encoded in JSON format.
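
Assuming you build on the JDK's com.sun.net.httpserver package, a response helper might look like the following sketch; the class name JsonReply is hypothetical. It sets the Content-type header before calling sendResponseHeaders, because the headers are sent by that call and later changes have no effect.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: send one JSON object as the HTTP response body.
class JsonReply {
    static void send(HttpExchange ex, int status, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        // Must be set before sendResponseHeaders, which transmits the headers.
        ex.getResponseHeaders().set("Content-Type", "application/json");
        // The length is in bytes, not characters; a length of 0 would
        // request chunked encoding instead.
        ex.sendResponseHeaders(status, body.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(body);
        }
    }
}
```

A handler registered with server.createContext(path, handler) would build the JSON text for its request and finish by calling JsonReply.send(exchange, 200, json).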

Each JSON response consists of a single JSON object with zero or more named properties. For example, here is a response containing a single property named ids, whose value is an array of user identifiers:

{"ids": [44, 99, 307, 8216]}

If an error is detected while handling a request, such as a missing parameter, the JSON response contains a single property whose name is error and whose value is a string describing the problem, such as:

{"error": "missing parameter: user_id"}
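
The responses in this spec are simple enough to build by hand. The sketch below (hypothetical class Json) shows the ids and error shapes, including the string escaping that tweet text will need; a JSON library such as Gson or Jackson would be an equally reasonable choice.

```java
import java.util.Collection;
import java.util.StringJoiner;

// Hypothetical hand-built JSON for the simplest Tweeter response shapes.
class Json {
    // Quote a string, escaping the characters JSON requires to be escaped:
    // double quote, backslash, and control characters below 0x20.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters use a four-digit hex escape.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // {"ids": [44, 99, 307, 8216]}; an empty collection yields {"ids": []}
    static String ids(Collection<Long> ids) {
        StringJoiner list = new StringJoiner(", ", "[", "]");
        for (long id : ids) list.add(Long.toString(id));
        return "{\"ids\": " + list + "}";
    }

    // {"error": "..."} with the message safely quoted
    static String error(String message) {
        return "{\"error\": " + quote(message) + "}";
    }
}
```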

Tweeter Requests

Here are the specific URLs that your Tweeter server must support, along with their parameters and results. Note that each request must use a specific HTTP method (GET or POST); it is an error for a request to use the wrong method type. These requests are very similar to the requests supported by the real Twitter Web service (https://dev.twitter.com/rest/public).
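
Because each path accepts exactly one method, a small table is one way to reject wrong-method and unknown requests before dispatching to a handler. In this sketch the class name Routes and the exact error messages are hypothetical; the spec only requires that the response contain an error property.

```java
import java.util.Map;

// Hypothetical dispatch check: the one HTTP method each Tweeter path accepts.
class Routes {
    static final Map<String, String> METHOD = Map.of(
        "/friendships/create", "POST",
        "/friendships/destroy", "POST",
        "/followers/ids.json", "GET",
        "/friends/ids.json", "GET",
        "/statuses/update", "POST",
        "/statuses/home_timeline.json", "GET",
        "/statuses/user_timeline.json", "GET");

    // Returns null if the request may proceed, otherwise an error message
    // suitable for {"error": ...}.
    static String check(String path, String method) {
        String expected = METHOD.get(path);
        if (expected == null) return "unknown request: " + path;
        if (!expected.equals(method)) {
            return path + " requires " + expected + ", not " + method;
        }
        return null;
    }
}
```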

/friendships/create

Method: POST
Parameters:

my_id: Identifier of a user that will become a follower.
user_id: Identifier of a user that will be followed.

Make user my_id a follower of user user_id; if it was already a follower, leave it that way. Returns an empty JSON object ({}).

/friendships/destroy

Method: POST
Parameters:

my_id: Identifier of the following user.
user_id: Identifier of the followed user.

If my_id is currently a follower of user user_id, then delete that friendship; if my_id is not currently a follower of user_id, do nothing. Returns an empty JSON object ({}).

/followers/ids.json

Method: GET
Parameters:

user_id: Identifier for a user.

Returns identifiers for all of the users who are followers of user_id. The identifiers are returned as an array in a property named ids. Here is an example result:

{"ids": [44, 99, 307, 8216]}

/friends/ids.json

Method: GET
Parameters:

user_id: Identifier for a user.

Returns identifiers for all of the users who are friends of user_id (i.e., all of the users for whom user_id is a follower). The identifiers are returned as an array in a property named ids. Here is an example result:

{"ids": [44, 99, 307, 8216]}

/statuses/update

Method: POST
Parameters:

my_id: Identifier of the user that created the tweet.
status: The contents of the tweet (a message of no more than 140 characters).

Create a new tweet for user my_id with the given message. The tweet must be assigned an identifier higher than the identifier for any tweet created before this one. Returns an empty JSON object ({}).
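
A sketch of validating the /statuses/update parameters before creating the tweet, assuming the parameters have already been decoded into a Map. The spec does not say how to count characters; this version counts Unicode code points, so a character outside the Basic Multilingual Plane (such as most emoji) counts once rather than as two UTF-16 chars. That counting rule, the class name UpdateCheck, and the error messages are assumptions.

```java
import java.util.Map;

// Hypothetical parameter check for /statuses/update.
class UpdateCheck {
    static final int MAX_CHARS = 140;

    // Returns an error message, or null if the parameters are acceptable.
    static String validate(Map<String, String> params) {
        String myId = params.get("my_id");
        String status = params.get("status");
        if (myId == null) return "missing parameter: my_id";
        if (status == null) return "missing parameter: status";
        try {
            Long.parseLong(myId);
        } catch (NumberFormatException e) {
            return "my_id is not a 64-bit integer: " + myId;
        }
        // Count code points, not UTF-16 chars (an assumption about
        // what "140 characters" means).
        if (status.codePointCount(0, status.length()) > MAX_CHARS) {
            return "status is longer than " + MAX_CHARS + " characters";
        }
        return null;
    }
}
```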

/statuses/home_timeline.json

Method: GET
Parameters:

my_id: Identifier for a user.
count: Maximum number of tweets to return (optional: defaults to 20).
max_id: Optional: if specified, the returned tweets will have ids no higher than this.

Returns the most recent tweets (i.e. highest tweet ids) created by my_id and all of my_id's friends, subject to the count and max_id parameters. The tweets are returned as an array in a property named tweets, and each tweet is described with four properties: id (the tweet's identifier), user (the identifier for the user that created the tweet), time (the time when the tweet was created), and text (the contents of the tweet message). The returned tweets must be in reverse chronological order (most recent tweet first). Here is an example result:

    {"tweets": [
      {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014",
       "text": "On my way home"},
      {"id": 20007, "user": 18, "time": "Mon Oct 27 17:13:22 PDT 2014",
       "text": "Chillin' by the pool"},
      {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014",
       "text": "Just saw a flying saucer!"}
    ]}
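
One way to produce a home timeline, sketched under the assumption that each user's tweets are kept in a TreeMap keyed by tweet id: walk each relevant user's tweets newest-first (starting at max_id) and merge those walks with a priority queue, stopping after count tweets. This k-way merge examines roughly count tweets plus one per user, rather than every tweet those users ever posted. The class and record names are hypothetical, records need Java 16 or later, and the class is not thread-safe as written.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical timeline store: each user's tweets in a TreeMap keyed by
// tweet id, so "newest first" is simply descending key order.
class Timelines {
    record Tweet(long id, long user, String time, String text) {}

    // The next unread tweet from one user's list, plus the rest of that list.
    private record Head(Tweet tweet, Iterator<Tweet> rest) {}

    private final Map<Long, NavigableMap<Long, Tweet>> byUser = new HashMap<>();

    void add(Tweet t) {
        byUser.computeIfAbsent(t.user(), k -> new TreeMap<>()).put(t.id(), t);
    }

    // Newest tweets by any of 'users' with id <= maxId, at most 'count'.
    // For a home timeline, pass a Set holding my_id plus my_id's friends
    // (a Set, so nobody is merged twice).
    List<Tweet> timeline(Collection<Long> users, long maxId, int count) {
        // Max-heap on tweet id: the top is the newest tweet not yet returned.
        PriorityQueue<Head> heap = new PriorityQueue<>(
                (a, b) -> Long.compare(b.tweet().id(), a.tweet().id()));
        for (long u : users) {
            NavigableMap<Long, Tweet> mine = byUser.get(u);
            if (mine == null) continue;
            // headMap(maxId, true) applies max_id; descendingMap is newest first.
            Iterator<Tweet> it = mine.headMap(maxId, true).descendingMap().values().iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<Tweet> out = new ArrayList<>();
        while (out.size() < count && !heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.tweet());
            if (h.rest().hasNext()) heap.add(new Head(h.rest().next(), h.rest()));
        }
        return out;
    }
}
```

Calling timeline with a single-element list containing only my_id gives the result needed for /statuses/user_timeline.json.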

/statuses/user_timeline.json

Method: GET
Parameters:

my_id: Identifier for a user.
count: Maximum number of tweets to return (optional: defaults to 20).
max_id: Optional: if specified, the returned tweets will have ids no higher than this.

This request is similar to /statuses/home_timeline.json except that only tweets created by user my_id are returned (my_id's friends are not considered). Here is an example result:

    {"tweets": [
      {"id": 20115, "user": 84, "time": "Mon Oct 27 18:02:57 PDT 2014",
       "text": "On my way home"},
      {"id": 18442, "user": 84, "time": "Sun Oct 26 20:52:35 PDT 2014",
       "text": "Just saw a flying saucer!"}
    ]}
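
Both timeline requests share the optional count and max_id parameters. Here is a sketch of applying the defaults, assuming the parameters are already in a decoded Map. Treating a missing max_id as Long.MAX_VALUE and rejecting a negative count are choices of this sketch rather than requirements stated above, and the class name TimelineParams is hypothetical.

```java
import java.util.Map;

// Hypothetical parser for the parameters of both timeline requests.
// Any IllegalArgumentException message can become {"error": ...}.
class TimelineParams {
    final long myId;
    final int count;
    final long maxId;   // Long.MAX_VALUE when max_id is absent

    TimelineParams(Map<String, String> p) {
        String id = p.get("my_id");
        if (id == null) throw new IllegalArgumentException("missing parameter: my_id");
        myId = parseLong("my_id", id);

        String c = p.get("count");
        long n = (c == null) ? 20 : parseLong("count", c);   // default from the spec
        if (n < 0) throw new IllegalArgumentException("count must not be negative");
        count = (int) Math.min(n, Integer.MAX_VALUE);

        String m = p.get("max_id");
        maxId = (m == null) ? Long.MAX_VALUE : parseLong("max_id", m);
    }

    private static long parseLong(String name, String value) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(name + " must be an integer: " + value);
        }
    }
}
```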

Command-Line Options

Your server must support at least the following command-line options, which may be specified when the server is started in order to configure it:

-port p: Port number on which the server should listen for incoming requests (default: 80).
-workspace path: Path to a directory that Tweeter can use to store its data in files. If this directory already contains information when Tweeter starts up, Tweeter should assume that this is old state left behind when a previous server crashed; the new server should use this information to initialize itself.
-help: If this option is specified (no value needed), Tweeter should print out a help message describing the command-line arguments, then it should exit without doing anything else.
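
A hand-rolled parser is enough for three options. This sketch needs Java 14 or later for the arrow-style switch; the class name Options and the program name in the usage text are hypothetical, and workspace is left null when the option is omitted because no default is given above.

```java
import java.nio.file.Path;

// Hypothetical command-line parser for -port, -workspace, and -help.
class Options {
    int port = 80;            // default from the spec
    Path workspace = null;    // the spec gives no default
    boolean help = false;

    static Options parse(String[] args) {
        Options o = new Options();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-port" -> o.port = Integer.parseInt(value(args, ++i, "-port"));
                case "-workspace" -> o.workspace = Path.of(value(args, ++i, "-workspace"));
                case "-help" -> o.help = true;
                default -> throw new IllegalArgumentException("unknown option: " + args[i]);
            }
        }
        return o;
    }

    // The value that must follow an option such as -port.
    private static String value(String[] args, int i, String option) {
        if (i >= args.length) throw new IllegalArgumentException(option + " requires a value");
        return args[i];
    }

    static String usage() {
        return String.join("\n",
            "Usage: tweeter [-port p] [-workspace path] [-help]",
            "  -port p          port to listen on (default: 80)",
            "  -workspace path  directory for durable state",
            "  -help            print this message and exit");
    }
}
```

A main method would call Options.parse(args), print Options.usage() and return if help is set, and otherwise recover state from the workspace and start listening on port.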

Durability

Your server must provide durable storage for both tweets and friendship information, so that information is not lost if a server crashes and restarts. This means that you must store information in files on disk and reuse the saved information when a server restarts. You may assume that servers do not crash in the middle of executing an operation; the only way a server crashes is for it to stop execution after completing an operation and returning the response. You may assume that the underlying operating system and storage device(s) are perfect and never crash or corrupt data.

It is up to you to decide how to represent information in files, but you must use ordinary files: do not use a database such as SQLite. You may find Java's serialization facilities useful for moving information to and from files.
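
One simple durable representation is an append-only log of state-changing operations, replayed at startup to rebuild the in-memory structures; this is an alternative to Java serialization, not the required approach. The sketch assumes one operation per line with URL-encoded, space-separated fields; the class name OpLog and the operation names used with it (such as follow and update) are hypothetical. Because you may assume the operating system and disk never fail, flushing to the operating system before replying is sufficient here; a production service would also need to force data to the disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only operation log. Each state-changing request is
// appended and flushed before its response is sent; replay() returns the
// operations in order so the server can re-apply them at startup.
class OpLog implements AutoCloseable {
    private final BufferedWriter out;

    OpLog(Path file) throws IOException {
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Fields are URL-encoded, so no field can contain the space or newline
    // characters used as separators.
    synchronized void append(String... fields) throws IOException {
        List<String> encoded = new ArrayList<>();
        for (String f : fields) encoded.add(URLEncoder.encode(f, StandardCharsets.UTF_8));
        out.write(String.join(" ", encoded));
        out.newLine();
        out.flush();   // hand the bytes to the OS before replying
    }

    static List<String[]> replay(Path file) throws IOException {
        List<String[]> ops = new ArrayList<>();
        if (!Files.exists(file)) return ops;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            // limit -1 keeps trailing empty fields (e.g., an empty status)
            String[] f = line.split(" ", -1);
            for (int i = 0; i < f.length; i++) {
                f[i] = URLDecoder.decode(f[i], StandardCharsets.UTF_8);
            }
            ops.add(f);
        }
        return ops;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```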

Scalability and Performance

The real Twitter service is implemented by clusters of hundreds or thousands of servers running in large datacenters, with request handling divided among the servers and special-purpose storage servers to make the data durable. For your Tweeter project you will use only a single server that stores all of the data locally. However, you must take reasonable steps in your design to handle a large workload:

Implementation Notes

Your implementation of Tweeter must satisfy the following requirements:

Testing

There are several different ways you can test your Tweeter implementation. One approach is to use a Web browser and type URLs into the URL bar. You may be able to use this for many things, but it will only generate GET HTTP requests.

Another option is to use the curl command-line tool to issue requests. Curl offers a large number of arguments that can be used to invoke almost any imaginable HTTP request. Here are a few examples of curl commands:

curl "http://localhost:8080/friends/ids.json?user_id=100"

This command issues a GET request for the specified URL and prints the body of the response.

curl --data "my_id=99&status=Simple%20update" http://localhost:8080/statuses/update

This command issues a POST request for the specified URL and includes the value of the --data option as the body of the request. You must make sure that the value of the --data option is properly URL-encoded.

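curl --data-urlencode "my_id=99" --data-urlencode "status=Simple update" http://localhost:8080/statuses/update

This example is similar to the previous one, except that curl URL-encodes the value part of each --data-urlencode option (the text after the first =) and joins the options with &, so you do not need to escape the status text yourself. Give each parameter its own --data-urlencode option: if you write "my_id=99&status=Simple update" as a single option, curl will encode the & and the second = as part of the my_id value.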
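
For repeatable tests you may prefer a small Java client to typing curl commands. This sketch uses java.net.http.HttpClient (Java 11 or later); the class name TweeterClient is hypothetical. It encodes each value with URLEncoder, which writes spaces as + rather than %20, so your server should accept both (java.net.URLDecoder does).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical test client that issues POST requests with form-encoded bodies.
class TweeterClient {
    // Build "name=value&name=value", URL-encoding every name and value.
    static String form(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    static HttpResponse<String> post(String url, Map<String, String> params)
            throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form(params)))
                .build();
        return HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

For example, TweeterClient.post("http://localhost:8080/statuses/update", Map.of("my_id", "99", "status", "Simple update")) sends the same request as the curl command above, and the returned response's body() holds the JSON reply.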

I strongly recommend that you use logging to record information when interesting events happen; this will make debugging much easier. Logging can be as simple as printing messages to standard output; if you are really ambitious you can learn how to use log4j, but that is not necessary for this class. Real Web services use logging extensively. In some cases they may log every single request, or even multiple log messages per request. A good rule of thumb for logging is to log everything you can possibly afford to log (you can't afford to log something if it would make your log file too large to store on disk, or if the logging requests would have a significant impact on performance).
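
If you want something a little more structured than printing to standard output without adopting log4j, the JDK's built-in java.util.logging package is one option. A minimal sketch, with a hypothetical class RequestLog and message format, that writes one line per request:

```java
import java.util.logging.Logger;

// Hypothetical one-line-per-request log. By default, java.util.logging
// prints INFO and higher messages to standard error.
class RequestLog {
    static final Logger LOG = Logger.getLogger("tweeter");

    static void request(String method, String path, String rawQuery, int status, long micros) {
        String query = (rawQuery == null) ? "" : "?" + rawQuery;
        LOG.info(String.format("%s %s%s -> %d (%d us)", method, path, query, status, micros));
    }
}
```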

Development Environment

I recommend using the Eclipse development environment for the projects in this class, but if there is some other development environment that you prefer, that's fine too. Please configure your development environment so that indent widths are 4 spaces, and only space characters are stored in files, not tabs (this will make it easier for me to review your code: tab characters and/or 8-space indents result in very long lines in the code review tool).

Here's how you can configure Eclipse for this:

Checklist

Here is a list of features to check in your project. I recommend going over this list several times as you design and build your project, to make sure you have included all the required elements.

Submitting Your Project

You will submit your project by creating an issue on a Web-based code review tool. For this class we will use the Rietveld code review tool, which is hosted on Google App Engine (this tool has a few features that make it more convenient for this class than the GitHub code review facilities). Here is how to submit your project:

Once you have created the code review, check to make sure it is visible at cs190codereview.appspot.com.

If you are planning to use late days for this project (or any project) please send me an email before the project deadline so that I know your plans. Send me another email once you eventually upload your code review.