Compute cosines to find out whether Doc1 or Doc2 will be ranked higher for the two-word query "Linus pumpkin", given these counts for the (only) 3 documents in the corpus:
term      Doc1  Doc2  Doc3
--------------------------
Linus       10     0     1
Snoopy       1     4     0
pumpkin      4   100    10
Do this by computing the tf-idf cosine between the query and Doc1 and the cosine between
the query and Doc2, and choose the higher value. Use the ltc.lnn weighting variation (remember that the notation is ddd.qqq: the first three letters give the document weighting and the last three give the query weighting),
referring to the following table:
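The ltc.lnn computation can be sketched in Python as follows. This is a minimal sketch, not a reference solution. Because the weighting table is not reproduced here, it assumes the standard SMART definitions with base-10 logarithms: `l` = 1 + log10(tf) for tf > 0 (else 0), `t` = log10(N/df), `c` = cosine (L2) normalization, and `n` = no idf / no normalization. The helper names (`log_tf`, `idf`, `ltc`, `lnn`, `score`) are hypothetical.

```python
import math

# Raw term counts from the table above; N = 3 documents in the corpus.
COUNTS = {
    "Doc1": {"Linus": 10, "Snoopy": 1, "pumpkin": 4},
    "Doc2": {"Linus": 0, "Snoopy": 4, "pumpkin": 100},
    "Doc3": {"Linus": 1, "Snoopy": 0, "pumpkin": 10},
}
VOCAB = ["Linus", "Snoopy", "pumpkin"]
N = len(COUNTS)


def log_tf(tf: int) -> float:
    """Assumed SMART 'l': 1 + log10(tf) for tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def idf(term: str) -> float:
    """Assumed SMART 't': log10(N / df), where df = number of docs containing term."""
    df = sum(1 for doc in COUNTS.values() if doc[term] > 0)
    return math.log10(N / df)


def ltc(doc: str) -> dict[str, float]:
    """Document side 'ltc': log tf times idf, then cosine (L2) normalization."""
    raw = {t: log_tf(COUNTS[doc][t]) * idf(t) for t in VOCAB}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()}


def lnn(query_terms: list[str]) -> dict[str, float]:
    """Query side 'lnn': log tf only; no idf and no normalization."""
    return {t: log_tf(query_terms.count(t)) for t in VOCAB}


def score(doc: str, query_terms: list[str]) -> float:
    """Dot product of the ltc document vector and the lnn query vector."""
    d, q = ltc(doc), lnn(query_terms)
    return sum(d[t] * q[t] for t in VOCAB)


query = ["Linus", "pumpkin"]
for name in ("Doc1", "Doc2"):
    print(name, round(score(name, query), 4))
```

Under these assumptions, "pumpkin" occurs in all three documents, so its idf is log10(3/3) = 0 and it contributes nothing; Doc1 scores 2/sqrt(5), about 0.894, while Doc2, which contains no "Linus", scores 0. Since the lnn query vector is not normalized, the score is strictly a dot product rather than a true cosine, but normalizing the query would scale every document's score by the same constant and leave the ranking unchanged.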
What is the Euclidean distance between A' and B' (using raw term frequency)?
What is the cosine similarity between A' and B' (using raw term frequency)?
What does this say about using cosine similarity as opposed to Euclidean distance in information retrieval?
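The two measures in these questions can be sketched as follows. The vectors `u` and `v` below are hypothetical illustrations, not the A' and B' of the exercise (whose definitions are not reproduced here); `v` has the same term proportions as `u` but is three times as long, as if the same text were repeated three times.

```python
import math


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Straight-line distance: sqrt(sum_i (u_i - v_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """(u . v) / (|u| |v|); depends only on the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical raw term-frequency vectors (NOT the A' and B' of the exercise).
u = [1.0, 2.0, 0.0]
v = [3.0, 6.0, 0.0]  # v = 3 * u

print(euclidean_distance(u, v))  # sqrt(4 + 16) = sqrt(20)
print(cosine_similarity(u, v))   # 1, up to floating-point rounding
```

Scaling a vector moves it away from the original in Euclidean terms but leaves the angle, and hence the cosine, unchanged; this length invariance is the property the last question is probing.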