Nick Penney


RISD Semester Project

March 2013

Newsmilk is a news aggregation site.

Newsmilk uses latent Dirichlet allocation (LDA) to analyze the full text of 5,000 to 10,000 news articles from sources around the world each morning and compiles a list of the most frequently mentioned words and phrases.
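For readers unfamiliar with LDA, here is a minimal sketch of the idea, assuming the articles have already been fetched as plain strings. The project names only "a machine learning library for Python," so rather than guess which one, this sketch swaps in a small hand-written collapsed Gibbs sampler that uses only the standard library. The function names, the stopword list, the sample articles and the parameter values (`alpha`, `beta`, `iterations`) are all hypothetical illustrations, not Newsmilk's actual code.

```python
import random
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "with", "at", "by", "as"}

def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns a word-count Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word

def top_words(topic_word, n=5):
    """The n most frequent words in each topic: the raw list an editor curates from."""
    return [[w for w, c in counts.most_common() if c > 0][:n] for counts in topic_word]

# Usage with a few made-up headlines.
articles = [
    "Election results: voters turn out as the election count begins",
    "Storm floods coastal towns; storm warnings remain",
    "Candidates debate ahead of the election vote",
    "Flood waters rise after the storm hits the coast",
]
docs = [tokenize(a) for a in articles]
topics = lda_gibbs(docs, num_topics=2, iterations=100)
for words in top_words(topics):
    print(words)
```

A real run over thousands of articles would use a proper library implementation, a fuller stopword list and phrase detection; the sampler above only shows what "find topics, then list their most frequent words" means mechanically.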

Three to five topics are curated from the list and posted to the site, each with at least three reliable citations and a link to the original source, if available.
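The curation is editorial, but the posting rule it applies can be stated as a simple check. The sketch below is hypothetical: the `Story` structure, the `postable` helper, and the choice to count distinct citation URLs are assumptions made for illustration, not part of the original site.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_CITATIONS = 3             # at least three reliable citations per story
MIN_STORIES, MAX_STORIES = 3, 5

@dataclass
class Story:
    """A curated topic ready to post (hypothetical structure)."""
    title: str
    citations: List[str] = field(default_factory=list)  # URLs of reliable sources
    original_url: Optional[str] = None                   # link to the original source, if available

def postable(stories: List[Story]) -> List[Story]:
    """Keep stories with enough distinct citations, capped at MAX_STORIES.

    Stories are taken in the order given (e.g. the curator's ranking).
    Raises ValueError if fewer than MIN_STORIES qualify.
    """
    ok = [s for s in stories if len(set(s.citations)) >= MIN_CITATIONS]
    if len(ok) < MIN_STORIES:
        raise ValueError(f"only {len(ok)} stories have {MIN_CITATIONS}+ citations")
    return ok[:MAX_STORIES]
```

Counting distinct URLs means a story cited three times by the same source does not pass; whether the real site applied that rule is not stated.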

I used a Python machine learning library to analyze the article text and built the site with PHP.