Here are some of the datasets that have been used in our presentations and hack sessions. They are recommended for study and for use in demonstrations.

Source: Description Link (Contributor)
UC Irvine Machine Learning datasets: shellfish size vs. weight, income vs. demographics, etc. Search (Hannes)
Stack Overflow: Q&A about Open Data Subscribe or Search (John)
Criteo: Anonymized web click logs, similar to those from a prior Kaggle competition 23 files, ~1 TB (Hobs)
Yahoo: Anonymized web click logs (access must be requested from an e-mail address in the .edu TLD) Register with your .edu e-mail address (Hobs)