Data Sources

This page describes several places where you might look for interesting data to use in your SPIS projects.

Here is an overview, followed by more specific information about each one.

Working with Reddit Data

Corgis datasets for Python

When working with the Corgis datasets for Python, be sure to read the part about the test=False parameter.

For many of these datasets, you only get a small sample of the data when you call the functions without that parameter, as in this code:

import cars
list_of_car = cars.get_cars()
list_of_car = cars.get_cars_by_year("2001")
list_of_car = cars.get_cars_by_make("'Pontiac'")

If you instead set the test parameter to False, you get a much larger dataset, one that could be considered “big data”:

import cars
# These may be slow!
list_of_car = cars.get_cars(test=False)
list_of_car = cars.get_cars_by_year("2001", test=False)
list_of_car = cars.get_cars_by_make("'Pontiac'", test=False)
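To get a feel for how much the test parameter changes things, it can help to compare the sample and the full data side by side. The following is only a rough sketch: compare_sample_and_full and fake_get_cars are hypothetical names made up for this illustration and are not part of the Corgis library. The stand-in loader lets the sketch run without the cars module, and it assumes the real loader returns a list, as the variable name list_of_car suggests.

```python
import time


def compare_sample_and_full(loader):
    """Call loader() and loader(test=False); record counts and load times."""
    results = {}
    for label, kwargs in (("sample", {}), ("full", {"test": False})):
        start = time.perf_counter()
        records = loader(**kwargs)
        elapsed = time.perf_counter() - start
        results[label] = {"count": len(records), "seconds": elapsed}
    return results


# Hypothetical stand-in for a Corgis loader so the sketch runs on its own.
# It mimics only the test parameter; the sizes here are invented.
def fake_get_cars(test=True):
    return list(range(10)) if test else list(range(5000))


summary = compare_sample_and_full(fake_get_cars)
print("sample records:", summary["sample"]["count"])
print("full records:", summary["full"]["count"])
```

With the cars module imported, you could try passing cars.get_cars in place of fake_get_cars. If the full load turns out to be slow, consider loading it once and reusing the resulting list rather than calling the function repeatedly.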