A tool to automatically geocode and analyze location-based data

As state- and national-level government agencies continue to publish community-related data online (mass.gov/data, NY community health data), there is an exciting opportunity to look for rich information across datasets.

For example, one dataset may contain low birth rates across counties in New York, while another may contain youth pregnancy data. Exploring correlations between the two datasets is an important first step toward uncovering the next big scoop (or just interesting facts!).

Throughout the semester, I identified three hurdles that make these types of analyses difficult.

  1. Data import: Simply getting data into a program you’re comfortable with is hard. The data may be stored as JSON or displayed as an HTML table on a website, and government datasets often arrive as hundreds of files in a single zip file.
  2. Extracting location information: Many datasets are difficult to work with because their statistics (e.g., low birth rates) are assigned to communities, cities, counties, or states. One dataset may report birth rates by county, while another reports them by zip code. We can only compare the two if we know which county each zip code belongs to.
  3. Looking for patterns: Managing even five different datasets can quickly become unwieldy.
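The second hurdle is essentially a join on mismatched keys. The sketch below rolls zip-level counts up to counties so they can be lined up against a county-level dataset. It assumes a tiny hypothetical crosswalk (`ZIP_TO_COUNTY`), uses made-up helper names (`aggregate_by_county`, `join_on_county`), and is not EasyData's actual code. It also ignores zip codes that straddle county lines.

```python
from collections import defaultdict

# Hypothetical, illustrative crosswalk: zip code -> county containing it.
# Real zip codes can straddle county lines; this sketch ignores that.
ZIP_TO_COUNTY = {
    "10001": "New York",
    "10002": "New York",
    "11201": "Kings",
    "11215": "Kings",
}


def aggregate_by_county(zip_rows, zip_to_county):
    """Sum a zip-level count up to county level.

    zip_rows: iterable of (zip_code, count) pairs.
    Returns {county: total}; zips missing from the crosswalk are skipped.
    """
    totals = defaultdict(int)
    for zip_code, count in zip_rows:
        county = zip_to_county.get(zip_code)
        if county is None:
            continue  # no crosswalk entry, so this row cannot be placed
        totals[county] += count
    return dict(totals)


def join_on_county(county_stats, county_totals):
    """Pair two county-keyed datasets, keeping only counties present in both."""
    shared = county_stats.keys() & county_totals.keys()
    return {county: (county_stats[county], county_totals[county]) for county in shared}
```

With the toy input `[("10001", 3), ("10002", 5), ("11201", 2), ("99999", 7)]`, `aggregate_by_county` returns `{"New York": 8, "Kings": 2}`; the unknown zip code is dropped rather than guessed.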

EasyData is a prototype that makes these three steps less cumbersome by automating each of them as much as possible.

Easy Data Import

EasyData will do a good enough job of importing your data. If the data contains headers (e.g., “birth_rate”, “county_name”), it will try to identify them. If the data has errors, it will ignore them.
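A minimal sketch of this kind of forgiving import might look like the following. The header heuristic (a non-numeric first row sitting above at least one all-numeric column), the rule that a row with the wrong number of columns counts as an error, and all function names are assumptions for illustration; `load_zip` covers the hundreds-of-files-in-a-zip case from the first hurdle.

```python
import csv
import io
import zipfile


def _is_number(cell):
    """True if the cell parses as a float (e.g. "7.1" or "02139")."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def looks_like_header(first_row, next_rows):
    """Heuristic: the first row is a header if none of its cells are numeric
    but some column is entirely numeric in the well-formed rows below it."""
    if any(_is_number(cell) for cell in first_row):
        return False
    width = len(first_row)
    body = [row for row in next_rows if len(row) == width]
    for col in range(width):
        values = [row[col] for row in body]
        if values and all(_is_number(v) for v in values):
            return True
    return False


def load_table(text):
    """Parse CSV text into (header, rows), silently skipping malformed rows."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # drop blank lines
    if not rows:
        return [], []
    if looks_like_header(rows[0], rows[1:21]):
        header, body = rows[0], rows[1:]
    else:
        header = [f"col{i}" for i in range(len(rows[0]))]  # generic names
        body = rows
    # Treat rows whose column count does not match the header as errors.
    clean = [row for row in body if len(row) == len(header)]
    return header, clean


def load_zip(source):
    """Load every .csv member of a zip archive (a path or a file object)."""
    tables = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8", errors="replace")
                tables[name] = load_table(text)
    return tables
```

On the toy input `"county_name,birth_rate\nAlbany,7.1\nBronx,oops,extra\nKings,8.2\n"`, `load_table` keeps the header and drops the three-column Bronx row. JSON and HTML tables would need their own readers and are left out here.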

Automatic Geocoding

EasyData will analyze your data to detect whether any of the columns contain zip codes, state names, addresses, or latitude/longitude coordinates. If it thinks it sees an address and a state name, it will try to geocode the table automatically. Otherwise, you can tell EasyData which columns to geocode, and it will do the rest.
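Column detection can be approximated with simple pattern checks. The sketch below is a guess at the approach, not EasyData's implementation: it labels a column `zip`, `state`, `latitude`, `longitude`, or `address` when at least 90% of its non-empty values pass a regex, a state-name lookup, or a numeric range check. The threshold, the patterns, and the reliance on header names such as `lat`/`lon` to tell the two coordinate columns apart are all assumptions.

```python
import re

# Assumed patterns; real-world data will need many more variants.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
ADDRESS_RE = re.compile(r"\d+[A-Za-z]?\s+\S+")  # house number, then a street word

STATE_ABBREVS = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY DC".split()
)
STATE_NAMES = set(
    "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|"
    "Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|"
    "Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|"
    "Nebraska|Nevada|New Hampshire|New Jersey|New Mexico|New York|North Carolina|"
    "North Dakota|Ohio|Oklahoma|Oregon|Pennsylvania|Rhode Island|South Carolina|"
    "South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|"
    "Wisconsin|Wyoming".split("|")
)


def _in_range(value, limit):
    """True if value parses as a number within [-limit, limit]."""
    try:
        return -limit <= float(value) <= limit
    except ValueError:
        return False


def _is_state(value):
    return value.upper() in STATE_ABBREVS or value.title() in STATE_NAMES


def _fraction(values, predicate):
    """Share of non-empty values for which predicate(value) is truthy."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return 0.0
    return sum(1 for v in cleaned if predicate(v)) / len(cleaned)


def guess_column_type(header, values, threshold=0.9):
    """Return a location type for one column, or None if nothing fits."""
    tokens = set(re.split(r"[^a-z]+", header.lower()))
    if _fraction(values, ZIP_RE.fullmatch) >= threshold:
        return "zip"
    if _fraction(values, _is_state) >= threshold:
        return "state"
    if tokens & {"lat", "latitude"}:
        if _fraction(values, lambda v: _in_range(v, 90)) >= threshold:
            return "latitude"
    if tokens & {"lon", "lng", "long", "longitude"}:
        if _fraction(values, lambda v: _in_range(v, 180)) >= threshold:
            return "longitude"
    if _fraction(values, ADDRESS_RE.match) >= threshold:
        return "address"
    return None


def guess_columns(header, rows):
    """Map each header name to a guessed location type, skipping unknowns."""
    guesses = {}
    for col, name in enumerate(header):
        values = [row[col] for row in rows if col < len(row)]
        kind = guess_column_type(name, values)
        if kind is not None:
            guesses[name] = kind
    return guesses


def should_auto_geocode(guesses):
    """Mirror the rule in the text: geocode automatically given an address and a state."""
    return {"address", "state"} <= set(guesses.values())
```

The threshold lets a few bad cells through without derailing the guess. Real addresses vary far more than `ADDRESS_RE` allows, and any five-digit number passes as a zip code, so a fuller version would likely confirm its guesses against a geocoding service.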

Automatic Correlation Search

Once your data has been geocoded, EasyData will combine the statistics from all datasets that reference the same location (e.g., downtown Boston) and look for interesting trends. It will then plot the most interesting ones.
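One plausible way to implement this step is to keep only the locations shared by every dataset and then rank every pair of statistics by the absolute value of their Pearson correlation across those locations. Treating "most interesting" as "highest |r|" is an assumption of this sketch, and the names (`merge_by_location`, `rank_correlations`, `min_locations`) are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def merge_by_location(datasets):
    """datasets: {dataset_name: {location: {stat_name: value}}}.

    Returns {location: {"dataset.stat": value}} for locations found in
    every dataset (requires at least one dataset).
    """
    shared = set.intersection(*(set(table) for table in datasets.values()))
    merged = {}
    for loc in shared:
        row = {}
        for ds_name, table in datasets.items():
            for stat, value in table[loc].items():
                row[f"{ds_name}.{stat}"] = value
        merged[loc] = row
    return merged


def rank_correlations(merged, min_locations=3):
    """Return (stat_a, stat_b, r) for every pair of statistics, strongest |r| first."""
    locations = sorted(merged)
    stats = sorted({s for row in merged.values() for s in row})
    results = []
    for a, b in combinations(stats, 2):
        pairs = [(merged[loc][a], merged[loc][b]) for loc in locations
                 if a in merged[loc] and b in merged[loc]]
        if len(pairs) < min_locations:
            continue  # too few shared locations to say anything
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None:
            results.append((abs(r), a, b, r))
    results.sort(reverse=True)
    return [(a, b, r) for _, a, b, r in results]
```

Ranking by |r| alone will surface spurious correlations when there are many statistics and few locations. `min_locations` is only a crude guard; correcting for multiple comparisons would be a sensible refinement.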

Interested? Try it out!

(It’s a prototype, so it may be really slow if multiple people are using it at once.)

Here’s a screenshot so you know it’s real.