Google Location History Explorer

The goal of this post is to visualize time spent in various Seattle neighborhoods using Google location data and Python. This post utilizes code from Tyler Hartley’s visualizing location history blog post.

Overview

  1. Setup
    • download data
    • install modules
  2. Data Wrangling
    • data extraction
  3. Working with Shapefiles in Python
  4. Prep data and pare down locations
  5. Compute your measurement metric
  6. Choropleth Plot
  7. Hexbin Map

Setup

  1. Use Google Takeout to download your Google location history
    • If you’ve previously enabled Google location reporting on your smartphone, your GPS data will be periodically uploaded to Google’s servers. Use Google Takeout to download your location history.
    • The decisions of when and how to upload this data are entirely opaque to the end user, but as you’ll see below, Android appears to upload a GPS location every 60 seconds. That’s plenty of data to work with.
  2. After downloading your data, install the required modules

Google Takeout

Google Takeout is a Google service that allows users to export any personal Google data. We’ll use Takeout to download our raw location history as a one-time snapshot. Since Latitude was retired, no API exists to access location history in real-time.

Download location data:
  • Go to takeout. Uncheck all services except “Location History”
  • The data will be in a json format, which works great for us. Download it in your favorite compression type.
  • When Google has finished creating your archive, you’ll get an email notification and a link to download.
  • Download and unzip the file, and you should be looking at a LocationHistory.json file.

Working with location data in Pandas

Pandas is an incredibly powerful tool that simplifies working with complex datatypes and performing statistical analysis in the style of R. Chris Albon has great primers on using Pandas here under the “Data Wrangling” section.
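Before wrangling anything, it helps to peek at the file’s structure. A minimal sketch (the field names match the parsing code later in this post; exact contents vary by account):

Code
import json

with open('LocationHistory.json', 'r') as fh:
    raw = json.load(fh)

print(len(raw['locations']))  # number of recorded points
print(raw['locations'][0])    # one record: timestampMs, latitudeE7, longitudeE7, accuracy, ...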

Install modules

  • If you use Anaconda to manage your Python packages, I recommend creating a virtual environment with Anaconda to install the dependencies. Copying the lines below each instruction into the terminal creates the environment, the requirements.txt file, and so on.
    • conda create -n test-env python=3.5 anaconda
    • source activate test-env
  • make a requirements.txt file for dependencies
    • (echo descartes; echo IPython; echo shapely; echo fiona; echo Basemap) >> requirements.txt
  • install requirements.txt
    • conda install --yes --file requirements.txt

After completing the setup, we’ll read in the LocationHistory.json file from Google Takeout and create a DataFrame.

Code
import json
import time
import fiona
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from shapely.prepared import prep
from pysal.esda.mapclassify import Natural_Breaks
from shapely.geometry import Point, Polygon, MultiPoint, MultiPolygon
from matplotlib.collections import PatchCollection
from descartes import PolygonPatch
from IPython.display import Image
import warnings
warnings.filterwarnings('ignore')

Data Wrangling

  • data extraction
Code
with open('LocationHistory.json', 'r') as fh:
    raw = json.loads(fh.read())

location_data = pd.DataFrame(raw['locations'])
del raw #free up some memory
# convert to typical units
location_data['latitudeE7'] = location_data['latitudeE7']/float(1e7) 
location_data['longitudeE7'] = location_data['longitudeE7']/float(1e7)
location_data['timestampMs'] = location_data['timestampMs'].map(lambda x: float(x)/1000) #to seconds
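# note: fromtimestamp converts using the machine's local timezone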
location_data['datetime'] = location_data.timestampMs.map(datetime.datetime.fromtimestamp)
# Rename fields based on the conversions we just did
location_data.rename(columns={'latitudeE7':'latitude', 'longitudeE7':'longitude', 'timestampMs':'timestamp'}, inplace=True)
location_data = location_data[location_data.accuracy < 1000] #Ignore locations with accuracy estimates over 1000m
location_data.reset_index(drop=True, inplace=True)

Explore Data

  • view data and datatypes
Code
location_data.head()
accuracy activity altitude heading latitude longitude timestamp velocity datetime
0 21 [{'timestampMs': '1503180210463', 'activity': ... NaN NaN 47.664788 -122.346873 1.503180e+09 NaN 2017-08-19 15:03:32.631
1 21 [{'timestampMs': '1503180149925', 'activity': ... NaN NaN 47.664788 -122.346873 1.503180e+09 NaN 2017-08-19 15:01:31.487
2 21 [{'timestampMs': '1503179967778', 'activity': ... 65.0 NaN 47.664791 -122.346896 1.503180e+09 NaN 2017-08-19 14:59:29.852
3 21 [{'timestampMs': '1503179844537', 'activity': ... 69.0 NaN 47.664788 -122.346873 1.503180e+09 NaN 2017-08-19 14:57:23.436
4 19 [{'timestampMs': '1503178357753', 'activity': ... 63.0 NaN 47.661274 -122.313973 1.503178e+09 0.0 2017-08-19 14:30:41.000
Code
location_data.dtypes
accuracy              int64
activity             object
altitude            float64
heading             float64
latitude            float64
longitude           float64
timestamp           float64
velocity            float64
datetime     datetime64[ns]
dtype: object
Code
location_data.describe()
accuracy altitude heading latitude longitude timestamp velocity
count 716017.000000 83282.000000 35498.000000 716017.000000 716017.000000 7.160170e+05 46791.000000
mean 59.556770 33.147823 190.420869 37.611641 -105.131568 1.413964e+09 7.548140
std 125.760722 184.995795 103.330412 9.051446 16.496591 2.838440e+07 11.326376
min 1.000000 -715.000000 0.000000 13.689757 -123.260751 1.376790e+09 0.000000
25% 23.000000 -20.000000 98.000000 29.817567 -122.306606 1.390437e+09 0.000000
50% 31.000000 -6.000000 188.000000 29.817867 -95.253652 1.412290e+09 1.000000
75% 51.000000 37.000000 273.000000 47.664313 -94.995611 1.425326e+09 12.000000
max 999.000000 6738.000000 359.000000 50.105984 8.675475 1.503180e+09 208.000000
  • accuracy code “999” may represent missingness
  • find earliest and latest observations in the data
    • save for later
Code
print("earliest observed date: {}".format(min(location_data["datetime"]).strftime('%m-%d-%Y')))
print("latest observed date: {}".format(max(location_data["datetime"]).strftime('%m-%d-%Y')))

earliest_obs = min(location_data["datetime"]).strftime('%m-%d-%Y')
latest_obs = max(location_data["datetime"]).strftime('%m-%d-%Y')
earliest observed date: 08-17-2013
latest observed date: 08-19-2017
  • location_data is a Pandas DataFrame containing all your location history and related info.
  • columns include latitude, longitude, and a timestamp; additional columns are accuracy, activity, altitude, heading, and velocity.
  • all we’ll need is latitude, longitude, and time.

Working with Shapefiles in Python

Shapefile is a widely-used data format for describing points, lines, and polygons. To work with shapefiles, Python gives us shapely. To read and write shapefiles, we’ll use fiona.
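As a tiny taste of the API (standard Shapely, nothing specific to this project), the point-in-polygon test we’ll lean on later looks like this:

Code
from shapely.geometry import Point, Polygon

# a unit square and the basic containment test used throughout this post
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
print(square.contains(Point(0.5, 0.5)))  # True
print(square.contains(Point(2.0, 0.5)))  # False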

To learn Shapely and write this blog post, I leaned heavily on this article from sensitivecities.com.

First up, you’ll need to download shapefile data for the part of the world you’re interested in plotting. I wanted to focus on my current home of Seattle, which, like many cities, provides shapefile map data for free. It’s even broken into city neighborhoods! The US Census Bureau provides a ton of national shapefiles here. Your city likely provides this kind of data too. Tom MacWright has a GIS with Python, Shapely, and Fiona overview with more detail on Python mapping with these tools.

Next, we’ll need to import the shapefile data we downloaded from the data.seattle.gov link above.

Code
shapefilename = 'data/Seattle_Neighborhoods/WGS84/Neighborhoods'
shp = fiona.open(shapefilename+'.shp')
coords = shp.bounds
shp.close()

width, height = coords[2] - coords[0], coords[3] - coords[1]
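# pad the map edges with a small margin (1% of the bounding box)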
extra = 0.01
  • Use Basemap to plot the shapefiles
Code
m = Basemap(
    projection='tmerc', ellps='WGS84',
    lon_0=np.mean([coords[0], coords[2]]),
    lat_0=np.mean([coords[1], coords[3]]),
    llcrnrlon=coords[0] - extra * width,
    llcrnrlat=coords[1] - (extra * height), 
    urcrnrlon=coords[2] + extra * width,
    urcrnrlat=coords[3] + (extra * height),
    resolution='i',  suppress_ticks=True)

_out = m.readshapefile(shapefilename, name='seattle', drawbounds=False, color='none', zorder=2)
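# readshapefile attaches the shapes as m.seattle (lists of coordinates) and
# their attribute records as m.seattle_info; both are used below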

Prep data and pare down locations

The first step is to pare down your location history to only contain points within the map’s borders.

Code
# set up a map dataframe
df_map = pd.DataFrame({
    'poly': [Polygon(hood_points) for hood_points in m.seattle],
    'name': [hood['S_HOOD'] for hood in m.seattle_info]
})

# Convert our latitude and longitude into Basemap cartesian map coordinates
mapped_points = [Point(m(mapped_x, mapped_y)) for mapped_x, mapped_y in zip(location_data['longitude'], 
            location_data['latitude'])]
all_points = MultiPoint(mapped_points)

# Use prep to optimize polygons for faster computation
hood_polygons = (MultiPolygon(list(df_map['poly'].values)))
prepared_polygons = prep(hood_polygons)

# Filter out the points that do not fall within the map we're making
city_points_filter = filter(prepared_polygons.contains, all_points)
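# filter() returns a lazy iterator in Python 3, so materialize it into a list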
city_points_list = list(city_points_filter)
Code
hood_polygons
Self-intersection at or near point 1373.2245930902682 17705.454459863173
  • the warning above comes from Shapely: at least one neighborhood polygon is invalid because it intersects itself. The containment tests below still run, but the geometry isn’t perfectly clean.
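If the invalid polygon ever causes problems downstream, a common Shapely workaround is to rebuild the offending shapes with a zero-width buffer. A sketch (not part of the original pipeline; note that buffer(0) can return a MultiPolygon, so later MultiPolygon construction may need adjusting):

Code
# find and repair any self-intersecting neighborhood polygons
invalid = df_map['poly'].map(lambda p: not p.is_valid)
print(df_map.loc[invalid, 'name'])  # which neighborhoods are affected
# buffer(0) reconstructs a valid geometry from an invalid one
df_map.loc[invalid, 'poly'] = df_map.loc[invalid, 'poly'].map(lambda p: p.buffer(0))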

Code
df_map.tail()
name poly
116 North College Park POLYGON ((7030.064427959046 23557.22768256248,...
117 Maple Leaf POLYGON ((8132.34671963913 23955.72380105577, ...
118 Crown Hill POLYGON ((5426.118400305284 23570.62258082016,...
119 Greenwood POLYGON ((5823.505256198943 23565.24123078629,...
120 Sunset Hill POLYGON ((2720.630502370794 22739.9815489074, ...
Code
print("total data points in this period: {}".format(len(all_points)))
print("total data points in the city shape file for this period: {}".format(len(city_points_list)))
percentage_in_city = round(len(city_points_list)/len(all_points),2)*100
print("{}% of points this period are in the city shape file".format(percentage_in_city))
total data points in this period: 716017
total data points in the city shape file for this period: 292617
41.0% of points this period are in the city shape file

Now, city_points_list contains a list of all points that fall within the map, and hood_polygons is a collection of polygons representing, in my case, each neighborhood in Seattle.

Compute your measurement metric

The raw data for my choropleth should be “number of points in each neighborhood.” With Pandas, again, it’s easy. (Warning: depending on the size of the city_points_list array, this could take a few minutes.)

Code
def num_of_contained_points(apolygon, city_points):
    """Count how many of city_points fall within apolygon."""
    return len(list(filter(prep(apolygon).contains, city_points)))
Code
df_map['name'][41]
'University District'
Code
udistrict_points = round(len(list(filter((df_map['poly'][41]).contains, city_points_list)))/len(city_points_list),2)*100
print("{}% of points in the city shape file are from the {}".format(udistrict_points, df_map['name'][41]))
93.0% of points in the city shape file are from the University District
Code
df_map['hood_count'] = df_map['poly'].map(lambda x: num_of_contained_points(x, city_points_list))
df_map['hood_hours'] = df_map.hood_count/60.0
  • view most popular neighborhoods by counts
Code
df_map.sort_values(['hood_count'], ascending=[0]).head()
name poly hood_count hood_hours
41 University District POLYGON ((10922.3368171583 18686.63646881089, ... 271001 4516.683333
31 Wallingford POLYGON ((6833.376859840086 19104.80491919432,... 5869 97.816667
40 Roosevelt POLYGON ((9772.031684109563 21918.84443396827,... 2611 43.516667
35 Ravenna POLYGON ((9870.257357092554 21917.8137398672, ... 2455 40.916667
99 Broadway POLYGON ((8668.473681961603 16293.21876423242,... 1457 24.283333

So now, df_map.hood_count contains a count of the number of GPS points located within each neighborhood. But what do those counts really mean? Knowing that I spent n “counts” in a neighborhood isn’t very meaningful on its own, except to compare neighborhood counts against each other. And we could do that. Or we could convert hood_count into time.

Turns out, converting counts into time is straightforward. From investigating the location history, it seems that unless the phone is off or without reception, Android reports your location exactly every 60 seconds. Not usually 60 seconds, not sometimes 74 seconds, exactly every 60 seconds. That’s been true on Android 4.2-6.0. Hopefully it holds true for you, too. So if we assume that my phone is on 24/7 (true) and that I have city-wide cellular reception (also true), then all we need to do is divide hood_count by 60.0, as shown above, and now we’re talking in hours instead of counts.
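As a quick sanity check on the one-point-per-minute assumption, the University District row in the table above works out exactly:

Code
# 271001 points at one point per minute = 271001/60 hours
print(271001 / 60.0)  # 4516.6833..., matching hood_hours above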

Choropleth Plot

  • The code for creating the choropleth map below is in choropleth.py
Code
#%run choropleth.py
Code
# Check out the full post at http://beneathdata.com/how-to/visualizing-my-location-history/
# for more information on the code below

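# Jenks natural breaks picks k class boundaries that minimize within-class
# variance, so the choropleth's color bins follow natural groupings in hood_hours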
breaks = Natural_Breaks(
    df_map[df_map['hood_hours'].notnull()].hood_hours.values,
    initial=300,
    k=5)

# the notnull method lets us match indices when joining
jb = pd.DataFrame({'jenks_bins': breaks.yb}, index=df_map[df_map['hood_hours'].notnull()].index)
df_map['jenks_bins'] = jb
df_map.jenks_bins.fillna(-1, inplace=True)
jenks_labels = ['Never been here', "> 0 hours"]+["> %d hours"%(perc) for perc in breaks.bins[:-1]]

def custom_colorbar(cmap, ncolors, labels, **kwargs):    
    """Create a custom, discretized colorbar with correctly formatted/aligned labels.
    
    cmap: the matplotlib colormap object you plan on using for your graph
    ncolors: (int) the number of discrete colors available
    labels: the list of labels for the colorbar. Should be the same length as ncolors.
    """
    from matplotlib.colors import BoundaryNorm
    from matplotlib.cm import ScalarMappable
        
    norm = BoundaryNorm(range(0, ncolors), cmap.N)
    mappable = ScalarMappable(cmap=cmap, norm=norm)
    mappable.set_array([])
    mappable.set_clim(-0.5, ncolors+0.5)
    colorbar = plt.colorbar(mappable, **kwargs)
    colorbar.set_ticks(np.linspace(0, ncolors, ncolors+1)+0.5)
    colorbar.set_ticklabels(labels)
    return colorbar

figwidth = 8
fig = plt.figure(figsize=(figwidth, figwidth*height/width))
ax = fig.add_subplot(111, axisbg='w', frame_on=False)

cmap = plt.get_cmap('Blues')

# draw neighborhoods with grey outlines
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(x, ec='#111111', lw=.8, alpha=1., zorder=4))
pc = PatchCollection(df_map['patches'], match_original=True)

# apply our custom color values onto the patch collection
cmap_list = [cmap(val) for val in (df_map.jenks_bins.values - df_map.jenks_bins.values.min())/(
                  df_map.jenks_bins.values.max()-float(df_map.jenks_bins.values.min()))]
pc.set_facecolor(cmap_list)
ax.add_collection(pc)

#Draw a map scale
m.drawmapscale(coords[0] + 0.08, coords[1] + -0.01,
    coords[0], coords[1], 10., units='mi',
    fontsize=16, barstyle='fancy', labelstyle='simple',
    fillcolor1='w', fillcolor2='#000000', fontcolor='#555555',
    zorder=5, ax=ax,)

# ncolors+1 because we're using a "zero-th" color
cbar = custom_colorbar(cmap, ncolors=len(jenks_labels)+1, labels=jenks_labels, shrink=0.5)
cbar.ax.tick_params(labelsize=16)

current_date = time.strftime("printed: %a, %d %b %Y", time.localtime())


#fig.suptitle("Time Spent in Seattle Neighborhoods", fontdict={'size':24, 'fontweight':'bold'}, y=0.92)
ax.set_title("Time Spent in Seattle Neighborhoods \n Using location data collected from my Android phone via Google Takeout",
             fontsize=14, y=1)
ax.text(1.62, -.06, "Collected from 2013-2017 on Android 4.2-6.0\nGeographic data provided by data.seattle.gov \n%s" % (current_date), 
    ha='right', color='#555555', style='italic', transform=ax.transAxes)
ax.text(1.62, -.09, "kivanpolimis.com ", color='#555555', fontsize=15, ha='right', transform=ax.transAxes)

plt.savefig('seattle_choropleth.png', dpi=100, frameon=False, bbox_inches='tight', pad_inches=0.5, facecolor='#F2F2F2')
Code
Image("seattle_choropleth.png")

Hexbin Map

We can also take a different approach to choropleths: instead of using each neighborhood polygon as a bin, let Basemap generate uniform hexagonal bins for us. Hexbin maps are a great way to visualize point density because all bins are equally sized. Best of all, it requires essentially no extra work, as we’ve already defined our neighborhood patches and pared down our location data. The code for creating the hexbin map below is in hexbin.py

Code
#%run hexbin.py
Code
"""PLOT A HEXBIN MAP OF A LOCATION
"""
figwidth = 8
fig = plt.figure(figsize=(figwidth, figwidth*height/width))
ax = fig.add_subplot(111, axisbg='w', frame_on=False)

# draw neighborhood patches from polygons
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(
    x, fc='#555555', ec='#555555', lw=1, alpha=1, zorder=0))

# plot neighborhoods by adding the PatchCollection to the axes instance
ax.add_collection(PatchCollection(df_map['patches'].values, match_original=True))

# the mincnt argument only shows cells with a value >= 1
# The number of hexbins you want in the x-direction
numhexbins = 50
hx = m.hexbin(
    np.array([geom.x for geom in city_points_list]),
    np.array([geom.y for geom in city_points_list]),
    gridsize=(numhexbins, int(numhexbins*height/width)), #critical to get regular hexagon, must stretch to map dimensions
    bins='log', mincnt=1, edgecolor='none', alpha=1.,
    cmap=plt.get_cmap('Blues'))

# Draw the patches again, but this time just their borders (to achieve borders over the hexbins)
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(
    x, fc='none', ec='#FFFF99', lw=1, alpha=1, zorder=1))
ax.add_collection(PatchCollection(df_map['patches'].values, match_original=True))

# Draw a map scale
m.drawmapscale(coords[0] + 0.05, coords[1] - 0.01,
    coords[0], coords[1], 4.,
    units='mi', barstyle='fancy', labelstyle='simple',
    fillcolor1='w', fillcolor2='#555555', fontcolor='#555555',
    zorder=5)

#fig.suptitle("My location density in Seattle", fontdict={'size':54, 'fontweight':'bold'}, y=0.92)
ax.set_title("My location density in Seattle \n Using location data collected from my Android phone via Google Takeout", fontsize=14, y=1)
ax.text(1.35, -.06, "Collected from 2013-2017 on Android 4.2-6.0\nGeographic data provided by data.seattle.gov \n%s" % (current_date), 
        ha='right', color='#555555', style='italic', transform=ax.transAxes)
ax.text(1.35, -.09, "kivanpolimis.com", color='#555555', fontsize=15, ha='right', transform=ax.transAxes)
plt.savefig('seattle_hexbin.png', dpi=100, frameon=False, bbox_inches='tight', pad_inches=0.5, facecolor='#DEDEDE')
Code
Image("seattle_hexbin.png")

Download this notebook, or see a static view here

Code
import time
print("last updated: {}".format(time.strftime("%a, %d %b %Y %H:%M", time.localtime())))
last updated: Mon, 21 Aug 2017 15:35