
BDIT Data Sources

This is a master repo for all of the data sources that we use. Each folder is for a different data source and contains an explanation of what the data source is and how it can be used, a sample of the data, and scripts to import the data into the PostgreSQL database.


Open Data Releases

  • Travel Times - Bluetooth contains data for all the Bluetooth segments collected by the city. The travel times are 5-minute averages. The real-time feed is currently not operational. See the Bluetooth README for more info.
  • Watch Your Speed Signs give feedback to drivers to encourage them to slow down; they also record the speed of vehicles passing the sign. Semi-aggregated and monthly summary data are available for the two programs (Stationary School Safety Zone signs and Mobile Signs) and are updated monthly.

For the King St. Transit Pilot, the team has released the following datasets, which are typically a subset of larger datasets specific to the pilot:

INRIX

Data Elements

Field Name | Description | Type
--- | --- | ---
RoadName/Number | Road names or numbers | string
tx | Date and time | datetime
tmc | TMC link ID | string
spd | Link speed estimate | double
count | Sample size used | int
score | Quality indicator | 10/20/30

Notes

  • INRIX vehicles disproportionately include heavy vehicles.
  • There is additional sampling bias in that heavy vehicles do not reflect general travel patterns.
  • Two freeway sections (the southernmost sections of the 427 and 404) have no available data. These approaches may be missing due to the idiosyncratic geometries of TMCs near major freeway-to-freeway interchanges.
  • In any given 15-minute interval between 5am and 10pm, 88.7% of freeway links and 48% of arterial links have observations.
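The coverage percentages above count, for each 15-minute interval, the share of links with at least one observation. A minimal Python sketch of that calculation, assuming observations arrive as (tmc, tx) pairs and the denominator set of freeway or arterial TMCs is known; coverage_by_interval is a hypothetical helper, not code from this repo:

```python
from datetime import datetime, time


def coverage_by_interval(observations, links, start=time(5, 0), end=time(22, 0)):
    """Share of `links` observed in each 15-minute interval between start and end.

    observations: iterable of (tmc, tx) pairs, where tx is a datetime.
    links: set of TMC IDs forming the denominator (e.g. all freeway TMCs).
    Returns {interval_start: fraction}; intervals with no observations are omitted.
    """
    seen = {}
    for tmc, tx in observations:
        if tmc not in links or not (start <= tx.time() < end):
            continue
        # Floor the timestamp to the start of its 15-minute interval.
        interval = tx.replace(minute=tx.minute - tx.minute % 15, second=0, microsecond=0)
        seen.setdefault(interval, set()).add(tmc)
    return {interval: len(tmcs) / len(links) for interval, tmcs in seen.items()}


obs = [
    ("A", datetime(2018, 1, 9, 8, 3)),
    ("B", datetime(2018, 1, 9, 8, 14)),
    ("A", datetime(2018, 1, 9, 8, 20)),
    ("C", datetime(2018, 1, 9, 4, 50)),  # before 5am, ignored
]
coverage = coverage_by_interval(obs, {"A", "B", "C", "D"})
```

Intervals with no observations at all are absent from the result, so they would need to be added as 0 before averaging across a day.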

BlipTrack Bluetooth Detectors

Data Elements

Historical Data

Field Name | Description | Type
--- | --- | ---
TimeStamp | Timestamp (end of the aggregation window) | datetime
StartPointName | Name of the segment's start point | string
EndPointName | Name of the segment's end point | string
MinMeasuredTime | Minimum waiting time of users completing the route from start to end in the window (TimeStamp - resolution) to TimeStamp | int
MaxMeasuredTime | Maximum waiting time of users completing the route from start to end in the same window | int
AvgMeasuredTime | Average waiting time of users completing the route from start to end in the same window | int
MedianMeasuredTime | Median waiting time of users completing the route from start to end in the same window | int
SampleCount | Number of devices completing the route from start to end in the same window | int
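Each historical row summarises the devices that completed the segment in the window ending at TimeStamp. The sketch below recomputes those fields from individual measurements, assuming a half-open window (TimeStamp - resolution, TimeStamp], a 5-minute resolution, and integer rounding; summarise_bin and these assumptions are illustrative and not BlipTrack's documented behaviour.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def summarise_bin(measurements, timestamp, resolution=timedelta(minutes=5)):
    """Summarise measured times of devices that completed the segment in
    (timestamp - resolution, timestamp], using the BlipTrack field names.

    measurements: iterable of (completed_at, measured_seconds) pairs.
    Returns None when no device completed the route in the window.
    """
    times = [t for done, t in measurements if timestamp - resolution < done <= timestamp]
    if not times:
        return None
    return {
        "TimeStamp": timestamp,
        "MinMeasuredTime": min(times),
        "MaxMeasuredTime": max(times),
        "AvgMeasuredTime": round(mean(times)),  # rounded to int, as the fields are int
        "MedianMeasuredTime": round(median(times)),
        "SampleCount": len(times),
    }


trips = [
    (datetime(2017, 5, 1, 8, 1), 100),
    (datetime(2017, 5, 1, 8, 3), 140),
    (datetime(2017, 5, 1, 8, 4), 120),
    (datetime(2017, 5, 1, 8, 6), 300),  # completes after 8:05, so falls in the next window
]
row = summarise_bin(trips, datetime(2017, 5, 1, 8, 5))
```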

Retrieval

  • Interfaces for retrieving data
    1. Export from the BlipTrack GUI at g4apps.bliptrack.net
    2. Display API: a REST-based interface that returns live data as JSON or XML in an HTTP(S) response
    3. WebService API: a SOAP-based interface for programmatic access to the system
  • Functions included in the WebService API:
    • getAvailableDisplayIds() – returns a list of available Public Displays;
    • getDisplayInfo() – returns detailed information for a Public Display for a given display ID;
    • getPublicDisplayData() – used to get Public Display data (current result set) for a given display ID;
    • getDisplayData() – used to get Public Display data (current result set) for a given display ID (for displays with Restricted Access enabled);
    • getExportableAnalyses() – returns a list of analyses, each with all required information for export;
    • getExportableLiveAnalyses() – returns a list of live analyses, each with all required information for export;
    • getFilteredAnalyses() – returns a list of analyses matching a specified filter, each with all required information for export;
    • exportDwelltimeReport() – used to export Measured Time data;
    • exportLiveDwelltimeReport() – used to export Live Measured Time data;
    • exportKPIReport() – used to export KPI data;
    • exportQuarterlyReport() – used to export Quarterly KPI data;
    • exportCounterReport() – used to export Counter Reports;
    • getCurrentDwellTime() – used to get current dwell time for a live analysis;
    • getCurrentDwellTimes() – used to get current dwell time for a list of live analyses in a single call;
    • exportPerUserData() – used to export individual dwell time measurements for an analysis; and
    • getCustomCurrentDwellTime() – used to get current dwell time for a live analysis with custom parameters.

GIS - Geographic Information System

Text Description to Centreline Geometry Automation

gis/text_to_centreline/ contains SQL used to transform text descriptions of streets (in bylaws) into centreline geometries. See its README for usage details.
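As a rough illustration of one possible first step of such a transformation (the repo itself does this in SQL), a bylaw phrase of the form "street between A and B" can be split into the street and its two bounding cross streets before each name is matched to centreline segments. The pattern and parse_bylaw_location below are hypothetical and cover only this one phrasing:

```python
import re

# Hypothetical pattern for "<street>[,] between <from street> and <to street>".
BETWEEN = re.compile(
    r"^\s*(?P<street>.+?),?\s+between\s+(?P<from_street>.+?)\s+and\s+(?P<to_street>.+?)\s*$",
    re.IGNORECASE,
)


def parse_bylaw_location(text):
    """Return (street, from_street, to_street), or None if the text doesn't match."""
    m = BETWEEN.match(text)
    if m is None:
        return None
    return m.group("street"), m.group("from_street"), m.group("to_street")


parsed = parse_bylaw_location("Bloor Street West, between Bathurst Street and Spadina Avenue")
```

Real bylaw text also uses offsets ("a point 30 metres north of ...") and other phrasings that this pattern does not handle.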

Volume Data

volumes/ contains code and documentation on our many volume data sources:

  • miovision: Multi-modal permanent turning movement counts
  • rescu: ATR data from loop detectors
  • FLOW Data: A database of short-term ATR and TMCs

Miovision - Multi-modal Permanent Video Counters

Miovision provides volume counts gathered by cameras installed at specific intersections; there are currently 32 intersections in total. Miovision processes the video footage and provides volume counts aggregated into 1-minute bins. 1-minute turning movement count (TMC) data are stored in miovision_api.volumes, 15-minute TMC data in miovision_api.volumes_15min_tmc, and 15-minute ATR data in miovision_api.volumes_15min.
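Rolling the 1-minute TMC bins up to 15-minute bins amounts to flooring each datetime_bin and summing volume per intersection, classification, leg, and movement. A minimal Python sketch, assuming rows shaped like miovision_api.volumes; the production aggregation is done in SQL and may also handle gaps and invalid movements, which this sketch ignores:

```python
from collections import defaultdict
from datetime import datetime


def floor_15min(ts):
    """Truncate a timestamp to the start of its 15-minute bin."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def aggregate_15min_tmc(rows):
    """Sum 1-minute TMC volumes into 15-minute bins.

    rows: iterable of dicts with the miovision_api.volumes columns
    intersection_uid, datetime_bin, classification_uid, leg, movement_uid, volume.
    """
    totals = defaultdict(int)
    for r in rows:
        key = (r["intersection_uid"], floor_15min(r["datetime_bin"]),
               r["classification_uid"], r["leg"], r["movement_uid"])
        totals[key] += r["volume"]
    return dict(totals)


rows = [
    {"intersection_uid": 1, "datetime_bin": datetime(2019, 6, 3, 8, m),
     "classification_uid": 1, "leg": "N", "movement_uid": 1, "volume": v}
    for m, v in [(0, 4), (7, 6), (14, 5), (15, 2)]
]
tmc_15min = aggregate_15min_tmc(rows)
```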

Data Elements

Field Name | Description | Type
--- | --- | ---
volume_uid | Unique identifier for the table | integer
intersection_uid | Unique identifier for each intersection | integer
datetime_bin | Date and time | timestamp without time zone
classification_uid | Road user classification (vehicle type, pedestrian, or cyclist) | integer
leg | Entry leg of the movement | text
movement_uid | How the vehicle, pedestrian, or cyclist crosses the intersection (e.g. straight, left turn, right turn) | integer
volume | Volume | integer
volume_15min_tmc_uid | Identifier linking to the miovision_api.volumes_15min_tmc table | integer

Notes

  • Data are loaded daily by an Airflow process that pulls from the Miovision API
  • volume_uid values are not in chronological order because data are inserted into the table at different times
  • Although Miovision API data have been available since around summer 2018, the data appear to be reliable only from May 2019 onwards (to be confirmed)
  • The miovision_api schema currently has data from January 2019 onwards, but data prior to May 2019 contain many invalid movements
  • Duplicates may also originate on the Miovision side (this has happened once so far)
  • Quality control activities:
    1. a unique constraint on the miovision_api volumes tables
    2. a warning raised when attempting to insert duplicate data into the table
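The two quality-control steps can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL: a UNIQUE constraint lets the database reject duplicates, and the loader warns when rows are skipped. The key columns chosen here and insert_volumes are assumptions, not the actual constraint or loader for the miovision_api tables; in PostgreSQL the equivalent insert would use ON CONFLICT DO NOTHING.

```python
import sqlite3
import warnings


def insert_volumes(conn, rows):
    """Insert 1-minute rows (a list of tuples), letting a UNIQUE constraint drop duplicates.

    Returns the number of rows actually inserted and warns if any were skipped.
    """
    cur = conn.executemany(
        "INSERT OR IGNORE INTO volumes "
        "(intersection_uid, datetime_bin, classification_uid, leg, movement_uid, volume) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    inserted = cur.rowcount  # ignored duplicates do not count as changes
    conn.commit()
    skipped = len(rows) - inserted
    if skipped:
        warnings.warn(f"{skipped} duplicate row(s) were not inserted")
    return inserted


conn = sqlite3.connect(":memory:")
# Assumed natural key: one row per intersection, minute, classification, leg, and movement.
conn.execute(
    "CREATE TABLE volumes ("
    " volume_uid INTEGER PRIMARY KEY,"
    " intersection_uid INTEGER, datetime_bin TEXT, classification_uid INTEGER,"
    " leg TEXT, movement_uid INTEGER, volume INTEGER,"
    " UNIQUE (intersection_uid, datetime_bin, classification_uid, leg, movement_uid))"
)

rows = [
    (1, "2019-06-03 08:00", 1, "N", 1, 5),
    (1, "2019-06-03 08:01", 1, "N", 1, 3),
]
inserted = insert_volumes(conn, rows)
```

Re-running the same batch inserts nothing and emits one warning instead of silently creating duplicates.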

RESCU - Loop Detectors

The Road Emergency Services Communication Unit (RESCU) tracks traffic volume on expressways using loop detectors. More information can be found on the city's website.

Raw data is available in rescu.raw_15min whereas processed 15-min data is available in rescu.volumes_15min.

Data Elements

Field Name | Description | Type
--- | --- | ---
volume_uid | Unique identifier for the table | integer
detector_id | Unique identifier for each detector | text
datetime_bin | Date and time | timestamp
volume_15min | Volume | integer
artery_code | Artery code | integer

Turning Movement Counts (TMC)

Data Elements

  • Location Identifier (SLSN Node ID)
  • CountType
  • Count interval start and end date and times
  • AM peak, PM peak, and off-peak periods: 7:30-9:30, 10:00-12:00, 13:00-15:00, and 16:00-18:00
  • Roadway 1 and 2 names (intersection)
  • 15 min aggregated interval time
  • 15 min aggregated volume per movement (turning and approach) by:
    • vehicle type
    • cyclist and pedestrian counts (by approach only)
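To tag each 15-minute interval with its count period, a lookup over the four windows is enough. This sketch assumes 7:30-9:30 is the AM peak, 10:00-12:00 and 13:00-15:00 are off-peak, and 16:00-18:00 is the PM peak; that mapping and period_of are illustrative:

```python
from datetime import time

# Assumed mapping of the four count windows to periods.
PERIODS = [
    ("AM peak", time(7, 30), time(9, 30)),
    ("off-peak", time(10, 0), time(12, 0)),
    ("off-peak", time(13, 0), time(15, 0)),
    ("PM peak", time(16, 0), time(18, 0)),
]


def period_of(interval_start):
    """Period for the 15-minute interval starting at `interval_start`, or None if uncounted."""
    for name, start, end in PERIODS:
        if start <= interval_start < end:
            return name
    return None
```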

Notes

  • No regular data load schedule.
  • Data files collected by 2-3 staff members.
  • Manually geo-reference volume data to an SLSN node during data import process
  • Data are manually loaded into FLOW.
  • Counts are conducted on Tuesdays, Wednesdays, and/or Thursdays during school season (September - June) for 1 to 3 consecutive days
  • Strictly conforms to FLOW LOADER data file structure
  • If collected volumes differ from the defined historical values by more than the 10% threshold, the count is not loaded.
  • Volume available at both signalized and non-signalized intersections
  • Each count station is given a unique identifier to avoid duplicate records
  • Data are not collected under irregular traffic conditions (construction, closures, etc.), but counts may be skewed by unplanned incidents.
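A minimal sketch of the historical-threshold check, assuming it compares a count's total volume with a single historical reference value and rejects relative differences above 10%; passes_historical_check is hypothetical and the exact FLOW LOADER rule may differ:

```python
def passes_historical_check(new_volume, historical_volume, threshold=0.10):
    """True if new_volume is within `threshold` (a fraction) of historical_volume."""
    if historical_volume <= 0:
        raise ValueError("historical_volume must be positive")
    return abs(new_volume - historical_volume) / historical_volume <= threshold
```

For example, against a historical value of 1,000, a count of 1,050 would load and a count of 1,150 would be held back.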

Permanent Count Stations and Automated Traffic Recorder (ATR)

Data Elements

  • Location Identifier (SLSN Link (Node?) ID)
  • Count Type
  • Count interval start and end date and times
  • Roadway Names
  • Location Description
  • Direction
  • Number of Lanes
  • Median and Type
  • Comments
  • 15 min aggregated interval time
  • 15 min volume

Notes

  • The counts represent the roadway and direction(s), not individual lanes
  • No regular data load schedule
  • Manually geo-reference volume data to an SLSN node during data import process
  • Strictly conforms to FLOW LOADER data file structure
  • A typical ATR count runs 24 hours a day for 3 days at a location, in one or both directions
  • Each PCS/ATR is given a unique identifier to avoid duplicate records
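For a 3-day, 24-hour count, the number of 15-minute records to expect is 3 × 24 × 4 = 288 per direction (576 for both directions). A small Python sketch of that arithmetic and a completeness ratio; expected_bins and completeness are hypothetical helpers:

```python
from datetime import timedelta

BIN = timedelta(minutes=15)


def expected_bins(days=3, directions=1):
    """15-minute records expected from a continuous count (96 per day per direction)."""
    return (timedelta(days=days) // BIN) * directions


def completeness(observed_bins, days=3, directions=1):
    """Share of expected (direction, bin) records actually present."""
    return len(set(observed_bins)) / expected_bins(days, directions)
```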

Vehicle Detector Station (VDS)

Data Elements

  • Location Identifier (SLSN Link ID)
  • Count Type
  • Roadway Names
  • Lane Number
  • 15 min aggregated interval times
  • 15 min aggregated volume, occupancy, and speed

Notes

  • Raw 20-second interval VDS data are available on the processing server but are not loaded into FLOW
  • VDS device health/communication statuses are not recorded.
  • Asset information managed in Excel spreadsheets
  • Automated daily import but no real-time integration
  • Strictly conforms to FLOW LOADER data file structure
  • Quality control activities:
    1. data gap verification
    2. partial data records flagged for manual verification/correction
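The two quality-control steps can be sketched as follows: list the 15-minute bins missing from a period, and flag bins for one lane built from fewer raw 20-second records than a full bin would contain (15 × 60 / 20 = 45). Treating "partial" as fewer than 45 raw records is an assumption, and find_gaps and flag_partial are hypothetical helpers:

```python
from datetime import datetime, timedelta

BIN = timedelta(minutes=15)
RAW_PER_BIN = (15 * 60) // 20  # 45 raw 20-second records in a full 15-minute bin


def find_gaps(bin_starts, start, end):
    """Return the 15-minute bin starts in [start, end) that have no record."""
    present = set(bin_starts)
    missing = []
    t = start
    while t < end:
        if t not in present:
            missing.append(t)
        t += BIN
    return missing


def flag_partial(raw_counts):
    """Return bin starts (for one lane) built from fewer than 45 raw records."""
    return sorted(b for b, n in raw_counts.items() if n < RAW_PER_BIN)


day = datetime(2019, 3, 5)
gaps = find_gaps(
    [day.replace(hour=6), day.replace(hour=6, minute=15), day.replace(hour=6, minute=45)],
    day.replace(hour=6),
    day.replace(hour=7),
)
partial = flag_partial({day.replace(hour=6): 45, day.replace(hour=6, minute=15): 30})
```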

Incidents

Data Elements

  • Unique system (ROdb) identifier
  • Location
  • DTO district
  • Incident start and end times
  • Free-form incident description
  • Incident status and timestamps
  • Police activities and timestamps
  • RESCU operator shift information

Notes

  • Manual data entry
  • Location description from a dropdown list
  • Manual location selection on a map based on location description

Road Disruption Activity (RoDARS)

Data are available in city.restrictions.

Data Elements

Field Name | Description | Type
--- | --- | ---
id | Unique system identifier | string
description | Free-form project description (direction and number of lanes affected, and reason) | string
name | Location description | string
road | Road of the disruption | string
atroad | Cross road, if the disruption zone is an intersection | string
fromroad | Starting cross road, if the disruption zone is a segment | string
toroad | Ending cross road, if the disruption zone is a segment | string
latitude/longitude | Geographic coordinates (not always populated) | double
district | District of the location | string (from dropdown list)
roadclass | Road type | string (from dropdown list)
expired | Event status | 0: ongoing; 1: expired
starttime | Start time (may not be accurate) | timestamp
endtime | End time (may not be accurate) | timestamp
workperiod | Daily/Continuous/Weekdays/Weekends (not always populated) | string
contractor | Contractor name (not always populated) | string (from dropdown list)
workeventtype | Work event type (not always populated) | string (from dropdown list)

Notes

  • Information is collected via applicant submission of RoDARS notification form
  • Data entry into ROdb via dropdown list of values
  • No system integration with special events and filming departmental systems
  • Crucial information, such as lane blockages/closures, is recorded in free-form text
  • Roadway names from a dropdown list that conforms to SLSN
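Because status and timing live in separate fields, finding restrictions in effect at a given moment means combining expired with the start and end times. A sketch over rows shaped like city.restrictions, assuming a missing endtime means open-ended; active_restrictions is hypothetical, and its results inherit the inaccuracy of starttime and endtime:

```python
from datetime import datetime


def active_restrictions(rows, at):
    """IDs of ongoing restrictions (expired == 0) whose time window contains `at`.

    rows: iterable of dicts with keys id, expired, starttime, endtime.
    A missing endtime is treated as open-ended (an assumption).
    """
    return [
        r["id"]
        for r in rows
        if r["expired"] == 0
        and r["starttime"] <= at
        and (r["endtime"] is None or at <= r["endtime"])
    ]


restrictions = [
    {"id": "a", "expired": 0, "starttime": datetime(2019, 5, 1), "endtime": datetime(2019, 6, 1)},
    {"id": "b", "expired": 1, "starttime": datetime(2019, 5, 1), "endtime": datetime(2019, 6, 1)},
    {"id": "c", "expired": 0, "starttime": datetime(2019, 5, 20), "endtime": None},
    {"id": "d", "expired": 0, "starttime": datetime(2019, 6, 10), "endtime": datetime(2019, 7, 1)},
]
active = active_restrictions(restrictions, datetime(2019, 5, 25))
```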

CRASH - Motor Vehicle Accident Report

Data Elements

  • Unique MVAR identifier (externally assigned by Toronto Police Service)
  • Accident Date & Time
  • Type of collision
  • Accident location: street name(s), distance offset, municipality, county, etc.
  • Description of accident and diagram
  • Involved persons:
    • Motorist/passenger/pedestrian/cyclist
    • Person age
    • Gender
    • Injuries and Fatalities

Notes

  • No real-time data integration
  • Manual data integration with TPS and CRC via XML file exchange (not reliable or consistent)

Vision Zero - Google Sheets API

This dataset comes from Google Sheets tracking the implementation of safety improvements in school zones.
Data are available in vz_safety_programs_staging.school_safety_zone_2018_raw and vz_safety_programs_staging.school_safety_zone_2019_raw.

Data Elements

Field Name | Description | Type
--- | --- | ---
school_name | Name of school | text
address | Address of school | text
work_order_fb | Work order of flashing beacon | text
work_order_wyss | Work order of Watch Your Speed sign | text
locations_zone | Coordinates of school | text
final_sign_installation | Final sign installation date | text
locations_fb | Location of flashing beacon | text
locations_wyss | Location of Watch Your Speed sign | text

wys: Watch Your Speed Signs

The city has installed Watch Your Speed Signs that display the speed of an approaching vehicle and flash if it is travelling over the speed limit. Signs were installed under three programs: the regular Watch Your Speed Sign program; Mobile Watch Your Speed, whose signs are mounted on trailers that move to a different location every few weeks; and School Watch Your Speed, whose signs are installed at high-priority schools. As part of the Vision Zero Road Safety Plan, these signs aim to reduce speeding.

The wys/api folder contains a Python script that pulls data daily from a cloud API, as well as the SQL structure to aggregate the data.

Data Elements

The data is inserted into wys.raw_data. Data from the API is already pre-aggregated into roughly 5 minute bins.

Field name | Data type | Description | Example
--- | --- | --- | ---
raw_data_uid | integer | A unique identifier for the raw_data table | 2655075
api_id | integer | ID used for the API, and unique identifier for the locations table | 1967
datetime_bin | timestamp | Start time of the bin | 2018-10-29 10:00:00
speed | integer | Exact speed of the vehicles counted in count | 47
count | integer | Number of vehicles in the datetime_bin/api_id/speed combination | 2
counts_15min | integer | Identifier of the corresponding row in the counts_15min table; indicates whether the data have already been processed | 150102

wys.counts_15min contains data aggregated into 15-minute time bins and 5 km/h speed bins by the aggregate_speed_counts_15min() function. Speed bin values are replaced with IDs from a lookup table.

Field name | Data type | Description | Example
--- | --- | --- | ---
counts_15min | integer | A unique identifier for the counts_15min table | 2655075
api_id | integer | ID used for the API, and unique identifier for the locations table | 1967
datetime_bin | timestamp | Start time of the 15-minute aggregated bin | 2018-10-29 10:00:00
speed_id | integer | A unique identifier for the 5 km/h speed bin in the speed_bins table | 5
count | integer | Number of vehicles in the datetime_bin/api_id/speed bin combination | 7
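The roll-up performed by aggregate_speed_counts_15min() can be approximated in Python: floor each datetime_bin to its 15-minute bin, floor each speed to its 5 km/h bin, and sum count. This sketch labels bins by their lower speed bound instead of speed_bins IDs, and aggregate_15min is a hypothetical helper, not a port of the actual SQL function:

```python
from collections import defaultdict
from datetime import datetime


def speed_bin_floor(speed, width=5):
    """Lower bound (km/h) of the 5 km/h bin containing `speed`."""
    return speed - speed % width


def aggregate_15min(raw_rows):
    """Sum raw WYS counts into (api_id, 15-minute bin, speed bin) totals.

    raw_rows: iterable of (api_id, datetime_bin, speed, count) tuples, as in wys.raw_data.
    """
    totals = defaultdict(int)
    for api_id, dt, speed, count in raw_rows:
        bin_start = dt.replace(minute=dt.minute - dt.minute % 15, second=0, microsecond=0)
        totals[(api_id, bin_start, speed_bin_floor(speed))] += count
    return dict(totals)


raw = [
    (1967, datetime(2018, 10, 29, 10, 0), 47, 2),
    (1967, datetime(2018, 10, 29, 10, 5), 48, 3),
    (1967, datetime(2018, 10, 29, 10, 10), 51, 1),
    (1967, datetime(2018, 10, 29, 10, 15), 46, 4),
]
wys_15min = aggregate_15min(raw)
```

Because the raw bins are only roughly 5 minutes long, a raw bin that straddles a 15-minute boundary is assigned wholly to the bin containing its start time here; the SQL function may treat such cases differently.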

WYS Open Data

Semi-aggregated and monthly summary data are available for the two programs (Stationary School Safety Zone signs and Mobile Signs) and are updated monthly. Because the mobile signs are moved frequently, they do not have accurate locations beyond a text description, and are therefore presented as a separate dataset. See WYS documentation for more information on how the datasets are processed.

