What data should be included in the City of Calgary’s Open Data Pilot Project

By DJ Kelly June 15, 2010

The City of Calgary’s Open Data Pilot Project is set to begin this summer, despite recent attempts at a council committee meeting to quash the project. (More on that in a future post.)

As the project is being mapped out moving toward a launch date, it’s important to note that it will only be as successful as the usefulness of the data included in the catalogue. Poor design or minor mistakes can be overcome and corrected, but a lack of useful data almost certainly will lead to a failure of the pilot project. This more than anything will determine how many developers and academics make the choice to get involved and try to create something out of the information provided in the data catalogue. If there isn’t much data, or the data provided isn’t very useful, the project will crumble.

So in the interest of helping things get off on the right foot, I’ve put together a list of the data I would like to see included in the initial pilot catalogue this summer.

1. Community and Ward Boundaries
Most of the conversations I have had with people about open data revolve around being able to mash up City data, or data they have accumulated themselves, with mapping data of Calgary to show a visual representation of their data set. Specifically, what is required is information about the areas of the city that programmers may want to group their data by. (For example, creating a map where neighbourhoods with the lowest income appear light yellow and those with the highest appear dark yellow.) In order to do almost ANYTHING useful with any data the City might provide, programmers will NEED the GIS data outlining the boundaries of neighbourhoods and wards. Without this information, I’m confident the entire open data project will be nothing more than an interesting internal exercise for the City. This will be the tell-tale sign of how seriously they are taking transparency and accountability: if the City publishes the mapping data for neighbourhoods and wards, they have given the pilot project a reasonable chance of success; if they don’t, then it’s fair to think they’re not taking it seriously.
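To give a sense of why the boundary data matters so much, here is a minimal sketch in Python of the most basic operation it enables: working out which community a given point falls in. Everything in it is an assumption for illustration. The two square "communities" and their coordinates are invented, and real boundaries would come from whatever GIS export the City chooses to publish.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x position where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def community_for(x, y, boundaries):
    """Return the name of the first community containing the point, or None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical boundaries: two adjacent unit squares, not real Calgary data.
BOUNDARIES = {
    "Community A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Community B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

print(community_for(0.5, 0.5, BOUNDARIES))
print(community_for(1.5, 0.5, BOUNDARIES))
```

Once every record can be tagged with a community this way, colouring a map by income or anything else is mostly a matter of counting.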

I think the next three data sets are required because of the first two of the three Laws of Open Government Data:

  1. If it can’t be spidered or indexed, it doesn’t exist
  2. If it isn’t available in open and machine readable format, it can’t engage
  3. If a legal framework doesn’t allow it to be repurposed, it doesn’t empower

2. Community Statistics
The City of Calgary produces and posts on its website statistics for every community in Calgary. There is a ridiculous amount of interesting and immensely usable data contained in these reports, which are updated every few years after a census is completed. Unfortunately, you can’t do much with the documents because they are PDFs. You can read each one individually and that’s about it. Right now it is impossible to do comprehensive comparisons because the information is not open and machine readable (and therefore doesn’t engage as much as it could). Making this data available in CSV format would greatly increase its usefulness and potential. The City has made it available to the public for a reason. Making it available as part of an open data catalogue would go a long way to fulfilling that reason.
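As a rough illustration of the kind of comparison CSV would make trivial, here is a small Python sketch that ranks communities by a chosen statistic. The column names and every figure in the sample are made up for the example. The City’s actual reports may be organized quite differently.

```python
import csv
import io

# Hypothetical sample; column names and figures are invented for illustration.
SAMPLE_CSV = """community,population,median_household_income
Community A,12000,58000
Community B,8000,91000
Community C,20000,47000
"""


def rank_communities(csv_text, column):
    """Return (community, value) pairs sorted from highest to lowest value."""
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = [(row["community"], float(row[column])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, value in rank_communities(SAMPLE_CSV, "median_household_income"):
    print(f"{name}: {value:,.0f}")
```

Doing the same thing today would mean opening every community’s PDF and copying the numbers out by hand.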

3. Transit Schedules and Stops
Wow, do Calgarians like to complain about Calgary Transit schedules and the Calgary Transit website. For the most part I disagree on the former, but I too find the website’s trip planning functionality cumbersome. You know what though? I say if whiners like me want to complain, then let them try to make something better. There are hundreds of applications online and on smartphones that do what the City is trying to do, but better and cheaper. This might be the conservative side of me coming out, but I say it’s time the City got out of the way and let these small business people show us why they are so good at what they do. If the City were to make transit schedules and stops available, I’m confident that within a month we would see current app providers add Calgary to their rosters, thereby giving Calgarians dozens of new – and more than likely better – ways of planning their Calgary Transit trips. (And yes, if they wanted to, Transit could even eventually partner with ones they liked, shut down their site, and save some major money this way.) They’ve already done this with Google, so let’s give the small guys a chance too.

4. Crime Statistics and Locations
Again, all this information is available online for free to the public, but it is behind a proprietary wall. I’m confident the City of Calgary Police spent a lot of money they didn’t need to on their “Crimes Web Mapping Application”. There are many crime map providers out there that would be happy to do this job for them, if only they made the data available in a machine readable format. The other – and more important – reason this data should be made available in a machine readable format (instead of only via the map application, where it can only be read and not used) is so it can be mashed up with other data sets. If someone were to, for example, mash it up with the community statistics or locations of services, we might be able to see some patterns emerging and create an even more effective police presence where potential crimes might occur in the future. The police do this currently using anecdotal evidence and personal/personnel experience, but open data allows for all kinds of potential permutations to be created by others that the police may not have the time or money to undertake. We already allow for this kind of work to happen via the most successful public engagement initiative undertaken by police of all time: 911. If they trust us to report the crimes, they should trust us to do something useful with the data too.

5. Fire, Police, Recreation Centre, Community Centre and School Locations
This one is almost a no-brainer. This information is surprisingly hard to find, yet it is so basic. I can only imagine how much more useful it would have been to have this information when we were house hunting a few years ago. (I’d love to see this info and the crime data mashed up with the Canadian Real Estate Association’s MLS.) And I can’t imagine how many other fantastic mapping systems might be created if this data were available in a consistent format. Simply listing the name of the building, its street address and its longitude/latitude coordinates should be more than enough, and could easily be put together by anyone at the City in an afternoon.
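To show how little is needed, here is a sketch in Python of the house-hunting question I had in mind: given a name, a type, a street address and coordinates for each building, find the nearest facility of a given type. The file layout, the facility names and the coordinates are all hypothetical, and the distance is a straight-line (great-circle) estimate rather than a driving distance.

```python
import csv
import io
import math

# Hypothetical sample; names, addresses and coordinates are invented.
SAMPLE_CSV = """name,type,address,latitude,longitude
Example Fire Station,fire,100 Example St,51.05,-114.07
Example School,school,200 Sample Ave,51.08,-114.13
Another School,school,300 Sample Ave,51.00,-114.00
"""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_facility(csv_text, lat, lon, facility_type):
    """Return (name, distance_km) of the closest facility of the given type,
    or None if the file lists no facility of that type."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] != facility_type:
            continue
        dist = haversine_km(lat, lon,
                            float(row["latitude"]), float(row["longitude"]))
        if best is None or dist < best[1]:
            best = (row["name"], dist)
    return best


name, km = nearest_facility(SAMPLE_CSV, 51.07, -114.12, "school")
print(f"Nearest school: {name} ({km:.1f} km)")
```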

6. Development Permit Locations and Contact Information
It baffles me that the City publishes crime data in a map but not development permit locations on a map. Any citizen can go down to City Hall and get a copy of the permit for any construction occurring in the City, but this information isn’t published online for some reason. I would have thought it would be a privacy concern of some kind, but that doesn’t make sense either, considering the name and phone number of each permit applicant is published on a blue board out front of every location during a two-week window before construction begins. (I think it is also included in the newspaper advertisements during this window.) This would be great information to have available in a useful format like CSV or KML instead of just a document file at the planning office and on a sandwich board on the street. As a community association president, this would certainly cut down on phone calls at the very least! It would also be helpful in keeping track of all development going on in our neighbourhood.
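For anyone wondering how far apart those formats really are, here is a minimal Python sketch that turns a list of permit records into KML placemarks that a mapping tool could display. The field names (permit_id, description, latitude, longitude) and the sample record are assumptions for illustration, not the City’s actual permit schema. Note that KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def permits_to_kml(permits):
    """Build a KML document string with one Placemark per permit record.

    Each permit is a dict with hypothetical keys:
    permit_id, description, latitude, longitude.
    """
    # Serialize the KML namespace as the default (un-prefixed) namespace.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for permit in permits:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = permit["permit_id"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = permit["description"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f'{permit["longitude"]},{permit["latitude"]}'
        )
    return ET.tostring(kml, encoding="unicode")


# Invented sample record, not a real permit.
SAMPLE_PERMITS = [
    {
        "permit_id": "DP-EXAMPLE-001",
        "description": "New single detached dwelling",
        "latitude": "51.05",
        "longitude": "-114.07",
    },
]

print(permits_to_kml(SAMPLE_PERMITS))
```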

It is important to note that ALL of these suggestions involve ONLY data that is already publicly available, just in a format that limits the data’s usability and usefulness (such as PDF or proprietary software solutions). The good news about this is there will be far fewer hoops to jump through in order to get the data included in a pilot. I can think of many other data sets I’d like to see available, but let’s start with the low-hanging fruit.

There is, however, one data set not currently available to the public that I would like to see included in the initial data catalogue. It’s not really “data” per se, but I think it is something that should be made available:

7. City of Calgary Contracts
I outline my rationale for this request in this blog post. It probably won’t be in the initial data catalogue, and that’s okay, but the conversation and process required to make this data available in the very near future should begin now. Otherwise it could be years before we see something so simple made available to citizens.

There is one other thing, however, that must be sorted out before a Pilot Project can go live: the terms of use. I’m sure the City of Calgary’s lawyers have been working overtime on this one, but I would like to suggest the City use the same terms of use the City of Toronto and City of Edmonton are using. Theirs are identical. (Seriously, click those links and read them side-by-side.) Clearly, if it is good enough for BOTH of those cities, some major investigation has been done to arrive at that wording. At the very least it should be used as a starting point. We should build on the work of others rather than starting from scratch. I like these terms of use for many reasons, not the least of which is the following section of the licence, which alleviates much of the concern I’ve heard from some aldermen:

The City now grants you a world-wide, royalty-free, non-exclusive licence to use, modify, and distribute the datasets in all current and future media and formats for any lawful purpose. You now acknowledge that this licence does not give you a copyright or other proprietary interest in the datasets. If you distribute or provide access to these datasets to any other person, whether in original or modified form, you agree to include a copy of, or this Uniform Resource Locator (URL) for, these Terms of Use and to ensure any such person agrees to, and is bound by, them but without introducing any further restrictions of any kind.

I’m confident that if we can get each of these items included in the Pilot Project, the City will have done everything in its power to ensure its success.

If any readers have suggestions for other data you would like to see, you’re welcome to put it in the comments below, but you should probably send it directly to the City. (I’m just an interested citizen with no direct connection to the pilot project.)