I'm thinking that we could build an HTML -> Web Service tool that
would normalise government data from local councils that at the moment
is only accessible via HTML. My use case is this:
- Rateable values are available on most council websites
(http://www.horizons.govt.nz/default.aspx?mode=0&pageid=37&ratingsid=11220/048.00&HorizonsRatesGrid_current_page=)
- The data is Crown Copyright -
http://www.horizons.govt.nz/default.aspx?pageid=340
- Every council seems to have it in a slightly different format but
the basics are the same (Valuation Number, Address, Rating Period,
Legal Description, Area, Capital Value, Land Value)
If we could build a standard REST interface
(e.g. "http://api.open.org.nz/council/rates/11220/111.00AA")
and build scrapers behind the scenes to gather the data, it could
be quite a useful service. I'm sure there is quite a bit of other
information that would be useful and is also stored in HTML on the
council sites. Although we might not be able to get 100% of the
councils we could get a fair chunk of them done with relatively little
work.
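
To make the idea a bit more concrete, here is a rough Python sketch of what a normalised record and the URL handling might look like. The field names, the types, the `RatesRecord` class and the `parse_rates_path` helper are all my own assumptions for illustration; nothing here is an agreed schema, and the URL layout is just the example one above.

```python
import json
import re
from dataclasses import asdict, dataclass


# Hypothetical normalised record. Field names and types are assumptions
# based on the fields most councils seem to publish.
@dataclass
class RatesRecord:
    valuation_number: str
    address: str
    rating_period: str
    legal_description: str
    area: str            # kept as text because councils format it differently
    capital_value: int   # assumed to be whole dollars
    land_value: int      # assumed to be whole dollars


# Assumed URL layout: /council/rates/<roll>/<assessment>
PATH_RE = re.compile(
    r"^/council/rates/(?P<roll>\d+)/(?P<assessment>[0-9A-Za-z.]+)$"
)


def parse_rates_path(path: str) -> str:
    """Return the valuation number encoded in an API path."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a rates path: {path!r}")
    return f"{match['roll']}/{match['assessment']}"


def to_json(record: RatesRecord) -> str:
    """Serialise a record as JSON with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Each council scraper would then only have to fill in a `RatesRecord`, whatever the council's own HTML looks like.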
Does anything like this exist already?
On 9/07/2009, at 2:37 PM, Glen Barnes wrote:
> Although we might not be able to get 100% of the
> councils we could get a fair chunk of them done with relatively little
> work.
You're reading my mind again, Glen :) I can easily do the scrapage
(nearly two decades in the Perl mines weren't misspent!). How often
does the data change?
The data doesn't change that often (the councils are required to
update Rating Valuations at least every 3 years I think) but people
can challenge their valuations and mistakes can be fixed so the
document may not be static for the whole period. Ideally we could get
an update log from councils but obviously this cannot happen within
the services currently provided by councils so this virtual API is a
first step and still very useful to others. I would expect the
service to work something like this:
- Request comes in for record 'nnnn/nnnn.nn'
- We check our cache and see if we have a current version (current
being within the RV period OR no older than, say, 1 month). This will give
us a reasonable cache while not hitting council servers too often.
- If we have a cached version then we serve it up
- If we don't have a cached version then we scrape it and return the
value
- If we don't have a scraper for that council then we return some form
of error document/HTTP response
We can keep track of stats such as how many requests for each council,
how many we can fulfil, etc. If/When the councils contact us to see
WTF we are up to then we offer to help them provide this service
themselves and help define the API.
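
The request flow and the per-council stats above could be sketched roughly like this in Python. It is only a sketch under some assumptions of mine: the council is passed in explicitly (the example URL has no council in it, so a real service would need some lookup from valuation number to council), "1 month" is taken as 30 days, the RV-period check is left out, and `RatesService`, `NoScraperError` and the scraper signature are made-up names.

```python
import time
from typing import Callable, Dict, Tuple

# "Say 1 month", taken here as 30 days (an assumption).
MAX_AGE_SECONDS = 30 * 24 * 60 * 60

# A scraper takes a valuation number and returns the scraped record.
Scraper = Callable[[str], dict]


class NoScraperError(Exception):
    """Raised when no scraper exists for the requested council."""


class RatesService:
    def __init__(self, scrapers: Dict[str, Scraper],
                 clock: Callable[[], float] = time.time):
        self.scrapers = scrapers
        self.clock = clock
        # (council, valuation number) -> (time fetched, record)
        self.cache: Dict[Tuple[str, str], Tuple[float, dict]] = {}
        # council -> request and fulfilment counters
        self.stats: Dict[str, Dict[str, int]] = {}

    def _count(self, council: str, key: str) -> None:
        counters = self.stats.setdefault(
            council, {"requests": 0, "fulfilled": 0})
        counters[key] += 1

    def lookup(self, council: str, valuation_number: str) -> dict:
        self._count(council, "requests")
        key = (council, valuation_number)

        # Serve from the cache if the entry is still fresh enough.
        cached = self.cache.get(key)
        if cached is not None and self.clock() - cached[0] <= MAX_AGE_SECONDS:
            self._count(council, "fulfilled")
            return cached[1]

        # No scraper for this council: the caller turns this into an
        # error document / HTTP response.
        scraper = self.scrapers.get(council)
        if scraper is None:
            raise NoScraperError(council)

        # Otherwise scrape, cache and return the value.
        record = scraper(valuation_number)
        self.cache[key] = (self.clock(), record)
        self._count(council, "fulfilled")
        return record
```

The HTTP layer would map `NoScraperError` to whatever error response we settle on, and `stats` gives us the numbers to show a council when they get in touch.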
Glen
On 10/07/2009, at 11:27 AM, Nathan Torkington wrote:
> On 9/07/2009, at 2:37 PM, Glen Barnes wrote:
>> Although we might not be able to get 100% of the
>> councils we could get a fair chunk of them done with relatively
>> little
>> work.
>
> You're reading my mind again, Glen :) I can easily do the scrapage
> (nearly two decades in the Perl mines weren't misspent!). How often
> does the data change?
>
> Nat