Yesterday I got hold of an Odroid-XU4. This is a mini-computer similar to a Raspberry Pi, but with serious CPU power: 4 cores at 2.0 GHz plus another 4 cores at 1.4 GHz. I already compiled a version of imposm3 for ARMv7, which can be downloaded here. I will report back as soon as I have more to tell about my experiments.
Nearly two years ago the Open Government tree cadastre of the city of Vienna was imported into OpenStreetMap. Trees already existing in OSM were excluded from the import so as not to damage already mapped data. What is still missing is a check for which trees were already mapped and did not originate from the original import.
This post is a short update on my way to a script that automatically matches the not yet entered OGD trees with already mapped OSM trees. So far, I wrote a Python script that checks for existing entries by location. Thanks to spatial indexing and the very easy-to-use rtree library, this check can be done within about one minute, but I'm not yet convinced by its results. The image below shows an excerpt of the results of this analysis.
One thing that can be seen is that the city of Vienna added some more small parks to the dataset. As stated above, don't take this image too seriously; it shows only preliminary results.
I'm currently thinking about re-implementing the script directly within the great QGIS Processing toolbox. This would make geographic debugging a lot easier, since one can use the QGIS map window to output results while processing data.
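For reference, the core of the location check can be sketched like this. The actual script uses the rtree package, which follows the same insert-then-query pattern; this pure-stdlib sketch with a coarse grid illustrates the idea, and all coordinates and the distance threshold are made-up examples, not values from the real datasets.

```python
import math
from collections import defaultdict

CELL = 0.0005  # grid cell size in degrees, roughly 50 m at Vienna's latitude

def build_index(points):
    """Bucket points into a coarse grid so lookups stay local."""
    grid = defaultdict(list)
    for pid, (lon, lat) in points.items():
        grid[(int(lon / CELL), int(lat / CELL))].append((pid, lon, lat))
    return grid

def nearby(grid, lon, lat, max_deg):
    """Yield ids of indexed points within max_deg of (lon, lat)."""
    cx, cy = int(lon / CELL), int(lat / CELL)
    for gx in range(cx - 1, cx + 2):
        for gy in range(cy - 1, cy + 2):
            for pid, plon, plat in grid.get((gx, gy), []):
                if math.hypot(plon - lon, plat - lat) <= max_deg:
                    yield pid

# OSM trees already mapped (hypothetical ids and positions)
osm_trees = {1: (16.3400, 48.2000), 2: (16.3500, 48.2100)}
grid = build_index(osm_trees)

# An OGD tree right next to OSM tree 1 counts as already mapped
matches = list(nearby(grid, 16.34001, 48.20001, 0.0001))
```

With rtree, `build_index` becomes a series of `insert` calls and `nearby` a bounding-box `intersection` query, but the overall structure stays the same.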
For certain projects it might be necessary to work on the complete dataset of the OpenStreetMap project (OSM). My preferred workflow for OSM data consists of the following steps:
- download the OSM data
- clip it to my area of interest
- import the data into a PostGIS database
The complete database dump of OSM is called the "Planet File" and currently weighs about 36 GB in its compressed form.
To clip the data I usually resort to a very handy command-line tool called Osmium, which lets you define a region of interest and/or filter the data by certain tags. The import into a PostGIS database can be performed by multiple tools, of which I prefer Imposm3 because of its speed. Still, when using it on the complete planet file with limited resources (I used a dual-core CPU and 8 GB of RAM), it gets terribly slow and does not complete without errors. My guess is that the index created during the import procedure to access the data becomes so large that access to it is no longer fast enough. For the extract of the continent of Europe alone, the index is more than 20 GB.
The logical thing to do is to split the data prior to import and use a separate database for each part of the dataset. When using the data later on, one will have to apply any further processing steps to each database individually, but still, as long as there are not too many splits, this should not be a lot of hassle.
What would be more reasonable than to split by continents? Luckily, the company Geofabrik already offers extracts of the planet file split by continent and, if preferred, by country.
Instead of splitting the complete dataset on my own, I wrote a small script which downloads each continent separately and then imports each file into its own PostGIS database. This procedure was quite fast for all continents except Europe.
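The script boils down to a loop like the following sketch. The Geofabrik URL pattern matches their download server, but the imposm3 flags and the per-continent database naming are assumptions for illustration (a real run also needs a mapping file), so treat this as a skeleton rather than my exact script.

```python
import subprocess
from urllib.request import urlretrieve

CONTINENTS = ["africa", "antarctica", "asia", "australia-oceania",
              "central-america", "europe", "north-america", "south-america"]

def geofabrik_url(continent):
    """Download URL of the latest extract for a continent."""
    return "https://download.geofabrik.de/%s-latest.osm.pbf" % continent

def import_command(continent, pbf):
    """imposm3 invocation writing into a per-continent database."""
    return ["imposm3", "import",
            "-connection", "postgis://osm@localhost/osm_%s" % continent,
            "-read", pbf, "-write"]

# Uncomment to actually download and import each continent in turn:
# for continent in CONTINENTS:
#     pbf = "%s-latest.osm.pbf" % continent
#     urlretrieve(geofabrik_url(continent), pbf)
#     subprocess.check_call(import_command(continent, pbf))
```

Keeping the loop strictly sequential avoids two imports fighting over the same disk at once.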
Still too Large
If you take a closer look at the metadata of each extract, you can see that Europe is by far the largest one. While the import of the other continents was done in a matter of hours, the one of Europe took over a day before my computer automatically logged off and on again. Sadly, I have no idea what happened in detail, since I was not present when this incident occurred.
But then I got another idea. I had observed that the import of Europe was progressing at a remarkably slower rate than the other ones. What if there was some kind of internal timeout? The amount of system resources used during the import did not change during the first two hours; assuming it did not change during the complete import procedure either, resource exhaustion cannot have been the reason for the stop.
The complete import procedure was done on a traditional hard drive, and the PostGIS database is located on the same HDD. What I did next was to point the -cachedir parameter of Imposm3 at my internal SSD. This should speed up access to the index, which is read a lot during the import.
And it worked! The import itself was about 10 to 15 times faster when using the SSD as the storage medium for the temporary index. This is still not as fast as with the other continents, but way faster than before. The source file as well as the PostGIS database remained on the HDD. This is great news, since the index is generated only once but read millions of times, and as far as I know only write access to SSDs wears them down.
A problematic situation might occur when paying special attention to the areas on the border between two continents. Since the sliced data overlaps to a certain extent, there is redundant information. In the case of the "places" layer shown in the image above, I merged all input datasets with the "Merge shapes layers" SAGA command in the QGIS Processing toolbox and then applied a cleaning algorithm to the result. This might not be practical (or even feasible) with very big layers, so another solution has to be found there.
To avoid this problem, one could split the planet file into regions oneself and take care to cut sharply along the borders.
I can recommend the "split the planet by continent" way of importing OpenStreetMap data when you do not have access to a big server with loads of RAM and CPU cores. The hassle of e.g. reapplying the same cartographic design to each continent individually is limited, since there are only seven of them (eight in the case of these splits).
Do you know what a "Schwarzplan" is? In English it is called a "figure-ground diagram" and is a map showing all built-up areas in black and everything else in white (including the background). Here is an example of such a map for the area of Paris.
From such a map you can gain insights into many different aspects of urban structure. It is, for example, possible to recognize hints of the age of parts of the city, the kind of rule under which they developed, their history of development, their presumed density and much more. Furthermore, figure-ground diagrams are aesthetically appealing and strongly underline the differences between cities when comparing them.
Up until now there was no atlas focusing on this kind of map, which was sad. This might be due to the fact that one would have to obtain costly building data for each city individually. But thanks to the OpenStreetMap project I can present to you the atlas of figure-ground diagrams, a book called:
"Schwarzplan" is an atlas displaying the figure-ground plans of 70 different cities all over the world, together with basic statistics about their population and extent, on 152 pages of high-quality glossy paper.
For a week now, the Upper Austria Quiz has been available for download from the Google Play store: OÖ Quiz on Google Play
I always wanted to learn how to program for mobile devices. During the Christmas holidays the government of Upper Austria called for mobile apps using their Open Government Data. So the chance was taken, and Kathi and I started to design an Android app. Actually it is a hybrid app: you develop the complete app as an HTML5 project and bundle it with a small server. The result is a valid Android APK which behaves like a native app. The advantage is that the toolkit we used to generate the package (Apache Cordova) is capable of producing runnable binaries for a whole bunch of operating systems, including Windows Phone 7, BlackBerry, Symbian, WebOS, Android and, if you can afford it, iOS. Since we can't pay $100 per year just to be allowed to push the quiz to the App Store, there is no version for iPhone.
Since we both own Android devices, the app is optimized for them. Still, there were some difficulties. First, we had to decide on a framework. jQuery Mobile seemed a reasonable choice at the time, since it is well known and I was already used to plain jQuery. Next time I will go with a toolkit that supports some kind of MVC architecture and avoids the use of raw HTML wherever possible, e.g. Kendo UI. But we chose jQuery Mobile and stuck with it.
The size of the toolbar at the bottom was a peculiar problem: while it was easy to adjust the font size to the different screen resolutions of different devices, the toolbar would not change its display height.
We decided that despite its fixed behaviour it would be usable in any case, even when very small, and optimized it for the lowest screen resolution we could find. This means the app runs even on very small devices.
The map background is variable in size and automatically adjusts to the display size.
The questions are generated from OGD and OSM data. Most of the work was not generating the questions, which was largely a matter of combining a fixed phrase with a word describing a location, but manually filtering them for plausibility. It just does not make sense for the quiz to ask about a very small mountain that only a few people in Austria know by name. The same goes for lakes; in their case we filtered by area.
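The generation step can be sketched in a few lines. The template wording and the feature names below are made up for illustration; the real app combines its own German phrases with names taken from the OGD and OSM datasets.

```python
# A fixed phrase template combined with (feature type, name) pairs
# extracted from the data. Both template and pairs are hypothetical.
TEMPLATE = 'In which district lies the %s "%s"?'

features = [("lake", "Attersee"), ("mountain", "Traunstein")]

questions = [TEMPLATE % (kind, name) for kind, name in features]
```

The manual plausibility filter then simply removes entries from `features` before the questions are generated.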
It was a fun project, and I find myself playing the quiz, which is a good sign. Kathi and I already have ideas to extend the OÖ Quiz and even plans for a quite advanced second version with multiplayer support; whether we implement it depends on our free time.
The Open Data initiative of the city of Vienna (http://data.wien.gv.at/) has released a dataset of all fire hydrants of Vienna. As of the 17th of April it consists of 12702 unique entries and has no additional attributes.
One can see that the area of Vienna is completely covered. The dataset itself seems to be quite accurate.
In OpenStreetMap, a fire hydrant is marked with the tag "emergency=fire_hydrant". Filtering the OSM dataset of Vienna from the 16th of April reveals the following hydrants (blue dots):
The OSM dataset contained 67 fire hydrants in total. They are not placed as densely as, for example, trees, so it is easier to determine which of the fire hydrants in the OGD dataset should not be imported.
Preparing for Import
A quick analysis revealed that with a buffer of 20 meters, most of the already mapped fire hydrants (63) can be detected and omitted.
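The buffer check can be sketched as follows. The coordinates are invented for illustration, and the real analysis ran over the full OGD and OSM files; only the 20 m threshold comes from the text above.

```python
import math

def dist_m(a, b):
    """Approximate distance in metres between two (lon, lat) points."""
    lat = math.radians((a[1] + b[1]) / 2)
    dx = (a[0] - b[0]) * 111320 * math.cos(lat)  # metres per degree longitude
    dy = (a[1] - b[1]) * 110540                  # metres per degree latitude
    return math.hypot(dx, dy)

def omit_mapped(ogd_points, osm_points, buffer_m=20):
    """Return OGD hydrants with no OSM hydrant within buffer_m metres."""
    return [p for p in ogd_points
            if all(dist_m(p, q) > buffer_m for q in osm_points)]

osm = [(16.3720, 48.2080)]
ogd = [(16.37201, 48.20801),   # about 1 m from the OSM hydrant: omitted
       (16.3800, 48.2100)]     # far away: kept for import

kept = omit_mapped(ogd, osm)
```

With only 67 OSM hydrants the naive all-pairs comparison is fine; for larger datasets a spatial index like the one used for the trees would be the better choice.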
Yesterday the import of the Open Government tree cadastre of Vienna was performed. The data used for the import is from the 27th of November 2012.
The final dataset which was used for the import can be found here (http://gisforge.no-ip.org:5984/datastore/osm/ogd_trees_wien_selected.osm.bz2) while the trees that have been omitted can be downloaded here (http://gisforge.no-ip.org:5984/datastore/osm/ogd_trees_wien_notselected.osm.bz2).
Changes to the Original Method
The method used is the one described in previous posts, with the exception that the substitution of certain entries in the original dataset was done by the Python script and not manually by the user.
As was proposed by Friedrich Volkmann, every entry that fulfills the following criterion is marked with the tag "fixme=Baum oder Strauch":
height <= 2 and (datetime.datetime.now().year - year) >= 3
This means that every entry in the tree cadastre that is at most 2 meters tall but at least 3 years old is suspiciously small and should be checked manually.
As can be seen in the following image (area around "Urban Loritz Platz"), already mapped trees were successfully removed from the dataset. No trees were mapped twice, and none were wrongly removed.
The dataset with omitted trees can be downloaded by the link provided at the top of this post.
After the Import
The most impressive thing about the import are the parks. While it can get a bit confusing, the many green dots give a good idea of the tree cover.
Another significant display of the imported trees is their position alongside streets:
The display of trees should give the reader of the map a more thorough impression of the area he is looking at. But it might also just be too much information and make the map less readable. This is open to debate.
Interestingly, the linear structure of the imported trees can be used to detect poorly mapped streets. This is the case when, for example, a row of trees crosses the actual street, which should only happen in rare cases.
What has yet to be done is the conflation of attributes of already mapped trees with the OGD tree cadastre. One problem that arises here is that the provided reference number for trees is not unique. Maybe it was never intended to be unique, or the provided data does not contain all necessary information. Either way, one has to perform an additional search by location to uniquely identify a tree.
Also, the OGD cadastre of trees is constantly changing because trees are removed or new ones are planted. So it makes sense to think about a method to automatically keep the OSM trees up to date with the OGD cadastre of trees.
If you ever want to simply preview any of the data released by the city of Vienna through their Open Government Data initiative on top of an OpenStreetMap layer, you can now do so.
I present the "OGD Wien Preview and Matcher". It can be accessed at http://www.gisforge.com/ogdmatcher .
With this web application it is possible to display any spatial dataset available on the servers of OGD Vienna. The list of datasets is generated automatically each time the application is accessed, so it always reflects the current state. A dataset is loaded from the OGD source only when it is selected. The downside of this approach is a slight delay while waiting for the chosen dataset to arrive; on the other hand, no data has to be stored anywhere, and the displayed information is always the most recent.
At the moment features are displayed as entities with a yellow fill and a black border. When too many points are displayed at once, they are grouped together. Line and polygon features are always displayed as they are. Clicking on a feature shows all information associated with it. It has to be mentioned that, because of the experimental nature of the methods used to build this application, umlauts are not encoded correctly.
There is also a possibility to compare point datasets from OGD Vienna with a copy of OpenStreetMap data. Due to the poor performance of the server the comparison runs very slowly and is not of much use so far.
The web application has a problem with Microsoft Internet Explorer that I have yet to track down: while data loads in this browser, it takes forever to render it on the map.
The Raspberry Pi is a cheap credit-card-sized ARM computer with 700 MHz and 256 MB of RAM that consumes only about 3.5 W. It costs about 30€ and runs an optimised Debian Linux named "Raspbian". Since I first heard about it I wanted to see how it performs with a CouchDB installation. CouchDB is a document-based database that should perform well under low-RAM conditions, which makes it a good fit for the Raspberry Pi. There also exists a spatial extension named GeoCouch which adds support for a spatial index.
What could be the benefits of this kind of system? Clearly they only apply to a system architecture that updates the data seldom but queries it regularly. Moving an OSM database to autonomous hardware makes the system independent of the main computer. Also, depending on further research into possible queries, such a system could prove to be a versatile information gateway for OSM data and take some load off the heavily used default APIs offered by the OSM project. With a setup like this, it could be possible for everyone to run an information-delivering API that is cost-effective to set up and to maintain.
So, it was time to test this configuration against an OpenStreetMap dataset! The setup of CouchDB was not 100% straightforward: after installing the package I had to modify the start-up script because there was a problem with the ownership of an automatically created directory. For the installation of GeoCouch I had to compile it myself and again modify the start-up script, since the method proposed in the readme of the GeoCouch project did not work for me. (I will not go into more detail about setting up this environment, but maybe I will write a post about it later on.)
On the hardware side, in addition to the 8 GB SD card that held the system, I attached a USB HDD with 400 GB of space. I had to adjust the CouchDB configuration to relocate the database storage as well as the views to a directory on the USB device.
I had two datasets at hand: first, all points in the area of Vienna, which sum up to 445220 entries, and second, the complete OSM dataset for Austria.
To convert OSM data to JSON and batch-upload it to CouchDB I used the method and tools described on the OSMCouch page of the OpenStreetMap wiki. It uses the great Osmium framework in combination with a custom description file to generate GeoJSON-compatible output. Preparing the dataset for the extent of Vienna was not very time-consuming, whereas the one for Austria took quite some time to process and resulted in a 6.1 GB JSON file. The pre-processing of these datasets was not done on the Raspberry Pi but on a much more powerful computer.
Upload to CouchDB
Uploading was done with a little script also mentioned on the OSMCouch page, named "chunkybulks.py". What this script does is take a huge JSON file and upload it to a specified CouchDB database in chunks of 10,000 entries (the size can be specified manually). If one tried to upload a big file at once, the server most probably could not cope with it. On the same page there is a note saying that the software ImpOSM will soon come with integrated CouchDB support, but since it was not yet available I stuck to the old but reliable method.
This method worked fine when uploading the Viennese points, but with the complete dataset of Austria it crashed after 450,000 inserted entities. I guess the CouchDB server could not respond fast enough because the bulk size was too large (10,000). After changing it to 1,000 the upload continued, but it was terribly slow.
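The core of this approach can be sketched as follows. This is not the actual chunkybulks.py code, just an illustration of the pattern: slice the documents into fixed-size batches and POST each one to CouchDB's _bulk_docs endpoint. The database URL is a placeholder.

```python
import json
from urllib.request import Request, urlopen

DB = "http://localhost:5984/osm_austria"  # placeholder database URL

def chunks(docs, size=1000):
    """Yield successive batches of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def bulk_upload(docs, size=1000):
    """POST each batch to the _bulk_docs endpoint of the database."""
    for batch in chunks(docs, size):
        body = json.dumps({"docs": batch}).encode()
        req = Request(DB + "/_bulk_docs", data=body,
                      headers={"Content-Type": "application/json"})
        urlopen(req)  # raises on HTTP errors

# e.g. bulk_upload(all_geojson_docs, size=1000)
```

Shrinking `size` trades throughput for smaller, more digestible requests, which is exactly the trade-off I ran into above.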
Index Generation, Queries
Now came the critical part. Simply put (maybe too simply, but good enough to get the general idea), queries in CouchDB are pre-defined by JavaScript functions. Such a pre-defined query is called a "view". For faster access, CouchDB generates an index per view. Generating the index is very time-consuming, but once it is available any future query will be very fast. I am especially interested in the performance of queries that rely on an already built index.
So I had to create a view, preferably a spatial view (for a more detailed explanation of this topic, please take a look at the GeoCouch documentation), and execute it once to trigger the generation of an index. As expected, generating the index for all 445220 entries in the Vienna dataset took hours. The index generation for the dataset of Austria took days.
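A GeoCouch spatial view lives in a design document whose "spatial" member maps view names to JavaScript functions that emit a geometry. The sketch below builds such a document in Python; the document id and view name are my own choice, and the query URL in the comment follows the pattern described in the GeoCouch documentation.

```python
import json

design_doc = {
    "_id": "_design/main",
    "spatial": {
        # emit each document's GeoJSON geometry as the spatial key
        "by_location": "function(doc) {"
                       " if (doc.geometry) { emit(doc.geometry, doc._id); }"
                       "}"
    },
}

# PUT this document into the database, then the first request to
#   /<db>/_design/main/_spatial/by_location?bbox=16.2,48.1,16.5,48.3
# triggers the (slow) index build; later bbox queries are fast.
payload = json.dumps(design_doc)
```

Since the index is built lazily on first access, it can make sense to fire one dummy bbox query right after the upload so that later users never hit the multi-hour build.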
When querying the dataset there is a noticeable delay, but results arrive quite fast considering the power of the Raspberry Pi and the size of the database.
One interesting question remains: what would happen with the complete planet file? Given enough hard-disk space and time, the upload and index generation should be possible, but how fast would queries be?
Another interesting question is how the import method of the upcoming version of ImpOSM will perform, especially since it has support for diffs!
As Friedrich Volkmann from the Austrian OSM mailing list pointed out, the entries in the Open Government dataset "Baumkataster" include not only trees but also shrubs. So the dataset would better be called a "wood cadastre" than a "tree cadastre". The problem is that the definition of the OSM tag "natural=tree" only covers trees, so an additional filtering step has to be applied. Friedrich proposed basing the decision on the height of the plant in relation to its age.
I implemented his proposal by checking whether the plant is at most 2 meters tall while being at least 3 years old:
height <= 2 and (datetime.datetime.now().year - year) >= 3
In addition, it would be best to define individual rules for each of the more than 90 tree species. But that is a huge amount of work and, in my opinion, it is questionable whether it would lead to better results. Some experiments showed that it all comes down to only a handful of trees that would be excluded. With the current general implementation, 678 trees are flagged.
It is important to mention that any tree flagged by this method is not ignored but still imported. The difference is that it is assigned an additional tag: "fixme=Baum oder Strauch" (tree or shrub). Only a manual check of the plant's actual habit can determine which it is.
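The tagging step can be sketched like this; the criterion is the one given above, while the tag dictionary and field names are illustrative.

```python
import datetime

def maybe_flag(tags, height, year):
    """Add the fixme tag for suspiciously small but older plants."""
    age = datetime.datetime.now().year - year
    if height <= 2 and age >= 3:
        # keep the original tags, only add the manual-check marker
        tags = dict(tags, fixme="Baum oder Strauch")
    return tags

# a 1.5 m plant from 2005 gets flagged; a tall tree passes unchanged
tags = maybe_flag({"natural": "tree"}, height=1.5, year=2005)
```

Because the flagged entries keep all their other tags, mappers can resolve the "tree or shrub" question later without losing any information.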
Andreas Trawoeger is currently working on a yet-to-be-released live preview map which overlays the trees of the OGD dataset with the ones already in OSM. This will give a better overview of the extent of the dataset in question.