Preparing the OpenGovernment TreeCadastre of Vienna for OSM-import (1)

The city of Vienna has opened access to some of its geodata to the public. The license under which it is published is compatible with OpenStreetMap, so there should be no legal obstacle to including any of it in the OSM database. One of these datasets is the cadastre of trees. (For the geometrical analysis with maps, scroll down!)

Choosing the Format

The cadastre of trees can be downloaded in various formats, among them GML, JSON, Shapefile, KML, GeoRSS and CSV. At first I went with the Shapefile format, since it is well proven and can be accessed from many programming languages. But for reasons explained later, CSV is the format to go with.

Attribute Data

I used QuantumGIS to inspect the downloaded data.

Looking at the Data-Structures

When looking at the data structure of the file, one can see the following columns:

  • tree-number (“BAUMNUMMER”): a unique number by which the tree can be identified unmistakably
  • area (“GEBIET”): the kind of surrounding of the tree
  • street (“STRASSE”): name of the street where the tree is located
  • type (“ART”): a string consisting of the Latin name, the cultivar and the German name
  • year of planting (“PFLANZJAHR”)
  • circumference of the stem (“STAMMUMFANG”): apparently in centimeters
  • diameter of the crown (“KRONENDURCHMESSER”)
  • height (“BAUMHÖHE”): the height of the tree in meters
  • geometry: the actual position of the tree in geographical lat-long

A quick glance at the page for the tag “natural=tree” in the OSM wiki gives an overview of the proposed tags for trees:

  • type: This distinguishes just between “broad_leaved”, “conifer” or “palm” trees. This information has to be derived from the “ART” field of the OGD dataset.
  • genus: The genus is just the first part of the Latin name and has to be extracted from the “ART” field.
  • species: Here the complete Latin name is stated.
  • taxon: The taxon describes the taxonomy of the tree in greater detail. More information about this can be found on the OSM wiki page for taxon.
  • sex: The sex of the tree
  • circumference: The circumference of the stem in meters.
  • height: The height in meters.
  • name: The name tag should only be used when it describes a very special tree.

Converting the Data-Structures

The tree number may be left out. It could be used to identify a tree when applying updates to the imported dataset later on, but since no tag is recommended for this kind of data, it would add inconsistency to the OSM database. Besides, later updates can identify a tree by its location. There is no information about the sex or the name of the tree, so these tags are left out. The circumference in the OSM database is measured in meters and refers to the stem. This value is taken from the “STAMMUMFANG” field, which is apparently in centimeters and needs to be converted. Height is the same in both datasets. The diameter of the crown has no appropriate tag in the OSM naming scheme. This is quite unfortunate, since I see many cartographic possibilities for this value. I decided to include it in the import anyway, using a tag called “diameter_crown”, as proposed on the tree 3D visualisation page in the OSM wiki.
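The mapping described above could look roughly like the following sketch. It is not the original script: the function name row_to_tags is made up for illustration, the input is assumed to be a dict keyed by the German column names, and the centimeter unit of “STAMMUMFANG” is the assumption stated in the text.

```python
def row_to_tags(row):
    """Map one OGD record (dict keyed by the German column names) to OSM tags."""
    tags = {"natural": "tree"}

    # STAMMUMFANG is assumed to be in centimeters; OSM expects meters.
    circumference_cm = row.get("STAMMUMFANG", "").strip()
    if circumference_cm:
        tags["circumference"] = f"{float(circumference_cm) / 100:.2f}"

    # Height is in meters in both schemes, so it is copied unchanged.
    height = row.get("BAUMHÖHE", "").strip()
    if height:
        tags["height"] = height

    # No official tag exists; "diameter_crown" follows the wiki proposal.
    crown = row.get("KRONENDURCHMESSER", "").strip()
    if crown:
        tags["diameter_crown"] = crown

    # BAUMNUMMER is intentionally not carried over.
    return tags
```

Empty fields simply produce no tag, which avoids uploading empty attributes.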

Extraction of Genus and Species

The genus and species have to be extracted from the “ART” field. This is done with a Python script. The “ART” field is just a string which contains the complete Latin name, sometimes followed by the cultivar in single quotes, and then the German name in parentheses. An example: in the string

Tilia cordata 'Greenspire' (Stadtlinde)

“Tilia” corresponds to the genus, the species is “Tilia cordata”. “Greenspire” stands for the cultivar and “Stadtlinde” is the German name.
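One way to take such a string apart is a regular expression. The following is only a sketch under the assumption that every entry follows the pattern “Latin name, optional 'cultivar', optional (German name)”; the names ART_PATTERN and parse_art are hypothetical and not taken from the original script.

```python
import re

ART_PATTERN = re.compile(
    r"""^\s*
        (?P<latin>[^'(]*?)               # Latin name, up to a quote or parenthesis
        \s*(?:'(?P<cultivar>[^']*)')?    # optional cultivar in single quotes
        \s*(?:\((?P<german>[^)]*)\))?    # optional German name in parentheses
        \s*$""",
    re.VERBOSE,
)


def parse_art(art):
    """Split an ART string into genus, species, cultivar and German name."""
    match = ART_PATTERN.match(art)
    if match is None:
        return None  # entry does not follow the assumed pattern
    latin = match.group("latin").strip()
    return {
        "genus": latin.split()[0] if latin else "",
        "species": latin,
        "cultivar": match.group("cultivar") or "",
        "german": match.group("german") or "",
    }


print(parse_art("Tilia cordata 'Greenspire' (Stadtlinde)"))
```

Entries that do not fit the pattern return None, so they can be collected and inspected by hand.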

The Cultivar / Taxon

It is a bit more challenging with the taxon. According to the OSM wiki page for taxon, it may contain any Latin specification of the botanical name, even the cultivar. Also, the botanical name can be split into its parts by using sub-tags like taxon:cultivar=* . It is a bit unclear to me whether to use the genus/species tags or to go with only “taxon:species” and “taxon:genus”. I consider it best practice to stick with simple “genus” and “species” and include the cultivar with “taxon:cultivar”. The taxon itself is also extracted with the help of a Python script. There are some entries that contain two names separated by a comma. This disturbs the dissection of the “ART” field. Also, it does not make sense to include two alternatives in the OSM database. Therefore, the values posing problems are identified manually and replaced in the input CSV before processing it with the Python script. These values and their chosen replacements are:

"Sumach, Essigbaum" -> Essigbaum
"Kiefer, Föhre" -> Kiefer
"Schwarzkiefer, Schwarzföhre" -> Schwarzkiefer
Edit: there are two more entries that need to be changed:

"Malus spec. ,Apfel" -> Malus spec. (Apfel)
"Juglans nigra, Schwarznuss" -> Juglans nigra (Schwarznuss)
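I made these replacements by hand, but they could also be applied by the script before the “ART” field is dissected. A minimal sketch, with the hypothetical names REPLACEMENTS and clean_art, and with a made-up example string in the last line:

```python
# Problematic substrings and the value chosen for each (taken from the lists above).
REPLACEMENTS = {
    "Sumach, Essigbaum": "Essigbaum",
    "Kiefer, Föhre": "Kiefer",
    "Schwarzkiefer, Schwarzföhre": "Schwarzkiefer",
    "Malus spec. ,Apfel": "Malus spec. (Apfel)",
    "Juglans nigra, Schwarznuss": "Juglans nigra (Schwarznuss)",
}


def clean_art(art):
    """Replace the known comma-separated values in an ART string."""
    for bad, good in REPLACEMENTS.items():
        art = art.replace(bad, good)
    return art


# Hypothetical input; the exact strings in the dataset may differ.
print(clean_art("Pinus nigra (Schwarzkiefer, Schwarzföhre)"))
```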

Determining the Type

The type is not stated explicitly in the OGD dataset but can be determined by looking at the genus of the tree. For this purpose, a list of comparisons is used inside the Python script:

 if genus == "": ttype = ""
 if genus == "abies": ttype = "conifer"
 if genus == "acer": ttype = "broad_leaved"
 if genus == "aesculus": ttype = "broad_leaved"
 if genus == "ailanthus": ttype = "broad_leaved"
 if genus == "albizia": ttype = "broad_leaved"
 if genus == "alnus": ttype = "broad_leaved"
 if genus == "amelanchier": ttype = "broad_leaved"
 if genus == "araucaria": ttype = "conifer"
 if genus == "baumgruppe": ttype = ""
 if genus == "betula": ttype = "broad_leaved"
 if genus == "broussonetia": ttype = "broad_leaved"
 if genus == "buxus": ttype = "broad_leaved"
 if genus == "calocedrus": ttype = "conifer"
 if genus == "caragana": ttype = "broad_leaved"
 if genus == "carpinus": ttype = "broad_leaved"
 if genus == "castanea": ttype = "broad_leaved"
 if genus == "catalpa": ttype = "broad_leaved"
 if genus == "cedrus": ttype = "conifer"
 if genus == "celtis": ttype = "broad_leaved"
 if genus == "cercidiphyllum": ttype = "broad_leaved"
 if genus == "cercis": ttype = "broad_leaved"
 if genus == "chamaecyparis": ttype = "conifer"
 if genus == "cladrastis": ttype = "broad_leaved"
 if genus == "cornus": ttype = "broad_leaved"
 if genus == "corylus": ttype = "broad_leaved"
 if genus == "cotinus": ttype = "broad_leaved"
 if genus == "cotoneaster": ttype = "broad_leaved"
 if genus == "crataegus": ttype = "broad_leaved"
 if genus == "cryptomeria": ttype = "conifer"
 if genus == "cupressocyparis": ttype = "conifer"
 if genus == "cupressus": ttype = "conifer"
 if genus == "cydonia": ttype = "broad_leaved"
 if genus == "davidia": ttype = "broad_leaved"
 if genus == "elaeagnus": ttype = "broad_leaved"
 if genus == "eucommina": ttype = "broad_leaved"
 if genus == "exochorda": ttype = "broad_leaved"
 if genus == "fagus": ttype = "broad_leaved"
 if genus == "fontanesia": ttype = "broad_leaved"
 if genus == "frangula": ttype = "broad_leaved"
 if genus == "fraxinus": ttype = "broad_leaved"
 if genus == "ginkgo": ttype = "ginkgo"
 if genus == "gleditsia": ttype = "broad_leaved"
 if genus == "gymnocladus": ttype = "broad_leaved"
 if genus == "hibiscus": ttype = "broad_leaved"
 if genus == "ilex": ttype = "broad_leaved"  # Ilex (holly) is broad-leaved, not a palm
 if genus == "juglans": ttype = "broad_leaved"
 if genus == "juniperus": ttype = "conifer"
 if genus == "koelreuteria": ttype = "broad_leaved"
 if genus == "laburnum": ttype = "broad_leaved"
 if genus == "larix": ttype = "conifer"  # Larix (larch) is a deciduous conifer
 if genus == "liquidambar": ttype = "broad_leaved"
 if genus == "liriodendron": ttype = "broad_leaved"
 if genus == "maclura": ttype = "broad_leaved"
 if genus == "magnolia": ttype = "broad_leaved"
 if genus == "malus": ttype = "broad_leaved"
 if genus == "metasequoia": ttype = "conifer"
 if genus == "morus": ttype = "broad_leaved"
 if genus == "nadelbaum": ttype = "conifer"
 if genus == "ostrya": ttype = "broad_leaved"
 if genus == "parrotia": ttype = "broad_leaved"
 if genus == "paulownia": ttype = "broad_leaved"
 if genus == "phellodendron": ttype = "broad_leaved"
 if genus == "photinia": ttype = "broad_leaved"
 if genus == "picea": ttype = "conifer"
 if genus == "pinus": ttype = "conifer"
 if genus == "platanus": ttype = "broad_leaved"
 if genus == "platycladus": ttype = "conifer"
 if genus == "populus": ttype = "broad_leaved"
 if genus == "prunus": ttype = "broad_leaved"
 if genus == "pseudotsuga": ttype = "conifer"
 if genus == "pterocarya": ttype = "broad_leaved"
 if genus == "pyrus": ttype = "broad_leaved"
 if genus == "quercus": ttype = "broad_leaved"
 if genus == "rhamnus": ttype = "broad_leaved"
 if genus == "rhus": ttype = "broad_leaved"
 if genus == "robinia": ttype = "broad_leaved"
 if genus == "salix": ttype = "broad_leaved"
 if genus == "sambucus": ttype = "broad_leaved"
 if genus == "sequoiadendron": ttype = "conifer"
 if genus == "sophora": ttype = "broad_leaved"
 if genus == "sorbus": ttype = "broad_leaved"
 if genus == "tamarix": ttype = "broad_leaved"
 if genus == "taxus": ttype = "conifer"
 if genus == "tetradium": ttype = "broad_leaved"
 if genus == "thuja": ttype = "conifer"
 if genus == "thujopsis": ttype = "conifer"
 if genus == "tilia": ttype = "broad_leaved"
 if genus == "toona": ttype = "broad_leaved"
 if genus == "tsuga": ttype = "conifer"
 if genus == "ulmus": ttype = "broad_leaved"
 if genus == "zelkova": ttype = "broad_leaved"

The list of genera is complete, since I used the “List Individual Values” function of QuantumGIS to get all values that occur for the genus.
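The same lookup can be written more compactly with a dictionary. This is just an alternative sketch, not the script I used; GENUS_TO_TYPE shows only a small excerpt of the list, and the helper names tree_type and missing_genera are made up. The second helper lists genera that have no entry yet, which is a quick way to check the completeness of the table.

```python
# Excerpt only; the full table would hold one entry per genus in the dataset.
GENUS_TO_TYPE = {
    "": "",
    "abies": "conifer",
    "acer": "broad_leaved",
    "baumgruppe": "",
    "pinus": "conifer",
    "tilia": "broad_leaved",
}


def tree_type(genus):
    """Return the OSM type for a genus, or an empty string if unknown."""
    return GENUS_TO_TYPE.get(genus.strip().lower(), "")


def missing_genera(genera):
    """Return the genera (lower-cased, sorted) that have no entry in the table."""
    return sorted({g.strip().lower() for g in genera} - set(GENUS_TO_TYPE))


print(tree_type("Tilia"))
print(missing_genera(["Tilia", "Pinus", "Quercus"]))
```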

Converting Data to OSM-compatible Format

I tried to make the Python script work with the SHP file, but the Python module “pyshp” apparently has problems with the encoding, and the OGR module quits with a segmentation fault. Currently the script takes the CSV file as input and outputs a newly created SHP file. This file can be imported by JOSM using the “opendata” plugin and then be uploaded to the OSM database. But there is a problem: attribute names in a shapefile are limited to 10 characters, so names like “diameter_crown” get truncated. So the way to go is to have the script create a CSV file again. This proved to be easy to implement.

Sadly, JOSM does not import the CSV file but gets stuck during the process. (This is also true for ODS files. In fact, every format other than KML had some disadvantage, e.g. unsupported encoding or inclusion of the lat-lon as tags.) So one can use QuantumGIS to convert the CSV to KML, which can be read by JOSM without any problems.

The output CSV produced by the Python script has to be converted to UTF-8 encoding. Otherwise, QuantumGIS will not display special characters like “ä”, “ü” or “ö” and will remove them from the dataset. The conversion can be done with one of the many text editors available (e.g. on Linux: “Gedit” or “Geany”).
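Instead of a text editor, the re-encoding could also be done in Python. The sketch below assumes that the output CSV is encoded as Latin-1, which is only a guess on my part; the function names and the parameter source_encoding are hypothetical, so adjust the encoding if the file turns out to be something else.

```python
def to_utf8(data, source_encoding="latin-1"):
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src_path, dst_path, source_encoding="latin-1"):
    """Read a whole file, re-encode it and write the result to a new file."""
    with open(src_path, "rb") as src:
        converted = to_utf8(src.read(), source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Working on raw bytes leaves line endings and separators untouched, so the CSV structure stays exactly as the script wrote it.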

Geometry Information

It is important not to replace any existing trees in the OSM database or create duplicate entries. Therefore, the data is analysed using QuantumGIS.

Coverage

Currently there are 2,996 trees mapped in OpenStreetMap in Vienna, most of them in the 8th and 7th districts.

OpenStreetMap Tree Coverage in Vienna (©OpenStreetMap und Mitwirkende, CC BY-SA)

Many of these are located inside courtyards, so they don’t collide with the OGD dataset, which only contains public trees (which in turn tend to be located on open streets rather than on private areas), as can be seen in the example in the following graphic:

OSM in Court, OGD on Street (OGD Wien, ©OpenStreetMap und Mitwirkende, CC BY-SA)

The OpenGovernmentData dataset contains 120,951 trees located in all areas of Vienna:

OpenGovernmentData Vienna Tree Coverage (OGD Wien, ©OpenStreetMap und Mitwirkende, CC BY-SA)

It is easy to see that the overall coverage is much better with the OGD dataset. Additionally, the OSM dataset contains no information about tree types, heights or other attributes.

Positional Accuracy

As can be seen in the following example graphic, many of the trees that are already mapped are located at nearly the same spot as their OGD counterparts.

Positional Accuracy OSM vs OGD Trees (OGD Wien, ©OpenStreetMap und Mitwirkende, CC BY-SA)

This high positional accuracy makes it easy to identify and leave out any already existing trees. These trees will be collected in a separate file for later (manual?) processing. I made a positional check with buffers around the OSM trees. The buffer radius ranges from 1 meter to 9 meters in steps of one meter. The results are presented in the following table. The numbers are the OGD points that fall within the buffers. The “+ More” column shows how many more trees were selected in comparison to the buffer one meter smaller.

Buffer Size    # of Trees Contained    + More
1 meter        1034
2 meters       1124                    + 90
3 meters       1136                    + 12
4 meters       1145                    + 9
5 meters       1158                    + 13
6 meters       1203                    + 45
7 meters       1227                    + 24
8 meters       1239                    + 12
9 meters       1250                    + 11

Increase of Number of Trees when Expanding Search Radius
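The “+ More” values are simply the differences between consecutive buffer sizes. A small check in Python, using the counts from the table:

```python
# Number of trees contained for buffers of 1 to 9 meters (from the table).
counts = [1034, 1124, 1136, 1145, 1158, 1203, 1227, 1239, 1250]

# Difference between each buffer size and the one a meter smaller.
increments = [larger - smaller for smaller, larger in zip(counts, counts[1:])]
print(increments)  # [90, 12, 9, 13, 45, 24, 12, 11]
```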

As can be seen, more and more trees are selected as the search radius expands. Up to a buffer size of 5 meters, the number of additional trees selected is mostly decreasing. Beyond 5 meters it increases again, which may be because trees are counted twice due to overlapping buffer zones. This value defines the upper limit of trees not suitable for import. All trees within a search radius of 5 meters (better to choose the higher value to be sure) will not be imported. This results in a total of 1,158 trees that are not imported but set aside for later manual checking.
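I did the buffer analysis in QuantumGIS, but the same split can be sketched in plain Python with a distance check. The following is an illustration only: it assumes trees are given as (lat, lon) tuples, uses the haversine formula on a spherical earth instead of real buffers, and the function names are made up. It compares every OGD tree with every OSM tree, so for the full datasets a spatial index would be preferable.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean earth radius, spherical approximation


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_proximity(ogd_trees, osm_trees, radius_m=5.0):
    """Split OGD trees into (importable, held_back) by distance to any OSM tree."""
    importable, held_back = [], []
    for tree in ogd_trees:
        near = any(
            distance_m(tree[0], tree[1], osm[0], osm[1]) <= radius_m
            for osm in osm_trees
        )
        (held_back if near else importable).append(tree)
    return importable, held_back
```

The held-back list corresponds to the separate file of trees kept for manual checking.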

Preparation of Geometry Information

To exclude the unwanted trees and save them in a separate file for later processing, QuantumGIS can be used again.

Manual Refinement

After selecting the trees that can be imported, suspicious values like a ridiculously high “diameter_crown” have to be removed.
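Such a check could also be automated. The sketch below drops the “diameter_crown” tag when it exceeds a limit or is not a number; the limit of 40 is a made-up value for illustration (in the dataset's own unit), not a threshold from my actual workflow, and the function name is hypothetical.

```python
MAX_CROWN_DIAMETER = 40.0  # hypothetical cut-off, choose after inspecting the data


def drop_suspicious_crown(tags, limit=MAX_CROWN_DIAMETER):
    """Return a copy of the tags without an implausible diameter_crown value."""
    cleaned = dict(tags)
    value = cleaned.get("diameter_crown")
    if value is not None:
        try:
            suspicious = float(value) > limit
        except ValueError:
            suspicious = True  # non-numeric values are treated as suspicious
        if suspicious:
            del cleaned["diameter_crown"]
    return cleaned
```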

In JOSM the data can be refined even more. This step could have been included in the Python script, but it is quite easy to do manually. There is the tag “species=baumgruppe” (“Baumgruppe” is German for “group of trees”). This does not make sense. These “baumgruppe” entries will be included in the final upload, but only as “natural=tree” without any additional information. With JOSM we can search for “baumgruppe” and remove the undesired values at once for all found trees. There are also some empty attributes. They can easily be found and removed with the “validator” plugin. Just select all elements and perform a validation. Then select all occurring problems and click on “Fix”. The empty attributes should be deleted automatically. To speed up this process I deactivated all but the needed checks in the options.
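For completeness, here is how this clean-up might look if it were done in the Python script instead of JOSM. This is a sketch only; the function name tidy_tags is made up and tags are assumed to be a plain dict of strings.

```python
def tidy_tags(tags):
    """Remove empty attributes and reduce 'baumgruppe' entries to natural=tree."""
    # Drop attributes whose value is missing or only whitespace.
    cleaned = {
        key: value
        for key, value in tags.items()
        if value is not None and str(value).strip() != ""
    }
    # A "baumgruppe" is not a species or genus: keep the tree, drop the details.
    names = (cleaned.get("genus", ""), cleaned.get("species", ""))
    if any(str(name).strip().lower() == "baumgruppe" for name in names):
        return {"natural": "tree"}
    return cleaned
```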

Upload

By now the OGD tree data should be refined and ready to upload!
