I needed to load spatial data for Canadian places into a SQL Server 2008 R2 database. Good thing there is geonames.org, which is a geographic database that contains millions of placenames and their corresponding geolocations (latitude and longitude).
Here are the steps I took to create this spatial database.
Download the geospatial data
So I first downloaded the Canadian geolocation database from http://download.geonames.org/export/dump/ called CA.zip (according to the geonames readme).
This export file contains the following fields (excerpt from the readme file)
geonameid | integer id of record in geonames database |
name | name of geographical point (utf8) varchar(200) |
asciiname | name of geographical point in plain ascii characters, varchar(200) |
alternatenames | alternatenames, comma separated varchar(5000) |
latitude | latitude in decimal degrees (wgs84) |
longitude | longitude in decimal degrees (wgs84) |
feature class | see http://www.geonames.org/export/codes.html, char(1) |
feature code | see http://www.geonames.org/export/codes.html, varchar(10) |
country code | ISO-3166 2-letter country code, 2 characters |
cc2 | alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters |
admin1 code | fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20) |
admin2 code | code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) |
admin3 code | code for third level administrative division, varchar(20) |
admin4 code | code for fourth level administrative division, varchar(20) |
population | bigint (4 byte int) |
elevation | in meters, integer |
gtopo30 | average elevation of 30’x30′ (ca 900mx900m) area in meters, integer |
timezone | the timezone id (see file timeZone.txt) |
modification date | date of last modification in yyyy-MM-dd format |
Create a table in your SQL Server database that maps to all these columns
I then created the following table
CREATE TABLE [dbo].[CAGeoNames] ( [geonameid] INT PRIMARY KEY, [name] VARCHAR(200), [asciiname] VARCHAR(200), [alternatenames] VARCHAR(5000), [latitude] DECIMAL(38,10), [longitude] DECIMAL(38,10), [featureclass] VARCHAR(10), [featurecode] VARCHAR(10), [countrycode] VARCHAR(2), [cc2] VARCHAR(60), [admin1code] VARCHAR(20), [admin2code] VARCHAR(80), [admin3code] VARCHAR(20), [admin4code] VARCHAR(20), [population] bigINT, [elevation] INT, [gtopo30] INT, [timezone] VARCHAR(100), [modificationdate] DATETIME ) GO
Import the data
I initially used BULK IMPORT with a ROWTERMINATOR of n because I know each line is terminated by a linefeed. But for some reason, n didn’t work. Neither did rn.
Baffled, I know that n should have worked, but I conceded I should consult Books Online. Turns out Books Online has an exact sample for importing a file that was produced by UNIX .. and for some reason it uses dynamic SQL. Not wanting to waste any more time with the import, I just used the Books Online sample as a template, and true enough, it worked.
One of these days though I will come back to this issue and figure out why a bare n ROWTERMINATOR doesn’t work.
DECLARE @bulk_cmd varchar(1000) SET @bulk_cmd = 'BULK INSERT CAGeoNames FROM ''C:CACA.txt'' WITH (ROWTERMINATOR = '''+CHAR(10)+''')' EXEC(@bulk_cmd) GO
Add a GEOGRAPHY type column
Next up, we’ll use SQL Server 2008’s new GEOGRAPHY data type.
ALTER TABLE [dbo].[CAGeoNames] ADD [GeoLocation] GEOGRAPHY GO
Create GEOGRAPHY data based on the existing longitude and latitude values
And last but not least, we are going to derive the GEOGRAPHY value based on the longitude and latitude values that we just imported. We are going to use the Parse function that comes with the GEOGRAPHY data type.
UPDATE [dbo].[CAGeoNames] SET [GeoLocation] = Geography::Parse('POINT(' + CAST([longitude] AS VARCHAR(20)) + ' ' + CAST([latitude] AS VARCHAR(20)) + ')') GO
Ta-Daa!
That’s it. Now I have my own Canadian geolocation database. I can start using this database now for my SSRS R2/Bing Maps presentations!