I wanted to do a cool experiment on SSIS that I haven’t tried before. I am really interested in this whole mapping component on SSRS, and I know that most likely the biggest challenge of companies is how to geocode addresses that are already in their databases. So what I wanted to do was to try and retrieve lat/long information from public geocoding web services within SSIS.
Please note that the exercise below is purely for experimentation. Please read the restrictions of each of the geocoding services, and go with whichever is appropriate for your purpose.
Also this is a simplistic code to show you how to get the latitude/longitude information. On your own versions, test thoroughly and add the appropriate exception handlers.
Ok, let’s start.
Step 1 – Look for the appropriate geocoding service.
There are few publicly available ones. Some are limited in content, some are very restricted. Check out the summary here – http://en.wikipedia.org/wiki/Geocoding
For this experiment I am going to use Google Maps v3 API since it’s publicly available, and V3 doesn’t need the Maps API key anymore! Please make sure you read the restrictions of using the Google Maps v3 API before deciding to use it in your applications/systems.
Note that for v3, there are only 2 available output formats: json and xml. Previous version of the API supported other output formats like kml and csv. I am going to stick with XML. (I’ll leave JSON for the next experiment).
Here is an example URL and result: http://maps.googleapis.com/maps/api/geocode/xml?address=Pacific+West+Outlet,Gilroy,California,US&sensor=false
Try to study the structure, it will help later on with the parsing for latitude and longitude.
Step 2 – Prep your package
Create your SSIS package. Drop a data flow task, and add your source. In my case my source is an OLE DB source pointing to my SQL Server table that has address fields already.
Step 3 – Code the transformation inside your Script Component
a. Still inside the data flow task, drop a Script Component task that you will use for Transform. Drag the data flow path from your source to the script component.
b. Specify your input parameters, and specify Latitude, Longitude and GeocodeURL as Read/Write. I am only adding the GeocodeURL because I want to be able to spot check my numbers later on.
c. Edit your script in the Script Component. Create a class that will contain your GeoCoding class. Note I’m using C# in my scripts.
using System; using System.Collections.Generic; using System.Text; //added these using System.Data; using System.Net; using System.Web; using System.Xml; // THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF // ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED // TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A // PARTICULAR PURPOSE. namespace sqlmusings { public interface IGeoLocation { double latitude { get; set; } double longitude { get; set; } string geocodeurl { get; set; } } public struct GeoLocation : IGeoLocation { private double _latitude; private double _longitude; private string _geocodeurl; public GeoLocation(double latitude, double longitude, string geocodeurl) { _latitude = latitude; _longitude = longitude; _geocodeurl = geocodeurl; } public double latitude { get { return _latitude; } set { _latitude = value; } } public double longitude { get { return _longitude; } set { _longitude = value; } } public string geocodeurl { get {return _geocodeurl; } set { _geocodeurl = value; } } } public class GeoCode { const string _googleUri = "http://maps.googleapis.com/maps/api/geocode/xml?address="; //sample //http://maps.googleapis.com/maps/api/geocode/xml?address=1128+West+Hastings+Street,+Vancouver+British+Columbia+Canada&sensor=true private static Uri GetGeoCodeURI(string address) { address = HttpUtility.UrlEncode(address); string uri = String.Format("{0}{1}&sensor=false", _googleUri, address); return new Uri(uri); } public static GeoLocation GetCoordinates(string address) { WebClient wc = new WebClient(); Uri uri = GetGeoCodeURI(address); try { string geoCodeInfo = wc.DownloadString(uri); XmlDocument xmlDoc = new XmlDocument(); xmlDoc.LoadXml(geoCodeInfo); string status = xmlDoc.DocumentElement.SelectSingleNode("status").InnerText; double geolat = 0.0; double geolong = 0.0; XmlNodeList nodeCol = xmlDoc.DocumentElement.SelectNodes("result"); foreach (XmlNode node in nodeCol) { geolat = Convert.ToDouble(node.SelectSingleNode("geometry/location/lat").InnerText, System.Globalization.CultureInfo.InvariantCulture); geolong = Convert.ToDouble(node.SelectSingleNode("geometry/location/lng").InnerText, System.Globalization.CultureInfo.InvariantCulture); } return new GeoLocation(geolat, geolong, uri.ToString()); } catch { return new GeoLocation(0.0, 0.0, "http://"); } } } }
d. Still inside the script editor of the Script Component, now switch to your main.cs. Add the code that calls and uses your Geocoding class.
/* Microsoft SQL Server Integration Services Script Component * Write scripts using Microsoft Visual C# 2008. * ScriptMain is the entry point class of the script.*/ using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; using SC_65a998228f6546ed965116b9f8b76953.csproj; // THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF // ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED // TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A // PARTICULAR PURPOSE. [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void PreExecute() { base.PreExecute(); /* Add your code here for preprocessing or remove if not needed */ } public override void PostExecute() { base.PostExecute(); /* Add your code here for postprocessing or remove if not needed You can set read/write variables here, for example: Variables.MyIntVar = 100 */ } public override void Input0_ProcessInputRow(Input0Buffer Row) { /* Add your code here */ sqlmusings.GeoLocation geolocation = sqlmusings.GeoCode.GetCoordinates(Row.AddressLine1 + "," + Row.City + "," + Row.State + "," + Row.Country); Row.Latitude = Convert.ToDecimal(geolocation.latitude); Row.Longitude = Convert.ToDecimal(geolocation.longitude); Row.GeoCodeURL = geolocation.geocodeurl; } }
e. Add your destination. In my case, I’ll just drag a multicast, and add a data viewer, because I just want to see the results.
There you go, we’re done with our cool SSIS experiment!
Cool eh?
References:
http://rickcaminiti.com/misc/google-geocoding-ssis/
http://blog-of-darius.blogspot.com/2011/02/geocoding-with-c-and-google-geocoding.html
Well done! I’m eager to see if it works as well with JSON.
Thanks! The curious me will probably try to see how JSON works with this *soon*
http://blog-of-darius.blogspot.com/2011/07/geocoding-with-c-and-google-geocoding_28.html
I don’t know C# and I put all this stuff in – but all I get is this:
0.0, 0.0, http:// in the data viewer – what is not working? I can’t figure it out… I have an address in an address field…. Please help? seems like I am getting this part of the code returned
return new GeoLocation(0.0, 0.0, “http://”);
First off, thanks for this! But, I have a question that you might be able to answer for me.
When I run my SSIS package for a group of addresses, some retrieve the lat & long coordinates just fine and others return 0.0000
I noticed in the XML there is a difference where the ones returning the coordinates have a location_type = “Approximate” where the ones that fail all have a location_type = “RANGE_INTERPOLATED”. But, there is a lat & long coordinate listed in both xml files.
Any ideas what else I could try?
Thanks,
Greg
[…] Package In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS […]
Hello just wanted to give you a brief heads up and let you know a few of
the pictures aren’t loading properly. I’m not sure why but
I think its a linking issue. I’ve tried it in two different browsers and both show the same outcome.