Required Files
NOTE: Example CSV and XML files are available here.
Two files are needed to use the SpatialKey Data Import API (DIAPI): a CSV file containing the data to upload and an XML file describing that data. The CSV file should be no different from the CSV files used in the SpatialKey client upload wizard; just produce a standard SpatialKey-ready CSV. For more information on producing CSV files and having your data ready for SpatialKey see these two articles:
The XML file describes the data being uploaded and contains information similar to what the client collects manually from a user during upload.
UPDATE: Creating the XML descriptor file is now very easy!
The SpatialKey team has recognized that creating the XML descriptor file is one of the most tedious and error-prone aspects of the Data Import API. To help alleviate this we have added a way to export the XML descriptor for a previously uploaded dataset (NOTE: you must be a dataset owner to use this feature). This feature is available as of the 1.0 version of the SpatialKey client.
To use this feature, use the client application to upload a starter CSV dataset that is representative of the data you wish to import through the DIAPI. This will serve two purposes:
- First, it will allow the user to import the dataset using the tools within SpatialKey (setting appropriate data types, validating that the data look correct, etc.) before trying it through the API
- Second, generating the XML descriptor is then just a few clicks away and removes the possibility of typos or mistakes
Once the dataset is uploaded, open the dataset in the dataset manager. As an owner of the dataset, you will now see a new tab in the manager called Data Import API. Simply select the tab and click the Generate XML Configuration button. The XML will get created for the dataset and returned to your browser. Save this file and use it as the XML descriptor referred to below.
The following information is contained within the XML:
- Dataset name - the dataset to create or overwrite/append
- Geo-location information
- Geocoding information (which columns should be used for geocoding)
- Existing latitude and longitude fields
- Column names
- Data types of each column
You will need to create one XML file for each dataset that is managed through the API. Once an XML file is created for a specific dataset, you should be able to save and reuse the XML for each update through the API. Here is an example of a DIAPI XML file:
Figure 1: SaleData.xml - uses existing latitude and longitude in the source data (georeference method="trust")
<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for publishing">
  <dataset>
    <name>SalesData</name>
    <georeference method="trust">
      <latitudeField>Latitude</latitudeField>
      <longitudeField>Longitude</longitudeField>
    </georeference>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>Transaction_date</name>
      <type confidence="100">DATE/MINUTE</type>
    </column>
    <column>
      <position>2</position>
      <name>Product</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>3</position>
      <name>Price</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>4</position>
      <name>Payment_Type</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>5</position>
      <name>Name</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>6</position>
      <name>Latitude</name>
      <type confidence="100">LATITUDE</type>
    </column>
    <column>
      <position>7</position>
      <name>Longitude</name>
      <type confidence="100">LONGITUDE</type>
    </column>
  </columns>
</umgMeta>
The XML document is broken into two primary sections:
- The <dataset> element
- The <columns> element
The <dataset> XML element contains a <name> element and a <georeference> element. The <name> element should contain the name of the dataset to create or update. If you are updating or overwriting an existing dataset, the name must match the original dataset name exactly. The <georeference> element varies depending on the requirements of the data. There are two basic georeference methods available in the DIAPI: geocode and trust. Each of these methods instructs the server on how to access the “where” information within your dataset.
If your data already contains valid latitude and longitude fields then use the trust georeference method. If you have address information and need SpatialKey to dynamically geocode this for you, use the geocode method instead:
Figure 2: TechCrunch.xml - uses geocoding (city and state fields) to get latitude and longitudes for the data (georeference method="geocode")
<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for importing">
  <dataset>
    <name>TechCrunch</name>
    <georeference method="geocode">
      <streetField></streetField>
      <streetValue></streetValue>
      <cityField>city</cityField>
      <cityValue></cityValue>
      <countyField></countyField>
      <countyValue></countyValue>
      <stateField>state</stateField>
      <stateValue></stateValue>
      <postalCodeField></postalCodeField>
      <postalCodeValue></postalCodeValue>
      <countryField></countryField>
      <countryValue></countryValue>
    </georeference>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>permalink</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>2</position>
      <name>company</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>3</position>
      <name>numemps</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>4</position>
      <name>category</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>5</position>
      <name>city</name>
      <type confidence="100">STRING/CITY</type>
    </column>
    <column>
      <position>6</position>
      <name>state</name>
      <type confidence="100">STRING/STATE</type>
    </column>
    <column>
      <position>7</position>
      <name>fundeddate</name>
      <type confidence="100">DATE/DAY</type>
    </column>
    <column>
      <position>8</position>
      <name>raisedamt</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>9</position>
      <name>raisedcurrency</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>10</position>
      <name>round</name>
      <type confidence="100">STRING</type>
    </column>
  </columns>
</umgMeta>
If your data does not contain location information you can use an empty <georeference/> in the XML and import data that does not contain lat/lon or geocoding information:
Figure 3: Data that does not use location information
<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for importing">
  <dataset>
    <name>My dataset without Geo Location</name>
    <georeference/>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>column1</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>2</position>
      <name>column2</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>X</position>
      <name>columnX</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
  </columns>
</umgMeta>
SpatialKey uses the field and value elements within the geocode method to geocode the data. The field elements (streetField, cityField, stateField, etc.) should either be empty or contain the column name (the header name in the CSV file) of the corresponding data. You do not need to fill in every field element. Use only the fields you have, although the more location information a record provides, the more accurate the geocoding will be. The value elements are optional and let the API user “hard code” a specific value for that field type. For example, if your data does not contain state information, but you know that every record in the dataset is in the state of California, you could simply place the word California, or CA, in the stateValue element. The geocoding process would then use that value for each row as it gets geocoded. Within each pair, the field and value elements are mutually exclusive: if both are specified, the field element takes precedence.
Include all elements (as shown in the example or from the sample code) even if they are not all used (leave the XML text empty for the unused elements).
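The geocode rules above (emit every field/value element, leave unused ones empty, and let the field win over the value) can be sketched in a few lines of Python. This is only an illustration built on the standard library; the helper name build_geocode_georeference and its arguments are hypothetical and are not part of the DIAPI, and the element order simply follows Figure 2.

```python
import xml.etree.ElementTree as ET

# Address parts in the order they appear in the geocode example (Figure 2).
GEOCODE_PARTS = ["street", "city", "county", "state", "postalCode", "country"]


def build_geocode_georeference(fields=None, values=None):
    """Return a <georeference method="geocode"> element (hypothetical helper).

    fields maps an address part (e.g. "city") to a CSV column name.
    values maps an address part to a hard-coded value (e.g. {"state": "CA"}).
    """
    fields = fields or {}
    values = values or {}
    unknown = (set(fields) | set(values)) - set(GEOCODE_PARTS)
    if unknown:
        raise ValueError(f"unknown geocode parts: {sorted(unknown)}")

    geo = ET.Element("georeference", {"method": "geocode"})
    for part in GEOCODE_PARTS:
        field = fields.get(part, "")
        # The field takes precedence, so drop the hard-coded value
        # whenever a column name is given for the same part.
        value = "" if field else values.get(part, "")
        # Every element is emitted, even when its text is empty.
        ET.SubElement(geo, part + "Field").text = field
        ET.SubElement(geo, part + "Value").text = value
    return geo


# Example: city comes from a CSV column, state is hard-coded for every row.
geo = build_geocode_georeference(fields={"city": "city"}, values={"state": "CA"})
print(ET.tostring(geo, encoding="unicode", short_empty_elements=False))
```

Passing short_empty_elements=False keeps unused elements in the <streetField></streetField> form used in the figures rather than collapsing them to <streetField/>.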
The trust georeference method (see Figure 1) contains far fewer fields. The <latitudeField> element should point to the column name containing latitude information while the <longitudeField> element should do the same for longitude.
After the dataset element comes the <columns> element. This element should contain a <column> child element for each column in the dataset.
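Because the trust method only points at column names, a simple typo there is easy to miss. The sketch below is a local sanity check, not something the DIAPI provides: the function name check_trust_fields is hypothetical, and it assumes the element layout shown in Figure 1.

```python
import xml.etree.ElementTree as ET


def check_trust_fields(descriptor_xml):
    """Return a list of problems with the trust georeference (hypothetical helper)."""
    root = ET.fromstring(descriptor_xml)
    geo = root.find("./dataset/georeference")
    if geo is None or geo.get("method") != "trust":
        return ["descriptor does not use the trust georeference method"]

    # Names of all columns declared in the <columns> section.
    declared = {
        (col.findtext("name") or "").strip()
        for col in root.findall("./columns/column")
    }

    problems = []
    for tag in ("latitudeField", "longitudeField"):
        ref = (geo.findtext(tag) or "").strip()
        if not ref:
            problems.append(f"<{tag}> is missing or empty")
        elif ref not in declared:
            problems.append(f"<{tag}> names {ref!r}, which is not a declared column")
    return problems


# A shortened descriptor with a deliberate typo in <longitudeField>.
sample = """
<umgMeta version="1.0" purpose="for publishing">
  <dataset>
    <name>SalesData</name>
    <georeference method="trust">
      <latitudeField>Latitude</latitudeField>
      <longitudeField>Longitud</longitudeField>
    </georeference>
  </dataset>
  <columns>
    <column><position>1</position><name>Latitude</name><type confidence="100">LATITUDE</type></column>
    <column><position>2</position><name>Longitude</name><type confidence="100">LONGITUDE</type></column>
  </columns>
</umgMeta>
"""
for problem in check_trust_fields(sample):
    print(problem)
```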
Figure 4: Column element example
<column>
  <position>1</position>
  <name>permalink</name>
  <type confidence="100">STRING</type>
</column>
Each column will contain a <position>, <name>, and <type> element. The <name> should match up with the column header in the CSV file. In a similar fashion the <position> element should correspond to the order of the headers and columns in the CSV as well. The numbering for <position> should start at 1 and count from there up to the total number of columns. The last element is <type>. In the SpatialKey client application after the initial upload of a CSV file, the user is presented with all of the columns found in the CSV. The client makes some judgments based on the contents of the columns and comes up with an educated guess as to the data types associated with each. For the DIAPI, however, this information needs to be placed into the XML document for each column manually.
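Since the <name> and <position> values have to line up with the CSV header, it can be worth comparing the two locally before uploading. The following is a minimal sketch using only the Python standard library; check_columns is a hypothetical helper, and it assumes the first CSV row is the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def check_columns(descriptor_xml, csv_text):
    """Return mismatches between <columns> and the CSV header (hypothetical helper)."""
    header = [h.strip() for h in next(csv.reader(io.StringIO(csv_text)))]
    columns = ET.fromstring(descriptor_xml).findall("./columns/column")

    problems = []
    if len(columns) != len(header):
        problems.append(
            f"descriptor has {len(columns)} columns, CSV header has {len(header)}"
        )

    seen = set()
    for col in columns:
        pos_text = (col.findtext("position") or "").strip()
        name = (col.findtext("name") or "").strip()
        # Positions are 1-based and must fall within the header.
        if not pos_text.isdigit() or not 1 <= int(pos_text) <= len(header):
            problems.append(f"column {name!r}: position {pos_text!r} is out of range")
            continue
        if pos_text in seen:
            problems.append(f"position {pos_text} is used more than once")
        seen.add(pos_text)
        expected = header[int(pos_text) - 1]
        if name != expected:
            problems.append(
                f"position {pos_text}: descriptor says {name!r}, CSV says {expected!r}"
            )
    return problems


descriptor = """
<umgMeta version="1.0" purpose="for importing">
  <dataset><name>Example</name><georeference/></dataset>
  <columns>
    <column><position>1</position><name>permalink</name><type confidence="100">STRING</type></column>
    <column><position>2</position><name>company</name><type confidence="100">STRING</type></column>
  </columns>
</umgMeta>
"""
print(check_columns(descriptor, "permalink,company\nacme,Acme Inc\n"))
```

An empty list means the descriptor and the header agree on column count, order, and names.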
Data types within SpatialKey follow a Data Type / Type Granularity pattern, for example NUMBER/INTEGER. The first value (before the “/”) is the general data type, in this case a number. The second value says more about the data: here INTEGER tells the system that the data is a number with no decimal places. The available type/granularity combinations are listed below, each with a definition of what it means:
- STRING - simple text, no granularity defined
- STRING/IPADDRESS - an IP address
- STRING/STREET - a street address
- STRING/CITY - a city
- STRING/STATE - a state (could be abbreviation, full state name, etc.)
- STRING/ZIP - zip or postal code (treated as text)
- STRING/COUNTY - a county
- STRING/COUNTRY - a country
- BOOLEAN - for columns with only two values (typically yes/no, 1/0, etc. but could be others)
- NUMBER - generic number, will be stored as a decimal, should use a granularity below
- NUMBER/INTEGER - number with no decimal places
- NUMBER/DECIMAL - number containing decimal data
- NUMBER/CURRENCY - number with possible currency formatting
- NUMBER/HOURS - a number representing a number of hours
- NUMBER/MINUTES - a number representing a number of minutes
- NUMBER/SECONDS - a number representing a number of seconds
- NUMBER/MILLISECONDS - a number representing a number of milliseconds
- DATE/YEAR - date containing only a relevant year
- DATE/MONTH - a date accurate/relevant to the month
- DATE/DAY - a date
- DATE/HOUR - a date and time accurate to the hour
- DATE/MINUTE - a date and time accurate to the minute
- DATE/SECOND - a date and time accurate to the second
- LATITUDE - a latitude field
- LONGITUDE - a longitude field
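A mistyped type string (for example NUMBER/MINUTE instead of NUMBER/MINUTES) is easy to catch before upload. The sketch below encodes the list above as a lookup table. It assumes that the list is exhaustive and that type names are case-sensitive, neither of which this article states explicitly; is_valid_type is a hypothetical helper.

```python
# Allowed granularities per base type, transcribed from the list above.
# None means the base type may be used without a granularity.
VALID_TYPES = {
    "STRING": {None, "IPADDRESS", "STREET", "CITY", "STATE", "ZIP", "COUNTY", "COUNTRY"},
    "BOOLEAN": {None},
    "NUMBER": {None, "INTEGER", "DECIMAL", "CURRENCY",
               "HOURS", "MINUTES", "SECONDS", "MILLISECONDS"},
    "DATE": {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"},
    "LATITUDE": {None},
    "LONGITUDE": {None},
}


def is_valid_type(type_text):
    """Return True if type_text is one of the listed type/granularity strings."""
    base, sep, granularity = type_text.strip().partition("/")
    if sep and not granularity:
        # A trailing slash such as "NUMBER/" is not a listed form.
        return False
    return (granularity or None) in VALID_TYPES.get(base, set())


for candidate in ["NUMBER/INTEGER", "DATE/MINUTE", "NUMBER/MINUTE", "DATE"]:
    print(candidate, is_valid_type(candidate))
```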
Once the XML and CSV files are prepared (and preferably zipped up in a .zip file) they are ready to be uploaded using the API HTTP calls.
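Zipping the two files can be scripted as well. This is a minimal sketch with Python's standard zipfile module; bundle_for_upload and the file names are hypothetical, and placing both files at the top level of the archive is an assumption, since this article does not specify an archive layout.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_for_upload(csv_path, xml_path, zip_path):
    """Write the CSV and XML descriptor into one .zip archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in (csv_path, xml_path):
            # Store each file under its bare name, with no directory prefix.
            archive.write(path, arcname=Path(path).name)
    return zip_path


# Self-contained demonstration using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "SalesData.csv"
    xml_file = Path(tmp) / "SalesData.xml"
    csv_file.write_text("Latitude,Longitude\n38.9,-77.0\n", encoding="utf-8")
    xml_file.write_text("<umgMeta/>", encoding="utf-8")  # placeholder descriptor

    bundle = bundle_for_upload(csv_file, xml_file, Path(tmp) / "SalesData.zip")
    with zipfile.ZipFile(bundle) as archive:
        print(archive.namelist())
```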