Data Migration and Google Sheets

For small businesses looking to make a migration to Salesforce, one of the biggest hurdles to overcome is the migration of their data. In broad terms, we can think of it as a two part process: cleaning/scrubbing and a model change. While it can be pain staking work, the good news is that unless your dealing with more then ~ 2 million 'cells' of data, Google Sheets is a great tool to help out.


Every dataset is different, so there are no hard and fast rules. But a few parts of the process are the same: importing the data, scripting for automating cleaning, scripting for automating model changes, testing, change, repeat. The scripts written for cleaning and model changes is 'throw-away', since data sets from customers is going to be different. However, there are functions and processes that I've used time and time again that I'm including here for your consideration. The end goal is to build a multi-stage process that does the migration, and making sure that process is repeatable.


Realizing that columns will change

Referring to any data in your sheet by column index, R1C1 notation is an absolute no-no. The data you are importing is going to change as you test and refine, so referring to your data statically would have you changing the code every time. Instead, I always refer to my columns by their name.



Titles of the columns are retrieved in to one data set, the actual data is stored in another.



When I retrieve data from the set, I only ever get the index of the column by it's name (never it's position). Small function to make it easier.



Go get the data in whatever iteration I'm doing.


For me, this is just a best practice that means I can change the 'shape' of my data import without ever having to make coding changes to accomodate.

Sort, then Search

Computer Science 101, but take the time to sort the data so you can do a binary search. This is going to matter immensely as you code for data model changes, and have to 'link' data back up. Most of the data that you'll be searching through will be a 2d array, so we take that into account.

Quick little sort function that lets me stipulate the 'column' on which I want do my search. This assumes that the column is storing sortable data.




A binary search that 'backs-up' to the first index. Putting two datasets together, I'm always assuming that there will be duplicates, so I want the index of the first entry.

'searchPositionInItem' is thinking of the row as a single item, so this variable is really just the column index in the 2d array that I'm searching through. Of course, when I call this function, I pass in the column index by using the tip from above.




Putting it into use, I call the function to get the first index in the data, and then move forward until I'm not on the same search item. This little snippet sees me adding a bunch of Salesforce donations together based on the Primary Contact Id.

Grouped Highlighting / Banker Highlighting

As much as we want a process to be automated, we're always going to need eyes on the data. And grouping rows together based on a piece of data is always useful. Here's a tiny snippet that changes the background color of the sheet based on the values in one of the columns.

We're going to put the background color for the cell in a 2d array, so that array can be pushed to the sheet in one call. I'm searching column index 1, and seeing if it is the same as the row prior to this. It's also why I start on position 1 instead of 0.

Hacky, but still works, push that background color value into twenty three cells per row. This would obvously change for your data.

And send it to the sheet in one call.



Creating Households in Salesforce

So very Salesforce specific, but something that's been able to ease what tends to be a tricky part of a model change when moving to a NPSP data model that has 'Households'.

When importing data to create Household accounts, if you can create the contacts at the same time, go for it! But a few times I've been presented with the challenge for one reason or another that I need to create the Accounts first, the contacts second. It's a multi-stage process that involves a temporay Id on the Account import, export of the new Salesforce accounts with their Salesforce Ids, and then the import of Contacts with the right Salesforce Account Id. There are lots of posts out there that talk about this process, but what I want to write about is making sure that we're getting the Accounts right.

123 Nowehere Street, SmallsVille, IL versus 123 Nowhere St., SmallsVille, IL versus 123 Nohere St., SmallsVille, IL

Looking at this data, we can easily see that it's probably all meant to be the same address, but if I was creating a Salesforce account based on address, I'd be in trouble. And since I'm not in the business of creating heuristic functions for identifying things like things like this, how do I make sure that I'm not duplicating the accounts prior to insertion?

While, I may not be very good at quickly writing scratch code to decipher these three addresses, Google very much is. Or more accurately, Google Maps. Creating a google project for this is not what I'm looking to delve into here, but rather just to say that I pass each of the addresses into the Google Maps API to get a standardized address back, and that standardized address is what I use to create the accounts that I need.

We're running this is Sheets, and if I try to run all addresses at once, I'll timeout. I need to split the entry into chunks. Here I'm breaking it down into 500 calls per cycle.



The data that comes back from Google will get me a standarized entry, and Lat/Long points.


Those lat/long points can be put back into the array, and used as unique identifier, so that I know I'm not creating multiple household accounts.


I've also used Maps' nearby search to help better filter out data for my clients. One client in particular had a high number of highschool boarding students and college students in their database. We didn't know if the address provided was their home address or their school address. Again, Google Maps to the rescue, this time using the 'nearby' search to denote if the address given was likley their school address. The goal of this was to mark any address that might be a school, and then take a closer look at that data manually.


If the name of the house says 'School' then we'll mark it that way.
If there were more than 5 people with the same address, we cast a wide net to find any school within 350 meters, if less than 5, cast a smaller net.
If a school is found within the net, mark the data with the school name.


Phone

+1 (647) 367-1601

+502 5985-5412

Address

Port Hope, Ontario, Canada
Gutemala City, Guatemala

Email

coreybroderick@gmail.com