Migrating a Subset
Before we would even consider migrating all our data, we needed to run tests using a small subset of it. There’s no point in migrating everything if you know that even a small chunk of data is going to give you lots of trouble.
While there are existing tools that can handle this kind of migration, we also had to transform some data (e.g. fields being renamed, types being different, etc.), so we had to write our own tools. These were mostly one-off Ruby scripts that each performed a specific task, such as moving over reviews, cleaning up encodings, or correcting primary key sequences.
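To give an idea of what such a one-off script might look like, here is a minimal sketch of the "rename fields, convert types" step. It is not the actual code we ran: the column names (`body`, `lang`, `rating`, `approved`) and the helper `transform_row` are hypothetical and only serve as an illustration.

```ruby
# Hypothetical mapping from old column names to new ones.
RENAMED_FIELDS = {
  "body" => "content",
  "lang" => "language"
}.freeze

# Hypothetical type conversions, keyed by the *new* column name.
TYPE_CASTS = {
  "rating"   => ->(value) { Integer(value) },     # string -> integer
  "approved" => ->(value) { value.to_s == "1" }   # 0/1 flag -> boolean
}.freeze

# Takes a row (a Hash of column name => value) as read from the old
# database and returns a new Hash ready to be inserted into the new one.
def transform_row(row)
  row.each_with_object({}) do |(key, value), result|
    new_key = RENAMED_FIELDS.fetch(key, key)
    cast    = TYPE_CASTS[new_key]

    # Leave NULLs alone, only cast values that are actually present.
    result[new_key] = cast && !value.nil? ? cast.call(value) : value
  end
end
```

Keeping the mappings in plain hashes means a script like this can be adjusted quickly when a test run against the subset turns up another column that needs special treatment.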
The initial testing phase didn’t reveal any problems that might block the migration, although some parts of our data did need work. For example, certain user-submitted content wasn’t always encoded correctly and couldn’t be imported without being cleaned up first. Another interesting change was converting the language names of reviews from full names (“dutch”, “english”, etc.) to language codes, as our new sentiment analysis stack uses language codes instead of full names.
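Both cleanup steps can be sketched in a few lines of Ruby. This is an illustration under a few assumptions rather than the scripts we actually used: it assumes the target encoding is UTF-8, that badly encoded content was originally Windows-1252, and that the language codes are two-letter ISO 639-1 codes. The helper names `clean_encoding` and `language_code` are made up for this example.

```ruby
# Assumed mapping from full language names to ISO 639-1 codes.
LANGUAGE_CODES = {
  "dutch"   => "nl",
  "english" => "en"
}.freeze

# Returns a valid UTF-8 copy of the given text. If the bytes are already
# valid UTF-8 they are kept as-is. Otherwise we assume (!) the content
# was Windows-1252 and transcode it, replacing anything that still can't
# be converted with a question mark.
def clean_encoding(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  text.dup
      .force_encoding(Encoding::Windows_1252)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Converts a full language name to its code. Unknown names raise an
# error so they show up during a test run instead of being imported
# silently.
def language_code(name)
  LANGUAGE_CODES.fetch(name.to_s.strip.downcase) do
    raise ArgumentError, "unknown language: #{name.inspect}"
  end
end
```

Raising on unknown language names is a deliberate choice for a test migration: it is better for the script to stop on the first unexpected value than to discover it later in the new database.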