One of the biggest challenges of moving from one CMS to another is getting your existing data to the new platform. In our case, we were moving our existing WordPress platform, which had been around for years, over to the Sanity.io platform. At a high level, this involved getting the data out, transforming the data into something Sanity could read, and then actually importing it.
Getting the Data Out
The first step was to get the data out of WordPress. There are many ways to do this, but because the migration ran over several weeks and the content kept changing in the meantime, I wrote a custom plugin. The plugin grabs all of my data, transforms it (more on that later), and spits out a file that Sanity can import.
To make this easy, the plugin uses Corcel, which gives you Laravel-style Eloquent model classes for your WordPress objects. With Corcel, I can read the attributes of each WordPress object and override how they are returned. For instance, say Sanity has a post published field whose format differs from the published date in WordPress. I can modify that field in my class, and any time I read it, the value is Sanity-ready.
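To make the published-date idea concrete, here is a minimal sketch of that kind of conversion. It is written in TypeScript rather than the plugin's PHP, and it rests on assumptions that are not from the original project: that the WordPress value is a GMT timestamp in the "YYYY-MM-DD HH:MM:SS" form, that the Sanity field wants an ISO 8601 string, and that the helper is called toSanityDatetime (a hypothetical name).

```typescript
// Hypothetical helper: convert a WordPress-style GMT timestamp
// ("YYYY-MM-DD HH:MM:SS") into an ISO 8601 string for a Sanity datetime field.
function toSanityDatetime(wpGmtDate: string): string {
  const parts = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})$/.exec(wpGmtDate);
  if (parts === null) {
    // Fail loudly instead of exporting a date Sanity cannot read.
    throw new Error("Unexpected date format: " + wpGmtDate);
  }
  // Join date and time with "T" and mark the value as UTC with "Z".
  return parts[1] + "T" + parts[2] + "Z";
}

console.log(toSanityDatetime("2019-03-14 09:30:00"));
```

In the plugin, the same logic would live in the class's accessor for the published field, so every read of that field already returns the converted value.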
In each of my Corcel classes, I have an export method that gathers the needed data into an array. In the main plugin class, I loop through all my Corcel classes and call that export method, adding each result to a holding array. That array is then written to an NDJSON (newline-delimited JSON) file that gets downloaded.
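The export loop can be sketched roughly as follows. This is TypeScript rather than the plugin's PHP, and the Exportable interface, the exportDoc method name, and the sample documents are all hypothetical stand-ins for the Corcel classes described above.

```typescript
// Hypothetical shape: anything that can export itself as a Sanity-ready document.
interface Exportable {
  exportDoc(): { [key: string]: unknown };
}

// Serialize every exported document as one JSON object per line (NDJSON).
function toNdjson(items: Exportable[]): string {
  const ndjsonLines: string[] = [];
  for (let i = 0; i < items.length; i++) {
    // JSON.stringify without an indent argument emits no raw newlines,
    // so each document stays on exactly one line.
    ndjsonLines.push(JSON.stringify(items[i].exportDoc()));
  }
  return ndjsonLines.length > 0 ? ndjsonLines.join("\n") + "\n" : "";
}

// Illustrative data only; these IDs and fields are made up.
const samplePosts: Exportable[] = [
  { exportDoc: () => ({ _id: "post-1", _type: "post", title: "First post" }) },
  { exportDoc: () => ({ _id: "post-2", _type: "post", title: "Second post" }) },
];

console.log(toNdjson(samplePosts));
```

Keeping one document per line is what lets an import tool read the file one record at a time instead of parsing a single huge JSON array.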
Transforming the Data
There are two big elephants in the room: shortcodes and Portable Text. With shortcodes, the question is how to deal with the content inside them and how to mimic their styling in Sanity. With Portable Text, the question is how to get the WordPress object content (so our blog post content) into the Portable Text format that Sanity uses. At the time of this writing, there is no PHP library that converts HTML to Portable Text (there are libraries for other languages).
The concept I came up with for dealing with shortcodes was to convert each shortcode into a custom HTML element that contains all the data I need, either as attributes or as inner content. Then, when the HTML gets converted to Portable Text, we have a custom deserializer for that element. This way we keep all of the content and can still format it separately.
For example, let's say I have a custom shortcode called "intro" that has one attribute called "color" and has content that lives inside of it. In my BlogPost class, I modify the content attribute and look for my intro shortcode. Once I've found an instance of that shortcode, I replace the shortcode itself ([intro]) with a custom HTML element, <intro>. I also carry over the color attribute and the content, so I'm left with:
<intro color="red">Intro content went here</intro>
That would replace the shortcode in my BlogPost content and later on I can use a custom deserializer to convert that into my intro Sanity block (we will talk about that later).
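A minimal sketch of that replacement, in TypeScript rather than the plugin's PHP. The function name convertIntroShortcodes is hypothetical, and the pattern assumes the shortcode is always written as [intro color="..."]...[/intro] with a double-quoted attribute; it does no HTML escaping.

```typescript
// Replace every [intro ...]...[/intro] shortcode with an <intro> element,
// carrying over the color attribute and the inner content.
function convertIntroShortcodes(content: string): string {
  const introPattern = /\[intro(\s[^\]]*)?\]([\s\S]*?)\[\/intro\]/g;
  return content.replace(
    introPattern,
    (_whole: string, attrs: string | undefined, inner: string): string => {
      // Pull the color value out of the raw attribute string, if present.
      const colorMatch = /color="([^"]*)"/.exec(attrs || "");
      const colorAttr = colorMatch !== null ? ' color="' + colorMatch[1] + '"' : "";
      return "<intro" + colorAttr + ">" + inner + "</intro>";
    }
  );
}

console.log(convertIntroShortcodes('[intro color="red"]Intro content went here[/intro]'));
```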
For a real-world example, we used the gallery shortcode in our blog posts. The gallery shortcode has an "ids" attribute, which is just a string of the IDs of the images it will display. On the Sanity side, we don't have a gallery block, but we do have row/column and image blocks, so we want to convert the shortcode into something like this:
<img src="http://via.studio/image1.png" />
<img src="http://via.studio/image2.png" />
<img src="http://via.studio/image3.png" />
Just like before, we look for our gallery shortcode in the blog post content. Once we find it, we grab the ids attribute, fetch the images for those IDs, and replace the shortcode with the markup above. On the Portable Text side, the default deserializer can handle images, so we only have to write a custom deserializer for the row tags.
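The gallery conversion might look something like the sketch below, again in TypeScript rather than the plugin's PHP. The ImageLookup table is a hypothetical stand-in for fetching attachment URLs from WordPress, and the sketch only emits the image tags; any wrapping row markup is left out.

```typescript
// Hypothetical lookup from attachment ID to image URL. In the plugin this
// would come from WordPress; here it is a plain object for illustration.
type ImageLookup = { [attachmentId: number]: string };

function convertGalleryShortcodes(content: string, images: ImageLookup): string {
  return content.replace(
    /\[gallery(\s[^\]]*)?\]/g,
    (whole: string, attrs: string | undefined): string => {
      const idsMatch = /ids="([^"]*)"/.exec(attrs || "");
      if (idsMatch === null) {
        return whole; // no ids attribute: leave the shortcode untouched
      }
      const imgTags: string[] = [];
      const rawIds = idsMatch[1].split(",");
      for (let i = 0; i < rawIds.length; i++) {
        const id = parseInt(rawIds[i], 10);
        const url = images[id];
        // Skip IDs that are not numbers or have no known image.
        if (!isNaN(id) && url !== undefined) {
          imgTags.push('<img src="' + url + '" />');
        }
      }
      return imgTags.join("\n");
    }
  );
}

const galleryImages: ImageLookup = {
  1: "http://via.studio/image1.png",
  2: "http://via.studio/image2.png",
  3: "http://via.studio/image3.png",
};

console.log(convertGalleryShortcodes('[gallery ids="1,2,3"]', galleryImages));
```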
We do this process across the board for all the shortcodes we want to keep. WordPress offers a handy function that returns its shortcode regex, so we use that to find our shortcodes.
$regex = get_shortcode_regex();
From there, we specify which shortcodes we want to keep, loop over them, use the regex to find each one in the content, and then make our content changes.
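The overall loop can be sketched like this. It is TypeScript rather than PHP, and it uses a simplified pattern built per tag name in place of the regex WordPress returns, so it will not cover every shortcode syntax WordPress accepts. The handler table, the replaceKeptShortcodes name, and the assumption that tag names contain no regex special characters are all illustrative.

```typescript
// A handler receives the raw attribute string and the inner content
// of one shortcode and returns its replacement markup.
type ShortcodeHandler = (attrs: string, inner: string) => string;

function replaceKeptShortcodes(
  content: string,
  handlers: { [tag: string]: ShortcodeHandler }
): string {
  let result = content;
  const tagNames = Object.keys(handlers);
  for (let i = 0; i < tagNames.length; i++) {
    const tag = tagNames[i];
    // Matches [tag attrs] optionally followed by inner content and [/tag].
    const pattern = new RegExp(
      "\\[" + tag + "(\\s[^\\]]*)?\\](?:([\\s\\S]*?)\\[\\/" + tag + "\\])?",
      "g"
    );
    result = result.replace(
      pattern,
      (_whole: string, attrs: string | undefined, inner: string | undefined): string =>
        handlers[tag](attrs || "", inner || "")
    );
  }
  return result;
}

// Only shortcodes listed here are converted; everything else is left alone.
const keptShortcodes: { [tag: string]: ShortcodeHandler } = {
  intro: (_attrs: string, inner: string): string => "<intro>" + inner + "</intro>",
};

console.log(replaceKeptShortcodes("[intro]Hi[/intro] [other]x[/other]", keptShortcodes));
```

Keeping the per-shortcode logic in a table means adding another shortcode to the migration is one more entry rather than another pass written by hand.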
At the time of this writing, and when we did this data transition, there was no HTML-to-Portable-Text library for PHP (the inverse is available). However, there is a Node.js library for this: Sanity's block-tools package (@sanity/block-tools). We set up a Node server running Express.js that accepted HTML (base64 encoded) and returned Portable Text. This server also held the custom deserializers for our shortcodes.
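As an illustration of what a custom deserializer for the intro element could look like, here is a sketch of a single rule. It assumes the rule shape block-tools uses for custom rules, an object with a deserialize(el, next, block) method that returns undefined for elements it does not handle. The "intro" block type and its color and text fields are hypothetical, and the element type is trimmed to the few members the rule reads so the example stays self-contained.

```typescript
// Minimal structural type for the parts of a DOM element this rule reads.
interface ElementLike {
  tagName?: string;
  textContent: string | null;
  getAttribute(attrName: string): string | null;
}

// The helper the converter passes in to wrap a custom block.
type BlockFn = (props: { [key: string]: unknown }) => unknown;

const introRule = {
  deserialize(el: ElementLike, _next: unknown, block: BlockFn): unknown {
    // Text nodes have no tagName; anything that is not <intro> is left
    // for the other rules (or the defaults) to handle.
    if (!el.tagName || el.tagName.toLowerCase() !== "intro") {
      return undefined;
    }
    return block({
      _type: "intro", // hypothetical Sanity block type
      color: el.getAttribute("color") || "",
      text: el.textContent || "",
    });
  },
};
```

A rule like this would be passed in the rules option when converting the HTML, next to whatever other shortcode rules the migration needs.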
Importing the Data
Now that our data has been exported and transformed, we can import the NDJSON file into Sanity. To do this, use the Sanity CLI tool. The tool will import images into your Sanity Media and will not time out. With the "--replace" flag, it will overwrite existing documents, which is useful while you are testing.
sanity dataset import export.ndjson production --replace
At this point, your data is migrated. For us, it took a couple of imports to get everything where we wanted it, but in the end it worked out well.