In this blog post I teach you how to convert GoogleDoc records into Salesforce Files via Google Drive external data source.
Google Docs, Notes, & Attachments
Prior to April 2015, Salesforce customers who used Google Apps (later renamed to G Suite) were able to create and link documents, spreadsheets, and presentations to records as an alternative to uploading a physical attachment. This feature required enabling Google Apps in your org, and once enabled also renamed “Notes & Attachment” related list to “Google Docs, Notes, & Attachments”.
After April 2015, according to this help article, Salesforce changed the integration citing changes to Google’s API. You can now only create bookmarks to the URLs of Google Drive files this way.
To help compensate for this loss in functionality, in the Summer ’15 release Salesforce introduced capabilities for connecting Google Drive with Files Connect.
Converting to Files
I’m a big fan of Salesforce Files and have a dedicated page with Dreamforce sessions and multiple open source projects I’ve developed all around adopting Files & Enhanced Notes.
Jeff Issenberg, after successfully using my open source apps to migrate his org’s Notes & Attachments then asked about migrating GoogleDoc records to Files too. Specifically, Jeff wanted to migrate the GoogleDoc records into external file references via Google Drive using the Files Connect feature.
The GoogleDoc object is very simple with few fields, the important ones are:
- Name – name of the file/bookmark the user gave it
- URL – link to the Google Drive document
- ParentId – Salesforce record the document is related to
The ContentVersion object, which represents Salesforce Files, can be used to either store actual files uploaded to Salesforce or it can be used as a bookmark to a file that lives in an external repository like Google Drive or SharePoint. In this blog post we want to do just that, migrate the GoogleDoc bookmarks into ContentVersion external file references and take advantage of all that Salesforce Files and Files Connect provides us. The important fields for this are:
- ContentLocation – use value “E” to indicate external file
- Origin – use value “H” to indicate this is a Salesforce File
- Title – name of the file (GoogleDoc.Name)
- PathOnClient – name of the file (GoogleDoc.Name)
- ExternalDataSourceId – ID of an External Data Source for Google Drive
- ExternalDocumentInfo1 – link to the Google Drive document (GoogleDoc.URL)
- ExternalDocumentInfo2 – ID of the Google Drive document (parsed from URL)
- FirstPublishLocationId – Salesforce record ID to relate file (GoogleDoc.ParentId)
To do the migration we don’t need any Apex code, this can be accomplished entirely with Data Loader and spreadsheet functions.
Instructions
- Enable Files Connect and create External Data Source for Google Drive
- Export GoogleDoc records, the Name, URL, and ParentId fields, to .csv file
- Add the ContentVersion columns to the .csv file, mapping the GoogleDoc columns.
- Import the .csv file to ContentVersion object
- Verify that files linking to Google Drive now appear in the “Files” related list
The process is very straight forward data loader exercise, the only challenge was extracting the Google document ID from the URL. To save you the headache, you can refer to this Google spreadsheet template that includes the formulas and sample data.
If you did want to use Apex code for some reason, check out this gist.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
public class GoogleDocToFilesBatchable implements Database.Batchable<SObject>, Database.Stateful { | |
private ID externalDataSourceId { get; set; } | |
public GoogleDocToFilesBatchable( ID externalDataSourceId ) { | |
this.externalDataSourceId = externalDataSourceId; | |
} | |
public Database.QueryLocator start( Database.BatchableContext context ) { | |
return Database.getQueryLocator([ | |
SELECT | |
id, name, url, ownerId, parentId, | |
createdById, createdDate, lastModifiedById, lastModifiedDate | |
FROM | |
GoogleDoc | |
ORDER BY | |
parentId | |
]); | |
} | |
public void execute( Database.BatchableContext context, List<GoogleDoc> googleDocs ) { | |
List<ContentVersion> filesToInsert = new List<ContentVersion>(); | |
for ( GoogleDoc gd : googleDocs ) { | |
String gdocId = gd.url.substringAfter( '/d/' ).substringBefore( '/' ); | |
filesToInsert.add( new ContentVersion( | |
contentLocation = 'E', | |
origin = 'H', | |
ownerId = gd.ownerId, | |
title = gd.name, | |
pathOnClient = gd.name, | |
externalDataSourceId = this.externalDataSourceId, | |
externalDocumentInfo1 = gd.url, | |
externalDocumentInfo2 = gdocId, | |
firstPublishLocationId = gd.parentId, | |
// Enable "Create Audit Fields" in Setup then you can | |
// preserve the original create/update values and owner if original owner is inactive | |
// https://help.salesforce.com/articleView?id=Enable-Create-Audit-Fields&language=en_US&type=1 | |
createdById = gd.createdById, | |
createdDate = gd.createdDate, | |
lastModifiedById = gd.lastModifiedById, | |
lastModifiedDate = gd.lastModifiedDate | |
)); | |
} | |
insert filesToInsert; | |
// optionally, can delete the GoogleDoc bookmarks since now redundant of Files | |
// delete googleDocs; | |
} | |
public void finish( Database.BatchableContext context ) { | |
} | |
} |
With the retirement of Files Sync in Spring ’18, how do you anticipate you’ll manage your files next year? This is a feature I love as a remote employee keeping my files in-sync. I guess I’ll have to say “bye, bye, bye…”
LikeLiked by 1 person
Hi Karol,
My recommendation is your company consider using an external file repository like Google Drive or SharePoint. Both have desktop and mobile file syncing capabilities.
Then by using Files Connect you and others can browse, access, and share the files from Google Drive or SharePoint (or other supported repositories) to Salesforce records as desired.
Doug
LikeLike
Hi Doug,
Thank you again for writing this How To Convert GoogleDoc records to Salesforce Files! I’ve just successfully done the conversion.
I’d like to recommend adding the OwnerID field to the export and import. That way the file will be owned by and shared with the original owner in addition to the parent record.
Jeff
LikeLiked by 1 person
Great! Thanks for the feedback
LikeLike
Hi Doug,
I’ve got followup questions. I just ran an update to include the Owner ID field. There were some records that failed on the update because the owner is inactive. This isn’t really a problem in and of itself but those records are now owned by me. Do you have a suggestion for how to not have those records owned by me? Not sure if there’s a way around this because they need to be owned by someone.
Also, ALL the records that I inserted are shared with me now. I’m not sure if that would have been the case if my first insert had included the Owner ID. Do you know if that’s the case? And is there a way to undo the sharing with my user?
Thanks
Jeff
LikeLiked by 1 person
Hi Jeff,
You can enable Create Audit Fields and grant your user the permission to set owners to inactive users as well as preserve the original create/update dates.
I’ve updated the example apex code so that it will do just that provided that “Create Audit Fields” permission has been enabled in Setup.
Once you’ve setup that permission for yourself, you should be able to use Data Loader to change the owners to their original users who are inactive.
The owner of a file (ContentVersion object) has the file shared to them giving them collaborator access to the file (via ContentDocumentLink object).
Upon changing owner of the file then the file will be shared to them in addition to you (if the file was shared with you originally). To unshare the file to you explicitly then you can delete the ContentDocumentLink where LinkedEntityId field equals your user ID and ContentDocumentId field equals the file in question.
LikeLike
Hi Doug, how about a similar article using Sharepoint?
LikeLiked by 1 person
Hi Daryl,
What specifically about SharePoint? Do you all use the “Google Doc” records in Salesforce to store links to SharePoint Files?
If you’re looking to use Files Connect with SharePoint, here is the Setup article: https://help.salesforce.com/articleView?id=admin_files_connect_overview.htm&type=0
LikeLike
Thanks Doug, that article looks like what we need.
LikeLiked by 1 person
Hi Doug – thanks for this article and all your help around Files and Notes. FYI – to import the data through dataloader.io into our Winter ’18 org, I had to include a column called ContentDocumentId without any values. According to the Salesforce SOAP API Developer Guide on ContentDocument, this field is required (https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_objects_contentdocument.htm): “To create a document, create a new version via the ContentVersion object without setting the ContentDocumentId. This process automatically creates a parent document record. When adding a new version of the document, you must specify an existing ContentDocumentId which initiates the revision process for the document. When the latest version is published, the title, owner, and publish status fields are updated in the document.
LikeLiked by 2 people
Thanks Patrick!
LikeLike
Hi Doug,
Thanks for writing this blog. I was bit confused about the concept initially, but thanks to your blog. Successfully implemented the approach in one of our client’s implementation.
LikeLiked by 1 person
You’re very welcome, Saurabh!
LikeLike
Hello Doug
I did a test 2 weeks in my org and everything went well. But since the Summer 18 release last week end, I did another test and I keep having the issue with dataloader : “Required fields are missing: [VersionData]”.
Do you have any clue about that issue ? I thought VersionData was only required for documents, not for links to Google Drive files …
Thanks a lot in advance for your help.
Great blog by the way !
Yann
LikeLiked by 1 person
My mistake Doug, I did not map the Content Location correctly, It works now
LikeLiked by 1 person
Glad you got it working, thanks for the update and compliment!
LikeLiked by 1 person
Hello Doug,
I am ready to migrate my 2000 google docs.
However, I noticed in lightning that the file icon is not the same as the Google Docs in the Notes and Attachments related list. The new Icon for the Google drive linked document is grey without the “G” indicating it’s a google doc (in Notes and Attachments, Google docs have a blue icon with a “G”). Do you know if it’s normal or if we can have the same icon as before ?
thanks a lot for your help
Yann
LikeLiked by 1 person
My guess is confirm that the PathOnClient field ends with a file extension like “.txt” or “.pdf” etc. The file name extension on ContentVersion.PathOnClient field influences the icon displayed.
I don’t have access to a G-Suite account to test with anymore, but if the GDoc attachments do not have a file extension, then you may have to go without or try guessing =/
LikeLiked by 1 person
Thanks Doug.
One last question, after I insert my google docs in the Content Version Object, and If I try to see the new records in Workbench, they do not appear (as well as when I export through dataloader)…
Do you know why ?
Thanks a million for your help
LikeLiked by 1 person
Yeah, the sharing model for ContentVersion and ContentDocument records only returns records if it has been shared explicitly with your user via ContentDocumentLink record.
Otherwise, you need to indirectly query for the content via SELECT Id, ContentDocument.LatestPublishedVersion.Title, … FROM ContentDocumentLink WHERE LinkedEntityId IN ( SELECT Id FROM SomeObj )
You can use the Weekly Data Export option to export all ContentVersions though and then weed through the .csv export.
LikeLiked by 1 person
Hi Doug,
Thanks for sharing this information. I successfully ran through your instructions earlier in the year, during a trial migration on our sandbox.
However, now I’m attempting a live migration, I’m hitting an odd problem. The record is being inserted via dataloader in the right place, it links to the correct location and is owned by the correct person.
However, the preview window shows the error: ‘File not available. This file isn’t available anymore.’, as does the ‘Download’ link. But if i click the ‘open’ link at the top, the file opens just fine.
Is this likely to be a configuration or import issue? Or is Salesforce simply not able to show the preview of the imported file?
I could have sworn that the preview worked earlier in the year… whilst we might be able to live without, it’s still a very misleading error message, as the file does in fact exist.
Thanks again!
LikeLiked by 1 person
Hi Jon,
I think it’s hit or miss on whether a preview is available depending on where the external document is hosted. For example, SharePoint hosted files might not be supported for previews, but according to Spring ’16 release notes, Google Docs should be.
I also recommend you ask on the Official Files chatter group on the Trailblazer community for that team to follow up with you if you think you’re external data source provider should be providing previes.
Thanks
LikeLike
Hi, I have been trying to get this working. I have only been able to get dataloader to accept an import by changing the setting in file connect for filesharing from “copy” to “reference” – the entire point is that I want to upload actual copies of the files rather than links. When it’s on “copy” I get the error message “Can’t create a reference file because the Files Connect sharing mode is set to copy” – so I think I must have some sort of setting wrong. Can anyone advise what I might be doing wrong? Thank you
LikeLiked by 1 person
Hi Anna, if you aren’t loading external file references, then don’t specify the external fields on ContentVersion. Just import the files normally. Maybe that will work for you.
LikeLike
Hi Doug, I tried this today with only few records before migrating ~2500 google docs to files. In addition to your instructions, I added the audit fields (Created by, date of creation, last modified by, last modified date) and I’m getting the following error for only some of the files: This Content Document can’t be added to a Content Folder because you’re not the Content Document owner, and the Content Document isn’t shared with you from a person, group, or company-wide sharing setting.
It’s really weird, since I have edit access over the records I’m relating the files to and object is public read/write so there shouldn’t be any restrictions there.
Do you have any insight?
LikeLike
We’re looking to move away from Google (…) and the batchable class seems like a great way to identify the files that are currently associated in Salesforce, and to determine the level of effort in moving them to the next great repository. The externalDataSourceId is eluding me, though — where should I be looking to access that? I’m guessing that that’s tied to our Company’s Google implementation..?
LikeLiked by 1 person
Hi Duncan,
Some of the fields on the ContentVersion object, like ExternalDataSourceId, are only visible via SOQL queries, which you can run using tools like the Developer Console or Workbench.
LikeLike
Thank you Doug! This was very helpful. I was able to successfully migrate Google Docs to Files. I really appreciate your post and the sample import file.
LikeLiked by 1 person
You’re very welcome, Stephanie. Thanks for the kind feedback 🙂
LikeLike
Hi Doug,
I came across your blog this morning regarding converting GoogleDoc records to Salesforce files. However, I believe our problem is different from the one you’ve solved in this article. We’ve attached files to records in Salesforce using the Add Google Doc button in the past – but now we are moving away from GSuite to a Microsoft environment including Sharepoint. Our google drives will no longer exist. What should I do to convert all these 400+ documents to Salesforce files? Thanks for any guidance you can offer.
LikeLiked by 1 person
Hi Pam,
If you’ll have SharePoint, you might want to engage an implementation partner to assist migrating your Google Drive folders/files to SharePoint libraries.
Salesforce Files Connect also works with SharePoint Online so that your files can exist in SharePoint but be searchable and added to records and used in Chatter as you’ve done with GSuite.
If you don’t migrate the files to SharePoint, I still recommend working with an implementation partner, someone who has experience with the Google Drive API and Salesforce Files API to export/import the data. Considering each user has their own folder/file sharing settings, it’s not exactly a trivial thing to do.
Hope that helps! Good luck!
LikeLike
Hi Doug – can I ask how you are getting the CSV export of the existing Google docs in Attachments and Notes?
LikeLiked by 1 person
Hi Aron, I’m using Data Loader to query records from the GoogleDoc object.
LikeLike
Thanks Doug – when inserting with Dataloader into the Contentversion object I am getting an error with an invalid X-ref ID on every record. Any chance you have run into that?
LikeLike
My GoogleDocs included a wild variety of url patterns. I wrote a python script to do the migration, using the info Doug provided here. You can see it at migrate-google-docs.py https://gist.github.com/bolaurent/802d4e9cd7941e13cb5051bece0682c1
LikeLiked by 1 person
I don’t know how new this is, but when I look at the Google Docs I add through the UI, the gdocId has a prefix, the type of document, which is actually in the URL, but this program is throwing away. I modified your substring statement to use regular expressions, but here’s the line:
String gdocId = gd.Url.replaceAll(‘\/open\?id=’,’\/file\/d\/’)
.replaceAll(‘https.com\/’,”)
.replaceAll(‘s?\/d\/’,’:’)
.replaceAll(‘\/.$’,”);
The first term accounts for some old URLs that were in a different format.
LikeLike
Hi Doug!
Thanks for your article. When I’m trying to upload via dataloader.io it says that there is a field called ‘ContentDocument ID’ that is required. Do you know what that should map to?
Many thanks!
Gen
LikeLike