A solution where legacy content could be left on the existing file servers while still being available in SharePoint would be fantastic. Hold on a minute, I think I know one! 🙂
The challenges
Change management: Each user has their own pace in adopting new technology.
Some will prefer to keep accessing their documents the traditional way, which is often file shares automatically exposed in Windows Explorer as drive letters.
Others will embrace faster new ways, such as mapping a SharePoint document library in Windows Explorer or the power of the SharePoint search engine.
Information governance: where should users put their documents? File share or SharePoint?
Many enterprises face this question at some stage.
Do you remember the 4 key elements of any governance?
- People
- Process
- Policy
- Technology
At the end of the day, most of the time, it's the users who drive where the content is hosted, cf. my first point on change management.
User experience: Why can't I use SharePoint's great functionality for documents hosted on a file share? 🙁
Documents hosted on file servers can be exposed through the SharePoint search centre. This is a great way to access file share data, as NTFS permissions are respected and results only show what the user is allowed to see.
Even so, the great functionality of SharePoint is not available out of the box for documents hosted on file servers.
For example:
- Versioning
- Metadata
- Approval workflow
- Preview in Office Web Apps, etc.
Other fancy business needs can also include accessing documents on file servers from Android tablets or iPads…
When business managers start playing with technical toys, their challenges are endless! 😉
Architecture: a 20 TB+ SharePoint farm brings complex challenges
Many years ago, when I was participating in the architecture of a mega SharePoint 2010 farm for British Telecom, I created a spreadsheet with different kinds of architecture scenarios: databases without mirroring, with mirroring, in a SQL cluster, etc.
What I quickly found out is that the actual space used for data was never more than around 20% of the total space of the farm, which included for instance:
- SharePoint servers with the search index
- SQL databases
- SQL transaction logs
- SQL backup partition
- × 2 if databases were mirrored
To put it simply, to host 20 TB of useful data, you need virtually 100 TB of storage.
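A rough back-of-the-envelope calculation shows how this ratio plays out. The per-component percentages below are my own illustrative assumptions, not vendor figures, but they land close to the 20% useful-data ratio observed on the farm:

```python
# Rough back-of-the-envelope sizing for a large SharePoint farm.
# All overhead factors are illustrative assumptions, not official guidance.
useful_data_tb = 20

sql_databases = useful_data_tb * 1.2   # content DBs + config/service DB overhead
transaction_logs = useful_data_tb * 0.25
sql_backups = sql_databases            # one full backup kept on disk
search_index = useful_data_tb * 0.3    # SharePoint servers incl. search index

subtotal = sql_databases + transaction_logs + sql_backups + search_index
total = subtotal * 2                   # × 2 if databases are mirrored

print(f"Total storage: {total:.0f} TB for {useful_data_tb} TB of data")
print(f"Useful data ratio: {useful_data_tb / total:.0%}")
```

With these assumptions, 20 TB of useful data already needs well over 100 TB of raw storage, and the useful-data ratio drops below 20%.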
I found it astonishing that so much space was wasted just on operations.
Of course, nowadays, thin provisioning within storage arrays can save more space, but the ratio is still significant.
Also, the fact that content was hosted in SQL Server databases did not help. RBS was not an option at the time as high availability was key.
Operations: have you ever backed up 20 TB+ of data in a SharePoint farm? Not easy!
Backing up the databases is the only way for such a volume of data. Ideally, multiple SQL Servers will back up their own databases in parallel.
PowerShell commands such as Backup-SPFarm / Restore-SPFarm are useless here, as they will often take too long to run, or the farm will no longer be supported by Microsoft once content databases grow beyond 200 GB.
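As a sketch of what "multiple SQL Servers backing up their own databases in parallel" could look like from an orchestration point of view (server and database names are hypothetical, and `run_backup` is a placeholder for the real T-SQL `BACKUP DATABASE` call issued via sqlcmd or SMO):

```python
# Sketch: back up many content databases in parallel, one worker per SQL server.
# Server/database names are hypothetical examples.
from concurrent.futures import ThreadPoolExecutor

# Each SQL server hosts a subset of the farm's content databases.
servers = {
    "SQL01": ["WSS_Content_01", "WSS_Content_02"],
    "SQL02": ["WSS_Content_03", "WSS_Content_04"],
}

def run_backup(server, database):
    # Placeholder: in reality this would issue something like
    #   BACKUP DATABASE [db] TO DISK = N'...' WITH COMPRESSION
    return f"{database} backed up on {server}"

def backup_server(server, databases):
    # Databases on the same server are backed up sequentially to limit I/O
    # contention; the servers themselves run in parallel.
    return [run_backup(server, db) for db in databases]

with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    futures = [pool.submit(backup_server, s, dbs) for s, dbs in servers.items()]
    results = [r for f in futures for r in f.result()]

print(f"{len(results)} databases backed up")
```

The point of the design is that the wall-clock time is bounded by the slowest server, not by the sum of all databases.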
Legacy data: what about all that existing data on the file servers?
Part of the information governance is taking into account the data retention policy and availability.
Companies often have terabytes of data on their file servers and don't want to change this infrastructure, for multiple reasons but mainly because of cost.
Migrating those terabytes of data from the file shares into SharePoint is usually not done. This content is left behind when the infrastructure is replaced, and eventually the old file server infrastructure is decommissioned.
What if all the existing file servers could be exposed through SharePoint?
When you take into account all those challenges above, a solution where existing content could be left in the file servers while still being available in SharePoint would be fantastic.
You'll tell me that using RBS (Remote Blob Storage) in SQL Server could solve some of those problems; however, not all third-party RBS solutions leave the files readable.
There are many different solutions on the market, but I believe the one below can really answer those problems.
Introducing AvePoint’s DocAve Connector (File Share Integration)
No, I don't own any stock in AvePoint and I don't work for them; this is just free publicity, because I believe this product can solve many problems at once. 😉
[Update 05/03/2015]
There are quite a few other products which implement RBS in SQL Server and are used for SharePoint platforms. Another great tool with very similar functionality to the one described in this article is Metalogix StoragePoint File Share Librarian.
I leave you to discover it for yourself.
How does it work?
AvePoint has taken the normal SharePoint document library and has overloaded it with additional methods.
SharePoint features are deployed at the farm and site collection level to use this special document library.
In the backend, an implementation of RBS (Remote Blob Storage) by AvePoint is used. The prerequisite for this is SQL Server Enterprise Edition, which is required to use RBS.
A new stub database is created with pointers to each document hosted on the file servers.
A first synchronisation imports all the file system metadata of the documents into the library. Optionally, NTFS permissions of files and folders in the file shares can be transformed into SharePoint permissions; however, this is not advised, as breaking permission inheritance for a large number of documents is known to cause performance issues.
Subsequent synchronisations can be scheduled so that any changes made on the file servers or in SharePoint become available on the other platform.
The synchronisation uses an Active Directory service account which has access to all the file shares.
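To give an idea of what that first synchronisation has to collect, here is a minimal sketch of a file share scan gathering file system metadata per document (the UNC path in the example comment is hypothetical; the actual import into the SharePoint library is done by the connector itself):

```python
# Sketch: walk a file share and collect the per-document file system metadata
# that a first synchronisation would import into the library.
import os
from datetime import datetime, timezone

def scan_share(root):
    """Walk a share and return one metadata record per document."""
    documents = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            documents.append({
                "path": path,
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            })
    return documents

# Example with a hypothetical UNC root:
#   scan_share(r"\\fileserver01\finance")
```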
The documents coming from the File Shares are seen as normal SharePoint documents with all the great functionalities available to them:
- Versioning
- Metadata
- Approval workflow
- Preview in Office Web Apps, etc.
The UNC path is not exposed on the SharePoint web interface.
For the end-users, there is no difference between the normal SharePoint document libraries and those hosting data on file shares.
An exception to this is files larger than the well-known 2 GB limit. For those, a link in the form of a UNC path is provided in the library, and in this case the normal SharePoint functionality explained above is not available.
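The size rule can be sketched as follows. This is a simplified illustration of the behaviour described above, not the connector's actual implementation:

```python
# Sketch of the 2 GB rule: documents at or below the limit behave as normal
# SharePoint library items; larger ones are exposed only as a UNC link
# without the extra SharePoint functionality.
TWO_GB = 2 * 1024**3  # 2 GB in bytes

def library_representation(size_bytes, unc_path):
    if size_bytes <= TWO_GB:
        return {"kind": "document",
                "features": ["versioning", "metadata",
                             "approval workflow", "preview"]}
    # Over the limit: only a link to the original file share location.
    return {"kind": "link", "target": unc_path, "features": []}

# Hypothetical UNC paths for illustration:
small = library_representation(500 * 1024**2, r"\\srv\share\small.docx")
big = library_representation(3 * 1024**3, r"\\srv\share\big.iso")
```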
The challenges left
Information governance
The synchronisation allows documents to be changed both on the file shares and in SharePoint; however, for certain parts of the file share infrastructure, it may be a good idea to encourage users to make their modifications in SharePoint and to make the file shares read-only.
What about permissions? Should I continue to maintain NTFS permissions with the existing procedure, or should I switch to SharePoint rights management?
Simplify the granularity of the permissions!
Make large chunks of your file shares read-only and manage contributor access through SharePoint. This way, the normal functionality such as file versioning, check-in, check-out and approval workflows will be used.
Backup/restore and Disaster recovery
The procedures originally created for file server and SharePoint farm recovery will need to be revised to take this new architecture into account.
References
For more information about the DocAve Connector, cf. the product page:
http://www.avepoint.com/products/sharepoint-infrastructure-management/sharepoint-file-share-integration/
Same, same but different: AvePoint File Share Navigator Online.
This seems to offer very similar functionality to the DocAve Connector (File Share Integration), but it is designed to expose file shares via Office 365/SharePoint Online.