Here are some ideas for making SharePoint operational maintenance a reality.
Quick reminder of the difference between monitoring and reporting
- Monitoring provides information about a specific component in near real time. Use performance counters with thresholds to catch problems before they cause failures.
- Reporting provides information about the entire platform rather than individual elements. This information is usually gathered over a period of time; web statistics reports are a typical example.
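As a rough illustration of threshold-based monitoring, the sketch below classifies a free disk space reading against warning and critical levels. It is written in Python for brevity; the function names and the 20% / 10% thresholds are assumptions chosen for the example, not values recommended by any SharePoint guidance.

```python
import shutil


def evaluate(value, warning, critical):
    """Classify a 'lower is worse' reading, such as free disk space in percent."""
    if value <= critical:
        return "CRITICAL"
    if value <= warning:
        return "WARNING"
    return "OK"


def free_disk_percent(path="."):
    """Return the percentage of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    pct = free_disk_percent(".")
    # Thresholds are illustrative assumptions; agree on real values per server role.
    print(f"free disk: {pct:.1f}% -> {evaluate(pct, warning=20.0, critical=10.0)}")
```

In practice a monitoring product would collect the counter and raise the alert; the point here is only that each counter needs an agreed threshold before it is useful.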
Start by defining maintenance tasks
For each task, agree on a frequency and estimated duration.
Examples of maintenance tasks in a SharePoint environment:
- Basic check of the SharePoint functionality (home page)
- Monitor Windows Event logs
- Monitor IIS logs
- Monitor ULS logs
- Monitor SQL logs
- Monitor indexing logs
- Monitor search logs
- Check the physical environment where servers are located (access to the server room, temperature and humidity, network hardware)
- Check that backups have been successful
- Monitor free disk space on servers
- Monitor system resources (CPU, RAM, etc.)
- Monitor network state
- Review SLAs from the previous week
- Check SQL Server maintenance plans
- Manage the second-stage (site collection) recycle bin
- Check index fragmentation and run DBCC CHECKDB
- Check updates applying to SharePoint and Windows
- Run the SharePoint Health Analyzer and verify its reports
- Check usage reports
- Check Web Analytics reports
- Capacity planning: check if the platform can take the load and forecast infrastructure updates
- Optimisation of the search application service
- Management of the security policies at the Web Application level
- Test restore procedures
- Update disaster recovery procedures (contact details of employees, contractors and external parties, software versions in use, service packs, hotfixes, communication plan, etc.)
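Agreeing on a frequency and an estimated duration for each task makes it possible to estimate the weekly effort. The Python sketch below shows one way to record that; the frequencies, durations and the "5 working days per week" convention are placeholder assumptions for illustration, not measured figures.

```python
from dataclasses import dataclass

# Approximate number of runs per week for each frequency label (assumed values;
# "daily" is counted as 5 working days).
RUNS_PER_WEEK = {"daily": 5, "weekly": 1, "monthly": 12 / 52, "quarterly": 4 / 52}


@dataclass(frozen=True)
class MaintenanceTask:
    name: str
    frequency: str  # one of the keys of RUNS_PER_WEEK
    minutes: int    # estimated duration of a single run


def weekly_effort_minutes(tasks):
    """Average effort per week, in minutes, for a list of tasks."""
    return sum(RUNS_PER_WEEK[t.frequency] * t.minutes for t in tasks)


# Placeholder estimates; replace with the values your team agrees on.
TASKS = [
    MaintenanceTask("Basic check of the SharePoint functionality", "daily", 5),
    MaintenanceTask("Check that backups have been successful", "daily", 10),
    MaintenanceTask("Review SLAs from the previous week", "weekly", 30),
    MaintenanceTask("Test restore procedures", "quarterly", 240),
]

if __name__ == "__main__":
    print(f"Estimated effort: {weekly_effort_minutes(TASKS):.0f} minutes per week")
```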
An operational job plan can be defined as all automatic or manual jobs running during a given period, for example a week.
Once all jobs have been defined, it is a good idea to plan them so that they don’t impact each other.
This type of production planning can just be a spreadsheet to give an overview of what’s going on in the farm.
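If the plan outgrows a spreadsheet, the same check can be scripted. The following Python sketch flags jobs on the same server whose time windows overlap within a week. The `Job` fields, the `at` helper and the server names are hypothetical, and the check deliberately ignores jobs that wrap past the end of the week or that affect each other across servers.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Job:
    name: str
    server: str
    start: int     # minutes since Monday 00:00
    duration: int  # minutes

    @property
    def end(self):
        return self.start + self.duration


def at(day, hour, minute=0):
    """Convert a day index (0 = Monday) and a time of day to minutes since Monday 00:00."""
    return (day * 24 + hour) * 60 + minute


def find_conflicts(jobs):
    """Return pairs of job names on the same server whose time windows overlap."""
    conflicts = []
    for a, b in combinations(jobs, 2):
        if a.server == b.server and a.start < b.end and b.start < a.end:
            conflicts.append((a.name, b.name))
    return conflicts


# Hypothetical Saturday-night schedule; server names are made up.
jobs = [
    Job("SQL Server full backup", "SQL01", at(5, 22), 120),
    Job("SQL Server reboot", "SQL01", at(5, 23), 15),
    Job("Full crawl of the search index", "APP01", at(5, 22), 480),
]

if __name__ == "__main__":
    for first, second in find_conflicts(jobs):
        print(f"Conflict: {first} overlaps {second}")
```

Here the reboot falls inside the backup window on the same server, so that pair is reported, while the full crawl on another server is not.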
Examples of operational jobs:
- Incremental/continuous crawl of the search index
- Antivirus scans and signature updates
- Bare metal/File system backups
- SQL Server backup (full, transaction log or differential)
- PowerShell backup
- Logs backup
- Daily application pool recycling on SharePoint servers
- Warm-up script: a PowerShell script that requests a defined or calculated list of high-level site pages so that the SharePoint pages are compiled and cached in IIS (.NET-based technology).
- Definition of frequency of server reboot
- SharePoint WFE server reboot
- SharePoint application server reboot
- SQL Server reboot
- Full crawl of the search index
- Taking monitoring offline during certain periods to avoid unnecessary alerts
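The warm-up job mentioned above is described as a PowerShell script; the Python sketch below shows the same idea in a minimal form, requesting each page once and recording the HTTP status. The function names are hypothetical and the URL in the usage line is a placeholder. Note that the standard library does not handle NTLM, so on a farm that requires Windows authentication these anonymous requests would likely return 401 and a different HTTP client would be needed.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=60):
    """Request a page and return its HTTP status code, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 401 or 500).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def warm_up(urls, fetch=fetch_status):
    """Request each URL once and return a {url: status} mapping."""
    return {url: fetch(url) for url in urls}


if __name__ == "__main__":
    # Placeholder URL; replace with the high-level pages of your own farm.
    for url, status in warm_up(["http://sharepoint.example/"]).items():
        print(url, status)
```

Such a job would typically be scheduled right after the daily application pool recycling so that the first real user does not pay the compilation cost.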
Establish the different roles involved
The IT systems team consists of a network of support professionals.
You should be able to define the escalation path for end-user support and incident resolution, as well as the list of functions of each IT team involved with the SharePoint farms.
Below is an example of functions:
- End-user support
- Service incident resolution
- SharePoint farm operations
- Third-party application maintenance (TMA in French) of custom developments, including minor enhancements
- Major enhancements (custom development and infrastructure)
- Change management
- User training
Use a RACI chart to define the responsibilities and scope of each team
|A RACI chart identifies who is Responsible, Accountable, Consulted and Informed|
|Responsible|Those who do the work to achieve the task; several resources can be responsible|
|Accountable|The resource ultimately accountable for the completion of the task; exactly one A must be specified for each task|
|Consulted|Those whose opinions are sought (two-way communication)|
|Informed|Those who are kept up to date on progress (one-way communication)|
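The "exactly one A per task" rule is easy to break as a chart grows, so it can be worth checking automatically. The Python sketch below validates a chart held as a nested dictionary; the structure, the function name and the team names are assumptions for illustration, and each team is limited to a single letter per task (combined entries such as "A/R" are not handled).

```python
def validate_raci(chart):
    """Return a list of problems; an empty list means the chart passed the checks.

    `chart` maps each task name to a {team: code} dictionary, where code is
    one of "R", "A", "C" or "I".
    """
    problems = []
    for task, assignments in chart.items():
        codes = list(assignments.values())
        invalid = sorted(set(codes) - set("RACI"))
        if invalid:
            problems.append(f"{task}: unknown code(s) {', '.join(invalid)}")
        accountable = codes.count("A")
        if accountable != 1:
            problems.append(f"{task}: expected exactly one A, found {accountable}")
    return problems


# Hypothetical chart; team names are made up for the example.
chart = {
    "Service incident resolution": {"Helpdesk": "R", "SharePoint team": "A", "Users": "I"},
    "Change management": {"Helpdesk": "I", "SharePoint team": "R", "CAB": "C"},
}

if __name__ == "__main__":
    for problem in validate_raci(chart):
        print(problem)
```

In this example "Change management" is reported because nobody is marked Accountable for it.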
Your RACI chart can either be based on:
- Your own set of roles
- ITIL only
- Or a mix of both!
It usually covers go-live activities, release management and run/support tasks.
Examples of standard ITIL roles include:
|Business Relationship Manager|
|IT Steering Group (ISG)|
|Service Portfolio Manager|
|Service Strategy Manager|
|Information Security Manager|
|IT Service Continuity Manager|
|Service Catalogue Manager|
|Service Design Manager|
|Service Level Manager|
|Change Advisory Board (CAB)|
|Emergency Change Advisory Board (ECAB)|
|1st Level Support (operators and helpdesk)|
|2nd Level Support (= level 3 experts)|
|3rd Level Support (= suppliers)|
|IT Operations Manager|
|Major Incident Team|
|Service Request Fulfilment Group|