Production support

Production support covers the practices and disciplines of supporting the IT systems and applications currently in use by end users. A production support person or team is responsible for monitoring production servers and scheduled jobs, managing incidents, and receiving incidents and requests from end users. They analyze each one and either respond to the end user with a solution or escalate it to other IT teams, which may include developers, system engineers and database administrators.

The importance of production support

In order to understand the importance of production support, one needs to take a few factors into account.

From the factors listed above, one can see that the way in which production support is managed is crucial.

Production Support Steps

The major steps of production support are listed below. They are described in the context of batch processing.

Recording Production Error

Usually a batch job or a group of related batch jobs (a schedule or stream) runs to accomplish one or more business functions. These batch jobs run unattended and normally complete without errors. Occasionally, however, a batch job is interrupted or terminates abnormally (an abend or abort). A job can abend for several reasons.

When a job abends, it can send an automated alert by e-mail, page or text message. The data center or operations team also actively monitors the jobs. They send alert notifications by e-mail, page or text, or they call the on-call person responsible for recovering the abended job.

The on-call person acknowledges the e-mail, page, text or phone call for the abended job and records the details of the abended job in a production issue tracking system. Sometimes the abended job automatically records the abend details, along with the job standard list (job log), in the tracking system. The details of the abended job (job standard list, error log files, etc.) are available in the production job scheduler tool. The production issue tracking tool creates a request number, which is given to the support team and used to track the progress of the production support issue. The request is assigned to the on-call person on the support team.
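The recording step can be sketched in a few lines of Python. This is a minimal illustration only: `IssueTracker`, `SupportRequest` and their fields are hypothetical names, and the tracker is an in-memory stand-in, not the interface of any real issue tracking product.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class SupportRequest:
    """One recorded production issue (hypothetical structure)."""
    request_number: int
    job_name: str
    abend_code: str
    job_log: str
    assignee: str
    status: str = "open"


class IssueTracker:
    """In-memory stand-in for a production issue tracking system."""

    def __init__(self):
        self._numbers = count(1)  # request numbers start at 1
        self.requests = {}

    def record_abend(self, job_name, abend_code, job_log, on_call):
        # Create the request number, store the abend details and
        # assign the request to the on-call support person.
        number = next(self._numbers)
        request = SupportRequest(number, job_name, abend_code, job_log, on_call)
        self.requests[number] = request
        return request
```

A caller would record an abend with something like `tracker.record_abend("NIGHTLY_LOAD", "S0C7", log_text, "alice")` and then quote the returned `request_number` in later updates.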

Notification of Production Error

For critical production errors (e.g. the production job is on the critical path and is likely to delay the batch completion SLAs, or the production error is impacting business data), an e-mail is sent to the entire organization or to the impacted teams so that they are aware of the issue. The e-mail also gives the estimated time for recovery from the production error.
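As a rough sketch, the decision to send such a notification might look like the following. The `ProductionError` fields are hypothetical, and treating either condition (critical path or business data impact) as sufficient on its own is an assumption made for illustration; an organization may define criticality differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductionError:
    """Hypothetical summary of a production error."""
    job_name: str
    on_critical_path: bool
    impacts_business_data: bool
    estimated_recovery_minutes: int


def is_critical(error: ProductionError) -> bool:
    # Assumption: either condition alone makes the error critical.
    return error.on_critical_path or error.impacts_business_data


def build_notification(error: ProductionError) -> Optional[str]:
    """Return the notification text, or None if no broadcast is needed."""
    if not is_critical(error):
        return None
    return (
        f"Critical production error in job {error.job_name}. "
        f"Estimated recovery time: {error.estimated_recovery_minutes} minutes."
    )
```

Sending the message (by e-mail or another channel) is left out on purpose, since the delivery mechanism depends on the site.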

Investigation or Analysis of Production Error

The on-call person on the production support team collects all the necessary information about the production error. This information is recorded in the production error tracking tool under the support request number previously assigned. All the details of what failed, such as data, environment, process and program logic, are used in the investigation. The production batch job, the program and any tool or utility used are reviewed for possible errors.

Resolution of Production Error

If a similar production error occurred in the past, the resolution steps are retrieved from the support knowledge base and the error is resolved using those steps. If it is a new production error, new resolution steps are created and the error is resolved. The new resolution steps are recorded in the knowledge base for future use. For major production errors (critical infrastructure or application failures), a conference call is initiated and all required support persons and teams join the call and work together to resolve the error. This is called incident management. If a problem occurs repeatedly, it is recorded and tracked using appropriate tools and processes until it is resolved permanently. This is called problem management. The issue is closed only after the customer or end user agrees that the problem is resolved.
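The knowledge base lookup described above can be sketched as follows. `KnowledgeBase`, `resolve` and the idea of keying entries by an error signature string are hypothetical choices made for illustration; a real knowledge base would usually support fuzzier matching.

```python
class KnowledgeBase:
    """In-memory stand-in for a support knowledge base."""

    def __init__(self):
        self._steps = {}

    def lookup(self, error_signature):
        # Returns the stored resolution steps, or None for a new error.
        return self._steps.get(error_signature)

    def record(self, error_signature, steps):
        self._steps[error_signature] = list(steps)


def resolve(kb, error_signature, create_steps):
    """Return (steps, was_known) for the given error signature.

    create_steps is a callable that produces resolution steps for a
    new error; the result is recorded in the knowledge base for reuse.
    """
    steps = kb.lookup(error_signature)
    if steps is not None:
        return steps, True
    steps = list(create_steps(error_signature))
    kb.record(error_signature, steps)
    return steps, False
```

The first call for a given signature creates and stores the steps; later calls for the same signature reuse them.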

Production job/program code correction

If the production error was caused by programming errors, a request is created for the development team to correct them. The problem is identified and defined, and a root cause analysis is performed. The programming error is fixed using the normal SDLC process: analysis, design, programming, QA, testing and release. The new version of the production job or program is then deployed and verified.

Production Process correction

If the production error was caused by job or schedule dependency issues or by sequence issues, further analysis is done to find the correct sequence and dependencies. The new sequence and dependencies are verified and validated in a test environment before production deployment.
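One way to check a job sequence against its dependencies is shown below, as a sketch rather than a description of any particular scheduler. The job names and the `sequence_violations` helper are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library (3.9 and later) and is used here only to propose a valid order.

```python
from graphlib import TopologicalSorter


def sequence_violations(dependencies, sequence):
    """List (predecessor, job) pairs that the sequence gets wrong.

    dependencies maps each job to the set of jobs that must finish first.
    A pair is reported if either job is missing from the sequence or the
    predecessor is scheduled after the job that depends on it.
    """
    position = {job: index for index, job in enumerate(sequence)}
    violations = []
    for job, predecessors in dependencies.items():
        for predecessor in sorted(predecessors):
            if (
                job not in position
                or predecessor not in position
                or position[predecessor] > position[job]
            ):
                violations.append((predecessor, job))
    return violations


def propose_sequence(dependencies):
    # Raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())
```

For example, with `{"load": {"extract"}, "report": {"load"}}`, the sequence `["load", "extract", "report"]` is reported as violating the `("extract", "load")` dependency, while the proposed sequence has no violations.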

Infrastructure Issue correction

If the production error was caused by infrastructure issues, the responsible infrastructure team is notified. The infrastructure team then implements a permanent fix and monitors the infrastructure so that the same error does not recur.

Production Support Billing

If the production error was caused by unexpected consequences of infrastructure changes, the infrastructure team is most often unable to bill the time spent resolving the issue at the full rate. In some cases the hours cannot be billed at all.

Production Support - Follow up and Reporting

The production error tracking system is used to review all issues periodically (daily, weekly and monthly), and reports are generated to monitor resolved, repeating and pending issues. Reports are also generated for IT/IS management to support the improvement and management of production jobs.
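A periodic report of this kind can be approximated with a simple summary over the tracked issues. The record layout (dictionaries with `job` and `status` keys) and the rule that a job with more than one issue counts as repeating are assumptions made for this sketch.

```python
from collections import Counter


def summarize(issues):
    """Summarize resolved, pending and repeating issues.

    issues is a list of dicts with hypothetical keys "job" and "status",
    where status is either "resolved" or "pending".
    """
    status_counts = Counter(issue["status"] for issue in issues)
    job_counts = Counter(issue["job"] for issue in issues)
    # Assumption: a job with more than one recorded issue is "repeating".
    repeating = sorted(job for job, n in job_counts.items() if n > 1)
    return {
        "resolved": status_counts["resolved"],
        "pending": status_counts["pending"],
        "repeating_jobs": repeating,
    }
```

Running the same summary over daily, weekly and monthly slices of the issue list gives the three review periods mentioned above.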
