Posts Tagged ‘outage’

HP Can’t Pay Its Salespeople – Another CIO Failure?

Monday, September 14th, 2009

HP's Sales People Have Not Been Getting Paid On Time

Being a salesperson is a hard job. More often than not, salespeople live from quarter to quarter, and if they don’t make their numbers, they end up getting shown the door. Hewlett-Packard is a huge IT products and services company that lives and dies by the actions of its sales teams. Making sure that the sales teams get paid should be a simple task, right? Think again…

The Problem With Pay At HP

HP’s CIO Randy Mott has a problem on his hands. Within HP, payments for the sales teams in HP’s business-technology group are handled by a software application called Omega (which is sorta funny when you think about it: Omega is the last letter in the Greek alphabet, and it sure doesn’t seem to have the last word on calculating pay).

Omega has been around for a long time. It was born at Digital Equipment Corporation (anyone remember DEC?), was picked up and used by Compaq Computer Corporation, and finally found its way into HP. It must either be a well-designed program to have lasted that long, or else it’s so dang complex that nobody is willing to tinker with it.

The problem with Omega is that it is struggling to cope with HP’s growing volume of sales data, which it was never designed to handle. As a result, Omega is now malfunctioning, and some HP salespeople are not getting paid on time – some are waiting up to seven months to get paid!

Fixing The Omega Problem

HP is aware of the problem and has tried, unsuccessfully, to fix it. As a stopgap measure, HP has been automatically paying its salespeople 60% to 70% of what they should be getting for meeting their sales quotas. One suspects that this is being done to allow employees to meet their house payments and keep food on the table.
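
To put the stopgap in concrete terms, here is a small sketch of what a 60% or 70% advance leaves outstanding for one rep. The $10,000 commission figure and the function name are made up for illustration; only the 60% to 70% range comes from the reporting above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def advance_and_shortfall(earned: Decimal, advance_rate: Decimal) -> tuple[Decimal, Decimal]:
    """Split earned pay into the stopgap advance paid now and the balance still owed."""
    if not Decimal("0") <= advance_rate <= Decimal("1"):
        raise ValueError("advance_rate must be between 0 and 1")
    advanced = (earned * advance_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return advanced, earned - advanced

# Hypothetical rep who earned $10,000 in commission for the period.
for rate in (Decimal("0.60"), Decimal("0.70")):
    paid, owed = advance_and_shortfall(Decimal("10000.00"), rate)
    print(f"{rate:.0%} advance: paid ${paid}, still owed ${owed}")
```

Decimal is used instead of float because this is money, and rounding errors in a pay calculation are exactly the kind of thing that erodes trust.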

In the past, HP tried to fix Omega by adding new software that was intended to smooth out the flow of sales data. However, when HP closed its books for the year in November 2008, it discovered that some of the data in Omega was both incorrect and incomplete. Clearly the band-aid approach had not worked.
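
Bad data is far cheaper to catch as it flows in than at year-end close. The sketch below shows the kind of routine check that flags incomplete records (missing fields) and incorrect ones (negative amounts, duplicate deals). The record layout, field names, and sample data are all hypothetical, because Omega's real schema isn't public.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SalesRecord:
    # Hypothetical layout; Omega's real schema is not public.
    deal_id: str
    rep_id: Optional[str]
    amount: Optional[float]
    quarter: Optional[str]

def find_problems(records: list[SalesRecord]) -> list[tuple[str, list[str]]]:
    """Flag incomplete records (missing fields) and incorrect ones (bad values, duplicates)."""
    problems: list[tuple[str, list[str]]] = []
    seen: set[str] = set()
    for record in records:
        issues: list[str] = []
        if record.rep_id is None or record.amount is None or record.quarter is None:
            issues.append("incomplete")
        if record.amount is not None and record.amount < 0:
            issues.append("negative amount")
        if record.deal_id in seen:
            issues.append("duplicate deal")
        seen.add(record.deal_id)
        if issues:
            problems.append((record.deal_id, issues))
    return problems

records = [
    SalesRecord("D1", "R7", 1200.0, "Q4-2008"),
    SalesRecord("D2", None, 800.0, "Q4-2008"),
    SalesRecord("D3", "R7", -50.0, "Q4-2008"),
    SalesRecord("D1", "R7", 1200.0, "Q4-2008"),
]
for deal_id, issues in find_problems(records):
    print(deal_id, issues)
```

Running checks like these on every batch of incoming sales data turns a year-end surprise into a daily to-do list.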

What Should HP Do?

The Wall Street Journal is reporting that at an HP sales meeting in 2008, one of HP’s senior vice presidents of sales, Randy Runk, got up on stage and promised the sales teams that they would all be paid on time. Clearly that is not happening.

As CIO, Randy Mott (are all HP senior executives named Randy?) is responsible for fixing this problem, and he’s already let it go on for far too long. Let’s say what HP is clearly not willing to say itself: Omega’s time has come and gone, and it must be replaced.

Mott needs to do two things immediately. First, he needs to campaign to have all HP sales reps whose compensation is handled by Omega automatically paid at the level they would receive if they met their sales quotas. This should continue until the Omega issue is resolved, at which point any bonus should be calculated and paid with interest. Doing this will calm the sales force and keep reps from leaving en masse.
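
Here is a rough sketch of the true-up this proposal implies: once Omega's numbers are finally known, pay the difference between what the rep earned and the quota-level pay already received, plus interest for the delay. Simple (non-compounding) interest, the 5% rate, and the "no clawback if the rep earned less" rule are all hypothetical policy choices. The seven-month delay comes from the reporting above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def true_up(paid_at_quota: Decimal, actually_earned: Decimal,
            annual_rate: Decimal, months_late: int) -> Decimal:
    """Balance owed once the real compensation figures are finally known.

    Assumes simple interest on the unpaid difference and no clawback when a
    rep earned less than quota-level pay (hypothetical policy choices).
    """
    shortfall = actually_earned - paid_at_quota
    if shortfall <= 0:
        return Decimal("0.00")  # already paid at least what was earned
    interest = shortfall * annual_rate * months_late / 12
    return (shortfall + interest).quantize(CENT, rounding=ROUND_HALF_UP)

# Hypothetical rep: paid $10,000 at quota level, actually earned $13,000,
# and the bonus is settled 7 months late at a 5% annual rate.
print(true_up(Decimal("10000"), Decimal("13000"), Decimal("0.05"), 7))
```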

Next, Mott needs to start building a replacement for Omega. Why this has not been started already is beyond me. You would think that when HP looked at the 6,500 systems it was using and decided to slim them down to just 1,500, replacing Omega would have been identified as a high priority.

Based on the age of the Omega application, I’m willing to bet that it’s a single monolithic solution. Clearly a modular design is called for. I’m also willing to bet that HP doesn’t have a clear idea of everything that Omega does. No problem: if Mott and his IT team sit down with HP’s finance team, they can come up with system requirements that may be much simpler than the twisted requirements that Omega now implements.
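
What might "modular" look like for a compensation system? One minimal sketch breaks the work into small, independently replaceable pieces (validation, commission rules, and the payout run) instead of one big program. All of the names here, and the flat 5% commission rule, are hypothetical stand-ins. This is not how Omega actually works.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Deal:
    rep_id: str
    amount_cents: int  # whole cents avoid floating-point rounding on money

# Each stage is a small, replaceable function. The validator and the flat
# commission rule are hypothetical stand-ins, not Omega's actual logic.
Validator = Callable[[Deal], bool]
CommissionRule = Callable[[Deal], int]

def non_negative(deal: Deal) -> bool:
    return deal.amount_cents >= 0

def flat_rate(basis_points: int) -> CommissionRule:
    """Commission of basis_points / 10,000 of the deal amount, rounded down to the cent."""
    return lambda deal: deal.amount_cents * basis_points // 10_000

def compensation_run(deals: Iterable[Deal], validate: Validator,
                     rule: CommissionRule) -> tuple[dict[str, int], list[Deal]]:
    """Total commission per rep; deals that fail validation are set aside for review."""
    payouts: dict[str, int] = {}
    rejected: list[Deal] = []
    for deal in deals:
        if not validate(deal):
            rejected.append(deal)
            continue
        payouts[deal.rep_id] = payouts.get(deal.rep_id, 0) + rule(deal)
    return payouts, rejected

deals = [Deal("R1", 100_000), Deal("R2", 40_000), Deal("R1", 60_000), Deal("R2", -500)]
payouts, rejected = compensation_run(deals, non_negative, flat_rate(500))  # 5%
print(payouts, rejected)
```

The payoff of this shape is that finance can change a commission rule, or IT can add a validation check, without touching the rest of the pipeline. That's hard to do safely in a decades-old monolith.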

Final Thoughts

Randy Mott has been doing some amazing things at HP. However, the way he has been prioritizing his IT teams’ work somehow skipped over the Omega problem. Clearly the prioritization of IT projects needs to be revisited.

Mott needs to take immediate steps to resolve the problems that this IT issue has created and then he needs to fix the problem once and for all by creating a replacement solution for the out-of-date and overworked Omega system. If he can do this quickly, then he will have found a way to apply IT to enable the rest of the company to grow quicker, move faster, and do more.

Click here to get automatic updates when The Accidental Successful CIO Blog is updated.

What We’ll Be Talking About Next Time

Innovation, innovation, innovation – everyone wants it, but nobody seems to know how to get it and keep it. CIOs are under a lot of pressure to do more with less these days, and being able to nurture an environment of innovation sure would help. The trick is HOW to do this…

PayPal Outage Points To CIO Failure

Wednesday, September 2nd, 2009
Paypal's CIO Hasn't Been Doing His Job Correctly

The basic job of a CIO is to ensure that a company’s IT infrastructure operates smoothly and allows the company to conduct business. On Monday, August 3, 2009, PayPal’s CIO failed at this most basic of jobs.

A quick check of PayPal’s senior management structure reveals that it doesn’t have a CIO position (which in and of itself is rather amazing), but Ryan D. Downs is its Senior Vice President, Worldwide Operations, which makes him the de facto CIO. What went wrong, Ryan?

The Facts Behind The Failure

On Monday, August 3rd, PayPal experienced a worldwide outage that affected all of its customer-facing systems. As a result, millions of PayPal customers who rely on the service to approve and complete financial transactions were unable to do so. This was a long outage: it started at 1:30 pm EST and lasted until at least 6:30 pm EST.

PayPal is attributing this outage to “internal” issues.

PayPal is a huge business. In the most recent quarter, PayPal handled $16.7B in customer online commerce transactions. In the past the company has stated that it normally handles $2,000 in online transactions every second. In case you are doing the math, five hours at that rate means that this outage prevented at least $36M worth of business from happening.
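
The arithmetic behind that $36M figure is worth spelling out, along with a quick sanity check of the $2,000-per-second number against the quarterly total. The 91-day quarter is an assumption. The other figures come from the paragraph above.

```python
from datetime import datetime

RATE_PER_SECOND = 2_000  # dollars per second, PayPal's stated typical volume

start = datetime(2009, 8, 3, 13, 30)  # 1:30 pm
end = datetime(2009, 8, 3, 18, 30)    # 6:30 pm, the earliest reported end, so a lower bound
seconds_down = int((end - start).total_seconds())
lost_volume = seconds_down * RATE_PER_SECOND
print(f"{seconds_down:,} s x ${RATE_PER_SECOND:,}/s = ${lost_volume:,}")

# Cross-check against the quarterly figure, assuming a 91-day quarter.
QUARTER_VOLUME = 16_700_000_000
quarter_seconds = 91 * 24 * 60 * 60
implied_rate = QUARTER_VOLUME / quarter_seconds
print(f"Implied average: ${implied_rate:,.0f}/s")
```

The quarterly total works out to a little over $2,100 per second, so the $2,000 figure, and the $36M estimate built on it, hold up.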

What The CIO Did Wrong

I have no magic insights into what went wrong at PayPal, but it’s pretty easy to make a guess. Back in 2005, customers were shut out of PayPal for about 5 days when a software update went very, very wrong. I’m willing to bet that some sort of update process got away from them once again. This is just sloppy IT work.

This is exactly the type of basic “blocking & tackling” that CIOs have to take care of as part of building a solid IT foundation. Clearly this has not been done at PayPal.

The reason that this is such a scandal is that it’s happened at PayPal before. Once a problem is known, the CIO needs to step in and make sure that it will never happen again. That means not just establishing a fail-safe update process, but also making whatever changes are needed to PayPal’s infrastructure so that problems like this can’t ripple throughout the system.
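
One common way to keep a bad update from rippling through everything is a staged (or "canary") rollout: push the change to a small slice of servers first, check that they're healthy, and only then widen it. Below is a minimal sketch. The wave sizes and the `deploy`/`healthy` callbacks are hypothetical placeholders for real deployment and monitoring tooling, and the last wave should be 1.0 so the whole fleet is eventually covered.

```python
from typing import Callable, Sequence

def staged_rollout(servers: Sequence[str], waves: Sequence[float],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> tuple[list[str], bool]:
    """Roll an update out to growing fractions of the fleet.

    After each wave, every server in that wave must pass a health check before
    the next wave starts; the first failure stops the rollout so the bad change
    stays confined to a slice of the system. Returns (updated servers, success).
    """
    updated: list[str] = []
    for fraction in waves:
        target = max(1, int(len(servers) * fraction))
        batch = list(servers[len(updated):target])
        for server in batch:
            deploy(server)
            updated.append(server)
        if not all(healthy(server) for server in batch):
            return updated, False
    return updated, True

fleet = [f"web{i}" for i in range(10)]
deployed: list[str] = []
# Simulate a bad build that breaks web3 (hypothetical).
updated, ok = staged_rollout(fleet, (0.1, 0.5, 1.0), deployed.append, lambda s: s != "web3")
print(ok, updated)
```

In this simulation, half of the fleet never sees the bad build. An outage becomes a degraded slice instead of a worldwide failure.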

Additionally, creating a process for rolling back changes is critical. If a bad change slips through the system and starts to go into production, you need to be able to get the system back to the way it used to be.
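
The core idea of rollback is simple: never throw away the last known-good version. The class below is a hypothetical, in-memory sketch of that bookkeeping. A real system would also have to undo data and schema changes, not just swap application versions.

```python
class Deployment:
    """Keeps every released version so production can be put back the way it was."""

    def __init__(self, initial_version: str) -> None:
        self._history: list[str] = [initial_version]

    @property
    def current(self) -> str:
        return self._history[-1]

    def release(self, version: str) -> None:
        self._history.append(version)

    def rollback(self) -> str:
        """Drop the newest release and return the version now in production."""
        if len(self._history) == 1:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self.current

prod = Deployment("v1.4")
prod.release("v1.5")    # bad change reaches production
print(prod.rollback())  # back to the last known-good version
```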

Final Thoughts

Major outages like this reflect badly on all CIOs. There is no excuse for allowing an outage like this to happen, especially since PayPal has had similar problems in the past. PayPal can’t claim that it didn’t have enough funding to prevent this problem: it is the fastest-growing part of eBay.

In the end it all comes down to planning. Finding the time to gather the right people to run through “what if” scenarios and then following through with the recommendations that come out of these meetings is what every CIO needs to do. If Ryan takes the time to do this, then he will have found a way to apply IT to enable the rest of the company to grow quicker, move faster, and do more.

What We’ll Be Talking About Next Time

Hewlett-Packard is a huge IT products and services company that lives and dies by the actions of its sales teams. Making sure that the sales teams get paid should be a simple task, right? Think again…

London Stock Exchange Glitch – Could Cloud Computing Have Saved The Day?

Monday, September 15th, 2008
A software error shut down the London Stock Exchange for a day.

Just when you think that you have the worst job in IT, a story like this comes along! Last Monday the London Stock Exchange (LSE) experienced a full-day outage. Traders who were ready to trade were unable to connect to one of the LSE’s main trading applications. No connection, no trading.

Think back a week or so and you’ll realize that Monday was a very important day in stock trading land. The U.S. Government had just stepped in to shore up Fannie Mae and Freddie Mac. This meant that over in London, lots of traders wanted to buy or sell British bank stocks based on what they thought this move would mean for British stocks. However, for a full day nobody could trade anything!

The LSE uses a trading program called TradElect, a 15-month-old proprietary application that it built using Microsoft technology. It appears that traders were unable to connect to this application, and that is what caused the outage.

The big question is why. Although the LSE is not talking, we can make some educated guesses as to what went wrong. Since TradElect has been in service for 15 months, the problem is probably not a flaw in the application’s functionality. Additionally, since the problem lasted the entire day, the IT team evidently could not fix it by reverting to a previous version of the application, which suggests this was not an “upgrade gone wrong” problem. My guess is that this is an old-fashioned “too much volume” problem: trading volume grew too quickly and exceeded the capacity of the LSE’s software and hardware.
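
Why can a volume surge take a system down instead of just slowing it? When requests arrive faster than the system can process them, the backlog grows every second and doesn't shrink until the load drops. Here is a deliberately simple, deterministic sketch with hypothetical numbers. Real traffic is burstier, which makes things worse.

```python
def backlog_over_time(arrivals_per_sec: list[int], capacity_per_sec: int) -> list[int]:
    """Requests still waiting at the end of each second in a simple deterministic queue."""
    backlog = 0
    history: list[int] = []
    for arriving in arrivals_per_sec:
        # Work off up to capacity_per_sec requests; anything beyond that waits.
        backlog = max(0, backlog + arriving - capacity_per_sec)
        history.append(backlog)
    return history

# Hypothetical numbers: a system sized for 1,000 orders/s.
print(backlog_over_time([800] * 5, 1_000))    # ordinary day
print(backlog_over_time([1_500] * 5, 1_000))  # surge
```

On an ordinary day the queue stays empty. During the surge, it grows by 500 orders every second until something (memory, connections, timeouts) gives out.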

I almost hate to use the term, but could “cloud computing” be the solution for the LSE? Specifically, should the LSE design its applications to run on servers in its own data center but build in an option to expand to additional servers in some secure cloud whenever there is a surge in trading like the one that (tried to) happen on Monday? You can never know exactly how much computing capacity you’ll need, and perhaps this is where the brave new world of cloud computing can shine. Maybe this is a question that the next LSE CIO will have an opportunity to answer…
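
This idea is often called "cloud bursting." Its simplest piece is the sizing decision: how many extra cloud servers to start so that capacity covers the expected load plus some headroom. The sketch below assumes that load can be estimated in orders per second and that every server handles a fixed number of them. The function name, the 20% headroom, and the sample figures are hypothetical. Routing traffic, security, and keeping a trading venue's data consistent across sites are much harder problems that this doesn't touch.

```python
def burst_plan(expected_load: int, local_servers: int, per_server_capacity: int,
               headroom_pct: int = 20) -> int:
    """Extra cloud servers needed so capacity covers expected load plus headroom.

    expected_load and per_server_capacity share a unit (e.g. orders/second).
    Integer arithmetic keeps the ceiling exact.
    """
    required = expected_load * (100 + headroom_pct)  # scaled by 100
    per_server = per_server_capacity * 100           # same scale
    servers_needed = -(-required // per_server)      # ceiling division
    return max(0, servers_needed - local_servers)

# Hypothetical data center with 10 servers handling 1,000 orders/s each.
print(burst_plan(8_000, 10, 1_000))   # normal day
print(burst_plan(25_000, 10, 1_000))  # surge
```

On a normal day, the local servers are enough. In a surge, the plan calls for renting capacity only for as long as the surge lasts, which is the part that makes the cloud attractive here.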

Have you ever had a problem where one of your applications got overwhelmed with too much user volume? Did the app go down or just stumble? What did you do? Perhaps more importantly, what changes did you make later on to prevent it from happening again? Leave a comment and let me know what you think.