The blackout…revisited

This year has started with a few scares for those of us responsible for the secure operation of electric power grids. First, there is the Israel Electric Authority incident. On January 27th, headlines like this one appeared on Fox News:

[Fox News headline screenshot]

Apparently the day had come when someone finally pressed the Doomsday button and sent Israel, or very nearly so, back to the Middle Ages. Reality turned out to be more prosaic, however, and the prophets of the Apocalypse had to sheathe their keyboards again once it was confirmed that this was, in the end, a case of ransomware on equipment belonging to an ordinary IT network, infected through the not-so-elegant technique of phishing. Furthermore, from what I have read, the partial loss of electricity supply to some customers could be attributed to a deliberate decision by the personnel in charge of grid operations, who preferred to shed some load rather than face a complete network collapse. It has also been stated that the operators reacted that way in the conviction that they were under attack, at a moment when demand was rising sharply because of the low temperatures.


Stuxnet: lessons learned?

It is often said that ignorance is bliss. Conversely, another universal truth states that knowledge is power. I suppose we all have to find our place somewhere between those two extremes to preserve our sanity. When it comes to ICS cybersecurity, however, knowledge is a must. The dark side's lack of industrial process know-how is the finger plugging the hole in the dike, holding back the flood. But, as in the Dutch tale, this will not last forever.

The security-by-obscurity paradigm is still firmly rooted in many ICS managers' views. This approach becomes absolutely unacceptable once you grasp even the slightest notion of the threat's nature. Still, it is widely believed that the child will keep plugging the dike with his little finger indefinitely.

Industrial processes knowledge and Stuxnet

Much has been said and written about Stuxnet, mainly because of its astounding refinement. However (please keep in mind that I am an electrical engineer), what really shocks me is not that it was the first malware specifically designed to attack an ICS, but the deep knowledge of the target's physical processes it involved.

Stuxnet was probably aimed at disrupting Iran's uranium enrichment program. Enriched uranium is used both as fuel for nuclear power plants and in the construction of nuclear weapons. Uranium occurs in nature as a mixture of two main isotopes, 235U (lighter) and 238U (heavier). The natural concentration of 235U is under 1%, whereas a concentration above 90% is required for military purposes (far above the few percent typically required for nuclear power plant fuel). The enrichment process is usually performed in centrifuges, machines that spin uranium gas at high speed. The heavier isotope experiences a higher centrifugal acceleration and moves towards the periphery of the spinning cylinder, while the lighter fraction (the useful one) remains near the rotation axis, where it is collected and processed again and again until the desired concentration is reached. This is a very slow process because of the small atomic mass difference between the isotopes (3 atomic mass units out of 238). Obtaining enough material for a single device takes a great deal of time (or a great deal of money, to provide a larger number of centrifuges). Frequency converters are used to drive the electric motors and adjust the centrifuges' speed.
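To get a feel for why this is so slow, here is a back-of-the-envelope Python sketch; the per-pass separation factor is a purely illustrative assumption, not real centrifuge data, and the cascade model is deliberately naive:

```python
# Back-of-the-envelope sketch of why enrichment is slow: each centrifuge pass
# only nudges the 235U/238U abundance ratio by a small separation factor.
# The separation factor below is purely illustrative, not a real machine's figure.

def stages_needed(x0: float, x_target: float, alpha: float) -> int:
    """Count idealized enrichment passes to go from abundance x0 to x_target."""
    ratio = x0 / (1.0 - x0)              # isotope abundance ratio 235U : 238U
    target_ratio = x_target / (1.0 - x_target)
    stages = 0
    while ratio < target_ratio:
        ratio *= alpha                    # one idealized pass multiplies the ratio
        stages += 1
    return stages

if __name__ == "__main__":
    natural = 0.007    # ~0.7% 235U in natural uranium
    weapons = 0.90     # ~90% needed for military use
    alpha = 1.3        # illustrative per-pass separation factor (assumption)
    print(stages_needed(natural, weapons, alpha), "idealized passes")
```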

From this starting point, Stuxnet's designers tailored their strategy to the chosen target: the Iranian enrichment facilities. They tuned Stuxnet to prey on ICS built around Siemens control systems (Simatic PLCs and WinCC SCADA software) driving frequency converters manufactured by Vacon (a Finnish maker) and Fararo Paya (an Iranian one). In addition, more than 33 of these devices had to be present for the facility to qualify as a target. What is more, the frequency settings had to range from 807 Hz to 1,210 Hz (typical settings for the enrichment process).
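The conditions above read like a fingerprint of the intended victim. Below is a minimal Python sketch of that kind of target-qualification check; the data structures and names are hypothetical illustrations of the criteria reported by Symantec, not Stuxnet's actual code:

```python
# Minimal sketch of the target-qualification logic described above: only
# facilities with enough converters from specific vendors, running in the
# enrichment frequency band, qualify as a target.
# Data structures and names are hypothetical, not Stuxnet's code.

from dataclasses import dataclass

TARGET_VENDORS = {"Vacon", "Fararo Paya"}
MIN_DEVICES = 33                      # more than 33 drives had to be present
FREQ_RANGE_HZ = (807.0, 1210.0)       # typical enrichment settings

@dataclass
class FrequencyConverter:
    vendor: str
    setpoint_hz: float

def is_target(devices: list[FrequencyConverter]) -> bool:
    """Return True if the installed base matches the reported criteria."""
    matching = [
        d for d in devices
        if d.vendor in TARGET_VENDORS
        and FREQ_RANGE_HZ[0] <= d.setpoint_hz <= FREQ_RANGE_HZ[1]
    ]
    return len(matching) > MIN_DEVICES
```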

Now comes my favorite part: the choice of attack mechanism. Once Stuxnet gains command and control of the system, it disrupts normal operation by forcing the centrifuges' spinning frequency (i.e. their speed) up to 1,410 Hz (above maximum normal operation), then dropping it sharply to a value as low as 2 Hz (almost a complete halt, although the machines' actual rotating speed may depend on other factors), and then accelerating again to 1,064 Hz before returning to the normal operating sequence. This disruption repeats in predetermined cycles: say, running at 1,410 Hz for 27 days, then a good dive down to 2 Hz and back up to 1,064 Hz (giving the uranium a good shake) for another 27 days, and so on. The behavior does not affect all the devices in a group, at least not simultaneously, but only a certain subset (to make the malfunction harder to detect, I guess). The goal is not simply to disrupt the process. The goal is not to destroy the computer, or to delete the program running on the PLC, or any other action that looks definitive but is, in practice, easily detected and fixed. That kind of action would have alerted the operators and the impact would have been much more limited. It is obviously far more effective to make the facility produce out-of-specification uranium for months with no apparent cause, something much harder to fix and therefore more harmful. The malware was designed to hide itself and its effects, so alarm logs and historians kept no record of the abnormal operation. Stuxnet was also able to reinfect a device if the operators reloaded the original code.
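As a toy illustration of this periodic setpoint manipulation, and of the gap between what the drives actually receive and what the operator is shown, here is a short Python sketch; the timing, structure and names are simplified inventions, not the real infection logic:

```python
# Toy sketch of the periodic setpoint manipulation described above: the attacker
# alternates between overspeed and near-standstill setpoints while alarms and
# historians keep showing normal values. Timing and names are illustrative only.

import itertools

NORMAL_HZ = 1064.0
ATTACK_SEQUENCE_HZ = [1410.0, 2.0, 1064.0]   # overspeed, near halt, back to nominal
CYCLE_DAYS = 27

def disruption_cycles():
    """Yield (day, commanded_hz, reported_hz) forever: the reported value lies."""
    for cycle, attack_hz in zip(itertools.count(), itertools.cycle(ATTACK_SEQUENCE_HZ)):
        day = cycle * CYCLE_DAYS
        commanded = attack_hz            # what the drives actually receive
        reported = NORMAL_HZ             # what operators, alarms and historians see
        yield day, commanded, reported

if __name__ == "__main__":
    gen = disruption_cycles()
    for _ in range(6):
        day, cmd, rep = next(gen)
        print(f"day {day:3d}: commanded {cmd:7.1f} Hz, reported {rep:7.1f} Hz")
```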

Lessons learned?

Beyond the initial shock (think of us poor ICS designers and operators and our shattered sense of safety), there are some valuable lessons to be learned.

Much of the skepticism (real or feigned) put up by ICS managers rests on their lack of understanding of attack vectors and attackers' goals. As far as they can see, a cyberattack is the digital equivalent of carpet bombing: a blind and pointless act of destruction. However, life is not that simple. What the Stuxnet case shows is that, given enough knowledge of an industrial process, it is possible to carry out very sophisticated attacks designed to cause far more damage in ways that are very hard to fix: a wastewater treatment plant discharging beyond legal limits; a rise in non-compliant product, with the economic cost and loss of reputation that come with missed deadlines or, worse, out-of-spec product reaching customers unnoticed (after all, statistical quality control sampling is designed around the 'natural' variability of the undisturbed process, so our standard sample size or sampling frequency may be inadequate to detect the situation); or erroneous billing due to altered meter data (obviously enough to destroy our reputation). What is worse, all the while our ICS keeps telling us that everything is fine. And, let's face it, how long would it take you to realize that your control system might have been hit by a cyberattack? After all, it is a risk you have always dismissed…
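To put a rough number on that quality-control remark, here is a small Python sketch; the sampling plan and defect rates are hypothetical figures chosen only to show how easily a modest, attack-induced shift slips past a plan sized for 'natural' variability:

```python
# Rough illustration of the quality-control remark above: an acceptance sampling
# plan sized for the process's 'natural' defect rate can easily miss a moderate,
# attack-induced shift. All figures below are hypothetical.

from math import comb

def prob_lot_accepted(defect_rate: float, sample_size: int, accept_max: int = 0) -> float:
    """Binomial probability of accepting a lot (at most accept_max defectives found)."""
    return sum(
        comb(sample_size, k) * defect_rate**k * (1 - defect_rate)**(sample_size - k)
        for k in range(accept_max + 1)
    )

if __name__ == "__main__":
    n = 20                               # hypothetical sample size per lot
    for rate in (0.01, 0.05, 0.10):      # 'natural' vs. disrupted defect rates
        print(f"defect rate {rate:4.0%}: lot accepted with probability "
              f"{prob_lot_accepted(rate, n):.2f}")
```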

You might regard yourself as one of the 'good guys'. You have lots of friends, a good reputation and no enemies. But even if that is true, in a global market where processes and machinery are increasingly standardized, who can say for sure that you will not fall victim to malware designed to hit a third party that just happens to share equipment or methods with you? Just one example: centrifugal machines originally used in olive oil production are now used to dewater sludge from (waste or drinking) water treatment plants.

Too often I hear arguments like 'our system is not connected to the Internet'. Aside from the healthy skepticism that such a statement should raise, we must remember that Stuxnet was designed to copy itself, whenever possible, via portable devices such as USB memory sticks or the PCs used to load projects onto PLCs.

What every engineer knows

Not so long ago, the information used in industrial process engineering was paper-based and access to it was very limited: purchased technical books, suppliers' technical data handed over by salesmen, documentation gathered in specific technical courses, and so on: photocopies, photocopies, photocopies. The Internet has changed all that. Today we have almost unrestricted access to highly detailed and specific documentation right from our desktop PC. I remember the days when every engineer kept large amounts of information on various media. That personal treasure contained much of our knowledge and design ability, and its loss was a risk we could not afford. Today that is no longer necessary. The Internet is the ultimate archive: PDF, PDF, PDF.

It is amazing how much material about the processes and equipment in use at any given company lies out there, waiting to be discovered with the right search. There is much more than technical documentation in the form of catalogs and data sheets. Suppliers often describe their references and success stories, sometimes in great detail, in multiple places: their websites, technical magazines, conferences and trade fairs… Companies themselves sometimes publish papers on their own state-of-the-art developments (or anything else they are proud of). Construction companies do just the same. When it comes to public infrastructure, administrations give bidders access to a great deal of detailed material. At least in Spain, some public projects are made available to every citizen so that people can file objections against the future infrastructure. You can even find undergraduate and postdoctoral academic works sponsored by companies regarded as critical operators: I have found academic papers describing the protocols, communication networks and control systems of real power grid substations belonging to named companies. Sooner or later, all this information makes it to the Internet.

With this in mind, the next question is: do you really want to keep building your cybersecurity on the assumption that no one knows your industrial process?

Final note: the specific data on Stuxnet and its behavior are taken from the report published by Symantec (authored by Nicolas Falliere, Liam O’Murchu, and Eric Chien), available here.

iSOC: A new concept in cybersecurity

I am sure we have all cracked a smile at photographs or videos of strange contraptions powered by steam or internal combustion engines. Most of those inventions failed because the technology used did not fit the purpose it was meant for. Seen with modern eyes, trying to fly a plane fitted with a steam engine looks hopelessly naïve. However, those inventors were not as foolish as their failures suggest. They were intelligent people (or, let's say, no less intelligent than average) trying to solve the problems of their time with the technology available.

Steam powered the beginning of industrialization, granting people access to energy on a scale never seen before. This brand-new technology led to an engineering and cultural revolution that paved the way for our modern society. The early days of the Industrial Revolution were times of faith in progress, of unleashed collective optimism. Steam-powered machines were regarded as the key to every engineering problem that had so far remained unsolved.


Aurora vulnerability or how to exploit knowledge of physical processes

Trying to raise awareness of cybersecurity issues among my fellow process and control engineers is a challenging task. We have talked about it before, making it clear how the lack of basic notions about ICT environments and procedures makes the risks and attack mechanisms almost inconceivable for these engineers. I mean 'inconceivable' sensu stricto: not something assigned a very low probability, but something you cannot even think of because you lack the cultural background and experience to do so.

The most common response is denial, built on several fallacies that often explain this sense of security. One is confidence in the mechanisms put in place to physically protect equipment: safety interlocks implemented with mechanical or electrical devices that operate autonomously, without processing or communication capabilities, and which are therefore regarded as cyberattack-proof. Somehow, in a control engineer's mind (my own included), these systems are the last line of defense, absolutely isolated from and independent of any processor-based malfunction (even when those processors are human), put in place to prevent damage to physical equipment caused by improper process operation.

In my own experience, the design of control systems has always relied on a twofold strategy:

  • Deployment of a higher control level based on electronic instrumentation and processing algorithms which, by their very nature, allow for finer tuning and higher efficiency. This level is processor-based.
  • Deployment of a lower level based on relays and electrical and mechanical actuators that keeps the system operable if the control system crashes or malfunctions severely. This level is not processor-based and, as stated above, prevents the physical system from operating under improper conditions. It relies on built-in, hard-wired electromechanical equipment.

This second level underpins the claim that it is virtually impossible for physical equipment to suffer severe damage, even if a malicious individual or organization takes control of the system. However, two facts undermine this security paradigm:

  • I have noticed that in many brand-new control systems the safety interlocks are implemented through digital instrumentation readings, communication networks and control network PLCs. The aim is twofold: first, to cut the cost of wiring and of devices regarded as redundant and, second, to leverage the greater accuracy and adaptability of digital systems. I know of some epic failures caused by this practice, with losses running from tens to hundreds of thousands of Euros.
  • Interlocks and protection systems are designed to prevent damage if the process runs beyond its allowable operating conditions. But since physical systems are not described in terms of ones and zeros (there is a continuum of intermediate states), one must always allow a regulation deadband to prevent nuisance tripping of protection devices and to account for normal measurement variability. This is achieved with deadband controls, hysteresis loops, tripping delays and so on (see the sketch after this list).
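The following minimal Python sketch illustrates the deadband-plus-delay logic just mentioned; all thresholds and timings are hypothetical, and the point is simply that excursions shorter than the trip delay never fire the protection even though the equipment feels every one of them:

```python
# Minimal sketch of deadband-plus-delay protection logic: the device only trips
# if the measurement stays beyond the limit for longer than the configured delay,
# so an attacker who keeps each excursion shorter than that delay can stress the
# equipment without ever tripping it. All thresholds and timings are hypothetical.

class DelayedTrip:
    def __init__(self, limit: float, trip_delay_s: float):
        self.limit = limit
        self.trip_delay_s = trip_delay_s
        self._over_since = None

    def update(self, t_s: float, value: float) -> bool:
        """Return True when the protection finally trips."""
        if value <= self.limit:
            self._over_since = None           # back inside the deadband: reset timer
            return False
        if self._over_since is None:
            self._over_since = t_s            # excursion starts
        return (t_s - self._over_since) >= self.trip_delay_s

if __name__ == "__main__":
    prot = DelayedTrip(limit=100.0, trip_delay_s=5.0)
    # Excursions of 4 s past the limit, repeated forever, never trip the relay.
    for t in range(30):
        value = 120.0 if (t % 10) < 4 else 90.0
        if prot.update(float(t), value):
            print(f"tripped at t={t}s")
            break
    else:
        print("never tripped, but the equipment took the stress")
```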

In the first case, the physical protection devices are seriously compromised by being software- and network-dependent. But even in the second case it is possible, in principle, to conduct an attack that exploits this design logic, aimed at forcing working conditions that damage the physical systems. Too complicated? Idle speculation? Not really. There is at least one documented case in which this strategy was used with spectacular results: the so-called Aurora vulnerability.

This was an experiment conducted at the INL (Idaho National Laboratory) in 2007 and, as far as I can see, it has fallen into the limbo that lies between the professionals involved in control systems and those engaged in information and communication technology security: after all, fully understanding the attack requires having, so to speak, a foot in each half of the field. This may explain why news of the experiment went almost unnoticed (beyond a video broadcast by CNN that, perhaps because of its spectacular nature, triggered the usual denial reaction in those who might be directly concerned). The veracity of the footage has even been intensely questioned, with suggestions that pyrotechnic devices were used to enhance the visual effect!

What is Aurora all about? To put it simply: Aurora is an attack designed specifically to damage an electric power generator. It goes like this: every generating unit is (or should be) protected against being connected to the power grid out of synchronism. This is achieved by checking that the generated waveform matches that of the grid (within certain limits); to do so, voltage, frequency and phase are monitored. Why? Because connecting a generator to the grid while out of synchronism will force it to synchronize almost instantaneously, producing an extraordinary mechanical torque on the generator shaft, a stress the machine is not designed to withstand. Repeating this anomalous operating condition will cause the equipment to fail. Imagine someone trying to jump aboard a moving train: we can picture him running along the tracks, matching the train's speed and then hopping on. If he is lucky he gets a soft landing on the wagon floor. The alternative, and ill-advised, method is to stand beside the tracks and grab the ladder handrail as it passes right in front of you. It is easy to see that the resulting jolt is something you do not want to experience.

However, protective relays allow a certain delay between recognition of the out-of-synchronism condition and the action of the protection devices, a delay set to avoid nuisance tripping. This offers a window of opportunity to force undesirable mechanical stress on the generator without disconnecting it from the grid. A detailed technical analysis of the attack and of possible mitigating measures is available.
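To make the two ingredients concrete, the sync-check permissive and the intentional operate delay, here is a simplified Python sketch; the limits and timings are illustrative assumptions, not the actual parameters of the INL test:

```python
# Simplified sketch of the sync-check idea: the breaker should only close when
# voltage, frequency and phase differences are within limits, but the protection
# that catches an out-of-sync closure only acts after an intentional delay.
# That delay is the Aurora window. Limits and timings are illustrative only.

from dataclasses import dataclass

@dataclass
class GridState:
    voltage_pu: float     # per-unit voltage
    frequency_hz: float
    phase_deg: float

def in_sync(gen: GridState, grid: GridState,
            dv_pu: float = 0.05, df_hz: float = 0.1, dphi_deg: float = 10.0) -> bool:
    """Sync-check: all three differences must be inside the permissive window."""
    return (abs(gen.voltage_pu - grid.voltage_pu) <= dv_pu
            and abs(gen.frequency_hz - grid.frequency_hz) <= df_hz
            and abs(gen.phase_deg - grid.phase_deg) <= dphi_deg)

def attack_fits_window(protection_delay_ms: float, open_close_cycle_ms: float) -> bool:
    """The malicious open/close cycle must finish before the protection reacts."""
    return open_close_cycle_ms < protection_delay_ms

if __name__ == "__main__":
    gen = GridState(1.0, 60.3, 95.0)   # generator drifted while the breaker was open
    grid = GridState(1.0, 60.0, 0.0)
    print("in sync:", in_sync(gen, grid))                       # False: closing now is harmful
    print("attack fits window:", attack_fits_window(250.0, 150.0))
```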

True, for an attack of this kind to succeed a number of preconditions must be met: knowledge of the physical system, remote access to a series of devices, certain operating conditions of the electrical system, knowledge of the existing protections and their settings… These are the arguments that will come up in the denial phase. But that is not the point.

The point is this: given the degree of exposure of industrial control systems to cyberattacks (for historical, cultural, organizational and technical reasons), the only thing needed to wreak havoc upon them is knowledge of the physical systems and their control devices. The Aurora vulnerability is a very specific case, but it should be enough to show that confidence in the physical protection of equipment has its limits, limits waiting to be discovered. Treating it as our only line of defense is a risk no one can afford.

Can we?

By the way, the original Aurora vulnerability video can be seen below:

Industrial Control Technologies Cybersecurity. Time to wake up.

Sometimes one has to make an effort to balance opposing feelings. That has been my case since I started working on cybersecurity issues. I have devoted much of my career to the design and construction of public infrastructure, mainly water treatment plants. As an engineer I was in charge of designing industrial processes and their associated control systems: physical processes, electrical wiring diagrams (power and control), network architectures and control components, and so on. In short, the process and its associated SCADA systems. I would like to think I did a good job.

I have witnessed the evolution those systems have undergone in recent years, which could be summed up in something iconic: the disappearance of traditional control panels with their red and green lights and analog gauges. I remember the first time I saw one of those old-fashioned panels replaced by a 42-inch screen, nearly as big as they came in those days: an amazing thing to see, for sure. Now, surrounded by computer engineers, it feels like swallowing the famous red pill from 'The Matrix'. From my new position I see in a new light those years in which we engineers adopted all that computer technology with a kind of Victorian-era faith in progress. It is hard to explain how it feels to realize that, in most cases, we have been building castles on foundations of sand. I become more aware of the situation as we keep finding equipment and control systems exposed to the Internet without minimal security measures. I am not kidding. I have seen them. It is a terrifying moment when you fully understand that you hold in your hands the power to completely stop a factory's manufacturing process from your own desk (a real case). But who can be blamed for not stopping at a red light when they have never seen a traffic light?

Now it is time to wake up. The threat looming over thousands of systems is all too real and no excuses are allowed. Nevertheless, in most cases the first reaction is denial or disbelief. That is easy to understand, since the attack mechanisms are, in most cases, almost unthinkable for those in charge of these facilities. So, where to start? Here are some tips for my fellow engineers working in the field, to be repeated like a mantra every morning:

1. The risk is real. Yes, to me too.
2. Maybe I can't think of any reason why an attacker would target us. Never mind: what matters are not my reasons, but his.
3. The size of my organization or system won't protect me, even less so in comparison with others. If my system is attacked I will sustain 100% of the damage, regardless of my size.
4. Here it is worth remembering the joke about the two guys running away from an angry bear. One of them stops to put on his running shoes. The other thinks it is pointless, since it is impossible to outrun the animal. The first one replies: "I don't need to outrun the bear, only you." Our first goal is not to be the easiest target on the shooting range.
5. Asking questions is a good first step. Start with this one: what is the current status of my system?
6. Finally, remember: we are all responsible, to varying degrees, for the cybersecurity of the systems we work on. Think about what you do, but also about what you don't.

Don't keep waiting for the first blow to land. In the words of Bob Marley: 'Get up, stand up…'