Security Ranch

February 15, 2021

Firewall Policies for Industrial Control Systems

Filed under: Uncategorized — Tags: , — Ken @ 8:04 pm

Firewall Policies for Industrial Control Systems

            For this paper, the term Industrial Control Systems (ICS) will be used as a generalized term for Supervisory Control and Data Acquisition (SCADA) systems, Programmable Logic Controllers (PLC), and Human Machine Interfaces (HMI).  ICS is the technology that connects the Information Technology (IT) world to the Operational Technology (OT) world (Bodungen, 2017).  It is used every day to run power companies, oil refineries, space technology, and manufacturing plants.

Before this technology, when a company wanted to adjust a machine, an employee had to change it manually until the desired outcome was reached.  In the 1960s some of this equipment began to be connected to the mainframe computers popular at the time.  When personal computers took off in the 1990s, companies wanted more control over and management of their OT.  ICS were used to connect IT and OT on internal networks and, later, the internet.  With those connections came a host of new problems that control systems engineers and computer engineers had never had to deal with before.  All of the threats and vulnerabilities of the internet were introduced to this technology.  On the one hand, you had vulnerabilities from the web, and on the other hand, all of the vulnerabilities in the ICS were now exposed to the world.  To be sure, those vulnerabilities had been there from the get-go, but they could be mitigated by workarounds and by the fact that the systems were isolated from most external threats.

Another unique issue with ICS technology is that many of these systems still run the original technology they were built with.  Those systems have never been updated or upgraded.  Most of the time this is because the OT requires extremely high availability; imagine if a power plant had to shut down, even briefly, to upgrade its equipment.  In reality, most companies never upgraded their equipment because it “just works.”

So, when the equipment used by companies becomes 20 or 25 years old and gets connected to the internet, there are some interesting considerations that IT security professionals need to understand in order to secure those technologies without bricking them in the process.  A computer from 20 years ago would be considered ancient today and would be almost unusable for what most people use computers for now.  One example of how these systems are unique is that most of the older technology has just enough power to run what it is supposed to run and no more.  Most “best practices” would say that you want to encrypt all communications between your devices and computers; with ICS that may not be possible.  Industrial Control Systems are unique and therefore need special consideration when developing security policies.  This paper will discuss four different types of security policies out of the possibly hundreds of types out there.  The first policies are firewall policies.

Firewalls are a major component of any network, and ICS networks are no different.  Firewalls are used to filter desired traffic from undesired traffic.  One of their primary uses is to provide network segmentation, which plays into the broader defense-in-depth strategy.  A popular model for segmenting ICS networks logically is the Purdue Enterprise Reference Architecture (PERA), more commonly called the Purdue Model (Bodungen, 2017).  The Purdue Model divides a network into six levels, labeled Level 5 through Level 0.  Level 5 is the top level and is the Enterprise level; this is the level on which a corporate office and its network operate.  Level 4 is also an enterprise level, but it covers branch offices and the physical locations where the equipment resides.  Level 3 is the ICS-DMZ.  The demilitarized zone is used the same way a DMZ would be used for web applications: it is the level where SCADA systems are located so that they can be accessed by the enterprise offices while still communicating with the components below them.  Level 2 is Area Supervisory Control, where components like PLCs and HMIs are located.  Level 1 is where most of the PLCs sit and where most of the actual controlling of the OT takes place.  Level 0 is the OT equipment itself (Bodungen, 2017).
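To make the segmentation concrete, the sketch below captures the levels described above in a small lookup table, plus a check for whether a flow crosses the enterprise/control boundary and therefore belongs in the Level 3 ICS-DMZ.  This is only an illustration; the zone groupings and function names are my own assumptions, not part of the Purdue specification.

PURDUE_LEVELS = {
    5: "Enterprise (corporate office network)",
    4: "Enterprise (branch offices / site business systems)",
    3: "ICS-DMZ (SCADA servers, data historians)",
    2: "Area Supervisory Control (HMIs, some PLCs)",
    1: "Basic Control (PLCs doing the actual controlling)",
    0: "Process (the OT equipment itself)",
}

def crosses_it_ot_boundary(src_level: int, dst_level: int) -> bool:
    """Return True if a flow crosses between the enterprise zone (Levels 4-5)
    and the control zone (Levels 0-2) and so should terminate in the ICS-DMZ."""
    enterprise = {4, 5}
    control = {0, 1, 2}
    going_down = src_level in enterprise and dst_level in control
    going_up = src_level in control and dst_level in enterprise
    return going_down or going_up

print(crosses_it_ot_boundary(5, 1))  # True: a corporate host talking to a PLC
print(crosses_it_ot_boundary(2, 1))  # False: traffic stays inside the control zone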

There are eight overall goals when developing security policies and rules for firewalls.  The first goal is to eliminate all direct connections from the internet to the process control network (PCN)/SCADA network, otherwise known as the Level 3 ICS-DMZ in the Purdue Model (CPNI, 2005).  The second goal is to restrict access from Level 5 to Level 3 and below.  Very few, if any, employees should have access to the lower levels of the ICS network; as a best practice, this is an example of “least privilege” and “need to know” (CPNI, 2005).  Goal three is to allow but restrict access to Level 3 by the Enterprise Levels 4 and 5.  That access should be limited to only the servers and devices that are needed for compliance reasons, such as data historians and maintenance databases (CPNI, 2005).  Goal four is to secure remote access to the control systems.  Occasionally, third parties such as vendors or contractors will need access to the control systems, whether for emergency maintenance or for upgrading systems, and there should be separate policies and firewall rules for vendors and contractors.  The fifth goal is to secure all wireless connections.  The sixth goal is to develop well-defined rules for what traffic and which protocols are allowed through the firewall.  The IT department will need to create Access Control Lists (ACLs) and ensure that the principles of “least privilege” and “need to know” are applied (CPNI, 2005).  The seventh goal is to secure the connection between the firewall and management so that system administrators can monitor all traffic over a secure connection using highly restricted management servers.  The last goal is to monitor all traffic and scan for any unauthorized protocols or unusual activity.  This can be achieved by using a firewall or an Intrusion Prevention/Detection System (IPS/IDS).
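As a rough illustration of goals one through three, the following sketch models a default-deny rule table evaluated top to bottom.  The zone names, port, and rule set are assumptions made for the example, not recommendations for any particular firewall product.

RULES = [
    # Goal 1: no direct connections from the internet into the control network.
    {"src": "internet",   "dst": "control", "port": None, "action": "deny"},
    # Goal 3: the enterprise levels may reach only the data historian in the ICS-DMZ.
    {"src": "enterprise", "dst": "ics-dmz", "port": 443,  "action": "allow"},
    # Goal 2: everything else from the enterprise toward the lower levels is blocked.
    {"src": "enterprise", "dst": "control", "port": None, "action": "deny"},
]

def evaluate(src_zone: str, dst_zone: str, port: int) -> str:
    """First-match evaluation; traffic that matches no rule is denied by default,
    which is the 'least privilege' posture described above."""
    for rule in RULES:
        if rule["src"] == src_zone and rule["dst"] == dst_zone and rule["port"] in (None, port):
            return rule["action"]
    return "deny"

print(evaluate("enterprise", "ics-dmz", 443))  # allow: historian access for compliance
print(evaluate("internet", "control", 502))    # deny: goal one, no internet to the PCN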

IDSs and IPSs are useful because they give you greater flexibility in what to do with the traffic.  Firewalls generally will only deny, drop, or allow traffic based on the rules that were written; if the telnet protocol is blocked, then no telnet traffic will pass the firewall.  An IDS/IPS, on the other hand, can block the connection and alert the IT department if any unauthorized activity is attempted on the network.  There are three primary detection methodologies used by IDS/IPS products.  The first is signature-based detection.  A signature can come in many forms, but some common signatures are unauthorized protocols or unauthorized user names.  For example, the name root should probably never be used, as it is a well-known name that many hackers will try when logging in.  Other signatures could be file names associated with known malware.  One area of information security that studies and creates signatures is threat intelligence: threat analysts study malware and cyber attacks and create Indicators of Compromise (IOCs) that can be programmed into firewalls and IDS/IPS devices to stop attacks before they can happen.  This detection method is good for known threats (Scarfone, 2007).
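A minimal sketch of the signature idea is shown below: incoming events are compared against a short list of indicators.  The particular usernames, protocol, and filename are placeholders standing in for a real threat-intelligence feed.

SIGNATURES = {
    "usernames": {"root", "admin"},      # well-known names attackers try first
    "protocols": {"telnet"},             # unauthorized protocols on this network
    "filenames": {"mimikatz.exe"},       # file names tied to known malware
}

def match_signatures(event: dict) -> list:
    """Return the signature categories that an event matches, if any."""
    hits = []
    if event.get("username") in SIGNATURES["usernames"]:
        hits.append("suspicious username")
    if event.get("protocol") in SIGNATURES["protocols"]:
        hits.append("unauthorized protocol")
    if event.get("filename") in SIGNATURES["filenames"]:
        hits.append("known-bad filename")
    return hits

print(match_signatures({"username": "root", "protocol": "telnet"}))
# ['suspicious username', 'unauthorized protocol'] -> alert or block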

The second detection method is anomaly-based detection.  Anomaly-based detection can detect unusual activity that deviates from an initial baseline profile.  If the regular working hours are from 8 a.m. to 5 p.m. and activity is detected at midnight, an alert would be triggered and the system would either record the event in a log file or block the connection entirely (Scarfone, 2007).
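Using the working-hours example above, a bare-bones anomaly check might look like the sketch below.  The 8 a.m. to 5 p.m. baseline and the flag-only response are assumptions for illustration; a real product builds its baseline statistically from observed traffic.

from datetime import datetime

BASELINE_START_HOUR = 8   # 8 a.m.
BASELINE_END_HOUR = 17    # 5 p.m.

def is_anomalous(event_time: datetime) -> bool:
    """Flag any activity that falls outside the baseline working hours."""
    return not (BASELINE_START_HOUR <= event_time.hour < BASELINE_END_HOUR)

print(is_anomalous(datetime(2021, 2, 15, 0, 12)))   # True: midnight activity -> log or block
print(is_anomalous(datetime(2021, 2, 15, 10, 30)))  # False: normal daytime traffic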

The third detection method is stateful protocol analysis.  Stateful protocol analysis is the newest detection method and is suitable for deep packet inspection because it can remember the “state” of a connection while inspecting the packets.  This is good for connections that have to be authorized: when a user makes a connection and is authenticated, the network device remembers that the connection was authorized and what it was authorized to do.  While stateful analysis IDS/IPS are the most capable, the downside is that they are extremely resource intensive and can slow network traffic down if there is too much of it (Scarfone, 2007).
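The sketch below shows the core of the “state” idea in a heavily simplified form: the device remembers which connections authenticated and what they were authorized to do, and drops anything arriving outside an authorized session.  The session model and operation names are assumptions for the example.

class StatefulInspector:
    def __init__(self):
        # (source, destination) -> set of operations authorized for that session
        self._sessions = {}

    def authenticate(self, src: str, dst: str, operations: set) -> None:
        """Record a successfully authenticated connection and its rights."""
        self._sessions[(src, dst)] = set(operations)

    def inspect(self, src: str, dst: str, operation: str) -> str:
        """Pass packets only when they belong to an authorized session."""
        allowed = self._sessions.get((src, dst), set())
        return "pass" if operation in allowed else "drop"

fw = StatefulInspector()
fw.authenticate("hmi-01", "plc-07", {"read"})
print(fw.inspect("hmi-01", "plc-07", "read"))     # pass: inside the authorized session
print(fw.inspect("hmi-01", "plc-07", "write"))    # drop: operation was never authorized
print(fw.inspect("laptop-99", "plc-07", "read"))  # drop: no session exists at all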

The National Institute of Standards and Technology (NIST) has created two critical documents that can help with creating and applying security controls and policies.  NIST SP 800-82 r2 is the Guide to Industrial Control Systems (ICS) Security and contains best practices for ICS security policies and controls.  These controls are based on the controls presented in another vital document, NIST SP 800-53 r4, Security and Privacy Controls for Federal Information Systems and Organizations.  SP 800-53 contains most of the security and privacy controls that you would need to build your security policies from.  IT administrators should attempt to implement the SP 800-53 controls first; where a control is not possible or feasible, SP 800-82 offers ICS-specific guidance and compensating controls that can achieve the same or nearly the same results.

As with the Internet of Things (IoT), more devices than ever are being connected online, and this trend will likely continue accelerating for the foreseeable future.  In fact, there is a new buzzword, the Industrial Internet of Things (IIoT), which is the same idea except that it is control systems being connected to the internet.  Another new trend in the ICS world is virtualization and cloud services.  For the same reasons that businesses have started moving to the cloud, industrial control systems are going to start moving there as well.  As this happens, some of the same vulnerabilities that exist in cloud services will start appearing in ICS networks, and security professionals will have to find new ways to secure them.

References

CPNI.  (February 2005).  Firewall Deployment for SCADA and Process Control Networks.  Retrieved from https://www.ncsc.gov.uk/content/files/protected_files/guidance_files/2005022-gpg_scada_firewall.pdf

Bodungen, Clint. Singer, Bryan. Shbeeb, Aaron. Hilt, Stephen. et al. (2017).  Hacking Exposed: ICS and SCADA Security Secrets & Solutions.  McGraw-Hill Education: New York, NY

Scarfone, Karen. Mell, Peter.  (February 2007).  Guide to Intrusion Detection and Prevention Systems (IDPS).  Retrieved from http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-94.pdf

Stouffer, Keith. Pillitteri, Victoria. et al. (May 2015).  NIST Special Publication 800-82 Revision 2: Guide to Industrial Control Systems (ICS) Security.  Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf

Joint Task Force Transformation Initiative.  (April 2013). Security and Privacy Controls for Federal Information Systems and Organizations.  Retrieved from  http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf

Cruz, Tiago. Simoes, Paulo. et al.  (July 2016).  Security implications of SCADA ICS virtualization: survey and future trends.  Retrieved from https://www.researchgate.net/publication/305725280_Security_implications_of_SCADA_ICS_virtualization_survey_and_future_trends

Auditing Industrial Control Systems

Filed under: Uncategorized — Tags: , — Ken @ 8:00 pm

Auditing Industrial Control Systems

            Some of the first Industrial Control Systems (ICS) came about in the 1960s, when computers were first starting to become popular.  These early ICS were mainly run on the mainframe computers of the day and were used to control machines and sensors for industries such as oil and gas and the electric grid.  As technology advanced and computers became smaller and more powerful, ICS evolved and became integrated into just about every aspect of life.

ICS is mostly a general term for several different types of systems.  ICS is made up of Supervisory Control and Data Acquisition (SCADA) systems, Distributed Control Systems (DCS), and Programmable Logic Controllers (PLC).  These three systems, along with sensors, give companies the power to control anything from the generation of power in the electric grid, to drilling in the oil and gas industry, to the manufacturing of raw materials like metals or plastics.

Lately, the security and weaknesses of the United States’ power grid have come into the spotlight due to the prevalence of cyber attacks.  An excellent example of what could happen during a cyberattack on the U.S. is the Aurora Generator Test.  In 2007, at the Idaho National Laboratory, a test was conducted to simulate a cyber attack on an electric grid.  A diesel generator was connected to a power substation, and to simulate the cyber attack, specially designed code was sent to the generator to open and close circuit breakers out of sync.  Opening and closing the circuit breakers out of sync created an enormous amount of stress, enough to break parts off of the generator.  Within about three minutes of the test starting, the generator had been destroyed and was left smoking (Swearingen, Brunasso, 2013).  While a cyber attack destroying only one generator does not seem like a big deal, it is significant: this experiment applied to only one generator.  Imagine if the attack occurred on tens or hundreds of generators at the same time.  Within three minutes entire cities could be blacked out, which could also cause a surge in demand on other energy stations that would short them out as well.

To prevent this from happening, it is imperative to continually test ICS for vulnerabilities and correct them as soon as possible.  Auditing can help by keeping companies honest and preventing them from becoming complacent.  In the Marine Corps, there is a famous slogan: “Complacency Kills.”  When a person gets lazy doing the same task over and over again, they begin to take shortcuts and skip steps, and when that happens, accidents happen and people sometimes get killed.  The same holds for industrial control systems.  Auditing is becoming essential for the U.S. government and society as the U.S. relies more and more on the benefits of ICS.

There are two organizations that specialize in protecting ICS.  The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) operates within the National Cybersecurity and Communications Integration Center (NCCIC), which is a division within the Department of Homeland Security (ICS-CERT, n.d.).  ICS-CERT, along with the NCCIC, has created a document called the “Seven Strategies to Defend ICSs.”

The first step calls for implementing application whitelisting (NCCIC, 2015).  By whitelisting which applications are allowed on the network, companies can detect malware uploaded by hackers.  This is especially helpful when the network is static in nature and does not change much.
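A minimal sketch of the whitelisting check is shown below, assuming the approved list is a set of SHA-256 hashes gathered when the (static) system is built.  The single hash shown is a placeholder, not a real value.  Anything not on the list is blocked and logged, which is exactly why the approach works best on static networks.

import hashlib

APPROVED_HASHES = {
    # Placeholder entry; in practice this would be the hash of an approved binary,
    # such as the HMI client installed on the operator workstation.
    "0000000000000000000000000000000000000000000000000000000000000000",
}

def sha256_of(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def may_execute(path: str) -> bool:
    """Allow a program to run only if its hash appears on the approved list."""
    return sha256_of(path) in APPROVED_HASHES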

Step two is to ensure proper configuration and patch management.  By updating and patching systems in a timely fashion, companies can avoid attacks that would have been easily prevented had the systems been patched (NCCIC, 2015).  In fact, the recent attack on Equifax exploited a vulnerability that had been patched months before their systems were attacked (Newman, 2017).

The third step is to reduce the attack surface.  By disabling or uninstalling any services or programs that are not used, companies limit what is available for a hacker to exploit.  Companies should also isolate ICS networks from all untrusted networks, including the internet (NCCIC, 2015).

The fourth step is to build a defendable network.  By segmenting networks, companies can limit the damage if a network is compromised.  If every host on the network can reach every other host, it only takes one compromised computer to affect the entire network.  However, if the network is separated into smaller networks, the damage will be limited to the smaller network that was compromised (NCCIC, 2015).

The fifth step is to manage authentication.  To accomplish this, companies should use best practices for managing authentication such as strong password policies, multi-factor authentication, and “least privilege” (NCCIC, 2015).

The sixth strategy is to implement secure remote access.  The best strategy is to not allow remote access at all.  If that is not possible, the next best solution is to allow only monitoring, not execution.  If users must have execute permissions, then access should be restricted to a single access point and all other pathways blocked.  Again, companies should use multi-factor authentication (NCCIC, 2015).

The seventh and final step is to monitor and respond.  By installing Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) and continuously monitoring log files, companies can catch problems before they become security incidents.  Companies should also develop response plans and regularly test those plans (NCCIC, 2015).
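As one small example of “monitor and respond,” the sketch below watches authentication records and raises an alert when one source exceeds a threshold of failures inside a time window.  The window, the threshold, and the record format are assumptions chosen for the illustration.

from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # assumed look-back window
THRESHOLD = 5                   # assumed number of failures that triggers an alert

failures = defaultdict(deque)   # source address -> timestamps of recent failures

def record_failure(src_ip: str, when: datetime) -> bool:
    """Record a failed login and return True if src_ip should trigger an alert."""
    recent = failures[src_ip]
    recent.append(when)
    while recent and when - recent[0] > WINDOW:
        recent.popleft()        # discard failures older than the window
    return len(recent) >= THRESHOLD

if record_failure("10.0.5.23", datetime.now()):
    print("Alert the response team and start the documented response plan.")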

The National Institute of Standards and Technology (NIST) is also an important organization and has produced some unique reports about cybersecurity, specifically the NIST SP 800-82 Revision 2 Guide to Industrial Control Systems (ICS) Security and the Framework for Improving Critical Infrastructure Cybersecurity.  The NIST SP 800-82 R2 guide discusses ICS risk management, ICS security architectures, and how to apply security controls to ICS (NIST, 2015).

NIST’s Risk Management Framework (RMF) is a six-step process consisting of the steps Categorize Information Systems, Select Security Controls, Implement Security Controls, Assess Security Controls, Authorize Information Systems, and Monitor Security Controls.  By implementing the six-step RMF cycle, you can identify vulnerabilities and select controls to mitigate those vulnerabilities based on the company’s priorities (NIST, 2015).
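Because the RMF is a cycle rather than a one-time checklist, it can be pictured as a loop that wraps back to categorization after monitoring, as in the toy sketch below (purely illustrative bookkeeping, not a NIST tool).

from itertools import cycle

RMF_STEPS = [
    "Categorize Information Systems",
    "Select Security Controls",
    "Implement Security Controls",
    "Assess Security Controls",
    "Authorize Information Systems",
    "Monitor Security Controls",
]

rmf = cycle(RMF_STEPS)
for _ in range(8):      # after "Monitor" the cycle wraps back to "Categorize"
    print(next(rmf))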

NIST’s Framework for Improving Critical Infrastructure Cybersecurity is another essential document for improving the security of ICS.  The basic framework consists of the five functions Identify, Protect, Detect, Respond, and Recover (NIST, 2014).  Each function consists of several categories and sub-categories that get more specific and technical.  The best part of the framework is the Informative References, where specific sections of standards, guidelines, and practices are tied to each sub-category.  These references include COBIT, ISA, ISO/IEC 27001, and NIST SP 800-53 R4, along with the specific sections that apply (NIST, 2014).  This is an indispensable guide for ICS auditors.  The framework also has a tiered scale that can help companies understand how developed and mature their risk management practices are.  The four tiers are Partial (Tier 1), Risk-Informed (Tier 2), Repeatable (Tier 3), and Adaptive (Tier 4).  Tier 1 is primarily for companies that have only a limited understanding of risk management and are mostly reactive to problems.  Tier 2 is slightly more mature, where the company has formal processes in place.  At Tier 3, a company is actively using and monitoring its risk management processes and making improvements when needed.  Finally, Tier 4 is when a company has fully mature risk management processes, has learned from its own and others’ mistakes, and continually adapts its processes as the security situation changes (NIST, 2014).

There are some special considerations when working with and auditing Industrial Control Systems.  It is essential to understand that many of these systems were designed and installed before the internet became common.  Often what happened is that, when technology advanced enough, components were “bolted on” to connect the control systems to the internet.  When that happened, systems that were never designed to be internet-connected became utterly vulnerable.  Some problems that became apparent were that most communications were sent in plain text, and some systems were hard-coded with default passwords that could not be changed.  While this may look like a glaring error today, five to ten years ago it was not a problem because these systems were not connected to the internet.

Even today there are issues with network-connected medical devices that have hard-coded passwords.  An alert by ICS-CERT named ICS-ALERT-13-164-01 stated that there was a vulnerability in over 300 medical devices spanning roughly 40 vendors that had hard-coded passwords (ICS-CERT, 2013).  Another issue with industrial control systems is that the hardware specs are often only enough to run the software, leaving no headroom to add encryption.  Encryption and control systems do not mix well (Sample, 2006): encryption often uses more bandwidth and memory than the ICS can provide.  The final issue that comes with auditing control systems is that they often use vulnerable software and protocols (Sample, 2006).  Windows XP stopped being supported several years ago; however, many ICS still use Windows XP and are incapable of upgrading to more secure versions of Windows.

Many of these vulnerabilities can be mitigated by using a proper security architecture that blocks insecure control systems from untrusted networks or the internet.  Port management can be implemented, and any other controls that make it more difficult for hackers to gain access can be added.  Companies can look into upgrading their control systems, and where they cannot, they can look into upgrading network components and software that have better capabilities.

With an understanding of some of the standards and vulnerabilities of ICS, a review can be done of an actual audit.  In February 2017, the NASA Office of Inspector General (IG) conducted an audit of NASA’s critical and supporting infrastructure.  The IG office conducted this audit, as well as 21 other audit reports, because NASA has steadily moved from older, isolated, manually controlled operational technology to more modern technology where systems are controlled over networks (Leatherwood, 2017).  What the IG office found is that NASA still has several deficiencies and significant issues concerning its critical infrastructure.  There are two main concerns in the audit report.  The first is that NASA lacks comprehensive security planning for managing risk to its Operational Technology (OT) systems; examples of these OT systems are HVAC systems, tracking and telemetry systems, and command and control systems.  The second concern is that NASA’s critical infrastructure assessment and protection could benefit from improved OT security (Leatherwood, 2017).

Regarding NASA’s comprehensive security planning for managing risk to its OT systems, there are several issues where NASA could improve.  First, NASA needs to do better at defining what OT systems it has.  There was inconsistency in how OT systems were defined across the different NASA Centers; often a system would be defined as critical infrastructure at one location and not listed at all at another.  There needs to be consistency in how the OT systems are defined (Leatherwood, 2017).

Another finding is that NASA did not follow NIST guidance on how to categorize its OT systems.  NASA failed to make distinctions between OT systems and IT systems, and when it fails to make those distinctions, NASA ends up grouping systems with different security risks into a single group.  This makes it more challenging to make risk assessments and implement the right security controls (Leatherwood, 2017).

Awareness and training is another area in which NASA could improve.  The auditors visited five NASA centers, including NASA Headquarters, and interviewed more than two dozen employees.  The auditors discovered that NASA does not require role-based, OT-specific training; most employees receive IT training, however.  By not including OT training along with the regular IT security training, employees will be less able to identify vulnerable systems.  An example would be a building HVAC system.  If an employee did not recognize the HVAC system as a high-risk system, they might not take the proper steps to prevent the system from being compromised; in this case, the employee could fail to lock or secure a door that leads to the HVAC controls.  If a hacker gained access to the HVAC system, they could shut it off, potentially putting the IT systems at risk (Leatherwood, 2017).

Lastly, NASA had several easily exploitable risks that could be prevented with administrative controls.  For the OT systems, there was a lack of internal monitoring, auditing, and log file management; most of the systems were checked manually by NASA personnel.  There were no controls in place to monitor physical or logical isolation from the main networks.  NASA also used group accounts, which created vulnerabilities in two ways.  The first was that, by using group accounts, there was no way to know who accessed a system; if anything went wrong, there was no way to attribute the action to a single employee.  The second vulnerability was the insider attack: if an employee was fired and the credentials were not changed, that fired employee could still gain access to the OT systems.  Most of these issues can be identified and corrected by implementing the proper security controls from known best practices.  There just needs to be a centralized and controlled effort so that all of the NASA offices are using the same language and playbook (Leatherwood, 2017).

Industrial Control Systems have come a long way since the 1960s.  These systems will continue to evolve and get more complicated as time goes on.  Luckily, there are organizations that provide several excellent documents on how to protect ICS from hackers, and there are plenty of examples of what could go wrong.  Companies just need to use the information that is available and implement it in their networks.  If companies fail to take ICS security seriously, there will eventually be a significant attack on the nation’s critical infrastructure that could put thousands of lives at risk.

References

Swearingen, Michael. Brunasso, Steven. et al.  (September 2013).  What you need to know (and don’t) about the Aurora Vulnerability.  Retrieved from http://www.powermag.com/what-you-need-to-know-and-dont-about-the-aurora-vulnerability/?printmode=1

ICS-CERT. (n.d.).  About the Industrial Control Systems Cyber Emergency Response Team.  Retrieved from https://ics-cert.us-cert.gov/About-Industrial-Control-Systems-Cyber-Emergency-Response-Team

NCCIC.  (December 2015).  Seven Strategies to Defend ICSs.  Retrieved from https://ics-cert.us-cert.gov/sites/default/files/documents/Seven%20Steps%20to%20Effectively%20Defend%20Industrial%20Control%20Systems_S508C.pdf

Newman, Lily Hay.  (September 2017).  Equifax Officially Has No Excuse.  Retrieved from https://www.wired.com/story/equifax-breach-no-excuse/

NIST.  (February 2014).  Framework for Improving Critical Infrastructure Cybersecurity.  Retrieved from https://www.nist.gov/sites/default/files/documents/cyberframework/cybersecurity-framework-021214.pdf

NIST.  (May 2015).  NIST Special Publication 800-82 Revision 2: Guide to Industrial Control Systems (ICS) Security.  Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf

ICS-CERT. (June 2013).  Alert (ICS-ALERT-13-164-01) Medical Devices Hard-Coded Passwords.  Retrieved from https://ics-cert.us-cert.gov/alerts/ICS-ALERT-13-164-01

Sample, James.  (2006).  Challenges of Securing and Auditing Control Systems.  Retrieved from http://www.isacala.org/doc/ISACALA_SCADA_Presentation_FinalJamey.pdf

Leatherwood, James.  (February 2017).  Industrial Control System Security Within NASA’s Critical And Supporting Infrastructure.  Retrieved from https://oig.nasa.gov/audits/reports/FY17/IG-17-011.pdf

February 10, 2021

Business Continuity Planning for Industrial Control Systems

Filed under: Uncategorized — Tags: , , — Ken @ 12:06 am

This is a repost from an older blog that ended up getting the most traffic for some reason. The post is pretty specific and I think most of the traffic was academic related. Anyway, here it is. Let me know what you think.

Business Continuity Planning for Industrial Control Systems

            Anyone involved in Information Security, or enthusiastic about the technology, has probably heard about the 2010 attack called Stuxnet.  Stuxnet was a cyber attack on Iran’s nuclear program and is generally acknowledged to be the world’s first cyber weapon that caused physical damage (Zetter, 2014).  A lesser-known attack, possibly the second cyber attack to cause physical damage, occurred in 2014 in Germany, when hackers attacked the control systems of a German steel mill and disrupted them enough to cause massive physical damage to the plant (Zetter, 2015).  This attack is important because it illustrates the enormous responsibilities that companies take on when they start connecting their control networks to the internet.

Industrial Control Systems (ICS) is a general term for the components that connect the Information Technology (IT) world to the Operational Technology (OT) world.  Today, everyone knows what IT is, but OT is a little less familiar.  OT is made up of the equipment and components that monitor and control physical processes, and ICS is what connects computers and servers to those devices.  An example of an ICS environment is an oil and gas refinery.  Refineries are made up of a significant number of pipes, tubes, and boilers that heat, filter, and extract crude oil and gas until it is refined enough to be used in the chemicals and gasoline that we use in everyday life.  Fifty or sixty years ago, before computers and mainframes became popular, refineries were run by employees who would manually turn dials or manipulate the control systems until the desired outcome was reached.  When mainframe computers started becoming popular, these systems began to be connected to internal networks, and Supervisory Control and Data Acquisition (SCADA) systems were added to monitor and supervise the control processors connected to them (Assante, 2014).  Fast forward a few more years, when personal computers and the internet became more readily available, and companies started to want their systems to be connected to and accessed from the internet (Assante, 2014).

Just like any technology today, control systems have always had design problems and vulnerabilities.  What changed when these systems were connected to the internet was that those vulnerabilities were now exposed to the world, along with all of the common vulnerabilities that come with being connected to the internet.  ICS security is hard, and it is hard because control systems engineers and computer security engineers have two competing priorities.  Control systems engineers are focused on safety and uptime above all; anything that gets in the way of that is a problem for these engineers.  When security professionals want to change things and bolt on security software and appliances that the ICS units were never designed for, there are problems (Antova, 2017).  In the years since the 2010 Stuxnet attack, the industries that depend on industrial control systems have taken notice and are starting to make changes to secure their networks.

In today’s high-demand economy, manufacturers, refineries, and other critical infrastructure industries are running 24 hours a day, seven days a week.  These systems require more than just 99% uptime, a goal most companies strive for; they require 99.9999% uptime.  While the difference might seem insignificant, over the course of a year it is the difference between 3.65 days of allowable downtime and roughly half a minute.  Less than four days of downtime over the course of a year would not be too bad for most businesses, but imagine what would happen at an airport if the air traffic control tower that handles upwards of 1,000 flights per day went dark for that long.  From that example you can understand just how vital it is for companies that need incredibly high availability to have a robust Business Continuity Plan (BCP) and Disaster Recovery Plan (DRP).  Every hour of disruption to their normal business functions could cost hundreds of thousands of dollars or more.  For the purposes of this paper, the scope will focus on oil and gas refineries in the context of BC/DR planning.
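The downtime figures above follow directly from the arithmetic below; the only assumption is a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600

def downtime_per_year(availability_pct: float) -> float:
    """Seconds of downtime permitted per year at the given availability."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)

print(downtime_per_year(99.0) / 86400)   # ~3.65 days of downtime per year
print(downtime_per_year(99.9999))        # ~31.5 seconds of downtime per year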

One of the first steps before starting to plan the BCP or the DRP is to conduct a Business Impact Analysis (BIA).  The BIA should identify all of the primary business functions and list any interruptions that could affect those operations, anything from supply chain issues and power outages to terrorist attacks or sabotage.  One difference between ICS environments and a typical business environment is that when things go wrong in a typical business, the impact is usually limited to that business.  If a critical operational function is affected in an ICS environment, not only is the operational equipment affected but, depending on what that equipment is doing, there may be environmental and public safety issues to consider.  Refineries often have to inform local government and give press releases describing what they are doing, why they are doing it, and how it affects residents.

During the BIA process, the company needs to evaluate two key figures.  The first is the Recovery Time Objective (RTO).  The RTO defines the time required to recover the communication links and processing capabilities (Stouffer, 2015).  If the network goes down while a refinery is running, the refinery will not suddenly stop with it; it will keep running, but things will slowly start degrading from the lack of communication and control.  On a long enough timeline there could be catastrophic consequences, so the RTO is critical to pay attention to and will be used to trigger specific events in the BCP/DRP later.  The second key figure is the Recovery Point Objective (RPO).  The RPO defines the point in time to which data must be recovered after an outage, in effect the maximum amount of data loss that can be tolerated before adverse conditions start occurring (Stouffer, 2015).
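A small sketch of how the two figures get used follows.  The four-hour RTO and one-hour RPO are placeholder values chosen for the example, not figures from any standard.

from datetime import timedelta

RTO = timedelta(hours=4)   # assumed: links and processing must be restored within 4 hours
RPO = timedelta(hours=1)   # assumed: no more than 1 hour of historian data may be lost

def outage_report(downtime: timedelta, data_lost: timedelta) -> dict:
    """Compare an actual outage against the recovery objectives."""
    return {
        "rto_violated": downtime > RTO,
        "rpo_violated": data_lost > RPO,
    }

print(outage_report(timedelta(hours=6), timedelta(minutes=30)))
# {'rto_violated': True, 'rpo_violated': False}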

Once the BIA has been completed and the RTO and RPO are defined, the BCP planning can begin.  The BCP should take into consideration any interruption that affects operations, in whatever form it may come: natural disasters, human mistakes, equipment failures, or even terrorist events.  Each event needs to be evaluated by a team that has a member from each discipline involved.  There may be a control systems engineer for the ICS equipment, an IT engineer for the network equipment, a chemical or petroleum engineer for the oil and chemicals, and possibly even a safety engineer or public relations specialist.  All of these team members should report to the BCP manager, who reports to an information security manager or BCP executive.  Because of the nature of oil refineries, the legal department and compliance officers will often be involved, because the refineries will have to release public information and inform the local, state, and federal governments when required.

BCP plans can and should contain smaller plans based on each worksite or business function.  In the case of ICS networks, they will also cover different units or systems.  Each of these systems works together to complete a particular function, and there are very specific steps that these units must go through when powering down or powering up.  It can often take hours or days to entirely shut down operations on a refinery unit.

In many cases, refineries will not be able to maintain normal business functions, as is the case during inclement weather such as hurricanes, but the goal will be to have the least amount of impact on the business as possible.  To achieve that goal, the refinery units must be shut down and powered up correctly so that there is no damage to the unit equipment.  That also requires companies to work with meteorologists so that they can get the most accurate prediction possible and properly time when they need to shut down operations.

Another subsection of the BCP area is the Continuity of Operations Plan (COOP).  COOP plans primarily pertain to the industries and organizations identified in Presidential Policy Directive 21 (PPD-21).  That policy directive identifies a large number of sectors that are critical for the continued operation of the country and the federal government, including the energy, chemical, communications, dams, defense, and nuclear reactor sectors.  The country relies on these industries every day.  The National Infrastructure Protection Plan outlines how the federal government and private sector will work together to manage risk and become more secure and resilient (NIPP, 2013).  The significant difference between a BCP and a COOP is that a BCP applies primarily to the private sector and businesses, whereas COOP plans are for government organizations and designated private-sector entities (Swanson, 2010).  COOP plans also involve more government oversight than a BCP.

The resources and information available to IT security professionals are diverse.  A well-known source for guides and best practices is the National Institute of Standards and Technology (NIST).  The Special Publication 800 series is focused on cyber and information security.  The three primary guides that can be used to help develop BCPs or COOPs are NIST SP 800-53, Security and Privacy Controls for Federal Information Systems and Organizations; NIST SP 800-82, Guide to Industrial Control Systems (ICS) Security; and NIST SP 800-34, Contingency Planning Guide for Federal Information Systems.  All three guides are excellent for helping to develop business continuity and disaster recovery plans.

The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) is another excellent resource for ICS.  ICS-CERT provides resources and assessments to help identify vulnerabilities in ICS networks.  The organization also releases alerts and advisories that work similarly to the Common Vulnerabilities and Exposures (CVE) list and NIST’s National Vulnerability Database (NVD).  With those resources in mind, we can look at a case study of a cyber attack on the electrical grid in Ukraine.

Just before Christmas, on December 23, 2015, a Ukrainian electrical company reported power outages to its customers.  Over 30 electrical substations went offline for three hours and left over 200,000 customers without power.  In the aftermath of the event and the investigation, it was determined that someone had entered the network and taken the three largest substations down one right after the other, about 30 minutes apart.  After the investigation, the Ukrainian government reported it as a cyber attack.  The Department of Homeland Security (DHS) acknowledged it and issued a formal report on February 25, 2016, listing the event as IR-ALERT-H-16-056-01 (ICS-CERT, 2016).  The power outage only lasted about three hours, but it took a month for the control systems to come 100% back online.  This was a historic event because this cyber attack was the first to cause a loss of power (Bodungen, 2017).  The attack came from malware planted by hackers, known as BlackEnergy and classified as known crimeware.  Two files called Devlist.cim and Config.bak were installed on the operating systems that run the SCADA software; those two files are known to kill critical parts of the operating system.  Once the operating systems were compromised, the ICS devices locked up and the SCADA software stopped working, causing the blackout (Bodungen, 2017).  The vector the hackers took to gain access to the systems is not known, at least not publicly.  However, there is an almost infinite number of ways hackers could enter a system if the computers and other ICS devices are not updated.

While there is not much public information about this attack, there are two things to consider from a BCP perspective.  The first is the initial incident itself.  When a power company loses power, it is bad for both the company and its customers, and if the power company has to compete for customers because there are several companies to choose from, this could have a major impact on its bottom line.

The next issue is related to the architecture of the network and how it is segmented.  It is well known in the IT world that if hackers can gain access to one computer or admin account, they can move laterally through the network and start chipping away at the parts of the network they do not yet have access to.  Eventually, if a hacker works long enough, they can compromise the entire system.

Some of the significant data breaches in the last few years have shown that hackers had access to the networks for months before they started causing damage or were discovered.  In the case of ICS networks, it works the same way but with longer-lasting damage.  What usually happens, if a hacker is attempting to bring down an ICS network, is that they end up making changes to the actual control systems so that they can damage or destroy the equipment.  Sometimes the outcome is immediate, and sometimes it can take days or weeks for the damage to become noticeable.  If hackers had been on the ICS network for some time, they could have already caused lasting damage to the physical equipment before they were discovered or caused noticeable damage.  So, in the aftermath of an attack on an ICS network, the damage could be much more significant than was previously thought.  The entire segmented network will have to be assumed to be compromised and all of the equipment inspected.  Even if only a single pump were damaged, every pump on that network would have to be inspected and potentially replaced, even if it did not show a catastrophic failure.

Since the Stuxnet attack, the field of ICS security has grown.  Unfortunately, the attacks will probably only grow as well.  One unique thing about ICS network attacks is that the adversaries carrying them out are very knowledgeable and know exactly what they are doing.  You will most likely not find a script kiddie attacking an ICS network; the types of attackers that target ICS networks are state-sponsored hackers or insiders with deep knowledge of the networks.  These are the Advanced Persistent Threats (APT) that most security professionals worry about.

The second unique thing about ICS networks is that when damage is done to the control systems, there is physical damage.  Physical damage does not go away like the damage done to computers and their data; there is no way to back up control systems and manufacturing components.  When the damage is done, those units will likely have to be replaced immediately, or at least well before their life expectancy.

It is said that ICS network security is about a decade behind the rest of the IT security world (Antova, 2017).  The Morris Worm, which occurred in 1988, is considered the first malware to cause interruptions on the internet (Kehoe, n.d.), and it was a watershed moment in the history of IT security.  Stuxnet only happened eight years ago, but it will probably be seen later as a watershed moment for ICS network security.

Resources

Zetter, Kim.  (2014).  Countdown To Zero Day.  Broadway Books: New York

Zetter, Kim.  (January 2015).  A Cyberattack Has Caused Confirmed Physical Damage For the Second Time Ever.  Retrieved from https://www.wired.com/2015/01/german-steel-mill-hack-destruction/

Assante, Michael. Conway, Tim.  (August 2014).  An Abbreviated History of Automation & Industrial Controls Systems and Cybersecurity.  Retrieved from https://ics.sans.org/media/An-Abbreviated-History-of-Automation-and-ICS-Cybersecurity.pdf

Lee, Robert. Assante, Michael. Conway, Tim.  (March 2016).  TLP: White.  Analysis of the Cyber Attack on the Ukrainian Power Grid.  Retrieved from https://ics.sans.org/media/E-ISAC_SANS_Ukraine_DUC_5.pdf

ICS-CERT.  (February 2016).  Alert (IR-ALERT-H-16-056-01): Cyber-Attack Against Ukrainian Critical Infrastructure.  Retrieved from https://ics-cert.us-cert.gov/alerts/IR-ALERT-H-16-056-01

Bodungen, Clint. Singer, Bryan. Shbeeb, Aaron. Hilt, Stephen. et al. (2017).  Hacking Exposed: ICS and SCADA Security Secrets & Solutions.  McGraw-Hill Education: New York, NY

Stouffer, Keith. Pillitteri, Victoria. et al. (May 2015).  NIST Special Publication 800-82 Revision 2: Guide to Industrial Control Systems (ICS) Security.  Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf

NIPP. (2013).  NIPP 2013: Partnering for Critical Infrastructure Security and Resilience.  Retrieved from https://www.dhs.gov/sites/default/files/publications/national-infrastructure-protection-plan-2013-508.pdf

Joint Task Force Transformation Initiative.  (April 2013). Security and Privacy Controls for Federal Information Systems and Organizations.  Retrieved from  http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf

Cruz, Tiago. Simoes, Paulo. et al.  (July 2016).  Security implications of SCADA ICS virtualization: survey and future trends.  Retrieved from https://www.researchgate.net/publication/305725280_Security_implications_of_SCADA_ICS_virtualization_survey_and_future_trends

Swanson, Marianne. Bowen, Pauline. et al. (May 2010).  NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems.  Retrieved from http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-34r1.pdf

Antova, Galina.  (August 2017).  Overcoming the Lost Decade of Information Security in ICS Networks.  Retrieved from http://www.securityweek.com/overcoming-lost-decade-information-security-ics-networks

Kehoe, Brendan.  (n.d.).  The Robert Morris Internet Worm.  Retrieved from http://groups.csail.mit.edu/mac/classes/6.805/articles/morris-worm.html
