Leszek Miś: May the data stay with you!

Another video from our EXATEL Security Day 2017 conference in June. This time we give you a speech by Leszek Miś focusing on contemporary techniques of data theft from ICT systems. Since the subject matter is extremely vast, the author concentrated on listing the most significant threats and on indicating critical moments. Despite the high substantive level, the presentation is very intelligible, even for people who do not combat cybercrime on a daily basis. We encourage you to watch it!

May the data stay with you! Network-based data exfiltration techniques

— Leszek Miś —

My presentation is an introduction to exfiltration. Data exfiltration is an extremely broad subject and I would need three days to cover it properly. However, I have only 30 minutes, so I just want to go through the techniques and the elements involved in detecting this type of activity in the network, hence the title of my presentation: “May the data stay with you!”, a play on a famous movie quote.

Something about me… At the moment, I represent two companies. I am the founder of Defensive Security, within which I provide open source defensive security education services. For example in a three-day training, I cover topics on Web security, Linux security and Network security and those are really three topics that I’m focusing on at the moment. Besides that, I’m the head of the security team at Collective Sense, a start-up where we’re building a very cool product where we collect data and then perform machine learning and deep learning analytics. Somewhere in the meantime I managed to obtain OSCP and get quite a few Red Hat certifications, so I specialise in Linux and Security. For the past two years, threat hunting has also been quite an interesting part in my adventure with IT Security.

Here we have a slide on the fact that threats can come from any place. I don’t know if any of you are aware, but yesterday information was released on another critical bug: Stack Clash. It makes NetBSD, OpenBSD and all Linux systems vulnerable to local exploits. It would seem that we can also expect remote exploitation, due to the place where the bug is located. So, it’s really critical. Threats are hiding everywhere: attackers exploit bugs in operating systems, services and applications. Often, because we do not pay enough attention to it, we simply have an incorrect configuration that allows unauthorised access to systems, services and applications. It is only natural that even if we have security solutions implemented within the architecture, it is possible to bypass certain elements. It is possible to bypass firewalls, and of course IDS and IPS can be bypassed too, especially because they are based on signatures. The exploitation is of course followed by the process of so-called “lateral movement”. Data exfiltration is one of the elements of such a process and this is the topic we shall discuss today.

We should pay special attention to the fact that we often (or even all the time) encounter incorrect network architecture, i.e. a network without any segmentation, which is also a threat, because it makes it easier to exploit or move around the network after gaining access. For example, via a client-side bug such as a cross-site scripting (XSS) flaw in an application, an attacker can get inside the LAN and gain access to, say, a desktop. A good example is the framework called BeEF; you should definitely know this kind of tool. Then we have pockets full of bugs, and that’s something we forget about: we focus on the server layer, we focus on the network layer in the organisation, whereas in the end we take that company laptop, whether policy allows it or not, and we plug it into our home network. I’m going to assume that our home network is probably not as secure as the corporate one, and this constitutes a crucial risk. And then there’s social engineering. In fact, this is just a rough list of points that, mixed together, provide attackers with really powerful weapons.

How do we defend ourselves against such attacks? The answer to this question is proactive analysis and assessment of threats, or what we call “threat hunting”. Threat hunting answers all the questions listed on this slide, i.e.: who tried to gain access, how, when, where, at what layer, and with what characteristics. In fact, we can perform such an analysis using many different tools and at many layers. The first, most basic one, which has been relatively popular for a long time, is active analysis of logs and system events, so we try to focus on analysing what actually happens within our logs. For example, in dmesg, which is the kernel buffer in a Linux system, we can find information or annotations about segfaults, that is, memory management problems of a given process. Such an entry will be a trigger to perform further analysis of what is happening in the system. I would like the log analysis to be automatic, i.e. I would like not to dig into these logs every time I change a custom log structure for Apache or any other service. I would like it to be done automatically, e.g. with the help of patterns that we are working on at the moment.
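As a rough illustration of that kind of automatic trigger, here is a minimal Python sketch that looks for segfault entries in kernel log lines and counts them per process. It assumes the usual Linux message layout ("name[pid]: segfault at ... ip ... sp ... error ..."); the sample lines and function names are made up for the example, and this is not the pattern engine mentioned in the talk.

```python
import re
from collections import Counter

# Assumed layout of a Linux kernel segfault message.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def find_segfaults(lines):
    """Return one record per kernel log line that reports a segfault."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            hits.append({
                "process": match.group("proc"),
                "pid": int(match.group("pid")),
                "address": match.group("addr"),
            })
    return hits

def segfault_counts(lines):
    """Count segfaults per process name; repeated crashes deserve a closer look."""
    return Counter(hit["process"] for hit in find_segfaults(lines))

# Hypothetical sample input, e.g. collected from `dmesg` output.
SAMPLE = [
    "[ 1042.113370] nginx[2211]: segfault at 10 ip 00007f3a1c2b4a10 "
    "sp 00007ffd8e2f1a40 error 4 in libc-2.27.so[7f3a1c200000+1e7000]",
    "[ 1050.000001] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "[ 1099.421337] nginx[2240]: segfault at 10 ip 00007f3a1c2b4a10 "
    "sp 00007ffd8e2f1b80 error 4 in libc-2.27.so[7f3a1c200000+1e7000]",
]

if __name__ == "__main__":
    print(segfault_counts(SAMPLE))
```

In practice the lines would come from the log pipeline rather than a hard-coded list, and a repeated crash of the same service would raise an alert for further analysis.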

Another element is analysis of user behaviour, i.e.: for the last month user X has logged in to station Y, usually at 8:00 a.m., and their session usually lasts until 5:00 p.m. Suddenly, a session is commenced at 3:00 a.m. on a Saturday and that user is trying to access a number of different systems. Such a situation can occur, for example, when someone uses smbexec or pass-the-hash, and it is a departure from the usual routine. I think we should be aware of and actively analyse these types of situations. Some time ago the author of Volatility (it’s a RAM analysis framework which is intended for many platforms, but we mainly use it for Linux) said that “If you don’t perform periodic RAM analysis of your critical systems, it means that you are doing threat hunting or active response wrong”. I know that RAM analysis is difficult, especially when you have hundreds or thousands of systems, but it is definitely worth remembering that we have such a method and that such capabilities are available. In fact, it’s not rocket science, it’s just another system command. The most important thing here is the architecture, i.e. the RAM should be dumped not within the tested machine, but e.g. to external storage. We should remember that this storage should be completely separated from the rest of the production data.
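The 8:00 a.m. versus 3:00 a.m. example can be sketched in a few lines of Python. This is only a toy baseline under simple assumptions (a per-user list of past login timestamps, a fixed tolerance of one hour); the function names are hypothetical, and a real system would use richer features and a proper model.

```python
from datetime import datetime

def build_baseline(logins):
    """Summarise past logins as (earliest hour, latest hour, weekdays seen)."""
    hours = [t.hour for t in logins]
    return min(hours), max(hours), {t.weekday() for t in logins}

def is_unusual(login, baseline, slack_hours=1):
    """Flag a login outside the usual hours or on a weekday never seen before."""
    earliest, latest, weekdays = baseline
    off_hours = not (earliest - slack_hours <= login.hour <= latest + slack_hours)
    new_weekday = login.weekday() not in weekdays
    return off_hours or new_weekday

# Hypothetical history: user X logs in around 8:00 a.m., Monday to Friday.
HISTORY = [datetime(2017, 6, day, 8, 0) for day in (5, 6, 7, 8, 9)]
BASELINE = build_baseline(HISTORY)

if __name__ == "__main__":
    saturday_night = datetime(2017, 6, 17, 3, 0)   # Saturday, 3:00 a.m.
    print(is_unusual(saturday_night, BASELINE))
```

The point of the sketch is the shape of the check: learn what is normal for a given user and station, then treat departures from it as a signal rather than as proof.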

Following this pattern, another very important component of threat assessment and analysis is multi-level network traffic analysis. We have several options. To have a complete overview of what’s happening in the network, we should keep these data sources in mind. These data sources constitute different layers. We have Packet_Headers; ideally we would aim for so-called Full_Packet_Capture, but Full_Packet_Capture is expensive to maintain. We are aware of this, of course, so we can, for example, analyse Packet_Headers. We can combine them with Netflows, which is another source of data. Netflow is a kind of billing for network connections. We don’t have the payload, we don’t have the content, but we do know who, what, where, when, from where, the size of the transmission, the timing of it: let’s say, the characteristics. In a perfect scenario this network traffic analysis should be supplemented by passive DNS analysis. No matter how we look at it, DNS analysis along with HTTP protocol analysis is something that we should have had and used for a long time, and it is another source of data. Passive TLS too, of course: even before the final handshake we are able to check what common names are in the certificates, who the issuer is, from when the certificate is valid and when it expires. This kind of information is especially important to analyse if you have your own PKI implemented, because you can very quickly find out which transmission or station uses encrypted connections that are not signed by your CA. All of this can be supplemented with signatures, security feeds and reputation lists.
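To make the passive TLS idea concrete, here is a small Python sketch that checks observed certificate metadata against an organisation's own CA and the validity period. The record layout, the issuer string and the function name are assumptions made for this example; in practice the fields would come from whatever passive TLS sensor is in use.

```python
from datetime import datetime

# Hypothetical issuer name of the organisation's internal CA.
TRUSTED_ISSUERS = {"CN=Example Corp Internal CA"}

def tls_findings(observation, now):
    """Return the reasons why an observed certificate deserves attention."""
    findings = []
    if observation["issuer"] not in TRUSTED_ISSUERS:
        findings.append("issuer-not-ours")
    if not (observation["not_before"] <= now <= observation["not_after"]):
        findings.append("outside-validity")
    return findings

if __name__ == "__main__":
    seen = {
        "client": "10.0.4.17",
        "common_name": "updates.example.net",
        "issuer": "CN=Some Unknown CA",
        "not_before": datetime(2017, 6, 18),
        "not_after": datetime(2017, 9, 18),
    }
    print(tls_findings(seen, datetime(2017, 6, 20)))
```

A station that keeps showing up with "issuer-not-ours" on an internal-only segment is exactly the kind of signal worth combining with the other data sources.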

In our project, we also use the SNMP protocol very heavily and quite aggressively to actively query servers for CPU and I/O usage. It can be very nicely mixed with the elements that you can see here to detect so-called chain attacks. This is something we are working on at the moment. Periodic port scanning is an extremely important matter. First of all, we should start with the analysis of public IP addresses, but what is happening inside the network is also important: thanks to this we can quickly detect that a bind shell is listening on a server today and was not there yesterday. GeoIP, of course, and Whois_records analysis is a very important thing, especially in terms of mixing Packet_Headers with Netflows and with Passive_DNS, where we can check whether a domain was registered, for example, yesterday or the day before yesterday. A relatively fresh domain is a kind of signal that somebody may be trying either to phish us or to use this domain for some other purpose. We can throw these signal sources into one database, which allows us to perform machine learning, that is, low-level analysis and correlation of both the behaviour and the characteristics we observe. We can of course use a Supervised Model or an Unsupervised Model and detect anomalies, either not relying on signatures at all or, quite to the contrary, relying on signatures, but only as one small, specific signal source. That is my view, and I think it is worth using them. Of course, it is said that signatures are unreliable, but remember that they are still a data source. If we have, say, 30 signals, then a signature as one of them is perfectly advisable and I see no reason why we wouldn’t use it. However, this approach should not, of course, be trusted 100%.
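Two of the signals mentioned here, a listener that was not there yesterday and a freshly registered domain, are easy to sketch in Python. The data shapes (a mapping of host to open ports, a WHOIS creation date already parsed into a date) and the seven-day threshold are assumptions for illustration only.

```python
from datetime import date

def new_listeners(yesterday, today):
    """Compare two scan results (host -> set of open ports) and report new ports."""
    found = {}
    for host, ports in today.items():
        added = ports - yesterday.get(host, set())
        if added:
            found[host] = sorted(added)
    return found

def is_freshly_registered(created, today, max_age_days=7):
    """True when the WHOIS creation date is at most max_age_days old."""
    return 0 <= (today - created).days <= max_age_days

if __name__ == "__main__":
    scan_yesterday = {"10.0.0.5": {22, 443}}
    scan_today = {"10.0.0.5": {22, 443, 4444}}
    print(new_listeners(scan_yesterday, scan_today))   # port 4444 appeared overnight
    print(is_freshly_registered(date(2017, 6, 19), date(2017, 6, 20)))
```

Neither check is conclusive on its own; each is one more signal to store next to the packet headers, flows and passive DNS data.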

Moving on to the point: data exfiltration, an element of the so-called post-exploitation process. Attackers tend to tunnel traffic and do it in different ways. They do this by pivoting, forwarding traffic, or adding static routes within the attacked station, for instance using Metasploit or Meterpreter; I’m sure most of you have heard of those tools. Attackers often choose to rootkit and backdoor systems and services. An example of rootkitting would be an additional Apache module for some web or application server, which modifies HTTP requests and especially HTTP responses on the fly. These kinds of things occur, and we should keep them in mind. The most popular methods and protocols used for data exfiltration are DNS, ICMP, TCP/UDP, and others. The most difficult to detect are those based on cloud solutions, i.e. Gmail, Slack, Twitter. I will show you an example which should work and which uses the cloud environment to extract data from the infected station, the one that the attacker has hacked, and send it somewhere outside.

When it comes to DNS data exfiltration, there are a few important things I’d like to point out. First of all, we should define the internal DNS servers that are in use, and only those on the whitelist should be used to provide services to our clients, or our stations within the network. So let’s analyse whether this is the case; let’s check that none of the stations in our network is using a DNS server that is in China or any other place. Let’s see if any of our clients is using this type of server. Are you familiar with these domains? Of course you are. We have “.onion” and “.exit”, these are typical TOR domains. Let’s perform passive DNS analysis with regard to the use of such domains. And the “Whois” records I mentioned earlier. That’s one thing. The second thing regarding DNS is tracking queries in relation to the specific record types that the clients are querying. That is, a desktop station should not be querying for MX records, because why should it? Nor should a desktop station query for TXT records, although this is more likely, but still rare. Let’s check how many TXT record queries there are within your network. These are further departures from the routine, based on DNS alone. We have plenty of tools. It can be a custom implementation in the malware, but we can also try out the following: Dnscat, Dns2tcp. Those are cool tools that allow you to set up DNS tunnels quickly.
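The three DNS checks described above (approved resolvers only, rare record types from desktops, TOR-style names) can be expressed as a small Python sketch over passive DNS events. The event fields, the resolver address and the list of "rare" query types are assumptions chosen for the example, not a recommended policy.

```python
# Hypothetical allow-list of internal resolvers and record types a desktop rarely needs.
ALLOWED_RESOLVERS = {"10.0.0.53"}
RARE_DESKTOP_QTYPES = {"MX", "TXT"}
TOR_SUFFIXES = (".onion", ".exit")

def dns_findings(event):
    """Return the reasons why a single DNS query event looks out of routine."""
    findings = []
    if event["resolver"] not in ALLOWED_RESOLVERS:
        findings.append("unapproved-resolver")
    if event["qtype"].upper() in RARE_DESKTOP_QTYPES:
        findings.append("rare-qtype")
    name = event["qname"].rstrip(".").lower()
    if name.endswith(TOR_SUFFIXES):
        findings.append("tor-domain")
    return findings

if __name__ == "__main__":
    event = {
        "client": "10.0.4.17",
        "resolver": "203.0.113.8",
        "qtype": "TXT",
        "qname": "a1b2c3.tunnel.example.org.",
    }
    print(dns_findings(event))
```

Counting such findings per client over time gives a simple answer to the question of how many TXT or MX queries the desktops in a network actually send.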

Let’s discuss the left screen. I’ve run Dnscat on a server located somewhere in Asia, and here is the command, along with the key which will be used to encrypt or authenticate this connection. On the attacked host (which is infected with malware) we will try to enter a command where we change the IP from XXX to [...]. Here at the bottom, we’ll launch a local sniffer. This could of course be a SPAN port, however, we do not have SPAN ports here, so we will do this locally on the interface. We listen on port 53, the DNS port, and launch the client. Notice what is happening: at this point, this client station is using TXT, CNAME and MX records (of course we can restrict this) and the station is already behaving strangely. This is already an unusual approach, unusual behaviour and unusual characteristics. The question is, will the solutions that you have in place, that is DLP, IDS and IPS, detect this type of activity at this point? Will you be informed of such an event or not?

I wanted to show the shell, because at this point we’re actually using DNS to access the console within this server, but that’s not necessary. Below, I hope you noticed, we have MX records and a lot of them, actually. This type of situation should be analysed carefully.

Then we have another handy tool called DET. More about it in a minute. It should also be remembered that by using a regular built-in operating system command such as nslookup or dig, we are also able to send DNS requests to a DNS server which we, as the attacker, control. In the logs of this server we will then see the data, which we can group, concatenate and, for example, use to reassemble a full file.
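From an analyst's point of view, the "group and concatenate" step also shows how to reconstruct what left the network from DNS query logs. The Python sketch below works purely in memory and assumes a made-up naming scheme (a sequence number label, then a hex-encoded chunk, then the zone); real tools use their own encodings, so treat the format as hypothetical.

```python
def to_query_names(data, zone, chunk_size=30):
    """Show what chunked data looks like as query names (hex label of at most 60 chars)."""
    hex_data = data.hex()
    step = chunk_size * 2  # two hex characters per byte
    chunks = [hex_data[i:i + step] for i in range(0, len(hex_data), step)]
    return [f"{seq:04d}.{chunk}.{zone}" for seq, chunk in enumerate(chunks)]

def reassemble(query_names, zone):
    """Rebuild the original bytes from logged query names, ignoring unrelated queries."""
    suffix = "." + zone
    parts = {}
    for name in query_names:
        name = name.rstrip(".").lower()
        if not name.endswith(suffix):
            continue
        seq, _, chunk = name[:-len(suffix)].partition(".")
        if seq.isdigit() and chunk:
            parts[int(seq)] = chunk
    return bytes.fromhex("".join(parts[i] for i in sorted(parts)))

if __name__ == "__main__":
    secret = b"user,password\nalice,hunter2\n" * 3
    logged = to_query_names(secret, "exfil.example")
    logged = list(reversed(logged)) + ["www.example.com"]   # out of order, with noise
    print(reassemble(logged, "exfil.example") == secret)
```

The same reconstruction tells an incident responder exactly which file was taken, provided the DNS queries were captured in the first place.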

Another channel is the ICMP protocol, which most people associate with the “ping” command. However, there is a question: how many of you analyse ICMP and its usage on the fly, who uses it, from where and in what data volume? There are plenty of tools and one of them, as presented here, is ICMP_exif, but you can also use the DET I just mentioned, and we’ll try to see that live as well. DET is a toolkit for performing data exfiltration over different types of transmissions. We launch it and it is listening, or should be listening at least. We can also run DET from the attacked station. For example, this command sends a database file into the ether using DNS. Please note what strange domains are used here. The question is, do your systems detect very long domains in use? This also is unusual. Keep in mind that defense in depth is all about paying attention to the smallest detail and focusing on it until you understand it perfectly. Therefore, you should understand DNS, ICMP, etc. perfectly. Then let’s see how easy it is to perform data exfiltration based on ICMP. All you need to do here is change “-p dns” to “icmp” and launch it over ICMP. We won’t be logging in. Why doesn’t it work? It is because you need root privileges to open a raw ICMP socket, which is why we get “permission denied” from DET. But that’s not important; you can imagine there are a lot of ICMP Echo Request and Echo Reply packets flying around here, which we use to send data over ICMP.
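One simple way to answer the "very long domains" question is to look at label length, total name length and the randomness of the leftmost label. The Python sketch below does that; the thresholds are arbitrary assumptions picked for illustration and would need tuning against real traffic.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_like_tunnel(qname, max_label=40, max_name=100, max_entropy=3.8):
    """Heuristic: very long labels or names, or a long high-entropy first label."""
    name = qname.rstrip(".").lower()
    labels = name.split(".")
    longest = max(len(label) for label in labels)
    first = labels[0]
    random_looking = len(first) >= 20 and shannon_entropy(first) > max_entropy
    return longest > max_label or len(name) > max_name or random_looking

if __name__ == "__main__":
    print(looks_like_tunnel("www.example.com"))
    print(looks_like_tunnel("4d5a90000300000004000000ffff0000b800000000000000.t.example.org"))
```

The same idea carries over to ICMP: instead of name length, one would track payload size and the volume of Echo Request and Echo Reply traffic per station.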

Regarding TCP/UDP, you need to answer the following question: why does station X connect to a public IP address on a “rare” port? My approach is that on the network I manage I would like (if that is possible, of course) to limit outbound traffic to ports 80 and 443 only. Only these. If any station tries to access port 54321 on UDP or 12345 on TCP, then something is wrong, and this is probably a signal that the station is infected with malware or something else is going on. The same applies to basic TOR detection. Of course, public exit node lists are available on the Internet; we can update them on a regular basis and have up-to-date information about which station within our network connects to TOR.
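A minimal Python sketch of this egress policy might look as follows. The internal range, the flow record layout and the set of TOR node addresses are assumptions for the example; in practice the node list would be refreshed regularly from a public source and the flows would come from NetFlow or firewall logs.

```python
import ipaddress

# Hypothetical internal address space and the only outbound ports we allow.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
ALLOWED_PORTS = {80, 443}

def is_internal(addr):
    """True when the address belongs to one of our internal networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)

def egress_findings(flow, tor_nodes):
    """Return the reasons why an outbound flow breaks the egress policy."""
    findings = []
    if is_internal(flow["dst"]):
        return findings  # not outbound traffic, out of scope here
    if flow["dst_port"] not in ALLOWED_PORTS:
        findings.append("rare-port")
    if flow["dst"] in tor_nodes:
        findings.append("tor-node")
    return findings

if __name__ == "__main__":
    flow = {"src": "10.1.2.3", "dst": "198.51.100.7", "dst_port": 12345, "proto": "tcp"}
    print(egress_findings(flow, tor_nodes={"198.51.100.7"}))
```

Enforcing the same rule on the firewall is better than only alerting on it, but the alert is still useful for finding out which station tried.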

HTTP is another crucial protocol, very popular in fact, and here it is not so easy. It is nice that in HTTP, a plaintext protocol, you can see everything that happens within it, but malware and all kinds of services use HTTPS. What is the solution here? In fact, there is only one: to have a PKI within your network and a cluster of forward SSL proxy servers that terminate SSL and make it possible to access the content or payload sent this way. This way we will also gain information about specific queries, payloads sent within such connections, and some rare queries, e.g. symptoms of open redirect attacks, which still happen and are quite often used e.g. in drive-by download attacks. There are plenty of tools, and one of the most interesting ones is XSS Shell/XSS Tunnel, which means that based on an XSS vulnerability we can exfiltrate data as well. So if any of you asks whether it’s worth analysing cookies and whether we should do it, the answer is yes, all the more so because cookies are a critical asset: that’s where our sessions and user sessions are. Cookies can also contain critical data, they can be used to send this critical data out, and they can be used to deliver a malicious payload such as an SQL injection in a cookie, and similar operations.

There are other exfiltration techniques based on PowerShell, which is why it’s very important to analyse PowerShell logs and events (PowerShell from version 4.0 probably gives the full audit option). There are exfiltration techniques based on NAT using Meterpreter, and of course on standard SSH with Local Port Forwarding or Remote Port Forwarding, which is very specific and also used by administrators. Examples could be RDP over SSH or HTTP over SMB (SMB being the Windows file-sharing protocol), and any other, like HTTPS on some high, rare port, which also shouldn’t really happen within your network.

When it comes to cloud services, we have Gmail, Slack, Twitter. Here a concrete example is presented, where we exfiltrate the database.dump file. This exfiltration is performed by first encrypting the file we want to exfiltrate, then sending it to an actual Gmail account which I created for myself, and then retrieving it from this account on the attacker’s station, which we can see here on the left. So, in reality, it is nothing more than copying the encrypted data into a Gmail account and downloading it from that account. The data stored in that account cannot be read directly; we need to use a tool like DET to do so.

I am aware that this is a rather incomplete summary; this topic is extremely broad. Nevertheless, I should underline what is important: to connect these elements into a kind of active protection, so that systems such as machine learning can react with an action based on the data you have. So, at 3:00 a.m., when we’re sound asleep, for critical systems this module would perform an action such as dropping traffic on firewalls or logging out all users from a given system. This is actually the grsecurity approach. It’s a nice patch for the Linux kernel with an active protection module, so-called active response, and it means that if the kernel detects some kind of strange business going on, let’s say with kernel memory (we are talking about some kind of exploit), it can perform an action in the form of logging all users out of this system, or it can panic the system altogether. I think this kind of active protection module, based on deep learning and machine learning, is important. The same goes for system and device updates, of course. I am also keen on monitoring domains for typosquatting, bitsquatting and Punycode, which has been a big deal lately. There are quite a few tools that you can plug into your SIEMs, for your SOCs, that will analyse domains for, e.g., confusingly similar domains registered recently. This is also a very valuable source of information, because you will be able to get information about potential phishing attacks faster and you will be just a bit ahead of the attackers.
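To show what such domain monitoring checks for, here is a short Python sketch that generates simple typosquatting and bitsquatting variants of a brand label and flags Punycode names. It is a simplified illustration, not one of the tools mentioned above: it covers only omitted, doubled and swapped characters plus single-bit flips, and the brand "example" is a placeholder.

```python
import string

HOST_CHARS = set(string.ascii_lowercase + string.digits + "-")

def bitsquats(label):
    """Labels that differ from the original by one flipped bit in one character."""
    out = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in HOST_CHARS:
                out.add(label[:i] + flipped + label[i + 1:])
    return out

def typos(label):
    """Labels with one omitted, doubled or swapped character."""
    out = set()
    for i in range(len(label)):
        out.add(label[:i] + label[i + 1:])           # omitted character
        out.add(label[:i] + label[i] + label[i:])    # doubled character
    for i in range(len(label) - 1):
        out.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])  # swapped neighbours
    out.discard(label)
    return out

def has_punycode(domain):
    """True when any label is Punycode-encoded (starts with 'xn--')."""
    return any(part.startswith("xn--") for part in domain.lower().split("."))

def lookalike_reasons(domain, brand):
    """Explain why a newly seen domain may imitate the given brand label."""
    label = domain.lower().rstrip(".").split(".")[0]
    reasons = []
    if label in typos(brand):
        reasons.append("typosquat")
    if label in bitsquats(brand):
        reasons.append("bitsquat")
    if has_punycode(domain):
        reasons.append("punycode")
    return reasons

if __name__ == "__main__":
    for candidate in ("exmple.com", "dxample.com", "xn--exmple-cua.com", "example.com"):
        print(candidate, lookalike_reasons(candidate, "example"))
```

Fed with newly registered domains, a check like this gives an early warning before a phishing campaign that abuses the lookalike name starts.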

What will happen to this data next? Several things. It can be sold, of course. One of the first Google results: from some seemingly highly regarded police forum, where only officers of a certain rank have access, 700 thousand user records were stolen and are now for sale. Be that as it may, this is great, valuable data. Of course, data can be held to ransom: ransomware is a typical example. Data can be used by intelligence services, or exploits can be made public, which has been talked about quite a lot recently. Besides, just type in “pastebin database dump” and you will see data from Instagram or LinkedIn; all this is available online. How did this data get stolen? We should perform tests and check whether the Red Teams or Blue Teams within your infrastructures and organisations are able to detect the issue, and whether the solutions you have implemented are able to detect these kinds of issues.

Leszek Miś