Why Small Businesses Need to Exert Serious Efforts against Data Scraping

Data is the language of today’s business.

With the ubiquity of data connections, the Internet of Things, connected workflows, and social networks, small and medium businesses are increasingly capable of collecting data from customers, operations and the Internet at large. The growing ease with which data can be acquired means these businesses also need to ensure its security and integrity.

With rich data under an organization’s care, there is always the danger of malicious entities that intend to scrape its deep web assets. These can include hackers who want to sell your data or hold it hostage. They can also include competitors looking to take a deeper look into your organization, customers or products.

Why Does Data Scraping Matter?

A 2015 study by IBM and the Ponemon Institute found that the business cost incurred from each lost record averages $154, up 6 percent from the previous year’s $145. That cost grows in proportion to the number of records an enterprise holds. Small businesses, meanwhile, risk losing far more in terms of customer trust.

The key trends in data leak prevention (DLP) today have shifted from keeping track of data flows and network resources towards ensuring encryption on data objects themselves, says Charles Foley, CEO of Watchful Software. “Between 2016 and 2020, DLP technologies will admit that they can’t block the flow of information and as a result they will disregard attempts to stop/block transmission,” he says on Digital Guardian. “Instead, they’ll employ an increasingly powerful schema for encryption tied to authorization and credentials for use.”

Cyber security expert and author Joseph Steinberg adds that due to the prevalence of the cloud and distributed systems, it’s becoming more and more difficult to mitigate risks from potential leaks arising from data exchange. “[Infrastructure] will need to be improved or supplemented in order to address the risk that emanates primarily from employee and customer personal accounts used on personal devices rather than from corporate controlled systems,” he shares.

Scraping comes in many forms, although the common denominator is that bots crawl and parse a website or database in order to collect data. It can be as simple as content scraping, in which the content on your consumer-facing website is reposted elsewhere. It can go deeper, however, and bots can scrape your database in some form, either through brute force (different combinations of queries), or by finding loopholes and security flaws.

Any of these could lead to serious repercussions. For example, a competitor could use ingenious methods to extract pricing data on your products. While any potential customer can do this on an individual basis, a smartly engineered bot can extract the data in its entirety, bit by bit.
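One way to tell an individual customer from a bot that is extracting a whole catalog is to track how much of the catalog each client has viewed. The following is a minimal sketch in Python, not a production detector. The `CoverageMonitor` class, the client identifiers and the 50 percent threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class CoverageMonitor:
    """Flags clients that have viewed an unusually large share of the catalog."""

    def __init__(self, catalog_size: int, threshold: float = 0.5) -> None:
        self.catalog_size = catalog_size
        self.threshold = threshold  # hypothetical cutoff, tune for your own traffic
        self._seen = defaultdict(set)  # client id -> distinct product ids viewed

    def record(self, client_id: str, product_id: int) -> bool:
        """Record one product-page view; return True if the client looks like a scraper."""
        self._seen[client_id].add(product_id)
        coverage = len(self._seen[client_id]) / self.catalog_size
        return coverage >= self.threshold


monitor = CoverageMonitor(catalog_size=10, threshold=0.5)

# A shopper who keeps returning to the same two products stays under the threshold.
shopper_flags = [monitor.record("shopper", pid) for pid in (3, 7, 3, 7, 3)]

# A bot walking the catalog in order crosses it on its fifth distinct product.
bot_flags = [monitor.record("bot", pid) for pid in range(5)]

print(shopper_flags)
print(bot_flags)
```

In practice, a client identifier would have to combine signals such as IP address, session and device fingerprint, since a determined scraper can rotate any single one of them.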

How to Deal With Data Scraping?

Hardening one’s infrastructure against the possibility of such data extraction should therefore be the priority. However, simply encrypting data might not always be the most effective method, especially if that data is accessible on the clearnet through customer-facing interfaces.

One possible solution is to filter and block potential scrapers at several levels, which prevents them from even reaching your front end. Incapsula’s Nabeel Hasan Saed describes a four-step solution for blocking potentially harmful scrapers. He stresses the importance of blocking harmful bots while still providing adequate access for those that are actually helpful, such as Google search crawlers. The four steps are using analytics, taking a challenge-based approach, watching out for bot behavior and shielding your site from scrapers through robots.txt.
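To make the robots.txt step concrete, here is a small sketch that uses Python’s standard `urllib.robotparser` module to show how a well-behaved crawler would interpret a set of rules. The rules themselves, the `/pricing/` and `/api/` paths and the `ExampleScraper` name are assumptions made up for this example, not recommendations from the article cited above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: let Google's crawler in everywhere, keep all other
# bots away from pricing pages and the API, and ask them to slow down.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /pricing/
Disallow: /api/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

googlebot_ok = parser.can_fetch("Googlebot", "/pricing/widget-42")
other_pricing_ok = parser.can_fetch("ExampleScraper", "/pricing/widget-42")
other_about_ok = parser.can_fetch("ExampleScraper", "/about")
other_delay = parser.crawl_delay("ExampleScraper")

print(googlebot_ok, other_pricing_ok, other_about_ok, other_delay)
```

Keep in mind that robots.txt is purely advisory. Legitimate crawlers honor it, but a malicious scraper can simply ignore it, which is why it is only one of the four steps and not a defense on its own.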

Incapsula’s own solutions can also define your network topology through reverse proxying, so that scraping (and other attacks) can be blocked at the edge. While the main intent of such services is to prevent overloading and network outages, an added benefit is that a reverse proxy can also secure infrastructure by acting as the middleman for traffic, filtering out potentially harmful bots while letting legitimate traffic through.
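The filtering decision a reverse proxy makes for each request can be sketched as a simple function: let known helpful crawlers through, and block any other client that sends too many requests in a short window. This is an illustrative Python sketch only and does not describe how Incapsula or any other commercial service works internally. The `EdgeFilter` class, its limits and the allowlist are hypothetical.

```python
from collections import defaultdict, deque


class EdgeFilter:
    """Sliding-window request filter of the kind a reverse proxy might apply."""

    def __init__(self, max_requests=5, window_seconds=10.0,
                 allowed_agents=("googlebot",)):
        self.max_requests = max_requests
        self.window = window_seconds
        self.allowed_agents = allowed_agents  # hypothetical allowlist
        self._hits = defaultdict(deque)  # client ip -> timestamps of recent requests

    def allow(self, client_ip: str, user_agent: str, now: float) -> bool:
        """Return True to forward the request to the origin, False to block it."""
        # Assumption: the user agent is trusted here. Real deployments should
        # verify crawlers (for example by reverse DNS), since it is easy to spoof.
        if any(token in user_agent.lower() for token in self.allowed_agents):
            return True

        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


edge = EdgeFilter(max_requests=3, window_seconds=10.0)

# Four rapid requests from one address: the fourth is blocked.
burst = [edge.allow("203.0.113.7", "ExampleScraper/1.0", t) for t in (0.0, 1.0, 2.0, 3.0)]

# Once the oldest request ages out of the window, traffic is allowed again.
later = edge.allow("203.0.113.7", "ExampleScraper/1.0", 10.0)

print(burst, later)
```

A real edge service would layer this with the other signals mentioned above, such as challenges and behavioral analysis, because rate limits alone are easy to evade with a pool of addresses.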

On the Need to Be Proactive

Juniper Research estimates that cybercrime will cost organizations $2.1 trillion by 2019, and that this cost will come from attacks and data breaches perpetrated or orchestrated by organized cybercrime groups and state-sponsored hackers. Increased enterprise mobility exposes even more endpoints to potential attacks. And it is becoming more and more profitable for cybercriminals to hold user data for ransom, among other shady business models.

Given these potential risks, the key takeaway is that no organization should passively wait for an attack to occur and only then react. Rather, protecting one’s deep web assets requires proactive measures, including hardening one’s infrastructure, filtering out potentially malicious network traffic, and establishing policies and procedures for ensuring data integrity.
