The Future Looks Bright With Our First Post-COVID In-Person Workshop

‘How to master web data collection’ was an exciting learning experience that gave participants up-to-date information on data collection technology, as well as business insights. It was our first live event in over a year and a half, and the energy at The Brain Embassy in Tel Aviv was off the charts!

DISCLAIMER: While we use reasonable efforts to furnish accurate and up-to-date information about COVID-19, we do not warrant that any information contained in or made available through this website is accurate, complete, reliable, current, or error-free. We assume no liability or responsibility for any errors or omissions in the content of this website, and we do not claim to be medical or professional experts. For medical advice, please contact a doctor and only view online information from reliable sources such as the Israeli Ministry of Health, the CDC, or another licensed professional institution.

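In this post I will cover:

  • A quick recap of the event
  • Why data has become the new water, a vital resource that gives life
  • Companies are tapping into the world’s largest database: The internet
  • The future of technology is community-based
  • What it takes to become a master web data collector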

A quick recap of the event

Professionals in Tel Aviv were eager to ‘get their hands dirty’ at our first face-to-face workshop at The Brain Embassy. From engineers and data scientists to architects and developers, the audience was pumped to learn the latest data collection techniques. Here is a short overview of the topics covered. We will go into more depth on some of them for those of you who were not able to attend the event but would still like to derive value from it:

Data landscape overview: Or Lenchner, CEO of Bright Data, gave an overview of where our data-driven economy currently stands, how companies are leveraging data for profit, and how Apple’s AirTags and Amazon’s Sidewalk initiatives will shape the future of communal digital networks.

Modern data collection challenges and how to overcome them: Itamar Abramovich, Web Scraper IDE Product Manager, took a deep dive into key data collection challenges, data crawling infrastructure, the ever-changing structure of web data, how to scale operations quickly, and the solutions currently available to business-side consumers.

Hands-on practice: The third and most exciting part of the event was when Bright Data developers and product managers worked with each participant individually, helping them troubleshoot errors and blocking issues so that everyone could walk away with their very own dataset.

Why data has become the new water, a vital resource that gives life

Or drew on two recent industry surveys conducted by Vanson Bourne:

One: “ESG factors now at the heart of investment sector decision-making”

Two: “The growing importance of alternative data”

Or highlighted that our economy is currently in ‘accelerated data-driven mode’, as can be observed from these cross-sector statistics showing unprecedented levels of data consumption by business entities:

[Figure: “Data-driven economy: accelerated”. Now more than ever, businesses in every industry are consuming data at an unprecedented pace. The graphic also shows a worldwide Big Data market size revenue forecast.]

  • IT: 54% of enterprise IT leaders expressed the need for larger-scale data collection.
  • Finance: 95% of finance organizations rely on outside information.
  • ESG: finance institutions attested that 76% of their organizations’ investment decisions are impacted by ESG factors.

Image source: Bright Data

Companies are tapping into the world’s largest database: The internet

By using SaaS platforms to collect unstructured data and transform it into structured data that they can immediately use within their systems, forward-looking companies are gaining significant market advantages.

Some of the key areas of our economy which are benefiting from this new approach to open-source data include:

eCommerce: Performing brand protection, dynamic price comparison, competitive market research, and more.

Travel: Aggregating real-time offers from online travel agencies (OTAs) and creating packages based on real-time consumer trends.

Finance: Collecting alternative data that indicates market trends as they happen. One example is the hedge funds that collected social sentiment data from Reddit during ‘The Big Short Squeeze’ and were able to steer their portfolios into profitable territory.

Cybersecurity: From trust relationships and third-party risk management to blocking ransomware, malware, and phishing attempts, global IP networks are enabling red teams to simulate attacks and prepare systems for unforeseen crises such as the Colonial Pipeline ransomware attack.

Marketing: Performing ad protection to ensure that marketing spend goes to campaigns that actually reach target audiences with the correct messages, in their native language, while using geolocation routing and targeting.

The future of technology is community-based

Or also took this opportunity to talk about the explosion of recently released community-based, peer-to-peer networks. These include initiatives from two of the world’s largest tech behemoths:

Amazon, which recently released Amazon Sidewalk, essentially creating a ‘global neighborhood’ of devices in order to mutually improve network stability, third-party vendor services, and device reach.

Apple, which recently released AirTags to help device owners use each other’s Bluetooth-enabled devices to find everyday items like keys.

These initiatives, which have clear advantages for the consumer communities they bring together, simultaneously raise questions about consent, device ownership, and users’ rights to opt in to and out of networks on an informed and mutually beneficial basis.

What it takes to become a master web data collector

Mr. Abramovich jumped right into discussing the key hurdles to data collection today, including:

  • An ever-changing data structure
  • The challenges of scaling data collection operations
  • The increasingly sophisticated blocking mechanisms hindering access
  • The fact that collecting the information required for robust systems is extremely resource-heavy and labor-intensive

Itamar went on to describe the different layers of infrastructure that allow businesses to collect data online:

One: Interacting and parsing. This layer typically runs on third-party servers that act as external infrastructure for your data collection efforts (for example, Google Cloud, Amazon AWS, or Azure).

Two: Access and scaling. This layer comprises a global proxy infrastructure connecting all kinds of IP addresses and exit points (nodes) in every country in the world.

Three: Unlocking and data collection. This layer includes automatic IP rotation, CAPTCHA unlocking, protocol manipulation, and the management of headers, digital fingerprints, and user agents.
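To make the idea of IP and user-agent rotation more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Bright Data's products work internally: the proxy addresses, the user-agent strings, and the `RotatingRequestFactory` name are all hypothetical, and the code only prepares requests without sending anything over the network.

```python
from itertools import cycle
from urllib.request import Request

# Hypothetical values for illustration only (203.0.113.0/24 is a documentation range).
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


class RotatingRequestFactory:
    """Prepares requests that cycle through proxies and user agents (hypothetical helper)."""

    def __init__(self, proxies, user_agents):
        # cycle() repeats each list endlessly, giving simple round-robin rotation.
        self._proxies = cycle(proxies)
        self._user_agents = cycle(user_agents)

    def build(self, url):
        proxy = next(self._proxies)
        request = Request(
            url,
            headers={
                "User-Agent": next(self._user_agents),
                "Accept-Language": "en-US,en;q=0.9",
            },
        )
        # set_proxy() only records the proxy on the request object;
        # nothing is sent until the request is passed to urlopen().
        request.set_proxy(proxy, "http")
        return proxy, request


factory = RotatingRequestFactory(PROXIES, USER_AGENTS)
for _ in range(3):
    proxy, request = factory.build("http://example.com/")
    print(proxy, "|", request.get_header("User-agent"))
```

Round-robin is the simplest possible rotation policy. A production setup would also need to handle failed proxies, CAPTCHAs, and fingerprinting, which is exactly the work this third layer is meant to automate.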

Once the roadblocks and data collection infrastructure options were clear, Mr. Abramovich gave a live demonstration of Bright Data’s:

Web Unlocker: Automates IP priming, cookie management, and IP selection, with a stated success rate of 100%.

Web Scraper IDE: A zero-code, zero-infrastructure, customizable solution that automatically delivers datasets to teams and algorithms.

The truth of the matter, however, is that nothing beats a live demonstration where you can get free, easy-to-implement tips from data experts. That is why we have decided to publish all Bright Data webinars on YouTube.

More from Bright Data


Get immediately structured data

Access reliable public web data for any use case. The datasets can be downloaded or delivered in a variety of formats. Subscribe to get fresh records of your preferred dataset based on a pre-defined schedule.


Build reliable web scrapers. Fast.

Build scrapers in a cloud environment with code templates and functions that speed up development. This solution is based on Bright Data's Web Unlocker and proxy infrastructure, making it easy to scale and avoid getting blocked.


Implement an automated unlocking solution

Boost the unblocking process with fingerprint management, CAPTCHA-solving, and IP rotation. Any scraper, written in any language, can integrate it via a regular proxy interface.
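As a rough idea of what integrating through "a regular proxy interface" can look like, here is a minimal Python sketch using only the standard library. The host, port, and credentials below are placeholders that I made up for illustration; they are not Bright Data's actual endpoint or credential format, so check the product documentation for the real values.

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, build_opener


def make_proxy_opener(host, port, username, password):
    """Return an opener that routes HTTP(S) traffic through one proxy (hypothetical helper)."""
    # Percent-encode credentials so special characters do not break the URL.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{credentials}@{host}:{port}"
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler), proxy_url


# Placeholder endpoint and credentials, assumed for this example only.
opener, proxy_url = make_proxy_opener("proxy.example.com", 8080, "USERNAME", "PASSWORD")

# With real values in place, a request would be sent through the proxy like this:
# html = opener.open("https://example.com", timeout=30).read()
```

Because the unlocking logic sits behind the proxy endpoint, the scraper itself stays an ordinary HTTP client, which is why the same pattern carries over to other languages.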
