Experienced Data Extraction Engineer (Reverse/Crawler Engineer)

Belgium, Gent

Engineering

Job description

At OTA Insight we’re on a mission to innovate the hospitality industry with a powerful SaaS platform. We provide simple insights with intuitive dashboards and straightforward actions based upon complex and vast datasets. We pull 100TB+ of data on a daily basis from platforms like Airbnb, Tripadvisor… Today we are considered the global leader in hotel BI and are working with 75,000+ hotels worldwide in 185+ countries.

Our ambition is burning. We are building towards a one-stop solution which will not only disrupt the industry as a whole, but has the potential to have a positive impact on every single hospitality professional across the board, regardless of their role or the size of their portfolio :)

With 350+ people, representing 36 nationalities living in 30 different countries, we are a fun, collaborative and committed team that leverages new technologies at scale to achieve amazing results. Ready to take off and join this rocketship in revolutionising the hotel sector?

About the team and the position

The OTA Insight Data Extraction team ensures that our customers get access to fresh and high-quality data. With over 100 million daily requests and 150+ data sources, the team carries a big responsibility in making sure that our crawling infrastructure is up to date. As the company scales, the team also needs to build in flexibility so that we can easily tap into new types of data sources.

Working in the crawler team

Because of the scale at which we operate, there are many challenges to face, both to keep our current infrastructure up and running and to make sure we remain best in class at crawling. Depending on your area of expertise you will be able to:

  • introduce and research new ideas to improve our data extraction process;

  • research complex scraping targets, e.g. identifying anti-bot measures and implementing solutions;

  • research and develop stable and efficient network technology with pipelines on the entire network stack that can easily handle up to 200k individual requests per minute;

  • research and develop automated browser solutions with undetectable emulation capabilities;

  • reverse engineer applications on multiple platforms and gain the ability to emulate their requests with the goal of extracting data;

  • develop clean and maintainable libraries and frameworks on which we build our infrastructure and data extraction code bases, some of which are open source;

  • keep an eye on scalability:
    • How does infrastructure scale with our data demand?

    • Can we support more diverse data sources, e.g. mobile applications?

Technologies you’ll work with:

Currently we work with the following technologies for data acquisition & processing: Scrapy, Python, Go, NodeJS, TypeScript, JavaScript, Rust, Docker, k8s, BigQuery and Pub/Sub.

In addition, in this advanced role you will also work with MITM tooling such as Burp Suite, mitmproxy, Charles and Wireshark. You are free to use and introduce any technology or tools you want. We are always looking for the best technologies that fit our use cases! For example, for reverse engineering you might use tools such as Frida, IDA, a variety of OSS tooling or anything else.

Benefits at OTA Insight

  • A highly competitive compensation package;

  • The opportunity to shape products that more than 75,000 users rely on worldwide;

  • The chance to grow and evolve the backend glue at a fast-growing scale-up.

On top of that:

  • A flexible working environment where you can work from home or at one of our offices;

  • No shortage of snacks, baristas… and a budget to account for home office expenses;

  • Two annual Detox Days, on top of your personal holidays, when we take some time away from our computers with the entire company;

  • A choose-your-own transportation budget should you choose to come to the office;

  • And of course, an insurance package that’s the best in the business, for that little extra peace of mind.

Job requirements

We expect you to have strong expertise in at least one of the following three domains:

  • The ability to reverse engineer cross-platform applications with the goal of extracting data. You don’t let obfuscated code or basic anti-reverse-engineering techniques hold you back, but use both static and dynamic techniques to overcome them.

  • Deep knowledge of the internals of a browser such as Chromium and how it implements its network stack, JavaScript engine and exposed Web APIs. You know the subtle differences between implementations, the variety of protocols involved and how it all evolved over time.

  • Expertise in all layers of the network stack in the context of contemporary web traffic: familiarity with the different protocols involved, the role of each, how they interact and how they evolved between their published versions. You are expected to be able to transfer this knowledge to others and to develop and contribute to a variety of network pipelines that can read, modify and log traffic in the range of 200k to 400k requests per minute with ease.

It is a plus if:
  • You have great problem-solving skills;
  • You challenge the status quo;
  • You are able to work in a team as well as handle work individually. You’ll quickly learn that we have a very open culture and that everyone can be reached out to for help, ideas, feedback and much more;
  • You have great communication skills, both to technical and non-technical parties.

Do you think this might be something for you? Do not hesitate to apply below!