Many businesses today rely on large volumes and a wide variety of data to make critical business decisions in real time. Big data applications need to combine transactional data with structured, semi-structured, and unstructured data to gain deeper, more holistic insights. For those insights to be timely, large amounts of data have to be continuously ingested into Hadoop data lakes and other destinations. One of the biggest challenges, even before the data can be analyzed, is the data ingestion itself.
Existing ingestion tools in the market come with many operability challenges:
- Some rely on the MapReduce framework, which is slow and consumes a lot of time and resources during ingestion.
- Some rely on batch processing frameworks, which achieve high throughput but at the cost of higher latency.
- Some depend on traditional stream processing frameworks, which have reliability and scalability challenges.
DataTorrent provides various Application Templates for ingestion that allow users to:
- Ingest vast amounts of data with enterprise-grade operability and the performance guarantees provided by the underlying Apache Apex™ framework, such as fault tolerance, linear scalability, high throughput, low latency, and end-to-end exactly-once processing.
- Quickly launch template applications to ingest raw data, with an easy, iterative way to add business and processing logic (parse, dedup, filter, transform, enrich, and so on) to the ingestion pipelines.
- Visualize various metrics based on throughput, latency, and application data in real time throughout execution.
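
To make the processing stages named above (parse, dedup, filter, transform, enrich) more concrete, here is a minimal sketch in plain Java. It does not use the Apex or Malhar APIs and is not taken from any of the templates; the class name `IngestionSketch`, the `id,countryCode,amount` record layout, and the lookup table are all hypothetical and exist only for illustration. In an Apex application each stage would typically be a separate operator connected by streams.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class IngestionSketch {

    // Hypothetical lookup table used by the enrich step.
    static final Map<String, String> COUNTRY_NAMES =
            Map.of("US", "United States", "IN", "India");

    // Runs every raw line through parse -> dedup -> filter -> transform -> enrich.
    static List<String> process(List<String> rawLines) {
        Set<String> seenIds = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String line : rawLines) {
            // Parse: expect "id,countryCode,amount" (assumed layout); skip malformed lines.
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue;
            }
            String id = fields[0].trim();
            double amount;
            try {
                amount = Double.parseDouble(fields[2].trim());
            } catch (NumberFormatException e) {
                continue;
            }
            // Dedup: drop any record whose id has already been seen.
            if (!seenIds.add(id)) {
                continue;
            }
            // Filter: keep only records with a positive amount.
            if (amount <= 0) {
                continue;
            }
            // Transform: normalize the country code to upper case.
            String code = fields[1].trim().toUpperCase(Locale.ROOT);
            // Enrich: attach the country name from the lookup table.
            String name = COUNTRY_NAMES.getOrDefault(code, "Unknown");
            out.add(id + "," + code + "," + name + "," + amount);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("1,us,10.5", "1,us,10.5", "2,in,-3", "bad line", "3,in,7");
        process(raw).forEach(System.out::println);
    }
}
```

With the sample input in `main`, the duplicate of record `1`, the negative-amount record `2`, and the malformed line are dropped, leaving two enriched records.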
Clone this repository to try out the application templates, or download the ready-to-launch application templates from DataTorrent AppHub and run them. Follow the tutorial videos or walkthrough documents for any of the applications to launch the template and add custom logic to process the data during ingestion.