Continuous streaming integration with streaming data platforms

Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is…

Big Data Ingestion

In the previous post I discussed choosing the right data ingestion technique for the Hadoop ecosystem and how critical it is to get this first step right. In my experience, subsequent steps, such as applying data science to the ingested data and converting it into insights, are comparatively less complicated. As discussed in the previous post, there are multiple ways to…