Someone asked me on Quora, “Should I use Gobblin or Spark Streaming to ingest data from Kafka to HDFS?” Here is what I wrote: This introduces a new architecture pattern called continuous streaming integration (CSI) with streaming data platforms (SDP) for solving app and data integration challenges. Short answer: If your data sink is… Continue reading
Post Category → Hadoop
Citi Mobile Challenge submission – Round 1
As a follow-up to the previous post on the Citi Mobile Challenge, I am happy to report that my concept “RetailBanking 2.0” has been selected, so I can participate in the challenge. Here is the confirmation. As promised, I will share my concept once the competition is over. For now… Continue reading
ElasticSearch on Hadoop
Some background on how ElasticSearch indexes documents: For those who already have a background in ElasticSearch, it is a special-purpose, document-based data store for full-text search with real-time indexing. How does ElasticSearch do this? If you are already familiar with search engines and the underlying data… Continue reading
Big Data whereabouts
I have met quite a few people who ask, “Who is really dealing with Big Data at the claimed volume, variety, or velocity?” It’s a great question. This is how I like to take the conversation forward with them: are you apprehensive about research results like “the amount of… Continue reading
HaaS – Hadoop-as-a-Service
Building Big Data platforms on Hadoop is becoming an easier decision by the day for today’s enterprises. As soon as technical due diligence indicates that going “data driven with Big Data” is going to make a positive impact (amongst other apprehensions), the immediate option that pops up is Hadoop. Hadoop-as-a-Service (HaaS)… Continue reading
Scheduling recurring jobs (cron-like) with Oozie
Anyone familiar with cron will appreciate that Apache Oozie can schedule a workflow to run any type of Hadoop job, such as MapReduce, Hive, or Sqoop jobs. Some good examples here
Big Data Ingestion
In the previous post I discussed choosing the right data ingestion technique for the Hadoop ecosystem and how critical it is to get this first step right. In my experience, subsequent steps such as applying data science to the ingested data and converting it into insights are comparatively less complicated. As discussed in the previous post, there are multiple ways to… Continue reading
“We don’t understand how BigData can solve our problems”
The title of this post might sound like a response to a BigData survey or research poll. I felt it was a good discussion point. Even though there is a lot of hype and movement around the BigData space, a little digging turns up plenty of research results showing otherwise when it comes to adoption. There… Continue reading