I have recently been trying out scikit-learn and NLTK with Python, mostly for “classifying” data. As they say, it's the best available combination to “teach yourself data science”. I love features like the pre-packaged corpora NLTK comes with, such as movie reviews. NLTK ships with over 50 corpora and lexical resources that you can play around with, without worrying about finding seed data of that volume. Between them, the two libraries cover most of the popular classification algorithms. I have done some R, and even quoted another blog saying that R tops the list for data science, but I am taking my words back, and I don’t have to sell Python. The claim “Python is not a mainstream programming language for statistical data analysis” is BS.
Here is a very good starting point: don’t worry about anything, just follow the steps and build on top of them. Believe me, I have stopped using R and now use only Python for data science problems. I am very much a beginner at Kaggle.com, but using Python and its cousins gives me good mileage.
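To give a feel for how small a first step can be, here is a minimal sketch of a text classifier in scikit-learn. The handful of reviews below is made up for illustration and only stands in for a real corpus such as NLTK's movie reviews. The choice of a bag-of-words model with naive Bayes is my assumption, one reasonable option among many.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written stand-in for a labelled review corpus (hypothetical data).
train_texts = [
    "a wonderful, moving film with great acting",
    "great fun and a wonderful cast",
    "brilliant story, great direction",
    "a boring, terrible waste of time",
    "terrible acting and a boring plot",
    "awful script, boring and dull",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Turn each text into word counts, then fit a multinomial naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen snippets.
predictions = model.predict(["great wonderful film", "boring terrible plot"])
print(list(predictions))
```

Swapping in a real corpus should mostly be a matter of replacing the two lists with real texts and labels; the pipeline itself stays the same.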
Terminology can shake your confidence; don’t get overwhelmed. As you already know, starting simple and building on top of that works very well here too! I am planning to write a series on the learning curve required to become an entry-level data scientist.