Data Science Projects

The Data Incubator is an intensive 8-week data science training fellowship for academic researchers, with a 2% acceptance rate (out of 3,000 applicants).

I completed a number of data science projects on the DigitalOcean cloud computing platform, including:

1. NYC Social Network Analysis – web scraping and network graph analysis

– Created a social network graph by extracting and parsing over 100,000 photo captions from photo albums on a New York socialite blog
– Analyzed the structure of the social network using node degree, node PageRank, and the highest-weighted edges of the graph
– You can find the code on GitHub

Tools: lxml, BeautifulSoup, regular expressions, pandas, networkx
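
As a rough illustration of the graph analysis described above, here is a minimal sketch in plain Python. It assumes each caption has already been parsed into a list of names (the sample names are made up), links two people whenever they share a caption, and weights each edge by the number of shared captions. The function names are hypothetical, and the PageRank here is a simple power iteration written for illustration, not the project's actual code; networkx offers equivalents such as `nx.pagerank`.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: each caption already parsed into a list of names.
captions = [
    ["Alice Smith", "Bob Jones"],
    ["Alice Smith", "Bob Jones", "Carol White"],
    ["Carol White", "Dan Brown"],
]


def build_edge_weights(captions):
    """Count how often each pair of people appears in the same caption."""
    weights = Counter()
    for names in captions:
        # sorted(set(...)) gives each undirected pair one canonical key.
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return weights


def node_degrees(weights):
    """Number of distinct neighbours for each person."""
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {name: len(ns) for name, ns in neighbours.items()}


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on the undirected graph."""
    adjacency = defaultdict(dict)
    for (a, b), w in weights.items():
        adjacency[a][b] = w
        adjacency[b][a] = w
    nodes = list(adjacency)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            total = sum(adjacency[n].values())
            # Each node shares its rank with neighbours in proportion to edge weight.
            for m, w in adjacency[n].items():
                new_rank[m] += damping * rank[n] * w / total
        rank = new_rank
    return rank


edge_weights = build_edge_weights(captions)
degrees = node_degrees(edge_weights)
ranks = pagerank(edge_weights)
top_edges = edge_weights.most_common(2)  # highest-weighted edges first
```

In this toy data the pair that appears together most often ends up as the heaviest edge, while the person connected to the most distinct people has the highest degree.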

2. NYC restaurant inspection database analysis

– Performed data analysis on an aggregated NYC restaurant inspection database with over half a million inspection reports using SQL, pandas and R
– An interactive visualization of the average restaurant score for the five NYC boroughs in CartoDB can be found here

3. NLP analysis on Yelp reviews

– Used natural language processing (NLP) to perform sentiment extraction from over 1 million Yelp reviews (1GB JSON file).

4. Yelp review predictions with scikit-learn

– Used machine learning models in scikit-learn to predict a new venue’s popularity from the metadata available when the venue opens, e.g., its location and the type of food served.

5. MapReduce in the Cloud

– Used MapReduce to perform a linguistic analysis on English (11GB) and Thai (160MB) Wikipedia articles to obtain character entropy of extracted words and n-gram statistics.
– You can find the code on GitHub
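
To make the MapReduce idea concrete, here is a small single-machine sketch of the map and reduce steps for character counts and word bigrams. It reads "character entropy" as the Shannon entropy of the character frequency distribution over extracted words, which is an assumption about the original analysis. The function names and the regular expression used for word extraction are illustrative only; a real job would run the same two phases on a cluster, and Thai text would need proper word segmentation.

```python
import math
import re
from collections import defaultdict

# Simplified word extraction: runs of letters (an assumption for illustration).
WORD_RE = re.compile(r"[^\W\d_]+")


def char_mapper(line):
    """Map step: emit (character, 1) for every character of every word."""
    for word in WORD_RE.findall(line.lower()):
        for ch in word:
            yield ch, 1


def ngram_mapper(line, n=2):
    """Map step: emit (word n-gram, 1) for each run of n consecutive words."""
    words = WORD_RE.findall(line.lower())
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n]), 1


def reducer(pairs):
    """Reduce step: sum the values emitted for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def shannon_entropy(counts):
    """Entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


lines = ["abab", "cd cd"]
char_counts = reducer(pair for line in lines for pair in char_mapper(line))
bigram_counts = reducer(pair for line in lines for pair in ngram_mapper(line))
entropy = shannon_entropy(char_counts)
```

Because the reducer only sums values per key, the same function serves both the character counts and the n-gram counts.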

6. Time Series Analysis

– Developed a model to predict the temperature in major US cities using Fourier analysis of over 500,000 data points
– Developed classification models to recognize the genre of a musical piece, first from pre-computed features and then from the raw waveform.
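
For the temperature model, one simple way to use Fourier analysis is to estimate the mean and the annual harmonic of a daily series, then extrapolate that harmonic forward. The sketch below is a minimal version of this idea on synthetic data, not the model from the project. The function names, the 365-day period, and the sample temperatures are all assumptions, and the coefficient formulas are exact only when the series covers a whole number of periods.

```python
import math


def fit_harmonic(series, period):
    """Estimate the mean and the cosine/sine coefficients of one harmonic.

    Assumes the series spans a whole number of periods, so the discrete
    Fourier sums below recover the coefficients exactly.
    """
    n = len(series)
    mean = sum(series) / n
    omega = 2.0 * math.pi / period
    a = 2.0 / n * sum((x - mean) * math.cos(omega * t) for t, x in enumerate(series))
    b = 2.0 / n * sum((x - mean) * math.sin(omega * t) for t, x in enumerate(series))
    return mean, a, b


def predict(t, mean, a, b, period):
    """Evaluate the fitted harmonic at time index t (may lie in the future)."""
    omega = 2.0 * math.pi / period
    return mean + a * math.cos(omega * t) + b * math.sin(omega * t)


period = 365
# Synthetic data (made up): two years of daily temperatures with a yearly cycle.
history = [
    15.0 + 10.0 * math.cos(2.0 * math.pi * (t - 200) / period)
    for t in range(2 * period)
]

mean, a, b = fit_harmonic(history, period)
amplitude = math.hypot(a, b)
forecast = predict(2 * period + 30, mean, a, b, period)  # 30 days past the data
```

Real temperature data would also carry noise and shorter cycles, so further harmonics or a residual model would usually be added on top of this seasonal baseline.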