Language Detection from Speech: Chinese or English?

In language processing, detecting which language is being spoken is an essential step before speech recognition and machine translation. This blog post presents an approach to distinguishing Chinese from English in speech (an audio sample) using a neural network model. Spark is used to perform data preprocessing, and TensorFlow ...

read more

An Approach Of Scaling Airflow To A Corporate Level

Sat 15 Jul 2017 by Tianlong Song Tags Big Data

The last post on Airflow provides step-by-step instructions on how to build an Airflow cluster from scratch. That setup serves development purposes well, but lacks features critical for production, e.g., CI/CD compliance, resource monitoring, and service recovery.

I have been leading the efforts ...

read more

A Guide On How To Build An Airflow Server/Cluster

Sun 23 Oct 2016 by Tianlong Song Tags Big Data

Airflow is an open-source platform to author, schedule and monitor workflows and data pipelines. When you have periodic jobs that most likely involve various data transfers and/or depend on each other, you should consider Airflow. This blog post briefly introduces Airflow, and provides the instructions to build an ...

read more

Monte Carlo Tree Search and Its Application in AlphaGo

Sat 09 Apr 2016 by Tianlong Song Tags Machine Learning

As one of the most important methods in artificial intelligence (AI), especially for playing games, Monte Carlo tree search (MCTS) has received considerable interest due to its spectacular success in the difficult problem of computer Go. In fact, most successful computer Go algorithms are powered by MCTS, including the recent ...

read more

Neural Networks and Deep Learning

Sun 03 Apr 2016 by Tianlong Song Tags Machine Learning Data Mining

It has been a long time since the idea of neural networks was proposed, but it is really during the last few years that neural networks have become widely used. One of the major enablers is the infrastructure with high computational capability (e.g., cloud computing), which makes the training ...

read more

Latent Dirichlet Allocation and Topic Modeling

When reading an article, we humans are able to easily identify the topics the article talks about. An interesting question is: can we automate this process, i.e., train a machine to find out the underlying topics in articles? In this post, a very popular topic modeling method, Latent Dirichlet ...

read more

Hidden Markov Model and Part of Speech Tagging

In a Markov model, we generally assume that the states are directly observable or one state corresponds to one observation/event only. However, this is not always true. A good example would be: in speech recognition, we are supposed to identify a sequence of words given a sequence of utterances ...

read more

Expectation Maximization Algorithm and Gaussian Mixture Model

Sat 12 Mar 2016 by Tianlong Song Tags Machine Learning Data Mining

In statistical modeling, it is possible that some observations are just missing. For example, when flipping two biased coins with unknown biases, we only have a sequence of observed heads and tails, but forgot to record which coin each observation came from. In this case, the conventional maximum likelihood ...

read more

Locating and Filling Missing Words in Sentences

Sat 05 Mar 2016 by Tianlong Song Tags Natural Language Processing

There are many occasions when we have incomplete sentences that need to be completed. One example is in speech recognition, where a noisy environment can lead to unrecognizable words, but we still hope to recover and understand the complete sentence (e.g., by inference); another example is sentence completion questions ...

read more

Binary and Multiclass Logistic Regression Classifiers

Sun 28 Feb 2016 by Tianlong Song Tags Machine Learning Data Mining

A generative classification model, such as Naive Bayes, tries to learn the probabilities and then predicts by using Bayes' rule to calculate the posterior, \(p(y|\textbf{x})\). Discriminative classifiers, by contrast, model the posterior directly. As one of the most popular discriminative classifiers, logistic regression directly models the linear decision ...

read more