Category Archives: Analysis

Multivariate Regression with Neural Networks. Training to Shoot

Ashok Chilakapati July 2, 2018 2 Comments

Machine learning is alchemy – researchers in artificial intelligence at Google have recently proclaimed. Any high school or college student who has ever tried to solve nonlinear systems of equations with the gradient descent method knows that already, kind of… Even for a perfect bowl-shaped cost-surface, the gradient descent method will converge… Read more »

Reduced Order Models for Documents

Ashok Chilakapati June 18, 2018 No Comments

The term-document matrix is a high-order, high-fidelity model for the document-space. High-fidelity in the sense that the matrix will correctly shred-bag-tag a document to represent it as a vector in term-space as per the vector space model (VSM). The matrix has one entry for every term-document pair, with distinct terms (rows) building documents (columns). But do we need all those values to capture this shred-bag-tag effect of … Read more »

Multivariate Regression with Neural Networks: Unique, Exact and Generic Models

Ashok Chilakapati May 31, 2018 No Comments

Michael Nielsen provides a visual demonstration in his web book Neural Networks and Deep Learning that a 1-layer deep neural network can match any function. It is just a matter of the number of neurons to get a prediction that is arbitrarily close – the more neurons, the better the approximation… Read more »

Kafka Streams – Catching Data in the Act. 2: Steady and Unsteady States

Ashok Chilakapati April 11, 2018 No Comments

I was on vacation with my son at Yosemite over the spring break this past weekend. The early part of the trip was washed out by rain: they closed the park and we were cooped up in the lodge waiting it out. But we had a patio view of… Read more »

Kafka Streams – Catching Data In the Act. 1

Ashok Chilakapati March 10, 2018 No Comments

I have been playing with Kafka on and off lately. It is an excellent addition to the ecosystem of big-data tools where scale with reliability is imperative. I find it intuitive and conceptually simple (the KISS principle) where the focus is squarely on reliability at scale. Unlike the traditional messaging… Read more »

Stacks of Documents and Bags of Words

Ashok Chilakapati January 23, 2018 No Comments

Consider these two one-line documents – “Eat to Live” and “Live to Eat”. They contain the same words, but in different order – leading to a big difference in meaning. Or consider – “Working Hard” & “Hardly Working”. Popular stemmers such as snowball convert ‘Hardly’ to ‘Hard’ so that functionally… Read more »
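
The point about word order can be checked with a few lines of Python. This is only a minimal sketch, not code from the post: the helper name `bag_of_words` is hypothetical, and it assumes lowercasing plus whitespace splitting is an acceptable stand-in for a real tokenizer. No stemming is attempted here.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical helper: lowercase, split on whitespace, count tokens.
    A Counter keeps term frequencies and discards word order."""
    return Counter(text.lower().split())

doc_a = "Eat to Live"
doc_b = "Live to Eat"

# Same multiset of words, so the bags are equal
same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)

# The token sequences differ, which is where the meaning lives
same_order = doc_a.lower().split() == doc_b.lower().split()

print(same_bag, same_order)
```

Under these assumptions the two documents are indistinguishable as bags but not as sequences, which is the information a pure bag-of-words representation gives up.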

Data Dimensionality and Sensitivity to Sampling

Ashok Chilakapati January 6, 2018 No Comments

I wanted to get back to the analysis of quotes from a semantics perspective and write about searching & clustering them with Latent Semantic Analysis (LSA). Thought it was going to be a straightforward exercise in applying the venerable gensim package and appreciating the augmented information retrieval capabilities of LSA… Read more »

Quotes. Lexical Fuzziness

Ashok Chilakapati February 5, 2016 No Comments

The road to ‘Computational Linguistics Nirvana’ is littered with thesis upon thesis, stacks of journal papers, and volumes of conference proceedings… so one can get lost in a hurry. Whole programs dedicated to computational linguistics have made great advances over the years enabling the Siris and Cortanas of our time. We… Read more »

What is that? A Quote?

Ashok Chilakapati December 16, 2015 No Comments

Who does not love a good quote? I have always been a fan myself and collected a bunch over the years. Each morning as I drive the kids to school, a quote or more spills out as a matter of course. So much so that they started calling me a quote-monster…

H-1B. The Analysis – Part 1

Ashok Chilakapati November 25, 2015 No Comments

By the end of the previous post in this series we had built a mechanism to slice & dice the H-1B approval data. With that in hand we can start exploring the data and hopefully find some interesting insights so all that hard work so far was not totally useless…