Author Archives: Ashok Chilakapati

Kafka Streams – Catching Data in the Act. 2: Steady and Unsteady States

I was on vacation with my son at Yosemite over spring break this past weekend. The early part of the trip was washed out by rain; they closed the park, and we were cooped up in the lodge waiting it out. But we had a patio view of…

Kafka Streams – Catching Data In the Act. 1

I have been playing with Kafka on and off lately. It is an excellent addition to the ecosystem of big-data tools where scale with reliability is imperative. I find it intuitive and conceptually simple (the KISS principle), with the focus squarely on reliability at scale. Unlike the traditional messaging…

Stacks of Documents and Bags of Words

Consider these two one-line documents – “Eat to Live” and “Live to Eat”. They contain the same words, but in a different order, leading to a big difference in meaning. Or consider – “Working Hard” & “Hardly Working”. Popular stemmers such as Snowball convert ‘Hardly’ to ‘Hard’ so that functionally…
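The order-blindness of a bag of words is easy to see in a few lines of Python. This is a minimal sketch that assumes plain lowercase, whitespace tokenization; `bag_of_words`, `stemmed_bag`, and the `ASSUMED_STEMS` table are hypothetical illustrations, not gensim or Snowball APIs. The one-entry table only mimics the ‘Hardly’ to ‘Hard’ reduction described above; a real stemmer would also reduce other words (e.g. ‘working’).

```python
from collections import Counter

# Hypothetical stand-in for a real stemmer: only mimics 'hardly' -> 'hard'.
ASSUMED_STEMS = {"hardly": "hard"}


def bag_of_words(doc: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens. Word order is discarded."""
    return Counter(doc.lower().split())


def stemmed_bag(doc: str) -> Counter:
    """Same as bag_of_words, but maps each token through the assumed stem table first."""
    return Counter(ASSUMED_STEMS.get(tok, tok) for tok in doc.lower().split())


# Same words, different order: the bags are identical.
print(bag_of_words("Eat to Live") == bag_of_words("Live to Eat"))        # True

# Different words, so the raw bags differ...
print(bag_of_words("Working Hard") == bag_of_words("Hardly Working"))    # False

# ...but once 'hardly' is stemmed to 'hard', the two documents collapse together.
print(stemmed_bag("Working Hard") == stemmed_bag("Hardly Working"))      # True
```

Both pairs end up indistinguishable to any model that sees only the bags, which is exactly the loss of meaning the two examples point at.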

Data Dimensionality and Sensitivity to Sampling

I wanted to get back to the analysis of quotes from a semantics perspective and write about searching & clustering them with Latent Semantic Analysis (LSA). I thought it would be a straightforward exercise in applying the venerable gensim package and appreciating the augmented information retrieval capabilities of LSA…

ELK Clusters on AWS with Ansible

In the previous post we built a virtual ELK cluster with Vagrant and Ansible, where the individual VMs comprising the cluster were carved out of a single host. While that allowed for self-contained development & testing of all the necessary artifacts, it is not a real-world scenario…

ELK Stack with Vagrant and Ansible

Unfortunately, it has been a while since I sat down to write on this blog. But the writing bug is persistent – once hooked, you've got to write. Serious writing requires original research, data collection and analysis, etc., so it can take a good bit of time depending on the topic. I had…

Quotes. Lexical Fuzziness

The road to ‘Computational Linguistics Nirvana’ is littered with thesis upon thesis, stacks of journal papers, and volumes of conference proceedings… so one can get lost in a hurry. Whole programs dedicated to computational linguistics have made great advances over the years enabling the Siris and Cortanas of our time. We…

Quote Mechanics – It is the ‘Data’, Stupid!

Ready to write again after an extended break over the holidays, we start off where we left off in 2015 with our unfinished quotes… The objective for this post is to assemble the data we need to analyze the nature of quotes in some way, at least in a dry statistical sense to…

Virtual Clusters with Vagrant & Virtualbox

We take a break from the H-1B analysis and set the stage here for future posts that require us to work in environments with distributed compute & storage. A simple way to simulate them is with Virtualbox as the provider of VMs (‘Virtual Machines’) & Vagrant as the front-end…