Author Archives: Ashok Chilakapati
Computing on Coupled Data Streams with Beam
Coupled data streams need to be analyzed together, paying particular attention to simultaneity in event time and to the other process-specific variables that control the streams. Using a test problem with known exact solutions, we see that pipeline processing with Beam reproduces them accurately.
A Serving Flask on Docker
Serving a Flask application with Gunicorn and Nginx on Docker… Packaging applications for reproducible results across environments has gotten a great boost with Docker. Docker allows us to bundle the application with all its dependencies so that the resulting image can be run anywhere with a compatible Docker runtime. The…
A Flask Full of Whiskey (WSGI)
Serving up Python web applications has never been easier, thanks to the suite of WSGI servers currently at our disposal. Both uWSGI and Gunicorn behind Nginx are excellent performers for serving up a Flask app… Yup, what more could you ask for in life, right? There are a number of varieties…
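The contract that WSGI servers such as uWSGI and Gunicorn rely on can be sketched without any framework at all. The snippet below is a minimal illustration, not the post's own code: the names `application` and `call_app` are assumptions made here, and the helper simply calls the app in-process the way a server would.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: returns a plain-text greeting for any request."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and capture status, headers, and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    captured["body"] = b"".join(app(environ, start_response))
    return captured


result = call_app(application)
print(result["status"], result["body"])
```

A Flask app object is itself a WSGI callable of this shape, which is why the same app can sit behind either server. If the snippet were saved as a hypothetical `hello.py`, Gunicorn would be pointed at it as `gunicorn hello:application`.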
Predictions R Us
Have Unbalanced Classes? Try Significant Terms
The words that are significant to a class can be used to improve the precision-recall trade-off in classification. Using the top significant terms as the vocabulary to drive a classifier yields improved results with a much smaller model for predicting MIMIC-III CCU readmissions from discharge notes.
Predicting ICU Readmission from Discharge Notes: Significant Terms
Querying with high-frequency terms improves recall, while querying with rare terms improves precision. The significant terms balance both while offering some discriminative capacity among the latent classes the retrieved documents may belong to. The MIMIC-III dataset is studied here in the context of predicting patient readmission from the discharge notes, with Elasticsearch driving the significance measures…
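To make the idea of a significance measure concrete, here is a small sketch of one such measure: a JLH-style score, which multiplies the absolute and the relative change in a term's document frequency between a foreground set and a background set. This is an illustration under assumptions, not necessarily the measure used in the post; the function name `jlh_scores` and the toy token lists are made up for the example, and the background set is assumed to contain the foreground documents.

```python
from collections import Counter


def jlh_scores(foreground_docs, background_docs):
    """Rank terms by how over-represented they are in the foreground set.

    Each document is a list of tokens. Frequencies are document frequencies
    (the share of documents that contain the term).
    """
    fg_df = Counter(t for doc in foreground_docs for t in set(doc))
    bg_df = Counter(t for doc in background_docs for t in set(doc))
    n_fg, n_bg = len(foreground_docs), len(background_docs)

    scores = {}
    for term, fg_count in fg_df.items():
        fg_pct = fg_count / n_fg
        bg_pct = bg_df[term] / n_bg
        if bg_pct == 0 or fg_pct <= bg_pct:
            continue  # keep only terms that are over-represented
        # absolute change times relative change
        scores[term] = (fg_pct - bg_pct) * (fg_pct / bg_pct)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up notes: two "readmitted" documents against a background of six.
readmitted = [
    ["patient", "sepsis", "dialysis"],
    ["patient", "dialysis", "stable"],
]
others = [
    ["patient", "stable", "discharged"],
    ["patient", "stable"],
    ["patient", "discharged"],
    ["patient", "stable", "home"],
]
ranking = jlh_scores(readmitted, readmitted + others)
print(ranking)
```

In this toy data the ubiquitous term "patient" drops out because it is equally common everywhere, while the terms concentrated in the foreground rise to the top. That is the balance between frequent and rare terms described above.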
Semantics at Scale: BERT + Elasticsearch
Semantic search at scale is made possible by tools like BERT and bert-as-service, and of course by support for dense vector manipulations in Elasticsearch. While the degree may vary with the use case, search results can certainly benefit from augmenting the keyword-based results with the semantic ones…
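One possible way to augment keyword hits with semantic ones is to blend a normalized keyword score with the cosine similarity between dense query and document vectors. The sketch below assumes exactly that; the function `rerank`, the weight `alpha`, and the two-dimensional vectors are hypothetical stand-ins (real sentence vectors from BERT would have hundreds of dimensions), and the post itself may combine the scores differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, hits, alpha=0.5):
    """Blend keyword and semantic scores.

    hits: list of (doc_id, keyword_score, doc_vec) tuples.
    alpha: weight on the keyword score; 1.0 means keyword-only ranking.
    """
    max_kw = max((kw for _, kw, _ in hits), default=0.0) or 1.0
    blended = [
        (doc_id, alpha * (kw / max_kw) + (1 - alpha) * cosine(query_vec, vec))
        for doc_id, kw, vec in hits
    ]
    return sorted(blended, key=lambda pair: pair[1], reverse=True)


# Toy example: "a" wins on keywords, "b" is semantically closer to the query.
query = [1.0, 0.0]
hits = [("a", 2.0, [0.0, 1.0]), ("b", 1.0, [1.0, 0.0])]
print(rerank(query, hits, alpha=0.5))
print(rerank(query, hits, alpha=1.0))
```

Varying `alpha` is one way to express the point that the degree of benefit depends on the use case.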
BoW vs BERT: Classification
BoW to BERT
Word vectors have evolved over the years to know the difference between “record the play” and “play the record”. They have evolved from a one-hot world where every word was orthogonal to every other word, to a place where word vectors morph to suit the context. Slapping a BoW on word vectors is the usual way to build a document vector for tasks such as classification. But BERT does not need a BoW, as the vector shooting out of the top [CLS] token is already primed for the specific classification objective.
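The limitation of a BoW over fixed word vectors is easy to see in a few lines. The sketch below uses one-hot vectors for a three-word vocabulary and averages them into a document vector; the helper names `one_hot` and `bow_vector` are made up for illustration. Because averaging ignores word order, the two phrases collapse to the same vector.

```python
def one_hot(vocab):
    """Map each word to a one-hot vector; distinct words are orthogonal."""
    size = len(vocab)
    return {
        word: [1.0 if i == j else 0.0 for j in range(size)]
        for i, word in enumerate(vocab)
    }


def bow_vector(tokens, word_vectors):
    """Document vector as the average of its word vectors (order is lost)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        for i, x in enumerate(word_vectors[token]):
            total[i] += x
    return [x / len(tokens) for x in total]


vectors = one_hot(["record", "the", "play"])
doc_a = bow_vector("record the play".split(), vectors)
doc_b = bow_vector("play the record".split(), vectors)
print(doc_a == doc_b)
```

The same collapse happens with any fixed, context-free word vectors, not just one-hot ones. A contextual model sidesteps it because each token's vector depends on its neighbors.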