Word vectors have evolved over the years to know the difference between “record the play” vs “play the record”. They have come a long way from a one-hot world, where every word was orthogonal to every other word, to a place where word vectors morph to suit the context. Slapping a BoW on word vectors is the usual way to build a document vector for tasks such as classification. But BERT does not need a BoW, as the vector shooting out of the top [CLS] token is already primed for the specific classification objective…
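As a minimal sketch of that idea (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is named in the excerpt), the [CLS] vector can be read straight off the last hidden state, with no bag-of-words pooling in between:

```python
# A minimal sketch, assuming the Hugging Face "transformers" library and the
# "bert-base-uncased" checkpoint; names here are illustrative only.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

docs = ["record the play", "play the record"]
batch = tokenizer(docs, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = bert(**batch)

# The vector "shooting out of the top [CLS] token": position 0 of the last
# hidden state. During fine-tuning, a small linear head on top of this vector
# is trained for the specific classification objective.
cls_vectors = outputs.last_hidden_state[:, 0, :]   # shape: (2, 768)
print(cls_vectors.shape)
```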
Formulae for trainable parameter counts are developed for a few popular layers as functions of the layer parameters and input characteristics. The results are then reconciled with what Keras reports upon running the model…
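As one hedged example (a plain Dense layer, not necessarily one of the layers worked out in the post): a Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, which can be checked against what Keras reports:

```python
# A sketch assuming TensorFlow/Keras; the Dense layer is just one illustrative case.
from tensorflow import keras

n_in, n_out = 300, 64
model = keras.Sequential([
    keras.layers.Input(shape=(n_in,)),
    keras.layers.Dense(n_out, activation="relu"),
])

# Formula: weights + biases = n_in * n_out + n_out
expected = n_in * n_out + n_out     # 300*64 + 64 = 19264
reported = model.count_params()     # what Keras reports
print(expected, reported)           # both 19264
model.summary()                     # "Trainable params: 19,264"
```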
Tf-idf vectors combined with word embeddings are analyzed for clustering effectiveness. The text corpus examples considered here indicate that custom word embeddings can help with clustering…
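A rough sketch of the combination (using scikit-learn and a stand-in embedding matrix, not the exact pipeline from the post): weight each word's embedding by its tf-idf value, sum into a document vector, and cluster the result:

```python
# A minimal sketch, assuming scikit-learn and a random stand-in embedding matrix;
# in practice the embeddings would be pre-trained or custom-trained word vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "dogs chase cats", "stocks rallied today"]
dim = 50
rng = np.random.default_rng(0)

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)            # (n_docs, n_vocab), tf-idf weights
vocab = vectorizer.get_feature_names_out()

# Stand-in embedding matrix: one row per vocabulary word.
embeddings = rng.normal(size=(len(vocab), dim))

# Tf-idf weighted sum of word vectors -> one dense vector per document.
doc_vectors = tfidf @ embeddings                  # (n_docs, dim)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_vectors)
print(labels)
```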
In the previous post, Word Embeddings and Document Vectors: Part 1. Similarity, we laid the groundwork for using bag-of-words based document vectors in conjunction with word embeddings (pre-trained or custom-trained) for computing document similarity, as a precursor to classification. It seemed that document+word vectors were better at picking up on similarities… Read more »
Classification hinges on the notion of similarity. This similarity can be as simple as a categorical feature value such as the color or shape of the objects we are classifying, or a more complex function of all categorical and/or continuous feature values that these objects possess. Documents can be classified… Read more »