Tag Archives: Singular Value Decomposition

Reduced Order Models for Documents

The term-document matrix is a high-order, high-fidelity model for the document space. It is high-fidelity in the sense that it will correctly shred-bag-tag any of its documents and represent it as a vector in term space, as per the Vector Space Model (VSM). The matrix has one entry for every term-document pair, with the distinct terms as rows and the documents as columns. But do we need all those values to capture this shred-bag-tag effect of …
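As a rough illustration of the reduced-order idea, the snippet below builds a raw-count term-document matrix for a hypothetical toy corpus and keeps only its k largest singular values via NumPy's SVD. The corpus, the helper `term_document_matrix`, and the choice k = 2 are assumptions for illustration, not the method of the full post. It compares how many numbers each model stores and checks that the reconstruction error matches the discarded singular values.

```python
import numpy as np

# Hypothetical toy corpus (not from the original post).
docs = [
    "eat to live",
    "live to eat",
    "working hard",
    "hardly working",
    "eat well live well",
]

def term_document_matrix(docs):
    """Return (terms, A) where A[i, j] is the count of term i in document j."""
    terms = sorted({w for d in docs for w in d.split()})
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1
    return terms, A

terms, A = term_document_matrix(docs)
m, n = A.shape  # m distinct terms (rows), n documents (columns)
k = 2           # assumed size of the reduced-order model

# Thin SVD; singular values in s come back sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# The rank-k model stores U_k (m x k), s_k (k) and V_k (n x k).
print(f"full model: {m * n} numbers, rank-{k} model: {k * (m + n + 1)} numbers")
# -> full model: 35 numbers, rank-2 model: 26 numbers

# Frobenius error of the truncation equals sqrt(sum of discarded s_i^2).
err = np.linalg.norm(A - A_k)
print(f"Frobenius error: {err:.4f}")
```

The saving is small for a five-document toy, but since k(m + n + 1) grows linearly in m and n while mn grows quadratically, the gap widens quickly for real corpora.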

Stacks of Documents and Bags of Words


Consider these two one-line documents: “Eat to Live” and “Live to Eat”. They contain the same words, but in a different order, which leads to a big difference in meaning. Or consider “Working Hard” and “Hardly Working”. Popular stemmers such as Snowball convert ‘Hardly’ to ‘Hard’ so that functionally…
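A small sketch can make the point concrete. The bag-of-words function below only counts tokens, so word order vanishes. To avoid a dependency, it uses a hypothetical crude suffix stripper, `toy_stem`, in place of a real stemmer; it is not Snowball (NLTK's `SnowballStemmer("english")` is one real implementation), but it makes the same ‘Hardly’ to ‘Hard’ conflation for these examples.

```python
from collections import Counter

def toy_stem(word):
    """Hypothetical crude suffix stripper (NOT Snowball): drops 'ing' or 'ly'."""
    for suffix in ("ing", "ly"):
        # Only strip when a reasonably long stem would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def bag_of_words(text, stem=None):
    """Lower-case, split on whitespace, optionally stem, then count tokens."""
    tokens = text.lower().split()
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    return Counter(tokens)

# Order is lost: both documents become the same bag.
print(bag_of_words("Eat to Live") == bag_of_words("Live to Eat"))  # True

# Without stemming, 'hard' and 'hardly' are different terms...
print(bag_of_words("Working Hard") == bag_of_words("Hardly Working"))  # False

# ...but after stemming the two phrases collapse to the same bag.
print(bag_of_words("Working Hard", toy_stem) == bag_of_words("Hardly Working", toy_stem))  # True
```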

Data Dimensionality and Sensitivity to Sampling

I wanted to get back to the analysis of quotes from a semantics perspective and write about searching & clustering them with Latent Semantic Analysis (LSA). I thought it was going to be a straightforward exercise in applying the venerable gensim package and appreciating the augmented information retrieval capabilities of LSA…
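For readers who want to see what LSA search does under the hood, here is a minimal sketch in plain NumPy rather than gensim. The 4-term, 5-document count matrix and the query are hypothetical. Documents are represented by the rows of V_k, and a query's term-count vector q is "folded in" to the same latent space as S_k⁻¹ U_kᵀ q before ranking by cosine similarity.

```python
import numpy as np

# Hypothetical toy term-document count matrix: rows are terms, columns documents.
terms = ["eat", "live", "work", "hard"]
A = np.array([
    [1, 1, 0, 0, 1],  # eat
    [1, 1, 0, 0, 1],  # live
    [0, 0, 1, 1, 0],  # work
    [0, 0, 1, 1, 1],  # hard
], dtype=float)

k = 2  # assumed number of latent dimensions
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

# Row j is V_k[j] = S_k^-1 U_k^T a_j: document j in the latent space.
doc_vecs = Vt[:k, :].T

def fold_in(query_counts):
    """Map a term-count vector into the latent space: S_k^-1 U_k^T q."""
    return (Uk.T @ query_counts) / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = fold_in(np.array([1, 0, 0, 0], dtype=float))  # the one-word query "eat"
scores = [cosine(q, d) for d in doc_vecs]
ranking = np.argsort(scores)[::-1]  # document indices, best match first
print(ranking, np.round(scores, 3))
```

Because queries and documents go through the same map, any sign convention NumPy picks for the singular vectors cancels out in the cosine scores.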