Text Wrangling

Which of the following would be of most use in cases where a specific ordering of words is important in a text document?
Correct! N-grams treat sequences of N adjacent words as single tokens. Where a standard bag of words (BOW) uses only single-word tokens, an N-gram BOW preserves short phrases and therefore some of the word ordering.
Incorrect. A bag of words includes just single-word tokens and therefore wouldn't allow for information from word ordering to be preserved.
That's not right. A document term matrix (DTM) is essentially a table with one row per document and one column per word (token), where each cell shows how often that token appears in that document. Word ordering is not preserved in a DTM.
N-grams
Bag of words
Document term matrix
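
To illustrate the difference, here is a minimal sketch in Python. The two sample sentences and the helper name ngrams are made up for illustration and are not part of the question. The sentences contain exactly the same words in a different order, so a single-word bag of words cannot tell them apart, while a bag of bigrams (N = 2) can.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings (hypothetical helper)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Two made-up documents with identical words but different ordering.
doc_a = "rates rise as prices fall".split()
doc_b = "prices rise as rates fall".split()

# Single-word bag of words: token counts only, ordering is discarded.
bow_a = Counter(ngrams(doc_a, 1))
bow_b = Counter(ngrams(doc_b, 1))

# Bigram bag of words: each token is a pair of adjacent words.
bigram_a = Counter(ngrams(doc_a, 2))
bigram_b = Counter(ngrams(doc_b, 2))

print(bow_a == bow_b)        # True: the single-word BOWs are identical
print(bigram_a == bigram_b)  # False: the bigram BOWs differ
```

Note that N-grams only capture ordering within a window of N words; ordering across longer spans is still lost.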
