npm install compromise
-
how easy text is to make,
↬ᔐᖜ↬ and how hard it is to actually parse and use?
it makes limited and sensible decisions.
it's not as smart as you'd think.
import nlp from 'compromise'
let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'
if (doc.has('simon says #Verb')) {
return true
}
let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
and get data:
import plg from 'compromise-speech'
nlp.extend(plg)
let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
"text": "Milwaukee",
"terms": [{
"normal": "milwaukee",
"syllables": ["mil", "wau", "kee"]
}]
}]
*/
avoid the problems of brittle parsers:
let doc = nlp("we're not gonna take it..")
doc.has('gonna') // true
doc.has('going to') // true (implicit)
// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'
and whip stuff around like it's data:
let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'
-because it actually is-
let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'
Use it on the client-side:
<script src="https://unpkg.com/compromise"></script>
<script>
var doc = nlp('two bottles of beer')
doc.numbers().minus(1)
document.body.innerHTML = doc.text()
// 'one bottle of beer'
</script>
or likewise:
import nlp from 'compromise'
var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'
compromise is ~250kb (minified):
it's pretty fast. It can run on keypress:
it works mainly by conjugating all forms of a basic word list.
The final lexicon is ~14,000 words:
you can read more about how it works, here. it's weird.
okay -
compromise/one
A tokenizer
of words, sentences, and punctuation.
import nlp from 'compromise/one'
let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
normal:"wayne's world party time",
terms:[{ text: "Wayne's", normal: "wayne" },
...
]
}]
*/
compromise/one splits your text up, wraps it in a handy API,
-
and does nothing else -
/one is quick - most sentences take a 10th of a millisecond.
It can do ~1mb of text a second - or 10 wikipedia pages.
Infinite jest takes 3s.
compromise/two
A part-of-speech
tagger, and grammar-interpreter.
import nlp from 'compromise/two'
let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"
compromise/two automatically calculates the very basic grammar of each word.
this is more useful than people sometimes realize.
Light grammar helps you write cleaner templates, and get closer to the information.
compromise has 83 tags, arranged in a handsome graph.
#FirstName →