Natural language processing in C++
I am working on a project that already has a C++ code base, and I would like to add a natural language processing plugin. I really like GATE, but I'm not sure it's worth starting a JVM and splitting the project into C++ and Java parts. I noticed that UIMA has a C++ framework; I haven't tried it, but it seems to have fewer features than GATE.
Does anyone know a better option than trying to wrap GATE in C++ somehow (for example, better NLP libraries in C++)? If I do wrap GATE in C++, what is the best approach? SOA?
Thanks
A list of NLP resources (POS taggers, NP chunking, sequence models, parsers...) in C++ and other languages, by Christopher Manning. There is another one on Wikipedia.
There is also a Boost page for String and Text Processing.
Of course, this depends on what exactly you want to do.
GATE and UIMA are frameworks for NLP, developed mainly around information management and extraction. It's not entirely fair to say that GATE has more features than UIMA, since strictly speaking both are only frameworks. However, GATE comes with ANNIE, which has a lot of nice features that you might find useful (again, depending on what you want to do). UIMA is usually paired with the OpenNLP libraries, which replicate some, but not all, of these features; they are written in Java, though, so a JVM will still need to be loaded.
You can find functionality similar to GATE/ANNIE or UIMA/OpenNLP in C++ libraries, but the nice thing about these two environments is that they are consistent and don't require a lot of "glue code" to get the individual libraries working with each other.
What is the reason for not wanting to wrap GATE in C++ code? I can appreciate that this will add complexity to the project, but if your worries are about performance or memory, the JVM might be the least of them. NLP tools tend to be very memory-hungry: expect around half a gigabyte for NER models, and more for a statistical parser.
Perhaps you would like to take a look at NLP++, a programming language designed for natural language processing and text analytics.
I would start here:
This package contains everything you need to get started with NLP++. Yes, you need to learn a new programming language, but it is similar to C++ and you don't have to work through a black-box API. Also, a text analyzer compiled in VisualText produces a Visual Studio solution that you can include in other C++ projects.
You can use VisualText and NLP++ for free in non-commercial projects.
Join the NLP++ community to ask questions, discuss your analyzers, and learn more about NLP++:
Respectfully,
Dominic Holenstein
NLP++ Community Manager