Natural language processing in C++

I am working on a project that already has a C++ code base, and I would like to add a natural language processing plugin. I really like GATE, but I'm not sure it's worth starting a JVM and splitting the project into C++ and Java parts. I noticed that UIMA has a C++ framework. I haven't tried it, but it seems to have fewer features than GATE.

Does anyone know a better option than trying to wrap GATE in C++ somehow (for example, better NLP libraries in C++)? And if I do wrap GATE in C++, what is the best way to do it? SOA?

Thanks.



3 answers


Christopher Manning maintains a list of NLP resources (POS taggers, NP chunkers, sequence models, parsers, ...) in C++ and other languages. There is another list on Wikipedia.



There is also a Boost page for String and Text Processing.
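As a rough illustration of the low-level text handling such libraries cover, here is a minimal tokenizer sketch. It deliberately uses only the C++ standard library (`std::regex`) as a stand-in, so it is not Boost's API; the `tokenize` name and the word pattern are assumptions made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or any NLP library):
// returns every run of ASCII letters, digits, or apostrophes as a token.
std::vector<std::string> tokenize(const std::string& text) {
    // Assumed word pattern; real tokenizers handle far more cases.
    static const std::regex word_re("[A-Za-z0-9']+");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Note that this naive pattern reduces `C++` to `C` and knows nothing about Unicode, which is the kind of detail a dedicated text-processing library is meant to handle.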



Of course, this depends on what exactly you want to do.

GATE and UIMA are frameworks for NLP, mainly developed around the idea of information management and extraction. It's not entirely fair to say that GATE has more features than UIMA, since strictly speaking both of them are only frameworks. However, GATE comes with ANNIE, which has a lot of nice features that you might find useful (again, depending on what you want to do). UIMA is bundled with the OpenNLP libraries, which cover some, but not all, of these features. They are also written in Java, so a JVM will need to be loaded either way.



You can find functionality similar to GATE/ANNIE or UIMA/OpenNLP in C++ libraries, but the nice thing about the two frameworks is that they are consistent and don't require a lot of "glue code" to make the individual libraries talk to each other.
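To make the "glue code" point concrete, here is a minimal sketch of what combining separate C++ libraries tends to involve: a shared annotation type plus one adapter per library. Every name in it (`Annotation`, `Document`, `Stage`, `WhitespaceTokenizer`, `CapitalizedWordTagger`, `run_pipeline`) is hypothetical, and the two stages are toy stand-ins rather than real NLP components or the API of GATE, UIMA, or any other library.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical shared data model: a typed span over the document text.
struct Annotation {
    std::string type;   // e.g. "Token" or "Capitalized"
    std::size_t begin;  // offset of the first character
    std::size_t end;    // offset one past the last character
};

struct Document {
    std::string text;
    std::vector<Annotation> annotations;
};

// Each third-party library would be wrapped in an adapter like this.
class Stage {
public:
    virtual ~Stage() = default;
    virtual void process(Document& doc) = 0;
};

// Toy stage: adds a "Token" annotation for each whitespace-separated word.
class WhitespaceTokenizer : public Stage {
public:
    void process(Document& doc) override {
        const std::size_t n = doc.text.size();
        std::size_t i = 0;
        while (i < n) {
            while (i < n && std::isspace(static_cast<unsigned char>(doc.text[i]))) ++i;
            const std::size_t start = i;
            while (i < n && !std::isspace(static_cast<unsigned char>(doc.text[i]))) ++i;
            if (i > start) doc.annotations.push_back({"Token", start, i});
        }
    }
};

// Toy stage: reads "Token" annotations and flags those starting uppercase.
class CapitalizedWordTagger : public Stage {
public:
    void process(Document& doc) override {
        std::vector<Annotation> found;
        for (const auto& a : doc.annotations) {
            if (a.type == "Token" &&
                std::isupper(static_cast<unsigned char>(doc.text[a.begin]))) {
                found.push_back({"Capitalized", a.begin, a.end});
            }
        }
        doc.annotations.insert(doc.annotations.end(), found.begin(), found.end());
    }
};

// Runs the stages in order; later stages see earlier stages' annotations.
void run_pipeline(const std::vector<std::unique_ptr<Stage>>& stages, Document& doc) {
    for (const auto& stage : stages) stage->process(doc);
}
```

In a real project each adapter would also have to convert between the wrapped library's own token and span representations and the shared one, which is the work the integrated frameworks save you.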

What is your reason for not wanting to wrap GATE in C++ code? I can appreciate that it will add complexity to the project, but if your concern is performance or memory, then the JVM might be the least of your worries. NLP tools tend to be very memory-hungry: expect to need half a gigabyte for NER models, and more for a statistical parser.



Perhaps you would like to take a look at NLP++, a programming language designed for natural language processing and text analytics.

I would start here:

Starter package for NLP++

This package contains everything you need to get started with NLP++. Yes, you need to learn a new programming language, but it is similar to C++ and you don't have to rely on a black-box API. Also, compiling a text analyzer in VisualText creates a Visual Studio solution that you can include in other C++ projects.

You can use VisualText and NLP++ for free for non-commercial projects.

Join the NLP++ community to ask questions, discuss your parsers, and learn more about NLP++:

NLP++ Community

Respectfully,

Dominic Holenstein

NLP ++ Community Manager







