Gensim

Original author(s): Radim Řehůřek
Developer(s): various
Stable release: 0.12.4 (29 January 2016)
Development status: active
Written in: Python
Platform: cross-platform
Type: Natural language processing
License: LGPL
Website: radimrehurek.com/gensim/

Gensim is an open-source vector space modeling and topic modeling toolkit, implemented in the Python programming language. It uses NumPy, SciPy and optionally Cython for performance. It is specifically intended for handling large text collections, using efficient online, incremental algorithms.
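The idea of online, incremental processing can be illustrated with a minimal standard-library sketch. It is not Gensim's code or API: `stream_documents` and `IncrementalVocabulary` are hypothetical names introduced only for illustration. Documents are read one at a time from an iterable, and the vocabulary statistics are updated per document, so the full collection never has to be held in memory at once.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def stream_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized document at a time instead of loading them all."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:  # skip empty lines
            yield tokens


class IncrementalVocabulary:
    """Hypothetical helper: token ids and document frequencies, updated online."""

    def __init__(self) -> None:
        self.token2id: Dict[str, int] = {}
        self.doc_freq: Counter = Counter()
        self.num_docs = 0

    def update(self, tokens: List[str]) -> List[Tuple[int, int]]:
        """Fold one document into the statistics and return its bag-of-words."""
        counts = Counter(tokens)
        for token in counts:
            # Assign the next free integer id to tokens seen for the first time.
            token_id = self.token2id.setdefault(token, len(self.token2id))
            self.doc_freq[token_id] += 1
        self.num_docs += 1
        return sorted((self.token2id[t], c) for t, c in counts.items())


# Any iterable of lines works here, including an open file object.
raw = ["human machine interface", "", "graph of trees", "graph minors survey graph"]
vocab = IncrementalVocabulary()
bows = [vocab.update(doc) for doc in stream_documents(raw)]
```

In this sketch `raw` could be replaced by an open file handle, in which case only one line is tokenized at a time; the list `bows` is collected here only so the result can be inspected.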

Gensim includes implementations of tf-idf, random projections, word2vec and doc2vec,[1] hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.[2]
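As an illustration of the simplest algorithm in this list, the following is a minimal sketch of tf-idf weighting written with the standard library only. It is not Gensim's implementation: the function name `tfidf`, the base-2 logarithm and the unit-length normalization are assumptions chosen for the example, and other tf-idf variants exist.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count in document) * log2(N / document frequency)."""
    num_docs = len(documents)

    # Document frequency: in how many documents each term occurs at least once.
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        weights = {
            term: count * math.log2(num_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Scale to unit Euclidean length; terms with zero weight are dropped.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            vectors.append({})
        else:
            vectors.append({t: w / norm for t, w in weights.items() if w > 0})
    return vectors


docs = [
    ["the", "graph", "trees"],
    ["the", "graph", "minors"],
    ["the", "survey"],
]
vectors = tfidf(docs)
```

In this toy corpus "the" occurs in every document, so its weight is zero and it is dropped from every vector, while "trees", which occurs in only one document, outweighs "graph", which occurs in two.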

Gensim has been used in a number of commercial as well as academic applications.[3][4] The code is hosted on GitHub[5] and a support forum is maintained on Google Groups.[6]

Some of the online algorithms in Gensim were also published in Radim Řehůřek's 2011 PhD dissertation, Scalability of Semantic Analysis in Natural Language Processing.[7][8]

References

  1. Deep learning with word2vec and gensim
  2. Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks.
  3. Interview with Radim Řehůřek, creator of gensim
  4. gensim academic citations
  5. gensim source code
  6. gensim mailing list
  7. Radim Řehůřek (2011). Scalability of Semantic Analysis in Natural Language Processing (PhD dissertation).
  8. http://decisionstats.com/2015/12/07/decisionstats-interview-radim-rehurek-gensim-python/