RcppMeCab 0.0.1.2 has been released on CRAN. You can install the package with install.packages() on Windows, Mac OS X, and Linux (Solaris is not supported). In this version:

  • Several errors are fixed
  • Each function checks getOption("mecabSysDic") to find the user's preferred MeCab system dictionary
  • A result type option is added: the argument format="data.frame"
  • The input character vector is attached to the result list as its names attribute
  • A single character vector input in pos() now returns a list

Errors Fixed

Two errors in version 0.0.1.1 are fixed.

  • The loop version of the pos function is fixed (duplicated results)
  • sys_dic now works properly

options(mecabSysDic="")

You can save your preferred MeCab system dictionary by running options(mecabSysDic="(directory)") in the R console.
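A minimal sketch of setting and reading the option with base R. The directory below is a hypothetical placeholder, not a path shipped with the package; replace it with the location of the system dictionary on your machine.

```r
# Hypothetical dictionary location (an assumption for this sketch).
dic_dir <- "/usr/local/lib/mecab/dic/mecab-ko-dic"

# Store the preference for the current R session.
options(mecabSysDic = dic_dir)

# This is the lookup the package functions perform.
getOption("mecabSysDic")
```

A value set with options() lasts only for the current session. Placing the same call in your .Rprofile is the usual way to keep it across sessions.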

Result as a data frame

You can get the result as a data frame by passing the format parameter. For example,

> pos("안녕하세요.", format="data.frame")
  doc_id sentence_id token_id token   pos subtype
1      1           1        1  안녕   NNG        
2      1           1        2    하   XSV        
3      1           1        3  세요 EP+EF        
4      1           1        4     .    SF

# "안녕하세요." means hello.

If you pass a character vector that contains multiple strings, the doc_id column in the result is based on the names attribute of the vector.
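A small sketch of the multi-string case. The element names "greeting" and "thanks" and the second sentence are made up for illustration. The call is guarded so the snippet is skipped when RcppMeCab is not installed, and no output is shown because it depends on the installed dictionary.

```r
# Named character vector; the names are hypothetical labels for this sketch.
docs <- c(greeting = "안녕하세요.", thanks = "감사합니다.")

# Run the tagger only when RcppMeCab is installed.
if (requireNamespace("RcppMeCab", quietly = TRUE)) {
  res <- RcppMeCab::pos(docs, format = "data.frame")
  # Group the tokens by document through the doc_id column.
  print(split(res$token, res$doc_id))
}
```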

Minor changes

The function now attaches the input text to the result as its names attribute. For example,

> pos("안녕하세요.")
$안녕하세요.
[1] "안녕/NNG"   "하/XSV"     "세요/EP+EF" "./SF"      

As the example above shows, pos() returns a list, not a character vector, even when the input is a character vector of length one.
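Because the result is a list, you need one extraction step to reach the tagged tokens. The sketch below rebuilds the list from the printed output above instead of calling pos(), so it runs without MeCab. Splitting "morpheme/tag" with sub() is one possible approach, not something the package provides.

```r
# Result copied by hand from the printed example output.
res <- list("안녕하세요." = c("안녕/NNG", "하/XSV", "세요/EP+EF", "./SF"))

# The input text is stored as the element name.
names(res)

# [[1]] extracts the character vector of tagged tokens.
tokens <- res[[1]]

# Split each "morpheme/tag" pair at the last slash.
morpheme <- sub("/[^/]*$", "", tokens)
tag <- sub("^.*/", "", tokens)

data.frame(morpheme = morpheme, tag = tag, stringsAsFactors = FALSE)
```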