WEKO3
Item
Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering
http://hdl.handle.net/10069/16305
Name / File | License | Action |
---|---|---|
LNCS4938_13.pdf (237.8 kB) | | |
Item type | Academic Journal Article / Journal Article(1) | |||||
---|---|---|---|---|---|---|
Publication date | 2008-03-31 | |||||
Title | ||||||
Title | Comparing LDA with pLSI as a Dimensionality Reduction Method in Document Clustering | |||||
Language | ||||||
Language | eng | |||||
Resource type | ||||||
Resource type identifier | http://purl.org/coar/resource_type/c_6501 | |||||
Resource type | journal article | |||||
Authors | Masada, Tomonari; Kiyasu, Senya; Miyahara, Sueharu | |||||
Abstract | ||||||
Description type | Abstract | |||||
Description | In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering by using real-world document sets. For clustering of documents, we use a method based on multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that the dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment provides no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI at least for dimensionality reduction in document clustering. | |||||
Description | ||||||
Description type | Other | |||||
Description | The original publication is available at www.springerlink.com | |||||
Description | ||||||
Description type | Other | |||||
Description | Large-scale Knowledge Resources: Construction and Application - Third International Conference on Large-scale Knowledge Resources, LKR 2008, Tokyo, Japan, March 3-5, 2008, Proceedings | |||||
Bibliographic information | Lecture notes in computer science, vol. 4938, p. 13-26, issued 2008-03 | |||||
Publisher | ||||||
Publisher | Springer | |||||
ISSN | ||||||
Source identifier type | ISSN | |||||
Source identifier | 03029743 | |||||
EISSN | ||||||
Source identifier type | ISSN | |||||
Source identifier | 1611-3349 | |||||
ISBN | ||||||
Identifier type | ISBN | |||||
Related identifier | 978-3-540-78158-5 | |||||
Bibliographic record ID | ||||||
Source identifier type | NCID | |||||
Source identifier | AA0071599X | |||||
DOI | ||||||
Relation type | isVersionOf | |||||
Identifier type | DOI | |||||
Related identifier | 10.1007/978-3-540-78159-2_2 | |||||
Author version flag | ||||||
Version type | AM | |||||
Version type resource | http://purl.org/coar/version/c_ab4af688f83e57aa | |||||
Citation | ||||||
Description type | Other | |||||
Description | Lecture Notes in Computer Science, 4938, pp.13-26; 2008 |