Scalability Issue In Mining Large Data Sets
Author(s)
A. Mc Manus & M.-T. Kechadi
Abstract
The most distinct characteristic of data mining is that it deals with large data
sets. This requires the algorithms used in data mining to be highly scalable. However,
most algorithms currently used in data mining do not scale very well when
applied to very large data sets because they were initially developed and tested
upon smaller data sets. Today, we have such large data sets that these algorithms
are no longer efficient enough for mining and analysing. In this paper, we have
addressed a data clustering problem and developed a scalable algorithm, which
provides very high precision and recall values. This algorithm is used on a very
large real-world data-set, TDT2, and experimental results showed that this algorithm
performs well compared to traditional ones.
1 Introduction
Nowadays, people have great demand on knowledge and information, while
information overload becoming one serious problem. News media and publishing
industries therefore try to suit customers needs by using electronic information
management system. Document clustering algorithm has been introduced to
group similar documents together for easier searching and reading. Document
clustering algorithm has been widely used in news media and publishing industry,
which ensured it effectiveness over manual clustering.With labor cost reduced
and time saved, document clustering algorithms provides convenient clusterednews
for users.
Clustering is an important task that is performed as part of many text mining
and information retrieval systems. Clustering can be used for efficiently finding
the nearest neighbors of a document [1, 4], for improving the precision or recall
in information retrieval systems, for aid in browsing a collection of documents [2],
Keywords
Related Book
Other papers in this volume
Warning (2)
: foreach() argument must be of type array|object, null given [in
/var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/templates/Papers/view.php, line
364]
Code
$counter = '0';
foreach ($paper['book']['Paper'] as $otherPaper) {
if ((!empty($otherPaper['name'])) && ($counter < '7') && ($otherPaper['available'] == 1)) {
Cake\Error\ErrorTrap->handleError() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/templates/Papers/view.php, line 364
/var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/View/View.php /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/View/View.php, line 1188
Cake\View\View->_evaluate() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/View/View.php, line 1145
Cake\View\View->_render() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/View/View.php, line 785
Cake\View\View->render() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Controller/Controller.php, line 712
Cake\Controller\Controller->render() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Controller/Controller.php, line 516
Cake\Controller\Controller->invokeAction() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php, line 166
Cake\Controller\ControllerFactory->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php, line 141
Cake\Controller\ControllerFactory->invoke() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/BaseApplication.php, line 362
Cake\Http\BaseApplication->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 86
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Middleware/CsrfProtectionMiddleware.php, line 159
Cake\Http\Middleware\CsrfProtectionMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Middleware/BodyParserMiddleware.php, line 157
Cake\Http\Middleware\BodyParserMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Routing/Middleware/RoutingMiddleware.php, line 118
Cake\Routing\Middleware\RoutingMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Routing/Middleware/AssetMiddleware.php, line 69
Cake\Routing\Middleware\AssetMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Error/Middleware/ErrorHandlerMiddleware.php, line 115
Cake\Error\Middleware\ErrorHandlerMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/debug_kit/src/Middleware/DebugKitMiddleware.php, line 60
DebugKit\Middleware\DebugKitMiddleware->process() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 82
Cake\Http\Runner->handle() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Runner.php, line 60
Cake\Http\Runner->run() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/vendor/cakephp/cakephp/src/Http/Server.php, line 104
Cake\Http\Server->run() /var/www/dce7ae55-385b-4ffa-8595-3ec5e61ff110/public_html/app/webroot/index.php, line 37
[main]