- Name: boilerpipe
- Version: 1.2.0
- Release: 1.mga2
- Epoch:
- Group: Development/Java
- License: ASL 2.0
- Url: http://code.google.com/p/boilerpipe/
- Summary: Boilerplate Removal and Fulltext Extraction from HTML pages
- Architecture: noarch
- Size: 117729
- Distribution: Mageia
- Vendor: Mageia.Org
- Packager: Mageia Team <http://www.mageia.org>
Description:
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.
The library already provides specific strategies
for common tasks (for example, news article extraction) and
can easily be extended for individual problem settings.
Content extraction is very fast (milliseconds), requires only the
input document (no global or site-level information) and
is usually quite accurate.
- OptFlags: -O2 -g -pipe -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4
- Cookie: jonund 1320587292
- Buildhost: jonund