source: sandbox/expresso-solr/solr/contrib/extraction/CHANGES.txt @ 7588

Revision 7588, 3.5 KB checked in by adir, 11 years ago

Ticket #000 - Adding Solr search integration to the code base to be contributed to the community

Apache Solr Content Extraction Library (Solr Cell)
                            Release Notes

This file describes changes to the Solr Cell (contrib/extraction) module.  See SOLR-284 for details.

Introduction
------------

Apache Solr Extraction provides a means of extracting and indexing content contained in "rich" documents, such
as Microsoft Word, Adobe PDF, etc.  (Each name is a trademark of its respective owner.)  This contrib module
uses Apache Tika to extract content and metadata from the files, which can then be indexed.  For more information,
see http://wiki.apache.org/solr/ExtractingRequestHandler

Getting Started
---------------
You will need Solr up and running.  Then add the extraction JAR file, plus the Tika dependencies (in the ./lib folder),
to the lib directory of your Solr Home.  See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on
hooking it in and configuring it.
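
Once the handler is hooked in, a client sends documents to it over HTTP. The following is a minimal Python sketch of building such a request URL, not a definitive client. The host, port, and the /update/extract path are assumptions (the path is whatever the handler was registered under in solrconfig.xml), the literal.id and commit parameters are taken from the wiki page referenced above rather than from this file, and build_extract_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the URL for posting one rich document to the extraction handler.

    solr_base is assumed to look like "http://localhost:8983/solr".
    The "/update/extract" path is an assumption and must match the
    handler registration in solrconfig.xml.
    """
    # literal.<field> supplies a field value directly instead of extracting it.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url("http://localhost:8983/solr", "doc1")
print(url)
```

The file itself (for example a PDF) would then be sent as the body of a POST request to this URL.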

Tika Dependency
---------------

Current Version: Tika 1.1 (released 2012-03-23)

$Id$

================== Release 4.0.0-ALPHA ==============

* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)

================== Release 3.6.0 ==================

* SOLR-2346: Allow the content encoding to be set explicitly via the content type of the stream.
  This is convenient when Tika's auto detector cannot detect the encoding, especially
  when the text file is too short for detection. (koji)

* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)

* SOLR-3295: The netcdf jar is excluded from the binary release (and disabled in ivy.xml)
  because it requires Java 6. If you want to parse this content and are willing to
  use Java 6, just add the jar. (rmuir)
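
To illustrate the SOLR-2346 entry, here is a small Python sketch of naming the encoding in the stream's content type so that detection is not needed. The helper name content_type_with_charset is hypothetical, and Shift_JIS is only an example encoding; how the header is attached depends on the HTTP client in use.

```python
def content_type_with_charset(mime_type, charset):
    """Return a Content-Type value that names the encoding explicitly."""
    return "{}; charset={}".format(mime_type, charset)


# A very short Japanese text ("konnichiwa" in hiragana): too little data
# for an encoding detector to work with reliably.
text = "\u3053\u3093\u306b\u3061\u306f"
body = text.encode("shift_jis")

# Header to send along with the body when posting it as a stream.
headers = {"Content-Type": content_type_with_charset("text/plain", "Shift_JIS")}
print(headers["Content-Type"], len(body))
```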
42
43================== Release 3.5.0 ==================
44
45* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)
46
47================== Release 3.4.0 ==================
48
49* SOLR-2540: CommitWithin as an Update Request parameter
50  You can now specify &commitWithin=N (ms) on the update request (janhoy)
51
52* SOLR-2743: Remove commons logging. (koji)
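
The commitWithin parameter from SOLR-2540 is an ordinary query-string parameter on the update request. The Python sketch below appends it to an existing update URL, replacing any earlier value. The example URL and the helper name with_commit_within are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_commit_within(update_url, millis):
    """Return update_url with commitWithin=<millis> added or replaced."""
    if millis < 0:
        raise ValueError("commitWithin must be a non-negative number of milliseconds")
    parts = urlsplit(update_url)
    # Keep every existing parameter except a previous commitWithin.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "commitWithin"
    ]
    query.append(("commitWithin", str(millis)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Ask Solr to commit the added document within 10 seconds.
url = with_commit_within(
    "http://localhost:8983/solr/update/extract?literal.id=doc1", 10000
)
print(url)
```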

================== Release 3.3.0 ==================

(No Changes)

================== Release 3.2.0 ==================

* SOLR-2480: Add ignoreTikaException flag so that users can ignore a TikaException but still index
  the metadata. (Shinichiro Abe, koji)

================== Release 3.1.0 ==================

* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call

* SOLR-1756: The date.format setting causes a ClassCastException when enabled, and the config code that
  parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller)

* SOLR-1813: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers)

* SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers)

================== Release 1.4.0 ==================

1. SOLR-284:  Added support for extraction. (Eric Pugh, Chris Harris, gsingers)

2. SOLR-284: Removed "silent success" key generation (gsingers)

3. SOLR-1075: Upgrade to Tika 0.3.  See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)

4. SOLR-1128: Added metadata output to "extract only" option.  (gsingers)

5. SOLR-1310: Upgrade to Tika 0.4. Note there are now some differences in language detection.
    See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
    for discussion on language detection.
    See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)

6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers)