1 | Apache Solr Content Extraction Library (Solr Cell) |
---|
2 | Release Notes |
---|
3 | |
---|
4 | This file describes changes to the Solr Cell (contrib/extraction) module. See SOLR-284 for details. |
---|
5 | |
---|
6 | Introduction |
---|
7 | ------------ |
---|
8 | |
---|
9 | Apache Solr Extraction provides a means for extracting and indexing content contained in "rich" documents, such |
---|
10 | as Microsoft Word, Adobe PDF, etc. (Each name is a trademark of its respective owner.) This contrib module |
---|
11 | uses Apache Tika to extract content and metadata from the files, which can then be indexed. For more information, |
---|
12 | see http://wiki.apache.org/solr/ExtractingRequestHandler |
---|
13 | |
---|
14 | Getting Started |
---|
15 | --------------- |
---|
16 | You will need Solr up and running. Then, simply add the extraction JAR file, plus the Tika dependencies (in the ./lib folder) |
---|
17 | to your Solr Home lib directory. See http://wiki.apache.org/solr/ExtractingRequestHandler for more details on hooking it in |
---|
18 | and configuring it. |
---|
19 | |
---|
20 | Tika Dependency |
---|
21 | --------------- |
---|
22 | |
---|
23 | Current Version: Tika 1.1 (released 2012-03-23) |
---|
24 | |
---|
25 | $Id$ |
---|
26 | |
---|
27 | ================== Release 4.0.0-ALPHA ================== |
---|
28 | |
---|
29 | * SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy) |
---|
30 | |
---|
31 | ================== Release 3.6.0 ================== |
---|
32 | |
---|
33 | * SOLR-2346: Allow the content encoding to be set explicitly via the content type of the stream. |
---|
34 | This is convenient when Tika's auto detector cannot detect the encoding, especially when |
---|
35 | the text file is too short for detection. (koji) |
---|
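A minimal sketch of what SOLR-2346 enables, assuming the stream's content type arrives as an HTTP Content-Type header. The helper name and the Shift_JIS charset are illustrative only and are not part of Solr Cell.

```python
def content_type_with_charset(mime_type: str, charset: str) -> str:
    """Return a Content-Type value that names the encoding explicitly."""
    return f"{mime_type}; charset={charset}"


# Hypothetical example: a short Japanese text file whose encoding is
# too hard for auto-detection, so the client states it up front.
headers = {"Content-Type": content_type_with_charset("text/plain", "Shift_JIS")}
print(headers["Content-Type"])
```

The resulting header would be sent along with the file body when posting to the extraction handler.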
36 | |
---|
37 | * SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy) |
---|
38 | |
---|
39 | * SOLR-3295: netcdf jar is excluded from the binary release (and disabled in ivy.xml) |
---|
40 | because it requires Java 6. If you want to parse this content and are willing to |
---|
41 | use Java 6, just add the jar. (rmuir) |
---|
42 | |
---|
43 | ================== Release 3.5.0 ================== |
---|
44 | |
---|
45 | * SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy) |
---|
46 | |
---|
47 | ================== Release 3.4.0 ================== |
---|
48 | |
---|
49 | * SOLR-2540: CommitWithin as an Update Request parameter. |
---|
50 | You can now specify &commitWithin=N (N in milliseconds) on the update request. (janhoy) |
---|
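As a rough illustration of the commitWithin parameter, the sketch below builds an update request URL. The base URL (host, port, and the /update/extract handler path) and the document id are assumptions, since the handler path depends on how it is registered in solrconfig.xml.

```python
from urllib.parse import urlencode


def extract_url(base_url: str, doc_id: str, commit_within_ms: int) -> str:
    """Build an extraction request URL that asks Solr to commit within N ms."""
    params = {"literal.id": doc_id, "commitWithin": commit_within_ms}
    return f"{base_url}?{urlencode(params)}"


# Assumed handler location; adjust to match your own configuration.
url = extract_url("http://localhost:8983/solr/update/extract", "doc1", 10000)
print(url)
```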
51 | |
---|
52 | * SOLR-2743: Remove commons logging. (koji) |
---|
53 | |
---|
54 | ================== Release 3.3.0 ================== |
---|
55 | |
---|
56 | (No Changes) |
---|
57 | |
---|
58 | ================== Release 3.2.0 ================== |
---|
59 | |
---|
60 | * SOLR-2480: Add ignoreTikaException flag so that users can ignore a TikaException but still index |
---|
61 | metadata. (Shinichiro Abe, koji) |
---|
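The flag can be pictured as one more request parameter. This sketch only assembles a query string, and it assumes the flag is passed per request with the value "true"; the document id is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: ignoreTikaException is supplied as a request parameter.
params = {"literal.id": "doc1", "ignoreTikaException": "true"}
query = urlencode(params)
print(query)
```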
62 | |
---|
63 | ================== Release 3.1.0 ================== |
---|
64 | |
---|
65 | * SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call |
---|
66 | |
---|
67 | * SOLR-1756: The date.format setting causes a ClassCastException when enabled, and the config code that |
---|
68 | parses this setting does not properly use the same iterator instance. (Christoph Brill, Mark Miller) |
---|
69 | |
---|
70 | * SOLR-1813: Add ICU4j to libs and add tests for Arabic extraction (Robert Muir via gsingers) |
---|
71 | |
---|
72 | * SOLR-1902: Upgraded to Tika 0.8-SNAPSHOT to incorporate passing in Solr's custom ClassLoader (gsingers) |
---|
73 | |
---|
74 | ================== Release 1.4.0 ================== |
---|
75 | |
---|
76 | 1. SOLR-284: Added in support for extraction. (Eric Pugh, Chris Harris, gsingers) |
---|
77 | |
---|
78 | 2. SOLR-284: Removed "silent success" key generation (gsingers) |
---|
79 | |
---|
80 | 3. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers) |
---|
81 | |
---|
82 | 4. SOLR-1128: Added metadata output to "extract only" option. (gsingers) |
---|
83 | |
---|
84 | 5. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in detecting languages now. |
---|
85 | See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c |
---|
86 | for discussion on language detection. |
---|
87 | See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers) |
---|
88 | |
---|
89 | 6. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers) |
---|
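A small sketch of how the "extract only" option and its text serialization might be requested. It assumes the output format is selected with an extractFormat parameter taking "text" or "xml"; the helper name is made up for illustration.

```python
from urllib.parse import urlencode


def extract_only_query(output_format: str = "text") -> str:
    """Build query parameters that return extracted content without indexing it."""
    if output_format not in ("text", "xml"):
        raise ValueError(f"unsupported format: {output_format}")
    return urlencode({"extractOnly": "true", "extractFormat": output_format})


print(extract_only_query())
```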