source: sandbox/expresso-solr/solr/contrib/dataimporthandler/CHANGES.txt @ 7588

Revision 7588, 23.6 KB checked in by adir, 11 years ago (diff)

Ticket #000 - Adding Solr search integration to the base to be inserted into the community

                    Apache Solr - DataImportHandler
                            Release Notes

Introduction
------------
DataImportHandler is a data import tool for Solr which makes importing data from Databases, XML files and
HTTP data sources quick and easy.


$Id: CHANGES.txt 1350278 2012-06-14 14:52:22Z jdyer $
==================  4.0.0-ALPHA ==============
Bug Fixes
----------------------
* SOLR-3430: Added a new test against a real SQL database.  Fixed problems revealed by this new test
             related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer)

* SOLR-1958: When using the MailEntityProcessor, import would fail if fetchMailsSince was not specified.
             (Max Lynch via James Dyer)

Other Changes
----------------------
* SOLR-3262: The "threads" feature is removed (deprecated in Solr 3.6) (James Dyer)

* SOLR-3422: Refactored internal data classes.
             All entities in data-config.xml must have a name (James Dyer)

==================  3.6.1 ==================

Bug Fixes
----------------------
* SOLR-3336: SolrEntityProcessor substitutes most variables at query time
             (Michael Kroh, Lance Norskog, via Martijn van Groningen)

==================  3.6.0 ==================

New Features
----------------------
* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core or instance based on a specified query.
             (Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna, Martijn van Groningen)
             Additional Work:
             SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency between Solr parameters
             and parameters used in SolrEntityProcessor and ability to specify a custom HttpClient instance.
             (Luca Cavanna via Martijn van Groningen)
* SOLR-2382: Added pluggable cache support so that any Entity can be made cache-able by adding the "cacheImpl" parameter.
             Include "SortedMapBackedCache" to provide in-memory caching (as previously this was the only option when
             using CachedSqlEntityProcessor).  Users can provide their own implementations of DIHCache for other
             caching strategies.  Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with
             SqlEntityProcessor.  Make SolrWriter implement DIHWriter and allow the possibility of pluggable Writers
             (DIH writing to something other than Solr).  (James Dyer, Noble Paul)

Changes in Runtime Behavior
----------------------
* SOLR-3142: Imports now default optimize to false instead of true. If you want to force all segments to be merged
             into one, you can specify this parameter yourself. NOTE: this can be a very expensive operation and usually
             does not make sense for delta-imports.  (Robert Muir)

==================  3.5.0 ==================

Bug Fixes
----------------------
* SOLR-2875: Fix the incorrect URL in tika-data-config.xml (Shinichiro Abe via koji)

==================  3.4.0 ==================

Bug Fixes
----------------------
* SOLR-2644: When using threads=2 the default logging is set too high (Bill Bell via shalin)
* SOLR-2492: DIH does not commit if only deletes are processed (James Dyer via shalin)
* SOLR-2186: DataImportHandler's multi-threaded option throws NPE (Lance Norskog, Frank Wesemann, shalin)
* SOLR-2655: DIH multi threaded mode does not resolve attributes correctly (Frank Wesemann, shalin)
* SOLR-2695: Documents are collected in unsynchronized list in multi-threaded debug mode (Michael McCandless, shalin)
* SOLR-2668: DIH multithreaded mode does not rollback on errors from EntityProcessor (Frank Wesemann, shalin)

==================  3.3.0 ==================

* SOLR-2551: Check dataimport.properties for write access (if delta-import is supported
  in DIH configuration) before starting an import (C S, shalin)

==================  3.2.0 ==================

(No Changes)

==================  3.1.0 ==================
Upgrading from Solr 1.4
----------------------

Versions of Major Components
---------------------

Detailed Change List
----------------------

New Features
----------------------

* SOLR-1525 : allow DIH to refer to core properties (noble)

* SOLR-1547 : TemplateTransformer copies objects more intelligently when the template is a single variable (noble)

* SOLR-1627 : VariableResolver should be fetched just in time (noble)

* SOLR-1583 : Create DataSources that return InputStream (noble)

* SOLR-1358 : Integration of Tika and DataImportHandler (Akshay Ukey, noble)

* SOLR-1654 : TikaEntityProcessor example added to DIHExample (Akshay Ukey via noble)

* SOLR-1678 : Move onError handling to DIH framework (noble)

* SOLR-1352 : Multi-threaded implementation of DIH (noble)

* SOLR-1721 : Add explicit option to run DataImportHandler in synchronous mode (Alexey Serba via noble)

* SOLR-1737 : Added FieldStreamDataSource (noble)

Optimizations
----------------------

* SOLR-2200: Improve the performance of DataImportHandler for large delta-import
  updates. (Mark Waddle via rmuir)

Bug Fixes
----------------------
* SOLR-1638: Fixed NullPointerException during import if uniqueKey is not specified
  in schema (Akshay Ukey via shalin)

* SOLR-1639: Fixed misleading error message when dataimport.properties is not writable (shalin)

* SOLR-1598: Reader used in PlainTextEntityProcessor is not explicitly closed (Sascha Szott via noble)

* SOLR-1759: $skipDoc was not working correctly (Gian Marco Tagliani via noble)

* SOLR-1762: DateFormatTransformer does not work correctly with non-default locale dates (tommy chheng via noble)

* SOLR-1757: DIH multithreading sometimes throws NPE (noble)

* SOLR-1766: DIH with threads enabled doesn't respond to the abort command (Michael Henson via noble)

* SOLR-1767: dataimporter.functions.escapeSql() does not escape backslash character (Sean Timm via noble)

* SOLR-1811: formatDate should use the current NOW value always (Sean Timm via noble)

* SOLR-1794: Dataimport of CLOB fields fails when getCharacterStream() is
  defined in a superclass. (Gunnar Gauslaa Bergem via rmuir)

* SOLR-2057: DataImportHandler never calls UpdateRequestProcessor.finish()
  (Drew Farris via koji)

* SOLR-1973: Empty fields in XML update messages confuse DataImportHandler. (koji)

* SOLR-2221: Use StrUtils.parseBool() to get values of boolean options in DIH.
  true/on/yes (for TRUE) and false/off/no (for FALSE) can be used for sub-options
  (debug, verbose, synchronous, commit, clean, optimize) for full/delta-import commands. (koji)

* SOLR-2310: getTimeElapsedSince() returns incorrect hour value when the elapsed time is over 60 hours
  (tom liu via koji)

* SOLR-2252: When a child entity in nested entities is rootEntity="true", delta-import doesn't work.
  (koji)

* SOLR-2330: solrconfig.xml files in example-DIH are broken. (Matt Parker, koji)

* SOLR-1191: resolve DataImportHandler deltaQuery column against pk when pk
  has a prefix (e.g. pk="book.id" deltaQuery="select id from ..."). More
  useful error reporting when no match found (previously failed with a
  NullPointerException in log and no clear user feedback). (gthb via yonik)

* SOLR-2116: Fix TikaConfig classloader bug in TikaEntityProcessor
  (Martijn van Groningen via hossman)
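
The boolean spellings listed under SOLR-2221 above can be illustrated with a small Java sketch.
This is not the StrUtils.parseBool() implementation. The class name LenientBoolSketch, the
case-insensitive matching and the exception on unknown input are assumptions made only for
illustration.

```java
import java.util.Locale;

final class LenientBoolSketch {
    // Maps true/on/yes to true and false/off/no to false.
    // Assumption: matching ignores case and surrounding whitespace.
    static boolean parseBool(String value) {
        if (value == null) {
            throw new IllegalArgumentException("null boolean value");
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "on":
            case "yes":
                return true;
            case "false":
            case "off":
            case "no":
                return false;
            default:
                throw new IllegalArgumentException("invalid boolean value: " + value);
        }
    }

    public static void main(String[] args) {
        // e.g. command=full-import&clean=off&commit=yes
        System.out.println(parseBool("off"));
        System.out.println(parseBool("yes"));
    }
}
```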


Other Changes
----------------------

* SOLR-1821: Fix TimeZone-dependent test failure in TestEvaluatorBag.
  (Chris Male via rmuir)

* SOLR-2367: Reduced noise in test output by ensuring the properties file can be written.
  (Gunnlaugur Thor Briem via rmuir)


Build
----------------------


Documentation
----------------------

================== Release 1.4.0 ==================

Upgrading from Solr 1.3
-----------------------

The Evaluator API has been changed in a non-backward-compatible way. Users who have developed custom Evaluators will
need to change their code according to the new API for it to work. See SOLR-996 for details.

The formatDate evaluator's syntax has been changed. The new syntax is formatDate(<variable>, '<format_string>').
For example, formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the format string was written without single quotes.
The old syntax has been deprecated and will be removed in 1.5; until then, using the old syntax will log a warning.
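
As a rough illustration of what a format string such as 'yyyy-MM-dd' produces, here is a minimal Java
sketch. It assumes the format string follows java.text.SimpleDateFormat pattern conventions. The class
FormatDateSketch and its format method are hypothetical helpers, not part of DataImportHandler, and the
UTC time zone is pinned only to make the output reproducible.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class FormatDateSketch {
    // Formats a date with a SimpleDateFormat pattern.
    // Assumption: UTC and Locale.ROOT are used so the result does not depend on the host.
    static String format(Date date, String pattern) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        // Same pattern as in formatDate(x.date, 'yyyy-MM-dd')
        System.out.println(format(new Date(0L), "yyyy-MM-dd"));
    }
}
```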

The Context API has been changed in a non-backward-compatible way. In particular, the Context.currentProcess() method
now returns a String describing the type of the current import process instead of an int. Similarly, the public
constants in Context viz. FULL_DUMP, DELTA_DUMP and FIND_DELTA are changed to a String type. See SOLR-969 for details.

The EntityProcessor API has been simplified by moving logic for applying transformers and handling multi-row outputs
from Transformers into an EntityProcessorWrapper class. The EntityProcessor#destroy is now called once per
parent-row at the end of row (end of data). A new method EntityProcessor#close is added which is called at the end
of import.

In Solr 1.3, if the last_index_time was not available (first import) and a delta-import was requested, a full-import
was run instead. This is no longer the case. In Solr 1.4 delta import is run with last_index_time as the epoch
date (January 1, 1970, 00:00:00 GMT) if last_index_time is not available.
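
The epoch fallback described above can be sketched in Java as follows. This is only an illustration of
the rule, not the DataImportHandler code. LastIndexTimeSketch and resolve are hypothetical names, and
the use of java.time.Instant is an assumption.

```java
import java.time.Instant;

final class LastIndexTimeSketch {
    // Returns the recorded last_index_time, or the epoch
    // (January 1, 1970, 00:00:00 GMT) when no earlier import recorded one.
    static Instant resolve(Instant recorded) {
        return recorded != null ? recorded : Instant.EPOCH;
    }

    public static void main(String[] args) {
        // First delta-import: nothing recorded yet, so every row counts as changed.
        System.out.println(resolve(null));
    }
}
```

With the epoch as the lower bound, a delta query that compares a modification column against
last_index_time would typically match every row on the first run.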

Detailed Change List
----------------------

New Features
----------------------
1. SOLR-768:  Set last_index_time variable in full-import command.
              (Wojtek Piaseczny, Noble Paul via shalin)

2. SOLR-811:  Allow a "deltaImportQuery" attribute in SqlEntityProcessor which is used for delta imports
              instead of DataImportHandler manipulating the SQL itself.
              (Noble Paul via shalin)

3. SOLR-842:  Better error handling in DataImportHandler with options to abort, skip and continue imports.
              (Noble Paul, shalin)

4. SOLR-833:  A DataSource to read data from a field as a reader. This can be used, for example, to read XMLs
              residing as CLOBs or BLOBs in databases.
              (Noble Paul via shalin)

5. SOLR-887:  A Transformer to strip HTML tags.
              (Ahmed Hammad via shalin)

6. SOLR-886:  DataImportHandler should rollback when an import fails or it is aborted
              (shalin)

7. SOLR-891:  A Transformer to read strings from Clob type.
              (Noble Paul via shalin)

8. SOLR-812:  Configurable JDBC settings in JdbcDataSource including optimized defaults for read only mode.
              (David Smiley, Glen Newton, shalin)

9. SOLR-910:  Add a few utility commands to the DIH admin page such as full import, delta import, status, reload config.
              (Ahmed Hammad via shalin)

10.SOLR-938:  Add event listener API for import start and end.
              (Kay Kay, Noble Paul via shalin)

11.SOLR-801:  Add support for configurable pre-import and post-import delete query per root-entity.
              (Noble Paul via shalin)

12.SOLR-988:  Add a new scope for session data stored in Context to store objects across imports.
              (Noble Paul via shalin)

13.SOLR-980:  A PlainTextEntityProcessor which can read from any DataSource<Reader> and output a String.
              (Nathan Adams, Noble Paul via shalin)

14.SOLR-1003: XPathEntityProcessor must allow slurping all text from a given xml node and its children.
              (Noble Paul via shalin)

15.SOLR-1001: Allow variables in various attributes of RegexTransformer, HTMLStripTransformer
              and NumberFormatTransformer.
              (Fergus McMenemie, Noble Paul, shalin)

16.SOLR-989:  Expose running statistics from the Context API.
              (Noble Paul, shalin)

17.SOLR-996:  Expose Context to Evaluators.
              (Noble Paul, shalin)

18.SOLR-783:  Enhance delta-imports by maintaining separate last_index_time for each entity.
              (Jon Baer, Noble Paul via shalin)

19.SOLR-1033: Current entity's namespace is made available to all Transformers. This allows one to use an output field
              of TemplateTransformer in other transformers, among other things.
              (Fergus McMenemie, Noble Paul via shalin)

20.SOLR-1066: New methods in Context to expose Script details. ScriptTransformer changed to read scripts
              through the new API methods.
              (Noble Paul via shalin)

21.SOLR-1062: A LogTransformer which can log data in a given template format.
              (Jon Baer, Noble Paul via shalin)

22.SOLR-1065: A ContentStreamDataSource which can accept HTTP POST data in a content stream. This can be used to
              push data to Solr instead of just pulling it from DB/Files/URLs.
              (Noble Paul via shalin)

23.SOLR-1061: Improve RegexTransformer to create multiple columns from regex groups.
              (Noble Paul via shalin)

24.SOLR-1059: Special flags introduced for deleting documents by query or id, skipping rows and stopping further
              transforms. Use $deleteDocById, $deleteDocByQuery for deleting by id and query respectively.
              Use $skipRow to skip the current row but continue with the document. Use $stopTransform to stop
              further transformers. New methods are introduced in Context for deleting by id and query.
              (Noble Paul, Fergus McMenemie, shalin)

25.SOLR-1076: JdbcDataSource should resolve variables in all its configuration parameters.
              (shalin)

26.SOLR-1055: Make DIH JdbcDataSource easily extensible by making the createConnectionFactory method protected and
              return a Callable<Connection> object.
              (Noble Paul, shalin)

27.SOLR-1058: JdbcDataSource can lookup javax.sql.DataSource using JNDI. Use a jndiName attribute to specify the
              location of the data source.
              (Jason Shepherd, Noble Paul via shalin)

28.SOLR-1083: An Evaluator for escaping query characters.
              (Noble Paul, shalin)

29.SOLR-934:  A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a Solr index.
              (Preetam Rao, shalin)

30.SOLR-1060: A LineEntityProcessor which can stream lines of text from a given file to be indexed directly or
              for processing with transformers and child entities.
              (Fergus McMenemie, Noble Paul, shalin)

31.SOLR-1127: Add support for field name to be templatized.
              (Noble Paul, shalin)

32.SOLR-1092: Added a new command named 'import' which does not automatically clean the index. This is useful and
              more appropriate when one needs to import only some of the entities.
              (Noble Paul via shalin)

33.SOLR-1153: 'deltaImportQuery' is honored on child entities as well (noble)

34.SOLR-1230: Enhanced dataimport.jsp to work with all DataImportHandler request handler configurations,
              rather than just a hardcoded /dataimport handler. (ehatcher)

35.SOLR-1235: disallow period (.) in entity names (noble)

36.SOLR-1234: Multiple DIH does not work because all of them write to dataimport.properties.
              Use the handler name as the properties file name (noble)

37.SOLR-1348: Support binary field type in convertType logic in JdbcDataSource (shalin)

38.SOLR-1406: Make FileDataSource and FileListEntityProcessor more extensible (Luke Forehand, shalin)

39.SOLR-1437 : XPathEntityProcessor can deal with xpath syntaxes such as //tagname , /root//tagname (Fergus McMenemie via noble)
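
The special flags from item 24 (SOLR-1059) above are set as keys in the row map that a transformer
returns. The following Java sketch assumes the custom-transformer convention of a class exposing a
public transformRow(Map<String, Object>) method. The class name SpecialFlagsSketch, the "id", "deleted"
and "title" columns and the flag value "true" are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class SpecialFlagsSketch {
    // Hypothetical rule: rows marked as deleted request a delete-by-id,
    // and rows without a title are skipped.
    public Object transformRow(Map<String, Object> row) {
        if (Boolean.TRUE.equals(row.get("deleted"))) {
            // Ask DIH to delete the document whose unique key equals this id.
            row.put("$deleteDocById", row.get("id"));
        } else if (row.get("title") == null) {
            // Skip this row but let the rest of the document continue.
            row.put("$skipRow", "true");
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "42");
        row.put("deleted", Boolean.TRUE);
        System.out.println(new SpecialFlagsSketch().transformRow(row));
    }
}
```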

Optimizations
----------------------
1. SOLR-846:  Reduce memory consumption during delta import by removing keys when used
              (Ricky Leung, Noble Paul via shalin)

2. SOLR-974:  DataImportHandler skips commit if no data has been updated.
              (Wojtek Piaseczny, shalin)

3. SOLR-1004: Check for abort more frequently during delta-imports.
              (Marc Sturlese, shalin)

4. SOLR-1098: DateFormatTransformer can cache the format objects.
              (Noble Paul via shalin)

5. SOLR-1465: Replaced string concatenations with StringBuilder append calls in XPathRecordReader.
              (Mark Miller, shalin)


Bug Fixes
----------------------
1. SOLR-800:  Deep copy collections to avoid ConcurrentModificationException in XPathEntityProcessor while streaming
              (Kyle Morrison, Noble Paul via shalin)

2. SOLR-823:  Request parameter variables ${dataimporter.request.xxx} are not resolved
              (Mck SembWever, Noble Paul, shalin)

3. SOLR-728:  Add synchronization to avoid race condition of multiple imports working concurrently
              (Walter Ferrara, shalin)

4. SOLR-742:  Add ability to create dynamic fields with custom DataImportHandler transformers
              (Wojtek Piaseczny, Noble Paul, shalin)

5. SOLR-832:  Rows parameter is not honored in non-debug mode and can abort a running import in debug mode.
              (Akshay Ukey, shalin)

6. SOLR-838:  The VariableResolver obtained from a DataSource's context does not have current data.
              (Noble Paul via shalin)

7. SOLR-864:  DataImportHandler does not catch and log Errors (shalin)

8. SOLR-873:  Fix case-sensitive field names and columns (Jon Baer, shalin)

9. SOLR-893:  Unable to delete documents via SQL and deletedPkQuery with deltaimport
              (Dan Rosher via shalin)

10. SOLR-888: DateFormatTransformer cannot convert non-string type
              (Amit Nithian via shalin)

11. SOLR-841: DataImportHandler should throw exception if a field does not have column attribute
              (Michael Henson, shalin)

12. SOLR-884: CachedSqlEntityProcessor should check if the cache key is present in the query results
              (Noble Paul via shalin)

13. SOLR-985: Fix thread-safety issue with TemplateString for concurrent imports with multiple cores.
              (Ryuuichi Kumai via shalin)

14. SOLR-999: XPathRecordReader fails on XMLs with nodes mixed with CDATA content.
              (Fergus McMenemie, Noble Paul via shalin)

15.SOLR-1000: FileListEntityProcessor should not apply fileName filter to directory names.
              (Fergus McMenemie via shalin)

16.SOLR-1009: Repeated column names result in duplicate values.
              (Fergus McMenemie, Noble Paul via shalin)

17.SOLR-1017: Fix thread-safety issue with last_index_time for concurrent imports in multiple cores due to unsafe usage
              of SimpleDateFormat by multiple threads.
              (Ryuuichi Kumai via shalin)

18.SOLR-1024: Calling abort on DataImportHandler import commits data instead of calling rollback.
              (shalin)

19.SOLR-1037: DIH should not add null values in a row returned by EntityProcessor to documents.
              (shalin)

20.SOLR-1040: XPathEntityProcessor fails with an xpath like /feed/entry/link[@type='text/html']/@href
              (Noble Paul via shalin)

21.SOLR-1042: Fix memory leak in DIH by making TemplateString non-static member in VariableResolverImpl
              (Ryuuichi Kumai via shalin)

22.SOLR-1053: IndexOutOfBoundsException in SolrWriter.getResourceAsString when size of data-config.xml is a
              multiple of 1024 bytes.
              (Herb Jiang via shalin)

23.SOLR-1077: IndexOutOfBoundsException with useSolrAddSchema in XPathEntityProcessor.
              (Sam Keen, Noble Paul via shalin)

24.SOLR-1080: RegexTransformer should not replace if regex is not matched.
              (Noble Paul, Fergus McMenemie via shalin)

25.SOLR-1090: DataImportHandler should load the data-config.xml using UTF-8 encoding.
              (Rui Pereira, shalin)

26.SOLR-1146: ConcurrentModificationException in DataImporter.getStatusMessages
              (Walter Ferrara, Noble Paul via shalin)

27.SOLR-1229: Fixes for deletedPkQuery, particularly when using transformed Solr unique id's
              (Lance Norskog, Noble Paul via ehatcher)

28.SOLR-1286: Fix the commit parameter always defaulting to "true" even if "false" is explicitly passed in.
              (Jay Hill, Noble Paul via ehatcher)

29.SOLR-1323: Reset XPathEntityProcessor's $hasMore/$nextUrl when fetching next URL (noble, ehatcher)

30.SOLR-1450: Jdbc connection properties such as batchSize are not applied if the driver jar is placed
              in solr_home/lib.
              (Steve Sun via shalin)

31.SOLR-1474: Delta-import should run even if last_index_time is not set.
              (shalin)


Documentation
----------------------
1. SOLR-1369: Add HSQLDB Jar to example-DIH, unzip database and update instructions.

Other
----------------------
1. SOLR-782:  Refactored SolrWriter to make it a concrete class and removed wrappers over SolrInputDocument.
              Refactored to load Evaluators lazily. Removed multiple document nodes in the configuration xml.
              Removed support for 'default' variables, they are automatically available as request parameters.
              (Noble Paul via shalin)

2. SOLR-964:  XPathEntityProcessor now ignores DTD validations
              (Fergus McMenemie, Noble Paul via shalin)

3. SOLR-1029: Standardize Evaluator parameter parsing and added helper functions for parsing all evaluator
              parameters in a standard way.
              (Noble Paul, shalin)

4. SOLR-1081: Change EventListener to be an interface so that components such as an EntityProcessor or a Transformer
              can act as an event listener.
              (Noble Paul, shalin)

5. SOLR-1027: Alias the 'dataimporter' namespace to a shorter name 'dih'.
              (Noble Paul via shalin)

6. SOLR-1084: Better error reporting when entity name is a reserved word and data-config.xml root node
              is not <dataConfig>.
              (Noble Paul via shalin)

7. SOLR-1087: Deprecate 'where' attribute in CachedSqlEntityProcessor in favor of cacheKey and cacheLookup.
              (Noble Paul via shalin)

8. SOLR-969:  Change the FULL_DUMP, DELTA_DUMP, FIND_DELTA constants in Context to String.
              Change Context.currentProcess() to return a string instead of an integer.
              (Kay Kay, Noble Paul, shalin)

9. SOLR-1120: Simplified EntityProcessor API by moving logic for applying transformers and handling multi-row outputs
              from Transformers into an EntityProcessorWrapper class. The behavior of the method
              EntityProcessor#destroy has been modified to be called once per parent-row at the end of row. A new
              method EntityProcessor#close is added which is called at the end of import. A new method
              Context#getResolvedEntityAttribute is added which returns the resolved value of an entity's attribute.
              Introduced a DocWrapper which takes care of maintaining document level session variables.
              (Noble Paul, shalin)

10.SOLR-1265: Add variable resolving for URLDataSource properties like baseUrl.  (Chris Eldredge via ehatcher)

11.SOLR-1269: Better error messages from JdbcDataSource when JDBC Driver name or SQL is incorrect.
              (ehatcher, shalin)

================== Release 1.3.0 ==================

Status
------
This is the first release since DataImportHandler was added to the contrib Solr distribution.
The following list covers changes since the code was introduced, not since
the first official release.


Detailed Change List
--------------------

New Features
1. SOLR-700:  Allow configurable locales through a locale attribute in fields for NumberFormatTransformer.
              (Stefan Oestreicher, shalin)

Changes in runtime behavior

Bug Fixes
1. SOLR-704:  NumberFormatTransformer can silently ignore part of the string while parsing. Now it tries to
              use the complete string for parsing. Failure to do so will result in an exception.
              (Stefan Oestreicher via shalin)

2. SOLR-729:  Context.getDataSource(String) gives current entity's DataSource instance regardless of argument.
              (Noble Paul, shalin)

3. SOLR-726:  Jdbc Drivers and DataSources fail to load if placed in multicore sharedLib or core's lib directory.
              (Walter Ferrara, Noble Paul, shalin)

Other Changes
