source: sandbox/expresso-solr/solr/docs/api/doc-files/tutorial.html @ 7588

Revision 7588, 29.9 KB checked in by adir, 11 years ago (diff)

Ticket #000 - Adding Solr search integration to the base to be inserted into the community

1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2<!--
3 Licensed to the Apache Software Foundation (ASF) under one or more
4 contributor license agreements.  See the NOTICE file distributed with
5 this work for additional information regarding copyright ownership.
6 The ASF licenses this file to You under the Apache License, Version 2.0
7 (the "License"); you may not use this file except in compliance with
8 the License.  You may obtain a copy of the License at
9
10     http://www.apache.org/licenses/LICENSE-2.0
11
12 Unless required by applicable law or agreed to in writing, software
13 distributed under the License is distributed on an "AS IS" BASIS,
14 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 See the License for the specific language governing permissions and
16 limitations under the License.
17-->
18<html>
19<head>
20<META http-equiv="Content-Type" content="text/html; charset=UTF-8" />
21<title>Solr Tutorial</title>
22<style>
23pre.code {
24  background-color: #D3D3D3;
25  padding: 0.2em;
26}
27.codefrag {
28  font-family: monospace;
29  font-weight:bold;
30}
31</style>
32
33</head>
34<body>
35
36<div id="content">
37<h1>Solr Tutorial</h1>
38
39<a name="N1000E"></a><a name="Overview"></a>
40<h2 class="boxed">Overview</h2>
41<div class="section">
42<p>
43This document covers the basics of running Solr using an example
44schema, and some sample data.
45</p>
46</div>
47
48
49<a name="N10018"></a><a name="Requirements"></a>
50<h2 class="boxed">Requirements</h2>
51<div class="section">
52<p>
53To follow along with this tutorial, you will need...
54</p>
55<ol>
56 
<li>Java 1.6 or greater.  You can get it from
  <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">Oracle</a>,
  <a href="http://openjdk.java.net/">OpenJDK</a>, or
  <a href="http://www.ibm.com/developerworks/java/jdk/">IBM</a>.
  <ul>
    <li>Running <span class="codefrag">java -version</span> at the command
      line should report a version number of 1.6 or higher.
    </li>
    <li>GNU's GCJ is not supported and does not work with Solr.</li>
66  </ul>
67</li>
68 
69<li>A <a href="http://lucene.apache.org/solr/mirrors-solr-latest-redir.html">Solr release</a>.
70  </li>
71
72</ol>
73</div>
74
75
76<a name="N10040"></a><a name="Getting+Started"></a>
77<h2 class="boxed">Getting Started</h2>
78<div class="section">
79<p>
80<strong>
81Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.
82</strong>
83</p>
84<p>
Begin by unzipping the Solr release and changing your working directory
to the "<span class="codefrag">example</span>" directory.  (Note that the base directory name may vary with the version of Solr downloaded.)  For example, with a shell in UNIX, Cygwin, or MacOS:
87</p>
88<pre class="code">
89user:~solr$ <strong>ls</strong>
90solr-nightly.zip
91user:~solr$ <strong>unzip -q solr-nightly.zip</strong>
92user:~solr$ <strong>cd solr-nightly/example/</strong>
93</pre>
94<p>
Solr can run in any Java Servlet Container of your choice, but to simplify
this tutorial, the example directory includes a small installation of Jetty.
97</p>
98<p>
To launch Jetty with the Solr WAR and the example configs, just run <span class="codefrag">start.jar</span>:
100</p>
101<pre class="code">
102user:~/solr/example$ <strong>java -jar start.jar</strong>
1032012-06-06 15:25:59.815:INFO:oejs.Server:jetty-8.1.2.v20120308
1042012-06-06 15:25:59.834:INFO:oejdp.ScanningAppProvider:Deployment monitor .../solr/example/webapps at interval 0
1052012-06-06 15:25:59.839:INFO:oejd.DeploymentManager:Deployable added: .../solr/example/webapps/solr.war
106...
107Jun 6, 2012 3:26:03 PM org.apache.solr.core.SolrCore registerSearcher
108INFO: [collection1] Registered new searcher Searcher@7527e2ee main{StandardDirectoryReader(segments_1:1)}
109</pre>
110<p>
111This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.
112</p>
113<p>
You can see that Solr is running by loading <a href="http://localhost:8983/solr/">http://localhost:8983/solr/</a> in your web browser.  This is the main starting point for administering Solr.
115</p>
116</div>
117
118
119
120
121<a name="N10078"></a><a name="Indexing+Data"></a>
122<h2 class="boxed">Indexing Data</h2>
123<div class="section">
124<p>
125Your Solr server is up and running, but it doesn't contain any data.  You can
126modify a Solr index by POSTing commands to Solr to add (or
127update) documents, delete documents, and commit pending adds and deletes. 
128These commands can be in a
129<a href="http://wiki.apache.org/solr/UpdateRequestHandler">variety of formats</a>.
130</p>
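<p>
As a rough illustration of what such a command looks like in the XML format, the sketch below builds an "add" message with Python's standard library. It only produces the message text and does not send it anywhere. The function name, the document values, and the assumption that the schema has <span class="codefrag">id</span> and <span class="codefrag">name</span> fields are illustrative, not part of Solr.
</p>

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as one XML "add" command."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; "id" and "name" are assumed to exist in the schema.
print(build_add_message([{"id": "DEMO-1", "name": "Demo document"}]))
```

<p>
Building the message with an XML library rather than string concatenation means special characters in field values are escaped for you.
</p>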
131<p>
132The <span class="codefrag">exampledocs</span> directory contains sample files
showing the types of commands Solr accepts, as well as a Java utility
134for posting them from the command line (a <span class="codefrag">post.sh</span>
135shell script is also available, but for this tutorial we'll use the
136cross-platform Java client). 
137</p>
138<p> To try this, open a new terminal window, enter the exampledocs directory,
139and run "<span class="codefrag">java -jar post.jar</span>" on some of the XML
140files in that directory.
141</p>
142<pre class="code">
143user:~/solr/example/exampledocs$ <strong>java -jar post.jar solr.xml monitor.xml</strong>
144SimplePostTool: version 1.4
145SimplePostTool: POSTing files to http://localhost:8983/solr/update..
146SimplePostTool: POSTing file solr.xml
147SimplePostTool: POSTing file monitor.xml
148SimplePostTool: COMMITting Solr index changes..
149</pre>
150<p>
151You have now indexed two documents in Solr, and committed these changes. 
152You can now search for "solr" by loading the <a href="http://localhost:8983/solr/#/collection1/query">"Query" tab</a> in the Admin interface, and entering "solr" in the "q" text box.  Clicking the "Execute Query" button should display the following URL containing one result...
153</p>
154<p>
155<a href="http://localhost:8983/solr/collection1/select?q=solr&amp;wt=xml">http://localhost:8983/solr/collection1/select?q=solr&amp;wt=xml</a>
156
157</p>
158<p>
159You can index all of the sample data, using the following command
160(assuming your command line shell supports the *.xml notation):
161</p>
162<pre class="code">
163user:~/solr/example/exampledocs$ <strong>java -jar post.jar *.xml</strong>
164SimplePostTool: version 1.4
165SimplePostTool: POSTing files to http://localhost:8983/solr/update..
166SimplePostTool: POSTing file gb18030-example.xml
167SimplePostTool: POSTing file hd.xml
168SimplePostTool: POSTing file ipod_other.xml
169SimplePostTool: POSTing file ipod_video.xml
170...
171SimplePostTool: POSTing file solr.xml
172SimplePostTool: POSTing file utf8-example.xml
173SimplePostTool: POSTing file vidcard.xml
174SimplePostTool: COMMITting Solr index changes..
175</pre>
176<p>
177  ...and now you can search for all sorts of things using the default <a href="http://wiki.apache.org/solr/SolrQuerySyntax">Solr Query Syntax</a> (a superset of the Lucene query syntax)...
178</p>
179<ul>
180 
181<li>
182<a href="http://localhost:8983/solr/#/collection1/query?q=video">video</a>
183</li>
184 
185<li>
186<a href="http://localhost:8983/solr/#/collection1/query?q=name:video">name:video</a>
187</li>
188 
189<li>
190<a href="http://localhost:8983/solr/#/collection1/query?q=%2Bvideo%20%2Bprice%3A[*%20TO%20400]">+video +price:[* TO 400]</a>
191</li>
192
193
194</ul>
195<p></p>
196<p>
  There are many other ways to import your data into Solr. You can:
198</p>
199<ul>
200 
201<li>Import records from a database using the
202    <a href="http://wiki.apache.org/solr/DataImportHandler">Data Import Handler (DIH)</a>.
203  </li>
204 
205<li>
206<a href="http://wiki.apache.org/solr/UpdateCSV">Load a CSV file</a> (comma separated values),
207   including those exported by Excel or MySQL.
208  </li>
209 
210<li>
211<a href="http://wiki.apache.org/solr/UpdateJSON">POST JSON documents</a>
212 
213</li>
214 
215<li>Index binary documents such as Word and PDF with
216    <a href="http://wiki.apache.org/solr/ExtractingRequestHandler">Solr Cell</a> (ExtractingRequestHandler).
217  </li>
218 
219<li>
220    Use <a href="http://wiki.apache.org/solr/Solrj">SolrJ</a> for Java or other Solr clients to
    programmatically create documents to send to Solr.
222  </li>
223
224
225</ul>
226</div>
227
228
229
230
231<a name="N100EE"></a><a name="Updating+Data"></a>
232<h2 class="boxed">Updating Data</h2>
233<div class="section">
234<p>
235You may have noticed that even though the file <span class="codefrag">solr.xml</span> has now
236been POSTed to the server twice, you still only get 1 result when searching for
237"solr".  This is because the example <span class="codefrag">schema.xml</span> specifies a "<span class="codefrag">uniqueKey</span>" field
238called "<span class="codefrag">id</span>".  Whenever you POST commands to Solr to add a
239document with the same value for the <span class="codefrag">uniqueKey</span> as an existing document, it
automatically replaces it for you.  You can see that this has happened by
241looking at the values for <span class="codefrag">numDocs</span> and <span class="codefrag">maxDoc</span> in the
242"CORE"/searcher section of the statistics page...  </p>
243<p>
244
245<a href="http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher">http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher</a>
246
247</p>
248<p>
249 
250<strong><span class="codefrag">numDocs</span></strong> represents the number of searchable documents in the
251  index (and will be larger than the number of XML files since some files
252  contained more than one <span class="codefrag">&lt;doc&gt;</span>). <strong><span class="codefrag">maxDoc</span></strong>
253  may be larger as the <span class="codefrag">maxDoc</span> count includes logically deleted documents that
254  have not yet been removed from the index. You can re-post the sample XML
255  files over and over again as much as you want and <span class="codefrag">numDocs</span> will never
256  increase, because the new documents will constantly be replacing the old.
257</p>
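<p>
The relationship between the two counters can be pictured with a toy model. This is only a sketch of the counting behavior described above, not how Solr stores documents. The class and attribute names and the document id are hypothetical.
</p>

```python
class ToyIndex:
    """Toy model of uniqueKey replacement; not how Solr stores documents."""

    def __init__(self):
        self.live = {}       # uniqueKey value mapped to the current document
        self.slots_used = 0  # every add consumes a slot, loosely like maxDoc

    def add(self, doc):
        # Re-adding an existing id replaces the live copy; the old copy
        # still occupies a slot until it is physically removed.
        self.live[doc["id"]] = doc
        self.slots_used += 1

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        return self.slots_used


index = ToyIndex()
index.add({"id": "DEMO-1", "name": "first post"})
index.add({"id": "DEMO-1", "name": "second post"})
print(index.num_docs, index.max_doc)
```

<p>
After the second add, the model reports one searchable document but two used slots, which mirrors why <span class="codefrag">maxDoc</span> may exceed <span class="codefrag">numDocs</span>.
</p>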
258<p>
259Go ahead and edit the existing XML files to change some of the data, and re-run
the <span class="codefrag">java -jar post.jar</span> command; you'll see your changes reflected
261in subsequent searches.
262</p>
263<a name="N1012D"></a><a name="Deleting+Data"></a>
264<h3 class="boxed">Deleting Data</h3>
265
266<p>
267You can delete data by POSTing a delete command to the update URL and
268specifying the value of the document's unique key field, or a query that
269matches multiple documents (be careful with that one!).  Since these commands
270are smaller, we will specify them right on the command line rather than
271reference an XML file.
272</p>
273
274<p>Execute the following command to delete a specific document</p>
275<pre class="code">java -Ddata=args -Dcommit=false -jar post.jar "&lt;delete&gt;&lt;id&gt;SP2514N&lt;/id&gt;&lt;/delete&gt;"</pre>
276
277<p>
Because we have specified "commit=false", a search for <a href="http://localhost:8983/solr/#/collection1/query?q=id:SP2514N">id:SP2514N</a> will still find the document we have deleted.  Since the example configuration uses Solr's "autoCommit" feature, Solr will still automatically persist this change to the index, but it will not affect search results until an "openSearcher" commit is explicitly executed.
279</p>
280
281<p>
282Using the <a href="http://localhost:8983/solr/#/collection1/plugins/updatehandler?entry=updateHandler">statistics page</a>
283for the <span class="codefrag">updateHandler</span> you can observe this delete
propagate to disk by watching the <span class="codefrag">deletesById</span>
285value drop to 0 as the <span class="codefrag">cumulative_deletesById</span>
286and <span class="codefrag">autocommit</span> values increase.
287</p>
288
289<p>
290Here is an example of using delete-by-query to delete anything with
291<a href="http://localhost:8983/solr/collection1/select?q=name:DDR&amp;fl=name">DDR</a> in the name:
292</p>
293<pre class="code">java -Dcommit=false -Ddata=args -jar post.jar "&lt;delete&gt;&lt;query&gt;name:DDR&lt;/query&gt;&lt;/delete&gt;"</pre>
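<p>
If you generate delete commands from a program rather than typing them, a small helper can build the same two message shapes used above. The sketch below uses Python's standard library. The function names are hypothetical, and the code only builds the message text.
</p>

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a delete command for one uniqueKey value."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a delete command for every document matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(delete_by_id("SP2514N"))
print(delete_by_query("name:DDR"))
```

<p>
Because the XML library escapes reserved characters, a query containing an ampersand or angle bracket still produces a well-formed message.
</p>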
294
295<p>
296You can force a new searcher to be opened to reflect these changes by sending a commit command to Solr (which post.jar does for you by default):
297</p>
298<pre class="code">java -jar post.jar</pre>
299
300<p>
301Now re-execute <a href="http://localhost:8983/solr/#/collection1/query?q=id:SP2514N">the previous search</a>
302and verify that no matching documents are found.  You can also revisit the
303statistics page and observe the changes to both the number of commits in the <a href="http://localhost:8983/solr/#/collection1/plugins/updatehandler?entry=updateHandler">updateHandler</a> and the numDocs in the <a href="http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher">searcher</a>.
304</p>
305
306<p>
307Commits that open a new searcher can be expensive operations so it's best to
308make many changes to an index in a batch and then send the
309<span class="codefrag">commit</span> command at the end. 
310There is also an <span class="codefrag">optimize</span> command that does the
same thing as <span class="codefrag">commit</span>, but also forces all index
segments to be merged into a single segment. This can be very resource
intensive, but may be worthwhile for improving search speed if your index
314changes very infrequently.
315</p>
316<p>
317All of the update commands can be specified using either <a href="http://wiki.apache.org/solr/UpdateXmlMessages">XML</a> or <a href="http://wiki.apache.org/solr/UpdateJSON">JSON</a>.
318</p>
319
320<p>To continue with the tutorial, re-add any documents you may have deleted by going to the <span class="codefrag">exampledocs</span> directory and executing</p>
321<pre class="code">java -jar post.jar *.xml</pre>
322</div>
323
324
325<a name="N1017C"></a><a name="Querying+Data"></a>
326<h2 class="boxed">Querying Data</h2>
327<div class="section">
328<p>
329    Searches are done via HTTP GET on the <span class="codefrag">select</span> URL with the query string in the <span class="codefrag">q</span> parameter.
330    You can pass a number of optional <a href="http://wiki.apache.org/solr/SearchHandler">request parameters</a>
331    to the request handler to control what information is returned.  For example, you can use the "<span class="codefrag">fl</span>" parameter
332    to control what stored fields are returned, and if the relevancy score is returned:
333  </p>
334<ul>
335     
336<li>
337<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;fl=name,id">q=video&amp;fl=name,id</a>       (return only name and id fields)   </li>
338     
339<li>
340<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;fl=name,id,score">q=video&amp;fl=name,id,score</a>  (return relevancy score as well) </li>
341     
342<li>
343<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;fl=*,score">q=video&amp;fl=*,score</a>        (return all stored fields, as well as relevancy score)  </li>
344     
345<li>
346<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=price desc&amp;fl=name,id,price">q=video&amp;sort=price desc&amp;fl=name,id,price</a>  (add sort specification: sort by price descending) </li>
347     
348<li>
349<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;wt=json">q=video&amp;wt=json</a> (return response in JSON format)  </li>
350   
351</ul>
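<p>
When building these URLs from code, it helps to let a library do the encoding. The sketch below assumes the tutorial's local example server and the <span class="codefrag">collection1</span> core. The helper name <span class="codefrag">select_url</span> is hypothetical.
</p>

```python
from urllib.parse import urlencode

# Assumed base URL: the tutorial's local example server and collection1 core.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(q, **params):
    """Build a select URL; urlencode handles spaces and reserved characters."""
    return SELECT_URL + "?" + urlencode({"q": q, **params})


# Return name, id, and relevancy score as JSON.
print(select_url("video", fl="name,id,score", wt="json"))

# A sort value contains a space, which must be encoded in the URL.
print(select_url("video", sort="price desc", fl="name,id,price"))
```

<p>
The printed URLs can be pasted into a browser or passed to any HTTP client while the example server is running.
</p>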
352<p>
353The <a href="http://localhost:8983/solr/#/collection1/query">query form</a>
354provided in the web admin interface allows setting various request parameters
355and is useful when testing or debugging queries.
356</p>
357
358<a name="N101BA"></a><a name="Sorting"></a>
359<h3 class="boxed">Sorting</h3>
360<p>
361      Solr provides a simple method to sort on one or more indexed fields.
      Use the "<span class="codefrag">sort</span>" parameter to specify "field direction" pairs, separated by commas if there's more than one sort field:
363    </p>
364<ul>
365     
366<li>
367<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=price+desc">q=video&amp;sort=price desc</a>
368</li>
369     
370<li>
371<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=price+asc">q=video&amp;sort=price asc</a>
372</li>
373     
374<li>
375<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=inStock+asc,price+desc">q=video&amp;sort=inStock asc, price desc</a>
376</li>
377   
378</ul>
379<p>
380      "<span class="codefrag">score</span>" can also be used as a field name when specifying a sort:
381    </p>
382<ul>
383     
384<li>
385<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=score+desc">q=video&amp;sort=score desc</a>
386</li>
387     
388<li>
389<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=video&amp;sort=inStock+asc,score+desc">q=video&amp;sort=inStock asc, score desc</a>
390</li>
391   
392</ul>
393<p>
394      Complex functions may also be used to sort results:
395    </p>
396<ul>
397     
398<li>
<a href="http://localhost:8983/solr/collection1/select/?indent=on&amp;q=*:*&amp;sort=div(popularity,add(price,1))+desc">q=*:*&amp;sort=div(popularity,add(price,1)) desc</a>
400</li>
401   
402</ul>
403<p>
404      If no sort is specified, the default is <span class="codefrag">score desc</span> to return the matches having the highest relevancy.
405    </p>
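<p>
The "field direction" format is easy to assemble from a list of pairs. The following is a minimal sketch with a hypothetical helper name. It checks only the direction keyword and does not verify that the fields exist in your schema.
</p>

```python
def sort_param(*pairs):
    """Join (field, direction) pairs into a value for the sort parameter."""
    for _field, direction in pairs:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
    return ",".join(f"{field} {direction}" for field, direction in pairs)


print(sort_param(("inStock", "asc"), ("price", "desc")))
```

<p>
The resulting string still needs URL encoding before it is placed in a request, since it contains spaces.
</p>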
406</div>
407
408
409
410<a name="N101FE"></a><a name="Highlighting"></a>
411<h2 class="boxed">Highlighting</h2>
412<div class="section">
413<p>
    Hit highlighting returns relevant snippets of each returned document, and highlights
415    terms from the query within those context snippets.
416  </p>
417<p>
418    The following example searches for <span class="codefrag">video card</span> and requests
419    highlighting on the fields <span class="codefrag">name,features</span>.  This causes a
420    <span class="codefrag">highlighting</span> section to be added to the response with the
421    words to highlight surrounded with <span class="codefrag">&lt;em&gt;</span> (for emphasis)
422    tags.
423  </p>
424<p>
425   
426<a href="http://localhost:8983/solr/collection1/select/?wt=json&amp;indent=on&amp;q=video+card&amp;fl=name,id&amp;hl=true&amp;hl.fl=name,features">...&amp;q=video card&amp;fl=name,id&amp;hl=true&amp;hl.fl=name,features</a>
427 
428</p>
429<p>
430    More request parameters related to controlling highlighting may be found
431    <a href="http://wiki.apache.org/solr/HighlightingParameters">here</a>.
432  </p>
433</div> <!-- highlighting -->
434
435
436
437<a name="N10227"></a><a name="Faceted+Search"></a>
438<h2 class="boxed">Faceted Search</h2>
439<div class="section">
440<p>
441    Faceted search takes the documents matched by a query and generates counts for various
    properties or categories.  Links are usually provided that allow users to "drill down" or
443    refine their search results based on the returned categories.
444  </p>
445<p>
446    The following example searches for all documents (<span class="codefrag">*:*</span>) and
447    requests counts by the category field <span class="codefrag">cat</span>.
448  </p>
449<p>
450   
451<a href="http://localhost:8983/solr/collection1/select/?wt=json&amp;indent=on&amp;q=*:*&amp;fl=name&amp;facet=true&amp;facet.field=cat">...&amp;q=*:*&amp;facet=true&amp;facet.field=cat</a>
452 
453</p>
454<p>
455    Notice that although only the first 10 documents are returned in the results list,
456    the facet counts generated are for the complete set of documents that match the query.
457  </p>
458<p>
459    We can facet multiple ways at the same time.  The following example adds a facet on the
460    boolean <span class="codefrag">inStock</span> field:
461  </p>
462<p>
463   
464<a href="http://localhost:8983/solr/collection1/select/?wt=json&amp;indent=on&amp;q=*:*&amp;fl=name&amp;facet=true&amp;facet.field=cat&amp;facet.field=inStock">...&amp;q=*:*&amp;facet=true&amp;facet.field=cat&amp;facet.field=inStock</a>
465 
466</p>
467<p>
468    Solr can also generate counts for arbitrary queries. The following example
    queries for <span class="codefrag">ipod</span> and counts the matching documents priced below and above 100 by using
470    range queries on the price field.
471  </p>
472<p>
473   
474<a href="http://localhost:8983/solr/collection1/select/?wt=json&amp;indent=on&amp;q=ipod&amp;fl=name&amp;facet=true&amp;facet.query=price:[0+TO+100]&amp;facet.query=price:[100+TO+*]">...&amp;q=ipod&amp;facet=true&amp;facet.query=price:[0 TO 100]&amp;facet.query=price:[100 TO *]</a>
475 
476</p>
477<p>
478    Solr can even facet by numeric ranges (including dates).  This example requests counts for the manufacture date (<span class="codefrag">manufacturedate_dt</span> field) for each year between 2004 and 2010.
479  </p>
480<p>
481   
<a href="http://localhost:8983/solr/collection1/select/?wt=json&amp;indent=on&amp;q=*:*&amp;fl=name,manufacturedate_dt&amp;facet=true&amp;facet.range=manufacturedate_dt&amp;facet.range.start=2004-01-01T00:00:00Z&amp;facet.range.end=2010-01-01T00:00:00Z&amp;facet.range.gap=%2b1YEAR">...&amp;q=*:*&amp;facet=true&amp;facet.range=manufacturedate_dt&amp;facet.range.start=2004-01-01T00:00:00Z&amp;facet.range.end=2010-01-01T00:00:00Z&amp;facet.range.gap=+1YEAR</a>
483 
484</p>
485<p>
486    More information on faceted search may be found on the
487    <a href="http://wiki.apache.org/solr/SolrFacetingOverview">faceting overview</a>
488    and
489    <a href="http://wiki.apache.org/solr/SimpleFacetParameters">faceting parameters</a>
490    pages.
491  </p>
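<p>
Facet requests often repeat a parameter name (for example, two <span class="codefrag">facet.field</span> values) and contain characters such as <span class="codefrag">+</span> that must be encoded. The sketch below assembles such a query string from a list of pairs. The function name is hypothetical, and the field names are the ones used in the examples above.
</p>

```python
from urllib.parse import urlencode


def facet_query_string(q, fields=(), range_field=None, start=None, end=None, gap=None):
    """Build a query string with repeated facet.field and optional range facet."""
    params = [("q", q), ("wt", "json"), ("facet", "true")]
    # A list of pairs (not a dict) lets the same parameter name appear twice.
    params += [("facet.field", name) for name in fields]
    if range_field is not None:
        params += [
            ("facet.range", range_field),
            ("facet.range.start", start),
            ("facet.range.end", end),
            ("facet.range.gap", gap),
        ]
    return urlencode(params)


print(facet_query_string(
    "*:*",
    fields=["cat", "inStock"],
    range_field="manufacturedate_dt",
    start="2004-01-01T00:00:00Z",
    end="2010-01-01T00:00:00Z",
    gap="+1YEAR",
))
```

<p>
Note that the gap value <span class="codefrag">+1YEAR</span> is percent-encoded in the output. A literal <span class="codefrag">+</span> in a query string would otherwise be read as a space.
</p>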
492</div> <!-- faceted search -->
493
494
495
496<a name="N10278"></a><a name="Search+UI"></a>
497<h2 class="boxed">Search UI</h2>
498<div class="section">
499<p>
500Solr includes an example search interface built with <a href="https://wiki.apache.org/solr/VelocityResponseWriter">velocity templating</a>
501that demonstrates many features, including searching, faceting, highlighting,
502autocomplete, and geospatial searching.
503</p>
504<p>
505Try it out at
506<a href="http://localhost:8983/solr/collection1/browse">http://localhost:8983/solr/collection1/browse</a>
507 
508</p>
509</div> <!-- Search UI -->
510
511
512
513
514<a name="N1028B"></a><a name="Text+Analysis"></a>
515<h2 class="boxed">Text Analysis</h2>
516<div class="section">
517<p>
518    Text fields are typically indexed by breaking the text into words and applying various transformations such as
519    lowercasing, removing plurals, or stemming to increase relevancy.  The same text transformations are normally
520    applied to any queries in order to match what is indexed.
521  </p>
522<p>
523    The <a href="http://wiki.apache.org/solr/SchemaXml">schema</a> defines
524    the fields in the index and what type of analysis is applied to them.  The current schema your collection is using
    may be viewed directly via the <a href="http://localhost:8983/solr/#/collection1/schema">Schema tab</a> in the Admin UI, or explored dynamically using the <a href="http://localhost:8983/solr/#/collection1/schema-browser">Schema Browser tab</a>.
526</p>
527<p>
The best analysis components (tokenization and filtering) for your textual
content depend heavily on language.
530As you can see in the <a href="http://localhost:8983/solr/#/collection1/schema-browser?type=text_general">Schema Browser</a>,
531many of the fields in the example schema are using a
532<span class="codefrag">fieldType</span> named
533<span class="codefrag">text_general</span>, which has defaults appropriate for
534most languages.
535</p>
536
537<p>
538  If you know your textual content is English, as is the case for the example
539  documents in this tutorial, and you'd like to apply English-specific stemming
540  and stop word removal, as well as split compound words, you can use the
541  <a href="http://localhost:8983/solr/#/collection1/schema-browser?type=text_en_splitting"><span class="codefrag">text_en_splitting</span> fieldType</a> instead.
542  Go ahead and edit the <span class="codefrag">schema.xml</span> in the
543  <span class="codefrag">solr/example/solr/conf</span> directory,
544  to use the <span class="codefrag">text_en_splitting</span> fieldType for
545  the <span class="codefrag">text</span> and
546  <span class="codefrag">features</span> fields like so:
547</p>
548<pre class="code">
549   &lt;field name="features" <b>type="text_en_splitting"</b> indexed="true" stored="true" multiValued="true"/&gt;
550   ...
551   &lt;field name="text" <b>type="text_en_splitting"</b> indexed="true" stored="false" multiValued="true"/&gt;
552</pre>
553<p>
554  Stop and restart Solr after making these changes and then re-post all of
555  the example documents using
556  <span class="codefrag">java -jar post.jar *.xml</span>. 
557  Now queries like the ones listed below will demonstrate English-specific
558  transformations:
559  </p>
560<ul>
561   
562<li>A search for
563  <a href="http://localhost:8983/solr/collection1/select?q=power-shot&amp;fl=name">power-shot</a>
564  can match <span class="codefrag">PowerShot</span>, and
565  <a href="http://localhost:8983/solr/collection1/select?q=adata&amp;fl=name">adata</a>
566  can match <span class="codefrag">A-DATA</span> by using the
567  <span class="codefrag">WordDelimiterFilter</span> and <span class="codefrag">LowerCaseFilter</span>.
568</li>
569
570   
571<li>A search for
572  <a href="http://localhost:8983/solr/collection1/select?q=features:recharging&amp;fl=name,features">features:recharging</a>
573  can match <span class="codefrag">Rechargeable</span> using the stemming
574  features of <span class="codefrag">PorterStemFilter</span>.
575</li>
576
577   
578<li>A search for
579  <a href="http://localhost:8983/solr/collection1/select?q=%221 gigabyte%22&amp;fl=name">"1 gigabyte"</a>
580  can match <span class="codefrag">1GB</span>, and the commonly misspelled
  <a href="http://localhost:8983/solr/collection1/select?q=pixima&amp;fl=name">pixima</a> can match <span class="codefrag">Pixma</span> using the
582  <span class="codefrag">SynonymFilter</span>.
583</li>
584
585 
586</ul>
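<p>
To get a feel for why the first example matches, here is a deliberately simplified imitation of word splitting followed by lowercasing. It is a toy written for this tutorial, not the behavior of <span class="codefrag">WordDelimiterFilter</span> or <span class="codefrag">LowerCaseFilter</span>, and it ignores token positions, stemming, and synonyms. All names in it are hypothetical.
</p>

```python
import re

# Pieces: runs of capitals, capitalized or lowercase words, and digit runs.
_PIECE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def toy_tokens(text):
    """Toy stand-in for word splitting plus lowercasing; not Solr's filters."""
    tokens = set()
    for word in text.split():
        pieces = [p.lower() for p in _PIECE.findall(word)]
        tokens.update(pieces)
        if len(pieces) >= 2:
            tokens.add("".join(pieces))  # also keep the joined form
    return tokens


def toy_match(query, indexed_text):
    """True if the query and the indexed text share at least one token."""
    return not toy_tokens(query).isdisjoint(toy_tokens(indexed_text))


print(sorted(toy_tokens("PowerShot")))
print(toy_match("power-shot", "Canon PowerShot SD500"))
print(toy_match("adata", "A-DATA"))
```

<p>
Applying the same transformation to both the indexed text and the query is what lets differently written forms meet on a common token.
</p>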
587<p>A full description of the analysis components, Analyzers, Tokenizers, and TokenFilters
588    available for use is <a href="http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters">here</a>.
589  </p>
590<a name="N1030B"></a><a name="Analysis+Debugging"></a>
591
592
593<h3 class="boxed">Analysis Debugging</h3>
594<p>
595There is a handy <a href="http://localhost:8983/solr/#/collection1/analysis">Analysis tab</a>
where you can see how a text value is broken down into words by both the index-time and query-time analysis chains for a field or field type.  This page shows the resulting tokens after they pass through each filter in the chains.
597</p>
598<p>
599  <a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=&analysis.fieldtype=text_en_splitting">This url</a>
600  shows the tokens created from
601  "<span class="codefrag">Canon Power-Shot SD500</span>"
602  using the
603  <span class="codefrag">text_en_splitting</span> type.  Each section of
604  the table shows the resulting tokens after having passed through the next
605  <span class="codefrag">TokenFilter</span> in the (Index) analyzer.
606  Notice how both <span class="codefrag">powershot</span> and
607  <span class="codefrag">power</span>, <span class="codefrag">shot</span>
608  are indexed, using tokens that have the same "position".
609  (Compare the previous output with
610  <a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=&analysis.fieldtype=text_general">The tokens produced using the text_general field type</a>.)
611</p>
612
613<p>
614Mousing over the section label to the left of the section will display the full name of the analyzer component at that stage of the chain.  Toggling the "Verbose Output" checkbox will show/hide the detailed token attributes.
615</p>
616<p>
617When both <a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=power+shot+sd-500&analysis.fieldtype=text_en_splitting">Index and Query</a>
618values are provided, two tables will be displayed side by side showing the
results of each chain.  Terms in the Index chain results that are equivalent
620to the final terms produced by the Query chain will be highlighted.
621</p>
622<p>
623  Other interesting examples:
624</p>
625<ul>
626  <li><a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Four+score+and+seven+years+ago+our+fathers+brought+forth+on+this+continent+a+new+nation%2C+conceived+in+liberty+and+dedicated+to+the+proposition+that+all+men+are+created+equal.%0A&analysis.query=liberties+and+equality&analysis.fieldtype=text_en">English stemming and stop-words</a>
627    using the <span class="codefrag">text_en</span> field type
628  </li>
  <li><a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_cjk&analysis.fieldvalue=%EF%BD%B6%EF%BE%80%EF%BD%B6%EF%BE%85&analysis.query=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A">Half-width katakana normalization with bigramming</a>
630    using the <span class="codefrag">text_cjk</span> field type
631  </li>
632  <li><a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_ja&analysis.fieldvalue=%E7%A7%81%E3%81%AF%E5%88%B6%E9%99%90%E3%82%B9%E3%83%94%E3%83%BC%E3%83%89%E3%82%92%E8%B6%85%E3%81%88%E3%82%8B%E3%80%82">Japanese morphological decomposition with part-of-speech filtering</a>
633    using the <span class="codefrag">text_ja</span> field type
634  </li>
  <li><a href="http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_ar&analysis.fieldvalue=%D9%84%D8%A7+%D8%A3%D8%AA%D9%83%D9%84%D9%85+%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9">Arabic stop-words, normalization, and stemming</a>
637    using the <span class="codefrag">text_ar</span> field type
638  </li>
639</ul>
640
641</div>
642
643
644<a name="N1034D"></a><a name="Conclusion"></a>
645<h2 class="boxed">Conclusion</h2>
646<div class="section">
647<p>
648  Congratulations!  You successfully ran a small Solr instance, added some
649  documents, and made changes to the index and schema.  You learned about queries, text
650  analysis, and the Solr admin interface.  You're ready to start using Solr on
651  your own project!  Continue on with the following steps:
652</p>
653<ul>
654 
655<li>Subscribe to the Solr <a href="http://lucene.apache.org/solr/discussion.html">mailing lists</a>!</li>
656 
657<li>Make a copy of the Solr <span class="codefrag">example</span> directory as a template for your project.</li>
658 
659<li>Customize the schema and other config in <span class="codefrag">solr/conf/</span> to meet your needs.</li>
660
661</ul>
662<p>
663  Solr has a ton of other features that we haven't touched on here, including
664  <a href="http://wiki.apache.org/solr/DistributedSearch">distributed search</a>
665  to handle huge document collections,
666  <a href="http://wiki.apache.org/solr/FunctionQuery">function queries</a>,
667  <a href="http://wiki.apache.org/solr/StatsComponent">numeric field statistics</a>,
668  and
669  <a href="http://wiki.apache.org/solr/ClusteringComponent">search results clustering</a>.
670  Explore the <a href="http://wiki.apache.org/solr/FrontPage">Solr Wiki</a> to find
671  more details about Solr's many <a href="http://lucene.apache.org/solr/features.html">features</a>.
672</p>
673<p>
  Have fun, and we'll see you on the Solr mailing lists!
675</p>
676</div>
677
678</div>
679
680<div class="clearboth">&nbsp;</div>
681
682<div id="footer">
683<div class="copyright">
684        Copyright &copy;
685         2012 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
686</div>
687</div>
688</body>
689</html>