How to integrate an index from a WebSphere Portal collection into Solr v4.3.1


For this section, you must have Solr running in WebSphere Application Server (WAS) and the Solr work directory configured (see the parent topic).

Related information: Configuring Solr

Creating a new collection in WebSphere Portal

  1. In the Portal Admin interface, at the global portal level, go to Manage Search -> Search Collections
  2. Create a new collection with the following values

    New Collection

    Search service: Default Search Service
    Location of Collection:  In my case, C:\Solar Collection1. To find out what to enter here, check the location used by the Default Search Collection
    Name of Collection:  C:\Solar Collection
    Description of Collection:  This collection will be used to integrate into Solr application
    Specify Collection Language:  In my case, English (United States)
    Select Summarizer:  Automatic

  3. Click the Solar Collection

  4. Create a New Content Source

    New Content Source

    For the "Collect documents linked from this URL" field, we are going to use the Seedlist REST service. The IBM® Web Content Manager API for retrieving application content through a seedlist is based on the REST architecture style. More info: Seedlist

    I'm going to use the CTC Demo content; replace it with your own content


    Content source type:  WCM Site
    Content Source Name:  CTC Demo Content
    Collect documents linked from this URL:
    In my case
    [portaldomain] is the name of the machine on the local network. If you are not sure what that is, check the seedlist of the "Default Search Collection"
    Stop fetching a document after (sec):  1800

  5. Click the Security tab and create a new security realm

    User Name  anUser
    Password  pass
    Host Name optional
    Realm optional

  6. Click Create
  7. Click the "play" button to start building the collection. Click the Refresh button to see the progress

Configuring Solr to use an index from a WebSphere Portal collection

  1. Stop the solrEAR application in your WebSphere Administration console to unlock the Solr work directory
  2. Browse to your Solr work directory (in my case, "C:\solr")
  3. Open the solr.xml file. If you have not modified this file before, you can replace its content with the following code. Otherwise, add the new core with the name "CTCDemo" to your existing file

    Default solr.xml (before the change):

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" />
      </cores>
    </solr>


    Modified solr.xml (with the new CTCDemo core):

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr sharedLib="lib" persistent="true">
      <logging enabled="true">
        <watcher size="100" threshold="INFO" />
      </logging>
      <cores adminPath="/admin/cores" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" host="${host:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core default="true" instanceDir="collection1" name="collection1"/>
        <core default="false" instanceDir="CTCDemo" name="CTCDemo"/>
      </cores>
    </solr>
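
If solr.xml has already been customized, replacing the whole file is not an option. A minimal sketch of the alternative mentioned in step 3, assuming the existing "cores" element and its current entries are left untouched and only one entry is appended:

```xml
<!-- Sketch only: add this line inside the existing <cores> element,
     next to the <core> entries that are already there.
     Both attribute values must match the folder name created in step 4. -->
<core name="CTCDemo" instanceDir="CTCDemo" />
```

The name is what appears in the Solr admin interface, and instanceDir is resolved relative to the Solr work directory.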
  4. Save the file, make a copy of the collection1 folder at the same level, and give the copy a name without blank spaces (in my case, "CTCDemo")
  5. Go to ./CTCDemo/conf/, open the solrconfig.xml file, change the value of the "dataDir" element to the path of the collection that we created in the previous section (in my case, "C:\My Collection"), and save the file

    <dataDir>C:\My Collection</dataDir>
  6. Go to ./CTCDemo/conf/, open the schema.xml file, and replace all the "field" tags with the following:

    Original "field" tags (to be replaced):

       <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
       <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
       <field name="name" type="text_general" indexed="true" stored="true"/>
       <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
       <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
       <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
       <field name="weight" type="float" indexed="true" stored="true"/>
       <field name="price"  type="float" indexed="true" stored="true"/>
       <field name="popularity" type="int" indexed="true" stored="true" />
       <field name="inStock" type="boolean" indexed="true" stored="true" />
       <field name="store" type="location" indexed="true" stored="true"/>
       <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
       <field name="subject" type="text_general" indexed="true" stored="true"/>
       <field name="description" type="text_general" indexed="true" stored="true"/>
       <field name="comments" type="text_general" indexed="true" stored="true"/>
       <field name="author" type="text_general" indexed="true" stored="true"/>
       <field name="keywords" type="text_general" indexed="true" stored="true"/>
       <field name="category" type="text_general" indexed="true" stored="true"/>
       <field name="resourcename" type="text_general" indexed="true" stored="true"/>
       <field name="url" type="text_general" indexed="true" stored="true"/>
       <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="last_modified" type="date" indexed="true" stored="true"/>
       <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
       <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
       <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
       <field name="manu_exact" type="string" indexed="true" stored="false"/>
       <field name="payloads" type="payloads" indexed="true" stored="true"/>
       <field name="_version_" type="long" indexed="true" stored="true"/>

    Replacement "field" tags:

       <field name="docid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
       <field name="author_info" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="title" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="summary" type="string" indexed="true" stored="true"/>
       <field name="contentPath" type="string" indexed="true" stored="true"/>
       <field name="creation_date" type="string" indexed="true" stored="true"/>
       <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="keywords" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="name" type="string" indexed="true" stored="true"/>
       <field name="update_date" type="string" indexed="true" stored="true"/>
       <field name="content_path" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="last_modified" type="string" indexed="true" stored="true"/>
       <field name="display_uri" type="string" indexed="true" stored="true" multiValued="true"/>
       <field name="content" type="string" indexed="false" stored="true" multiValued="true"/>
       <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
       <field name="_version_" type="long" indexed="true" stored="true"/>
  7. Go to ./CTCDemo/conf/, open the schema.xml file, and replace the value of the "uniqueKey" element with "docid"
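
For reference, a minimal sketch of what the edited element might look like, assuming schema.xml still follows the stock Solr layout in which "uniqueKey" is a direct child of the "schema" element:

```xml
<!-- Sketch only: the value must be the name of a field declared in schema.xml.
     Here it is the "docid" field from the replacement list in step 6. -->
<uniqueKey>docid</uniqueKey>
```

The field named here has to exist and hold a single value per document, which is why "docid" is declared with required="true" and multiValued="false" in step 6.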



  8. Go to ./CTCDemo/conf/, open the schema.xml file, and replace all the "copyField" tags with the following:

    Original "copyField" tags (to be replaced):

       <copyField source="cat" dest="text"/>
       <copyField source="name" dest="text"/>
       <copyField source="manu" dest="text"/>
       <copyField source="features" dest="text"/>
       <copyField source="includes" dest="text"/>
       <copyField source="manu" dest="manu_exact"/>
       <copyField source="price" dest="price_c"/>
       <copyField source="title" dest="text"/>
       <copyField source="author" dest="text"/>
       <copyField source="description" dest="text"/>
       <copyField source="keywords" dest="text"/>
       <copyField source="content" dest="text"/>
       <copyField source="content_type" dest="text"/>
       <copyField source="resourcename" dest="text"/>
       <copyField source="url" dest="text"/>
       <copyField source="author" dest="author_s"/>

    Replacement "copyField" tags:

       <copyField source="title" dest="text"/>
       <copyField source="author" dest="text"/>
       <copyField source="summary" dest="text"/>
       <copyField source="keywords" dest="text"/>
       <copyField source="category" dest="text"/>
       <copyField source="content" dest="text"/>
       <copyField source="display_uri" dest="text"/>
       <copyField source="author" dest="text"/>
  9. Save the files and close all text editors to unlock them, then start the solrEAR application in the WebSphere Administration console
  10. Go to http://yourportaldomain:port/solr
  11. Click the drop-down list on the left and select the CTCDemo index

About the author

Marco Balderas

Marco is a skilled web developer who provides consultancy, solution design, and project management techniques, most specifically for J2EE Web content and portals. He has over five years of experience in Web, desktop, and mobile development. Lately, he has been responsible for providing and developing Web content solutions that draw on his experience, his knowledge, and his very good understanding of technology trends. As part of his assignments throughout his work history, Marco has worked to understand and solve clients' day-to-day issues, give feedback on project timelines, establish change agreements, and define and redefine scope. His industry exposure includes oil, financial services, stock management, education, and retail.