Wednesday, September 23, 2015

Elasticsearch: reindex using Logstash

Well, I'm not an Elasticsearch expert, not at all. And, as usual, my English sucks.

Let's say that you have indexed a bunch of stuff, and now you need to change the mapping of a field.
For instance, in my case I noticed that a field named "hostname" was split whenever it contained a dash: "tc-pi.pacs.mydomain" showed up as two separate parts when creating graphs in Kibana.
To avoid this splitting, the field has to be mapped with "index" : "not_analyzed" in the Elasticsearch mapping of the index Logstash writes to.
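To see why the value ends up in two pieces, here is a deliberately crude Python approximation. It is my own illustration, not Elasticsearch code. An analyzed string field is broken into lowercase terms, and for this particular hostname the break falls at the dash. A not_analyzed field is stored as one single term. Kibana builds its term graphs from these indexed terms. The `analyzed_terms` helper is hypothetical and only mimics the standard analyzer well enough for this one value.

```python
def analyzed_terms(value):
    """Very rough stand-in for the standard analyzer (hypothetical helper).

    Assumption: the real analyzer follows Unicode word-break rules; here we
    only lowercase the value and split on dashes, which happens to give the
    same two terms for a hostname like "tc-pi.pacs.mydomain".
    """
    return [term for term in value.lower().split("-") if term]


def not_analyzed_terms(value):
    """A not_analyzed field is indexed as one term, exactly as sent."""
    return [value]


hostname = "tc-pi.pacs.mydomain"
print(analyzed_terms(hostname))      # ['tc', 'pi.pacs.mydomain']
print(not_analyzed_terms(hostname))  # ['tc-pi.pacs.mydomain']
```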

Well, from what I've read, it is not possible to change the mapping of a field once documents have been indexed.

So here is a solution, or rather a workaround, thanks to this post.

"Download" the mapping of the old index

curl -XGET 'http://127.0.0.1:9200/dcmaudit/_mappings/'

Copy the result into a text editor for your convenience, then change the mapping of the field, like this:

...
"hostname":{"type":"string", "index" : "not_analyzed"}
...
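If you'd rather not edit the JSON by hand, the same change can be scripted. This is a minimal Python sketch. It assumes the response of the curl command above has the shape `{"dcmaudit": {"mappings": {"logs": {"properties": {...}}}}}`. It only touches top-level string properties whose names you pass in. The function name `with_not_analyzed` and the trimmed sample response are made up for illustration.

```python
import copy
import json


def with_not_analyzed(mapping_response, index, fields):
    """Turn a GET /<index>/_mappings response into a create-index body.

    Every top-level string field listed in `fields` gets
    "index": "not_analyzed"; everything else is copied unchanged.
    The input dict is left untouched (we work on a deep copy).
    """
    mappings = copy.deepcopy(mapping_response[index]["mappings"])
    for type_mapping in mappings.values():
        properties = type_mapping.get("properties", {})
        for field in fields:
            if properties.get(field, {}).get("type") == "string":
                properties[field]["index"] = "not_analyzed"
    return {"mappings": mappings}


# Trimmed-down example response (the real one has more fields).
response = {
    "dcmaudit": {
        "mappings": {
            "logs": {
                "properties": {
                    "hostname": {"type": "string"},
                    "message": {"type": "string"},
                    "@timestamp": {"type": "date", "format": "dateOptionalTime"},
                }
            }
        }
    }
}

body = with_not_analyzed(response, "dcmaudit", ["hostname"])
print(json.dumps(body["mappings"]["logs"]["properties"]["hostname"]))
# {"type": "string", "index": "not_analyzed"}
```

The resulting `body` is what goes after `-d` when creating the new index.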

Create a new index with the modified mapping:

curl -XPOST http://localhost:9200/dcmaudit2 -d '{
  "mappings": {
    "logs": {
      "properties": {
        "@timestamp": {"type": "date", "format": "dateOptionalTime"},
        "@version": {"type": "string"},
        "ParticipantObjectIdentification2.ParticipantObjectTypeCode.displayName": {"type": "string", "index": "not_analyzed"},
        "hostname": {"type": "string", "index": "not_analyzed"},
        "message": {"type": "string"},
        "tags": {"type": "string"},
        "timestamp": {"type": "date", "format": "dateOptionalTime"}
      }
    }
  }
}'
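Before copying anything, it is worth checking that dcmaudit2 really got the new mapping. Here is a small Python sketch of that check. It assumes the GET _mappings response has the same `{index: {mappings: {type: {properties: ...}}}}` shape as for the old index. The helper names are hypothetical. The network call is only shown in comments, so the example runs offline.

```python
import json
import urllib.request


def fetch_mappings(base_url, index):
    """GET <base_url>/<index>/_mappings and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/{index}/_mappings") as resp:
        return json.loads(resp.read().decode("utf-8"))


def not_analyzed_fields(mapping_response, index):
    """Names of top-level fields mapped with "index": "not_analyzed"."""
    found = set()
    for type_mapping in mapping_response[index]["mappings"].values():
        for name, spec in type_mapping.get("properties", {}).items():
            if spec.get("index") == "not_analyzed":
                found.add(name)
    return found


# Offline example with a made-up, shortened response:
sample = {"dcmaudit2": {"mappings": {"logs": {"properties": {
    "hostname": {"type": "string", "index": "not_analyzed"},
    "message": {"type": "string"},
}}}}}
print(sorted(not_analyzed_fields(sample, "dcmaudit2")))  # ['hostname']

# Against a running node (not executed here):
#   live = fetch_mappings("http://localhost:9200", "dcmaudit2")
#   print(not_analyzed_fields(live, "dcmaudit2"))
```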

Now let's create a Logstash configuration file (I called it conf.json) like this:

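input {
  # We read from the "old" index
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "dcmaudit"
    size => 500
    # keep each scroll (search context) alive for 5 minutes between batches
    scroll => "5m"
    # store each document's _index, _type and _id under [@metadata]
    docinfo => true
  }
}

filter {
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}

output {
  # Write to the new index, keeping each document's original type and id
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "dcmaudit2"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }

  # Print every event to the console: handy for checking, but very verbose
  stdout {
    codec => rubydebug
  }
}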
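If it helps to see what this pipeline amounts to, here is a rough Python sketch. It is my own illustration, not what Logstash runs internally. Each document read from dcmaudit carries its _type and _id, which is what docinfo => true exposes. Each one is then written to dcmaudit2 as an "index" action in the Bulk API's newline-delimited format, keeping the same type and id. The `hits` list is made up and only mimics the shape of search/scroll results. Unlike the mutate filter above, this sketch copies _source unchanged.

```python
import json


def bulk_body(hits, target_index):
    """Build a Bulk API body that copies `hits` into target_index.

    Each hit looks like a search/scroll result entry:
    {"_type": ..., "_id": ..., "_source": {...}}.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The Bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n" if lines else ""


hits = [
    {"_type": "logs", "_id": "example-1",
     "_source": {"hostname": "tc-pi.pacs.mydomain", "message": "example"}},
]
print(bulk_body(hits, "dcmaudit2"), end="")
```

Reusing the original _id is what makes the copy safe to re-run: indexing a document with an existing id overwrites it instead of adding a duplicate.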

Launch Logstash:

./bin/logstash -f conf.json

Logstash will now copy all the documents from the old index (dcmaudit) to the new one (dcmaudit2).
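A cheap sanity check before deleting anything is to compare document counts through the _count endpoint. Here is a minimal Python sketch. The helper names are hypothetical, and the live calls are only shown in comments. Equal counts don't prove the documents are identical, but a mismatch is a clear sign to stop. Give Elasticsearch a moment after Logstash finishes, since newly indexed documents only become countable after a refresh.

```python
import json
import urllib.request


def parse_count(raw):
    """Extract the "count" value from a _count response body (bytes or str)."""
    return json.loads(raw)["count"]


def doc_count(base_url, index):
    """Number of documents in `index`, via GET <base_url>/<index>/_count."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as resp:
        return parse_count(resp.read())


def counts_match(old_count, new_count):
    """True only when the new index holds exactly as many documents as the old."""
    return old_count == new_count


# Against a running node (not executed here):
#   old = doc_count("http://localhost:9200", "dcmaudit")
#   new = doc_count("http://localhost:9200", "dcmaudit2")
#   print("ok to delete" if counts_match(old, new) else f"{old} vs {new}, wait")
```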

Once you have checked that the new index contains all your documents, you can delete the old index:

curl -XDELETE localhost:9200/dcmaudit

If you want (or need) the original index name back, you can run the whole procedure again: recreate the old index (dcmaudit) with the new mapping, then repeat the Logstash task with the input and output indices swapped (dcmaudit2 as input, dcmaudit as output).