Wednesday, December 7, 2011

I don't "get" JSON Output in Pentaho Data Integration (PDI) / Kettle

I don't understand how to use the JSON Output step properly in Kettle, aka Pentaho Data Integration (PDI).

With "Nr of rows in a bloc" set to 0 or 3, I got:

{
  "categories": [
    {
      "code": "WORKAHOLIC-CHIC"
    },
    {
      "name": "Workaholic Chic"
    },
    {
      "description": "Move ! Move ! move...!!\nLight Up Your Day... with a perfect match, \nPadanan busana kerja Professsional look, Powerfull & Fashionable,\nwhich got several design for different mood,  multifunction,\nMemorable style!\nLet’s be and stay Tuneeca...\n"
    }
  ]
}

which is basically only the last record.

With "Nr of rows in a bloc" set to 1, I got:

{
  "categories": [
    {
      "code": "AKSESORI-LIGHT-UP-YOUR-DAY"
    },
    {
      "name": "Aksesori Light Up Your Day"
    },
    {
      "description": "-"
    },
    {
      "code": "AKSESORIS-APRIL-2009"
    },
...

What I'm trying to get is:

{ "categories": [
  { "code": "AKSESORI-LIGHT-UP-YOUR-DAY",
    "name": "Aksesori Light Up Your Day",
    "description": "Very cool" },
  { "code": .........
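
To make the difference between the two shapes concrete, here is a minimal Python sketch that regroups the one-field-per-object output into one object per record. It is a post-processing workaround outside Kettle, not a fix for the JSON Output step itself, and the function name `regroup` is made up for illustration. It assumes the fields of each record arrive in order and that a field name showing up again marks the start of the next record.

```python
import json

def regroup(flat_items):
    """Merge consecutive single-field objects into one object per record.

    Assumption: a field name that is already present in the record being
    built means a new record has started.
    """
    records = []
    current = {}
    for item in flat_items:
        for key, value in item.items():
            if key in current:
                # Repeated field name: close the current record.
                records.append(current)
                current = {}
            current[key] = value
    if current:
        records.append(current)
    return records

# The flattened items as they appear in the output above
# (the second record is cut off there, so it only has "code").
flat = [
    {"code": "AKSESORI-LIGHT-UP-YOUR-DAY"},
    {"name": "Aksesori Light Up Your Day"},
    {"description": "-"},
    {"code": "AKSESORIS-APRIL-2009"},
]

result = {"categories": regroup(flat)}
print(json.dumps(result, indent=2))
```

The first object in `categories` then carries `code`, `name` and `description` together, which is the shape I am after.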

Contrast this with the XML Output step, which gave me the following correct output on the first try:

<?xml version="1.0" encoding="UTF-8"?>
<categories>
  <category>
    <code>AKSESORI-LIGHT-UP-YOUR-DAY</code>
    <name>Aksesori Light Up Your Day</name>
    <description>-</description>
  </category>
  <category>
    <code>AKSESORIS-APRIL-2009</code>
...

An additional plus is that XML Output already pretty-prints its output a bit, which I appreciate very much. (JSON Output writes everything on a single line.)

Both Output steps get the same input data.

Any ideas?