Go - Parsing JSON Data Streams Into Structs

Published: 2023-10-03 11:28:49 | Author: ZhangZhihuiAAA

Problem: You want to parse JSON data from a stream.


Solution: Create structs to contain the JSON data. Create a decoder using NewDecoder in the encoding/json package, then call Decode on the decoder to decode data into the structs.

 

In this first JSON file you have an array of three JSON objects (part of the data is truncated to make it easier to read):

[{
    "name": "Luke Skywalker",
    "height": "172",
    "mass": "77",
    "hair_color": "blond",
    "skin_color": "fair",
    "eye_color": "blue",
    "birth_year": "19BBY",
    "gender": "male"
},
{
    "name": "C-3PO",
    "height": "167",
    "mass": "75",
    "hair_color": "n/a",
    "skin_color": "gold",
    "eye_color": "yellow",
    "birth_year": "112BBY",
    "gender": "n/a"
},
{
    "name": "R2-D2",
    "height": "96",
    "mass": "32",
    "hair_color": "n/a",
    "skin_color": "white, blue",
    "eye_color": "red",
    "birth_year": "33BBY",
    "gender": "n/a"
}]

To read this, you can read the whole file into memory and use Unmarshal to unmarshal it into a slice of Person structs:

func unmarshalStructArray() (people []Person) {
    file, err := os.Open("people.json")
    if err != nil {
        log.Println("Error opening json file:", err)
        return // nothing to read, so return the nil slice
    }
    defer file.Close()

    // Unmarshal works on a byte slice, so read the whole file first.
    data, err := io.ReadAll(file)
    if err != nil {
        log.Println("Error reading json data:", err)
        return
    }

    err = json.Unmarshal(data, &people)
    if err != nil {
        log.Println("Error unmarshalling json data:", err)
    }
    return
}

This will result in an output like this:

[]json.Person{
    {
        Name:       "Luke Skywalker",
        Height:     "172",
        Mass:       "77",
        HairColor:  "blond",
        SkinColor:  "fair",
        EyeColor:   "blue",
        BirthYear:  "19BBY",
        Gender:     "male",
        Homeworld:  "",
        Films:      nil,
        Species:    nil,
        Vehicles:   nil,
        Starships:  nil,
        Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        URL:        "",
    },
    {
        Name:       "C-3PO",
        Height:     "167",
        Mass:       "75",
        HairColor:  "n/a",
        SkinColor:  "gold",
        EyeColor:   "yellow",
        BirthYear:  "112BBY",
        Gender:     "n/a",
        Homeworld:  "",
        Films:      nil,
        Species:    nil,
        Vehicles:   nil,
        Starships:  nil,
        Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        URL:        "",
    },
    {
        Name:       "R2-D2",
        Height:     "96",
        Mass:       "32",
        HairColor:  "n/a",
        SkinColor:  "white, blue",
        EyeColor:   "red",
        BirthYear:  "33BBY",
        Gender:     "n/a",
        Homeworld:  "",
        Films:      nil,
        Species:    nil,
        Vehicles:   nil,
        Starships:  nil,
        Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
        URL:        "",
    },
}

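This is a slice of Person structs, which you get after unmarshalling a single JSON array. Fields that are missing from the JSON, such as Homeworld, Films, and Created, keep their zero values.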
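The Person type itself is not shown here. As a minimal sketch, the struct below is reconstructed from the field names in the printed output; the json tags are taken from the keys in the sample file where they appear, and the remaining tags and the []string element type are assumptions. The helper name parsePeople is also hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Person mirrors the fields visible in the printed output.
// Tags for keys not present in the sample JSON (homeworld, films, ...)
// and the []string element type are assumptions.
type Person struct {
	Name      string    `json:"name"`
	Height    string    `json:"height"`
	Mass      string    `json:"mass"`
	HairColor string    `json:"hair_color"`
	SkinColor string    `json:"skin_color"`
	EyeColor  string    `json:"eye_color"`
	BirthYear string    `json:"birth_year"`
	Gender    string    `json:"gender"`
	Homeworld string    `json:"homeworld"`
	Films     []string  `json:"films"`
	Species   []string  `json:"species"`
	Vehicles  []string  `json:"vehicles"`
	Starships []string  `json:"starships"`
	Created   time.Time `json:"created"`
	Edited    time.Time `json:"edited"`
	URL       string    `json:"url"`
}

// parsePeople unmarshals a JSON array held in memory into a slice of Person.
func parsePeople(data []byte) ([]Person, error) {
	var people []Person
	if err := json.Unmarshal(data, &people); err != nil {
		return nil, err
	}
	return people, nil
}

func main() {
	data := []byte(`[{"name": "Luke Skywalker", "hair_color": "blond"},
	                 {"name": "C-3PO", "hair_color": "n/a"}]`)
	people, err := parsePeople(data)
	if err != nil {
		fmt.Println("Error unmarshalling json data:", err)
		return
	}
	for _, p := range people {
		// Created was not in the input, so it stays at the zero time.
		fmt.Println(p.Name, p.HairColor, p.Created.IsZero())
	}
}
```

The tags matter for keys such as hair_color: without them, encoding/json matches keys to field names case-insensitively, which would not map hair_color to HairColor.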

 

However, when you get a stream of JSON objects, this is no longer possible. Here is another JSON file, one that is representative of a JSON data stream:

{
    "name": "Luke Skywalker",
    "height": "172",
    "mass": "77",
    "hair_color": "blond",
    "skin_color": "fair",
    "eye_color": "blue",
    "birth_year": "19BBY",
    "gender": "male"
}
{
    "name": "C-3PO",
    "height": "167",
    "mass": "75",
    "hair_color": "n/a",
    "skin_color": "gold",
    "eye_color": "yellow",
    "birth_year": "112BBY",
    "gender": "n/a"
}
{
    "name": "R2-D2",
    "height": "96",
    "mass": "32",
    "hair_color": "n/a",
    "skin_color": "white, blue",
    "eye_color": "red",
    "birth_year": "33BBY",
    "gender": "n/a"
}

Notice that this is not a single JSON object but three consecutive JSON objects. This is no longer a valid JSON file, but it’s something you can get when you read the Body of an http.Response struct. If you try to read this using Unmarshal, you will get an error:

Error unmarshalling json data: invalid character '{' after top-level value
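To reproduce this without a file, here is a small sketch that feeds two concatenated objects to Unmarshal from memory. The helper name unmarshalOne and the use of a map[string]string target are choices made for this illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unmarshalOne tries to unmarshal data as a single JSON object.
func unmarshalOne(data []byte) (map[string]string, error) {
	var v map[string]string
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	// Two objects back to back: fine as a stream, invalid as one JSON document.
	stream := []byte(`{"name": "Luke Skywalker"} {"name": "C-3PO"}`)
	if _, err := unmarshalOne(stream); err != nil {
		fmt.Println("Error unmarshalling json data:", err)
	}
}
```

Unmarshal validates the entire input before it stores anything, so the second object makes the whole call fail.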

However, you can parse it by decoding it using a Decoder:

func decode(p chan Person) {
    // Close the channel on every return path so the receiver never blocks forever.
    defer close(p)

    file, err := os.Open("people_stream.json")
    if err != nil {
        log.Println("Error opening json file:", err)
        return
    }
    defer file.Close()

    decoder := json.NewDecoder(file)
    for {
        var person Person
        err = decoder.Decode(&person)
        if err == io.EOF {
            break // no more JSON values in the stream
        }
        if err != nil {
            log.Println("Error decoding json data:", err)
            break
        }
        p <- person
    }
}

First, you create a decoder with json.NewDecoder, passing it a reader; in this case, the reader is the file you opened. Then, inside the for loop, you call Decode on the decoder, passing it a pointer to the struct you want to store the data in. If all goes well, each iteration fills a new Person struct from the next JSON object in the stream, and you can use that data right away. When there is no more data in the reader, Decode returns io.EOF and you break out of the for loop.
In the preceding code, you pass in a channel and send each Person struct to it as it is decoded. When you’re done reading all the JSON in the file, you close the channel:

func main() {
    p := make(chan Person)
    go decode(p)
    // Receive until decode closes the channel.
    for person := range p {
        fmt.Printf("%# v\n", pretty.Formatter(person))
    }
}

Here’s the output from the code:

json.Person{
    Name:       "Luke Skywalker",
    Height:     "172",
    Mass:       "77",
    HairColor:  "blond",
    SkinColor:  "fair",
    EyeColor:   "blue",
    BirthYear:  "19BBY",
    Gender:     "male",
    Homeworld:  "",
    Films:      nil,
    Species:    nil,
    Vehicles:   nil,
    Starships:  nil,
    Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    URL:        "",
}
json.Person{
    Name:       "C-3PO",
    Height:     "167",
    Mass:       "75",
    HairColor:  "n/a",
    SkinColor:  "gold",
    EyeColor:   "yellow",
    BirthYear:  "112BBY",
    Gender:     "n/a",
    Homeworld:  "",
    Films:      nil,
    Species:    nil,
    Vehicles:   nil,
    Starships:  nil,
    Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    URL:        "",
}
json.Person{
    Name:       "R2-D2",
    Height:     "96",
    Mass:       "32",
    HairColor:  "n/a",
    SkinColor:  "white, blue",
    EyeColor:   "red",
    BirthYear:  "33BBY",
    Gender:     "n/a",
    Homeworld:  "",
    Films:      nil,
    Species:    nil,
    Vehicles:   nil,
    Starships:  nil,
    Created:    time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    Edited:     time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC),
    URL:        "",
}

You can see that three Person structs are printed here, one after another, as opposed to the earlier output, which was a single slice of Person structs.
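The same decode loop works with any io.Reader, not only a file. The sketch below is one possible variation, not the code used above: it uses a callback instead of a channel, trims Person to two fields, and reads from a strings.Reader. The names decodeStream and namesIn are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Person is trimmed to two fields for this sketch.
type Person struct {
	Name   string `json:"name"`
	Gender string `json:"gender"`
}

// decodeStream reads consecutive JSON objects from r until EOF
// and calls handle once for each decoded Person.
func decodeStream(r io.Reader, handle func(Person)) error {
	decoder := json.NewDecoder(r)
	for {
		var person Person
		err := decoder.Decode(&person)
		if errors.Is(err, io.EOF) {
			return nil // clean end of stream
		}
		if err != nil {
			return fmt.Errorf("decoding json data: %w", err)
		}
		handle(person)
	}
}

// namesIn streams s through decodeStream and collects the names it sees.
func namesIn(s string) ([]string, error) {
	var names []string
	err := decodeStream(strings.NewReader(s), func(p Person) {
		names = append(names, p.Name)
	})
	return names, err
}

func main() {
	names, err := namesIn(`{"name": "Luke Skywalker", "gender": "male"}
{"name": "C-3PO", "gender": "n/a"}
{"name": "R2-D2", "gender": "n/a"}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(names), "people decoded:", names)
}
```

Returning the error instead of logging it lets the caller decide what to do when the stream is cut off partway through an object.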

 

A question that sometimes arises is: when should you use Unmarshal, and when should you use Decode?
Unmarshal is easier to use when you already have a single JSON value in memory, but it won’t work on a stream of values coming in from a reader. Its simplicity also makes it less flexible: you get the whole JSON data in one go.
Decode, on the other hand, works well for both single JSON objects and streaming JSON data. With a Decoder you can also work with the JSON at a finer level without reading all of it out first, because you can inspect it as it comes in, even token by token. The only slight drawback is that it is more verbose.
In addition, Decode reads directly from the reader, so you skip the separate io.ReadAll step; this can make it a bit faster, though the difference depends on the input.
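The token-level access mentioned above can be sketched with the Decoder's Token and More methods. This example walks a JSON array one element at a time; the function name decodeArrayElements and the one-field Person are assumptions made for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Person is trimmed to one field for this sketch.
type Person struct {
	Name string `json:"name"`
}

// decodeArrayElements decodes a JSON array element by element
// and returns the names it finds.
func decodeArrayElements(input string) ([]string, error) {
	decoder := json.NewDecoder(strings.NewReader(input))

	// Consume the opening '[' token.
	tok, err := decoder.Token()
	if err != nil {
		return nil, err
	}
	if delim, ok := tok.(json.Delim); !ok || delim != '[' {
		return nil, fmt.Errorf("expected '[', got %v", tok)
	}

	var names []string
	// More reports whether another element remains in the current array.
	for decoder.More() {
		var person Person
		if err := decoder.Decode(&person); err != nil {
			return nil, err
		}
		names = append(names, person.Name)
	}

	// Consume the closing ']' token.
	if _, err := decoder.Token(); err != nil {
		return nil, err
	}
	return names, nil
}

func main() {
	names, err := decodeArrayElements(`[{"name": "Luke Skywalker"}, {"name": "C-3PO"}]`)
	if err != nil {
		fmt.Println("Error decoding json data:", err)
		return
	}
	fmt.Println(names)
}
```

With this pattern each element can be handled as soon as it is decoded, which is useful when the array is too large to unmarshal comfortably in one call.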