In this tutorial, you will see:
- What JSON is and how to deal with it in Python
- How to parse JSON in Python with the json module
- Whether json is the best option for JSON parsing
An Introduction to JSON in Python
Before digging into JSON parsing with Python, let’s understand what JSON is and how to use it in Python.
What Is JSON?
JSON, short for JavaScript Object Notation, is a lightweight data-interchange format. It is simple for humans to read and write and easy for machines to parse and generate. This makes it one of the most popular data formats. Specifically, JSON has become the “language of the web” because it is commonly used for transmitting data between servers and web applications via APIs.
Here is an example of JSON:
{
  "name": "Maria Smith",
  "age": 32,
  "isMarried": true,
  "hobbies": ["reading", "jogging"],
  "address": {
    "street": "123 Main St",
    "city": "San Francisco",
    "state": "CA",
    "zip": "12345"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "555-555-1234"
    },
    {
      "type": "work",
      "number": "555-555-5678"
    }
  ],
  "notes": null
}
As you can see, JSON consists of key-value pairs. Each key is a string and each value can be a string, number, boolean, null, array, or object. Even though it is similar to a JavaScript object, JSON can be used with any programming language, including Python.
How to Deal With JSON in Python
Python natively supports JSON through the json module, which is part of the Python Standard Library. This means that you do not need to install any additional library to work with JSON in Python. You can import json as follows:
import json
The built-in Python json library exposes a complete API to deal with JSON. In particular, it has two key functions: loads and load. The loads function allows you to parse JSON data from a string. Note that despite its name appearing to be plural, the ending “s” stands for “string.” So, it should be read as “load-s.” On the other hand, the load function parses JSON data from a file-like object, such as a text or binary file opened with open().
Through those two functions, json gives you the ability to convert JSON data to equivalent Python objects like dictionaries and lists. Their counterparts, dump and dumps, handle the opposite direction. Plus, the json module allows you to create custom encoders and decoders to handle specific data types.
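To make the difference between the two functions concrete, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for a file opened with open(), so it runs without any file on disk. The sample JSON is purely illustrative.

```python
import io
import json

raw_json = '{"language": "Python", "builtin": true}'

# json.loads() parses JSON from a string (bytes and bytearray also work)
data_from_string = json.loads(raw_json)

# json.load() parses JSON from a file-like object, i.e. anything with .read();
# io.StringIO plays the role of an open text file here
data_from_file = json.load(io.StringIO(raw_json))

print(data_from_string)                    # {'language': 'Python', 'builtin': True}
print(data_from_string == data_from_file)  # True
```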
Keep reading and find out how to use the json library to parse JSON data in Python!
Parsing JSON Data With Python
Let’s take a look at some real-world examples and learn how to parse JSON data from different sources into different Python data structures.
Converting a JSON String to a Python Dictionary
Assume that you have some JSON data stored in a string and you want to convert it to a Python dictionary. This is what the JSON data looks like:
{
  "name": "iPear 23",
  "colors": ["black", "white", "red", "blue"],
  "price": 999.99,
  "inStock": true
}
And this is its string representation in Python:
smartphone_json = '{"name": "iPear 23", "colors": ["black", "white", "red", "blue"], "price": 999.99, "inStock": true}'
Consider using the Python triple quotes convention to store long multi-line JSON strings.
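As a quick preview of the parsing step covered next, the sketch below stores the same smartphone JSON in a triple-quoted string. The surrounding newlines and indentation count as JSON whitespace, so json.loads() accepts the string without any cleanup.

```python
import json

# A triple-quoted string can span multiple lines, which keeps long JSON readable
smartphone_json = """
{
    "name": "iPear 23",
    "colors": ["black", "white", "red", "blue"],
    "price": 999.99,
    "inStock": true
}
"""

# Leading/trailing newlines and indentation are valid JSON whitespace
smartphone_dict = json.loads(smartphone_json)
print(smartphone_dict["colors"])  # ['black', 'white', 'red', 'blue']
```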
You can verify that smartphone_json contains a valid Python string with the line below:
print(type(smartphone_json))
This will print:
<class 'str'>
str stands for “string” and means that the smartphone_json variable has the text sequence type.
Parse the JSON string contained in smartphone_json into a Python dictionary with the json.loads() function as follows:
import json
# JSON string
smartphone_json = '{"name": "iPear 23", "colors": ["black", "white", "red", "blue"], "price": 999.99, "inStock": true}'
# from JSON string to Python dict
smartphone_dict = json.loads(smartphone_json)
# verify the type of the resulting variable
print(type(smartphone_dict)) # <class 'dict'>
If you run this snippet, you would get:
<class 'dict'>
Fantastic! smartphone_dict now contains a valid Python dictionary!
Thus, all you have to do to convert a JSON string to a Python dictionary is to pass a valid JSON string to json.loads().
You can now access the resulting dictionary fields as usual:
product = smartphone_dict['name'] # iPear 23
price = smartphone_dict['price'] # 999.99
colors = smartphone_dict['colors'] # ['black', 'white', 'red', 'blue']
Keep in mind that the json.loads() function will not always return a dictionary. Specifically, the returned data type depends on the input string. For example, if the JSON string contains a flat value, it will be converted to the corresponding Python primitive value:
import json
json_string = '15.5'
float_var = json.loads(json_string)
print(type(float_var)) # <class 'float'>
Similarly, a JSON string containing an array will become a Python list:
import json
json_string = '[1, 2, 3]'
list_var = json.loads(json_string)
print(type(list_var)) # <class 'list'>
Take a look at the conversion table below to see how JSON values are converted to Python data by json:
JSON Value | Python Data
string | str
number (integer) | int
number (real) | float
true | True
false | False
null | None
array | list
object | dict
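To see the table in action, here is a minimal sketch that parses one illustrative JSON document containing each value type and prints the name of the Python type json.loads() produced for each key. Note that true and false become True and False, whose type is bool.

```python
import json

# A single JSON document that contains every JSON value type in the table
json_string = (
    '{"text": "hello", "integer": 42, "real": 3.14, "yes": true, '
    '"no": false, "nothing": null, "array": [1, 2], "object": {"k": "v"}}'
)

parsed = json.loads(json_string)

# Show which Python type json.loads() produced for each JSON value
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

Running it prints str, int, float, bool, bool, NoneType, list, and dict for the eight keys, in that order.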
Transforming a JSON API Response Into a Python Dictionary
Consider that you need to call an API and convert its JSON response to a Python dictionary. In the example below, we will call the following API endpoint from the {JSON} Placeholder project to get some fake JSON data:
https://jsonplaceholder.typicode.com/todos/1
That RESTful API returns the JSON response below:
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}
You can call that API with the urllib module from the Standard Library and convert the resulting JSON to a Python dictionary as follows:
import urllib.request
import json
url = "https://jsonplaceholder.typicode.com/todos/1"
with urllib.request.urlopen(url) as response:
    body_json = response.read()
body_dict = json.loads(body_json)
user_id = body_dict['userId'] # 1
urllib.request.urlopen() performs the API call and returns an HTTPResponse object. Its read() method is then used to get the response body body_json, which contains the API response as raw JSON bytes. Finally, json.loads() parses those bytes into a Python dictionary as explained earlier (since Python 3.6, json.loads() accepts bytes as well as strings).
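If you want to try the parsing step without hitting the network, the sketch below hard-codes the sample response shown above as a bytes literal, standing in for what response.read() returns. It also shows the equivalent explicit decoding step, assuming the response body is UTF-8 encoded.

```python
import json

# Stand-in for response.read(): urlopen() responses return raw bytes
body_json = b'{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# json.loads() accepts bytes directly (Python 3.6+)...
body_dict = json.loads(body_json)

# ...which matches decoding to str first, assuming a UTF-8 body
decoded_dict = json.loads(body_json.decode("utf-8"))

print(body_dict["userId"])        # 1
print(body_dict == decoded_dict)  # True
```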
Similarly, you can achieve the same result with the third-party requests library, which you can install with pip install requests:
import requests
import json
url = "https://jsonplaceholder.typicode.com/todos/1"
response = requests.get(url)
body_dict = response.json()
user_id = body_dict['userId'] # 1
Note that the .json() method parses the JSON body of the response into the corresponding Python data structure, so you do not need to call json.loads() yourself.
Great! You now know how to parse a JSON API response in Python with both urllib and requests.
Loading a JSON File Into a Python Dictionary
Suppose you have some JSON data stored in a smartphone.json file as below:
{
  "name": "iPear 23",
  "colors": ["black", "white", "red", "blue"],
  "price": 999.99,
  "inStock": true,
  "dimensions": {
    "width": 2.82,
    "height": 5.78,
    "depth": 0.30
  },
  "features": [
    "5G",
    "HD display",
    "Dual camera"
  ]
}
Your goal is to read the JSON file and load it into a Python dictionary. Achieve that with the snippet below:
import json
with open('smartphone.json') as file:
    smartphone_dict = json.load(file)
print(type(smartphone_dict)) # <class 'dict'>
features = smartphone_dict['features'] # ['5G', 'HD display', 'Dual camera']
The built-in open() function allows you to open a file and get its corresponding file object. The json.load() function then deserializes the text file or binary file containing a JSON document to the equivalent Python object. In this case, smartphone.json becomes a Python dictionary.
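The sketch below checks the claim that json.load() works with both text and binary files. So that it runs anywhere, it assumes a temporary directory and writes an illustrative subset of smartphone.json there first. The explicit encoding="utf-8" is a precaution rather than a requirement.

```python
import json
import os
import tempfile

# Illustrative subset of smartphone.json, written to a temporary directory
# so the example does not depend on a file already existing
sample = '{"name": "iPear 23", "features": ["5G", "HD display", "Dual camera"]}'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "smartphone.json")
    with open(path, "w", encoding="utf-8") as file:
        file.write(sample)

    # Text mode with an explicit encoding
    with open(path, encoding="utf-8") as file:
        from_text = json.load(file)

    # Binary mode also works: json.load() receives bytes and decodes them
    with open(path, "rb") as file:
        from_binary = json.load(file)

print(from_text["features"])     # ['5G', 'HD display', 'Dual camera']
print(from_text == from_binary)  # True
```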
Perfect, parsing a JSON file in Python takes only a few lines of code!
From JSON Data to Custom Python Object
Now, you want to parse some JSON data into a custom Python class. This is what your custom Smartphone Python class looks like:
class Smartphone:
    def __init__(self, name, colors, price, in_stock):
        self.name = name
        self.colors = colors
        self.price = price
        self.in_stock = in_stock
Here, the goal is to convert the following JSON string to a Smartphone instance:
{
  "name": "iPear 23 Plus",
  "colors": ["black", "white", "gold"],
  "price": 1299.99,
  "inStock": false
}
To accomplish this task, you can create a custom decoder. In detail, you have to extend the JSONDecoder class and set the object_hook parameter in its __init__ method to the method containing the custom parsing logic. That parsing method receives each decoded JSON object as a standard Python dictionary, whose values you can use to instantiate a Smartphone object.
Define a custom SmartphoneDecoder as below:
import json
class SmartphoneDecoder(json.JSONDecoder):
    def __init__(self, object_hook=None, *args, **kwargs):
        # ignore any object_hook passed in and use the custom one
        super().__init__(object_hook=self.object_hook, *args, **kwargs)
    # method containing the
    # custom parsing logic
    def object_hook(self, json_dict):
        new_smartphone = Smartphone(
            json_dict.get('name'),
            json_dict.get('colors'),
            json_dict.get('price'),
            json_dict.get('inStock'),
        )
        return new_smartphone
Note that you should use the get() method to read the dictionary values within the custom object_hook() method. This ensures that no KeyError is raised if a key is missing from the dictionary. Instead, None values will be returned.
You can now pass the SmartphoneDecoder class to the cls parameter in json.loads() to convert a JSON string to a Smartphone object:
import json
# class Smartphone:
# ...
# class SmartphoneDecoder(json.JSONDecoder):
# ...
smartphone_json = '{"name": "iPear 23 Plus", "colors": ["black", "white", "gold"], "price": 1299.99, "inStock": false}'
smartphone = json.loads(smartphone_json, cls=SmartphoneDecoder)
print(type(smartphone)) # <class '__main__.Smartphone'>
name = smartphone.name # iPear 23 Plus
Similarly, you can use SmartphoneDecoder with json.load(), where smartphone_json_file is a file object returned by open():
smartphone = json.load(smartphone_json_file, cls=SmartphoneDecoder)
Et voilà! You now know how to parse JSON data into custom Python objects!
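A JSONDecoder subclass is not the only option: json.loads() and json.load() also accept an object_hook argument directly. Keep in mind that the hook is called for every JSON object in the document, including nested ones. So the decoder above would also turn a nested object, such as the dimensions field in smartphone.json, into a Smartphone. The minimal sketch below passes a plain function as object_hook and converts only dictionaries that look like a smartphone. The smartphone_hook name and the “has name and price” check are illustrative assumptions.

```python
import json

class Smartphone:
    def __init__(self, name, colors, price, in_stock):
        self.name = name
        self.colors = colors
        self.price = price
        self.in_stock = in_stock

def smartphone_hook(json_dict):
    # Called once per decoded JSON object, innermost objects first.
    # Only dicts with both "name" and "price" become Smartphone instances;
    # anything else (e.g. the nested "dimensions" object) stays a dict.
    if "name" in json_dict and "price" in json_dict:
        return Smartphone(
            json_dict.get("name"),
            json_dict.get("colors"),
            json_dict.get("price"),
            json_dict.get("inStock"),
        )
    return json_dict

smartphone_json = (
    '{"name": "iPear 23", "colors": ["black", "white"], "price": 999.99, '
    '"inStock": true, "dimensions": {"width": 2.82, "height": 5.78, "depth": 0.30}}'
)

smartphone = json.loads(smartphone_json, object_hook=smartphone_hook)
print(type(smartphone).__name__)  # Smartphone
print(smartphone.in_stock)        # True
```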
Python Data to JSON
You can also go the other way around and convert Python data structures and primitives to JSON. This is possible thanks to the json.dump() and json.dumps() functions, which follow the conversion table below:
Python Data | JSON Value
str | string
int | number (integer)
float | number (real)
True | true
False | false
None | null
list | array
dict | object
json.dump() serializes a Python object as JSON and writes it to a file, as in the following example:
import json
user_dict = {
    "name": "John",
    "surname": "Williams",
    "age": 48,
    "city": "New York"
}
# serializing the sample dictionary to a JSON file
with open("user.json", "w") as json_file:
    json.dump(user_dict, json_file)
This snippet will serialize the Python user_dict variable into the user.json file.
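To check that json.dump() and json.load() are inverses for data like user_dict, you can write the dictionary to a file and read it back. The sketch below does that in a temporary directory so it leaves no user.json behind. The temporary path is an assumption made for the example, not part of the original snippet.

```python
import json
import os
import tempfile

user_dict = {"name": "John", "surname": "Williams", "age": 48, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "user.json")

    # Serialize the dictionary to the file...
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(user_dict, json_file)

    # ...then parse it back with json.load()
    with open(path, encoding="utf-8") as json_file:
        loaded_dict = json.load(json_file)

print(loaded_dict == user_dict)  # True
```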
Similarly, json.dumps() converts a Python variable to its equivalent JSON string:
import json
user_dict = {
    "name": "John",
    "surname": "Williams",
    "age": 48,
    "city": "New York"
}
user_json_string = json.dumps(user_dict)
print(user_json_string)
Run this snippet and you will get:
{"name": "John", "surname": "Williams", "age": 48, "city": "New York"}
This is exactly the JSON representation of the Python dict.
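By default, json.dumps() produces compact single-line output. If you need human-readable JSON, its indent and sort_keys parameters help. Here is a small sketch using the same sample dictionary:

```python
import json

user_dict = {"name": "John", "surname": "Williams", "age": 48, "city": "New York"}

# indent adds newlines and indentation; sort_keys orders the keys alphabetically
pretty_json = json.dumps(user_dict, indent=2, sort_keys=True)
print(pretty_json)
```

This prints each key on its own line, indented by two spaces, in the order age, city, name, surname. Parsing the result with json.loads() gives back the original dictionary.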
Note that you can also specify a custom encoder, but showing how to do it is not the purpose of this article. Follow the official documentation to learn more.
Is the json Standard Module the Best Resource for Parsing JSON in Python?
As is true in general for data parsing, JSON parsing comes with challenges that cannot be overlooked. For example, when given invalid, broken, or non-standard JSON, the Python json module simply raises a json.JSONDecodeError instead of trying to recover the data.
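At a minimum, you can catch that exception so one bad payload does not crash your program. In the minimal sketch below, safe_loads is a hypothetical helper name. It reports where parsing failed using the exception's msg, lineno, and colno attributes, then returns None. It does not attempt to repair the input.

```python
import json

def safe_loads(text):
    """Hypothetical helper: return the parsed data, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        # msg, lineno, and colno describe where parsing failed
        print(f"Invalid JSON: {error.msg} (line {error.lineno}, column {error.colno})")
        return None

# The trailing comma makes this invalid JSON
broken_json = '{"name": "iPear 23", "price": 999.99,}'

print(safe_loads('{"name": "iPear 23"}'))  # {'name': 'iPear 23'}
print(safe_loads(broken_json))             # None, after the error message
```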
Also, you need to be careful when parsing JSON data from untrusted sources. This is because a malicious JSON string can cause your parser to break or consume a large amount of resources. This is just one of the challenges a Python JSON parser should take into account.
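The official json documentation gives the same warning and recommends limiting the size of the data to be parsed. Below is a minimal sketch of that idea. The loads_with_limit name and the 1 MB threshold are arbitrary assumptions, and a size check alone does not protect against every kind of malicious input.

```python
import json

# Arbitrary, hypothetical limit: tune it to the payloads you actually expect
MAX_JSON_BYTES = 1_000_000

def loads_with_limit(data, max_bytes=MAX_JSON_BYTES):
    """Hypothetical helper: refuse to parse JSON payloads larger than max_bytes."""
    # Measure str input by its UTF-8 size so str and bytes are treated alike
    size = len(data.encode("utf-8")) if isinstance(data, str) else len(data)
    if size > max_bytes:
        raise ValueError(f"JSON payload too large: {size} bytes (limit {max_bytes})")
    return json.loads(data)

print(loads_with_limit('{"ok": true}'))  # {'ok': True}
```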
You could introduce custom logic to deal with these particular cases. At the same time, that might take too long and result in complex and unreliable code. For this reason, you should consider a commercial tool that makes JSON parsing easier, such as Web Scraper API.
Web Scraper API is specifically designed for developers and comes with a wide range of features to parse JSON content and more. This tool can save you tons of time and help you secure your JSON parsing process. Also, it comes with Bright Data’s unblocking proxy capabilities to call JSON APIs anonymously.
If you are in a hurry, you might also be interested in our Data as a Service offer. Through this service, you can ask Bright Data to provide you with a custom dataset that fits your specific needs. Bright Data will take care of everything, from performance to data quality.
Parsing JSON data has never been easier!
Conclusion
Python enables you to natively parse JSON data through the json standard module. This exposes a powerful API to serialize and deserialize JSON content. Specifically, it offers the json.load() and json.loads() functions to deal with JSON files and JSON strings, respectively. Here, you saw how to use them to parse JSON data in Python in several real-world examples. At the same time, you also learned about the limitations of this approach. This is why you may want to try a cutting-edge, fully-featured, commercial solution for data parsing, such as Bright Data’s data and proxy products.