Python - tutorial - 12/14

JSON


Python - JSON

JSON (JavaScript Object Notation) is a text format for storing and exchanging data. Its syntax is based on JavaScript object notation.

JSON in Python: Python has a built-in package called "json", which can be used to work with JSON data.

Example 1: import the json module:

        import json
    

Parse JSON - convert from JSON to Python: if you have a JSON string, you can parse it by using the json.loads() function. When the JSON text is an object, as in the example below, the result is a Python dictionary.

Example 2: convert from JSON to Python:

        import json
        # some JSON:
        x =  '{ "name":"John", "age":30, "city":"New York"}'
        # parse x:
        y = json.loads(x)
        # the result is a Python dictionary:
        print(y["age"])
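
The result is a dictionary only when the top-level JSON value is an object. The following is a minimal sketch (the variable names are illustrative, not part of the tutorial) showing a JSON array being parsed into a list, and invalid input raising json.JSONDecodeError:

```python
import json

# A JSON array becomes a Python list; JSON numbers become int or float.
numbers = json.loads('[1, 2.5, 3]')
print(type(numbers).__name__)  # list

# Invalid JSON raises json.JSONDecodeError (a subclass of ValueError).
try:
    json.loads("{'name': 'John'}")  # single quotes are not valid JSON
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)  # False
```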
    

Convert from Python to JSON: if you have a Python object, you can convert it into a JSON string by using the json.dumps() method.

Example 3: convert from Python to JSON:

        import json
        # a Python object (dict):
        x = {
            "name": "John",
            "age": 30,
            "city": "New York"
        }
        # convert into JSON:
        y = json.dumps(x)
        # the result is a JSON string:
        print(y) # {"name": "John", "age": 30, "city": "New York"}
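
As a quick check of how json.dumps() and json.loads() relate, the sketch below (with illustrative variable names) converts a dictionary to a JSON string and back again:

```python
import json

original = {"name": "John", "age": 30, "city": "New York"}
text = json.dumps(original)   # Python dict -> JSON string
restored = json.loads(text)   # JSON string -> Python dict

print(type(text).__name__)    # str
print(restored == original)   # True
```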
    

You can convert Python objects of the following types into JSON strings: dict, list, tuple, str, int, float, True, False, None.

Example 4: convert Python objects into JSON strings, and print the values:

        import json
        print(json.dumps({"name": "John", "age": 30})) # {"name": "John", "age": 30}
        print(json.dumps(["apple", "bananas"]))  # ["apple", "bananas"]
        print(json.dumps(("apple", "bananas"))) # ["apple", "bananas"]
        print(json.dumps("hello")) # "hello"
        print(json.dumps(42))       # 42
        print(json.dumps(31.76))  # 31.76
        print(json.dumps(True))   # true
        print(json.dumps(False))  # false
        print(json.dumps(None))   # null
    

When you convert from Python to JSON, Python objects are converted into the JSON (JavaScript) equivalent:

Python - JSON
dict - Object
list - Array
tuple - Array
str - String
int - Number
float - Number
True - true
False - false
None - null
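
The conversion in the table is not always reversible. The sketch below (illustrative names, not from the tutorial) shows two consequences of the mapping, and what happens with a type that is not in the table:

```python
import json

# Tuples are written as JSON arrays, so they come back as lists.
point = json.loads(json.dumps((1, 2)))
print(point)  # [1, 2]

# JSON object keys are always strings, so integer keys come back as str.
table = json.loads(json.dumps({1: "one"}))
print(table)  # {'1': 'one'}

# Types outside the table, such as set, raise TypeError.
try:
    json.dumps({1, 2, 3})
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False
```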

Format the result: the json.dumps() function has parameters that make the result easier to read: indent and separators. The separators parameter is a tuple (item_separator, key_separator). When indent is not set, the default is (", ", ": "), which means a comma and a space between items, and a colon and a space between keys and values. When indent is set, the default item separator is "," without the trailing space.

Example 5: use the indent parameter to set the number of spaces per indentation level:

        json.dumps(x, indent=4)
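
A complete version of this example might look as follows. The dictionary x is an assumption here, reusing values from the earlier examples:

```python
import json

x = {"name": "John", "age": 30}
pretty = json.dumps(x, indent=4)
print(pretty)
# {
#     "name": "John",
#     "age": 30
# }
```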
    

Example 6: use the separators parameter to change the default separator:

        json.dumps(x, indent=4, separators=(". ", " = "))
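
Separators other than a comma and a colon, as in Example 6, produce output that is no longer valid JSON. A more common use of the parameter is removing the spaces to get a compact string. A minimal sketch, assuming a small dictionary x:

```python
import json

x = {"name": "John", "age": 30}
default = json.dumps(x)
compact = json.dumps(x, separators=(",", ":"))

print(default)  # {"name": "John", "age": 30}
print(compact)  # {"name":"John","age":30}
```

The compact form is still valid JSON, so json.loads(compact) returns the original dictionary.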
    

Order the result: the json.dumps() function has a parameter to order the keys in the result: sort_keys.

Example 7: use the sort_keys parameter to specify if the result should be sorted or not:

        json.dumps(x, indent=4, sort_keys=True)
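
To make the effect visible, the sketch below uses a dictionary whose keys are deliberately out of alphabetical order (the data is illustrative) and omits indent to keep the output on one line:

```python
import json

x = {"name": "John", "city": "New York", "age": 30}
unordered = json.dumps(x)
ordered = json.dumps(x, sort_keys=True)

print(unordered)  # {"name": "John", "city": "New York", "age": 30}
print(ordered)    # {"age": 30, "city": "New York", "name": "John"}
```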
    


Python - RegEx

A RegEx, or regular expression, is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

RegEx module: Python has a built-in package called "re", which can be used to work with Regular Expressions.

RegEx in Python: when you have imported the "re" module, you can start using regular expressions:

Example 8: search the string to see if it starts with "The" and ends with "Spain":

        import re
        txt = "The rain in Spain"
        x = re.search("^The.*Spain$", txt)
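
The example above stores the result without using it. One way to act on it, sketched here with an extra non-matching pattern added for contrast, is to test the return value, which is a Match object on success and None otherwise:

```python
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
no_match = re.search("^The.*Portugal$", txt)

# A Match object is truthy, None is falsy.
if x:
    print("Match found")
else:
    print("No match")
# prints "Match found"

print(no_match)  # None
```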
    

RegEx functions: the "re" module offers a set of functions that allow us to search a string for a match:

Function - Description
findall - Returns a list containing all matches
search - Returns a Match object if there is a match anywhere in the string
split - Returns a list where the string has been split at each match
sub - Replaces one or many matches with a string

Metacharacters: metacharacters are characters with a special meaning:

Character - Description - Example
[] - A set of characters - "[a-m]"
\ - Signals a special sequence (can also be used to escape special characters) - "\d"
. - Any character (except newline character) - "he..o"
^ - Starts with - "^hello"
$ - Ends with - "world$"
* - Zero or more occurrences - "aix*"
+ - One or more occurrences - "aix+"
{} - Exactly the specified number of occurrences - "al{2}"
| - Either or - "falls|stays"
() - Capture and group
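
The patterns from the table can be tried out with re.findall() and re.search(). The test strings in this sketch are made up for illustration:

```python
import re

any_char = re.findall("he..o", "hello helio")          # . matches any character
zero_or_more = re.findall("aix*", "ai aix aixx")       # * allows zero x
one_or_more = re.findall("aix+", "ai aix aixx")        # + needs at least one x
exactly_two = re.findall("al{2}", "all tall al")       # {2} needs exactly two l
either = re.findall("falls|stays", "The rain falls and stays")
groups = re.search(r"(\w+) (\w+)", "hello world").groups()  # () captures

print(any_char)      # ['hello', 'helio']
print(zero_or_more)  # ['ai', 'aix', 'aixx']
print(one_or_more)   # ['aix', 'aixx']
print(exactly_two)   # ['all', 'all']
print(either)        # ['falls', 'stays']
print(groups)        # ('hello', 'world')
```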

Special sequences: a special sequence is a "\" followed by one of the characters in the list below, and has a special meaning:

Character - Description - Example
\A - Returns a match if the specified characters are at the beginning of the string - r"\AThe"
\b - Returns a match where the specified characters are at the beginning or at the end of a word (the "r" prefix makes sure that the string is treated as a "raw string") - r"\bain", r"ain\b"
\B - Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word - r"\Bain", r"ain\B"
\d - Returns a match where the string contains digits (numbers from 0-9) - r"\d"
\D - Returns a match where the string contains a character that is NOT a digit - r"\D"
\s - Returns a match where the string contains a white space character - r"\s"
\S - Returns a match where the string contains a character that is NOT white space - r"\S"
\w - Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character) - r"\w"
\W - Returns a match where the string contains a character that is NOT a word character - r"\W"
\Z - Returns a match if the specified characters are at the end of the string - r"Spain\Z"
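
The sketch below applies several of these sequences to one made-up string. It also shows why the raw-string prefix matters: without it, Python reads "\b" as a single backspace character instead of passing the two characters \ and b to the regex engine.

```python
import re

txt = "The rain in Spain 2024"

# "\b" is one backspace character; r"\b" is a backslash followed by b.
print(len("\b"), len(r"\b"))  # 1 2

starts = re.findall(r"\AThe", txt)      # at the beginning of the string
word_start = re.findall(r"\bain", txt)  # "ain" at the start of a word
word_end = re.findall(r"ain\b", txt)    # "ain" at the end of a word
inside = re.findall(r"\Bain", txt)      # "ain" NOT at the start of a word
digits = re.findall(r"\d", txt)
spaces = re.findall(r"\s", txt)
at_end = re.findall(r"2024\Z", txt)     # at the end of the string

print(starts)       # ['The']
print(word_start)   # []
print(word_end)     # ['ain', 'ain']
print(inside)       # ['ain', 'ain']
print(digits)       # ['2', '0', '2', '4']
print(len(spaces))  # 4
print(at_end)       # ['2024']
```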

Sets: a set is a set of characters inside a pair of square brackets [] with a special meaning:

Set - Description
[arn] - Returns a match where one of the specified characters (a, r, or n) is present
[a-n] - Returns a match for any lower case character, alphabetically between a and n
[^arn] - Returns a match for any character EXCEPT a, r, and n
[0123] - Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9] - Returns a match for any digit between 0 and 9
[0-5][0-9] - Returns a match for any two-digit number from 00 to 59
[a-zA-Z] - Returns a match for any character alphabetically between a and z, lower case OR upper case
[+] - In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
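
A few of the sets from the table, tried on short made-up strings:

```python
import re

any_of = re.findall("[arn]", "The rain in Spain")
in_range = re.findall("[a-n]", "The rain")
excluded = re.findall("[^arn]", "rain")
two_digits = re.findall("[0-5][0-9]", "08:45")
literal_plus = re.findall("[+]", "1+1=2")

print(any_of)        # ['r', 'a', 'n', 'n', 'a', 'n']
print(in_range)      # ['h', 'e', 'a', 'i', 'n']
print(excluded)      # ['i']
print(two_digits)    # ['08', '45']
print(literal_plus)  # ['+']
```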

The findall() function returns a list containing all matches. The list contains the matches in the order they are found. If no matches are found, an empty list is returned:

Example 9: print a list of all matches:

        import re
        #Return a list containing every occurrence of "ai":
        txt = "The rain in Spain"
        x = re.findall("ai", txt)
        y = re.findall("Portugal", txt)
        print(x) # ['ai', 'ai']
        print(y) # []
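
When the pattern contains capture groups, findall() returns the captured groups rather than the whole match. The patterns in this sketch are illustrative and not part of the original examples:

```python
import re

txt = "The rain in Spain"

# One group: a list of the captured strings.
prefixes = re.findall(r"(\w+)ain", txt)

# Several groups: a list of tuples, one tuple per match.
around = re.findall(r"(\w)ai(\w)", txt)

print(prefixes)  # ['r', 'Sp']
print(around)    # [('r', 'n'), ('p', 'n')]
```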
    

The search() function searches the string for a match, and returns a "Match object" if there is a match. If there is more than one match, only the first occurrence of the match will be returned. If no matches are found, the value "None" is returned.

Example 10: search for the first white-space character in the string:

        import re
        txt = "The rain in Spain"
        x = re.search(r"\s", txt)
        print("The first white-space character is located in position:", x.start()) # The first white-space character is located in position: 3
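
Because search() returns None when nothing matches, calling a method such as start() on the result can fail. The sketch below guards against that. Using -1 as a fallback value is an arbitrary choice made for this illustration:

```python
import re

txt = "The rain in Spain"
x = re.search(r"\d", txt)  # the string contains no digits
print(x)  # None

# x.start() on None would raise AttributeError, so test the result first.
position = x.start() if x else -1
print(position)  # -1
```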
    

The split() function returns a list where the string has been split at each match. You can control the number of occurrences by specifying the maxsplit parameter.

Example 11: split at each white-space character:

        import re
        txt = "The rain in Spain"
        x = re.split(r"\s", txt)
        print(x) # ['The', 'rain', 'in', 'Spain']
    

Example 12: split the string only at the first occurrence:

        import re
        txt = "The rain in Spain"
        x = re.split(r"\s", txt, maxsplit=1)
        print(x) # ['The', 'rain in Spain']
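
Splitting at each single white-space character produces empty strings when the text contains runs of spaces. A small sketch, using a made-up string with repeated spaces, compares that with splitting on one or more white-space characters:

```python
import re

txt = "The  rain   in Spain"  # note the repeated spaces

each_space = re.split(r"\s", txt)
space_runs = re.split(r"\s+", txt)

print(each_space)  # ['The', '', 'rain', '', '', 'in', 'Spain']
print(space_runs)  # ['The', 'rain', 'in', 'Spain']
```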
    

The sub() function replaces the matches with the text of your choice. You can control the number of replacements by specifying the "count" parameter:

Example 13: replace every white-space character with the number 9:

        import re
        txt = "The rain in Spain"
        x = re.sub(r"\s", "9", txt)
        print(x) # The9rain9in9Spain
    

Example 14: replace the first 2 occurrences:

        import re
        txt = "The rain in Spain"
        x = re.sub(r"\s", "9", txt, count=2)
        print(x) # The9rain9in Spain
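
Two related features, sketched here with illustrative inputs: the replacement text can refer to captured groups with \1, \2, and so on, and re.subn() works like re.sub() but also reports how many replacements were made:

```python
import re

# Swap two words by referring to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# subn() returns a tuple: (new string, number of replacements).
result, count = re.subn(r"\s", "9", "The rain in Spain")
print(result, count)  # The9rain9in9Spain 3
```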
    

Match object: a match object is an object containing information about the search and the result. If there is no match, the value None will be returned, instead of the Match Object.

Example 15: do a search that will return a Match Object:

        import re
        txt = "The rain in Spain"
        x = re.search("ai", txt)
        print(x) # <re.Match object; span=(5, 7), match='ai'>
    

The Match object has attributes and methods used to retrieve information about the search and the result:
.span() - returns a tuple containing the start and end positions of the match
.string - returns the string passed into the function
.group() - returns the part of the string where there was a match

Example 16: print the position (start and end position) of the first match. The regular expression looks for the first word that starts with an upper case "S":

        import re
        txt = "The rain in Spain"
        x = re.search(r"\bS\w+", txt)
        print(x.span()) # (12, 17)
    

Example 17: print the part of the string where there was a match. The regular expression looks for the first word that starts with an upper case "S":

        import re
        txt = "The rain in Spain"
        x = re.search(r"\bS\w+", txt)
        print(x.group()) # Spain
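
The remaining attribute, .string, and the use of capture groups with .group() can be sketched in the same way. The parentheses in this pattern are added for illustration and are not part of the examples above:

```python
import re

txt = "The rain in Spain"
x = re.search(r"\b(S)(\w+)", txt)

print(x.string)            # The rain in Spain
print(x.group())           # Spain
print(x.group(1))          # S
print(x.group(2))          # pain
print(x.start(), x.end())  # 12 17
```

Here group() with no argument returns the whole match, while group(1) and group(2) return the text captured by the first and second pair of parentheses.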