Sheet: Formatting¶
Previous section has assumed the data is in the format that you want. In reality, you have to manipulate the data types a bit to suit your needs. Hence, formatters comes into the scene. The formatters take effect when the data is read on the fly. They do not affect the persistence of the data in the excel files. A row or column formatter can be applied to mutilpe rows/columns. There are two ways of applying a formatter:
- use
add_formatter()
,remove_formatter()
andclear_formatter()
to apply formatter on the fly. The formatter takes effect when a cell value is read. In other words, the sheet content is intact until you callfreeze_formatters()
to apply all added formatters. - use
format()
to apply formatter immediately.
There is slightly different behavior between csv reader and xls reader. The cell type of the cells read by csv reader will be always text while the cell types read by xls reader vary.
Convert a column of numbers to strings¶
By default, all values in csv are read back as texts. However, for xls, xlsx and xlsm files, differnt data type are supported. Numbers are always read as float. Therefore, if you should like to have them in string format, you need to do some conversions. Suppose you have the following data in any of the supported excel formats:
userid | name |
---|---|
10120 | Adam |
10121 | Bella |
10122 | Cedar |
Let’s read it out first:
>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet.column["userid"]
[10120, 10121, 10122]
As you can see, userid column is of float type. Next, let’s convert the column to string format:
>>> sheet.column.format(0, str)
>>> sheet.column["userid"]
['10120', '10121', '10122']
Now, they are in string format.
You can do this row by row as well using RowFormatter
or do this to a whote spread sheet using SheetFormatter
Cleanse the cells in a spread sheet¶
Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let’s take an example. Suppose we have a spread sheet that contains all strings but it as random spaces before and after the text values. Some field had weird characters, such as “ ”:
Version | Comments | Author |
---|---|---|
v0.0.1 | Release versions | Eda |
v0.0.2 | Useful updates | Freud |
First, let’s read the content and see what do we have:
>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> sheet.to_array()
[[' Version', ' Comments', ' Author '], [' v0.0.1 ', ' Release versions', ' Eda'], [' v0.0.2 ', 'Useful updates ', ' Freud']]
Now try to create a custom cleanse function:
>>> def cleanse_func(v):
... v = v.replace(" ", "")
... v = v.rstrip().strip()
... return v
...
Then let’s create a SheetFormatter
and apply it:
>>> sf = pyexcel.formatters.SheetFormatter(cleanse_func)
>>> sheet.add_formatter(sf)
>>> sheet.to_array()
[['Version', 'Comments', 'Author'], ['v0.0.1', 'Release versions', 'Eda'], ['v0.0.2', 'Useful updates', 'Freud']]
So in the end, you get this:
Version | Comments | Author |
---|---|---|
v0.0.1 | Release versions | Eda |
v0.0.2 | Useful updates | Freud |