Files

Pandas can import data from local computer or from an URL into a DataFrame object. It can read various formats like CSV, HTML, JSON, etc. It can also export a DataFrame object to CSV or other formats.

File formats that can be read in/write to

# File formats that can be read in/write to

# Format Type     Data Description      Reader           Writer
# text                  CSV            read_csv          to_csv
# text                 JSON            read_json         to_json
# text                 HTML            read_html         to_html
# text             Local clipboard  read_clipboard     to_clipboard
# binary             MS Excel          read_excel        to_excel
# binary            HDF5 Format        read_hdf           to_hdf
# binary           Feather Format     read_feather      to_feather
# binary              Msgpack         read_msgpack      to_msgpack
# binary               Stata           read_stata        to_stata
# binary                SAS             read_sas
# binary        Python Pickle Format   read_pickle       to_pickle
# binary           Parquet Form        read_parquet       to_parquet
# SQL                   SQL             read_sql          to_sql
# SQL             Google Big Query      read_gbq          to_gbq

Read csv file

df = pd.read_csv('local_path/file.csv')

Read csv file with | as fields separator

df = pd.read_csv('local_path/file.csv', sep='|')

Read only 10 rows of csv file

pd.read_csv('local_path/file.csv', nrows=10)

Read csv file and skip first 3 rows

df = pd.read_csv('local_path/file.csv', skiprows=3)

Read csv file and parse/treat columns as dates

df = pd.read_csv('local_path/file.csv',
                 parse_dates=['column_x_date', 'column_y_date'])

Read csv file while specifying column names

df = pd.read_csv('local_path/file.csv', names=['column_1', 'column_2'])

Read csv file and set the index as column_id

df = pd.read_csv('local_path/file.csv', index_col='column_id')

Read csv file and set the index as multi columns

df = pd.read_csv('local_path/file.csv', index_col=['column_id1', 'column_id2'])

Read csv file and set “.” as missing values

df = pd.read_csv('local_path/file.csv', na_values=['.'])

Read csv file and set different missing values on columns

# `column_x` missing values to be set as `nope`
# `column_y` missing values to be set as `.`
missings = {'column_x': ['nope'], 'column_y': ['.']}
df = pd.read_csv('local_path/file.csv', na_values=missings)

Read csv file and prevent pandas from interpreting “NA” as NaN in a string

# The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND',
# '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL',
# 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
df = pd.read_csv('local_path/file.csv', keep_default_na=False)

Read csv file and ignore missing value markers

# improve performance of reading a large file in data without any NAs
df = pd.read_csv('local_path/file.csv', header=0, na_filter=False)

Read csv file and change the data type of a column

pd.read_csv('local_path/file.csv', dtype={'column_x': float})

Write csv file

df.to_csv('local_path/file.csv')

Write csv file with | as fields separator

df = df.to_csv('local_path/file.csv', sep='|')

Write csv file but not save the DataFrame index

df.to_csv('local_path/file.csv', index=False)

Write csv file but not save the DataFrame header

df.to_csv('local_path/file.csv', header=False)

Write csv file but only specific columns

df.to_csv('local_path/file.csv', columns=['column_x', 'column_y'])

Write csv file; don’t use quotes and use backslash as escape character

df.to_csv('local_path/file.csv', quoting=csv.QUOTE_NONE, escapechar='\\')

Append DataFrame to csv file

df.to_csv('local_path/file.csv', mode='a')