DataFrames

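The pandas DataFrame is a super data structure: a two-dimensional, labeled table that holds other data structures inside it. Each column is a Series (a one-dimensional labeled array), and all the columns share the same row index.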
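To see what "holding other data structures" means in practice, select one column of a DataFrame: pandas hands back a Series that carries the frame's row index. This is a minimal sketch. The two rows are borrowed from the particle data used later on this page.

```python
import pandas as pd

# A DataFrame built from a dict: each key becomes a column label,
# and each list becomes that column's values.
df = pd.DataFrame({"quanta": ["photon", "electron"], "charge": [0.0, -1.0]})

# Selecting a single column returns a Series that shares the frame's index.
charge = df["charge"]
print(type(charge).__name__)  # Series
print(list(charge.index))     # [0, 1]
```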

Construction

Think of some data that you can put into a sequence of containers (a.k.a. rows of columns). A 3 x 4 dataset, for example, has 3 rows and 4 columns.
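A 3 x 4 dataset can be built either column by column or row by row. In the sketch below, the column labels and values are hypothetical placeholders. Both constructions produce the same frame.

```python
import pandas as pd

# Hypothetical 3 x 4 dataset built column by column: one list per column.
dataset = pd.DataFrame({
    "col_a": [1, 2, 3],
    "col_b": [4, 5, 6],
    "col_c": [7, 8, 9],
    "col_d": [10, 11, 12],
})
print(dataset.shape)  # (3, 4): (number of rows, number of columns)

# The same data built row by row: a list of rows plus the column labels.
rows = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
same = pd.DataFrame(rows, columns=["col_a", "col_b", "col_c", "col_d"])
print(same.equals(dataset))  # True
```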
DataFrames are made from rows and columns. Both have a list-like structure, so a DataFrame can be constructed from lists. Create an empty DataFrame and assign a list to one of its columns. Then create another DataFrame, and add the first one's column to it.
import pandas as pd

# Start from an empty DataFrame and give it one column from a list
quark = pd.DataFrame()
quark['type'] = ['up', 'down']
print(quark)
#    type
# 0    up
# 1  down

# Create a second DataFrame with named columns, then fill them from lists
quarkFeatures = pd.DataFrame(columns=['charge', 'bucket'])
quarkFeatures['charge'] = [1, -1]
quarkFeatures['bucket'] = ['fermion', 'fermion']
print(quarkFeatures)
#    charge   bucket
# 0       1  fermion
# 1      -1  fermion

# Add the first DataFrame's column to the second; rows line up by index label
quarkFeatures['type'] = quark['type']
print(quarkFeatures)
#    charge   bucket  type
# 0       1  fermion    up
# 1      -1  fermion  down
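The lengths must agree when a column is assigned from a list. A plain list needs exactly one value per existing row. A Series is aligned on index labels instead. This is a minimal sketch that reuses the `quarkFeatures` frame from the step above. The flag `mismatch_rejected` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

quarkFeatures = pd.DataFrame({"charge": [1, -1], "bucket": ["fermion", "fermion"]})

# A plain list must supply one value per existing row, so 3 values for 2 rows fails.
mismatch_rejected = False  # hypothetical flag, only for this illustration
try:
    quarkFeatures["type"] = ["up", "down", "beauty"]
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)

# A Series is aligned on index labels: labels 0 and 1 match existing rows,
# and label 2 ("beauty") has no row to land in, so it is dropped.
quarkFeatures["type"] = pd.Series(["up", "down", "beauty"])
print(quarkFeatures["type"].tolist())  # ['up', 'down']
```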

One of the coolest things about DataFrames is that they are built to work with real data. Give pandas a data file, and it will construct a DataFrame for you. Let's make a data file, and then load it up.
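One way to make the data file is to build the table in pandas and write it out with `DataFrame.to_csv`. This is a minimal sketch. It assumes the file should hold exactly the five rows printed by the load step. It writes to a temporary directory instead of `~/data`, so nothing in your home directory is touched.

```python
import os
import tempfile

import pandas as pd

# Rebuild the particle table and write it to a CSV file.
particles = pd.DataFrame({
    "examples in nature": ["sunlight", "lightning", "matter", "matter", "moonlight"],
    "quanta": ["photon", "electron", "up quark", "down quark", "photon"],
    "charge": [0.00, -1.00, 0.66, -0.33, 0.00],
    "type": ["boson", "fermion", "fermion", "fermion", "boson"],
})

path = os.path.join(tempfile.mkdtemp(), "particles.csv")
# index=False: don't write the 0..4 row labels as an extra column
particles.to_csv(path, index=False)

# Load it back: read_csv turns the header line into column labels.
quanta = pd.read_csv(path)
print(quanta.shape)  # (5, 4)
```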
Load a CSV:
import pandas as pd

quanta = pd.read_csv('~/data/particles.csv')
print(quanta)
#   examples in nature      quanta  charge     type
# 0            sunlight      photon    0.00    boson
# 1           lightning    electron   -1.00  fermion
# 2              matter    up quark    0.66  fermion
# 3              matter  down quark   -0.33  fermion
# 4           moonlight      photon    0.00    boson

Editing the DataFrame

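You can regroup or reorder the rows of a DataFrame. Here, groupby collects the rows that share a charge. Then max() keeps the largest value of each selected column within each group. For text, that means the alphabetically last value.
quantaTypeXCharge = quanta.groupby(['charge'])[['quanta', 'type']].max()
print(quantaTypeXCharge)
#             quanta     type
# charge
# -1.00     electron  fermion
# -0.33   down quark  fermion
#  0.00       photon    boson
#  0.66     up quark  fermion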
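groupby can also summarize a column inside each group, and sort_values reshuffles the rows by a column's values. This is a minimal sketch on the particle rows shown earlier on this page, built inline so the sketch runs without the CSV file.

```python
import pandas as pd

quanta = pd.DataFrame({
    "quanta": ["photon", "electron", "up quark", "down quark", "photon"],
    "charge": [0.00, -1.00, 0.66, -0.33, 0.00],
    "type": ["boson", "fermion", "fermion", "fermion", "boson"],
})

# Group the rows by type, then count and average the charges within each group.
summary = quanta.groupby("type")["charge"].agg(["count", "mean"])
print(summary)

# Reorder the rows by charge, largest first.
by_charge = quanta.sort_values("charge", ascending=False)
print(by_charge["quanta"].tolist())
# ['up quark', 'photon', 'photon', 'down quark', 'electron']
```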
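If you need only a subset of the data, you can slice it. .loc selects by label. Unlike Python list slicing, it includes both endpoints, so proton.loc[1:2] returns the rows labeled 1 and 2. A list of column labels passed as the second argument also narrows the columns.
# proton is assumed to be an existing DataFrame with columns 'down' and 'beauty'
print(proton.loc[1:2])

cols = ['down', 'beauty']
print(proton.loc[1:2, cols])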
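Label slicing with `.loc` and position slicing with `.iloc` treat the stop value differently. A boolean mask offers a third way to take a subset. This is a minimal sketch on the particle rows used earlier, since the page does not show the contents of `proton`.

```python
import pandas as pd

quanta = pd.DataFrame({
    "quanta": ["photon", "electron", "up quark", "down quark", "photon"],
    "charge": [0.00, -1.00, 0.66, -0.33, 0.00],
})

# .loc slices by label and includes BOTH endpoints: the rows labeled 1 and 2.
print(quanta.loc[1:2, "quanta"].tolist())  # ['electron', 'up quark']

# .iloc slices by position and, like a Python list, excludes the stop: position 1 only.
print(quanta.iloc[1:2, 0].tolist())  # ['electron']

# A boolean mask keeps only the rows where the condition is True.
print(quanta.loc[quanta["charge"] < 0, "quanta"].tolist())  # ['electron', 'down quark']
```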

Plotting Data

Once your data is loaded into a DataFrame, you can see it by plotting it. DataFrame.plot draws with matplotlib, and its kind argument selects the plot type.
Line, scatter, and box plots:
import matplotlib.pyplot as plt
import pandas as pd

cluster = pd.read_csv('~/data/cluster1.csv')

# line: column 'sc p' plotted against column 'd'
cluster.plot(kind='line', x='d', y='sc p')
plt.show()

# scatter: the same two columns as unconnected points
cluster.plot(kind='scatter', x='d', y='sc p')
plt.show()

# box: one box per listed column, summarizing the distribution of its values
cluster.plot(kind='box', y=['sc p', 'cc p'])
plt.show()
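The three plot kinds can also share one figure by passing each call an `ax`. The sketch below saves the figure to a file instead of opening a window. It assumes made-up values in place of `cluster1.csv`, whose contents the page does not show. Only the column names `d`, `sc p`, and `cc p` come from the code above. The output filename `cluster_plots.png` is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for cluster1.csv: same column names, made-up values.
cluster = pd.DataFrame({
    "d": [1, 2, 3, 4, 5],
    "sc p": [0.9, 0.7, 0.6, 0.4, 0.3],
    "cc p": [0.8, 0.75, 0.5, 0.45, 0.2],
})

# One row of three subplots; each DataFrame.plot call draws into its own axes.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
cluster.plot(kind="line", x="d", y="sc p", ax=axes[0], title="line")
cluster.plot(kind="scatter", x="d", y="sc p", ax=axes[1], title="scatter")
cluster[["sc p", "cc p"]].plot(kind="box", ax=axes[2], title="box")
fig.tight_layout()

out_path = os.path.join(tempfile.gettempdir(), "cluster_plots.png")  # hypothetical name
fig.savefig(out_path)
plt.close(fig)
```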

Merging DataFrames

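DataFrames can be combined by stacking, joining, and merging.
We can stack objects as rows with pd.concat. Older tutorials use Series.append or DataFrame.append for this, but both were deprecated in pandas 1.4 and removed in pandas 2.0. Stack two Series (a Series is pandas' one-dimensional structure, the kind of object a DataFrame column holds):
import pandas as pd

leptons = pd.Series(['electron', 'muon', 'tau'])
neutrinos = pd.Series(['electron neutrino',
                       'muon neutrino',
                       'tau neutrino'])

# pd.concat replaces the removed Series.append; each input keeps its own index
leptons = pd.concat([leptons, neutrinos])
print(leptons)
# 0             electron
# 1                 muon
# 2                  tau
# 0    electron neutrino
# 1        muon neutrino
# 2         tau neutrino
# dtype: object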
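By default, pd.concat keeps each input's index. That is why the labels 0 to 2 appear twice in the stacked output, and a label lookup can then match more than one row. Passing `ignore_index=True` renumbers the result. This is a minimal sketch using the same two Series.

```python
import pandas as pd

leptons = pd.Series(["electron", "muon", "tau"])
neutrinos = pd.Series(["electron neutrino", "muon neutrino", "tau neutrino"])

# Default: each input keeps its own labels, so 0, 1, 2 appear twice.
stacked = pd.concat([leptons, neutrinos])
print(stacked.index.is_unique)  # False
print(stacked.loc[0].tolist())  # ['electron', 'electron neutrino']

# ignore_index=True discards the old labels and numbers the result 0..5.
renumbered = pd.concat([leptons, neutrinos], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(renumbered.loc[3])          # electron neutrino
```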
Concatenating with pd.concat stacks DataFrames vertically (axis=0, the default) or horizontally (axis=1). Side by side, rows are matched by index label, and rows missing from one frame are filled with NaN.
fermions = pd.read_csv('~/data/fermions.csv')
bosons = pd.read_csv('~/data/bosons.csv')

buckets = [fermions, bosons]
print(pd.concat(buckets, axis=1))
#        quanta     type  quanta   type
# 0    electron  fermion  photon  boson
# 1    up quark  fermion     NaN    NaN
# 2  down quark  fermion     NaN    NaN
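You can compare the two axes without the CSV files. This minimal sketch rebuilds `fermions` and `bosons` inline. It assumes the files hold exactly the rows shown in the output above.

```python
import pandas as pd

# Inline copies of the two files, assuming they hold only the rows printed above.
fermions = pd.DataFrame({"quanta": ["electron", "up quark", "down quark"],
                         "type": ["fermion", "fermion", "fermion"]})
bosons = pd.DataFrame({"quanta": ["photon"], "type": ["boson"]})

# axis=0 stacks rows; columns are matched by name.
print(pd.concat([fermions, bosons], axis=0).shape)  # (4, 2)

# axis=1 places the frames side by side; rows are matched by index label,
# and the labels bosons lacks (1 and 2) are filled with NaN.
side_by_side = pd.concat([fermions, bosons], axis=1)
print(side_by_side.shape)                     # (3, 4)
print(int(side_by_side.isna().sum().sum()))  # 4
```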
Passing keys to pd.concat labels each input, which produces a multi-indexed (hierarchical) result. The index goes on the rows when stacking with axis=0, and on the columns when stacking with axis=1:
vertical = pd.concat(buckets, keys=['fermions', 'bosons'], axis=0)
print(vertical)
#                 quanta     type
# fermions 0    electron  fermion
#          1    up quark  fermion
#          2  down quark  fermion
# bosons   0      photon    boson

horizontal = pd.concat(buckets, keys=['fermions', 'bosons'], axis=1)
print(horizontal)
#      fermions           bosons
#        quanta     type  quanta   type
# 0    electron  fermion  photon  boson
# 1    up quark  fermion     NaN    NaN
# 2  down quark  fermion     NaN    NaN
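The keys then work as the outer level of the index, so each input can be pulled back out by name. This is a minimal sketch with the same inline stand-ins for `fermions.csv` and `bosons.csv`. It assumes the files hold only the rows shown in the outputs above.

```python
import pandas as pd

fermions = pd.DataFrame({"quanta": ["electron", "up quark", "down quark"],
                         "type": ["fermion", "fermion", "fermion"]})
bosons = pd.DataFrame({"quanta": ["photon"], "type": ["boson"]})
buckets = [fermions, bosons]

vertical = pd.concat(buckets, keys=["fermions", "bosons"], axis=0)
horizontal = pd.concat(buckets, keys=["fermions", "bosons"], axis=1)

# axis=0: the keys form the outer level of the row index.
print(vertical.loc["bosons"]["quanta"].tolist())  # ['photon']

# axis=1: the keys form the outer level of the column index.
print(horizontal["fermions"].columns.tolist())  # ['quanta', 'type']
print(horizontal[("bosons", "quanta")].isna().tolist())  # [False, True, True]
```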
Inner and outer joins are another way to combine DataFrames. pd.concat takes a join argument that controls how labels on the other axis are matched. The default, join='outer', keeps every label from every input and fills gaps with NaN. join='inner' keeps only the labels that all inputs share. The list passed to pd.concat can hold any number of DataFrames.
joinedBucket = [fermions, bosons]

# join='inner' keeps only the row labels present in both frames: here, just row 0
joinedBucket = pd.concat(joinedBucket, keys=['fermions', 'bosons'], axis=1, join='inner')

print(joinedBucket)
#    fermions           bosons
#      quanta     type  quanta   type
# 0  electron  fermion  photon  boson
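Merging in the database sense is done with pd.merge, which matches rows on the values of a key column rather than on the index. Its `how` argument selects an inner or outer join. This is a minimal sketch. The `charges` table is a hypothetical subset assembled from the particle data loaded earlier.

```python
import pandas as pd

# Hypothetical key tables built from the particle data shown earlier.
charges = pd.DataFrame({
    "quanta": ["photon", "electron", "up quark", "down quark"],
    "charge": [0.00, -1.00, 0.66, -0.33],
})
fermions = pd.DataFrame({
    "quanta": ["electron", "up quark", "down quark"],
    "type": ["fermion", "fermion", "fermion"],
})

# Inner merge: keep only the quanta that appear in both tables.
inner = pd.merge(charges, fermions, on="quanta", how="inner")
print(inner["quanta"].tolist())  # ['electron', 'up quark', 'down quark']

# Outer merge: keep every quantum from either side; missing values become NaN.
outer = pd.merge(charges, fermions, on="quanta", how="outer")
print(len(outer), int(outer["type"].isna().sum()))  # 4 1
```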
