I have a list of pandas DataFrames that I am trying to combine with the concatenation function, pd.concat:


import pandas as pd

dataframe_lists = [df1, df2, df3]

result = pd.concat(dataframe_lists, keys=['one', 'two', 'three'], ignore_index=True)


When I run this, I get the following full traceback:


---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-198-a30c57d465d0> in <module>()
----> 1 result = pd.concat(dataframe_lists, keys = ['one', 'two','three'], ignore_index=True)
       2 check(dataframe_lists)


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
     753 verify_integrity=verify_integrity,
     754 copy=copy)
--> 755 return op.get_result()
     756
     757


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in get_result(self)
     924
     925 new_data = concatenate_block_managers(
--> 926 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
     927 if not self.copy:
     928 new_data._consolidate_inplace()


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
    4061 copy=copy),
    4062 placement=placement)
-> 4063 for placement, join_units in concat_plan]
    4064
    4065 return BlockManager(blocks, axes)


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
    4061 copy=copy),
    4062 placement=placement)
-> 4063 for placement, join_units in concat_plan]
    4064
    4065 return BlockManager(blocks, axes)


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_join_units(join_units, concat_axis, copy)
    4150 raise AssertionError("Concatenating join units along axis0")
    4151
-> 4152 empty_dtype, upcasted_na = get_empty_dtype_and_na(join_units)
    4153
    4154 to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,


C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in get_empty_dtype_and_na(join_units)
    4139 return np.dtype('m8[ns]'), tslib.iNaT
    4140 else: # pragma
-> 4141 raise AssertionError("invalid dtype determination in get_concat_dtype")
    4142
    4143


AssertionError: invalid dtype determination in get_concat_dtype
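To experiment with the same pd.concat call without the gist data, here is a minimal sketch using small stand-ins for df1, df2 and df3. The column names and values are invented for illustration, and df3 has headers but no rows. It is not expected to reproduce the traceback above, which came from the older pandas bundled with WinPython 3.4 (note the pandas\tools\merge.py path). It also shows one detail of the call: along the row axis, ignore_index=True replaces the index with 0..n-1, so the keys labels do not appear in the result. Leaving ignore_index out keeps them as the outer level of a MultiIndex.

```python
import pandas as pd

# Hypothetical stand-ins for the gist data (invented column names and values).
df1 = pd.DataFrame({'AT': ['x', 'y'], 'Amount': [10.0, 20.0]})
df2 = pd.DataFrame({'Amount': [30.0, 40.0], 'City': ['Oslo', 'Rome']})
df3 = pd.DataFrame(columns=['A', 'B'])  # headers only, zero rows

dataframe_lists = [df1, df2, df3]

# Same call as in the question: ignore_index=True builds a fresh 0..n-1
# row index, so the keys do not show up in the result.
result = pd.concat(dataframe_lists, keys=['one', 'two', 'three'],
                   ignore_index=True)

# Without ignore_index, each row is labelled with its source key
# in the outer level of a MultiIndex.
result_keyed = pd.concat(dataframe_lists, keys=['one', 'two', 'three'])

print(result)
print(result_keyed.index.get_level_values(0).unique().tolist())
```

Because df3 contributes no rows, only 'one' and 'two' appear as row labels in result_keyed, while df3's columns still show up (all NaN) in both results.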




I believe the error occurs because one of the DataFrames is empty. As a temporary workaround for this rather perplexing error, I wrote a simple function, check, that finds the empty DataFrames and returns just their column headers:


def check(list_of_df):
    # Collect the column headers of every empty DataFrame in list_of_df.
    headers = []
    for df in list_of_df:
        if df.empty:
            headers.append(df.columns)
    return headers


I am wondering whether this function can be used so that, when a DataFrame is empty, just its headers are returned and appended to the concatenated DataFrame. The output would have a single header row, with a repeated column name appearing only once (as pd.concat already does). I have two non-empty sample data sets, df1 and df2:
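One way to get the single, de-duplicated set of headers described above is to collect the columns of every frame, empty or not, and keep each name once. This is a minimal sketch: combined_headers is a hypothetical helper name and the stand-in frames are invented. dict.fromkeys keeps names in first-seen order, and sorted() gives plain alphabetical order instead.

```python
import pandas as pd


def combined_headers(list_of_df, sort=True):
    """Hypothetical helper: every column name across the frames, each once."""
    # dict.fromkeys drops repeated names while keeping first-seen order.
    names = list(dict.fromkeys(col for df in list_of_df for col in df.columns))
    return sorted(names) if sort else names


# Invented stand-ins: df3 has headers but no rows.
df1 = pd.DataFrame({'AT': ['x'], 'Amount': [1.0]})
df2 = pd.DataFrame({'Amount': [2.0], 'City': ['Rome']})
df3 = pd.DataFrame(columns=['A', 'B'])

print(combined_headers([df1, df2, df3]))              # ['A', 'AT', 'Amount', 'B', 'City']
print(combined_headers([df1, df2, df3], sort=False))  # ['AT', 'Amount', 'City', 'A', 'B']
```

The repeated 'Amount' column appears once in either ordering.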


df1: https://gist.github.com/ahlusar1989/42708e6a3ca0aed9b79b
df2: https://gist.github.com/ahlusar1989/26eb4ce1578e0844eb82


And here is the empty DataFrame:




df3 (empty dataframe): https://gist.github.com/ahlusar1989/0721bd8b71416b54eccd


I would like the concatenated result to keep the column headers (with their values) from df1 and df2,


'AT', 'AccountNum', 'AcctType', 'Amount', 'City', 'Comment', 'Country', 'DuplicateAddressFlag', 'FromAccount', 'FromAccountNum', 'FromAccountT', 'PN', 'PriorCity', 'PriorCountry', 'PriorState', 'PriorStreetAddress', 'PriorStreetAddress2', 'PriorZip', 'RTID', 'State', 'Street1', 'Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 'ToAccountT', 'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 'WA', 'WC', 'Zip'


plus the headers of the empty df3, as follows:


'A', 'AT', 'AccountNum', 'AcctType', 'Amount', 'B', 'C', 'City', 'Comment', 'Country', 'D', 'DuplicateAddressFlag', 'E', 'F', 'FromAccount', 'FromAccountNum', 'FromAccountT', 'G', 'PN', 'PriorCity', 'PriorCountry', 'PriorState', 'PriorStreetAddress', 'PriorStreetAddress2', 'PriorZip', 'RTID', 'State', 'Street1', 'Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 'ToAccountT', 'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 'WA', 'WC', 'Zip'
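The target list above appears to be the plain sorted union of all three frames' headers, in the order Python's sorted() gives for strings (uppercase letters sort before lowercase, so 'AT' comes before 'AccountNum'). Assuming the empty frame is what trips pd.concat, a minimal sketch of a workaround is to concatenate only the non-empty frames, then reindex the columns to that sorted union. The empty frame's headers then appear as all-NaN columns. concat_keep_headers and the stand-in frames are hypothetical. keys is left out because, with ignore_index=True, it has no effect on the row index.

```python
import pandas as pd


def concat_keep_headers(list_of_df):
    """Hypothetical helper: concat non-empty frames, keep every header once."""
    # Sorted union of all column names, including those of empty frames.
    all_columns = sorted({col for df in list_of_df for col in df.columns})
    non_empty = [df for df in list_of_df if not df.empty]
    if not non_empty:
        # Nothing to stack (pd.concat([]) would raise): return just the headers.
        return pd.DataFrame(columns=all_columns)
    result = pd.concat(non_empty, ignore_index=True)
    # reindex adds any columns missing from the stacked frames, filled with NaN.
    return result.reindex(columns=all_columns)


# Invented stand-ins for the gist data.
df1 = pd.DataFrame({'AT': ['x', 'y'], 'Amount': [10.0, 20.0]})
df2 = pd.DataFrame({'Amount': [30.0, 40.0], 'City': ['Oslo', 'Rome']})
df3 = pd.DataFrame(columns=['A', 'B'])  # headers only

combined = concat_keep_headers([df1, df2, df3])
print(combined)
```

Because the empty frame never reaches pd.concat, this approach does not depend on how a particular pandas version handles empty inputs.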


I welcome any feedback on how to best do this. Thank you.

Posted to python-list (python.org) by Kbtyo, Sep 9, '15 at 9:15p.
