
  • Hi and welcome to this data science news special!

  • As a data science professional or at least an enthusiast, you probably have Pandas in

  • your heart - Python’s primary library for data analysis and manipulation.

  • Okay.

  • What you may not have heard already is that Pandas 1.0.0 was officially released!

  • Although at first sight this latest version is not much different for the user than the

  • previous release starting with a 0: 0.25.3, there are plenty of enhanced features that

  • boost performance and lay a better foundation in the long run.

  • The developers present 1.0.0 as a stable version of pandas with a strengthened API, which has

  • also been cleaned of many prior version deprecations.

  • Here are the most notable improvements that come with 1.0.0.

  • One.

  • The dedicated string and Boolean data types. These features are still “experimental”,

  • which means that further improvements are expected to happen in the near future.

  • So, as of yet, pandas will not automatically assign “string” or “boolean” to your

  • data.

  • This can only happen if you explicitly specify dtype="string" or dtype="boolean" while

  • creating a new structure.

  • However, in the future, this may become the default way in which pandas treats data of

  • this type.

  • We’ll just have to wait and see.

  • Also, you must consider the benefit of having the new “string” data type.

  • For example, until now, pandas would treat a date value and a string value as “object”.

  • Using “string” allows you to distinguish between the two, so now you can select and

  • manipulate string data much more easily.
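
As a minimal sketch of the opt-in described above (the sample values and variable names are made up for illustration), the dedicated types are requested through the `dtype` argument when a Series is created:

```python
import pandas as pd

# Illustrative data only; the names and values are not from the video.
# The dedicated (experimental) types must be requested explicitly.
names = pd.Series(["Ann", "Bob", None], dtype="string")
flags = pd.Series([True, False, None], dtype="boolean")

print(names.dtype)  # the dedicated string type
print(flags.dtype)  # the dedicated Boolean type

# Missing entries are still tracked in both Series.
print(names.isna().sum(), flags.isna().sum())
```

Note that the alias for the dedicated Boolean type is `"boolean"`; `dtype="bool"` gives the older NumPy-backed type instead.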

  • Which leads us to the second point worth mentioning.

  • Two.

  • The .select_dtypes() method is much quicker now!

  • It relies on vectorization instead of iterating over a loop.

  • So, you can run .select_dtypes("string") to pull all string values, or .select_dtypes("boolean")

  • to retrieve the Boolean data from a DataFrame, provided that you have set them as such beforehand.
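
A short sketch of that selection, assuming a small made-up DataFrame in which one column was explicitly created with the dedicated string type:

```python
import pandas as pd

# Hypothetical example data; only "name" is declared as a dedicated string column.
df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob"], dtype="string"),
    "age": [31, 45],
    "score": [1.5, 2.5],
})

# Keep only the columns whose dtype is the dedicated string type.
text_cols = df.select_dtypes("string")
print(list(text_cols.columns))
```

Columns that were never declared as `"string"` are not picked up by this selection, which is why the type has to be set beforehand.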

  • Three.

  • We now can enjoy the pandas.NA scalar that denotes missing values.

  • Using pandas.NA is a new concept in the scientific ecosystem of Python, and its goal is to provide

  • an indicator for missing values that can be used consistently and successfully across

  • data types.

  • That said, this feature is currently “experimental”, too.

  • The reason is that it is yet to be further verified how it will intertwine with the simultaneous

  • work of other packages such as NumPy.
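
To illustrate the idea of one missing-value marker shared across data types, here is a small sketch. It assumes the nullable extension dtypes `"Int64"`, `"string"` and `"boolean"`; the sample values are invented.

```python
import pandas as pd

# Three different nullable dtypes, each with a missing last entry.
ints = pd.Series([1, 2, None], dtype="Int64")
text = pd.Series(["a", "b", None], dtype="string")
flags = pd.Series([True, False, None], dtype="boolean")

# In each case the missing entry is reported as the same pd.NA scalar.
for s in (ints, text, flags):
    print(s.dtype, s.iloc[-1] is pd.NA)

# pd.NA propagates through comparisons instead of returning True or False.
comparison = pd.NA == 1
print(comparison is pd.NA)
```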

  • Four.

  • A method that will convert the data types of columns containing such null values has

  • been introduced – .convert_dtypes().
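
A minimal sketch of `.convert_dtypes()` on a made-up DataFrame whose columns contain missing values. The exact starting dtypes depend on the pandas version, so the example only inspects the result:

```python
import pandas as pd

# Hypothetical data: each column has one missing entry.
df = pd.DataFrame({
    "id": [1, 2, None],
    "name": ["Ann", None, "Cy"],
    "ok": [True, False, None],
})

# Ask pandas to pick the best dtypes that support pd.NA.
converted = df.convert_dtypes()

print(df.dtypes)
print(converted.dtypes)
```

After the conversion, the columns use the nullable integer, string and Boolean types rather than the generic fallbacks.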

  • Five.

  • The well-known .info() has been improved.

  • It is much more readable and this does help you to explore your data in a quicker and

  • more efficient way.
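
To look at the `.info()` output yourself, a sketch like the following can help. The DataFrame is invented, and the report is captured in a text buffer so that it can be inspected as a string:

```python
import io

import pandas as pd

# Hypothetical example data with one missing value.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", None],
    "temp_c": [3.5, 19.0, 12.25],
})

# .info() prints to stdout by default; buf= redirects it to a buffer.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)
```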

  • Six.

  • Now we also have “.to_markdown()” – this new method allows you to display a Series

  • or DataFrame object as a markdown table.
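
A small sketch of `.to_markdown()` on invented data. The method relies on the optional `tabulate` package, so the example falls back gracefully when it is not installed:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [3.5, 19.0]})

try:
    # Returns the table as a markdown-formatted string.
    markdown_table = df.to_markdown()
except ImportError:
    # Raised when the optional 'tabulate' dependency is missing.
    markdown_table = None
    print("to_markdown() needs the 'tabulate' package: pip install tabulate")
else:
    print(markdown_table)
```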

  • So overall, a lot has been done but mainly on the backend.

  • For everyday users like us, the development of clear data types, consistent with other

  • libraries, is surely the most prominent improvement.

  • In any case, it is worth checking the official release notes for more information before

  • you start using 1.0.0.

  • There you can find out more about the changes related to using such features as the .sort_index()

  • or .sort_values() methods and many more.

  • Finally, note that you need at least Python 3.6.1 to use this new version.

  • If you are just starting to learn pandas, don’t forget to check the link in the description.

  • If not, ‘pip install --upgrade pandas’ and have fun!

Pandas 1.0.0 – 6 key features in the new version

Published by 林宜悉 on January 14, 2021