Python for Astronomy¶

NumPy¶

何勃亮 中国科学院国家天文台中国虚拟天文台 (China-VO)

NumPy¶

NumPy 是科学计算的基础，定义了在科学计算中数据是如何存储的，如何访问的。

参考

Quickstart tutorial

In [2]:

import numpy as np

NumPy 和 SciPy¶

Numpy基础是数组对象ndarray

数组的列类型是一致的

Numpy 数据类型¶

np.dtype

bool
inii
int8
int16
int32
int64
uint8
uint16
uint32
uint64
float16
float32
float64, float
complex128, complex

字符编码

i 整数
u 无符号整数
f 浮点
d 双精度浮点
b bool
D 复数
S 字符串
U unicode字符串
V 空

In [3]:

np.dtype('i'), np.dtype(float)

Out[3]:

(dtype('int32'), dtype('float64'))

In [4]:

t = np.dtype('Float64')
t.char, t.type

Out[4]:

('d', numpy.float64)

创建 np.array 数组¶

In [5]:

a = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
a

Out[5]:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [6]:

b = np.arange(10, dtype=np.int8)
b

Out[6]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8)

Slice¶

一维¶

var[lower:upper:step]

In [7]:

val = np.array([5, 12, 2, 32, 19, 71, 48])

val[1:5:2]

Out[7]:

array([12, 32])

二维¶

var[lower:upper:step, lower:upper:step]

In [8]:

# fancy indexing

a = np.arange(1,16).reshape(3,5)
a

Out[8]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [9]:

a[[1,2]]

Out[9]:

array([[ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [10]:

a[[1,2], [3, 4]]

Out[10]:

array([ 9, 15])

In [11]:

a[a>5]

Out[11]:

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [12]:

# 生成函数

x = np.arange(-10,10,2)
x

Out[12]:

array([-10,  -8,  -6,  -4,  -2,   0,   2,   4,   6,   8])

In [13]:

x = np.linspace(-10, 10, 20)
x

Out[13]:

array([-10.        ,  -8.94736842,  -7.89473684,  -6.84210526,
        -5.78947368,  -4.73684211,  -3.68421053,  -2.63157895,
        -1.57894737,  -0.52631579,   0.52631579,   1.57894737,
         2.63157895,   3.68421053,   4.73684211,   5.78947368,
         6.84210526,   7.89473684,   8.94736842,  10.        ])

In [14]:

y = np.logspace(0, 10, 20, base=np.e)
y

Out[14]:

array([  1.00000000e+00,   1.69268460e+00,   2.86518116e+00,
         4.84984802e+00,   8.20926306e+00,   1.38956932e+01,
         2.35210258e+01,   3.98136782e+01,   6.73920000e+01,
         1.14073401e+02,   1.93090288e+02,   3.26840958e+02,
         5.53238656e+02,   9.36458553e+02,   1.58512897e+03,
         2.68312340e+03,   4.54168166e+03,   7.68763460e+03,
         1.30127407e+04,   2.20264658e+04])

In [15]:

# similar to meshgrid in MATLAB
x, y = np.mgrid[0:5, 0:5]

In [16]:

Out[16]:

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [17]:

# uniform random
np.random.rand(5,5)

Out[17]:

array([[ 0.99708881,  0.32790373,  0.78421204,  0.46250811,  0.58650343],
       [ 0.12820361,  0.169632  ,  0.10251207,  0.69955435,  0.50615348],
       [ 0.82988333,  0.74002775,  0.62342001,  0.31515023,  0.9844216 ],
       [ 0.20368915,  0.90618349,  0.55240055,  0.44030397,  0.85502834],
       [ 0.89303322,  0.33488513,  0.701262  ,  0.41543455,  0.21345522]])

In [18]:

# 正态分布
np.random.randn(5,5)

Out[18]:

array([[ 0.6130893 ,  0.21470607,  0.07138132, -0.77000397,  0.26283988],
       [ 1.50457865, -0.66699852,  0.262513  , -1.5106911 ,  0.67808786],
       [-0.60650634,  0.88533886, -0.78460221, -0.291146  ,  1.83469553],
       [ 1.04451005,  0.34923151,  0.73092256, -0.17767853,  0.84140369],
       [ 0.23179844, -0.94431637, -1.4098484 ,  2.07534083,  0.86086853]])

np.array 操作¶

In [19]:

a.shape = (3,4)
a

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-f0af00c30b35> in <module>()
----> 1 a.shape = (3,4)
      2 a

ValueError: total size of new array must be unchanged

In [20]:

a.reshape(5,3)

Out[20]:

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [21]:

# 展平

b = a.ravel()

In [22]:

Out[22]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [23]:

Out[23]:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [24]:

b = a.flatten()

In [25]:

Out[25]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [26]:

Out[26]:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

ravel返回的是视图，而flatten返回的是重新分配内存的新结果

In [27]:

Out[27]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [28]:

a.transpose()

Out[28]:

array([[ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 5, 10, 15]])

In [29]:

Out[29]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

In [30]:

a.T

Out[30]:

array([[ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 5, 10, 15]])

In [31]:

a.resize((5,3))

In [32]:

Out[32]:

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

与reshape不同，它就地更新结构

In [33]:

a.ndim

Out[33]:

In [34]:

a.size

Out[34]:

In [35]:

a.itemsize

Out[35]:

In [36]:

a.nbytes

Out[36]:

In [37]:

a.dtype

Out[37]:

dtype('int64')

In [38]:

a.T

Out[38]:

array([[ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14],
       [ 3,  6,  9, 12, 15]])

In [39]:

a.tolist()

Out[39]:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]

In [40]:

a.dot(a.T)

Out[40]:

array([[ 14,  32,  50,  68,  86],
       [ 32,  77, 122, 167, 212],
       [ 50, 122, 194, 266, 338],
       [ 68, 167, 266, 365, 464],
       [ 86, 212, 338, 464, 590]])

数组合并¶

concatenate((a0,a1,...,aN),axis=0)

In [41]:

x = np.array([[0, 1, 2],[10, 11, 12]])
y = np.array([[50, 51, 52],[60, 61, 62]])

In [42]:

np.concatenate((x, y))

Out[42]:

array([[ 0,  1,  2],
       [10, 11, 12],
       [50, 51, 52],
       [60, 61, 62]])

In [43]:

np.concatenate((x, y), 1)

Out[43]:

array([[ 0,  1,  2, 50, 51, 52],
       [10, 11, 12, 60, 61, 62]])

In [44]:

np.array((x,y))

Out[44]:

array([[[ 0,  1,  2],
        [10, 11, 12]],

       [[50, 51, 52],
        [60, 61, 62]]])

In [45]:

np.vstack((x,y))

Out[45]:

array([[ 0,  1,  2],
       [10, 11, 12],
       [50, 51, 52],
       [60, 61, 62]])

In [46]:

np.hstack((x,y))

Out[46]:

array([[ 0,  1,  2, 50, 51, 52],
       [10, 11, 12, 60, 61, 62]])

In [47]:

np.dstack((x,y))

Out[47]:

array([[[ 0, 50],
        [ 1, 51],
        [ 2, 52]],

       [[10, 60],
        [11, 61],
        [12, 62]]])

矩阵¶

在numpy中，矩阵是ndarray的子类。

矩阵函数

mat
matrix
bmat 复合矩阵

In [48]:

A = np.mat('1 2 3; 4 5 6; 7 8 9')
A

Out[48]:

matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [49]:

A = np.matrix('1 2 3; 4 5 6; 7 8 9')
A

Out[49]:

matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [50]:

A = np.mat(np.arange(1,10).reshape(3,3))
A

Out[50]:

matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

矩阵的计算¶

In [51]:

# 转置矩阵

A.T

Out[51]:

matrix([[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]])

In [52]:

# 逆矩阵

A.I

Out[52]:

matrix([[ -4.50359963e+15,   9.00719925e+15,  -4.50359963e+15],
        [  9.00719925e+15,  -1.80143985e+16,   9.00719925e+15],
        [ -4.50359963e+15,   9.00719925e+15,  -4.50359963e+15]])

In [53]:

# Hermitian
C = np.matrix([[1j, 2j], [3j, 4j]])
C

Out[53]:

matrix([[ 0.+1.j,  0.+2.j],
        [ 0.+3.j,  0.+4.j]])

In [54]:

C.H

Out[54]:

matrix([[ 0.-1.j,  0.-3.j],
        [ 0.-2.j,  0.-4.j]])

In [55]:

# 转化成一维
A.A1

Out[55]:

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [56]:

np.linalg.det(A)

Out[56]:

6.6613381477509402e-16

In [57]:

# 单位矩阵

A = np.eye(3)
A

Out[57]:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [58]:

B = A*10
B

Out[58]:

array([[ 10.,   0.,   0.],
       [  0.,  10.,   0.],
       [  0.,   0.,  10.]])

In [59]:

np.bmat("A B; B A")

Out[59]:

matrix([[  1.,   0.,   0.,  10.,   0.,   0.],
        [  0.,   1.,   0.,   0.,  10.,   0.],
        [  0.,   0.,   1.,   0.,   0.,  10.],
        [ 10.,   0.,   0.,   1.,   0.,   0.],
        [  0.,  10.,   0.,   0.,   1.,   0.],
        [  0.,   0.,  10.,   0.,   0.,   1.]])

In [60]:

# 零矩阵

A = np.zeros((3,3))
A

Out[60]:

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [61]:

A = np.mat(np.arange(1,10).reshape(3,3))
B = np.zeros_like(A)
B

Out[61]:

matrix([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [62]:

# 对角阵

np.diag([1,2,3,4,5])

Out[62]:

array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0],
       [0, 0, 0, 0, 5]])

In [63]:

np.diag([1,2,3,4,5], k=1)

Out[63]:

array([[0, 1, 0, 0, 0, 0],
       [0, 0, 2, 0, 0, 0],
       [0, 0, 0, 3, 0, 0],
       [0, 0, 0, 0, 4, 0],
       [0, 0, 0, 0, 0, 5],
       [0, 0, 0, 0, 0, 0]])

矩阵计算¶

In [64]:

A = np.matrix('1 2 3; 4 5 6; 7 8 9')

In [65]:

A * 2

Out[65]:

matrix([[ 2,  4,  6],
        [ 8, 10, 12],
        [14, 16, 18]])

In [66]:

A + 2

Out[66]:

matrix([[ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])

In [67]:

A * A

Out[67]:

matrix([[ 30,  36,  42],
        [ 66,  81,  96],
        [102, 126, 150]])

In [68]:

# 求模

np.mod(A, 2)

Out[68]:

matrix([[1, 0, 1],
        [0, 1, 0],
        [1, 0, 1]])

In [69]:

A % 2

Out[69]:

matrix([[1, 0, 1],
        [0, 1, 0],
        [1, 0, 1]])

$A \times B$

$A / B $

$A^2 \times B$

In [70]:

A = np.array([1.0,6.0,2.0,5.0,8.0,9.0])
B = np.array([6.0,2.0,4.0,7.0,9.0,2.0])
A*B, A/B, A**2*B

Out[70]:

(array([  6.,  12.,   8.,  35.,  72.,  18.]),
 array([ 0.16666667,  3.        ,  0.5       ,  0.71428571,  0.88888889,
         4.5       ]),
 array([   6.,   72.,   16.,  175.,  576.,  162.]))

向量化函数¶

In [71]:

a = np.array([-3,-2,-1,0,1,2,3])

def Theta(x):
    if x >= 0:
        return 1
    else:
        return 0

In [72]:

Theta(a)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-72-06b326920517> in <module>()
----> 1 Theta(a)

<ipython-input-71-378beb789c60> in Theta(x)
      2 
      3 def Theta(x):
----> 4     if x >= 0:
      5         return 1
      6     else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [73]:

Theta_V = np.vectorize(Theta)

Theta_V(a)

Out[73]:

array([0, 0, 0, 1, 1, 1, 1])

多项式¶

$3x^2+2x-1$

In [74]:

p = np.poly1d([3, 2, -1])

In [75]:

p(1)

Out[75]:

In [76]:

p.roots

Out[76]:

array([-1.        ,  0.33333333])

In [77]:

p.order

Out[77]:

In [78]:

x = np.linspace(0, 1, 20)
y = np.sin(x) + 0.3*np.random.rand(20)
p = np.poly1d(np.polyfit(x, y, 3))

t = np.linspace(0,1,200)

plt.plot(x, y, 'o', t, p(t), '-')

Out[78]:

[<matplotlib.lines.Line2D at 0x106516898>,
 <matplotlib.lines.Line2D at 0x106516a58>]

数据文件读取¶

In [79]:

data = []

with open('data/mat.txt') as file:
    for line in file:
        fields = line.split()
        row_data = [float(x) for x in fields]
        
        data.append(row_data)

data = np.array(data)

data

Out[79]:

array([[ 0.84545347,  0.44725417,  0.47999929,  0.08579657],
       [ 0.00206587,  0.39216908,  0.4311762 ,  0.11400269],
       [ 0.47211792,  0.56315336,  0.36709996,  0.02245953],
       [ 0.02446823,  0.51626722,  0.90963311,  0.16006042],
       [ 0.82464247,  0.283328  ,  0.84850368,  0.3052179 ],
       [ 0.78429442,  0.18458249,  0.78490952,  0.65116394],
       [ 0.013527  ,  0.48211147,  0.28053254,  0.01123305],
       [ 0.03692758,  0.33033247,  0.93326603,  0.57492495],
       [ 0.53283285,  0.85009472,  0.7293702 ,  0.37974827],
       [ 0.54816185,  0.78522169,  0.3731202 ,  0.05622311]])

In [80]:

data = np.loadtxt("data/mat.txt")
data

Out[80]:

array([[ 0.84545347,  0.44725417,  0.47999929,  0.08579657],
       [ 0.00206587,  0.39216908,  0.4311762 ,  0.11400269],
       [ 0.47211792,  0.56315336,  0.36709996,  0.02245953],
       [ 0.02446823,  0.51626722,  0.90963311,  0.16006042],
       [ 0.82464247,  0.283328  ,  0.84850368,  0.3052179 ],
       [ 0.78429442,  0.18458249,  0.78490952,  0.65116394],
       [ 0.013527  ,  0.48211147,  0.28053254,  0.01123305],
       [ 0.03692758,  0.33033247,  0.93326603,  0.57492495],
       [ 0.53283285,  0.85009472,  0.7293702 ,  0.37974827],
       [ 0.54816185,  0.78522169,  0.3731202 ,  0.05622311]])

In [81]:

M = np.random.rand(10, 4)
np.savetxt("data/mat.txt", M)

In [82]:

Out[82]:

array([[ 0.56198339,  0.16553871,  0.57310649,  0.58528156],
       [ 0.363642  ,  0.73304361,  0.85445552,  0.36342131],
       [ 0.77505737,  0.63239828,  0.41010258,  0.87824707],
       [ 0.17446411,  0.52690052,  0.64765012,  0.82850736],
       [ 0.7261763 ,  0.04327556,  0.55110713,  0.21871242],
       [ 0.70179891,  0.25187961,  0.55744213,  0.76957906],
       [ 0.75812029,  0.99244128,  0.67686674,  0.68061882],
       [ 0.91686604,  0.59147   ,  0.36447999,  0.03570074],
       [ 0.79477932,  0.77501612,  0.08069673,  0.70004227],
       [ 0.27068581,  0.05866099,  0.06355585,  0.11082832]])

In [ ]:

In [83]:

np.savetxt("data/mat.csv", M, fmt="%.3f")

In [84]:

!cat data/mat.csv

0.562 0.166 0.573 0.585
0.364 0.733 0.854 0.363
0.775 0.632 0.410 0.878
0.174 0.527 0.648 0.829
0.726 0.043 0.551 0.219
0.702 0.252 0.557 0.770
0.758 0.992 0.677 0.681
0.917 0.591 0.364 0.036
0.795 0.775 0.081 0.700
0.271 0.059 0.064 0.111

In [85]:

dr1 = np.loadtxt('data/sample.txt')
dr1

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-85-06f8d6d7f309> in <module>()
----> 1 dr1 = np.loadtxt('data/sample.txt')
      2 dr1

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    928 
    929             # Convert each value according to its column and store
--> 930             items = [conv(val) for (conv, val) in zip(converters, vals)]
    931             # Then pack it according to the dtype's nesting
    932             items = pack_items(items, packing)

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in <listcomp>(.0)
    928 
    929             # Convert each value according to its column and store
--> 930             items = [conv(val) for (conv, val) in zip(converters, vals)]
    931             # Then pack it according to the dtype's nesting
    932             items = pack_items(items, packing)

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in floatconv(x)
    657         if b'0x' in x:
    658             return float.fromhex(asstr(x))
--> 659         return float(x)
    660 
    661     typ = dtype.type

ValueError: could not convert string to float: b'obsid|designation|obsdate|lmjd|planid|spid|fiberid|ra|dec|snru|snrg|snrr|snri|snrz|objtype|class|subclass|magtype|mag1|mag2|mag3|mag4|mag5|mag6|mag7|tsource|fibertype|tfrom|t_info|rv|z|z_err'

In [86]:

help(np.loadtxt)

Help on function loadtxt in module numpy.lib.npyio:

loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
    Load data from a text file.
    
    Each row in the text file must have the same number of values.
    
    Parameters
    ----------
    fname : file or str
        File, filename, or generator to read.  If the filename extension is
        ``.gz`` or ``.bz2``, the file is first decompressed. Note that
        generators should return byte strings for Python 3k.
    dtype : data-type, optional
        Data-type of the resulting array; default: float.  If this is a
        structured data-type, the resulting array will be 1-dimensional, and
        each row will be interpreted as an element of the array.  In this
        case, the number of columns used must match the number of fields in
        the data-type.
    comments : str or sequence, optional
        The characters or list of characters used to indicate the start of a
        comment;
        default: '#'.
    delimiter : str, optional
        The string used to separate values.  By default, this is any
        whitespace.
    converters : dict, optional
        A dictionary mapping column number to a function that will convert
        that column to a float.  E.g., if column 0 is a date string:
        ``converters = {0: datestr2num}``.  Converters can also be used to
        provide a default value for missing data (but see also `genfromtxt`):
        ``converters = {3: lambda s: float(s.strip() or 0)}``.  Default: None.
    skiprows : int, optional
        Skip the first `skiprows` lines; default: 0.
    usecols : sequence, optional
        Which columns to read, with 0 being the first.  For example,
        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
        The default, None, results in all columns being read.
    unpack : bool, optional
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = loadtxt(...)``.  When used with a structured
        data-type, arrays are returned for each field.  Default is False.
    ndmin : int, optional
        The returned array will have at least `ndmin` dimensions.
        Otherwise mono-dimensional axes will be squeezed.
        Legal values: 0 (default), 1 or 2.
    
        .. versionadded:: 1.6.0
    
    Returns
    -------
    out : ndarray
        Data read from the text file.
    
    See Also
    --------
    load, fromstring, fromregex
    genfromtxt : Load data with missing values handled as specified.
    scipy.io.loadmat : reads MATLAB data files
    
    Notes
    -----
    This function aims to be a fast reader for simply formatted files.  The
    `genfromtxt` function provides more sophisticated handling of, e.g.,
    lines with missing values.
    
    .. versionadded:: 1.10.0
    
    The strings produced by the Python float.hex method can be used as
    input for floats.
    
    Examples
    --------
    >>> from io import StringIO   # StringIO behaves like a file object
    >>> c = StringIO("0 1\n2 3")
    >>> np.loadtxt(c)
    array([[ 0.,  1.],
           [ 2.,  3.]])
    
    >>> d = StringIO("M 21 72\nF 35 58")
    >>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
    ...                      'formats': ('S1', 'i4', 'f4')})
    array([('M', 21, 72.0), ('F', 35, 58.0)],
          dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])
    
    >>> c = StringIO("1,0,2\n3,0,4")
    >>> x, y = np.loadtxt(c, delimiter=',', usecols=(0, 2), unpack=True)
    >>> x
    array([ 1.,  3.])
    >>> y
    array([ 2.,  4.])

In [87]:

!head data/sample.txt

obsid|designation|obsdate|lmjd|planid|spid|fiberid|ra|dec|snru|snrg|snrr|snri|snrz|objtype|class|subclass|magtype|mag1|mag2|mag3|mag4|mag5|mag6|mag7|tsource|fibertype|tfrom|t_info|rv|z|z_err
101001|J220848.54-020324.3|2011-10-24|55859|F5902|1|1|332.2022740000|-2.0567670000|2.23|10.69|17.99|23.07|13.93|Star|STAR|K1|ugriz|18.78|17.12|16.42|16.15|15.97|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||-23.06902964||0.00000297
101002|J220953.17-020506.0|2011-10-24|55859|F5902|1|2|332.4715760000|-2.0850150000|2.00|5.52|14.19|20.30|14.05|Star|STAR|M0|ugriz|20.91|18.10|16.66|16.05|15.67|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||27.10000040||0.00017775
101008|J220928.49-015720.7|2011-10-24|55859|F5902|1|8|332.3687450000|-1.9557710000|1.84|9.94|25.25|32.32|18.29|Star|STAR|G5|ugriz|18.25|16.64|15.97|15.77|15.64|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||25.03866609||0.00000287
101009|J220849.59-015207.1|2011-10-24|55859|F5902|1|9|332.2066650000|-1.8686530000|1.86|9.13|18.80|25.28|14.18|Star|STAR|G0|ugriz|18.64|17.19|16.63|16.37|16.25|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||-22.16965227||0.00000537
101016|J220923.69-020809.9|2011-10-24|55859|F5902|1|16|332.3487250000|-2.1360960000|2.17|28.22|52.30|72.89|46.52|Star|STAR|K5|ugriz|18.64|16.21|15.23|14.85|14.62|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||-6.63140917||0.00000130
101017|J220946.66-015526.5|2011-10-24|55859|F5902|1|17|332.4444170000|-1.9240460000|2.60|16.56|29.63|38.19|22.15|Star|STAR|G0|ugriz|17.97|16.53|16.00|15.78|15.65|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||-2.46129608||0.00000233
101020|J220853.37-015915.4|2011-10-24|55859|F5902|1|20|332.2223790000|-1.9876260000|2.65|17.26|26.29|36.30|20.29|Star|STAR|F5|ugriz|17.01|15.98|15.51|15.35|15.27|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||10.84948906||0.00000751
101021|J220924.33-014833.5|2011-10-24|55859|F5902|1|21|332.3513810000|-1.8093330000|6.05|34.57|53.87|62.42|37.85|Star|STAR|F5|ugriz|16.75|15.61|15.16|14.98|14.92|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||-17.91859521||0.00000354
101023|J221001.52-020100.8|2011-10-24|55859|F5902|1|23|332.5063740000|-2.0169000000|2.35|12.14|22.38|27.72|16.25|Star|STAR|F9|ugriz|18.46|16.97|16.39|16.18|16.12|99.00|99.00|JF_LEGAS_S|Obj|SDSS_S||52.65854525||0.00000227

In [88]:

dr1 = np.loadtxt('data/sample.txt',  delimiter="|", skiprows=1)
dr1

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-3b88b6873464> in <module>()
----> 1 dr1 = np.loadtxt('data/sample.txt',  delimiter="|", skiprows=1)
      2 dr1

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    928 
    929             # Convert each value according to its column and store
--> 930             items = [conv(val) for (conv, val) in zip(converters, vals)]
    931             # Then pack it according to the dtype's nesting
    932             items = pack_items(items, packing)

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in <listcomp>(.0)
    928 
    929             # Convert each value according to its column and store
--> 930             items = [conv(val) for (conv, val) in zip(converters, vals)]
    931             # Then pack it according to the dtype's nesting
    932             items = pack_items(items, packing)

/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py in floatconv(x)
    657         if b'0x' in x:
    658             return float.fromhex(asstr(x))
--> 659         return float(x)
    660 
    661     typ = dtype.type

ValueError: could not convert string to float: b'J220848.54-020324.3'

In [89]:

!cat data/sample2.txt

obsid|designation|obsdate|lmjd
101001|J220848.54-020324.3|2011-10-24|55859
101002|J220953.17-020506.0|2011-10-24|55859
101008|J220928.49-015720.7|2011-10-24|55859
101009|J220849.59-015207.1|2011-10-24|55859
101016|J220923.69-020809.9|2011-10-24|55859
101017|J220946.66-015526.5|2011-10-24|55859
101020|J220853.37-015915.4|2011-10-24|55859
101021|J220924.33-014833.5|2011-10-24|55859
101023|J221001.52-020100.8|2011-10-24|55859

In [90]:

dtype = np.dtype([('obsid', 'S6'), ('designation', 'S19'), ('obsdate', 'S10'), ('lmjd', int)])

dr1 = np.loadtxt('data/sample2.txt',  dtype=dtype, delimiter="|", skiprows=1)

dr1

Out[90]:

array([(b'101001', b'J220848.54-020324.3', b'2011-10-24', 55859),
       (b'101002', b'J220953.17-020506.0', b'2011-10-24', 55859),
       (b'101008', b'J220928.49-015720.7', b'2011-10-24', 55859),
       (b'101009', b'J220849.59-015207.1', b'2011-10-24', 55859),
       (b'101016', b'J220923.69-020809.9', b'2011-10-24', 55859),
       (b'101017', b'J220946.66-015526.5', b'2011-10-24', 55859),
       (b'101020', b'J220853.37-015915.4', b'2011-10-24', 55859),
       (b'101021', b'J220924.33-014833.5', b'2011-10-24', 55859),
       (b'101023', b'J221001.52-020100.8', b'2011-10-24', 55859)], 
      dtype=[('obsid', 'S6'), ('designation', 'S19'), ('obsdate', 'S10'), ('lmjd', '<i8')])

In [91]:

dr1['lmjd']

Out[91]:

array([55859, 55859, 55859, 55859, 55859, 55859, 55859, 55859, 55859])

通用函数（ufunc）¶

元素级的数组函数，对ndarray中的数据进行计算和操作。

一元ufunc
二元ufunc
自定义ufunc

In [92]:

# 一元ufunc
arr = np.arange(10)

np.cos(arr)

Out[92]:

array([ 1.        ,  0.54030231, -0.41614684, -0.9899925 , -0.65364362,
        0.28366219,  0.96017029,  0.75390225, -0.14550003, -0.91113026])

In [93]:

# 二元ufunc

x = np.random.randn(10)
y = np.random.randn(10)

x, y

Out[93]:

(array([ 1.79600881, -0.81414819,  2.1464102 ,  0.65948138,  1.42033628,
        -1.46328631,  1.8278442 ,  2.21153387,  0.66340491, -0.57464657]),
 array([ 2.16758348, -0.55589951,  0.41189805, -0.19602406,  0.54262162,
        -0.7196864 ,  0.10274757, -1.58080509,  1.39606345, -0.56942818]))

In [94]:

np.add(x, y)

Out[94]:

array([ 3.96359229, -1.3700477 ,  2.55830825,  0.46345731,  1.9629579 ,
       -2.18297271,  1.93059177,  0.63072878,  2.05946836, -1.14407475])

In [95]:

## 自定义

def sqrt2(x,y):
    return np.sqrt(x**2 + y**2)

# input: 2, output: 1
sqrt2_uf = np.frompyfunc(sqrt2, 2, 1)

sqrt2_uf(x,y)

Out[95]:

array([2.8149717177862432, 0.98583038151946334, 2.1855746933041336,
       0.6879979065810834, 1.5204582791937427, 1.6306916752957041,
       1.8307297687838613, 2.7184235900325704, 1.5456711280930153,
       0.80899142758923637], dtype=object)

In [96]:

sqrt2_uf2 = np.vectorize(sqrt2, otypes=[np.float64])
sqrt2_uf2(x,y)

Out[96]:

array([ 2.81497172,  0.98583038,  2.18557469,  0.68799791,  1.52045828,
        1.63069168,  1.83072977,  2.71842359,  1.54567113,  0.80899143])

结构化数组¶

各列的数据类型可能不一致

In [97]:

dtype=[('RA', np.float64), ('Dec', np.float64), ('Type', np.int16)]

sarr = np.array([(1.23293, 23.231234, 12), (12.3242, 332.47876, 34)], dtype=dtype)

sarr

Out[97]:

array([(1.23293, 23.231234, 12), (12.3242, 332.47876, 34)], 
      dtype=[('RA', '<f8'), ('Dec', '<f8'), ('Type', '<i2')])

In [98]:

sarr['RA']

Out[98]:

array([  1.23293,  12.3242 ])